First project in my Udacity-Misk Data Science Nanodegree: Blog post
Traffic accidents have become one of the main problems the Kingdom of Saudi Arabia is dealing with. In this project I look at the number of children involved in those accidents and try to find out when and where their percentage is lower.
- What is the total number of accidents per month?
- What is the percentage of children that are involved in these accidents per month?
- What is the percentage of children involved for each region during the whole year?
- Comparing the largest regions in the Kingdom of Saudi Arabia, which one has the highest percentage of children involved in traffic accidents?
Files included:
- traffic-accident-statistics-as-of-1439-h.xls (data file in Excel format). It contains 17 sheets with the same number of columns and the same headers. I have concatenated them into one pandas dataframe. The data is provided by the Ministry of Interior, General Directorate of Traffic.
- traffic_KSA.ipynb (Jupyter notebook using Python 3.8 and common libraries).
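A minimal sketch of how the sheets could be combined, not necessarily the exact code in the notebook. The function names, the extra `sheet` column, and the toy frames are assumptions made for illustration; only `pd.read_excel(..., sheet_name=None)` and `pd.concat` are standard pandas calls.

```python
import pandas as pd


def load_all_sheets(path):
    """Read every sheet of a workbook into a dict {sheet name: DataFrame}.

    sheet_name=None asks pandas for all sheets at once. Reading the legacy
    .xls format needs the optional xlrd package to be installed.
    """
    return pd.read_excel(path, sheet_name=None)


def combine_sheets(sheets):
    """Stack sheets that share the same headers into one DataFrame.

    The added 'sheet' column (hypothetical name) records where each row
    came from, so nothing is lost by concatenating.
    """
    frames = [frame.assign(sheet=name) for name, frame in sheets.items()]
    return pd.concat(frames, ignore_index=True)


# Toy stand-in for the workbook (made-up values) so the sketch runs
# without the data file. With the real file one would call:
#   sheets = load_all_sheets("traffic-accident-statistics-as-of-1439-h.xls")
sheets = {
    "sheet_1": pd.DataFrame({"month": ["m1", "m2"], "accidents": [1, 2]}),
    "sheet_2": pd.DataFrame({"month": ["m1", "m2"], "accidents": [3, 4]}),
}

combined = combine_sheets(sheets)
print(combined)
```

Keeping the sheet name as a column is a design choice: it makes it possible to trace any row back to its source sheet after the concatenation.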
Modules installed to produce the Jupyter notebook:
- pandas
- matplotlib
- numpy
- plotly
- plotly express
- seaborn
Acknowledgement:
The data I have used was sent to me by my mentor, Mr. Haroon. I highly appreciate his help.
The data can be found on the Saudi portal for Open Data at:
https://data.gov.sa/Data/en/dataset/traffic-accident-statistics-as-of-1439-h
Key Steps for Project
Following the CRISP-DM process in finding solutions
- I have used the traffic accidents data in KSA for the year 1439 of the Hijri calendar.
- My business questions are:
- What is the total number of accidents per month?
- What is the percentage of children that are involved in these accidents per month?
- What is the percentage of children involved for each region during the whole year?
- Comparing the largest regions in the Kingdom of Saudi Arabia, which one has the highest percentage of children involved in traffic accidents?
- I have created a Jupyter Notebook to explore and analyse the data:
Preparing the data:
- Read all sheets in the Excel workbook and combine them.
- Clean the data, rename columns, and translate the Arabic month names into English.
- Create a dataframe for the data containing the age categories.
- Provide visualized answers to my business questions.
- Please go to my blog post, where I present my insights:
https://medium.com/@fatsammar/traffic-accidents-analysis-in-saudi-arabia-during-1939-hijri-df581248c1d4
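For readers who want a feel for the preparation steps listed above, here is a rough sketch of translating the month names and computing the percentage of children per month and per region. It is an illustration under assumptions, not the notebook's actual code: the column names (`month`, `region`, `children`, `all_ages`), the helper `children_share`, the region labels, and all numbers are made up, and the Arabic spellings in the real workbook may differ from the mapping used here.

```python
import pandas as pd

# Assumed mapping from Arabic Hijri month names to English transliterations.
# The exact spellings used in the workbook may differ.
HIJRI_MONTHS = {
    "محرم": "Muharram",
    "صفر": "Safar",
    "ربيع الأول": "Rabi al-Awwal",
    "ربيع الآخر": "Rabi al-Thani",
    "جمادى الأولى": "Jumada al-Ula",
    "جمادى الآخرة": "Jumada al-Akhirah",
    "رجب": "Rajab",
    "شعبان": "Shaban",
    "رمضان": "Ramadan",
    "شوال": "Shawwal",
    "ذو القعدة": "Dhu al-Qadah",
    "ذو الحجة": "Dhu al-Hijjah",
}

# Toy data with made-up numbers and hypothetical column names.
toy = pd.DataFrame({
    "month": ["محرم", "محرم", "صفر", "صفر"],
    "region": ["Region A", "Region B", "Region A", "Region B"],
    "children": [10, 30, 5, 15],
    "all_ages": [100, 100, 50, 150],
})

# Replace the Arabic month names with their English equivalents.
toy["month"] = toy["month"].map(HIJRI_MONTHS)


def children_share(df, by):
    """Percentage of children among all people involved, grouped by `by`.

    Counts are summed per group first, so large and small groups are
    weighted by their actual size rather than averaged row by row.
    """
    totals = df.groupby(by, sort=False)[["children", "all_ages"]].sum()
    return (100 * totals["children"] / totals["all_ages"]).round(1)


per_month = children_share(toy, "month")
per_region = children_share(toy, "region")
print(per_month)
print(per_region)
```

Summing the counts before dividing avoids the bias that comes from averaging per-row percentages when the rows represent very different numbers of people.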