# Coursera Capstone - Characteristics of car accidents and their impact on police, ambulance and fire brigades

## Table of contents:

1. Introduction Section
    - Scenario and background
    - Problem to be solved
    - Recipients of the report
2. Data Section
    - Data required to resolve the problem and how the data will be used to solve the problem
3. Methodology Section
4. Result Section
5. Discussion Section
6. Conclusion Section


## 1. Introduction Section

### Scenario and background
Looking at the statistics of car accidents in Seattle, I concluded that services such as the police, fire brigade and ambulance could be prepared in advance for serious road accidents that involve many people and that may be linked to weather conditions in specific parts of Seattle.

### Problem to be solved
The challenge is to prepare and inform the police, fire brigade and ambulance service in advance about the likelihood of very serious road accidents, depending on the prevailing road conditions in specific places. To do so, I want to find accidents that meet the following criteria:
- Severity is a minimum of 2
- At least 4 people are involved (PERSONCOUNT), including pedestrians and cyclists
- Depending on the weather condition and road condition
- Depending on the time of day
- Cases resulting from the use of alcohol and other intoxicants are rejected as not resulting directly from weather factors.
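The criteria above can be expressed as a single boolean mask over the collision table. This is a minimal sketch. It assumes the column names used later in the notebook (SEVERITYCODE, PERSONCOUNT, UNDERINFL) and assumes UNDERINFL has already been normalized to 0/1. The helper name `matches_criteria` and the toy rows are hypothetical.

```python
import pandas as pd

def matches_criteria(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True for rows that meet the report's criteria.

    Assumes UNDERINFL is already normalized to 0 (sober) / 1 (under influence).
    """
    return (
        (df['SEVERITYCODE'] >= 2)     # injury collisions only
        & (df['PERSONCOUNT'] >= 4)    # at least 4 people involved
        & (df['UNDERINFL'] == 0)      # drop intoxicant-related cases
    )

# Toy rows for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    'SEVERITYCODE': [1, 2, 2, 2],
    'PERSONCOUNT':  [5, 3, 4, 6],
    'UNDERINFL':    [0, 0, 0, 1],
})
print(toy[matches_criteria(toy)])  # only the row with index 2 passes
```

Weather, road and light conditions are deliberately left out of the mask. They are used for grouping rather than filtering.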

### Recipients of the report
I believe this project suits the security center of any major city in Europe, the United States or Asia. The approach and methodology used here can help prepare services such as the police, fire brigade or hospitals in advance for emergencies (e.g. severe road accidents) caused by weather conditions.

## 2. Data Section

### Data required to resolve the problem and how the data will be used to solve the problem
The data will be used as follows. First, I filter out the records for which the UNDERINFL attribute (under the influence of alcohol or other intoxicants) is empty or equal to 1 or Y. Under my assumptions, this excludes the cases that are not directly related to weather conditions. Then I keep only the records whose SEVERITYCODE is at least 2, which leaves only the collisions with injuries.
In the next step, I keep only the collisions with at least 4 people involved (PERSONCOUNT). I then group them by weather (WEATHER), road condition (ROADCOND) and light condition (LIGHTCOND), which stands in for the time of day. Finally, I will use the location data (X and Y) to plot the most frequent crash sites on a map of Seattle, so the services know where accidents are likely to happen under particular weather and light conditions.
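The grouping and mapping steps above could be sketched like this: snap each collision's X/Y coordinates to a coarse grid, then count collisions per grid cell and condition combination so the most frequent sites come first. This is one possible approach under stated assumptions, not the method used later in this report. The 0.005-degree cell size and the helper name `top_crash_sites` are hypothetical, the toy rows are made up, and the input is assumed to still have the raw WEATHER, ROADCOND and LIGHTCOND columns.

```python
import pandas as pd

def top_crash_sites(df: pd.DataFrame, cell: float = 0.005, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: count collisions per grid cell and condition combination."""
    snapped = df.assign(
        # X is the longitude and Y the latitude in the Seattle collision data
        LON_CELL=(df['X'] / cell).round() * cell,
        LAT_CELL=(df['Y'] / cell).round() * cell,
    )
    counts = (
        snapped.groupby(['WEATHER', 'ROADCOND', 'LIGHTCOND', 'LAT_CELL', 'LON_CELL'])
        .size()
        .reset_index(name='ACCIDENTS')
        .sort_values('ACCIDENTS', ascending=False)
    )
    return counts.head(n)

# Made-up rows: three collisions close together, one farther away
toy = pd.DataFrame({
    'X': [-122.3300, -122.3301, -122.3299, -122.3800],
    'Y': [47.6000, 47.6001, 47.6002, 47.6700],
    'WEATHER': ['Raining'] * 4,
    'ROADCOND': ['Wet'] * 4,
    'LIGHTCOND': ['Dark - Street Lights On'] * 4,
})
print(top_crash_sites(toy))
```

A larger `cell` merges nearby intersections into one site. A smaller one keeps them apart but spreads the counts thinner.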

## 3. Methodology
This section describes how the data is collected, prepared and processed in accordance with the adopted strategy.

### Data extraction and cleaning

```python
import pandas as pd
import requests
import numpy as np
import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install folium==0.5.0
import folium

print('Libraries imported')
```

```
Libraries imported
```


```python
!wget -O Data-Collisions.csv https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv
```


```python
df_raw = pd.read_csv('Data-Collisions.csv')
```

```python
len(df_raw)
```

```
194673
```

The file contains information about 194,673 collisions in Seattle. Next, we drop the rows with a missing UNDERINFL value or missing coordinates (X, Y), and then keep only the columns relevant to our analysis.

```python
# Drop the rows without an UNDERINFL value or without coordinates
df_raw = df_raw.dropna(subset=['UNDERINFL', 'X', 'Y'])
```

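```python
# Keep only the columns used in the analysis; .copy() makes df_acc an independent
# DataFrame so that adding columns later does not trigger SettingWithCopyWarning
df_acc = df_raw[['SEVERITYCODE', 'PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT', 'INCDTTM', 'UNDERINFL', 'WEATHER', 'ROADCOND', 'LIGHTCOND', 'X', 'Y']].copy()
```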

Now let's check the values of the UNDERINFL column. It mixes two encodings (Y/N and 1/0), so we need to normalize it before filtering.

```python
df_acc['UNDERINFL'].value_counts(normalize=True)
```

```
N    0.528916
0    0.422975
Y    0.027096
1    0.021013
Name: UNDERINFL, dtype: float64
```
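Because UNDERINFL mixes two encodings, a stricter option is to map the four codes listed above explicitly and fail loudly if any other value appears. This is a minimal sketch. The helper name `normalize_underinfl` is hypothetical, and the mapping only restates the codes shown in the output above.

```python
import pandas as pd

# The four codes observed in UNDERINFL, mapped to 1 (under influence) / 0 (sober)
CODES = {'Y': 1, '1': 1, 'N': 0, '0': 0}

def normalize_underinfl(s: pd.Series) -> pd.Series:
    """Hypothetical helper: map Y/N and '1'/'0' to integers, rejecting anything else."""
    mapped = s.astype(str).map(CODES)          # unknown codes become NaN
    unknown = s[mapped.isna()].unique()
    if len(unknown):
        raise ValueError(f'Unexpected UNDERINFL codes: {list(unknown)}')
    return mapped.astype(int)

print(normalize_underinfl(pd.Series(['N', '0', 'Y', '1'])).tolist())  # [0, 0, 1, 1]
```

Converting to `str` first means integer codes (0/1) and string codes ('0'/'1') are handled the same way.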

```python
# Map both encodings ('1'/'0' and 'Y'/'N') to the integers 1/0 and assign the result back
df_acc['UNDERINFL'] = df_acc['UNDERINFL'].replace({'1': 1, '0': 0, 'Y': 1, 'N': 0}).astype(int)
df_acc['UNDERINFL'].value_counts(normalize=True)
```

```
0    0.951891
1    0.048109
Name: UNDERINFL, dtype: float64
```

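Now let's normalize the LIGHTCOND column into a new DAYLIGHT flag with two states: daylight (1) and dark (0). Every value other than Daylight, including Dusk, Dawn and Unknown, counts as dark.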

```python
df_acc['DAYLIGHT'] = df_acc['LIGHTCOND']
df_acc['DAYLIGHT'].value_counts(normalize=True)
```

```
Daylight                    0.616198
Dark - Street Lights On     0.256685
Unknown                     0.067445
Dusk                        0.031330
Dawn                        0.013140
Dark - No Street Lights     0.007872
Dark - Street Lights Off    0.006250
Other                       0.001020
Dark - Unknown Lighting     0.000060
Name: DAYLIGHT, dtype: float64
```

```python
def was_daylight(condition):
    if condition != 'Daylight':
        return 0
    else:
        return 1

df_acc['DAYLIGHT'] = df_acc['DAYLIGHT'].apply(was_daylight)
```

```python
df_acc['DAYLIGHT'].value_counts(normalize=True)
```

```
1    0.61528
0    0.38472
Name: DAYLIGHT, dtype: float64
```
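The Daylight share shifts slightly between the two outputs: 0.616198 before binarization and 0.61528 after. A likely explanation is that `value_counts(normalize=True)` ignores missing values by default. `was_daylight`, by contrast, turns a missing LIGHTCOND into 0, so those rows join the denominator. The toy Series below uses made-up values to show the effect. It also shows `Series.eq` as a vectorized equivalent of `was_daylight`.

```python
import numpy as np
import pandas as pd

light = pd.Series(['Daylight', 'Daylight', 'Dusk', np.nan])

# value_counts drops NaN by default: Daylight is 2 of the 3 non-missing values
print(light.value_counts(normalize=True)['Daylight'])

# Vectorized equivalent of was_daylight: NaN is not equal to 'Daylight', so it becomes 0
daylight = light.eq('Daylight').astype(int)
print(daylight.value_counts(normalize=True).loc[1])  # 0.5
```

Passing `dropna=False` to `value_counts` in the earlier cell would list the missing values explicitly and make the two outputs comparable.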

Next, let's apply the same approach to the WEATHER and ROADCOND columns, dividing them into good/bad weather and good/bad road condition. I treat Clear weather and a Dry road as the positive (1) values; all other values, including Unknown, are negative (0).

```python
df_acc['GOOD_WEATHER'] = df_acc['WEATHER']
df_acc['GOOD_WEATHER'].value_counts(normalize=True)
```

```
Clear                       0.590839
Raining                     0.173604
Overcast                    0.147147
Unknown                     0.075336
Snowing                     0.004848
Other                       0.004192
Fog/Smog/Smoke              0.002999
Sleet/Hail/Freezing Rain    0.000607
Blowing Sand/Dirt           0.000271
Severe Crosswind            0.000130
Partly Cloudy               0.000027
Name: GOOD_WEATHER, dtype: float64
```

```python
def was_good_weather(condition):
    if condition != 'Clear':
        return 0
    else:
        return 1

df_acc['GOOD_WEATHER'] = df_acc['GOOD_WEATHER'].apply(was_good_weather)
```

```python
df_acc['GOOD_WEATHER'].value_counts(normalize=True)
```

```
1    0.590237
0    0.409763
Name: GOOD_WEATHER, dtype: float64
```

```python
df_acc['GOOD_ROAD'] = df_acc['ROADCOND']
df_acc['GOOD_ROAD'].value_counts(normalize=True)
```

```
Dry               0.661727
Wet               0.249695
Unknown           0.075016
Ice               0.006380
Snow/Slush        0.005361
Other             0.000634
Standing Water    0.000553
Sand/Mud/Dirt     0.000347
Oil               0.000287
Name: GOOD_ROAD, dtype: float64
```

```python
def was_good_road(condition):
    if condition != 'Dry':
        return 0
    else:
        return 1

df_acc['GOOD_ROAD'] = df_acc['GOOD_ROAD'].apply(was_good_road)
```

```python
df_acc['GOOD_ROAD'].value_counts(normalize=True)
```

```
1    0.661293
0    0.338707
Name: GOOD_ROAD, dtype: float64
```
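The three `was_*` functions differ only in the value they treat as good, so a single helper could build all three flags in one pass. This is a sketch that assumes the same good values as above ('Daylight', 'Clear', 'Dry'). `add_flags` and `GOOD_VALUES` are hypothetical names, and the two toy rows are made up.

```python
import pandas as pd

# Source column -> (new flag column, value treated as good)
GOOD_VALUES = {
    'LIGHTCOND': ('DAYLIGHT', 'Daylight'),
    'WEATHER': ('GOOD_WEATHER', 'Clear'),
    'ROADCOND': ('GOOD_ROAD', 'Dry'),
}

def add_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add 0/1 flags and drop the original text columns."""
    out = df.copy()
    for source, (flag, good_value) in GOOD_VALUES.items():
        # Anything other than the good value (including NaN) becomes 0
        out[flag] = out[source].eq(good_value).astype(int)
    return out.drop(columns=list(GOOD_VALUES))

toy = pd.DataFrame({
    'LIGHTCOND': ['Daylight', 'Dusk'],
    'WEATHER': ['Clear', 'Raining'],
    'ROADCOND': ['Wet', 'Dry'],
})
print(add_flags(toy))
```

Keeping the good values in one dictionary makes it easy to revisit a judgment call later, such as counting Overcast as good weather.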

```python
df_acc = df_acc.drop(columns=['WEATHER','ROADCOND','LIGHTCOND'])
```

```python
df_acc = df_acc[['SEVERITYCODE','UNDERINFL','DAYLIGHT','GOOD_WEATHER','GOOD_ROAD','PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT', 'INCDTTM', 'X', 'Y']]
```

```python
len(df_acc)
```

```
184602
```

```python
# Dropping the accidents with severity smaller than 2
df_acc = df_acc[df_acc['SEVERITYCODE'] >= 2]
```

```python
# Dropping the accidents that involved alcohol or other intoxicants (UNDERINFL == 1)
df_acc = df_acc[df_acc['UNDERINFL'] == 0]

# Dropping the accidents with fewer than 4 people involved
df_acc = df_acc[df_acc['PERSONCOUNT'] >= 4]

# Filtering for the night accidents (DAYLIGHT == 0)
df_night = df_acc[df_acc['DAYLIGHT'] == 0]
```

```python
len(df_night)
```

```python
# Filtering for the night accidents in bad weather (GOOD_WEATHER == 0)
df_bad_weather = df_night[df_night['GOOD_WEATHER'] == 0]
len(df_bad_weather)
```
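To see how the selected accidents split across light, weather and road conditions at once, `pd.crosstab` can count every combination of the 0/1 flags. This is a minimal sketch on made-up rows. In the notebook it would be applied to `df_acc` after the filters above.

```python
import pandas as pd

# Made-up flag values for five accidents
toy = pd.DataFrame({
    'DAYLIGHT':     [0, 0, 1, 0, 1],
    'GOOD_WEATHER': [0, 1, 1, 0, 0],
    'GOOD_ROAD':    [0, 1, 1, 0, 1],
})

# Rows: light condition; columns: (weather, road) combination
table = pd.crosstab(toy['DAYLIGHT'], [toy['GOOD_WEATHER'], toy['GOOD_ROAD']])
print(table)
```

Here the cell for darkness with bad weather and a bad road, `table.loc[0, (0, 0)]`, holds 2 of the 5 toy accidents.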

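```python
# The code was removed by Watson Studio for sharing.
# (It presumably defined the Foursquare credentials CLIENT_ID, CLIENT_SECRET and VERSION used below.)
```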

```python
radius = 2000
LIMIT = 1

def getNearbyEmergencies(names, latitudes, longitudes, query, radius=2000):
    # For each accident location, ask the Foursquare v2 "explore" endpoint for
    # up to LIMIT venues matching `query` within `radius` metres.
    # CLIENT_ID, CLIENT_SECRET and VERSION are expected from the credentials cell above.
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            query,
            radius,
            LIMIT)

        # making GET request
        venue_results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in venue_results])

    # Flatten the per-location lists; passing columns= also works when nothing was found
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
```
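Building the Foursquare URL with `str.format` means keeping eight placeholders and their arguments in sync by hand. Passing a `params` dict to `requests` avoids that and URL-encodes values such as a query containing spaces. The sketch below only prepares the request and does not send it. The function name `build_explore_request`, the credential strings, the version date and the `'fire station'` query are placeholders, not values from the original notebook.

```python
import requests

def build_explore_request(lat, lng, query, client_id, client_secret,
                          version, radius=2000, limit=1):
    """Hypothetical helper: prepare (but do not send) a venues/explore request."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return requests.Request(
        'GET', 'https://api.foursquare.com/v2/venues/explore', params=params
    ).prepare()

# Placeholder credentials; a real call would need valid ones
req = build_explore_request(47.6023056, -122.3325194, 'fire station',
                            'MY_ID', 'MY_SECRET', '20180605')
print(req.url)
# Sending it would look like: requests.Session().send(req).json()
```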

```python
# Centre point for the Seattle map
latitude = 47.54
longitude = -122.32
map_Seattle = folium.Map(location=[latitude, longitude], zoom_start=10)

# Y holds the latitude and X the longitude of each selected accident
for lat, lng in zip(df_bad_weather['Y'], df_bad_weather['X']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=None,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Seattle)

map_Seattle
```