# NYC traffic accidents over a 4-year period
## Filter and Subset

Download <a href="https://www.dropbox.com/s/585wrgl08djzlyt/accidents-nyc.csv?dl=0">this dataset</a> from Dropbox and save it as `data/accidents-nyc.csv`, the path the notebook reads from.

In [1]:
## import necessary libraries
import pandas as pd

In [2]:
## read the dataset into notebook
accidents_nyc = pd.read_csv("data/accidents-nyc.csv")

In [3]:
## see the overall info about this dataset
accidents_nyc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 282873 entries, 0 to 282872
Data columns (total 16 columns):
 #   Column                         Non-Null Count   Dtype 
---  ------                         --------------   ----- 
 0   CRASH DATE                     282873 non-null  object
 1   CRASH TIME                     282873 non-null  object
 2   BOROUGH                        282873 non-null  object
 3   NUMBER OF PERSONS INJURED      282873 non-null  int64 
 4   NUMBER OF PERSONS KILLED       282873 non-null  int64 
 5   NUMBER OF PEDESTRIANS INJURED  282873 non-null  int64 
 6   NUMBER OF PEDESTRIANS KILLED   282873 non-null  int64 
 7   NUMBER OF CYCLIST INJURED      282873 non-null  int64 
 8   NUMBER OF CYCLIST KILLED       282873 non-null  int64 
 9   NUMBER OF MOTORIST INJURED     282873 non-null  int64 
 10  NUMBER OF MOTORIST KILLED      282873 non-null  int64 
 11  CONTRIBUTING FACTOR VEHICLE 1  281489 non-null  object
 12  CONTRIBUTING FACTOR VEHICLE 2  224591 non-nu

In [13]:
accidents_nyc.sample()

Unnamed: 0,CRASH DATE,CRASH TIME,BOROUGH,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED,NUMBER OF PEDESTRIANS INJURED,NUMBER OF PEDESTRIANS KILLED,NUMBER OF CYCLIST INJURED,NUMBER OF CYCLIST KILLED,NUMBER OF MOTORIST INJURED,NUMBER OF MOTORIST KILLED,CONTRIBUTING FACTOR VEHICLE 1,CONTRIBUTING FACTOR VEHICLE 2,COLLISION_ID,VEHICLE TYPE CODE 1,VEHICLE TYPE CODE 2
129041,1/29/20,3:03,BROOKLYN,0,0,0,0,0,0,0,0,Unspecified,,4280455,Sedan,
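
Calling `sample()` with no arguments returns one random row, so the row shown changes on every run. As a minimal sketch, assuming a small made-up DataFrame (`toy` is hypothetical and stands in for `accidents_nyc`), passing `n` and `random_state` makes the draw larger and repeatable:

```python
import pandas as pd

# Hypothetical stand-in for accidents_nyc; values are invented for illustration.
toy = pd.DataFrame({
    "BOROUGH": ["BROOKLYN", "QUEENS", "BRONX", "MANHATTAN"],
    "NUMBER OF PERSONS INJURED": [0, 1, 2, 0],
})

# n sets how many rows to draw; random_state fixes the seed so the
# same rows come back each time the cell is re-run.
first_draw = toy.sample(n=2, random_state=42)
second_draw = toy.sample(n=2, random_state=42)

print(first_draw)
```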


In [5]:
## create a series of crash dates.
accidents_crash_dates = accidents_nyc["CRASH DATE"]
accidents_crash_dates

0         4/13/21
1         4/13/21
2         4/13/21
3         4/11/21
4         4/15/21
           ...   
282868     1/1/19
282869     1/1/19
282870     1/1/19
282871     1/1/19
282872     1/1/19
Name: CRASH DATE, Length: 282873, dtype: object
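
The `dtype: object` above means the dates are still plain strings, so they cannot be sorted or grouped chronologically yet. The sketch below shows one way to parse them, assuming every value follows the month/day/two-digit-year pattern seen in the output (the format string and the three sample values are assumptions based on that output):

```python
import pandas as pd

# Toy stand-in for the "CRASH DATE" column, using values seen in the output above.
crash_dates = pd.Series(["4/13/21", "4/11/21", "1/1/19"], name="CRASH DATE")

# Parse the strings into timestamps. The explicit format is an assumption;
# values that do not match it would raise an error.
parsed = pd.to_datetime(crash_dates, format="%m/%d/%y")

# With real datetimes, the covered date range and per-year counts are easy to get.
earliest, latest = parsed.min(), parsed.max()
crashes_per_year = parsed.dt.year.value_counts().sort_index()

print(earliest, latest)
print(crashes_per_year)
```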

In [6]:
## Which borough had the most crashes?
accidents_nyc["BOROUGH"].value_counts()

BROOKLYN         95099
QUEENS           80085
BRONX            50123
MANHATTAN        48864
STATEN ISLAND     8702
Name: BOROUGH, dtype: int64
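
Raw counts are easier to compare as shares of the total. The sketch below rebuilds the counts printed above as a Series and divides by their sum. On the full DataFrame, `accidents_nyc["BOROUGH"].value_counts(normalize=True)` should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
borough_counts = pd.Series({
    "BROOKLYN": 95099,
    "QUEENS": 80085,
    "BRONX": 50123,
    "MANHATTAN": 48864,
    "STATEN ISLAND": 8702,
})

# Share of all crashes per borough, as a percentage rounded to one decimal.
borough_shares = borough_counts / borough_counts.sum()
print((borough_shares * 100).round(1))
```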

In [8]:
## Which type of vehicle was most often the primary vehicle involved in crashes?
accidents_nyc["VEHICLE TYPE CODE 1"].value_counts()
## Answer: sedans (129,987 crashes)

Sedan                                  129987
Station Wagon/Sport Utility Vehicle    102850
Taxi                                    10647
Pick-up Truck                            7183
Box Truck                                5504
                                        ...  
SLINGSHOT                                   1
CHEVY EXPR                                  1
Go kart                                     1
FDNY Engin                                  1
MAC T                                       1
Name: VEHICLE TYPE CODE 1, Length: 633, dtype: int64

In [9]:
accidents_nyc["VEHICLE TYPE CODE 2"].value_counts()

Sedan                                  84241
Station Wagon/Sport Utility Vehicle    68723
Bike                                    8544
Taxi                                    6939
Box Truck                               6365
                                       ...  
e-bike                                     1
FRIEGHTLIN                                 1
ESCOOTER                                   1
Electric s                                 1
G COM                                      1
Name: VEHICLE TYPE CODE 2, Length: 735, dtype: int64

In [11]:
## SHOW ONLY THE TOP 7
accidents_nyc["VEHICLE TYPE CODE 1"].value_counts().head(7)

Sedan                                  129987
Station Wagon/Sport Utility Vehicle    102850
Taxi                                    10647
Pick-up Truck                            7183
Box Truck                                5504
Bus                                      4697
Bike                                     3177
Name: VEHICLE TYPE CODE 1, dtype: int64

In [12]:
## What were FIVE unusual primary vehicles to get into a crash?
accidents_nyc["VEHICLE TYPE CODE 1"].value_counts().tail(5)

SLINGSHOT     1
CHEVY EXPR    1
Go kart       1
FDNY Engin    1
MAC T         1
Name: VEHICLE TYPE CODE 1, dtype: int64
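
The vehicle labels mix upper and lower case (`SLINGSHOT`, `Go kart`, `e-bike`), so the same vehicle type may be split across several spellings and look rarer than it is. As a rough sketch, assuming that case and stray whitespace are the only differences (the sample values here are invented), the labels can be normalized before counting:

```python
import pandas as pd

# Invented labels that mimic the inconsistent casing in the real column.
vehicle_types = pd.Series(
    ["Sedan", "SEDAN", "sedan ", "Go kart", "E-Bike", "e-bike", None]
)

# Strip surrounding whitespace and lowercase everything; missing values stay missing.
cleaned = vehicle_types.str.strip().str.lower()

# value_counts() skips missing values by default.
cleaned_counts = cleaned.value_counts()
print(cleaned_counts)
```

This does not merge truncated or misspelled labels such as `FDNY Engin`; those would need a manual mapping.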

In [14]:
## create a subset of data for only Queens
## place it in a dataframe called df_q
df_q = accidents_nyc[accidents_nyc["BOROUGH"] == "QUEENS"]

In [24]:
## CHALLENGE (as in, you will have to Google this)
## How many people were killed in accidents in Queens?
## Keep only the crashes with at least one death, then add up the deaths.
queens_deaths = df_q[df_q["NUMBER OF PERSONS KILLED"] >= 1]
queens_deaths["NUMBER OF PERSONS KILLED"].sum()

120
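
Filtering to rows with at least one death before summing is optional: rows with zero deaths add nothing to the total. The sketch below, on an invented four-row table, checks that both routes agree and shows how `groupby` could give the total for every borough at once:

```python
import pandas as pd

# Invented rows; only the column names match the real dataset.
toy = pd.DataFrame({
    "BOROUGH": ["QUEENS", "QUEENS", "BRONX", "QUEENS"],
    "NUMBER OF PERSONS KILLED": [0, 2, 1, 1],
})

queens = toy[toy["BOROUGH"] == "QUEENS"]

# Route 1: filter to fatal crashes first, then sum (as in the cell above).
fatal_only = queens[queens["NUMBER OF PERSONS KILLED"] >= 1]
filtered_total = fatal_only["NUMBER OF PERSONS KILLED"].sum()

# Route 2: sum the column directly; zero rows do not change the result.
direct_total = queens["NUMBER OF PERSONS KILLED"].sum()

# All boroughs in one step.
deaths_by_borough = toy.groupby("BOROUGH")["NUMBER OF PERSONS KILLED"].sum()
print(filtered_total, direct_total)
print(deaths_by_borough)
```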

In [27]:
## Same challenge:
## How many cyclists were killed in Queens?
queens_deaths["NUMBER OF CYCLIST KILLED"].sum()

8

In [36]:
## Filter and subset
## Create a subset of Manhattan crashes in which a taxi was the primary vehicle (VEHICLE TYPE CODE 1)
filter_manhattan = accidents_nyc["BOROUGH"] == "MANHATTAN"
filter_cabs = accidents_nyc["VEHICLE TYPE CODE 1"] == "Taxi"
df_manhattan_cabs = accidents_nyc[filter_cabs & filter_manhattan]
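
The two filters above are boolean Series, and `&` keeps only the rows where both are `True`. A small sketch on invented rows shows `&` next to `|` (either condition). When the comparisons are written inline rather than stored in variables, each one needs its own parentheses because `&` binds more tightly than `==`:

```python
import pandas as pd

# Invented rows; only the column names match the real dataset.
toy = pd.DataFrame({
    "BOROUGH": ["MANHATTAN", "MANHATTAN", "QUEENS"],
    "VEHICLE TYPE CODE 1": ["Taxi", "Sedan", "Taxi"],
})

is_manhattan = toy["BOROUGH"] == "MANHATTAN"
is_taxi = toy["VEHICLE TYPE CODE 1"] == "Taxi"

# Rows that satisfy both conditions.
both = toy[is_manhattan & is_taxi]

# Rows that satisfy at least one condition.
either = toy[is_manhattan | is_taxi]

# Same as `both`, written inline; note the parentheses around each comparison.
inline = toy[(toy["BOROUGH"] == "MANHATTAN") & (toy["VEHICLE TYPE CODE 1"] == "Taxi")]

print(both)
```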

In [38]:
## What were the top 5 contributing factors across all the boroughs
## for the primary vehicle?

accidents_nyc["CONTRIBUTING FACTOR VEHICLE 1"].value_counts().head()



Unspecified                       78494
Driver Inattention/Distraction    70615
Failure to Yield Right-of-Way     20691
Following Too Closely             14407
Backing Unsafely                  13348
Name: CONTRIBUTING FACTOR VEHICLE 1, dtype: int64

In [39]:
## What were the top 5 contributing factors across all the boroughs
## for the secondary vehicle?
accidents_nyc["CONTRIBUTING FACTOR VEHICLE 2"].value_counts().head()

Unspecified                       190456
Driver Inattention/Distraction     14186
Other Vehicular                     3529
Failure to Yield Right-of-Way       2233
Passing or Lane Usage Improper      2171
Name: CONTRIBUTING FACTOR VEHICLE 2, dtype: int64
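
`Unspecified` tops both lists, which says more about how the reports were filled in than about why crashes happen. One option, sketched here on invented values, is to drop that label (and missing values) before ranking. Whether `Unspecified` should count as missing data is a judgment call, not something the dataset states.

```python
import pandas as pd

# Invented values that mimic the contributing-factor columns.
factors = pd.Series([
    "Unspecified",
    "Driver Inattention/Distraction",
    "Unspecified",
    "Backing Unsafely",
    "Driver Inattention/Distraction",
    None,
])

# Remove the placeholder label, then drop missing values.
known_factors = factors[factors != "Unspecified"].dropna()

# Rank the remaining factors.
top_known = known_factors.value_counts().head()
print(top_known)
```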

In [40]:
## What were the 5 RAREST contributing factors for the primary vehicle?

accidents_nyc["CONTRIBUTING FACTOR VEHICLE 1"].value_counts().tail()

Shoulders Defective/Improper    13
Texting                          8
Cell Phone (hands-free)          8
Windshield Inadequate            3
Listening/Using Headphones       2
Name: CONTRIBUTING FACTOR VEHICLE 1, dtype: int64

In [42]:
## List ALL the causes as unique values (in other words, create a list of the causes)
## WHAT ARE SOME UNUSUAL REASONS FOR ACCIDENTS?
primary_vehicle_reasons = accidents_nyc["CONTRIBUTING FACTOR VEHICLE 1"]
secondary_vehicle_reasons = accidents_nyc["CONTRIBUTING FACTOR VEHICLE 2"]

## Stack the two columns into one long Series so both can be counted together
primary_secondary_vehicle_reasons = [primary_vehicle_reasons, secondary_vehicle_reasons]

all_reasons = pd.concat(primary_secondary_vehicle_reasons)

## The unique causes as a plain list (missing values dropped)
unique_reasons = all_reasons.dropna().unique().tolist()

In [45]:
all_reasons.value_counts().tail(5)

Shoulders Defective/Improper    14
Cell Phone (hands-free)         11
Texting                          9
Listening/Using Headphones       8
Windshield Inadequate            3
dtype: int64