
In [0]:
import numpy as np
import pandas as pd
import datetime as dt
import sys
from scipy import stats

We start by importing modules and then we ingest the dataset we would like to work with.

The New York City Open Data portal maintains a dataset of motor vehicle crashes:
https://data.cityofnewyork.us/resource/h9gi-nx95.csv

We will use this dataset to examine the reported crashes. It is ingested below. We will do some cleaning and grouping before looking for patterns, and we will test any interesting observations for statistical significance.

In [0]:
MVC = pd.read_csv("https://data.cityofnewyork.us/resource/h9gi-nx95.csv?$limit=2000000", low_memory=False)

#### Set display options to show all columns and up to 500 rows.

In [4]:
pd.set_option('display.max_columns', None) 
pd.set_option('display.max_rows', 500) 
MVC.head()

Unnamed: 0,accident_date,accident_time,borough,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,contributing_factor_vehicle_3,contributing_factor_vehicle_4,contributing_factor_vehicle_5,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
0,2019-09-25T00:00:00.000,8:00,,,40.746105,-73.727554,POINT (-73.727554 40.746105),,,73-10 COMMONWEALTH BOULEVARD,0.0,0.0,0,0,0,0,0,0,Reaction to Uninvolved Vehicle,Unspecified,,,,4212306,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
1,2019-09-14T00:00:00.000,18:05,QUEENS,11360.0,40.77609,-73.77235,POINT (-73.77235 40.77609),29 AVENUE,215 STREET,,0.0,0.0,0,0,0,0,0,0,Unsafe Speed,Failure to Yield Right-of-Way,Unspecified,,,4206157,Sedan,Station Wagon/Sport Utility Vehicle,Sedan,,
2,2019-09-27T00:00:00.000,17:20,MANHATTAN,10003.0,40.725445,-73.98762,POINT (-73.98762 40.725445),,,133 EAST 4 STREET,1.0,0.0,0,0,1,0,0,0,Passenger Distraction,Other Vehicular,,,,4215053,Taxi,Bike,,,
3,2019-09-05T00:00:00.000,18:40,BROOKLYN,11238.0,40.688324,-73.958244,POINT (-73.958244 40.688324),,,160 CLIFTON PLACE,0.0,0.0,0,0,0,0,0,0,Unspecified,,,,,4203025,Station Wagon/Sport Utility Vehicle,,,,
4,2019-09-27T00:00:00.000,7:30,QUEENS,11385.0,40.70985,-73.917145,POINT (-73.917145 40.70985),ONDERDONK AVENUE,WILLOUGHBY AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Unspecified,,,,4214315,Sedan,Sedan,,,


Next, we print summary information about the dataset: the column names, non-null counts, and dtypes.

In [5]:
MVC.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1610967 entries, 0 to 1610966
Data columns (total 29 columns):
accident_date                    1610967 non-null object
accident_time                    1610967 non-null object
borough                          1122206 non-null object
zip_code                         1122009 non-null object
latitude                         1414201 non-null float64
longitude                        1414201 non-null float64
location                         1414201 non-null object
on_street_name                   1295103 non-null object
off_street_name                  1070979 non-null object
cross_street_name                223699 non-null object
number_of_persons_injured        1610950 non-null float64
number_of_persons_killed         1610936 non-null float64
number_of_pedestrians_injured    1610967 non-null int64
number_of_pedestrians_killed     1610967 non-null int64
number_of_cyclist_injured        1610967 non-null int64
number_of_cyclist_killed        

We would like to know how many null values some columns contain. If a column has a lot of NaN or null values, we could drop it.

In [6]:
MVC[MVC['vehicle_type_code_3'].isnull()].head(50)

Unnamed: 0,accident_date,accident_time,borough,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,contributing_factor_vehicle_3,contributing_factor_vehicle_4,contributing_factor_vehicle_5,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
0,2019-09-25T00:00:00.000,8:00,,,40.746105,-73.727554,POINT (-73.727554 40.746105),,,73-10 COMMONWEALTH BOULEVARD,0.0,0.0,0,0,0,0,0,0,Reaction to Uninvolved Vehicle,Unspecified,,,,4212306,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
2,2019-09-27T00:00:00.000,17:20,MANHATTAN,10003.0,40.725445,-73.98762,POINT (-73.98762 40.725445),,,133 EAST 4 STREET,1.0,0.0,0,0,1,0,0,0,Passenger Distraction,Other Vehicular,,,,4215053,Taxi,Bike,,,
3,2019-09-05T00:00:00.000,18:40,BROOKLYN,11238.0,40.688324,-73.958244,POINT (-73.958244 40.688324),,,160 CLIFTON PLACE,0.0,0.0,0,0,0,0,0,0,Unspecified,,,,,4203025,Station Wagon/Sport Utility Vehicle,,,,
4,2019-09-27T00:00:00.000,7:30,QUEENS,11385.0,40.70985,-73.917145,POINT (-73.917145 40.70985),ONDERDONK AVENUE,WILLOUGHBY AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Unspecified,,,,4214315,Sedan,Sedan,,,
5,2019-09-28T00:00:00.000,19:50,,,,,,VERRAZANO BRIDGE UPPER,,,0.0,0.0,0,0,0,0,0,0,Following Too Closely,Unspecified,,,,4215379,Sedan,Sedan,,,
6,2019-09-11T00:00:00.000,9:30,BROOKLYN,11221.0,40.688393,-73.91379,POINT (-73.91379 40.688393),,,93 WEIRFIELD STREET,0.0,0.0,0,0,0,0,0,0,Unspecified,,,,,4207295,Sedan,,,,
8,2019-09-19T00:00:00.000,12:30,,,40.840225,-73.91769,POINT (-73.91769 40.840225),JEROME AVENUE,,,1.0,0.0,1,0,0,0,0,0,Passing Too Closely,,,,,4210066,Station Wagon/Sport Utility Vehicle,,,,
9,2019-09-19T00:00:00.000,7:50,,,40.75765,-73.825195,POINT (-73.825195 40.75765),SANFORD AVENUE,,,1.0,0.0,1,0,0,0,0,0,Passing or Lane Usage Improper,,,,,4208978,Station Wagon/Sport Utility Vehicle,,,,
10,2019-09-16T00:00:00.000,15:55,,,40.677277,-73.9266,POINT (-73.9266 40.677277),ATLANTIC AVENUE,,,1.0,0.0,0,0,1,0,0,0,Driver Inattention/Distraction,Unspecified,,,,4206992,Sedan,Bike,,,
11,2019-09-19T00:00:00.000,18:50,,,,,,WHITESTONE EXPRESSWAY,,,0.0,0.0,0,0,0,0,0,0,Glare,Unspecified,,,,4209076,Sedan,Station Wagon/Sport Utility Vehicle,,,
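
A per-column count of missing values can give a quicker overview than inspecting individual rows. The following is a minimal sketch on a small hypothetical frame: `toy` and its values are made up for illustration, and only the column names mirror the dataset. The same two calls could be applied to `MVC`.

```python
import pandas as pd

# Hypothetical stand-in for MVC; the values are made up for illustration.
toy = pd.DataFrame({
    "borough": ["QUEENS", None, "BROOKLYN", None],
    "vehicle_type_code_3": [None, None, "Sedan", None],
    "collision_id": [1, 2, 3, 4],
})

# Number of missing values in each column.
null_counts = toy.isnull().sum()

# Share of missing values in each column, as a percentage.
null_pct = toy.isnull().mean() * 100

print(null_counts)
print(null_pct)
```

Columns with a high percentage of missing values are candidates for dropping; where to set that cutoff is a judgment call.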


In [7]:
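# Note: dropna() returns a new DataFrame; the result is not assigned here, so MVC itself is unchanged.
MVC.dropna()
MVC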

Unnamed: 0,accident_date,accident_time,borough,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,contributing_factor_vehicle_3,contributing_factor_vehicle_4,contributing_factor_vehicle_5,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
0,2019-09-25T00:00:00.000,8:00,,,40.746105,-73.727554,POINT (-73.727554 40.746105),,,73-10 COMMONWEALTH BOULEVARD,0.0,0.0,0,0,0,0,0,0,Reaction to Uninvolved Vehicle,Unspecified,,,,4212306,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
1,2019-09-14T00:00:00.000,18:05,QUEENS,11360,40.776090,-73.772350,POINT (-73.77235 40.77609),29 AVENUE,215 STREET,,0.0,0.0,0,0,0,0,0,0,Unsafe Speed,Failure to Yield Right-of-Way,Unspecified,,,4206157,Sedan,Station Wagon/Sport Utility Vehicle,Sedan,,
2,2019-09-27T00:00:00.000,17:20,MANHATTAN,10003,40.725445,-73.987620,POINT (-73.98762 40.725445),,,133 EAST 4 STREET,1.0,0.0,0,0,1,0,0,0,Passenger Distraction,Other Vehicular,,,,4215053,Taxi,Bike,,,
3,2019-09-05T00:00:00.000,18:40,BROOKLYN,11238,40.688324,-73.958244,POINT (-73.958244 40.688324),,,160 CLIFTON PLACE,0.0,0.0,0,0,0,0,0,0,Unspecified,,,,,4203025,Station Wagon/Sport Utility Vehicle,,,,
4,2019-09-27T00:00:00.000,7:30,QUEENS,11385,40.709850,-73.917145,POINT (-73.917145 40.70985),ONDERDONK AVENUE,WILLOUGHBY AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Unspecified,,,,4214315,Sedan,Sedan,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1610962,2013-03-26T00:00:00.000,8:55,MANHATTAN,10003,40.735433,-73.982694,POINT (-73.9826943 40.7354326),EAST 19 STREET,2 AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Prescription Medication,,,,23521,PASSENGER VEHICLE,LARGE COM VEH(6 OR MORE TIRES),,,
1610963,2013-03-13T00:00:00.000,13:10,BROOKLYN,11225,40.666288,-73.950836,POINT (-73.9508365 40.6662883),NOSTRAND AVENUE,CROWN STREET,,0.0,0.0,0,0,0,0,0,0,Failure to Yield Right-of-Way,Unspecified,,,,153733,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON,,,
1610964,2013-03-13T00:00:00.000,11:00,,,,,,206 STREET,48 AVENUE,,0.0,0.0,0,0,0,0,0,0,Unspecified,Unspecified,,,,261259,SPORT UTILITY / STATION WAGON,LARGE COM VEH(6 OR MORE TIRES),,,
1610965,2013-03-22T00:00:00.000,18:25,MANHATTAN,10013,40.725026,-74.005895,POINT (-74.0058952 40.7250257),VARICK STREET,DOMINICK STREET,,0.0,0.0,0,0,0,0,0,0,Unspecified,Unspecified,,,,2181,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON,,,
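
Because `dropna()` returns a new DataFrame instead of modifying the original, the frame displayed above still contains rows with missing values. The sketch below illustrates this on hypothetical toy data (`toy` and `cleaned` are names introduced here, not part of the notebook) and shows one possible way to keep the result, restricted to a chosen column via `subset`.

```python
import pandas as pd

# Hypothetical stand-in for MVC; the values are made up for illustration.
toy = pd.DataFrame({
    "borough": ["QUEENS", None, "BROOKLYN"],
    "zip_code": ["11360", None, "11238"],
    "collision_id": [1, 2, 3],
})

# The result is discarded, so toy keeps all of its rows.
toy.dropna()
unchanged_len = len(toy)

# Assign the result to keep it; subset limits the check to the listed columns.
cleaned = toy.dropna(subset=["borough"])

print(unchanged_len, len(cleaned))
```

Calling `dropna()` without `subset` on the full dataset would remove every row that has any missing value, which may discard most of the data given how sparse some columns are.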


In [8]:
MVC2 = MVC.drop(['contributing_factor_vehicle_3', 'contributing_factor_vehicle_4', 'contributing_factor_vehicle_5'],axis=1)
MVC2

Unnamed: 0,accident_date,accident_time,borough,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
0,2019-09-25T00:00:00.000,8:00,,,40.746105,-73.727554,POINT (-73.727554 40.746105),,,73-10 COMMONWEALTH BOULEVARD,0.0,0.0,0,0,0,0,0,0,Reaction to Uninvolved Vehicle,Unspecified,4212306,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
1,2019-09-14T00:00:00.000,18:05,QUEENS,11360,40.776090,-73.772350,POINT (-73.77235 40.77609),29 AVENUE,215 STREET,,0.0,0.0,0,0,0,0,0,0,Unsafe Speed,Failure to Yield Right-of-Way,4206157,Sedan,Station Wagon/Sport Utility Vehicle,Sedan,,
2,2019-09-27T00:00:00.000,17:20,MANHATTAN,10003,40.725445,-73.987620,POINT (-73.98762 40.725445),,,133 EAST 4 STREET,1.0,0.0,0,0,1,0,0,0,Passenger Distraction,Other Vehicular,4215053,Taxi,Bike,,,
3,2019-09-05T00:00:00.000,18:40,BROOKLYN,11238,40.688324,-73.958244,POINT (-73.958244 40.688324),,,160 CLIFTON PLACE,0.0,0.0,0,0,0,0,0,0,Unspecified,,4203025,Station Wagon/Sport Utility Vehicle,,,,
4,2019-09-27T00:00:00.000,7:30,QUEENS,11385,40.709850,-73.917145,POINT (-73.917145 40.70985),ONDERDONK AVENUE,WILLOUGHBY AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Unspecified,4214315,Sedan,Sedan,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1610962,2013-03-26T00:00:00.000,8:55,MANHATTAN,10003,40.735433,-73.982694,POINT (-73.9826943 40.7354326),EAST 19 STREET,2 AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Prescription Medication,23521,PASSENGER VEHICLE,LARGE COM VEH(6 OR MORE TIRES),,,
1610963,2013-03-13T00:00:00.000,13:10,BROOKLYN,11225,40.666288,-73.950836,POINT (-73.9508365 40.6662883),NOSTRAND AVENUE,CROWN STREET,,0.0,0.0,0,0,0,0,0,0,Failure to Yield Right-of-Way,Unspecified,153733,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON,,,
1610964,2013-03-13T00:00:00.000,11:00,,,,,,206 STREET,48 AVENUE,,0.0,0.0,0,0,0,0,0,0,Unspecified,Unspecified,261259,SPORT UTILITY / STATION WAGON,LARGE COM VEH(6 OR MORE TIRES),,,
1610965,2013-03-22T00:00:00.000,18:25,MANHATTAN,10013,40.725026,-74.005895,POINT (-74.0058952 40.7250257),VARICK STREET,DOMINICK STREET,,0.0,0.0,0,0,0,0,0,0,Unspecified,Unspecified,2181,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON,,,


Now that I have dropped some columns, I would like to group by borough and number of persons killed in the crash to see which boroughs have higher numbers of persons killed. 

In [9]:
MVC1 = MVC2.groupby(['borough','number_of_persons_killed'])
MVC1.first()

Unnamed: 0_level_0,Unnamed: 1_level_0,accident_date,accident_time,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
borough,number_of_persons_killed,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1
BRONX,0.0,2019-11-24T00:00:00.000,0:00,10475,40.846893,-73.92055,POINT (-73.92055 40.846893),UNIVERSITY AVENUE,WEST 174 STREET,691 COOP CITY BOULEVARD,0.0,0,0,0,0,0,0,Unspecified,Unspecified,4246776,Station Wagon/Sport Utility Vehicle,Sedan,Sedan,Taxi,Station Wagon/Sport Utility Vehicle
BRONX,1.0,2019-09-17T00:00:00.000,15:56,10467,40.88065,-73.86494,POINT (-73.86494 40.88065),WHITE PLAINS ROAD,EAST 215 STREET,530 ELLSWORTH AVENUE,1.0,0,1,0,0,1,0,Driver Inexperience,Unspecified,4207959,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,Pick-up Truck,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON
BRONX,2.0,2015-03-20T00:00:00.000,18:34,10456,40.838722,-73.913771,POINT (-73.9137706 40.8387216),GRAND CONCOURSE,EAST 170 STREET,,1.0,1,2,0,0,0,0,Following Too Closely,Unspecified,3189701,TAXI,SPORT UTILITY / STATION WAGON,,,
BROOKLYN,0.0,2019-09-05T00:00:00.000,18:40,11238,40.688324,-73.958244,POINT (-73.958244 40.688324),BERRY STREET,SOUTH 6 STREET,160 CLIFTON PLACE,0.0,0,0,0,0,0,0,Unspecified,Unspecified,4203025,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,Sedan
BROOKLYN,1.0,2019-09-15T00:00:00.000,5:30,11213,40.673706,-73.93899,POINT (-73.93899 40.673706),ALBANY AVENUE,PROSPECT PLACE,805 BROADWAY,0.0,0,1,0,0,0,0,Unspecified,Unspecified,4207715,Sedan,Station Wagon/Sport Utility Vehicle,Sedan,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle
BROOKLYN,2.0,2019-06-08T00:00:00.000,0:07,11234,40.608204,-73.920715,POINT (-73.920715 40.608204),FLATBUSH AVENUE,AVENUE V,,0.0,0,0,0,0,0,2,Unsafe Speed,Unspecified,4147255,Sedan,Station Wagon/Sport Utility Vehicle,PASSENGER VEHICLE,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON
BROOKLYN,3.0,2013-01-05T00:00:00.000,2:08,11223,40.597682,-73.96685,POINT (-73.9668499 40.5976825),AVENUE U,EAST 5 STREET,,3.0,0,0,0,0,3,3,Traffic Control Disregarded,Unspecified,117875,PASSENGER VEHICLE,PASSENGER VEHICLE,BUS,SPORT UTILITY / STATION WAGON,PASSENGER VEHICLE
MANHATTAN,0.0,2019-09-27T00:00:00.000,17:20,10003,40.725445,-73.98762,POINT (-73.98762 40.725445),3 AVENUE,EAST 46 STREET,133 EAST 4 STREET,1.0,0,0,1,0,0,0,Passenger Distraction,Other Vehicular,4215053,Taxi,Bike,Station Wagon/Sport Utility Vehicle,Sedan,Sedan
MANHATTAN,1.0,2018-11-15T00:00:00.000,11:32,10002,40.715034,-73.99681,POINT (-73.99681 40.715034),BOWERY,BAYARD STREET,30 EAST 20 STREET,0.0,0,1,0,0,0,0,Driver Inattention/Distraction,Unspecified,4034662,Station Wagon/Sport Utility Vehicle,Motorcycle,Sedan,Sedan,Sedan
MANHATTAN,8.0,2017-10-31T00:00:00.000,15:08,10014,40.729046,-74.01073,POINT (-74.01073 40.729046),WEST STREET,WEST HOUSTON STREET,,12.0,7,6,1,2,4,0,Other Vehicular,Unspecified,3782508,FB,BU,BICYCLE,BICYCLE,BICYCLE
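
`first()` returns the first non-null value of each column within each group, so the table above can mix values from different crashes and does not by itself show which boroughs have more deaths. One possible way to answer that question is to aggregate instead. This is a sketch on hypothetical toy data, not the notebook's own method; `toy`, `deaths_by_borough`, and `crashes_by_kills` are names introduced for illustration.

```python
import pandas as pd

# Hypothetical stand-in for MVC2; the values are made up for illustration.
toy = pd.DataFrame({
    "borough": ["BRONX", "BRONX", "BROOKLYN", "BROOKLYN", "QUEENS"],
    "number_of_persons_killed": [0.0, 1.0, 2.0, 0.0, 0.0],
})

# Total number of persons killed in each borough.
deaths_by_borough = toy.groupby("borough")["number_of_persons_killed"].sum()

# Number of crashes for each (borough, deaths) combination.
crashes_by_kills = toy.groupby(["borough", "number_of_persons_killed"]).size()

print(deaths_by_borough)
print(crashes_by_kills)
```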


The grouped data can also be viewed as a regular DataFrame. On a grouped object, `head(20)` returns the first 20 rows of each group.

In [10]:
MVC1.head(20)


Unnamed: 0,accident_date,accident_time,borough,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
0,2019-09-25T00:00:00.000,8:00,,,40.746105,-73.727554,POINT (-73.727554 40.746105),,,73-10 COMMONWEALTH BOULEVARD,0.0,0.0,0,0,0,0,0,0,Reaction to Uninvolved Vehicle,Unspecified,4212306,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
1,2019-09-14T00:00:00.000,18:05,QUEENS,11360.0,40.77609,-73.77235,POINT (-73.77235 40.77609),29 AVENUE,215 STREET,,0.0,0.0,0,0,0,0,0,0,Unsafe Speed,Failure to Yield Right-of-Way,4206157,Sedan,Station Wagon/Sport Utility Vehicle,Sedan,,
2,2019-09-27T00:00:00.000,17:20,MANHATTAN,10003.0,40.725445,-73.98762,POINT (-73.98762 40.725445),,,133 EAST 4 STREET,1.0,0.0,0,0,1,0,0,0,Passenger Distraction,Other Vehicular,4215053,Taxi,Bike,,,
3,2019-09-05T00:00:00.000,18:40,BROOKLYN,11238.0,40.688324,-73.958244,POINT (-73.958244 40.688324),,,160 CLIFTON PLACE,0.0,0.0,0,0,0,0,0,0,Unspecified,,4203025,Station Wagon/Sport Utility Vehicle,,,,
4,2019-09-27T00:00:00.000,7:30,QUEENS,11385.0,40.70985,-73.917145,POINT (-73.917145 40.70985),ONDERDONK AVENUE,WILLOUGHBY AVENUE,,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Unspecified,4214315,Sedan,Sedan,,,
5,2019-09-28T00:00:00.000,19:50,,,,,,VERRAZANO BRIDGE UPPER,,,0.0,0.0,0,0,0,0,0,0,Following Too Closely,Unspecified,4215379,Sedan,Sedan,,,
6,2019-09-11T00:00:00.000,9:30,BROOKLYN,11221.0,40.688393,-73.91379,POINT (-73.91379 40.688393),,,93 WEIRFIELD STREET,0.0,0.0,0,0,0,0,0,0,Unspecified,,4207295,Sedan,,,,
7,2019-09-02T00:00:00.000,8:25,BROOKLYN,11233.0,40.682552,-73.92388,POINT (-73.92388 40.682552),,,491 DECATUR STREET,0.0,0.0,0,0,0,0,0,0,Driver Inattention/Distraction,Unspecified,4199154,Sedan,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,Sedan
8,2019-09-19T00:00:00.000,12:30,,,40.840225,-73.91769,POINT (-73.91769 40.840225),JEROME AVENUE,,,1.0,0.0,1,0,0,0,0,0,Passing Too Closely,,4210066,Station Wagon/Sport Utility Vehicle,,,,
9,2019-09-19T00:00:00.000,7:50,,,40.75765,-73.825195,POINT (-73.825195 40.75765),SANFORD AVENUE,,,1.0,0.0,1,0,0,0,0,0,Passing or Lane Usage Improper,,4208978,Station Wagon/Sport Utility Vehicle,,,,


I would like to know how many crashes occurred in each of the boroughs and then do the same for number_of_persons_killed. I would then like to see if there is a correlation between the two. Look first at the value counts for the boroughs.

In [11]:
MVCcounts = MVC2['borough'].value_counts()
MVCcounts


BROOKLYN         348590
QUEENS           299053
MANHATTAN        269469
BRONX            156517
STATEN ISLAND     48577
Name: borough, dtype: int64

To get a better idea of these numbers, let's get a percentage of total number of crashes for each borough. 

In [12]:
MVC2['borough'].value_counts(normalize=True) * 100

BROOKLYN         31.062924
QUEENS           26.648672
MANHATTAN        24.012436
BRONX            13.947261
STATEN ISLAND     4.328706
Name: borough, dtype: float64

Now look at the number of persons killed.

In [13]:
MVCcounts1 = MVC2['number_of_persons_killed'].value_counts()
MVCcounts1

0.0    1609129
1.0       1754
2.0         43
3.0          6
4.0          2
8.0          1
5.0          1
Name: number_of_persons_killed, dtype: int64

I want the same breakdown for the number of deaths: what percentage of all crashes involved zero deaths, one death, two deaths, and so on up to the highest value, which is 8.


In [14]:
MVC2['number_of_persons_killed'].value_counts(normalize=True) * 100

0.0    99.887829
1.0     0.108881
2.0     0.002669
3.0     0.000372
4.0     0.000124
8.0     0.000062
5.0     0.000062
Name: number_of_persons_killed, dtype: float64

The vast majority of crashes involve no deaths: the percentages show that about 99.89% of crashes had zero persons killed. I would like to copy the crashes with at least one death into a new dataframe, leaving out the rows with zero deaths.

In [15]:
MVC3 = MVC2[MVC2.number_of_persons_killed > 0]
MVC3

Unnamed: 0,accident_date,accident_time,borough,zip_code,latitude,longitude,location,on_street_name,off_street_name,cross_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,collision_id,vehicle_type_code1,vehicle_type_code2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
172,2019-09-10T00:00:00.000,1:14,QUEENS,11420,40.682250,-73.815170,POINT (-73.81517 40.68225),111 AVENUE,126 STREET,,1.0,1.0,0,0,0,0,1,1,Traffic Control Disregarded,Unspecified,4202753,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
1303,2019-09-21T00:00:00.000,21:13,QUEENS,11367,40.729443,-73.825510,POINT (-73.82551 40.729443),JEWEL AVENUE,140 STREET,,0.0,1.0,0,1,0,0,0,0,Unspecified,,4210967,Sedan,,,,
2836,2019-09-17T00:00:00.000,15:56,BRONX,10467,40.880650,-73.864940,POINT (-73.86494 40.88065),WHITE PLAINS ROAD,EAST 215 STREET,,1.0,1.0,0,1,0,0,1,0,Driver Inexperience,Unspecified,4207959,Station Wagon/Sport Utility Vehicle,Station Wagon/Sport Utility Vehicle,,,
3188,2019-09-09T00:00:00.000,13:22,,,40.870888,-73.872215,POINT (-73.872215 40.870888),BRONX RIVER PARKWAY,,,3.0,1.0,0,0,0,0,3,1,Unsafe Speed,,4203598,Sedan,,,,
4952,2019-09-22T00:00:00.000,3:15,,,,,,CONDUIT BOULEVARD,CRESCENT STREET,,0.0,1.0,0,1,0,0,0,0,Unspecified,,4210961,Station Wagon/Sport Utility Vehicle,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1607714,2013-03-20T00:00:00.000,16:04,,,,,,WEBSTER AVENUE,EAST 187 STREET,,4.0,1.0,0,0,0,0,4,1,Turning Improperly,Unspecified,93175,PASSENGER VEHICLE,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON,SPORT UTILITY / STATION WAGON,SPORT UTILITY / STATION WAGON
1609305,2013-03-07T00:00:00.000,19:32,QUEENS,11364,40.742405,-73.753472,POINT (-73.7534717 40.7424052),SPRINGFIELD BOULEVARD,73 AVENUE,,0.0,1.0,0,1,0,0,0,0,Failure to Yield Right-of-Way,,261208,BUS,,,,
1610038,2013-03-13T00:00:00.000,6:37,,,,,,HAMILTON AVENUE,COURT STREET,,0.0,1.0,0,1,0,0,0,0,View Obstructed/Limited,,171461,OTHER,,,,
1610064,2013-03-08T00:00:00.000,4:35,BROOKLYN,11219,40.628445,-73.996336,POINT (-73.9963362 40.6284454),59 STREET,NEW UTRECHT AVENUE,,0.0,1.0,0,0,0,1,0,0,Unspecified,Unspecified,131926,PASSENGER VEHICLE,SPORT UTILITY / STATION WAGON,BICYCLE,,


Now we can look again at the percentages, this time restricted to crashes with at least one death.

In [16]:
MVC3['number_of_persons_killed'].value_counts(normalize=True) * 100

1.0    97.066962
2.0     2.379635
3.0     0.332042
4.0     0.110681
5.0     0.055340
8.0     0.055340
Name: number_of_persons_killed, dtype: float64
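
To relate deaths back to boroughs, one option is to compare each borough's share of all crashes with its share of fatal crashes. The following is only a sketch on hypothetical toy data; `toy`, `fatal`, and `comparison` are names introduced for illustration, and the same steps could be tried on `MVC2` and `MVC3`.

```python
import pandas as pd

# Hypothetical stand-in for MVC2; the values are made up for illustration.
toy = pd.DataFrame({
    "borough": ["BRONX", "BRONX", "BROOKLYN", "BROOKLYN",
                "BROOKLYN", "QUEENS", "QUEENS", "QUEENS"],
    "number_of_persons_killed": [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0],
})

# Crashes with at least one death (the analogue of MVC3).
fatal = toy[toy["number_of_persons_killed"] > 0]

# Percentage of all crashes and of fatal crashes that fall in each borough.
share_all = toy["borough"].value_counts(normalize=True) * 100
share_fatal = fatal["borough"].value_counts(normalize=True) * 100

# Align the two series by borough; boroughs with no fatal crashes get 0.
comparison = pd.DataFrame({
    "pct_of_all_crashes": share_all,
    "pct_of_fatal_crashes": share_fatal,
}).fillna(0.0)

print(comparison)
```

A borough whose share of fatal crashes is noticeably higher than its share of all crashes would be worth a closer look, although a difference in shares alone does not establish statistical significance.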

In [0]:
?stats.pearsonr

In [18]:
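# pearsonr expects two numeric arrays; 'borough' holds text labels (and missing values), so this call fails.
stats.pearsonr(MVC['borough'], MVC['number_of_persons_killed'])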

ValueError: ignored
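
Pearson's r is defined for two numeric variables, so it does not apply directly to a categorical column such as `borough`. A different technique that is commonly used for a categorical variable against a yes/no outcome is a chi-square test of independence on a contingency table. The sketch below swaps in that technique; it is not the notebook's original approach, and `toy`, `outcome`, and `table` are hypothetical names with made-up values.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for MVC2; the values are made up for illustration.
toy = pd.DataFrame({
    "borough": ["BRONX"] * 6 + ["BROOKLYN"] * 6 + ["QUEENS"] * 6,
    "number_of_persons_killed": [0, 0, 0, 0, 1, 2,
                                 0, 0, 0, 0, 0, 1,
                                 0, 0, 0, 1, 0, 0],
})

# Label each crash as fatal or non-fatal.
is_fatal = toy["number_of_persons_killed"] > 0
toy["outcome"] = is_fatal.map({True: "fatal", False: "non_fatal"})

# Contingency table: boroughs as rows, outcome as columns.
table = pd.crosstab(toy["borough"], toy["outcome"])

# Chi-square test of independence between borough and outcome.
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(table)
print(chi2, p_value, dof)
```

With counts as small as in this toy table the chi-square approximation is unreliable, so the numbers here only demonstrate the mechanics. On the full dataset, rows with a missing `borough` would be left out of the table by `pd.crosstab`.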