# Project I
### Wrangling the API data

The point of this exercise is to try data enrichment with data from external APIs. We are going to take data about car crashes in Monroe County, Indiana from 2003 to 2015 and try to figure out the weather during the accident and how many bars there are in the area. We will work with two different APIs during this challenge:

- Foursquare API
- Visual Crossing API

We will look for correlations between the severity of a crash and both the weather and the number of bars in the area. To indicate the severity of a crash, we will use the `Injury Type` column.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to get your data ready; this part is a little help from us.


In [1]:
import pandas as pd
import os
pd.set_option("display.max_columns", None)

### Data Preparation

Tasks:
- load the data into a Pandas DataFrame
- drop the rows where latitude or longitude is missing
- drop the rows where latitude and longitude are 0.0
- create a column `ll` that contains the latitude and longitude concatenated as strings
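
The tasks above can be sketched on a small made-up frame. The values below are invented for illustration; only the column names match the crash file. Calling `.copy()` after filtering is one way to avoid pandas' `SettingWithCopyWarning` when a new column is added to the filtered frame.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the crash data (values invented for illustration).
toy = pd.DataFrame({
    "Latitude": [39.16144, np.nan, 0.0, 39.14978],
    "Longitude": [-86.534848, -86.5, 0.0, -86.56889],
})

# Drop rows with a missing coordinate.
clean = toy.dropna(subset=["Latitude", "Longitude"])

# Drop rows where both coordinates are 0.0; .copy() gives an independent frame.
clean = clean[(clean["Latitude"] != 0) | (clean["Longitude"] != 0)].copy()

# Build the "lat,lon" string that is later passed as the `ll` parameter.
clean["ll"] = clean["Latitude"].astype(str) + "," + clean["Longitude"].astype(str)
print(clean)
```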

In [2]:
df = pd.read_csv("monroe-county-crash-data2003-to-2015.csv", encoding= 'unicode_escape')
df.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128


In [3]:
df.shape

(53943, 12)

In [4]:
df.dtypes

Master Record Number      int64
Year                      int64
Month                     int64
Day                       int64
Weekend?                 object
Hour                    float64
Collision Type           object
Injury Type              object
Primary Factor           object
Reported_Location        object
Latitude                float64
Longitude               float64
dtype: object

In [5]:
df2 = df.dropna(subset=['Latitude', 'Longitude'])
df2.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128


In [6]:
df2.shape

(53913, 12)

In [7]:
# Here we are removing the rows where both latitude and longitude are 0.0.

df3 = df2[(df2["Latitude"] != 0) | (df2["Longitude"] != 0)] # | represents Boolean OR.
df3.shape

(49005, 12)

In [8]:
df3.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128


In [9]:
df3['ll'] = df3["Latitude"].astype(str) + "," + df3["Longitude"].astype(str)
df3.head(10)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3['ll'] = df3["Latitude"].astype(str) + "," + df3["Longitude"].astype(str)


Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356"
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848"
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006"
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635"
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482"
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137,"39.12666969,-86.53136998"
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899,"39.150825,-86.584899"
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024,"39.19927216,-86.63702393"
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913,"39.16461021,-86.57913007"
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128,"39.16344009,-86.55128002"


In [10]:
df3.shape

(49005, 13)

In [11]:
df3.dtypes

Master Record Number      int64
Year                      int64
Month                     int64
Day                       int64
Weekend?                 object
Hour                    float64
Collision Type           object
Injury Type              object
Primary Factor           object
Reported_Location        object
Latitude                float64
Longitude               float64
ll                       object
dtype: object

In [12]:
df3["Injury Type"].value_counts()

No injury/unknown     37467
Non-incapacitating    10427
Incapacitating         1003
Fatal                   108
Name: Injury Type, dtype: int64

In [13]:
len(df3['ll'].value_counts())

19397

In [14]:
df3['ll'].value_counts().head(10)

39.16424,-86.4984                279
39.164752,-86.573104             275
39.148144,-86.572992             232
39.125312,-86.610496             216
39.186368,-86.5344               211
39.164640000000006,-86.5324      188
39.176544,-86.5624               187
39.164640000000006,-86.534816    178
39.186544,-86.537888             176
39.16424,-86.492944              176
Name: ll, dtype: int64

In [15]:
df3['ll'].value_counts().tail(10)

39.15642963,-86.40841206           1
39.17175546,-86.5324               1
39.164590000000004,-86.57617403    1
39.16421928,-86.50195219           1
39.14663118,-86.516192             1
39.12578993,-86.51912972           1
39.21509467,-86.59272356           1
39.151184,-86.49629200000001       1
39.24064407,-86.43734756           1
39.15565327,-86.49772              1
Name: ll, dtype: int64

In [16]:
# We will create a new column called "Injury Type Value" that encodes "Injury Type" as an ordinal number (0 = no injury/unknown, 3 = fatal).
# A numerical column is easier to work with when computing correlations.
 
df3["Injury Type Value"] = df3["Injury Type"].replace({"No injury/unknown": 0,
                                                      "Non-incapacitating": 1,
                                                      "Incapacitating": 2,
                                                      "Fatal": 3})

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3["Injury Type Value"] = df3["Injury Type"].replace({"No injury/unknown": 0,
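
As a variation on the cell above, `Series.map` with an explicit dictionary returns `NaN` for any label that is missing from the mapping, which makes unexpected categories easy to spot. This is a minimal sketch on invented data; `severity_codes` is a name chosen here for illustration.

```python
import pandas as pd

# Ordinal codes for the four injury labels used in the crash data.
severity_codes = {
    "No injury/unknown": 0,
    "Non-incapacitating": 1,
    "Incapacitating": 2,
    "Fatal": 3,
}

# Invented sample; the last label is deliberately not in the mapping.
injury = pd.Series(["Fatal", "No injury/unknown", "Incapacitating", "Unlisted"])

coded = injury.map(severity_codes)        # unmapped labels become NaN
unmapped = injury[coded.isna()].unique()  # labels that still need a code
print(coded.tolist(), unmapped)
```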


In [17]:
df3.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356",0
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848",0
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006",1
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635",1
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482",0


In [18]:
df3.shape


(49005, 14)

In [19]:
df3.dtypes

Master Record Number      int64
Year                      int64
Month                     int64
Day                       int64
Weekend?                 object
Hour                    float64
Collision Type           object
Injury Type              object
Primary Factor           object
Reported_Location        object
Latitude                float64
Longitude               float64
ll                       object
Injury Type Value         int64
dtype: object

# Foursquare API

Foursquare API documentation is [here](https://developer.foursquare.com/).

1. Start a Foursquare application and get your keys.
2. Create the function **get_venues** that, for a given crash location, pulls the bars within a 5 km radius of the crash.

#### Example
`get_venues('48.146394, 17.107969')`

3. Find a relationship (if there is any) between the number of bars in the area and the severity of the crash.

Hints:
- check out the Python package `foursquare` (no need to send HTTP requests directly with the `requests` library)
- the **categoryId** for bars and nightlife needs to be found in the [Foursquare API documentation](https://developer.foursquare.com/docs/api-reference/venues/search/)

In [20]:
import foursquare

In [21]:
#set the keys
foursquare_id = os.environ["FOURSQUARE_API_ID"]
foursquare_secret = os.environ["FOURSQUARE_API_SECRET"]

# Set the versioning parameter.

v = "20200731"

In [22]:
client = foursquare.Foursquare(client_id=foursquare_id, client_secret=foursquare_secret, version=v)

In [23]:
# To find the bars, we will use the categoryID for nightlife spots.

def get_venues(coords):
    """Returns a response from an HTTP request which contains information about the bars in a 5 km radius of the input coordinates."""

    response = client.venues.search(
        params={'ll': coords, 'radius': 5000, "query": "bar", 
                "categoryId": "4d4b7105d754a06376d81259"})
    return response
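
One thing worth checking is whether the counts are capped by the page size of the API: the venue search endpoint returns a limited number of results per call (to my knowledge 30 by default and at most 50 in the v2 API; treat these numbers as an assumption and confirm them in the documentation). The sketch below passes `limit` explicitly and takes the client as an argument so that it can be exercised without credentials. `count_bars`, `fake_search` and `FakeClient` are stand-ins invented for this example and are not part of the `foursquare` package.

```python
from types import SimpleNamespace

NIGHTLIFE_CATEGORY_ID = "4d4b7105d754a06376d81259"  # category ID used in this notebook

def count_bars(client, coords, radius=5000, limit=50):
    """Count the venues returned for `coords`; the count can never exceed `limit`."""
    response = client.venues.search(
        params={"ll": coords, "radius": radius, "query": "bar",
                "limit": limit, "categoryId": NIGHTLIFE_CATEGORY_ID})
    return len(response["venues"])

# Hypothetical stand-in that mimics the shape of the real response.
def fake_search(params):
    available = 120  # pretend that 120 bars exist within the radius
    returned = min(available, params["limit"])
    return {"venues": [{"id": str(i)} for i in range(returned)]}

FakeClient = SimpleNamespace(venues=SimpleNamespace(search=fake_search))

print(count_bars(FakeClient, "39.16144,-86.534848"))  # capped at the limit
```

If most locations return a count at or just under the limit, the resulting column reflects the cap rather than the true number of bars nearby.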

In [20]:
coords_1 = df3['ll'].iloc[0]
print(coords_1)

39.15920668,-86.52587356


In [21]:
practice = get_venues(coords_1)
practice

{'venues': [{'id': '4aea47d8f964a520bbba21e3',
   'name': "Kilroy's Bar & Grill",
   'location': {'address': '502 E Kirkwood Ave',
    'lat': 39.16647162279348,
    'lng': -86.52808720706821,
    'labeledLatLngs': [{'label': 'display',
      'lat': 39.16647162279348,
      'lng': -86.52808720706821},
     {'label': 'entrance', 'lat': 39.166417, 'lng': -86.528033}],
    'distance': 830,
    'postalCode': '47408',
    'cc': 'US',
    'city': 'Bloomington',
    'state': 'IN',
    'country': 'United States',
    'formattedAddress': ['502 E Kirkwood Ave',
     'Bloomington, IN 47408',
     'United States']},
   'categories': [{'id': '4bf58dd8d48988d116941735',
     'name': 'Bar',
     'pluralName': 'Bars',
     'shortName': 'Bar',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/nightlife/pub_',
      'suffix': '.png'},
     'primary': True}],
   'venuePage': {'id': '54966015'},
   'referralId': 'v-1605809190',
   'hasPerk': False},
  {'id': '4b074185f964a520ccfa22e3',
   'na

In [22]:
# The JSON response holds one entry per venue, so the number of bars is the length of the "venues" list.

len(practice["venues"])  

29

In [24]:
def number_of_bars(coords):
    """Returns the number of bars in a 5 km radius of the input coordinates."""
    
    response = get_venues(coords)
    return (len(response["venues"]))

In [24]:
number_of_bars(coords_1)

29

In [25]:
coords_2 = df3['ll'].value_counts().index[0]
print(coords_2)

39.16424,-86.4984


In [26]:
number_of_bars(coords_2)

28

In [25]:
df4 = pd.DataFrame({'ll': df3['ll'].unique()})
print(df4.head())
print(len(df4))

                         ll
0  39.15920668,-86.52587356
1       39.16144,-86.534848
2  39.14978027,-86.56889006
3    39.165655,-86.57595635
4    39.164848,-86.57962482
19397


In [43]:
# We will take a random sample of size 300 of the locations.

df_bars = df4.sample(n=300)
len(df_bars)

300

In [44]:
df_bars.head()

Unnamed: 0,ll
5397,"39.15134698,-86.61031799"
3780,"39.13608,-86.529792"
15373,"39.136144,-86.5284764"
126,"39.079248,-86.619792"
19122,"39.14500942,-86.57779384"


In [45]:
df_bars.nunique()

ll    300
dtype: int64

In [50]:
df_bars['ll'].iloc[0:10].apply(number_of_bars)

6244     29
18469    27
13083     4
13826    28
3363      1
9059      1
4076      1
7132     13
12245    27
18530    27
Name: ll, dtype: int64

In [46]:
df_bars["Number of bars"] = df_bars['ll'].apply(number_of_bars)

On November 19, 2020 at 1:40 PM, I exceeded the call limits for the Foursquare API because I tried to make at least 8000 requests.
If I get another chance, I will limit the number of requests to 250.

Socorro, who assisted me through Compass, said that the daily call quota is 950, so I will limit the number of requests to 300.
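
Because many crashes share the same coordinates, one way to stay under a daily quota is to cache results per coordinate string and to stop once a request budget is used up. This is a minimal sketch; `make_budgeted_lookup` and `fake_lookup` are hypothetical names, and the budget of 3 is only for demonstration.

```python
def make_budgeted_lookup(lookup, max_calls):
    """Wrap `lookup` so repeated keys hit a cache and new keys stop after `max_calls`."""
    cache = {}
    calls = {"n": 0}

    def wrapped(key):
        if key in cache:
            return cache[key]      # already known, no API call needed
        if calls["n"] >= max_calls:
            return None            # budget exhausted; leave the value missing
        calls["n"] += 1
        cache[key] = lookup(key)
        return cache[key]

    wrapped.calls = calls          # expose the counter for inspection
    return wrapped

# Stand-in for number_of_bars so that the sketch runs without credentials.
def fake_lookup(coords):
    return len(coords)

limited = make_budgeted_lookup(fake_lookup, max_calls=3)
results = [limited(c) for c in ["a", "bb", "a", "ccc", "dddd", "bb"]]
print(results, limited.calls["n"])
```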

In [47]:
df_bars.head()

Unnamed: 0,ll,Number of bars
5397,"39.15134698,-86.61031799",3
3780,"39.13608,-86.529792",29
15373,"39.136144,-86.5284764",29
126,"39.079248,-86.619792",0
19122,"39.14500942,-86.57779384",19


In [48]:
df5 = pd.merge(df3, df_bars, on='ll')

In [49]:
df5.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,Number of bars
0,902370118,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,HENDERSON & HILLSIDE,39.150636,-86.527207,"39.15063632,-86.52720688",0,29
1,902374549,2015,1,5,Weekday,800.0,1-Car,No injury/unknown,UNSAFE LANE MOVEMENT,FOREST & KIRKWOOD,39.166089,-86.519626,"39.16608943,-86.51962569999999",0,27
2,902375928,2015,1,6,Weekday,1800.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,KOONTZ & W DUVALL,39.079248,-86.619792,"39.079248,-86.619792",0,0
3,502617,2003,10,2,Weekday,700.0,2-Car,No injury/unknown,UNSAFE SPEED,DUVALL & KOONTY RD,39.079248,-86.619792,"39.079248,-86.619792",0,0
4,902402485,2015,2,4,Weekday,1200.0,2-Car,No injury/unknown,UNSAFE BACKING,E 8TH & N FESS,39.1696,-86.525712,"39.1696,-86.525712",0,27


In [50]:
len(df5)

683

In [51]:
df5["Number of bars"].value_counts().head(10)

27    201
28    143
0      41
29     40
21     35
22     30
1      21
2      21
4      20
23     19
Name: Number of bars, dtype: int64

In [52]:
df6 = df5.iloc[:, -3:]

In [53]:
df6.head()

Unnamed: 0,ll,Injury Type Value,Number of bars
0,"39.15063632,-86.52720688",0,29
1,"39.16608943,-86.51962569999999",0,27
2,"39.079248,-86.619792",0,0
3,"39.079248,-86.619792",0,0
4,"39.1696,-86.525712",0,27


In [54]:
df6.groupby(["Injury Type Value", 'Number of bars']).size().head(10)

Injury Type Value  Number of bars
0                  0                 31
                   1                 10
                   2                 11
                   3                  7
                   4                 14
                   5                  7
                   7                  2
                   8                  1
                   10                 3
                   13                 5
dtype: int64

In [55]:
df6["Injury Type Value"].value_counts()

0    542
1    123
2     16
3      2
Name: Injury Type Value, dtype: int64

In [56]:
df6.corr()

Unnamed: 0,Injury Type Value,Number of bars
Injury Type Value,1.0,-0.257495
Number of bars,-0.257495,1.0


In [57]:
df6.corr().iloc[0, 1]

-0.2574949702494932

In [58]:
f"The correlation between the number of bars in the area and severity of the crash is {df6.corr().iloc[0, 1]}."

'The correlation between the number of bars in the area and severity of the crash is -0.2574949702494932.'

This means that there is a weak negative relationship between the number of bars in the area and severity of the crash.
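
Two details about this calculation are worth making explicit. Recent pandas versions (2.0 and later, as far as I know) raise an error when `corr()` is called on a frame that still contains a string column such as `ll`, so selecting the two numeric columns first is safer. Also, `Injury Type Value` is an ordinal code, so a rank-based (Spearman) coefficient is a reasonable companion to the default Pearson one. The sketch below uses invented numbers and computes Spearman as the Pearson correlation of the ranks.

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame({
    "ll": ["a", "b", "c", "d", "e", "f"],
    "Injury Type Value": [0, 0, 1, 1, 2, 3],
    "Number of bars": [28, 27, 29, 20, 5, 1],
})

# Keep only the numeric columns before calling corr().
numeric = toy[["Injury Type Value", "Number of bars"]]

pearson = numeric.corr().iloc[0, 1]
# Spearman's coefficient equals Pearson's coefficient computed on the ranks.
spearman = numeric.rank().corr().iloc[0, 1]

print(round(pearson, 3), round(spearman, 3))
```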

Note that the sample size is small and is dominated by the "No injury/unknown" group.

We will take another sample that is divided equally among the four "Injury Type Value" groups.
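
A balanced sample of this kind can also be drawn in one step with `groupby(...).sample(...)` (available in pandas 1.1 and later). This is a sketch on invented data; `random_state` is set only to make the draw repeatable.

```python
import pandas as pd

# Invented, imbalanced data: 40 / 12 / 6 / 4 rows per severity code.
toy = pd.DataFrame({"Injury Type Value": [0] * 40 + [1] * 12 + [2] * 6 + [3] * 4})
toy["row_id"] = range(len(toy))

# Draw the same number of rows from every severity group.
balanced = toy.groupby("Injury Type Value").sample(n=4, random_state=0)

print(balanced["Injury Type Value"].value_counts().sort_index())
```

A balanced sample changes the group proportions, so a correlation computed on it describes the sample rather than the full set of crashes.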

In [59]:
df_itv_0 = df3[df3["Injury Type Value"] == 0]
len(df_itv_0)

37467

In [60]:
df_itv_0_sample = df_itv_0.sample(n=75)
len(df_itv_0_sample)

75

In [61]:
df_itv_1 = df3[df3["Injury Type Value"] == 1]
len(df_itv_1)

10427

In [62]:
df_itv_1_sample = df_itv_1.sample(n=75)
len(df_itv_1_sample)

75

In [63]:
df_itv_2 = df3[df3["Injury Type Value"] == 2]
len(df_itv_2)

1003

In [64]:
df_itv_2_sample = df_itv_2.sample(n=75)
len(df_itv_2_sample)

75

In [65]:
df_itv_3 = df3[df3["Injury Type Value"] == 3]
len(df_itv_3)

108

In [66]:
df_itv_3_sample = df_itv_3.sample(n=75)
len(df_itv_3_sample)

75

In [67]:
pieces = [df_itv_0_sample, df_itv_1_sample, df_itv_2_sample, df_itv_3_sample]
df_sample = pd.concat(pieces)


In [68]:
df_sample.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value
47492,1580362,2004,11,5,Weekday,1200.0,2-Car,No injury/unknown,IMPROPER PASSING,3RD ST & BALLANTINE,39.16424,-86.518921,"39.16424,-86.51892108",0
47632,1934051,2004,12,7,Weekend,1500.0,2-Car,No injury/unknown,DISREGARD SIGNAL/REG SIGN,ATWATER ST & HENDERSON,39.163344,-86.5272,"39.163344,-86.5272",0
42482,1836456,2005,1,3,Weekday,1500.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,7TH & MORTON,39.168672,-86.536,"39.168671999999994,-86.536",0
1047,902416939,2015,3,1,Weekend,200.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,SR46W,39.197824,-86.573989,"39.19782378,-86.57398881",0
24356,901482224,2010,8,4,Weekday,1100.0,1-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,VALLEY MISSION RD,39.008304,-86.507552,"39.008303999999995,-86.507552",0


In [69]:
df_sample["Injury Type Value"].value_counts()

3    75
2    75
1    75
0    75
Name: Injury Type Value, dtype: int64

In [70]:
df_sample["Number of bars"] = df_sample['ll'].apply(number_of_bars)

In [71]:
df_sample.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,Number of bars
47492,1580362,2004,11,5,Weekday,1200.0,2-Car,No injury/unknown,IMPROPER PASSING,3RD ST & BALLANTINE,39.16424,-86.518921,"39.16424,-86.51892108",0,28
47632,1934051,2004,12,7,Weekend,1500.0,2-Car,No injury/unknown,DISREGARD SIGNAL/REG SIGN,ATWATER ST & HENDERSON,39.163344,-86.5272,"39.163344,-86.5272",0,27
42482,1836456,2005,1,3,Weekday,1500.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,7TH & MORTON,39.168672,-86.536,"39.168671999999994,-86.536",0,27
1047,902416939,2015,3,1,Weekend,200.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,SR46W,39.197824,-86.573989,"39.19782378,-86.57398881",0,21
24356,901482224,2010,8,4,Weekday,1100.0,1-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,VALLEY MISSION RD,39.008304,-86.507552,"39.008303999999995,-86.507552",0,2


In [72]:
df_sample_2 = df_sample.iloc[:, -3:]
df_sample_2.head()

Unnamed: 0,ll,Injury Type Value,Number of bars
47492,"39.16424,-86.51892108",0,28
47632,"39.163344,-86.5272",0,27
42482,"39.168671999999994,-86.536",0,27
1047,"39.19782378,-86.57398881",0,21
24356,"39.008303999999995,-86.507552",0,2


In [73]:
df_sample_2.corr()

Unnamed: 0,Injury Type Value,Number of bars
Injury Type Value,1.0,-0.334923
Number of bars,-0.334923,1.0


In [74]:
df_sample_2.corr().iloc[0, 1]

-0.334922646902454

In [75]:
f"The correlation between the number of bars in the area and severity of the crash is {df_sample_2.corr().iloc[0, 1]}."

'The correlation between the number of bars in the area and severity of the crash is -0.334922646902454.'

This means that there is a moderate negative relationship between the number of bars in the area and severity of the crash.

On November 23, 2020, I spoke to the mentor Arunabh Singh to discuss this question, and he said the following:

"There is no correlation between the number of bars and the intensity of an accident. This is because someone who is drunk driving only needs to hit once for it to be critical; it does not need to be correlated with many bars in the area.
The mean and median are similar or unrelated for severity of crash.

Question 3 states: 'Find a relationship (if there is any) between number of bars in the area and severity of the crash.'

So there isn't any relationship."

On November 24, 2020, I spoke to Arunabh Singh about the problem again. He used `df5.groupby("Injury Type Value")["Number of bars"].mean()` (or `.median()`). When he did this, the mean/median was about 26 for the first three Injury Type Values and about 5 for Injury Type Value 3. I was getting a negative correlation because the mean/median was very low for the highest Injury Type Value.
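
A per-group comparison like this is easier to judge when the group sizes are shown next to the averages, since a mean over a handful of fatal crashes is very noisy. This is a sketch on invented numbers, not the notebook's data.

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame({
    "Injury Type Value": [0, 0, 0, 0, 1, 1, 1, 2, 2, 3],
    "Number of bars":    [28, 27, 29, 26, 27, 25, 28, 27, 24, 5],
})

# Show the group size together with the mean and median of each group.
summary = (toy.groupby("Injury Type Value")["Number of bars"]
              .agg(["count", "mean", "median"]))
print(summary)
```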

We will take a random sample of size 500 of the locations.

In [26]:

df_bars2 = df4.sample(n=500)
len(df_bars2)

500

In [27]:
df_bars2.head()

Unnamed: 0,ll
230,"39.137088,-86.53056"
14907,"39.1645,-86.567998"
3666,"39.14654277,-86.53242392"
1877,"39.16249847,-86.58585357"
18295,"39.211104,-86.62055369"


In [28]:
df_bars2.nunique()

ll    500
dtype: int64

In [29]:
df_bars2["Number of bars"] = df_bars2['ll'].apply(number_of_bars)

In [30]:
df_bars2.head()

Unnamed: 0,ll,Number of bars
230,"39.137088,-86.53056",29
14907,"39.1645,-86.567998",22
3666,"39.14654277,-86.53242392",29
1877,"39.16249847,-86.58585357",19
18295,"39.211104,-86.62055369",4


In [31]:
df7 = pd.merge(df3, df_bars2, on='ll')

In [32]:
df7.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,Number of bars
0,902374663,2015,1,5,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,N WALNUT & SR46E,39.186368,-86.5344,"39.186368,-86.5344",0,28
1,902404215,2015,2,7,Weekend,1100.0,2-Car,No injury/unknown,IMPROPER LANE USAGE,N WALNUT & SR4546W,39.186368,-86.5344,"39.186368,-86.5344",0,28
2,902409648,2015,3,7,Weekend,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,SR46E & WALNUT,39.186368,-86.5344,"39.186368,-86.5344",0,28
3,902383313,2015,1,5,Weekday,1000.0,2-Car,Non-incapacitating,ACCELERATOR FAILURE OR DEFECTIVE,N WALNUT & SR46W,39.186368,-86.5344,"39.186368,-86.5344",1,28
4,902390384,2015,2,1,Weekday,2000.0,2-Car,Non-incapacitating,FOLLOWING TOO CLOSELY,N WALNUT & SR46E,39.186368,-86.5344,"39.186368,-86.5344",1,28


In [33]:
len(df7)

1805

In [34]:
df7["Number of bars"].value_counts().head(10)

28    569
27    522
29    140
21     88
25     69
4      62
0      55
19     36
22     35
26     34
Name: Number of bars, dtype: int64

In [37]:
df7.groupby("Injury Type Value")["Number of bars"].mean()

Injury Type Value
0    23.509353
1    23.064767
2    17.875000
3    13.200000
Name: Number of bars, dtype: float64

In [38]:
df7.groupby("Injury Type Value")["Number of bars"].median()

Injury Type Value
0    27
1    27
2    27
3    19
Name: Number of bars, dtype: int64

In [39]:
df8 = df7.iloc[:, -3:]

In [40]:
df8.head()

Unnamed: 0,ll,Injury Type Value,Number of bars
0,"39.186368,-86.5344",0,28
1,"39.186368,-86.5344",0,28
2,"39.186368,-86.5344",0,28
3,"39.186368,-86.5344",1,28
4,"39.186368,-86.5344",1,28


In [41]:
df8.groupby(["Injury Type Value", 'Number of bars']).size().head(10)

Injury Type Value  Number of bars
0                  0                 37
                   1                 18
                   2                 16
                   3                 24
                   4                 42
                   5                 16
                   6                  6
                   7                  5
                   8                  3
                   9                 15
dtype: int64

In [42]:
df8["Injury Type Value"].value_counts()

0    1390
1     386
2      24
3       5
Name: Injury Type Value, dtype: int64

In [43]:
df8.corr()

Unnamed: 0,Injury Type Value,Number of bars
Injury Type Value,1.0,-0.069217
Number of bars,-0.069217,1.0


In [44]:
df8.corr().iloc[0, 1]

-0.06921726778921909

In [45]:
f"The correlation between the number of bars in the area and severity of the crash is {df8.corr().iloc[0, 1]}."

'The correlation between the number of bars in the area and severity of the crash is -0.06921726778921909.'

This means that there is little or no relationship between the number of bars in the area and the severity of the crash.

# Visual Crossing API

Visual Crossing API documentation is [here](https://www.visualcrossing.com/resources/documentation/).

1. Sign up for a free API key if you haven't done that before.
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and the severity of the crash.

Hints:

* randomly sample only 250 or so crashes (due to API limits), or pull the weather only for a smaller sample of crashes
* for sending HTTP requests, check out the `requests` library [here](http://docs.python-requests.org/en/master/)
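
As a rough sketch of the request step, the function below only builds a URL and does not send it. The endpoint path and the `unitGroup`, `include` and `key` parameters reflect my understanding of Visual Crossing's Timeline API and should be treated as assumptions to confirm against the documentation; `build_weather_url` and `YOUR_KEY` are placeholders invented for this example.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the Visual Crossing Timeline endpoint; confirm in the docs.
BASE_URL = ("https://weather.visualcrossing.com/VisualCrossingWebServices"
            "/rest/services/timeline")

def build_weather_url(ll, start, end, api_key):
    """Build a request URL for hourly weather at `ll` between `start` and `end`."""
    # Keep commas and colons readable in the path segments.
    path = "/".join(quote(part, safe=",:") for part in (ll, start, end))
    query = urlencode({"unitGroup": "metric", "include": "hours", "key": api_key})
    return f"{BASE_URL}/{path}?{query}"

url = build_weather_url("39.16144,-86.534848",
                        "2015-01-06T15:00:00", "2015-01-06T16:00:00",
                        api_key="YOUR_KEY")
print(url)
# The response could then be fetched with: requests.get(url).json()
```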


In [20]:
df3.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356",0
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848",0
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006",1
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635",1
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482",0
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137,"39.12666969,-86.53136998",0
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899,"39.150825,-86.584899",0
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024,"39.19927216,-86.63702393",2
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913,"39.16461021,-86.57913007",0
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128,"39.16344009,-86.55128002",0


In [21]:
df3.dtypes

Master Record Number      int64
Year                      int64
Month                     int64
Day                       int64
Weekend?                 object
Hour                    float64
Collision Type           object
Injury Type              object
Primary Factor           object
Reported_Location        object
Latitude                float64
Longitude               float64
ll                       object
Injury Type Value         int64
dtype: object

In [22]:
df3.shape

(49005, 14)

In [23]:
df3.dropna(subset=['Year', 'Month', 'Day', 'Hour'], inplace=True)
df3.shape

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3.dropna(subset=['Year', 'Month', 'Day', 'Hour'], inplace=True)


(48808, 14)

In [24]:
df3['Hour'].unique()

array([   0., 1500., 2300.,  900., 1100., 1800., 1200., 1400., 1600.,
       1700., 1300.,  700., 2100., 2000., 1900.,  400., 1000.,  600.,
        800., 2200.,  100.,  200.,  300.,  500.])

In [25]:
df3['Hour'] = (df3['Hour'] / 100)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3['Hour'] = (df3['Hour'] / 100)


In [26]:
df3['Hour'].unique()

array([ 0., 15., 23.,  9., 11., 18., 12., 14., 16., 17., 13.,  7., 21.,
       20., 19.,  4., 10.,  6.,  8., 22.,  1.,  2.,  3.,  5.])

In [27]:
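# Note: this treats `Day` as the day of the month. Its values (1 to 7 in the rows shown) next to `Weekend?`
# suggest it may instead be the day of the week, in which case these would not be the true crash dates.
start_times = pd.to_datetime(df3[['Year', 'Month', 'Day', 'Hour']], utc=True)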

In [28]:
df3['startDateTime'] = start_times.dt.strftime('%Y-%m-%dT%H:%M:%S')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3['startDateTime'] = start_times.dt.strftime('%Y-%m-%dT%H:%M:%S')


In [29]:
df3.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,startDateTime
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356",0,2015-01-05T00:00:00
1,902364268,2015,1,6,Weekday,15.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848",0,2015-01-06T15:00:00
2,902364412,2015,1,6,Weekend,23.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006",1,2015-01-06T23:00:00
3,902364551,2015,1,7,Weekend,9.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635",1,2015-01-07T09:00:00
4,902364615,2015,1,7,Weekend,11.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482",0,2015-01-07T11:00:00
5,902364664,2015,1,6,Weekday,18.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137,"39.12666969,-86.53136998",0,2015-01-06T18:00:00
6,902364682,2015,1,6,Weekday,12.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899,"39.150825,-86.584899",0,2015-01-06T12:00:00
7,902364683,2015,1,6,Weekday,14.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024,"39.19927216,-86.63702393",2,2015-01-06T14:00:00
8,902364714,2015,1,7,Weekend,14.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913,"39.16461021,-86.57913007",0,2015-01-07T14:00:00
9,902364756,2015,1,7,Weekend,16.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128,"39.16344009,-86.55128002",0,2015-01-07T16:00:00
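The steps from the raw `Hour` values to the timestamp strings can be sketched on a toy frame. This is only an illustration of the same technique (divide the HHMM-style hour by 100, assemble a timestamp from the component columns, format it); the frame `parts` and its two rows are made up to mirror the columns used above.

```python
import pandas as pd

# Toy rows mirroring the Year/Month/Day/Hour columns; values are illustrative.
parts = pd.DataFrame({
    "Year": [2015, 2015],
    "Month": [1, 1],
    "Day": [5, 6],
    "Hour": [0.0, 2300.0],  # HHMM-style values, as in the raw data
})

# Convert 2300.0 -> 23.0 so the value can be read as an hour of the day.
parts["Hour"] = parts["Hour"] / 100

# pandas can assemble timestamps from year/month/day/hour columns.
stamps = pd.to_datetime(parts[["Year", "Month", "Day", "Hour"]], utc=True)

# Format the start of the hour and the start of the following hour.
start = stamps.dt.strftime("%Y-%m-%dT%H:%M:%S")
end = (stamps + pd.Timedelta(hours=1)).dt.strftime("%Y-%m-%dT%H:%M:%S")

print(start.tolist())
print(end.tolist())
```

Adding the one-hour `Timedelta` to the timestamps (rather than to the `Hour` column) handles the rollover past midnight correctly.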


In [30]:
end_times = start_times + pd.Timedelta(1, unit='hour')

In [31]:
df3['endDateTime'] = end_times.dt.strftime('%Y-%m-%dT%H:%M:%S')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3['endDateTime'] = end_times.dt.strftime('%Y-%m-%dT%H:%M:%S')


In [32]:
df3.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,startDateTime,endDateTime
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356",0,2015-01-05T00:00:00,2015-01-05T01:00:00
1,902364268,2015,1,6,Weekday,15.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848",0,2015-01-06T15:00:00,2015-01-06T16:00:00
2,902364412,2015,1,6,Weekend,23.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006",1,2015-01-06T23:00:00,2015-01-07T00:00:00
3,902364551,2015,1,7,Weekend,9.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635",1,2015-01-07T09:00:00,2015-01-07T10:00:00
4,902364615,2015,1,7,Weekend,11.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482",0,2015-01-07T11:00:00,2015-01-07T12:00:00
5,902364664,2015,1,6,Weekday,18.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137,"39.12666969,-86.53136998",0,2015-01-06T18:00:00,2015-01-06T19:00:00
6,902364682,2015,1,6,Weekday,12.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899,"39.150825,-86.584899",0,2015-01-06T12:00:00,2015-01-06T13:00:00
7,902364683,2015,1,6,Weekday,14.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024,"39.19927216,-86.63702393",2,2015-01-06T14:00:00,2015-01-06T15:00:00
8,902364714,2015,1,7,Weekend,14.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913,"39.16461021,-86.57913007",0,2015-01-07T14:00:00,2015-01-07T15:00:00
9,902364756,2015,1,7,Weekend,16.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128,"39.16344009,-86.55128002",0,2015-01-07T16:00:00,2015-01-07T17:00:00


In [33]:
import requests
api_key = os.environ["VISUAL_CROSSING_API_KEY"]

In [34]:
url = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/weatherdata/history"

In [35]:
def get_weather(coords, start_time, end_time):
    """Returns a response from an HTTP request which contains information 
    about the weather from the start time to the end time at the 
    input coordinates."""
    
    params_gw = {"locations": coords,
                 "aggregateHours": 1,
                 "unitGroup": 'metric',
                 "startDateTime": start_time,
                 "endDateTime": end_time,
                 "shortColumnNames": 'false',
                 "contentType": 'json',
                 "locationMode": 'single',
                 "key": api_key
    }
    
    response = requests.get(url, params = params_gw)
    return response.json()
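Before spending API quota, it can help to inspect the query that a call like `get_weather` would send. The sketch below rebuilds the same parameter dictionary and renders it as a URL with the standard library, without making any request. The helper names `build_weather_params` and `preview_url` are hypothetical, and the encoding shown is that of `urllib.parse.urlencode`, which may differ in small details from what `requests` produces.

```python
from urllib.parse import urlencode

BASE_URL = ("https://weather.visualcrossing.com/VisualCrossingWebServices"
            "/rest/services/weatherdata/history")


def build_weather_params(coords, start_time, end_time, api_key):
    """Return the query parameters used in get_weather as a plain dict."""
    return {
        "locations": coords,
        "aggregateHours": 1,
        "unitGroup": "metric",
        "startDateTime": start_time,
        "endDateTime": end_time,
        "shortColumnNames": "false",
        "contentType": "json",
        "locationMode": "single",
        "key": api_key,
    }


def preview_url(params):
    """Render the request URL with the API key masked (hypothetical helper)."""
    masked = dict(params, key="REDACTED")
    return f"{BASE_URL}?{urlencode(masked)}"


params = build_weather_params(
    "39.15920668,-86.52587356",
    "2015-01-05T00:00:00",
    "2015-01-05T01:00:00",
    "SECRET",  # placeholder, not a real key
)
url = preview_url(params)
print(url)
```

Masking the key keeps the preview safe to leave in a saved notebook.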

In [36]:
coords0 = df3['ll'].iloc[0]
start_time0 = df3['startDateTime'].iloc[0]
end_time0 = df3['endDateTime'].iloc[0]

print(coords0)
print(start_time0)
print(end_time0)

39.15920668,-86.52587356
2015-01-05T00:00:00
2015-01-05T01:00:00


In [41]:
practice_0 = get_weather(coords0, start_time0, end_time0)
practice_0

{'columns': {'wdir': {'id': 'wdir',
   'name': 'Wind Direction',
   'type': 2,
   'unit': None},
  'latitude': {'id': 'latitude', 'name': 'Latitude', 'type': 2, 'unit': None},
  'cloudcover': {'id': 'cloudcover',
   'name': 'Cloud Cover',
   'type': 2,
   'unit': '%'},
  'mint': {'id': 'mint',
   'name': 'Minimum Temperature',
   'type': 2,
   'unit': 'degC'},
  'datetime': {'id': 'datetime', 'name': 'Date time', 'type': 3, 'unit': None},
  'precip': {'id': 'precip', 'name': 'Precipitation', 'type': 2, 'unit': 'mm'},
  'solarradiation': {'id': 'solarradiation',
   'name': 'Solar Radiation',
   'type': 2,
   'unit': 'null/m^2'},
  'dew': {'id': 'dew', 'name': 'Dew Point', 'type': 2, 'unit': 'degC'},
  'humidity': {'id': 'humidity',
   'name': 'Relative Humidity',
   'type': 2,
   'unit': '%'},
  'precipcover': {'id': 'precipcover',
   'name': 'Precipitation Cover',
   'type': 2,
   'unit': '%'},
  'longitude': {'id': 'longitude',
   'name': 'Longitude',
   'type': 2,
   'unit': None},
 

In [42]:
practice_1 = get_weather(coords0, start_time0, start_time0)
practice_1

{'columns': {'wdir': {'id': 'wdir',
   'name': 'Wind Direction',
   'type': 2,
   'unit': None},
  'latitude': {'id': 'latitude', 'name': 'Latitude', 'type': 2, 'unit': None},
  'cloudcover': {'id': 'cloudcover',
   'name': 'Cloud Cover',
   'type': 2,
   'unit': '%'},
  'mint': {'id': 'mint',
   'name': 'Minimum Temperature',
   'type': 2,
   'unit': 'degC'},
  'datetime': {'id': 'datetime', 'name': 'Date time', 'type': 3, 'unit': None},
  'precip': {'id': 'precip', 'name': 'Precipitation', 'type': 2, 'unit': 'mm'},
  'solarradiation': {'id': 'solarradiation',
   'name': 'Solar Radiation',
   'type': 2,
   'unit': 'null/m^2'},
  'dew': {'id': 'dew', 'name': 'Dew Point', 'type': 2, 'unit': 'degC'},
  'humidity': {'id': 'humidity',
   'name': 'Relative Humidity',
   'type': 2,
   'unit': '%'},
  'precipcover': {'id': 'precipcover',
   'name': 'Precipitation Cover',
   'type': 2,
   'unit': '%'},
  'longitude': {'id': 'longitude',
   'name': 'Longitude',
   'type': 2,
   'unit': None},
 

In [52]:
practice_1['location']['values'][0]['temp']

-8.9

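Requesting a window whose end time equals its start time still returns the reading for that hour (-8.9 °C for the first record), so it seems that we need neither the `endDateTime` column nor the `end_time` argument of `get_weather`.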

In [36]:
def get_weather2(coords, start_time):
    """Returns a response from an HTTP request which contains information 
    about the weather at the input coordinates and start time."""
    
    # Same request as get_weather, with the start time reused as the end time.
    return get_weather(coords, start_time, start_time)

In [55]:
practice_2 = get_weather2(coords0, start_time0)
practice_2

{'columns': {'wdir': {'id': 'wdir',
   'name': 'Wind Direction',
   'type': 2,
   'unit': None},
  'latitude': {'id': 'latitude', 'name': 'Latitude', 'type': 2, 'unit': None},
  'cloudcover': {'id': 'cloudcover',
   'name': 'Cloud Cover',
   'type': 2,
   'unit': '%'},
  'mint': {'id': 'mint',
   'name': 'Minimum Temperature',
   'type': 2,
   'unit': 'degC'},
  'datetime': {'id': 'datetime', 'name': 'Date time', 'type': 3, 'unit': None},
  'precip': {'id': 'precip', 'name': 'Precipitation', 'type': 2, 'unit': 'mm'},
  'solarradiation': {'id': 'solarradiation',
   'name': 'Solar Radiation',
   'type': 2,
   'unit': 'null/m^2'},
  'dew': {'id': 'dew', 'name': 'Dew Point', 'type': 2, 'unit': 'degC'},
  'humidity': {'id': 'humidity',
   'name': 'Relative Humidity',
   'type': 2,
   'unit': '%'},
  'precipcover': {'id': 'precipcover',
   'name': 'Precipitation Cover',
   'type': 2,
   'unit': '%'},
  'longitude': {'id': 'longitude',
   'name': 'Longitude',
   'type': 2,
   'unit': None},
 

In [56]:
practice_2['location']['values'][0]['temp']

-8.9

In [37]:
def get_temp(coords, start_time):
    """Returns the temperature in degrees Celsius 
    at the input coordinates and start time."""
    
    response = get_weather2(coords, start_time)
    temp = response['location']['values'][0]['temp']
    return temp

In [58]:
get_temp(coords0, start_time0)

-8.9
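`get_temp` indexes straight into the response, so a response without hourly values would raise `KeyError` or `IndexError` partway through a long `apply`. A minimal sketch of a more forgiving extraction step follows, assuming only the `['location']['values'][0]['temp']` layout used above. The function name `extract_temp` and the hand-built payloads are illustrative, not real API output.

```python
def extract_temp(payload):
    """Return the first hourly temperature in `payload`, or None if absent."""
    location = payload.get("location") or {}
    values = location.get("values") or []
    if not values:
        return None
    return values[0].get("temp")


# Hand-built stand-ins shaped like the part of the response used above.
with_reading = {"location": {"values": [{"temp": -8.9}]}}
no_values = {"location": {"values": []}}
empty = {}

print(extract_temp(with_reading))
print(extract_temp(no_values))
print(extract_temp(empty))
```

Returning `None` lets pandas store a missing value for that row, which `corr` later ignores.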

In [38]:
df3.head().apply(lambda x: get_temp(x['ll'], x['startDateTime']), axis=1)

0    -8.9
1    -3.9
2   -10.0
3   -14.4
4   -13.9
dtype: float64

In [39]:
# We will take a random sample of size 200 of the records.

df_sample3 = df3.sample(n=200)
len(df_sample3)

200
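Because `sample` draws different rows on every run, the temperatures and the correlation computed afterwards change each time the notebook is re-executed. A small sketch of making the draw repeatable with the `random_state` argument follows; the toy frame and the seed value 42 are arbitrary choices for illustration.

```python
import pandas as pd

# Toy frame standing in for the cleaned crash records.
frame = pd.DataFrame({"record": range(1000)})

# The same seed yields the same rows on every run.
first = frame.sample(n=200, random_state=42)
second = frame.sample(n=200, random_state=42)

print(len(first))
print(first.index.equals(second.index))
```

Fixing the seed also means that cached API responses for the sampled rows remain useful across runs.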

In [40]:
df_sample3.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,startDateTime,endDateTime
53068,502526,2003,10,7,Weekend,13.0,2-Car,No injury/unknown,ROADWAY SURFACE CONDITION,RHORER & WALNUT ST,39.12144,-86.526496,"39.12144,-86.526496",0,2003-10-07T13:00:00,2003-10-07T14:00:00
42311,1791260,2005,12,2,Weekday,15.0,2-Car,No injury/unknown,UNSAFE BACKING,10TH & 10TH ST,39.171472,-86.509376,"39.171471999999994,-86.509376",0,2005-12-02T15:00:00,2005-12-02T16:00:00
3514,902547203,2015,10,3,Weekday,17.0,2-Car,Non-incapacitating,FOLLOWING TOO CLOSELY,CONSTITUTION & LIBERTY,39.148729,-86.575914,"39.14872891,-86.57591446",1,2015-10-03T17:00:00,2015-10-03T18:00:00
1050,902416995,2015,3,1,Weekend,10.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.575882,"39.164848,-86.57588174",0,2015-03-01T10:00:00,2015-03-01T11:00:00
50841,502281,2003,3,4,Weekday,20.0,1-Car,No injury/unknown,,SR48 & VERNAL,39.168991,-86.66458,"39.16899073,-86.66458038",0,2003-03-04T20:00:00,2003-03-04T21:00:00
50990,283556,2003,5,6,Weekday,14.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,7TH & UNION ST,39.167255,-86.509301,"39.16725453,-86.50930090000001",0,2003-05-06T14:00:00,2003-05-06T15:00:00
6879,902168980,2014,1,7,Weekend,12.0,Pedestrian,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,2ND ST & COLLEGE,39.16138,-86.53493,"39.16138016,-86.53492994",1,2014-01-07T12:00:00,2014-01-07T13:00:00
50238,502282,2003,3,5,Weekday,16.0,2-Car,Non-incapacitating,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,SR37 & VERNA,39.176544,-86.5624,"39.176544,-86.5624",1,2003-03-05T16:00:00,2003-03-05T17:00:00
44195,1748061,2005,7,6,Weekday,15.0,3+ Cars,Non-incapacitating,FOLLOWING TOO CLOSELY,ROGERS & SARE,39.13584,-86.496288,"39.13584,-86.496288",1,2005-07-06T15:00:00,2005-07-06T16:00:00
22749,901543641,2010,12,5,Weekday,17.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,E 3RD ST & JEFFERSON,39.164304,-86.506992,"39.164304,-86.50699200000001",0,2010-12-05T17:00:00,2010-12-05T18:00:00


In [41]:
df_sample3['Temperature'] = df_sample3.apply(lambda x: get_temp(x['ll'], x['startDateTime']), axis=1)

In [42]:
df_sample3.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,Injury Type Value,startDateTime,endDateTime,Temperature
53068,502526,2003,10,7,Weekend,13.0,2-Car,No injury/unknown,ROADWAY SURFACE CONDITION,RHORER & WALNUT ST,39.12144,-86.526496,"39.12144,-86.526496",0,2003-10-07T13:00:00,2003-10-07T14:00:00,24.0
42311,1791260,2005,12,2,Weekday,15.0,2-Car,No injury/unknown,UNSAFE BACKING,10TH & 10TH ST,39.171472,-86.509376,"39.171471999999994,-86.509376",0,2005-12-02T15:00:00,2005-12-02T16:00:00,-1.8
3514,902547203,2015,10,3,Weekday,17.0,2-Car,Non-incapacitating,FOLLOWING TOO CLOSELY,CONSTITUTION & LIBERTY,39.148729,-86.575914,"39.14872891,-86.57591446",1,2015-10-03T17:00:00,2015-10-03T18:00:00,10.0
1050,902416995,2015,3,1,Weekend,10.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.575882,"39.164848,-86.57588174",0,2015-03-01T10:00:00,2015-03-01T11:00:00,-1.1
50841,502281,2003,3,4,Weekday,20.0,1-Car,No injury/unknown,,SR48 & VERNAL,39.168991,-86.66458,"39.16899073,-86.66458038",0,2003-03-04T20:00:00,2003-03-04T21:00:00,7.0


In [45]:
df_sample4 = df_sample3.iloc[:, -4:]
df_sample4.head()

Unnamed: 0,Injury Type Value,startDateTime,endDateTime,Temperature
53068,0,2003-10-07T13:00:00,2003-10-07T14:00:00,24.0
42311,0,2005-12-02T15:00:00,2005-12-02T16:00:00,-1.8
3514,1,2015-10-03T17:00:00,2015-10-03T18:00:00,10.0
1050,0,2015-03-01T10:00:00,2015-03-01T11:00:00,-1.1
50841,0,2003-03-04T20:00:00,2003-03-04T21:00:00,7.0


In [49]:
df_sample4["Injury Type Value"].value_counts()

0    152
1     46
2      2
Name: Injury Type Value, dtype: int64

In [46]:
df_sample4[['Injury Type Value', 'Temperature']].corr()

Unnamed: 0,Injury Type Value,Temperature
Injury Type Value,1.0,0.221229
Temperature,0.221229,1.0


In [47]:
df_sample4[['Injury Type Value', 'Temperature']].corr().iloc[0, 1]

0.22122945424910806

In [48]:
f"The correlation between the temperature and severity of the crash is {df_sample4[['Injury Type Value', 'Temperature']].corr().iloc[0, 1]}."

'The correlation between the temperature and severity of the crash is 0.22122945424910806.'

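This means that, in this sample of 200 crashes, there is a weak positive linear relationship between the temperature and the severity of the crash. Only 2 of the 200 sampled crashes have an `Injury Type Value` of 2, so the estimate depends heavily on a handful of records.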
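The coefficient that `corr` returns by default is Pearson's r, which treats `Injury Type Value` as a numeric scale. Since the severity codes are ordered categories, a rank-based coefficient is a reasonable cross-check. The sketch below computes Pearson's r with pandas and from its definition, then a Spearman-style coefficient as Pearson's r on ranks. The six data points are invented and unrelated to the sample above.

```python
import numpy as np
import pandas as pd

# Invented values, only to illustrate the calculation.
toy = pd.DataFrame({
    "Injury Type Value": [0, 0, 1, 0, 2, 1],
    "Temperature": [-5.0, 3.0, 10.0, 0.0, 24.0, 12.0],
})
x = toy["Injury Type Value"]
y = toy["Temperature"]

# Pearson's r as computed by pandas.
pearson = x.corr(y)

# The same quantity from the definition:
# sum of products of deviations, divided by the root of the product
# of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Spearman's coefficient is Pearson's r applied to ranks
# (tied values receive their average rank).
spearman = x.rank().corr(y.rank())

print(round(pearson, 6) == round(manual, 6))
print(spearman)
```

If the two coefficients differ markedly on the real sample, that suggests the Pearson value is driven by a few extreme records.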