# Project I
### Wrangling the API data

The point of this exercise is to try data enrichment with data from external APIs. We are going to take data about car crashes in Monroe County, Indiana from 2003 to 2015 and try to figure out the weather during the accident and how many bars there are in the area. We will work with two different APIs during this challenge:

- Foursquare API
- Visual Crossing API

We will try to find correlations between the severity of a crash and the weather or the number of bars in the area. To indicate the severity of a crash, we will use the column `Injury Type`.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Just run the cells below to get your data ready; this part is a little help from us.


In [1]:
import pandas as pd
import os
pd.set_option("display.max_columns", None)

### Data Preparation

Tasks:
- load the data into a Pandas DataFrame
- drop the rows where latitude or longitude is missing
- drop the rows where latitude and longitude are both 0.0
- create a column `ll` that contains the latitude and longitude concatenated as a string

In [2]:
df = pd.read_csv("monroe-county-crash-data2003-to-2015.csv", encoding= 'unicode_escape')
df.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128


In [3]:
df.shape

(53943, 12)

In [4]:
df.dtypes

Master Record Number      int64
Year                      int64
Month                     int64
Day                       int64
Weekend?                 object
Hour                    float64
Collision Type           object
Injury Type              object
Primary Factor           object
Reported_Location        object
Latitude                float64
Longitude               float64
dtype: object

In [5]:
df2 = df.dropna(subset=['Latitude', 'Longitude'])
df2.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128


In [6]:
df2.shape

(53913, 12)

In [7]:
# Here we remove the rows where both latitude and longitude are 0.0.

df3 = df2[(df2["Latitude"] != 0) | (df2["Longitude"] != 0)] # | represents Boolean OR.
df3.shape

(49005, 12)
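The filter above keeps a row when at least one coordinate is non-zero, which is the negation of "both coordinates are zero". A minimal sketch on a made-up three-row frame (the `toy` data is hypothetical, not taken from the crash file) shows that the two ways of writing the mask agree:

```python
import pandas as pd

# Toy coordinates: one valid row, one (0, 0) placeholder, one row with a single zero.
toy = pd.DataFrame({
    "Latitude": [39.16, 0.0, 0.0],
    "Longitude": [-86.53, 0.0, -86.50],
})

# Rows where both coordinates are exactly zero.
both_zero = (toy["Latitude"] == 0) & (toy["Longitude"] == 0)

# Negating "both zero" is the same as "at least one is non-zero" (De Morgan's law).
keep_a = ~both_zero
keep_b = (toy["Latitude"] != 0) | (toy["Longitude"] != 0)

print(keep_a.equals(keep_b))
print(toy[keep_b])
```

Note that a row with only one zero coordinate survives this filter; whether such rows should also be dropped depends on how the task is read.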

In [8]:
df3.head(10)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128


In [9]:
df3['ll'] = df3["Latitude"].astype(str) + "," + df3["Longitude"].astype(str)
df3.head(10)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3['ll'] = df3["Latitude"].astype(str) + "," + df3["Longitude"].astype(str)


Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356"
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848"
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006"
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635"
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482"
5,902364664,2015,1,6,Weekday,1800.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,BURKS & WALNUT,39.12667,-86.53137,"39.12666969,-86.53136998"
6,902364682,2015,1,6,Weekday,1200.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,SOUTH CURRY PIKE LOT 71,39.150825,-86.584899,"39.150825,-86.584899"
7,902364683,2015,1,6,Weekday,1400.0,1-Car,Incapacitating,ENGINE FAILURE OR DEFECTIVE,NORTH LOUDEN RD,39.199272,-86.637024,"39.19927216,-86.63702393"
8,902364714,2015,1,7,Weekend,1400.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,LIBERTY & W 3RD,39.16461,-86.57913,"39.16461021,-86.57913007"
9,902364756,2015,1,7,Weekend,1600.0,1-Car,No injury/unknown,RAN OFF ROAD RIGHT,PATTERSON & W 3RD,39.16344,-86.55128,"39.16344009,-86.55128002"
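The warning printed above appears because the frame being modified was produced by filtering another DataFrame, so pandas cannot tell whether the assignment is meant for the original or for the filtered result. One common way to avoid it is to take an explicit copy right after filtering. The sketch below uses a small made-up frame (`toy` and `filtered` are hypothetical names) rather than the crash data:

```python
import pandas as pd

toy = pd.DataFrame({"Latitude": [39.16, 0.0], "Longitude": [-86.53, 0.0]})

# .copy() makes the filtered result an independent DataFrame,
# so adding a column to it is unambiguous.
filtered = toy[(toy["Latitude"] != 0) | (toy["Longitude"] != 0)].copy()
filtered["ll"] = filtered["Latitude"].astype(str) + "," + filtered["Longitude"].astype(str)

print(filtered)
```

The original `toy` frame is left untouched; only `filtered` gains the `ll` column.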


In [10]:
df3.shape

(49005, 13)

In [11]:
df3.dtypes

Master Record Number      int64
Year                      int64
Month                     int64
Day                       int64
Weekend?                 object
Hour                    float64
Collision Type           object
Injury Type              object
Primary Factor           object
Reported_Location        object
Latitude                float64
Longitude               float64
ll                       object
dtype: object

In [12]:
df3["Injury Type"].value_counts()

No injury/unknown     37467
Non-incapacitating    10427
Incapacitating         1003
Fatal                   108
Name: Injury Type, dtype: int64

In [13]:
len(df3['ll'].value_counts())

19397

In [14]:
df3['ll'].value_counts().head(10)

39.16424,-86.4984                279
39.164752,-86.573104             275
39.148144,-86.572992             232
39.125312,-86.610496             216
39.186368,-86.5344               211
39.164640000000006,-86.5324      188
39.176544,-86.5624               187
39.164640000000006,-86.534816    178
39.186544,-86.537888             176
39.16424,-86.492944              176
Name: ll, dtype: int64

In [15]:
df3['ll'].value_counts().tail(10)

39.16421,-86.492181                1
39.17750426,-86.55644796           1
39.17164293,-86.50184909           1
39.164352,-86.52137169             1
39.164832000000004,-86.58574962    1
39.17350743,-86.51532028           1
39.12569931,-86.48956011           1
39.15179179,-86.52698792           1
39.18054262,-86.54090403           1
39.19026997,-86.53769008           1
Name: ll, dtype: int64
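Some of the keys above, such as `39.164640000000006,-86.5324`, carry floating-point noise, so two crashes at what is effectively the same point can end up with different `ll` strings. A possible remedy is to format both coordinates with a fixed number of decimals before concatenating. The sketch below assumes six decimals (about a tenth of a metre), which is a choice made here and not something the exercise prescribes; `format_ll` and the `toy` frame are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Latitude": [39.164640000000006, 39.16464, 39.16424],
    "Longitude": [-86.5324, -86.5324, -86.4984],
})

def format_ll(frame, decimals=6):
    """Build 'lat,lon' strings with a fixed number of decimal places."""
    fmt = f"{{:.{decimals}f}}".format
    return frame["Latitude"].map(fmt) + "," + frame["Longitude"].map(fmt)

toy["ll"] = format_ll(toy)
print(toy["ll"].tolist())
print(toy["ll"].nunique())
```

With fixed-width formatting the first two rows collapse to a single key, which also reduces the number of distinct locations that would need an API call.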

# Foursquare API

Foursquare API documentation is [here](https://developer.foursquare.com/)

1. Start a Foursquare application and get your keys.
2. Create the function **get_venues** that, for a given crash location, pulls the bars within a 5 km radius of the crash.

#### example
`get_venues('48.146394, 17.107969')`

3. Find a relationship (if there is any) between number of bars in the area and severity of the crash.

Hints:
- check out the Python package "foursquare" (no need to send HTTP requests directly with the `requests` library)
- the **categoryId** for bars and nightlife needs to be found in the [Foursquare API documentation](https://developer.foursquare.com/docs/api-reference/venues/search/)

In [16]:
import foursquare

In [17]:
# Set the keys.
foursquare_id = os.environ["FOURSQUARE_API_ID"]
foursquare_secret = os.environ["FOURSQUARE_API_SECRET"]

# Set the versioning parameter.

v = "20200731"

In [18]:
client = foursquare.Foursquare(client_id=foursquare_id, client_secret=foursquare_secret, version=v)

In [19]:
# To find the bars, we will use the categoryID for nightlife spots.

def get_venues(coords):
    """Returns a response from an HTTP request which contains information about the bars in a 5 km radius of the input coordinates."""

    response = client.venues.search(
        params={'ll': coords, 'radius': 5000, "query": "bar", 
                "categoryId": "4d4b7105d754a06376d81259"})
    return response

In [20]:
coords_1 = df3['ll'].iloc[0]
print(coords_1)

39.15920668,-86.52587356


In [21]:
practice = get_venues(coords_1)
practice

{'venues': [{'id': '4aea47d8f964a520bbba21e3',
   'name': "Kilroy's Bar & Grill",
   'location': {'address': '502 E Kirkwood Ave',
    'lat': 39.16647162279348,
    'lng': -86.52808720706821,
    'labeledLatLngs': [{'label': 'display',
      'lat': 39.16647162279348,
      'lng': -86.52808720706821},
     {'label': 'entrance', 'lat': 39.166417, 'lng': -86.528033}],
    'distance': 830,
    'postalCode': '47408',
    'cc': 'US',
    'city': 'Bloomington',
    'state': 'IN',
    'country': 'United States',
    'formattedAddress': ['502 E Kirkwood Ave',
     'Bloomington, IN 47408',
     'United States']},
   'categories': [{'id': '4bf58dd8d48988d116941735',
     'name': 'Bar',
     'pluralName': 'Bars',
     'shortName': 'Bar',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/nightlife/pub_',
      'suffix': '.png'},
     'primary': True}],
   'venuePage': {'id': '54966015'},
   'referralId': 'v-1605809190',
   'hasPerk': False},
  {'id': '4b074185f964a520ccfa22e3',
   'na

In [22]:
# By examining the JSON response, we can see that the number of bars can be calculated.

len(practice["venues"])  

29

In [23]:
def number_of_bars(coords):
    """Returns the number of bars in a 5 km radius of the input coordinates."""
    
    response = get_venues(coords)
    return len(response["venues"])

In [24]:
number_of_bars(coords_1)

29

In [25]:
coords_2 = df3['ll'].value_counts().index[0]
print(coords_2)

39.16424,-86.4984


In [26]:
number_of_bars(coords_2)

28

In [43]:
df4 = pd.DataFrame({'ll': df3['ll'].unique()})
print(df4.head())
print(len(df4))

                         ll
0  39.15920668,-86.52587356
1       39.16144,-86.534848
2  39.14978027,-86.56889006
3    39.165655,-86.57595635
4    39.164848,-86.57962482
19397


In [52]:
# We will take a random sample of size 250 of the locations.

df_bars = df4.sample(n=250)
len(df_bars)

250

In [53]:
df_bars.head()

Unnamed: 0,ll
19233,"39.129344,-86.519328"
15689,"39.23084122,-86.4302342"
2388,"39.28039983,-86.52268029"
16994,"39.24666446,-86.53145486"
128,"39.16851475,-86.52695381"


In [54]:
df_bars.nunique()

ll    250
dtype: int64

In [55]:
df_bars["Number of bars"] = df_bars['ll'].apply(number_of_bars)
df_bars.shape

Unknown error. meta: {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5fb6bc01d9a71631444d224a'}
Unknown error. meta: {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5fb6bc02ebf5db5626c3db6e'}
Unknown error. meta: {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5fb6bc03f20935015a5b8b66'}


FoursquareException: Unknown error. meta: {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5fb6bc03f20935015a5b8b66'}

In [50]:
df_bars['ll'].iloc[0:10].apply(number_of_bars)

6244     29
18469    27
13083     4
13826    28
3363      1
9059      1
4076      1
7132     13
12245    27
18530    27
Name: ll, dtype: int64
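Once bar counts exist for a sample of locations, one way to look for a relationship with severity is to join the counts back onto the crashes by `ll` and summarise them per `Injury Type`. The sketch below uses made-up numbers (the `crashes` and `bars` frames are hypothetical) purely to show the shape of the join and the summary; it says nothing about what the real data would show.

```python
import pandas as pd

# Hypothetical crashes: several crashes can share one location key.
crashes = pd.DataFrame({
    "ll": ["39.16,-86.52", "39.16,-86.52", "39.23,-86.43", "39.28,-86.52"],
    "Injury Type": ["No injury/unknown", "Fatal", "No injury/unknown", "Incapacitating"],
})

# Hypothetical bar counts for the sampled locations only.
bars = pd.DataFrame({
    "ll": ["39.16,-86.52", "39.23,-86.43"],
    "Number of bars": [29, 1],
})

# An inner join keeps only crashes whose location was sampled.
merged = crashes.merge(bars, on="ll", how="inner")

# Compare the distribution of bar counts across severity levels.
summary = merged.groupby("Injury Type")["Number of bars"].agg(["count", "mean", "median"])
print(summary)
```

With only 250 sampled locations and very few fatal crashes, any difference between groups should be read with caution.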

On November 19, 2020 at 1:40 PM, I exceeded the call limits for the Foursquare API because I tried to make at least 8000 requests. 
If I get another chance, I will limit the number of requests to 250.
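A possible safeguard against running into the quota again is to cap the number of calls, reuse results for repeated locations, and keep whatever was collected when an error occurs. The sketch below is generic Python: `count_with_budget` and `make_fake_fetch` are hypothetical helpers, and the fake fetcher stands in for `number_of_bars` so that the example runs without any API access.

```python
def count_with_budget(locations, fetch_count, max_calls=250):
    """Call fetch_count once per distinct location.

    Stops when max_calls is reached or on the first error,
    and returns the results collected so far.
    """
    cache = {}
    calls = 0
    for loc in locations:
        if loc in cache:
            continue  # already fetched, no new request needed
        if calls >= max_calls:
            break  # budget used up
        calls += 1
        try:
            cache[loc] = fetch_count(loc)
        except Exception as exc:  # broad on purpose in this sketch
            print(f"Stopping at call {calls}: {exc}")
            break
    return cache


def make_fake_fetch(limit):
    """Return a stand-in fetcher that fails after `limit` successful calls."""
    state = {"calls": 0}

    def fetch(loc):
        state["calls"] += 1
        if state["calls"] > limit:
            raise RuntimeError("Quota exceeded")
        return len(loc)  # placeholder for a real bar count

    return fetch


results = count_with_budget(["a", "bb", "a", "ccc", "dddd"], make_fake_fetch(limit=2), max_calls=10)
print(results)
```

In the notebook, `fetch_count` would be `number_of_bars`, and the `except` clause could be narrowed to the exception type shown in the traceback above. The returned dictionary can then be turned into a DataFrame, so a quota error no longer discards the counts already retrieved.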

# Visual Crossing API

Visual Crossing API documentation is [here](https://www.visualcrossing.com/resources/documentation/)

1. Sign up for a free API key if you haven't done that before.
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and severity of the crash.

Hints:

* randomly sample only 250 or so crashes (due to API limits), or pull the weather only for a smaller sample of crashes
* for sending HTTP requests, check out the "requests" library [here](http://docs.python-requests.org/en/master/)
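As a starting point for step 2, the request for one location and one date could be assembled as below. This is a sketch under assumptions: the endpoint and the `unitGroup`, `include`, and `contentType` parameters follow Visual Crossing's Timeline API as I understand it and should be checked against the documentation linked above. `build_weather_request` is a hypothetical helper, and the date string and `YOUR_API_KEY` are placeholders. How to derive a calendar date from the `Year`, `Month`, and `Day` columns is left open here.

```python
from urllib.parse import quote

# Assumed Timeline API base URL; verify against the Visual Crossing documentation.
BASE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"


def build_weather_request(ll, date, api_key):
    """Return (url, params) for one 'lat,lon' string and one ISO date (YYYY-MM-DD)."""
    url = f"{BASE_URL}/{quote(ll, safe=',')}/{date}"
    params = {
        "key": api_key,
        "unitGroup": "us",      # assumed: US units
        "include": "days",      # assumed: daily summary only
        "contentType": "json",
    }
    return url, params


url, params = build_weather_request("39.15920668,-86.52587356", "2015-01-05", "YOUR_API_KEY")
print(url)

# To send the request with the requests library:
# response = requests.get(url, params=params, timeout=30)
# weather = response.json()
```

Building the URL separately from sending it makes it easy to inspect a few requests before spending any of the daily quota.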


In [193]:
import requests
api_key = os.environ[""]