# Data Wrangling Challenge
### Pull and manipulate the API data

The point of this exercise is to try data enrichment with data from external APIs. We are going to take data about car crashes in Monroe County, Indiana from 2003 to 2015 and try to figure out the weather during the accident and how many bars there are in the area. We will work with two different APIs during this challenge:

- Foursquare API
- World Weather Online API

We will try to find correlations between the severity of a crash and the weather or the number of bars in the area. To indicate severity, we will use the `Injury Type` column.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to load and prepare the data; this part is already done for you.


In [1]:
import pandas as pd
import os

In [2]:
data = pd.read_csv("data/monroe-county-crash-data2003-to-2015.csv", encoding="unicode_escape")
# ========================
# preparing data
data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
# creation of variable with lon and lat together
data['ll'] = data['Latitude'].astype(str) + ',' + data['Longitude'].astype(str)
data = data[data['ll'] != '0.0,0.0']
print(data.shape)
data.head()

(49005, 13)


Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356"
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848"
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006"
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635"
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482"


# Foursquare API

Foursquare API documentation is [here](https://developer.foursquare.com/).

1. Start a Foursquare application and get your keys.
2. Create the function **get_venues** that, for a given crash, pulls the bars within a 5 km radius of the crash location.

#### example
`get_venues('48.146394, 17.107969')`

3. Find a relationship (if there is any) between the number of bars in the area and the severity of the crash.

Hints:
- check out the Python package "foursquare" (no need to send HTTP requests directly with the `requests` library)
- the **categoryId** for bars and nightlife needs to be found in the [Foursquare API documentation](https://developer.foursquare.com/docs/api-reference/venues/search/)

In [33]:

import os
import requests
import json
import pandas as pd

# set the keys (read from environment variables)
foursquare_id = os.environ["FOURSQUARE_CLIENT_ID"]
foursquare_secret = os.environ["FOURSQUARE_CLIENT_SECRET"]
foursquare_api_key = os.environ["FOURSQUARE_API_KEY"]

In [16]:
# client = foursquare.Foursquare(client_id=foursquare_id, client_secret=foursquare_secret, redirect_uri='http://fondu.com/oauth/authorize')
# auth_uri = client.oauth.auth_url()
## make my own auth instead





In [115]:
def getVenues(ll):
    """
    Take a 'latitude,longitude' string, e.g. '39.16144,-86.534848', and return
    the number of venues found within 5 km (at most 50, the request limit).
    """
    search_endpoint = "https://api.foursquare.com/v3/places/search"

    headers = {
        "Accept": "application/json",
        "Authorization": foursquare_api_key,
    }

    bar_params = {
        'll': str(ll),
        #'query': 'bar',
        'radius': '5000',       # search radius in metres
        'categories': '10032',  # category ID; verify it against the Foursquare category list
        'limit': '50',          # maximum number of results per request
    }

    response = requests.get(search_endpoint, headers=headers, params=bar_params)
    response.raise_for_status()  # fail loudly on authentication or rate-limit errors

    return len(response.json()['results'])

In [116]:
length = getVenues('45.018261,-74.728577')
type(length)

int

In [117]:
len(data)

49005

In [176]:
injuries = list(data['Injury Type'].unique())
injuries

['No injury/unknown', 'Non-incapacitating', 'Incapacitating', 'Fatal']

In [202]:
bars_5km = []

# .copy() makes data_sample an independent DataFrame, so adding a column
# below does not raise pandas' SettingWithCopyWarning
data_sample = data[0:999].copy()

for index, row in data_sample.iterrows():
    bars_5km.append(getVenues(str(row['Latitude']) + ',' + str(row['Longitude'])))

In [203]:
bars_series = pd.Series(bars_5km)
data_sample['clubs 5km'] = bars_series.values
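
Crashes reported at the same spot can share identical coordinates, so the loop above may send the same request more than once. Here is a minimal sketch of one way to avoid that, assuming a per-coordinate cache is acceptable. `count_bars_stub` and `cached_lookup` are hypothetical names, and the stub stands in for `getVenues` so the example runs without network access or API keys.

```python
from functools import lru_cache

calls = []  # records every simulated API hit so the saving is visible


def count_bars_stub(ll: str) -> int:
    """Hypothetical stand-in for getVenues: pretends to query the API."""
    calls.append(ll)
    return 4  # placeholder count, not real data


@lru_cache(maxsize=None)
def cached_lookup(ll: str) -> int:
    """Query each distinct 'lat,lon' string only once."""
    return count_bars_stub(ll)


coords = ["39.16144,-86.534848", "39.16864,-86.533568", "39.16144,-86.534848"]
counts = [cached_lookup(ll) for ll in coords]
```

In the notebook, the same idea would mean decorating a thin wrapper around `getVenues` and calling it with the `ll` column values, which may help stay within the API quota.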


In [204]:
data_sample

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll,clubs 5km
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356",4
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.161440,-86.534848,"39.16144,-86.534848",4
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.149780,-86.568890,"39.14978027,-86.56889006",4
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635",4
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482",4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1066,902417767,2015,3,2,Weekday,1300.0,1-Car,No injury/unknown,SPEED TOO FAST FOR WEATHER CONDITIONS,W MAIN STREET,39.238336,-86.629328,"39.238336,-86.629328",1
1067,902418005,2015,3,2,Weekday,1600.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,SR37S & TAPP,39.137040,-86.572896,"39.13704,-86.572896",4
1068,902418014,2015,3,2,Weekday,1800.0,2-Car,No injury/unknown,UNSAFE BACKING,W 17TH,39.178960,-86.534720,"39.17896,-86.53472",4
1069,902418058,2015,3,2,Weekday,2200.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,7TH & WALNUT,39.168640,-86.533568,"39.16864,-86.533568",4


In [206]:
injuries = data_sample['Injury Type'].value_counts()
injuries

clubs_mean = data_sample.groupby(by='Injury Type')['clubs 5km'].mean()
clubs_mean

Injury Type
Fatal                 0.000000
Incapacitating        2.652778
No injury/unknown     3.062954
Non-incapacitating    3.275510
Name: clubs 5km, dtype: float64
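
One way to condense the group means above into a single number is a rank correlation between the venue count and an ordinal severity score. The sketch below is an illustration under stated assumptions, not the intended solution: the 0 to 3 ordering in `SEVERITY_ORDER` is our own encoding of `Injury Type`, `severity_rank_corr` is a hypothetical helper, and the small `demo` frame is made-up data standing in for `data_sample`.

```python
import pandas as pd

# Assumed ordinal encoding of 'Injury Type' (not part of the dataset)
SEVERITY_ORDER = {
    "No injury/unknown": 0,
    "Non-incapacitating": 1,
    "Incapacitating": 2,
    "Fatal": 3,
}


def severity_rank_corr(df, count_col="clubs 5km"):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    severity = df["Injury Type"].map(SEVERITY_ORDER)
    return severity.rank().corr(df[count_col].rank())


# Made-up rows for illustration only
demo = pd.DataFrame({
    "Injury Type": ["No injury/unknown", "Non-incapacitating", "Incapacitating", "Fatal"],
    "clubs 5km": [4, 3, 2, 0],
})
rho = severity_rank_corr(demo)  # perfectly decreasing toy data gives -1.0
```

On the real sample, `severity_rank_corr(data_sample)` would give a value between -1 and 1. If the sample contains only a few fatal crashes, any such coefficient should be read with caution.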

In [None]:
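## Out of the sampled crashes, how many severe crashes had more bars nearby than the sample mean?
# Assumption: "severe" means 'Incapacitating' or 'Fatal'; the original cell was left unfinished.
severe = data_sample['Injury Type'].isin(['Incapacitating', 'Fatal'])
above_mean = data_sample['clubs 5km'] > data_sample['clubs 5km'].mean()
print((severe & above_mean).sum(), "of", severe.sum(), "severe crashes are above the mean")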

# World Weather Online API

World Weather Online API documentation is [here](https://www.worldweatheronline.com/developer/api/historical-weather-api.aspx).

1. Sign up for a free API key if you haven't done so already (it's free for **30 days**).
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and the severity of the crash.

Hints:

* pull weather for only a small sample of crashes (250 or so) due to API limits
* for sending HTTP requests, check out the "requests" library [here](http://docs.python-requests.org/en/master/)


In [193]:
import requests
import time
api_key = os.environ["WWO_API_KEY"]
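
A rough sketch of how step 2 could be approached: build the query parameters for one crash, then pick the matching hour out of the response. Several things here are assumptions to check against the API documentation linked above: the endpoint in `WWO_URL`, the parameter names (`key`, `q`, `date`, `tp`, `format`), and the `data -> weather -> hourly` response layout with `time` values such as `'1500'`. The helper names are hypothetical. The sketch also assumes `Day` can be read as a day of the month; in the preview rows `Day` only takes values such as 5, 6 and 7 next to the `Weekend?` flag, so it may be a day-of-week code, in which case an exact date cannot be recovered from these columns.

```python
import math
from datetime import date

# Assumed endpoint of the World Weather Online historical API
WWO_URL = "https://api.worldweatheronline.com/premium/v1/past-weather.ashx"


def build_weather_params(row, key):
    """Build query parameters for one crash (row is a dict-like crash record)."""
    # Assumption: 'Day' is a day of the month (see the caveat above)
    day = date(int(row["Year"]), int(row["Month"]), int(row["Day"]))
    return {
        "key": key,
        "q": row["ll"],           # 'lat,lon' string built during data preparation
        "date": day.isoformat(),  # YYYY-MM-DD
        "tp": "1",                # hourly resolution (assumed parameter)
        "format": "json",
    }


def pick_hour(payload, hour_field):
    """Return the hourly entry matching the crash's Hour value, or None."""
    if math.isnan(hour_field):        # the Hour column may contain missing values
        return None
    target = str(int(hour_field))     # 1500.0 -> '1500', 0.0 -> '0'
    hourly = payload["data"]["weather"][0]["hourly"]
    return next((h for h in hourly if h["time"] == target), None)


# Made-up record and response for illustration only
crash = {"Year": 2015, "Month": 1, "Day": 6, "Hour": 1500.0, "ll": "39.16144,-86.534848"}
params = build_weather_params(crash, key="YOUR_KEY")
fake_payload = {"data": {"weather": [{"hourly": [
    {"time": "0", "tempC": "-5"},
    {"time": "1500", "tempC": "-2"},
]}]}}
entry = pick_hour(fake_payload, crash["Hour"])
```

The actual request would then be something like `requests.get(WWO_URL, params=params).json()`, with a short `time.sleep` between calls to stay within the rate limit.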