# Data Wrangling Challenge
### Pull and manipulate the API data

The point of this exercise is to practice data enrichment with data from external APIs. We will take data about car crashes in Monroe County, Indiana, from 2003 to 2015 and try to find out what the weather was at the time of each accident and how many bars there are in the area. We will work with the following APIs during this challenge:

- [Yelp](https://www.yelp.com/developers/documentation/v3/get_started) - to get the number of bars and restaurants in the area
- [World Weather Online API](https://www.worldweatheronline.com/developer/api/historical-weather-api.aspx) as **Stretch work**

We will look for correlations between the severity of a crash and the weather or the number of bars in the area. To measure the severity of a crash, we will use the `Injury Type` column.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to get your data ready; this part is done for you.


In [None]:
import pandas as pd
import requests

In [None]:
data = pd.read_csv("data/monroe-county-crash-data2003-to-2015.csv", encoding="unicode_escape")
# ========================
# preparing data
data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
# combine latitude and longitude into a single "lat,lon" string
data['ll'] = data['Latitude'].astype(str) + ',' + data['Longitude'].astype(str)
# drop rows with placeholder (0, 0) coordinates
data = data[data['ll'] != '0.0,0.0']
print(data.shape)
data.head()

In [None]:
data['Injury Type'].value_counts()

In [None]:
data['severity'] = 0
data.loc[data['Injury Type'] == 'Non-incapacitating', 'severity'] = 1
data.loc[data['Injury Type'] == 'Incapacitating', 'severity'] = 2
data.loc[data['Injury Type'] == 'Fatal', 'severity'] = 3
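The same ordinal encoding can also be written as a single dictionary lookup. A minimal sketch, assuming (as the cell above does) that any `Injury Type` value other than the three listed, including missing values, should count as severity 0; `encode_severity` is a hypothetical helper name:

```python
import pandas as pd

# Ordinal severity codes, mirroring the cell above; anything unmapped becomes 0
SEVERITY = {"Non-incapacitating": 1, "Incapacitating": 2, "Fatal": 3}

def encode_severity(injury_type: pd.Series) -> pd.Series:
    """Map Injury Type labels to 0-3; unknown or missing labels map to 0."""
    return injury_type.map(SEVERITY).fillna(0).astype(int)

# Toy labels for illustration ("other" stands for any unlisted value)
example = pd.Series(["Fatal", "other", None, "Incapacitating"])
print(encode_severity(example).tolist())  # [3, 0, 0, 2]
```

Applied to the notebook, this would be `data['severity'] = encode_severity(data['Injury Type'])`.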

In [None]:
data['date'] = (data.Year.astype(str) + 
                "-" +
                data.Month.astype(str).str.zfill(2) + 
                "-" +
                data.Day.astype(str).str.zfill(2))
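As an alternative to string concatenation, pandas can assemble real dates from the three columns, which also raises an error on impossible dates instead of producing a string the weather API would reject. A sketch, assuming `Year`, `Month` and `Day` hold integers as in the cell above; `build_date` is a hypothetical helper:

```python
import pandas as pd

def build_date(df: pd.DataFrame) -> pd.Series:
    """Assemble YYYY-MM-DD strings from Year/Month/Day columns."""
    # pd.to_datetime accepts a frame with year/month/day columns, so lower-case the names
    parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
    return pd.to_datetime(parts).dt.strftime("%Y-%m-%d")

# Toy rows for illustration only
toy = pd.DataFrame({"Year": [2003, 2015], "Month": [1, 12], "Day": [5, 31]})
print(build_date(toy).tolist())  # ['2003-01-05', '2015-12-31']
```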

In [None]:
data.head()

# Yelp API

The Yelp API documentation is [here](https://www.yelp.com/developers/documentation/v3/get_started).

1. Get Your API Key
2. For each crash, use the `/businesses/search` endpoint to pull the number of nightlife bars and other entertainment venues within a 10 km radius of the crash.
3. Find a relationship (if there is one) between the number of bars in the area and the severity of the crash.
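Step 2 amounts to one `/businesses/search` call per crash with the crash coordinates, a category filter and a radius in meters. A minimal sketch of the query parameters, assuming the `nightlife` category alias used later in this notebook and a maximum radius of 40,000 m as stated in the Yelp docs at the time of writing (verify against the current documentation); `yelp_search_params` is a hypothetical helper:

```python
MAX_RADIUS_M = 40_000  # assumed Yelp upper limit for `radius`; check the current docs

def yelp_search_params(lat: float, lon: float, radius_m: int = 10_000,
                       categories: str = "nightlife") -> dict:
    """Build query parameters for GET /v3/businesses/search (hypothetical helper)."""
    if not 0 < radius_m <= MAX_RADIUS_M:
        raise ValueError(f"radius must be in (0, {MAX_RADIUS_M}] meters, got {radius_m}")
    return {
        "latitude": lat,
        "longitude": lon,
        "categories": categories,  # comma-separated category aliases
        "radius": radius_m,        # meters: 10 km = 10_000
    }

# Coordinates roughly in Bloomington, Indiana, for illustration
print(yelp_search_params(39.17, -86.52))
```

The result would be passed as `requests.get(url, headers=headers, params=...)`, exactly like `url_params` below.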

Hints:
- at the beginning, pull the number of bars for a smaller sample of crashes only (500 or so)
- the **list of categories** for bars and nightlife can be found in the [Yelp API documentation](https://www.yelp.com/developers/documentation/v3/all_category_list)
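Following the first hint, you can draw the sample at random (so it does not depend on the file's row order) and query each distinct location only once, since several crashes may share the same coordinates. A sketch under those assumptions; `counts_for_sample` is hypothetical, and `fetch_count(lat, lon)` stands for whatever function you write around the Yelp request that returns `res['total']`:

```python
import pandas as pd
from typing import Callable

def counts_for_sample(df: pd.DataFrame, fetch_count: Callable[[float, float], int],
                      n: int = 500, seed: int = 0) -> pd.Series:
    """Return a bar count per sampled crash, calling fetch_count once per unique location."""
    sample = df.sample(n=min(n, len(df)), random_state=seed)
    unique = sample.drop_duplicates(subset=["Latitude", "Longitude"])
    # One API call per distinct (lat, lon) pair
    cache = {
        (lat, lon): fetch_count(lat, lon)
        for lat, lon in zip(unique["Latitude"], unique["Longitude"])
    }
    keys = zip(sample["Latitude"], sample["Longitude"])
    # Keep the original row labels so the result can be joined back to `df`
    return pd.Series([cache[k] for k in keys], index=sample.index, name="no_bars")
```

Because the returned Series keeps the crash row labels, it can be attached with `data.loc[counts.index, 'no_bars'] = counts`.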

In [None]:
# set the key: read it from an environment variable rather than hard-coding it
# (YELP_API_KEY is an example name; use whatever variable you exported)
import os
apikey = os.environ["YELP_API_KEY"]

headers = {
    'Authorization': f'Bearer {apikey}'
}

In [None]:
url_params = {
    'term': '',  # empty string to search all businesses
    'latitude': '40.7128',  # example location: New York City
    'longitude': '-74.0060',
    'limit': 50,  # return at most the first 50 businesses
    'radius': 10000  # 10 km radius (in meters)
}

In [None]:
url = 'https://api.yelp.com/v3/businesses/search'

In [None]:
resp = requests.get(url=url, headers = headers, params=url_params)
res = resp.json()
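In a successful response, `total` is the number of businesses matching the search, while `businesses` holds at most `limit` of them (50 above), so counting with `len(res['businesses'])` would cap every count at 50. A sketch that reads `total` and surfaces API errors instead of failing with a `KeyError`; the `error` key in the failure case is an assumption about Yelp's error payload, and `business_total` is a hypothetical helper:

```python
def business_total(payload: dict) -> int:
    """Return the match count from a /businesses/search JSON payload."""
    if "error" in payload:
        # Assumed error shape: {"error": {"code": ..., "description": ...}}
        raise RuntimeError(f"Yelp API error: {payload['error']}")
    return payload["total"]

# Made-up payload, for illustration only
fake = {"total": 137, "businesses": [{}] * 50}
print(business_total(fake), len(fake["businesses"]))  # 137 50
```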

In [None]:
# data

In [None]:
test_data = data.head(1000).copy()  # copy, so adding a column later does not warn about chained assignment

In [None]:
lats = test_data.Latitude.tolist()
lons = test_data.Longitude.tolist()

In [None]:
# get the number of nightlife businesses within 10 km of each crash
url = 'https://api.yelp.com/v3/businesses/search'
totals = []
for lat, lon in zip(lats, lons):
    url_params = {
        'latitude': lat,
        'longitude': lon,
        'categories': 'nightlife',
        'radius': 10000  # 10 km, as the task specifies
    }
    resp = requests.get(url=url, headers=headers, params=url_params)
    resp.raise_for_status()  # stop on auth or rate-limit errors instead of a KeyError below
    totals.append(resp.json()['total'])
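Looping over hundreds of coordinates can run into Yelp's rate limits, which typically show up as HTTP 429 responses. A common remedy is to retry with exponential backoff. A sketch, assuming you wrap each call in a hypothetical zero-argument function that returns `(resp.status_code, resp.json())`; `get_with_backoff` is also hypothetical, and `sleep` is injectable so the logic can be tested without waiting:

```python
import time
from typing import Callable, Tuple

def get_with_backoff(request: Callable[[], Tuple[int, dict]], retries: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call request() until it returns a non-429 status, doubling the wait each time."""
    for attempt in range(retries + 1):
        status, body = request()
        if status != 429:
            return body
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    raise RuntimeError(f"still rate-limited after {retries} retries")
```

Inside the loop above, `request` could be a small function that calls `requests.get(url=url, headers=headers, params=url_params)` and returns its status code and JSON body.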

In [None]:
totals

In [None]:
test_data['no_bars'] = totals

In [None]:
test_data[['severity','no_bars']].corr()
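`severity` is an ordinal code (0 to 3), so a rank-based coefficient such as Spearman's may describe a monotonic relationship better than the default Pearson coefficient, and a per-class summary shows where any difference comes from. A sketch using pandas' `corr(method="spearman")`; `severity_vs` is a hypothetical helper and the toy numbers are illustrative, not crash data:

```python
import pandas as pd

def severity_vs(df: pd.DataFrame, col: str) -> tuple:
    """Spearman correlation of `col` with severity, plus summary stats per severity class."""
    rho = df[["severity", col]].corr(method="spearman").loc["severity", col]
    by_class = df.groupby("severity")[col].agg(["count", "mean", "median"])
    return rho, by_class

# Toy numbers for illustration only
toy = pd.DataFrame({"severity": [0, 0, 1, 2, 3], "no_bars": [2, 4, 5, 9, 12]})
rho, table = severity_vs(toy, "no_bars")
print(round(rho, 3))
print(table)
```

On the notebook data this would be `severity_vs(test_data, 'no_bars')`. Keep in mind that a correlation here is only an association: bar density may go along with other factors, such as urban versus rural roads.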

# Stretch

## World Weather Online API

The World Weather Online API documentation is [here](https://www.worldweatheronline.com/developer/api/historical-weather-api.aspx).

1. Sign up for a free API key if you haven't done so already (it is free for **60 days**).
2. For each crash, get the weather for its location and date.
3. Find a relationship between the weather and the severity of the crash.
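Steps 1 and 2 boil down to one request per crash against the past-weather endpoint, with `q` set to the crash's `lat,lon` string and `date` set to the crash date. Passing these as a `params` dictionary, rather than formatting them into the URL by hand, lets the HTTP library URL-encode values such as the comma in `q`. A sketch using only the parameters that appear in this notebook (`q`, `date`, `format`, `key`); `wwo_params` is a hypothetical helper and the key is a placeholder:

```python
from urllib.parse import urlencode

WWO_URL = "https://api.worldweatheronline.com/premium/v1/past-weather.ashx"

def wwo_params(ll: str, date: str, api_key: str) -> dict:
    """Query parameters for one past-weather lookup (hypothetical helper)."""
    return {"q": ll, "date": date, "format": "json", "key": api_key}

params = wwo_params("39.17,-86.52", "2015-01-05", "YOUR_KEY")
# Shows the encoded query string; requests.get(WWO_URL, params=params) builds it for you
print(f"{WWO_URL}?{urlencode(params)}")
```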


In [None]:
import requests
import time
api_key = ''  # paste your World Weather Online key here

In [None]:
data.head()

In [None]:
data["DATE_"] = data.Year.astype(str) + '-' + data.Month.astype(str).str.zfill(2) + '-' + data.Day.astype(str).str.zfill(2)

In [None]:
data["DATE_"].iloc[0]  # first row by position; label 1 may have been dropped during cleaning

In [None]:
# this sample URL is from the documentation here: https://www.worldweatheronline.com/developer/api/docs/historical-weather-api.aspx
# api_key = os.environ['']
# api_key is set in the cell above
date = data["DATE_"].iloc[0]
location = data["ll"].iloc[0]  # "lat,lon" string of the first crash
url = f"https://api.worldweatheronline.com/premium/v1/past-weather.ashx?q={location}&date={date}&format=json&key={api_key}"

In [None]:
res = requests.get(url)

In [None]:
# parse the JSON result to get the daily values
res = res.json()["data"]["weather"][0]
hourly = res.pop('hourly')  # split off the hourly breakdown; `res` keeps the daily summary
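The response nests one entry per day under `data.weather`; each entry holds daily fields such as `maxtempC`, `mintempC` and `totalSnow_cm` plus an `hourly` list. A sketch that keeps only the daily summary and raises a readable error when the response has no `weather` entry (for example, with an invalid key). The error layout `data.error[0].msg` is an assumption, so check it against a real failed response; `daily_summary` is a hypothetical helper:

```python
def daily_summary(payload: dict) -> dict:
    """Return the first day's summary from a past-weather JSON payload, without `hourly`."""
    data = payload.get("data", {})
    if not data.get("weather"):
        # Assumed error layout: {"data": {"error": [{"msg": "..."}]}}
        errors = data.get("error") or [{"msg": "no weather data in response"}]
        raise RuntimeError(errors[0].get("msg"))
    day = dict(data["weather"][0])  # copy so the original payload is untouched
    day.pop("hourly", None)          # drop the hourly breakdown, keep daily fields
    return day

# Minimal made-up payload, for illustration only
fake = {"data": {"weather": [{"date": "2015-01-05", "maxtempC": "3", "hourly": []}]}}
print(daily_summary(fake))  # {'date': '2015-01-05', 'maxtempC': '3'}
```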

In [None]:
# pd.DataFrame(res)

In [None]:
weather_results = []
# only 200 rows, to limit the number of API requests
for index in data.head(200).index:
    start_date = data["DATE_"][index]
    location = data["ll"][index]
    url = f"https://api.worldweatheronline.com/premium/v1/past-weather.ashx?q={location}&date={start_date}&format=json&key={api_key}"
    res = requests.get(url)
    res = res.json()["data"]["weather"][0]
    res.pop('hourly')  # keep only the daily summary
    weather_results.append(res)

In [None]:
# weather_results[0]

In [None]:
df_weather = pd.DataFrame(weather_results)

In [None]:
df_weather.head()

In [None]:
df_weather.shape

In [None]:
data

In [None]:
# combining data together; reset the crash index so rows line up by position with df_weather (0..199)
df_res = pd.concat([data.head(200).reset_index(drop=True), df_weather], axis=1)
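`pd.concat(..., axis=1)` aligns rows by index label, not by position. After the `dropna` and coordinate filtering earlier, `data.head(200)` keeps its original labels with gaps, while `df_weather` is numbered 0 to 199, so the two only line up once one side is re-indexed. A small illustration with toy frames:

```python
import pandas as pd

left = pd.DataFrame({"severity": [1, 3]}, index=[5, 9])  # gappy labels, as after filtering
right = pd.DataFrame({"maxtempC": ["4", "10"]})           # fresh 0..n-1 labels

misaligned = pd.concat([left, right], axis=1)
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned.shape)  # (4, 2): no labels match, so every row is half empty
print(aligned.shape)     # (2, 2)
```

An equivalent fix is to give the weather frame the crash labels instead, e.g. `df_weather.index = data.head(200).index`, which keeps the original row identifiers.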

In [None]:
df_res

In [None]:
df_res["severity"] = 0
df_res.loc[df_res["Injury Type"] == 'Non-incapacitating',"severity"] = 1
df_res.loc[df_res["Injury Type"] == 'Incapacitating',"severity"] = 2
df_res.loc[df_res["Injury Type"] == 'Fatal',"severity"] = 3

In [None]:
# correlation between the weather and severity of car crash
df_res[['maxtempC','mintempC', 'avgtempC','totalSnow_cm','sunHour','uvIndex', 'severity']].apply(pd.to_numeric).corr()