# Pull and manipulate the API data

The point of this exercise is to try data enrichment with data from external APIs. We will take data about road kills in Vermont in 2006 and try to figure out what the weather was at the time of each accident and how many bars there are in the area. We will work with one API that has a Python wrapper library and one that does not.



## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1Nk-5RZC1yzJl7HzOJd3tzUjzMENSSdJJ/view?usp=sharing).

Run the cells below to get your data ready; this part is a little help from us.


In [169]:
import pandas as pd
mypath = "/home/itod/Desktop/basecamp_spring_2019/itodorovic/data/"

In [173]:
data = pd.read_csv(mypath + "VT_VehicleAnimal_Collisions__2006.csv", 
                   na_values=['', ' '],
                   parse_dates=["DATE_"]
                  )
data["MONTH_"] = data.DATE_.dt.month
data.dropna(subset=['X', 'Y'], inplace=True)
data['ll'] = data['Y'].astype(str) + ',' + data['X'].astype(str)
print(data.shape)
data.head()

(2244, 23)


Unnamed: 0,X,Y,OBJECTID,MSRI_CODE,MSRI_DESCR,DATE_,TOWN_ID,ROUTE_DES,ROUTE,BEGIN_MM,...,TOWN,RT_TOWN,RT_NUM,DAY_,MONTH_,YEAR_,REP_AGEN,LOCATION,YEAR_INT,ll
4,-72.989424,42.833309,5,Deer,Deer,2004-04-16,209.0,VT 8,8,0.35,...,READSBORO,,,,4.0,2004,AOT,,2004,"42.83330874047745,-72.98942388725152"
5,-72.375495,43.581005,6,Deer RK AB,"Deer, RoadKill, Adult Buck",2005-03-22,1409.0,US 5,5,5.0,...,HARTLAND,,,,3.0,2005,AOT,,2005,"43.581005107295645,-72.37549549789222"
6,-72.840257,43.235023,7,Lg Bird,"Large Bird (hawk, owl, turkey, waterfowl)",2004-11-15,1310.0,VT 11,11,0.5,...,LONDONDERRY,,,,11.0,2004,AOT,,2004,"43.23502288333559,-72.8402568531839"
7,-72.830991,44.783002,8,Otter,Otter,2004-10-18,601.0,VT 36,36,1.0,...,BAKERSFIELD,,,,10.0,2004,AOT,,2004,"44.78300220380285,-72.83099079061695"
9,-72.514595,44.242323,10,Beaver,Beaver,2005-04-22,1207.0,US 2,2,0.19,...,EAST MONTPEL,,,,4.0,2005,AOT,,2005,"44.24232301744294,-72.51459452575021"


## Foursquare API

Foursquare API documentation is [here](https://developer.foursquare.com/)

1. Start a Foursquare application and get your keys.
2. For each crash, pull the number of bars (category "Nightlife") within a 5 km radius.
3. Find a relationship between the number of bars in the area and the severity of the crash.
4. (optional) Try to come up with other approaches to get more information out of the data. 
5. (optional) Think about the most generic way to approach the problem.

Hints:

* check out the Python package "foursquare"
* what happens if the code fails?
* what if you run out of requests? (check out the [time](https://docs.python.org/2/library/time.html) module)
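
The last two hints can be sketched as a small retry helper. This is only one possible approach, not part of the `foursquare` package: the name `call_with_retry` and its parameters are made up for illustration, and the wait time of 60 seconds is an assumption rather than a documented quota window.

```python
import time


def call_with_retry(func, *args, retries=3, wait_seconds=60, sleep=time.sleep):
    """Call func(*args); on an exception, wait and try again.

    Returns the result of func, or None when every attempt failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args)
        except Exception as exc:  # narrow this to the client's own exception type
            print(f"attempt {attempt} failed: {exc}")
            if attempt < retries:
                # give the API some time before the next attempt
                sleep(wait_seconds)
    return None
```

The helper only retries when the wrapped function raises, so a function that swallows its own errors and returns a sentinel value would need to re-raise for the retry to kick in.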

In [16]:
#set the keys
foursquare_id = "B1C144JU4YV5KOTLC4BY0QEXPSYWPCTXUN1NOU3CMYNRQVU2"
foursquare_secret = "FOXGD24MZWVHIA0C3BBIDF00ZMKYNNSM0QAC0IF4H5ODEGO5"

In [131]:
#install and load the library
import foursquare
from foursquare import FoursquareException

# Construct the client object
client = foursquare.Foursquare(client_id=foursquare_id, client_secret=foursquare_secret, redirect_uri='http://fondu.com/oauth/authorize')

# Build the authorization url for your app
auth_uri = client.oauth.auth_url()

In [132]:
# count Nightlife venues within 5 km of a "lat,lng" string; returns -1 on an API error
def get_venues(ll):
    params = {'intent': 'browse',
          'categoryId': '4d4b7105d754a06376d81259',
          'radius': 5000,
          'limit': 50}
    params['ll']=ll
    
    try:
        result = client.venues.search(params=params)
        return len(result['venues'])
    except FoursquareException as e:
        print(e)
        return -1

#get_venues('48.146394, 17.107969')

In [None]:
data['ll'] = data['Y'].astype(str) + ',' + data['X'].astype(str)

for ind, row in data.iterrows():
    print(get_venues(row['ll']))
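
The loop above only prints the counts. A possible way to keep them, and to avoid spending requests on coordinates that were already looked up, is sketched here. The helper name `count_once_per_location` is hypothetical; `fetch` stands for any function that takes a `"lat,lng"` string, such as `get_venues`.

```python
def count_once_per_location(locations, fetch):
    """Return one result per location, calling fetch only for unseen values."""
    cache = {}
    results = []
    for ll in locations:
        if ll not in cache:
            # first time we see this coordinate string: spend one request
            cache[ll] = fetch(ll)
        results.append(cache[ll])
    return results
```

With the notebook's objects this could be used as `data['bars'] = count_once_per_location(data['ll'], get_venues)`. Note that a failed lookup (the `-1` returned by `get_venues`) is cached too, and that each count cannot exceed the `limit` of 50 set in `get_venues`.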

## DarkSky API

DarkSky API documentation is [here](https://darksky.net/dev/docs/time-machine)

1. Sign up for a free API key.
2. For each crash, get the weather for the location and time.
3. Find a relationship between the weather and the severity of the crash.

Hints:

* randomly sample only 500 or so (due to API limits)
* use "Time Machine" request in DarkSky API
* for sending HTTP requests check out "requests" library [here](http://docs.python-requests.org/en/master/)
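
A "Time Machine" request is an ordinary forecast URL with a timestamp appended after the coordinates. The sketch below builds such a URL in the same shape the notebook cells use (key, then `lat,lng,time`, then an `exclude` query string). The function name `time_machine_url` and its parameters are made up for illustration; only the URL layout comes from this notebook.

```python
from datetime import date

BASE_URL = "https://api.darksky.net/forecast/"


def time_machine_url(api_key, lat, lon, day, exclude=("currently", "flags")):
    """Build a Time Machine request URL for local midnight on the given day."""
    when = f"{day.isoformat()}T00:00:00"
    url = f"{BASE_URL}{api_key}/{lat},{lon},{when}"
    if exclude:
        # skip response blocks we do not need, to keep the payload small
        url += "?exclude=" + ",".join(exclude)
    return url
```

For the sampling hint, `data.sample(n=500, random_state=0)` would draw a reproducible random subset of rows before any requests are sent.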



In [193]:
import requests
import time
api_key = "bb7c474338ee56898672d089fb85ced8"

In [195]:
# how many unique towns we have, to try to go under API limits
print(data['TOWN_ID'].nunique())

106


In [437]:
import numpy as np
import datetime
#towns = data.groupby('TOWN_ID').min()
#towns['timestamp'] = (pd.DatetimeIndex(towns['DATE_']).astype(np.int64)/1000000).astype(np.int64)
data['TOWN'] = data['TOWN'].str.upper()
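# select several columns with a list (double brackets); tuple-style ['X','Y'] fails in recent pandas
towns = data.groupby(['TOWN','DATE_'])[['X','Y']].mean()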
towns['ll'] = towns['Y'].astype(str) + ',' + towns['X'].astype(str)
towns = towns.reset_index()

In [446]:
towns.shape

(2105, 5)

In [338]:
base_url = 'https://api.darksky.net/forecast/'
ll = '43.581617569506605,-72.51483075364598'
time_only = 'T00:00:00'
sufix = '?exclude=currently,flags'

In [334]:
resp = requests.get(base_url + api_key + '/' + ll + ',' + '2004-04-13' + time_only + sufix)

In [393]:
df = pd.DataFrame(resp.json()['daily']['data'][0], index=[0])

Unnamed: 0,time,summary,icon,sunriseTime,sunsetTime,moonPhase,precipIntensity,precipIntensityMax,precipIntensityMaxTime,precipProbability,...,uvIndexTime,visibility,temperatureMin,temperatureMinTime,temperatureMax,temperatureMaxTime,apparentTemperatureMin,apparentTemperatureMinTime,apparentTemperatureMax,apparentTemperatureMaxTime
0,1081828800,Heavy rain in the morning and evening.,rain,1081851094,1081899240,0.8,0.0279,0.3211,1081904400,1,...,1081868400,7.6,34.16,1081850400,40.76,1081911600,34.16,1081850400,40.76,1081911600
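
The `time`, `sunriseTime` and similar columns are UNIX timestamps in seconds. A minimal sketch for reading them as local times follows; the helper `to_local_time` is hypothetical, and the offset of -4 hours is an assumption taken from the `offset` field that the API returns for these Vermont locations.

```python
from datetime import datetime, timedelta, timezone


def to_local_time(unix_seconds, utc_offset_hours):
    """Convert a UNIX timestamp to an aware datetime at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_seconds, tz=tz)


# the `time` value of the daily record shown above
day_start = to_local_time(1081828800, -4)
print(day_start.isoformat())  # 2004-04-13T00:00:00-04:00
```

The daily record therefore starts at local midnight of the requested date, which matches the `T00:00:00` suffix used in the request.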


In [455]:
# .iloc slices exclude the stop position, so each batch starts where the previous one stopped
batch = []
batch.append(towns.iloc[0:490, :])
batch.append(towns.iloc[490:880, :])
batch.append(towns.iloc[880:1270, :])
batch.append(towns.iloc[1270:1660, :])
batch.append(towns.iloc[1660:, :])

TOWN             object
DATE_    datetime64[ns]
X               float64
Y               float64
ll               object
dtype: object
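
Because `.iloc[a:b]` excludes position `b`, consecutive batches have to reuse the previous stop as the next start, otherwise one row per boundary is lost. A small sketch that generates gap-free boundaries is given here; `chunk_bounds` is a hypothetical helper, and the uniform batch size of 490 is an assumption that differs from the uneven batches in the cell above.

```python
def chunk_bounds(n_rows, size):
    """Return (start, stop) pairs that cover range(n_rows) with no gaps."""
    return [(start, min(start + size, n_rows)) for start in range(0, n_rows, size)]


# 2105 is the number of rows reported by towns.shape
bounds = chunk_bounds(2105, 490)
print(bounds)  # [(0, 490), (490, 980), (980, 1470), (1470, 1960), (1960, 2105)]
```

The batches could then be built as `batch = [towns.iloc[a:b] for a, b in bounds]`.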

In [465]:
# collect one-row frames in a list; DataFrame.append was removed in pandas 2.0
rows = []

# only the first batch is requested here, to stay under the API limit
tmp = batch[0]

for ind, row in tmp.iterrows():
    req_str = base_url + api_key + '/' + row['ll'] + ',' + str(row['DATE_'])[:-9] + time_only + sufix
    resp = requests.get(req_str)
    print(resp.json())
    try:
        weather = resp.json()['daily']['data'][0]
        rows.append(pd.DataFrame(weather, index = [ind]))
        print(weather)
    except (KeyError, IndexError):
        # some responses carry no daily data; skip those rows
        pass

df_api = pd.concat(rows) if rows else pd.DataFrame()


{'latitude': 44.03603367456014, 'longitude': -73.41895683118159, 'timezone': 'America/New_York', 'offset': -4}
{'latitude': 44.095871380936586, 'longitude': -73.30033208392106, 'timezone': 'America/New_York', 'hourly': {'summary': 'Clear throughout the day.', 'icon': 'clear-day', 'data': [{'time': 957326400, 'summary': 'Clear', 'icon': 'clear-night', 'precipIntensity': 0, 'precipProbability': 0, 'temperature': 41.33, 'apparentTemperature': 41.33, 'dewPoint': 28.95, 'humidity': 0.61, 'pressure': 1023.1, 'windSpeed': 0, 'cloudCover': 0, 'uvIndex': 0, 'visibility': 10}, {'time': 957330000, 'summary': 'Clear', 'icon': 'clear-night', 'precipIntensity': 0, 'precipProbability': 0, 'temperature': 39.34, 'apparentTemperature': 39.34, 'dewPoint': 28.95, 'humidity': 0.66, 'pressure': 1023.51, 'windSpeed': 0, 'cloudCover': 0, 'uvIndex': 0, 'visibility': 10}, {'time': 957333600, 'summary': 'Clear', 'icon': 'clear-night', 'precipIntensity': 0, 'precipProbability': 0, 'temperature': 38.26, 'apparentT
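
The first response printed above has no `daily` block at all, so any code that indexes into it directly will raise. One possible way to make that case explicit is sketched here; `first_daily_record` is a hypothetical helper that works on the dictionary returned by `resp.json()`.

```python
def first_daily_record(payload):
    """Return the first daily record from a response dict, or None if absent."""
    records = payload.get("daily", {}).get("data", [])
    return records[0] if records else None
```

Returning `None` instead of silently skipping makes it easy to count how many locations came back without weather data.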

In [402]:
print(df_api.shape)
print(towns.shape)

(358, 39)
(358, 5)


In [403]:
towns.drop(['X', 'Y', 'll'], inplace = True, axis = 1)

In [407]:
towns = towns.merge(df_api, how='outer', left_index=True, right_index=True)

In [408]:
towns.head()

Unnamed: 0,TOWN_ID,DATE_,apparentTemperatureHigh,apparentTemperatureHighTime,apparentTemperatureLow,apparentTemperatureLowTime,apparentTemperatureMax,apparentTemperatureMaxTime,apparentTemperatureMin,apparentTemperatureMinTime,...,temperatureMin,temperatureMinTime,time,uvIndex,uvIndexTime,visibility,windBearing,windGust,windGustTime,windSpeed
0,107.0,2004-04-13,38.31,1081893600,38.46,1081900800,40.76,1081911600,34.16,1081850400,...,34.16,1081850400,1081828800,4,1081868400,7.6,157.0,21.02,1081879000.0,1.72
1,108.0,2004-09-16,72.36,1095361200,59.42,1095415200,72.36,1095361200,51.32,1095332400,...,51.32,1095332400,1095307200,6,1095350400,8.8,191.0,12.01,1095358000.0,0.8
2,201.0,2004-06-18,79.49,1087596000,65.38,1087646400,79.49,1087596000,64.94,1087552800,...,64.33,1087552800,1087531200,10,1087574400,9.21,158.0,10.99,1087582000.0,1.52
3,202.0,2004-02-06,30.16,1076112000,29.65,1076148000,35.45,1076115600,16.92,1076076000,...,21.72,1076065200,1076043600,2,1076083200,5.7,95.0,24.17,1076094000.0,3.31
4,202.0,2004-06-16,80.19,1087423200,60.23,1087462800,80.19,1087423200,53.6,1087380000,...,53.6,1087380000,1087358400,10,1087401600,8.89,265.0,7.99,1087405000.0,0.6


In [412]:
print(data.shape)
print(towns.shape)

(2244, 23)
(358, 41)


In [414]:
data.head()

Unnamed: 0,X,Y,OBJECTID,MSRI_CODE,MSRI_DESCR,DATE_,TOWN_ID,ROUTE_DES,ROUTE,BEGIN_MM,...,TOWN,RT_TOWN,RT_NUM,DAY_,MONTH_,YEAR_,REP_AGEN,LOCATION,YEAR_INT,ll
4,-72.989424,42.833309,5,Deer,Deer,2004-04-16,209.0,VT 8,8,0.35,...,READSBORO,,,,4.0,2004,AOT,,2004,"42.83330874047745,-72.98942388725152"
5,-72.375495,43.581005,6,Deer RK AB,"Deer, RoadKill, Adult Buck",2005-03-22,1409.0,US 5,5,5.0,...,HARTLAND,,,,3.0,2005,AOT,,2005,"43.581005107295645,-72.37549549789222"
6,-72.840257,43.235023,7,Lg Bird,"Large Bird (hawk, owl, turkey, waterfowl)",2004-11-15,1310.0,VT 11,11,0.5,...,LONDONDERRY,,,,11.0,2004,AOT,,2004,"43.23502288333559,-72.8402568531839"
7,-72.830991,44.783002,8,Otter,Otter,2004-10-18,601.0,VT 36,36,1.0,...,BAKERSFIELD,,,,10.0,2004,AOT,,2004,"44.78300220380285,-72.83099079061695"
9,-72.514595,44.242323,10,Beaver,Beaver,2005-04-22,1207.0,US 2,2,0.19,...,EAST MONTPEL,,,,4.0,2005,AOT,,2005,"44.24232301744294,-72.51459452575021"


In [416]:
towns.head()

Unnamed: 0,TOWN_ID,DATE_,apparentTemperatureHigh,apparentTemperatureHighTime,apparentTemperatureLow,apparentTemperatureLowTime,apparentTemperatureMax,apparentTemperatureMaxTime,apparentTemperatureMin,apparentTemperatureMinTime,...,temperatureMin,temperatureMinTime,time,uvIndex,uvIndexTime,visibility,windBearing,windGust,windGustTime,windSpeed
0,107.0,2004-04-13,38.31,1081893600,38.46,1081900800,40.76,1081911600,34.16,1081850400,...,34.16,1081850400,1081828800,4,1081868400,7.6,157.0,21.02,1081879000.0,1.72
1,108.0,2004-09-16,72.36,1095361200,59.42,1095415200,72.36,1095361200,51.32,1095332400,...,51.32,1095332400,1095307200,6,1095350400,8.8,191.0,12.01,1095358000.0,0.8
2,201.0,2004-06-18,79.49,1087596000,65.38,1087646400,79.49,1087596000,64.94,1087552800,...,64.33,1087552800,1087531200,10,1087574400,9.21,158.0,10.99,1087582000.0,1.52
3,202.0,2004-02-06,30.16,1076112000,29.65,1076148000,35.45,1076115600,16.92,1076076000,...,21.72,1076065200,1076043600,2,1076083200,5.7,95.0,24.17,1076094000.0,3.31
4,202.0,2004-06-16,80.19,1087423200,60.23,1087462800,80.19,1087423200,53.6,1087380000,...,53.6,1087380000,1087358400,10,1087401600,8.89,265.0,7.99,1087405000.0,0.6


In [428]:
df_merged = data.merge(towns, how = 'left', on = ['TOWN_ID', 'DATE_'])
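
A left merge keeps every row of `data`, so the resulting shape alone does not show how many crashes actually received weather values. A hedged sketch using the `indicator` option of `merge` is given here; the two small frames are made-up stand-ins for `data` and `towns`, not real rows from this dataset.

```python
import pandas as pd

# Toy frames standing in for `data` and `towns`; the values are made up.
left = pd.DataFrame({"TOWN_ID": [1.0, 2.0, 3.0],
                     "DATE_": pd.to_datetime(["2004-04-13", "2004-09-16", "2004-06-18"])})
right = pd.DataFrame({"TOWN_ID": [1.0, 2.0],
                      "DATE_": pd.to_datetime(["2004-04-13", "2004-09-16"]),
                      "icon": ["rain", "clear-day"]})

# indicator=True adds a `_merge` column: "both" for matched rows, "left_only" otherwise
merged = left.merge(right, how="left", on=["TOWN_ID", "DATE_"], indicator=True)
match_counts = merged["_merge"].value_counts()
print(match_counts)
```

Rows marked `left_only` are crashes for which no weather record was found.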

In [426]:
data.dtypes

X                    float64
Y                    float64
OBJECTID               int64
MSRI_CODE             object
MSRI_DESCR            object
DATE_         datetime64[ns]
TOWN_ID              float64
ROUTE_DES             object
ROUTE                 object
BEGIN_MM             float64
END_MM               float64
LRS_CODE              object
COMMENT               object
TOWN                  object
RT_TOWN              float64
RT_NUM               float64
DAY_                 float64
MONTH_               float64
YEAR_                  int64
REP_AGEN              object
LOCATION              object
YEAR_INT               int64
ll                    object
dtype: object

In [425]:
towns.dtypes

TOWN_ID                               float64
DATE_                          datetime64[ns]
apparentTemperatureHigh               float64
apparentTemperatureHighTime             int64
apparentTemperatureLow                float64
apparentTemperatureLowTime              int64
apparentTemperatureMax                float64
apparentTemperatureMaxTime              int64
apparentTemperatureMin                float64
apparentTemperatureMinTime              int64
cloudCover                            float64
dewPoint                              float64
humidity                              float64
icon                                   object
moonPhase                             float64
precipAccumulation                    float64
precipIntensity                       float64
precipIntensityMax                    float64
precipIntensityMaxTime                float64
precipProbability                     float64
precipType                             object
pressure                          

In [429]:
df_merged.shape

(2244, 62)

In [432]:
data.count()

X             2244
Y             2244
OBJECTID      2244
MSRI_CODE     2244
MSRI_DESCR     419
DATE_         2205
TOWN_ID        419
ROUTE_DES      419
ROUTE          419
BEGIN_MM      2244
END_MM        2244
LRS_CODE       419
COMMENT        304
TOWN          2244
RT_TOWN          0
RT_NUM           0
DAY_          1825
MONTH_        2205
YEAR_         2244
REP_AGEN      2244
LOCATION      1821
YEAR_INT      2244
ll            2244
dtype: int64

In [431]:
df_merged

Unnamed: 0,X,Y,OBJECTID,MSRI_CODE,MSRI_DESCR,DATE_,TOWN_ID,ROUTE_DES,ROUTE,BEGIN_MM,...,temperatureMin,temperatureMinTime,time,uvIndex,uvIndexTime,visibility,windBearing,windGust,windGustTime,windSpeed
0,-72.989424,42.833309,5,Deer,Deer,2004-04-16,209.0,VT 8,008,0.35,...,25.20,1.082110e+09,1.082088e+09,8.0,1.082131e+09,10.00,308.0,,,3.58
1,-72.375495,43.581005,6,Deer RK AB,"Deer, RoadKill, Adult Buck",2005-03-22,1409.0,US 5,005,5.00,...,19.79,1.111489e+09,1.111468e+09,6.0,1.111511e+09,10.00,267.0,17.25,1.111511e+09,1.68
2,-72.840257,43.235023,7,Lg Bird,"Large Bird (hawk, owl, turkey, waterfowl)",2004-11-15,1310.0,VT 11,011,0.50,...,20.32,1.100520e+09,1.100495e+09,2.0,1.100531e+09,10.00,0.0,14.19,1.100549e+09,0.59
3,-72.830991,44.783002,8,Otter,Otter,2004-10-18,601.0,VT 36,036,1.00,...,33.25,1.098155e+09,1.098072e+09,2.0,1.098112e+09,9.97,276.0,17.56,1.098079e+09,4.74
4,-72.514595,44.242323,10,Beaver,Beaver,2005-04-22,1207.0,US 2,002,0.19,...,25.95,1.114164e+09,1.114142e+09,8.0,1.114186e+09,10.00,144.0,12.17,1.114186e+09,1.31
5,-73.195801,43.433986,12,Lg Bird,"Large Bird (hawk, owl, turkey, waterfowl)",2005-04-06,1126.0,VT 30,030,2.80,...,37.96,1.112782e+09,1.112760e+09,7.0,1.112807e+09,9.94,58.0,10.01,1.112764e+09,0.33
6,-72.904462,44.807817,13,Otter,Otter,2004-10-04,605.0,VT 36,036,6.60,...,43.57,1.096945e+09,1.096862e+09,4.0,1.096906e+09,10.00,207.0,25.31,1.096906e+09,3.16
7,-72.393480,43.462770,17,Other,Other,2005-12-20,1423.0,US 5,005,2.40,...,15.65,1.135066e+09,1.135055e+09,1.0,1.135091e+09,9.71,242.0,23.01,1.135102e+09,4.70
8,-72.842383,44.785350,19,Other,Other,2005-06-09,601.0,VT 36,036,0.40,...,52.34,1.118304e+09,1.118290e+09,9.0,1.118333e+09,5.25,187.0,13.93,1.118354e+09,1.19
9,-72.515176,44.172910,21,Deer RK AD,"Deer, RoadKill, Adult Doe",2004-06-14,1202.0,VT 63,063,2.50,...,59.50,1.087207e+09,1.087186e+09,9.0,1.087236e+09,10.00,187.0,16.60,1.087240e+09,2.81
