## Configure the notebook

The configuration is repeated in this notebook so that it can also run on its own, without being called from the main
notebook.

- Load file with secrets
- Set some constants
- Load libraries

- Load variables from the main notebook from file; this is not needed if the notebook is run from inside the main notebook

The API secrets are not pushed to GitHub. They are kept in a file named secrets.py that is excluded from version control.
Since the file is not in the same directory as the notebooks, extra code is needed to add its folder to the sys.path search path.

The secrets file contains two variables used by the Foursquare API: CLIENT_ID and CLIENT_SECRET.
It also contains two variables used by the travel time API: APP_ID and API_KEY.

In [1]:
import os
project_folder_path = os.path.dirname(os.getcwd())
project_folder_path
import sys
sys.path.insert(0, project_folder_path)

import secrets
print('secrets.py imported')

secrets.py imported
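Python 3.6 and later ship a standard-library module that is also called `secrets`, so `import secrets` only picks up the project file because its folder sits at the front of `sys.path`. As a minimal sketch of an alternative, the file can be loaded by explicit path with `importlib`. The helper name `load_module_from_path`, the module name `project_secrets`, and the dummy values are assumptions made for illustration; they are not part of the original notebook.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a Python file as a module without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway file holding dummy values; the real secrets.py
# would live in the project folder and hold the real keys.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'secrets.py')
    with open(demo_path, 'w') as handle:
        handle.write("APP_ID = 'dummy-app-id'\nAPI_KEY = 'dummy-api-key'\n")
    project_secrets = load_module_from_path('project_secrets', demo_path)

print(project_secrets.APP_ID)
```

Loading under a different module name avoids any clash with the standard-library module, at the cost of a few extra lines.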


The data path contains data loaded from the web. The sandbox accounts used to load the data limit the number of requests. Storing the results makes it possible to restart the kernel without making new calls to the API.

In [2]:
DATA_PATH = project_folder_path + '/data/external/'
print('Data path is :{}'.format(DATA_PATH))

Data path is :/Users/danielhaugstvedt/Developer/coursera_capstone/data/external/


Import the standard libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
import json
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

import folium # map rendering library

import pickle # needed to store variables

import time # Try to not overload the traveltime API

print('Libraries imported.')

Libraries imported.


In [4]:
file_name = 'neighborhoods_final.p'
with open(DATA_PATH + file_name, 'rb') as infile:
    neighborhoods_final = pickle.load(infile)

In [5]:
file_name = 'ny_venues_final.p'
with open(DATA_PATH + file_name, 'rb') as infile:
    ny_venues_final = pickle.load(infile)

## Get the travel time between the different neighborhoods

Use the traveltime API to get the travel time between the different neighborhoods.

**Post request example**

```
POST /v4/time-filter HTTP/1.1
Host: api.traveltimeapp.com
Content-Type: application/json
Accept: application/json
X-Application-Id: APP_ID
X-Api-Key: API_KEY
```

Convert the information in the POST request example into a URI and a header dictionary

In [6]:
URI_traveltime = 'https://api.traveltimeapp.com/v4/time-filter'
headers = {'Host': 'api.traveltimeapp.com',
           'Content-Type': 'application/json', 
           'Accept': 'application/json', 
           'X-Application-Id': secrets.APP_ID, # APP ID is in separate file not under version control
           'X-Api-Key': secrets.API_KEY} # API_KEY is in separate file not under version control
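To make the conversion from the raw request example explicit, here is a small sketch that parses the example text into a method, a URL, and a header dictionary. The function name `parse_request_example` and the choice of `https` as scheme are assumptions for illustration; the notebook itself writes the URI and headers by hand.

```python
RAW_EXAMPLE = """
POST /v4/time-filter HTTP/1.1
Host: api.traveltimeapp.com
Content-Type: application/json
Accept: application/json
X-Application-Id: APP_ID
X-Api-Key: API_KEY
"""


def parse_request_example(raw, scheme='https'):
    """Split a raw HTTP request example into method, URL and headers."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    # First line is the request line: method, path, protocol version
    method, path, _version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    # The Host header plus the path give the full URL
    url = '{}://{}{}'.format(scheme, headers['Host'], path)
    return method, url, headers


method, url, example_headers = parse_request_example(RAW_EXAMPLE)
print(method, url)
```

In the real header dictionary the placeholders APP_ID and API_KEY are replaced by the values from the secrets file, as in the cell above.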

Function that adds locations to the JSON request

In [7]:
def add_locations(locations, latitudes, longitudes):
    
    json_request = {'locations': [], 'departure_searches':[], 'arrival_searches':[]}
    
    for loc, lat, long in zip(locations, latitudes, longitudes):
        json_request['locations'].append({
            'id': loc,
            'coords': {
                'lat': lat, 
                'lng': long
            }
        })
    
    return json_request

Function that adds a search to the JSON request

In [8]:
def add_search(json_request, from_location, travel_time, departure_time):
    
    arrival_locations = []
    for location_dic in json_request['locations']:
        arrival_locations.append(location_dic['id'])
        
    del arrival_locations[arrival_locations.index(from_location)]
    
    json_request['departure_searches'].append({
        'id': 'from {}'.format(from_location),
        'departure_location_id': from_location,
        'arrival_location_ids': arrival_locations,
        'transportation': {'type': 'public_transport'}, 
        'departure_time' : departure_time,
        'travel_time': travel_time,
        'range': {
                'enabled': True,
                'max_results': 3,
                'width': 600
            },
        'properties': ['travel_time']
    })
    
    return json_request
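To show what the two functions produce together, the sketch below builds a request for three made-up locations and prints it. The functions are condensed copies of the ones above so that the block runs on its own; the location names and coordinates are hypothetical and chosen only for illustration.

```python
import json


def add_locations(locations, latitudes, longitudes):
    # Condensed copy of the notebook function
    return {
        'locations': [{'id': loc, 'coords': {'lat': lat, 'lng': lng}}
                      for loc, lat, lng in zip(locations, latitudes, longitudes)],
        'departure_searches': [],
        'arrival_searches': [],
    }


def add_search(json_request, from_location, travel_time, departure_time):
    # Condensed copy: every location except the origin is a destination
    arrival_ids = [loc['id'] for loc in json_request['locations']
                   if loc['id'] != from_location]
    json_request['departure_searches'].append({
        'id': 'from {}'.format(from_location),
        'departure_location_id': from_location,
        'arrival_location_ids': arrival_ids,
        'transportation': {'type': 'public_transport'},
        'departure_time': departure_time,
        'travel_time': travel_time,
        'range': {'enabled': True, 'max_results': 3, 'width': 600},
        'properties': ['travel_time'],
    })
    return json_request


# Hypothetical locations, not taken from the neighborhood data
payload = add_locations(['A', 'B', 'C'], [40.70, 40.75, 40.80], [-74.00, -73.95, -73.90])
payload = add_search(payload, 'A', 60 * 60, '2019-01-11T13:00:00Z')
print(json.dumps(payload, indent=2))
```

The origin 'A' appears in `locations` but not in `arrival_location_ids`, which is what the `del` statement in the notebook version achieves.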

Make three vectors: location names, latitudes and longitudes

- The API allows at most 2000 arrival locations
- The API allows at most 10 searches

In [9]:
locations = neighborhoods_final.loc[:, 'Neighborhood']
latitudes = neighborhoods_final.loc[:, 'Latitude']
longitudes = neighborhoods_final.loc[:, 'Longitude']
print('We need less than 2000 locations. The number we are using is {}'.format(len(locations)))

We need less than 2000 locations. The number we are using is 262
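The notebook sends one search per request. If the limit of 10 searches stated above applies per request, origins could in principle be grouped into batches of ten. The sketch below only shows the grouping step; the helper `chunked`, the constant name, and the placeholder origin names are assumptions, and whether batching fits the sandbox quota has not been tested here.

```python
def chunked(items, size):
    """Split a sequence into consecutive lists of at most `size` items."""
    items = list(items)
    return [items[start:start + size] for start in range(0, len(items), size)]


MAX_SEARCHES = 10  # limit as stated in this notebook

# Placeholder names standing in for the 262 neighborhoods
origins = ['neighborhood_{}'.format(i) for i in range(262)]
batches = chunked(origins, MAX_SEARCHES)
print(len(batches), len(batches[-1]))
```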


Check whether the results are already stored on disk; if not, load the data from the API.

When using the API, loop through all locations and send one POST request per location, 
with that location as the origin of the search.

In [10]:
locations_subset = locations[0:150]
file_name = 'traveltime_0_149.p'

try:
    print('Trying to open file: {}'.format(DATA_PATH + file_name))
    with open(DATA_PATH + file_name, 'rb') as infile:
        results = pickle.load(infile)
        print('Success loading from file')
except FileNotFoundError:
    print('Failed to load from local file, trying to load from web')
    results = []
    for location in locations_subset:
        json_request_setup = add_locations(locations, latitudes, longitudes)
        json_request = add_search(json_request_setup, location, 60*60*1.5, '2019-01-11T13:00:00Z')
        
        # Only use the web fallback if you are really sure you want to; the request is
        # therefore commented out. While it stays commented out, `result` is undefined
        # and the print below raises a NameError.
        #result = requests.post(URI_traveltime, headers=headers, json=json_request)
        print('{} request returned with status code: {} and reason: {}'.format(location, result.status_code, result.reason))
        if result.status_code != 200:
            break # stop at the first failed request; the partial results are still saved below
            
        json_result = result.json()
        results.append(json_result['results'])
        #time.sleep(60) # sleep 60 s between requests when doing many queries, so the API is not overloaded
        
    with open(DATA_PATH + file_name, 'wb') as outfile:
        pickle.dump(results, outfile)
results_0_149 = results
print(len(locations_subset))
print(len(results_0_149))

Trying to open file: /Users/danielhaugstvedt/Developer/coursera_capstone/data/external/traveltime_0_149.p
Success loading from file
150
150


In [11]:
locations_subset = locations[150:193]
file_name = 'traveltime_150_192.p'

try:
    print('Trying to open file: {}'.format(DATA_PATH + file_name))
    with open(DATA_PATH + file_name, 'rb') as infile:
        results = pickle.load(infile)
        print('Success loading from file')
except FileNotFoundError:
    print('Failed to load from local file, trying to load from web')
    results = []
    for location in locations_subset:
        json_request_setup = add_locations(locations, latitudes, longitudes)
        json_request = add_search(json_request_setup, location, 60*60*1.5, '2019-01-11T13:00:00Z')
        
        # Only use the web fallback if you are really sure you want to; the request is
        # therefore commented out. While it stays commented out, `result` is undefined
        # and the print below raises a NameError.
        #result = requests.post(URI_traveltime, headers=headers, json=json_request)
        print('{} request returned with status code: {} and reason: {}'.format(location, result.status_code, result.reason))
        if result.status_code != 200:
            break # stop at the first failed request; the partial results are still saved below
            
        json_result = result.json()
        results.append(json_result['results'])
        #time.sleep(60) # sleep 60 s between requests when doing many queries, so the API is not overloaded
        
    with open(DATA_PATH + file_name, 'wb') as outfile:
        pickle.dump(results, outfile)
        
results_150_192 = results
print(len(locations_subset))
print(len(results_150_192))

Trying to open file: /Users/danielhaugstvedt/Developer/coursera_capstone/data/external/traveltime_150_192.p
Success loading from file
43
43


In [12]:
locations_subset = locations[193:261]
file_name = 'traveltime_193_260.p'

try:
    print('Trying to open file: {}'.format(DATA_PATH + file_name))
    with open(DATA_PATH + file_name, 'rb') as infile:
        results = pickle.load(infile)
        print('Success loading from file')
except FileNotFoundError:
    print('Failed to load from local file, trying to load from web')
    results = []
    for location in locations_subset:
        json_request_setup = add_locations(locations, latitudes, longitudes)
        json_request = add_search(json_request_setup, location, 60*60*1.5, '2019-01-11T13:00:00Z')
        
        result = requests.post(URI_traveltime, headers=headers, json=json_request)
        print('{} request returned with status code: {} and reason: {}'.format(location, result.status_code, result.reason))
        if result.status_code != 200:
            break # stop at the first failed request; the partial results are still saved below
            
        json_result = result.json()
        results.append(json_result['results'])
        time.sleep(60) # sleep 60 s between requests when doing many queries, so the API is not overloaded
        
    with open(DATA_PATH + file_name, 'wb') as outfile:
        pickle.dump(results, outfile)
        
results_193_260 = results
print(len(locations_subset))
print(len(results_193_260))

Trying to open file: /Users/danielhaugstvedt/Developer/coursera_capstone/data/external/traveltime_193_260.p
Success loading from file
68
68


In [13]:
locations_subset = locations[261:]
file_name = 'traveltime_261.p'

try:
    print('Trying to open file: {}'.format(DATA_PATH + file_name))
    with open(DATA_PATH + file_name, 'rb') as infile:
        results = pickle.load(infile)
        print('Success loading from file')
except FileNotFoundError:
    print('Failed to load from local file, trying to load from web')
    results = []
    for location in locations_subset:
        json_request_setup = add_locations(locations, latitudes, longitudes)
        json_request = add_search(json_request_setup, location, 60*60*1.5, '2019-01-11T13:00:00Z')
        
        result = requests.post(URI_traveltime, headers=headers, json=json_request)
        print('{} request returned with status code: {} and reason: {}'.format(location, result.status_code, result.reason))
        if result.status_code != 200:
            break # stop at the first failed request; the partial results are still saved below
            
        json_result = result.json()
        results.append(json_result['results'])
        time.sleep(60) # sleep 60 s between requests when doing many queries, so the API is not overloaded
        
    with open(DATA_PATH + file_name, 'wb') as outfile:
        pickle.dump(results, outfile)
        
results_261 = results
print(len(locations_subset))

Trying to open file: /Users/danielhaugstvedt/Developer/coursera_capstone/data/external/traveltime_261.p
Success loading from file
1


In [14]:
len(results_0_149) + len(results_150_192) + len(results_193_260) + len(results_261)

262
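The four cells above share one pattern: try to load a pickle file and, if it does not exist, fetch the data and store it. As a minimal sketch, that pattern could be factored into a helper. The name `load_or_fetch`, the `fetch` callback, and the fake fetch function are assumptions for illustration; the demo writes to a temporary directory and makes no API calls.

```python
import os
import pickle
import tempfile


def load_or_fetch(path, fetch):
    """Return the pickled object at `path`, or call `fetch()` and store its result."""
    try:
        with open(path, 'rb') as infile:
            return pickle.load(infile)
    except FileNotFoundError:
        result = fetch()
        with open(path, 'wb') as outfile:
            pickle.dump(result, outfile)
        return result


calls = []


def fake_fetch():
    # Stand-in for the loop of POST requests; records that it was called
    calls.append(1)
    return [{'search_id': 'from A', 'locations': []}]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, 'traveltime_demo.p')
    first = load_or_fetch(cache_path, fake_fetch)   # file missing: fetches and stores
    second = load_or_fetch(cache_path, fake_fetch)  # file present: loads from disk

print(len(calls))
```

Unlike the cells above, this sketch writes the file only after `fetch()` has returned, so a fetch that raises an exception leaves no partial file behind.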

## Convert the travel time JSON into a data frame

In [15]:
# Data frame of zeros; entries that the API does not return stay at 0
travel_time_df = pd.DataFrame(0.0, index=locations, columns=locations)

# The four result lists have the same layout, so they are handled in one loop
for result in results_0_149 + results_150_192 + results_193_260 + results_261:
    result = result[0] # get the dictionary from the list
    from_location = ' '.join(result['search_id'].split(" ")[1:]) # strip the leading 'from'
    for to_location_dic in result['locations']:
        to_location = to_location_dic['id']
        travel_time = to_location_dic['properties'][0]['travel_time']
        travel_time_df.loc[from_location, to_location] = travel_time / (60*60) # Hours
        

In [16]:
sum(travel_time_df.sum(axis=0)==0)

53
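The count above is the number of columns whose sum is zero, that is, destinations for which no origin returned a travel time. The sketch below reproduces the parsing and the check in plain Python on a tiny made-up response. The sample data is hypothetical and contains only the keys the notebook code reads; reading a missing entry as "not reached within the 1.5 hour limit" is an assumption.

```python
# Hypothetical results in the shape the notebook code expects:
# one list per request, holding one search dictionary
sample_results = [
    [{'search_id': 'from A',
      'locations': [{'id': 'B', 'properties': [{'travel_time': 1800}]}]}],
    [{'search_id': 'from B',
      'locations': [{'id': 'A', 'properties': [{'travel_time': 2700}]}]}],
    [{'search_id': 'from C', 'locations': []}],
]

names = ['A', 'B', 'C']
# hours[origin][destination], initialised to 0 like the data frame
hours = {origin: {dest: 0.0 for dest in names} for origin in names}

for result in sample_results:
    search = result[0]
    origin = search['search_id'][len('from '):]
    for entry in search['locations']:
        hours[origin][entry['id']] = entry['properties'][0]['travel_time'] / 3600

# Destinations whose column sums to zero were never returned by any search
never_reached = [dest for dest in names
                 if sum(hours[origin][dest] for origin in names) == 0]
print(never_reached)
```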

## Clean up values you do not want to pass back to the main notebook

In [17]:
del results_261
del results_193_260
del results_150_192
del results_0_149
del results
del locations_subset
del URI_traveltime
del headers
del to_location      
del to_location_dic
del travel_time 
del from_location