# The best place to rent

Hill: A person visiting Manhattan can find the best neighborhood for their needs without having to do research.

Using Foursquare and travel-time data, this project groups neighborhoods based on the venues in each neighborhood and the venues in neighborhoods that are close in travel time.

## Configure the notebook

- Load file with secrets
- Set some constants
- Load libraries

The API secrets are not pushed to GitHub. They are kept in a file named `secrets.py` that is excluded from version control.
Since the file is not in the same directory as the notebooks, extra code is needed to add its folder to the `sys` search path.

The secrets file contains two variables used for the Foursquare API: `CLIENT_ID` and `CLIENT_SECRET`.
It also contains two variables used for the travel time API: `APP_ID` and `API_KEY`.

In [1]:
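import os
import sys

# The project root is the parent of the notebook's working directory
project_folder_path = os.path.dirname(os.getcwd())

# Put the project root first on the search path so that the local secrets.py
# is found before the standard library module with the same name
sys.path.insert(0, project_folder_path)

import secrets
print('secrets.py imported')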

secrets.py imported
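
Because the file is named `secrets.py`, it shares its name with the `secrets` module in the Python standard library, so the import above depends on the project folder being first on the search path. One possible alternative is to load the file by its path under an explicit module name. The sketch below is only an illustration: the helper `load_module_from_path`, the module name `api_secrets`, and the demo values are assumptions and not part of the project.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load the Python file at file_path as a module called module_name."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway file that stands in for the real secrets file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'secrets.py')
    with open(demo_path, 'w') as demo_file:
        demo_file.write("CLIENT_ID = 'demo-id'\nCLIENT_SECRET = 'demo-secret'\n")

    # The explicit name avoids any clash with the standard library module
    api_secrets = load_module_from_path('api_secrets', demo_path)

print(api_secrets.CLIENT_ID)
```

With this approach the rest of the notebook would refer to `api_secrets.CLIENT_ID` instead of `secrets.CLIENT_ID`.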


The data path contains data downloaded from the web. The sandbox accounts used to load the data limit the number of requests, so storing the results makes it possible to restart the kernel without making new calls to the API.

In [2]:
DATA_PATH = project_folder_path + '/data/external/'
print('Data path is :{}'.format(DATA_PATH))

Data path is :/Users/danielhaugstvedt/Developer/coursera_capstone/data/external/
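
The load-from-file-or-fetch pattern is repeated for each data source later in the notebook. A minimal sketch of how it could be wrapped in one helper is shown here. The function `load_or_fetch_json` and the stand-in `fake_fetch` are hypothetical names used for illustration; the notebook itself writes the pattern out in each cell.

```python
import json
import os
import tempfile


def load_or_fetch_json(file_path, fetch):
    """Return JSON cached at file_path, or call fetch() and cache its result."""
    try:
        with open(file_path, 'r') as infile:
            return json.load(infile)
    except FileNotFoundError:
        data = fetch()
        with open(file_path, 'w') as outfile:
            json.dump(data, outfile)
        return data


calls = []  # records how often the (fake) API is hit


def fake_fetch():
    """Stand-in for a real API call, so the sketch runs offline."""
    calls.append(1)
    return {'totalFeatures': 306}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, 'demo.json')
    first = load_or_fetch_json(cache_file, fake_fetch)   # fetches and writes the file
    second = load_or_fetch_json(cache_file, fake_fetch)  # reads the cached file

print('API calls made: {}'.format(len(calls)))
```

The second call is served from disk, which is the behavior that keeps the request count within the sandbox limits.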


Import the standard libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
import json
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

import pickle # needed to store variables

print('Libraries imported.')

Libraries imported.


## Get neighborhoods in New York which are not in Manhattan 

The dataset with neighborhoods for New York is freely available on the [web](https://geo.nyu.edu/catalog/nyu_2451_34572).

The download button on that page is defined by the following element, which shows the path to request for a GeoJSON file:
```
<a class="btn btn-primary btn-block download download-generated" 
   data-download-path="/download/nyu-2451-34572?type=geojson" 
    data-download="trigger" 
    data-download-type="geojson" 
    data-download-id="nyu-2451-34572" href="">Download
</a>
```

Using this information, I ran a request to the URI `https://geo.nyu.edu/download/nyu-2451-34572?type=geojson`. The response was:

`[['success',
  '<a data-download="trigger" data-download-id="nyu-2451-34572" data-download-type="generated-geojson" href="/download/file/nyu-2451-34572-geojson.json">Your file nyu-2451-34572-geojson.json is ready for download</a>']]`
  
I used this information to run the second request:

In [4]:
file_name = 'nyu-2451-34572-geojson.json'
URI_NY = 'https://geo.nyu.edu/download/file/nyu-2451-34572-geojson.json'

try: 
    print('Trying to open file: {}'.format(DATA_PATH + file_name))
    with open(DATA_PATH + file_name, 'r') as infile:
        ny_json = json.load(infile)
except FileNotFoundError:
    print('Failed to load json from local file, loading from web')
    response_ny = requests.get(URI_NY)
    print(response_ny.status_code, response_ny.reason)
    
    ny_json = response_ny.json()
    with open(DATA_PATH + file_name, 'w') as outfile:
        json.dump(ny_json, outfile)

print('Total number of features in geojson: {}'.format(ny_json['totalFeatures']))

Trying to open file: /Users/danielhaugstvedt/Developer/coursera_capstone/data/external/nyu-2451-34572-geojson.json
Total number of features in geojson: 306
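
The URI for the second request was copied by hand from the response to the first request. As a sketch of how that step could be automated, the code below pulls the `href` out of the returned anchor element and joins it with the host. The class `HrefExtractor` is a hypothetical helper, and `first_response` is the response quoted earlier, pasted in as a literal so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for key, value in attrs:
                if key == 'href' and value:
                    self.hrefs.append(value)


# Response of the first request, as quoted in the text
first_response = [['success',
  '<a data-download="trigger" data-download-id="nyu-2451-34572" '
  'data-download-type="generated-geojson" '
  'href="/download/file/nyu-2451-34572-geojson.json">'
  'Your file nyu-2451-34572-geojson.json is ready for download</a>']]

status, anchor_html = first_response[0]

parser = HrefExtractor()
parser.feed(anchor_html)

# The href is relative, so join it with the host used for the first request
download_uri = urljoin('https://geo.nyu.edu', parser.hrefs[0])
print(status, download_uri)
```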


Turn the JSON file into a data frame

In [5]:
neighborhoods_data = ny_json['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood; GeoJSON coordinates are ordered [longitude, latitude]
rows = []
for data in neighborhoods_data:
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]
    rows.append({'Borough': data['properties']['borough'],
                 'Neighborhood': data['properties']['name'],
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods_ny = pd.DataFrame(rows, columns=column_names)

In [6]:
neighborhoods_ny.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [7]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods_ny['Borough'].unique()),
        neighborhoods_ny.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


Split the data into Manhattan and not Manhattan

In [8]:
neighborhoods_not_manhattan = (neighborhoods_ny.loc[neighborhoods_ny['Borough'] != 'Manhattan', 
                                     ['Neighborhood', 'Latitude', 'Longitude']]
                                 .reset_index(drop=True))
neighborhoods_manhattan = (neighborhoods_ny.loc[neighborhoods_ny['Borough'] == 'Manhattan', 
                                     ['Neighborhood', 'Latitude', 'Longitude']]
                                 .reset_index(drop=True))

## Get the Foursquare data for the neighborhoods outside Manhattan

Using the Foursquare API, get the venues near each New York neighborhood outside Manhattan

In [9]:
neighborhood_latitude = neighborhoods_not_manhattan['Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods_not_manhattan['Longitude'] # neighborhood longitude value
neighborhood_name = neighborhoods_not_manhattan['Neighborhood'] # neighborhood name

Set some global variables to use in the Foursquare API


In [10]:
# We want the actual number of venues, not a capped number
# Therefore we set the limit so high that no venues should ever be dropped (this is tested)
LIMIT = 500 
VERSION = '20180605' 

Define a function for getting nearby venues

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            secrets.CLIENT_ID, 
            secrets.CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        results_object = requests.get(url)
        if results_object.status_code != 200:
            print('A request failed with status code: {} and reason: {}'.format(
                    results_object.status_code, 
                    results_object.reason))
            break
        
        results = results_object.json()["response"]['groups'][0]['items']
        print('{} has {} venues'.format(name, len(results)))
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    print('Making a data frame and returning it')
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
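
The list comprehension in `getNearbyVenues` reads `v['venue']['categories'][0]`, which assumes that every venue has at least one category. A more defensive sketch of the same flattening step is shown below. The helper `venue_rows` and the second sample venue are hypothetical; the sample items only mimic the fields that the function above reads and are not real API output.

```python
def venue_rows(name, lat, lng, items):
    """Flatten the items of one neighborhood into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written sample in the shape the function above expects
sample_items = [
    {'venue': {'name': 'Lollipops Gelato',
               'location': {'lat': 40.894123, 'lng': -73.845892},
               'categories': [{'name': 'Dessert Shop'}]}},
    {'venue': {'name': 'Venue without category',  # hypothetical edge case
               'location': {'lat': 40.8945, 'lng': -73.8460},
               'categories': []}},
]

rows = venue_rows('Wakefield', 40.894705, -73.847201, sample_items)
for row in rows:
    print(row)
```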

Check for a file with the data. If it does not exist, call the function for getting nearby venues.

The distance used is calculated like this
- Average walking speed 5 km/h
- Converted to m/min this is 83.33
- A 5 min walk is **416 m**
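
The arithmetic in the list above can be checked with a few lines of code. The constant names are chosen for this illustration only, and the result is rounded down to whole meters to match the value used in the notebook.

```python
WALKING_SPEED_KM_PER_H = 5  # average walking speed from the list above
WALK_MINUTES = 5            # length of the walk in minutes

# 5 km/h = 5000 m per 60 min
meters_per_minute = WALKING_SPEED_KM_PER_H * 1000 / 60

# Round down to whole meters
walk_radius_m = int(meters_per_minute * WALK_MINUTES)

print(round(meters_per_minute, 2), walk_radius_m)
```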

In [13]:
radius = 416 # The Foursquare radius parameter uses meters
file_name = 'ny_venues.json' # note: the content is a pickle, despite the .json extension
try:
    print('Trying to open file: {}'.format(DATA_PATH + file_name))
    with open(DATA_PATH + file_name, 'rb') as infile:
        ny_venues = pickle.load(infile)
        print('Success loading from file')
except FileNotFoundError:
    print('Failed to load json from local file, trying to load from web')
    ny_venues = getNearbyVenues(neighborhood_name, neighborhood_latitude, neighborhood_longitude, radius)
    with open(DATA_PATH + file_name, 'wb') as outfile:
        pickle.dump(ny_venues, outfile)

Trying to open file: /Users/danielhaugstvedt/Developer/coursera_capstone/data/external/ny_venues.json
Failed to load json from local file, trying to load from web
Wakefield has 3 venues
Co-op City has 12 venues
Eastchester has 19 venues
Fieldston has 3 venues
Riverdale has 6 venues
Kingsbridge has 49 venues
Woodlawn has 15 venues
Norwood has 23 venues
Williamsbridge has 5 venues
Baychester has 9 venues
Pelham Parkway has 18 venues
City Island has 23 venues
Bedford Park has 31 venues
University Heights has 19 venues
Morris Heights has 9 venues
Fordham has 65 venues
East Tremont has 14 venues
West Farms has 14 venues
High  Bridge has 20 venues
Melrose has 17 venues
Mott Haven has 24 venues
Port Morris has 13 venues
Longwood has 7 venues
Hunts Point has 9 venues
Morrisania has 18 venues
Soundview has 13 venues
Clason Point has 7 venues
Throgs Neck has 2 venues
Country Club has 5 venues
Parkchester has 22 venues
Westchester Square has 30 venues
Van Nest has 21 venues
Morris Park has 15 ven

In [20]:
print(ny_venues.shape)
ny_venues.head()

(4236, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896521 | -73.84468 | Pharmacy |
| 2 | Wakefield | 40.894705 | -73.847201 | Pitman Deli | 40.894149 | -73.845748 | Food |
| 3 | Co-op City | 40.874294 | -73.829939 | Capri II Pizza | 40.876374 | -73.82994 | Pizza Place |
| 4 | Co-op City | 40.874294 | -73.829939 | Sleepy's Co-Op City | 40.872234 | -73.828607 | Mattress Store |


## Get the travel time between the different neighborhoods

Using the traveltime API, get the travel time between the different neighborhoods

**Post request example**

```
POST /v4/time-filter HTTP/1.1
Host: api.traveltimeapp.com
Content-Type: application/json
Accept: application/json
X-Application-Id: APP_ID
X-Api-Key: API_KEY
```

Convert the information in the post request example to URI and header

In [15]:
URI_traveltime = 'https://api.traveltimeapp.com/v4/time-filter'
headers = {'Host': 'api.traveltimeapp.com',
           'Content-Type': 'application/json', 
           'Accept': 'application/json', 
           'X-Application-Id': secrets.APP_ID, # APP ID is in separate file not under version control
           'X-Api-Key': secrets.API_KEY} # API_KEY is in separate file not under version control

**JSON data as raw string**

The JSON request is long; double-click to view.

In [16]:
raw_json_text="""{
  "locations": [
    {
      "id": "London center",
      "coords": {
        "lat": 51.508930,
        "lng": -0.131387
      }
    },
    {
      "id": "Hyde Park",
      "coords": {
        "lat": 51.508824,
        "lng": -0.167093
      }
    },
    {
      "id": "ZSL London Zoo",
      "coords": {
        "lat": 51.536067,
        "lng": -0.153596
      }
    }
  ],
  "departure_searches": [
    {
      "id": "forward search example",
      "departure_location_id": "London center",
      "arrival_location_ids": [
        "Hyde Park",
        "ZSL London Zoo"
      ],
      "transportation": {
        "type": "bus"
      },
      "departure_time": "2019-01-11T08:00:00Z",
      "travel_time": 1800,
      "properties": [
        "travel_time"
      ],
      "range": {
        "enabled": true,
        "max_results": 3,
        "width": 600
      }
    }
  ],
  "arrival_searches": [
    {
      "id": "backward search example",
      "departure_location_ids": [
        "Hyde Park",
        "ZSL London Zoo"
      ],
      "arrival_location_id": "London center",
      "transportation": {
        "type": "public_transport"
      },
      "arrival_time": "2019-01-11T08:00:00Z",
      "travel_time": 1900,
      "properties": [
        "travel_time",
        "distance",
        "distance_breakdown",
        "fares"
      ]
    }
  ]
}"""

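# json.loads accepts whitespace, so this cleaning step is optional.
# Note that replace(" ", "") also strips the spaces inside string values,
# so ids such as "London center" are sent as "Londoncenter".
cleaned_json_text = (raw_json_text.replace('\n','')
                                  .replace(" ", ""))
data = json.loads(cleaned_json_text)
data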

The cell above parses the raw JSON string into a Python dictionary.

Make the POST request and check that it is OK

In [18]:
r = requests.post(URI_traveltime, headers=headers, json=data)

print(r.status_code, r.reason)

200 OK
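
The request above uses the London example. For the neighborhoods in this project, the same body could be assembled from names and coordinates instead of being written by hand. The sketch below is one possible way to do that, using only the fields that appear in the example request. The function `build_time_filter_body`, its parameters, and the search id format are assumptions for illustration; the three sample rows are taken from the neighborhood table shown earlier.

```python
def build_time_filter_body(locations, origin_id, departure_time,
                           max_travel_time_s=1800,
                           transport_type='public_transport'):
    """Build a request body with one departure search from origin_id to all other locations.

    locations is an iterable of (name, latitude, longitude) tuples.
    """
    locations = list(locations)
    location_items = [{'id': name, 'coords': {'lat': lat, 'lng': lng}}
                      for name, lat, lng in locations]
    arrival_ids = [name for name, _, _ in locations if name != origin_id]
    return {
        'locations': location_items,
        'departure_searches': [{
            'id': 'from {}'.format(origin_id),  # hypothetical id format
            'departure_location_id': origin_id,
            'arrival_location_ids': arrival_ids,
            'transportation': {'type': transport_type},
            'departure_time': departure_time,
            'travel_time': max_travel_time_s,
            'properties': ['travel_time'],
        }],
    }


# First rows of the neighborhood table
sample_locations = [
    ('Wakefield', 40.894705, -73.847201),
    ('Co-op City', 40.874294, -73.829939),
    ('Eastchester', 40.887556, -73.827806),
]

body = build_time_filter_body(sample_locations, 'Wakefield', '2019-01-11T08:00:00Z')
print(body['departure_searches'][0]['arrival_location_ids'])
```

A dictionary built this way could be passed to `requests.post` through the `json` argument, as in the cell above. Any limits on the number of locations per request are not covered here.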


In [19]:
r.json()

{'results': [{'search_id': 'backwardsearchexample',
   'locations': [{'id': 'HydePark',
     'properties': [{'travel_time': 1892,
       'distance': 0,
       'distance_breakdown': [{'mode': 'bus', 'distance': 2879},
        {'mode': 'walk', 'distance': 826}],
       'fares': {'breakdown': [{'modes': ['bus'],
          'route_part_ids': [6],
          'tickets': [{'type': 'single', 'price': 1.5, 'currency': 'GBP'}]},
         {'modes': ['bus'],
          'route_part_ids': [7],
          'tickets': [{'type': 'single', 'price': 1.5, 'currency': 'GBP'}]},
         {'modes': ['bus'],
          'route_part_ids': [5],
          'tickets': [{'type': 'single', 'price': 1.5, 'currency': 'GBP'}]},
         {'modes': ['bus'],
          'route_part_ids': [5, 7, 6],
          'tickets': [{'type': 'week', 'price': 21, 'currency': 'GBP'},
           {'type': 'month', 'price': 80.7, 'currency': 'GBP'},
           {'type': 'year', 'price': 840, 'currency': 'GBP'}]}],
        'tickets_total': [{'type': 
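
The nested response is easier to work with once the travel times are pulled into flat rows. The sketch below assumes only the keys visible in the output above (`results`, `search_id`, `locations`, `id`, `properties`, `travel_time`). The helper `travel_time_rows` is a hypothetical name, and `sample_response` is a shortened copy of that output.

```python
def travel_time_rows(response_json):
    """Return (search_id, location_id, travel_time) tuples from a time-filter response."""
    rows = []
    for result in response_json['results']:
        for location in result['locations']:
            for properties in location['properties']:
                rows.append((result['search_id'],
                             location['id'],
                             properties['travel_time']))
    return rows


# Shortened copy of the response printed above
sample_response = {
    'results': [
        {'search_id': 'backwardsearchexample',
         'locations': [
             {'id': 'HydePark',
              'properties': [{'travel_time': 1892, 'distance': 0}]},
         ]},
    ]
}

rows = travel_time_rows(sample_response)
print(rows)
```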

## Visualize the map 

Use the geocoding from the travel time API to get the longitude and latitude of Manhattan.

Use the longitude and latitude to make a map of Manhattan.

Add a label for every neighborhood to the Manhattan map.

In [None]:
query = 'Central Park, New York'
URI_traveltime = 'https://api.traveltimeapp.com/v4/geocoding/search?query={}'.format(query)
print('URI is : {}'.format(URI_traveltime))
headers = {'Host': 'api.traveltimeapp.com',
           'Accept': 'application/json', 
           'X-Application-Id': secrets.APP_ID, # APP ID is in separate file not under version control
           'X-Api-Key': secrets.API_KEY} # API_KEY is in separate file not under version control

In [None]:
r = requests.get(URI_traveltime, headers=headers)

print(r.status_code, r.reason)
r.url

In [None]:
r.json()

In [None]:
manhattan_location_data = r.json()['features'][0]
[longitude, latitude] = manhattan_location_data['geometry']['coordinates']
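
GeoJSON stores point coordinates as `[longitude, latitude]`, while `folium.Map` expects `[latitude, longitude]`, which is why the two values are unpacked in that order above. A small sketch of the swap is shown below. The helper `folium_location` is hypothetical, and `sample_feature` is a hand-written feature with approximate coordinates for Central Park, not the actual API response.

```python
def folium_location(feature):
    """Convert a GeoJSON point feature to the [latitude, longitude] order folium uses."""
    longitude, latitude = feature['geometry']['coordinates'][:2]
    return [latitude, longitude]


# Hand-written GeoJSON point with approximate coordinates (illustration only)
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point', 'coordinates': [-73.9654, 40.7829]},
}

location = folium_location(sample_feature)
print(location)
```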

In [None]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(neighborhoods_manhattan['Latitude'], neighborhoods_manhattan['Longitude'], neighborhoods_manhattan['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_manhattan)  
map_manhattan