## Cultural Scene Comparison

From a short list of preferred locations, in which city should a small tour-guide company that focuses on museums and cultural arts centers open a new office?

### Table of Contents

1. Get neighborhoods for selected cities and format
1. Access FourSquare's API

---

- Import libraries

In [282]:
import pandas as pd
import numpy as np

import requests
import api_keys

import json
import re

from geopy.geocoders import Nominatim


---

## (1) Get neighborhoods for selected cities and format

---

[A] Preferred cities

In [166]:
preferred_cities = [
    'New York, NY',
    'Toronto, Canada',
    'Paris, France',
    'London, UK'
]

---

[B] Function to retrieve coordinates from given address

In [167]:
def get_coordinates_to_df(lookup_series, city):
    """Geocode '<city> <name>' for each name; return (lat_list, lon_list)."""
    lat_list = []
    lon_list = []

    # create the geocoder once instead of once per lookup
    geolocator = Nominatim(user_agent="mapper")

    for lookup in lookup_series:

        rough_address = city + ' ' + lookup

        # geocode() returns None when nothing matches the query
        location = geolocator.geocode(rough_address)

        if location is None:
            print('NO ADDRESS RETURNED:', rough_address)
            lat_list.append(np.nan)
            lon_list.append(np.nan)
        else:
            lat_list.append(location.latitude)
            lon_list.append(location.longitude)

    print('Returned tuple of latitude and longitude lists in {}'.format(city))

    return lat_list, lon_list
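
The lookup loop can also be written so that the geocoder is passed in as a callable, which makes the "no result" branch easy to exercise without network access. This is only a sketch: `geocode_names` and `fake_geocode` are hypothetical names that do not appear in the notebook, and the stand-in geocoder knows a single address.

```python
import math
from types import SimpleNamespace


def geocode_names(names, city, geocode):
    """Return (lat_list, lon_list) for '<city> <name>' lookups.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude/.longitude, or None when nothing is found.
    """
    lat_list, lon_list = [], []
    for name in names:
        location = geocode(f"{city} {name}")
        if location is None:
            # no match: keep the row aligned by recording NaN for both values
            lat_list.append(math.nan)
            lon_list.append(math.nan)
        else:
            lat_list.append(location.latitude)
            lon_list.append(location.longitude)
    return lat_list, lon_list


# Hypothetical stand-in for a real geocoder, used only to show the control flow.
def fake_geocode(address):
    known = {
        "London Acton": SimpleNamespace(latitude=51.50814, longitude=-0.273261),
    }
    return known.get(address)


lats, lons = geocode_names(["Acton", "Somerstown"], "London", fake_geocode)
print(lats, lons)
```

With geopy, the same function could presumably be called with `Nominatim(user_agent="mapper").geocode`, optionally wrapped in `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1` to space out requests.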

---

[C] Construct dataframes of city neighborhoods and their coordinates

- **New York**; sourced from Coursera

In [148]:
with open('newyork_data.json', 'r') as file:
    new_york_data = json.load(file)

In [236]:
#normalize json into df & drop_duplicates in 'properties.name'
new_york = pd.json_normalize(new_york_data['features']).drop_duplicates(subset='properties.name')

#add lat/lon columns
new_york['Longitude'] = [pair[0] for pair in new_york['geometry.coordinates']]
new_york['Latitude'] = [pair[1] for pair in new_york['geometry.coordinates']]

#keep only names/boroughs/coordinates & rename columns
new_york = new_york[['properties.name', 'properties.borough', 'Latitude', 'Longitude']].reset_index(drop=True)
new_york.columns = ['Neighborhood', 'Borough', 'Latitude', 'Longitude']

In [237]:
new_york.head()

| | Neighborhood | Borough | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Wakefield | Bronx | 40.894705 | -73.847201 |
| 1 | Co-op City | Bronx | 40.874294 | -73.829939 |
| 2 | Eastchester | Bronx | 40.887556 | -73.827806 |
| 3 | Fieldston | Bronx | 40.895437 | -73.905643 |
| 4 | Riverdale | Bronx | 40.890834 | -73.912585 |
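
The flattening step can also be sketched without pandas, which makes the coordinate order explicit. The sample assumes each feature has the shape implied by the column names used above (`properties.name`, `properties.borough`, `geometry.coordinates`) and that coordinates are stored as `[longitude, latitude]`. `sample` is a hand-written stand-in for `newyork_data.json`, built from the first two rows of the output plus one deliberate duplicate.

```python
# Hand-written stand-in for the file contents; values copied from the output above.
sample = {
    "features": [
        {"properties": {"name": "Wakefield", "borough": "Bronx"},
         "geometry": {"coordinates": [-73.847201, 40.894705]}},
        {"properties": {"name": "Co-op City", "borough": "Bronx"},
         "geometry": {"coordinates": [-73.829939, 40.874294]}},
        # duplicate name, skipped below like drop_duplicates(subset='properties.name')
        {"properties": {"name": "Wakefield", "borough": "Bronx"},
         "geometry": {"coordinates": [-73.847201, 40.894705]}},
    ]
}

rows = []
seen = set()
for feature in sample["features"]:
    name = feature["properties"]["name"]
    if name in seen:
        continue  # keep only the first occurrence of each name
    seen.add(name)
    lon, lat = feature["geometry"]["coordinates"]  # longitude first, then latitude
    rows.append({
        "Neighborhood": name,
        "Borough": feature["properties"]["borough"],
        "Latitude": lat,
        "Longitude": lon,
    })

print(rows)
```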


- **Paris, France**; sourced from Wikipedia

In [221]:
paris = pd.read_html('https://en.wikipedia.org/wiki/Arrondissements_of_Paris')[2]

In [238]:
#for elements in 'Name' with comma separation, take the last element and trim surrounding whitespace
paris['Name'] = paris['Name'].apply(lambda x: x.split(',')[-1].strip())

In [241]:
#call function get_coordinates_to_df()
paris_coordinates = get_coordinates_to_df(paris['Name'], 'Paris')

Returned tuple of latitude and longitude lists in Paris


In [242]:
#add coordinates to df
paris['Latitude'] = paris_coordinates[0]
paris['Longitude'] = paris_coordinates[1]

In [275]:
paris = paris[['Name', 'Latitude', 'Longitude']].reset_index(drop=True)
paris.rename(columns={'Name': 'Neighborhood'}, inplace=True)

In [276]:
paris.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Hôtel-de-Ville | 48.856821 | 2.352301 |
| 1 | Panthéon | 48.846191 | 2.346079 |
| 2 | Luxembourg | 49.504314 | 6.279185 |
| 3 | Palais-Bourbon | 48.86092 | 2.318035 |
| 4 | Élysée | 48.846644 | 2.36983 |
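
The Luxembourg row above sits at roughly 49.50, 6.28, far to the east of the other arrondissements, which suggests the free-text query `Paris Luxembourg` matched a place outside the city. A minimal sketch of a sanity check follows. It assumes a great-circle distance threshold; the 20 km cutoff, the choice of the Hôtel-de-Ville row as the reference point, and the helper names are illustrative and not part of the notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Rows copied from the output above: (name, latitude, longitude)
rows = [
    ("Hôtel-de-Ville", 48.856821, 2.352301),
    ("Panthéon", 48.846191, 2.346079),
    ("Luxembourg", 49.504314, 6.279185),
    ("Palais-Bourbon", 48.86092, 2.318035),
    ("Élysée", 48.846644, 2.36983),
]

center = (48.856821, 2.352301)  # assumption: treat the Hôtel-de-Ville row as the city centre
max_km = 20                     # illustrative cutoff, not a tuned value

# Names whose geocoded point lies farther than max_km from the reference point
suspect = [name for name, lat, lon in rows if haversine_km(lat, lon, *center) > max_km]
print(suspect)
```

A check like this only catches matches that land far from the city; a result that is wrong but still inside the radius would pass.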


- **London, UK**; sourced from Wikipedia

In [340]:
london = pd.read_html('https://en.wikipedia.org/wiki/List_of_areas_of_London')[1]

In [341]:
#replace the non-breaking spaces (\xa0) in two column names with regular spaces
london.rename(columns={
    'Postcode\xa0district': 'Postcode district', 
    'London\xa0borough': 'London borough'
}, inplace=True)

#take 'Post town' LONDON only & drop_duplicates under 'Location'
london = london[london['Post town'] == 'LONDON'].drop_duplicates(subset='Location')

#for elements in 'Location' with ' (also' in name, split & take the first element
london['Location'] = london['Location'].apply(lambda x: x.split(' (also')[0])

In [342]:
london.head(2)

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 20 | TQ205805 |


In [343]:
#call function get_coordinates_to_df()
london_coordinates = get_coordinates_to_df(london['Location'], 'London')

NO ADDRESS RETURNED: London Somerstown
Returned tuple of latitude and longitude lists in London


In [361]:
#add coordinates to df
london['Latitude'] = london_coordinates[0]
london['Longitude'] = london_coordinates[1]
london.dropna(subset=['Latitude', 'Longitude'], axis=0, inplace=True)

In [363]:
#get relevant columns & rename/format
london = london[['Location', 'London borough', 'Latitude', 'Longitude']].reset_index(drop=True)
london.columns = ['Neighborhood', 'Boroughs', 'Latitude', 'Longitude']

In [364]:
#drop footnote markers such as '[7]' (and any space before them) from 'Boroughs'
repl = re.compile(r"\s*\[\d+\]")
london['Boroughs'] = [repl.sub('', name) for name in london['Boroughs']]

In [365]:
london.head()

| | Neighborhood | Boroughs | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | 51.487621 | 0.11405 |
| 1 | Acton | Ealing, Hammersmith and Fulham | 51.50814 | -0.273261 |
| 2 | Aldgate | City | 51.514248 | -0.075719 |
| 3 | Aldwych | Westminster | 51.513131 | -0.117593 |
| 4 | Anerley | Bromley | 51.407599 | -0.061939 |
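
To illustrate the footnote clean-up on the raw values shown in the earlier `london.head(2)` output, here is a small standalone sketch. The pattern `\s*\[\d+\]` used here also removes any whitespace in front of the marker, so both the `[7]` and ` [7]` forms come out the same.

```python
import re

# Matches a bracketed footnote number plus any whitespace directly before it
footnote = re.compile(r"\s*\[\d+\]")

# First two values copied from the raw table; the third has no footnote
raw = ["Bexley, Greenwich [7]", "Ealing, Hammersmith and Fulham[8]", "Westminster"]

cleaned = [footnote.sub("", name) for name in raw]
print(cleaned)
```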


- **Toronto, Canada**; sourced from Wikipedia

In [369]:
toronto = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

In [370]:
#ignoring rows 'Not assigned'; copy so the column assignment below works on an independent frame
toronto = toronto[toronto['Borough'] != 'Not assigned'].copy()

#for elements in 'Neighbourhood' with ', ' separating >1 name, split & take the last element
toronto['Neighbourhood'] = toronto['Neighbourhood'].apply(lambda x: x.split(', ')[-1])

In [379]:
#call function get_coordinates_to_df()
toronto_coordinates = get_coordinates_to_df(toronto['Neighbourhood'], 'Toronto')

Returned tuple of latitude and longitude lists in Toronto


In [380]:
#add coordinates to df
toronto['Latitude'] = toronto_coordinates[0]
toronto['Longitude'] = toronto_coordinates[1]

In [381]:
#drop any nans & rename col
toronto.dropna(subset=['Latitude', 'Longitude'], axis=0, inplace=True)
toronto.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True)

In [384]:
#get relevant data
toronto = toronto[['Neighborhood', 'Borough', 'Latitude', 'Longitude']]
toronto.head()

| | Neighborhood | Borough | Latitude | Longitude |
|---|---|---|---|---|
| 2 | Parkwoods | North York | 43.761124 | -79.324059 |
| 3 | Victoria Village | North York | 43.732658 | -79.311189 |
| 4 | Harbourfront | Downtown Toronto | 43.64008 | -79.38015 |
| 5 | Lawrence Heights | North York | 43.722778 | -79.450933 |
| 8 | Humber Valley Village | Etobicoke | 43.666472 | -79.524314 |


---

## (2) Access FourSquare's API

In [403]:
#set FourSquare credentials
CLIENT_ID = api_keys.CLIENT_ID
CLIENT_SECRET = api_keys.CLIENT_SECRET
VERSION = '20180605'
LIMIT = 100
radius = 500

In [398]:
#function takes in lists of names/coordinates and queries FourSquare for each pair;
#the DataFrame-building code is commented out, so it currently returns the raw JSON of the last request
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()   #["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
#         venues_list.append([(
#             name, 
#             lat, 
#             lng, 
#             v['venue']['name'], 
#             v['venue']['location']['lat'], 
#             v['venue']['location']['lng'],  
#             v['venue']['categories'][0]['name']) for v in results])

#     nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
#     nearby_venues.columns = ['Neighborhood', 
#                   'Neighborhood Latitude', 
#                   'Neighborhood Longitude', 
#                   'Venue', 
#                   'Venue Latitude', 
#                   'Venue Longitude', 
#                   'Venue Category']
    
    return results #(nearby_venues)
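
The request URL can also be assembled from a dictionary of parameters instead of a long format string, which keeps each parameter next to its value and handles escaping. The sketch below uses only the standard library; `build_explore_url` and the placeholder credentials `MY_ID` and `MY_SECRET` are hypothetical, while the endpoint and parameter names are the ones in the function above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Return the explore URL for one coordinate pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude as a single parameter
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes the values and joins them with '&'
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; coordinates are the Hôtel-de-Ville row from the Paris table
url = build_explore_url(48.856821, 2.352301, "MY_ID", "MY_SECRET")
print(url)
```

If `requests` is used for the call itself, the same dictionary could be passed through its `params` argument rather than encoded by hand.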

In [404]:
cities = ['Paris']   # remaining cities: 'NewYork', 'London', 'Toronto'
city_dfs = [paris]   # remaining frames: new_york, london, toronto

In [405]:
four_sqr_queries = {city: getNearbyVenues(city_df['Neighborhood'], 
                                          city_df['Latitude'], 
                                          city_df['Longitude']) for city, city_df in zip(cities, city_dfs)}

In [406]:
four_sqr_queries

{'Paris': {'meta': {'code': 429,
   'errorType': 'quota_exceeded',
   'errorDetail': 'Quota exceeded',
   'requestId': '5fbdb1b0050c132220a82d9c'},
  'response': {}}}
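
The response above carries its status in `meta.code`, so a 429 `quota_exceeded` reply can be detected before any venue parsing is attempted. The sketch below shows one way to do that. It assumes the success payload has the `response -> groups[0] -> items` layout implied by the commented-out code in `getNearbyVenues`; `extract_venues` is a hypothetical helper, and `ok_payload` is an invented example rather than real API output.

```python
def extract_venues(name, lat, lng, payload):
    """Return one tuple per venue, or raise if the API reported an error."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError(
            "FourSquare error {}: {}".format(meta.get("code"), meta.get("errorType"))
        )

    groups = payload.get("response", {}).get("groups", [])
    items = groups[0]["items"] if groups else []

    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,  # some venues may lack a category
        ))
    return rows


# Error payload copied from the output above (requestId omitted)
quota_payload = {
    "meta": {"code": 429, "errorType": "quota_exceeded", "errorDetail": "Quota exceeded"},
    "response": {},
}

# Invented success payload, shaped like the fields the commented-out code reads
ok_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Museum",
                   "location": {"lat": 48.8462, "lng": 2.3461},
                   "categories": [{"name": "Museum"}]}},
    ]}]},
}

rows = extract_venues("Panthéon", 48.846191, 2.346079, ok_payload)

try:
    extract_venues("Panthéon", 48.846191, 2.346079, quota_payload)
    message = None
except RuntimeError as err:
    message = str(err)

print(rows)
print(message)
```

Raising on a non-200 code keeps a quota error from being mistaken for a neighborhood with no venues; returning an empty list instead would be a reasonable alternative if partial results are acceptable.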