<h1 align=center><font size = 5>Battle of Neighborhoods</font></h1>
<h1 align=center><font size = 5>New York vs Toronto</font></h1>

**Introduction**

New York City and Toronto are highly diverse and are the financial capitals of their respective countries. Both have large populations and heavy traffic. **Gold's Gym** International, Inc., an American chain of international co-ed fitness centers, is planning to open a gym in either New York or Toronto.
Comparing the two cities and their neighbourhoods to find the one with fewer existing gyms would help decide where to set up the new gym. This project compares New York City and Toronto to find the better city for a new **Gold's Gym**.



Before we get the data and start exploring it, let's install and import all the dependencies that we will need.


In [79]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (the pandas.io.json import path is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


## 1. Download and Explore Dataset

## 1.1 New York Dataset

New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

In [80]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [81]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [82]:
neighborhoods_data = newyork_data['features']

In [83]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transform the data into a _pandas_ dataframe

In [84]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [85]:
# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
    

In [86]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
5,Bronx,Kingsbridge,40.881687,-73.902818
6,Manhattan,Marble Hill,40.876551,-73.91066
7,Bronx,Woodlawn,40.898273,-73.867315
8,Bronx,Norwood,40.877224,-73.879391
9,Bronx,Williamsbridge,40.881039,-73.857446


In [87]:
print('The New York dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The New York dataframe has 5 boroughs and 306 neighborhoods.
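
The totals above do not show how the neighborhoods are spread across boroughs. A minimal sketch of a per-borough count follows, using three rows copied from the table above; the `sample` and `per_borough` names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the table above; the full dataframe has 306 rows.
sample = pd.DataFrame({
    "Borough": ["Bronx", "Manhattan", "Bronx"],
    "Neighborhood": ["Kingsbridge", "Marble Hill", "Woodlawn"],
})

# Count neighborhoods per borough, largest first.
per_borough = (
    sample.groupby("Borough")["Neighborhood"]
    .count()
    .sort_values(ascending=False)
)
print(per_borough)
```

Applied to the full `neighborhoods` dataframe, the same `groupby` call would give one count per borough.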


## 1.2 Toronto Dataset

In [88]:
!pip install beautifulsoup4
!pip install lxml

import requests # library to handle requests
from bs4 import BeautifulSoup

import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner



##### Scraping the Wikipedia page for the table of postal codes of Canada
##### The BeautifulSoup library is used to scrape the table from Wikipedia. The page title is printed to confirm that the page was fetched successfully, and then the table of postal codes is displayed.

In [89]:
source_wiki = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(source_wiki,'lxml')
print(soup.title)

from IPython.display import display_html
tab = str(soup.table)
display_html(tab,raw=True)

<title>List of postal codes of Canada: M - Wikipedia</title>


Postal Code,Borough,Neighbourhood
M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M6A,North York,"Lawrence Manor, Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
M8A,Not assigned,Not assigned
M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
M1B,Scarborough,"Malvern, Rouge"


In [90]:
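from io import StringIO # recent pandas versions deprecate passing literal HTML strings to read_html
dfs = pd.read_html(StringIO(tab))
df = dfs[0]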
df.rename(columns={'Postal Code':'Postcode'},inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [91]:
# Dropping the rows where Borough is 'Not assigned'
df1 = df[df.Borough != 'Not assigned']

# Combining the neighbourhoods that share the same postal code
df2 = df1.groupby(['Postcode','Borough'], sort=False).agg(', '.join)
df2.reset_index(inplace=True)

# Replacing the name of the neighbourhoods which are 'Not assigned' with names of Borough
df2['Neighbourhood'] = np.where(df2['Neighbourhood'] == 'Not assigned',df2['Borough'], df2['Neighbourhood'])

df2

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [92]:
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.rename(columns={'Postal Code':'Postcode'},inplace=True)
df3 = pd.merge(df2,lat_lon,on='Postcode')

In [93]:
df3

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
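
`pd.merge` performs an inner join by default, so any postcode missing from the coordinates file would be dropped without warning. One way to check for that, sketched here on small stand-in frames, is a left join with `indicator=True`; the postcode `M9Z` is a hypothetical unmatched entry added for illustration.

```python
import pandas as pd

# Stand-in frames: two real postcodes from the tables above plus one hypothetical code.
areas = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every area; the _merge column records which rows found coordinates.
checked = areas.merge(coords, on="Postcode", how="left",
                      indicator=True, validate="one_to_one")
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty `unmatched` list would mean every postal code received coordinates.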


In [94]:
print('The Toronto dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df3['Borough'].unique()),
        df3.shape[0]
    )
)

The Toronto dataframe has 10 boroughs and 103 neighborhoods.


## The New York dataframe has 5 boroughs and 306 neighborhoods.
## The Toronto dataframe has 10 boroughs and 103 neighborhoods.


##### Let's look up the coordinates of New York and Toronto

In [95]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

address = 'Toronto, Canada'

# the geolocator created above is reused for the second lookup
location = geolocator.geocode(address)
latitude1 = location.latitude
longitude1 = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude1, longitude1))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
The geographical coordinates of Toronto are 43.6534817, -79.3839347.
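
geopy's `geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` would fail with an `AttributeError`. A small guard could look like the sketch below; `coordinates_or_raise` is a hypothetical helper, and `fake_geocode` is a stand-in for `geolocator.geocode` so that the example runs without network access.

```python
from types import SimpleNamespace


def coordinates_or_raise(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found."""
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude


def fake_geocode(address):
    """Offline stand-in for geolocator.geocode; knows a single address."""
    known = {
        "Toronto, Canada": SimpleNamespace(latitude=43.6534817, longitude=-79.3839347),
    }
    return known.get(address)


print(coordinates_or_raise(fake_geocode, "Toronto, Canada"))
```

In the notebook, the call would presumably be `coordinates_or_raise(geolocator.geocode, address)`.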


## Now we map the neighbourhoods of both cities

In [96]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork



In [97]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude1, longitude1], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df3['Latitude'], df3['Longitude'], df3['Borough'], df3['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

## Using the Foursquare API

In [98]:
# Foursquare explore request: up to 100 venues within 500 m of the New York City coordinates
urlny = 'https://api.foursquare.com/v2/venues/explore?&client_id=BDWNVVEV5WKNK3P0DXIBX5IGDT4WANBVEDBWB0MLTE41BRTY&client_secret=K2AQNDDH3YCQ3SS5QFPJVKIQ2F0KMME2MVSKQVGNAME3CCZN&v=20180605&ll=40.7127281,-74.0060152&radius=500&limit=100'


In [99]:
# Foursquare explore request: up to 100 venues within 500 m of the Toronto coordinates
urltr = 'https://api.foursquare.com/v2/venues/explore?&client_id=BDWNVVEV5WKNK3P0DXIBX5IGDT4WANBVEDBWB0MLTE41BRTY&client_secret=K2AQNDDH3YCQ3SS5QFPJVKIQ2F0KMME2MVSKQVGNAME3CCZN&v=20180605&ll=43.6534817,-79.3839347&radius=500&limit=100'
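
The two request URLs differ only in the `ll` value, so they could be generated by one helper instead of being typed out twice. The sketch below is one possible way to do that with the standard library; `build_explore_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, chosen so that credentials stay out of the notebook text.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build an explore URL with the same query parameters as the cells above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Hypothetical variable names; placeholders are used when they are not set.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

url_ny = build_explore_url(40.7127281, -74.0060152, client_id, client_secret)
url_tr = build_explore_url(43.6534817, -79.3839347, client_id, client_secret)
print(url_ny)
print(url_tr)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which is the standard encoding of the same value.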


In [100]:
resultsny = requests.get(urlny).json()
resultsny

{'meta': {'code': 200, 'requestId': '60047f4c720fd13177af2a26'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Downtown Manhattan',
  'headerFullLocation': 'Downtown Manhattan, New York',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 113,
  'suggestedBounds': {'ne': {'lat': 40.7172281045, 'lng': -74.00008952063419},
   'sw': {'lat': 40.7082280955, 'lng': -74.0119408793658}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '57f0689d498e7d49d9189369',
       'name': 'The Bar Room at Temple Court',
       'location': {'address': '123 Nassau St',
        'lat': 40.7114477287544,
        'lng': -74.00680157032005,
        'labe

In [101]:
resultstr = requests.get(urltr).json()
resultstr

{'meta': {'code': 200, 'requestId': '60047f4d0070522c68a4b0b5'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 72,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng'

In [102]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [103]:
venuesny = resultsny['response']['groups'][0]['items']
    
nearby_venuesny = json_normalize(venuesny) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venuesny = nearby_venuesny.loc[:, filtered_columns]

# filter the category for each row
nearby_venuesny['venue.categories'] = nearby_venuesny.apply(get_category_type, axis=1)

# clean columns
nearby_venuesny.columns = [col.split(".")[-1] for col in nearby_venuesny.columns]

nearby_venuesny



Unnamed: 0,name,categories,lat,lng
0,The Bar Room at Temple Court,Hotel Bar,40.711448,-74.006802
1,"The Beekman, A Thompson Hotel",Hotel,40.711173,-74.006702
2,Alba Dry Cleaner & Tailor,Laundry Service,40.711434,-74.006272
3,City Hall Park,Park,40.711893,-74.007792
4,Gibney Dance Center Downtown,Dance Studio,40.713923,-74.005661
5,The Wooly Daily,Coffee Shop,40.712137,-74.008395
6,The Class by Taryn Toomey,Gym / Fitness Center,40.712753,-74.008734
7,CrossFit 212 TriBeCa,Gym,40.714537,-74.005999
8,Takahachi Bakery,Bakery,40.713653,-74.008804
9,Pisillo Italian Panini,Sandwich Place,40.71053,-74.007526
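
The flattening step can be illustrated on a trimmed-down response. In the sketch below, the first item reuses the name and coordinates of the first New York venue shown above (with only the `name` key kept in its category), and the second item is a hypothetical venue without categories, included to show the `None` case handled by `get_category_type`.

```python
import pandas as pd

# Trimmed-down items; real Foursquare items carry many more keys.
items = [
    {"venue": {"name": "The Bar Room at Temple Court",
               "categories": [{"name": "Hotel Bar"}],
               "location": {"lat": 40.7114477287544, "lng": -74.00680157032005}}},
    # Hypothetical venue with an empty category list.
    {"venue": {"name": "Uncategorised venue",
               "categories": [],
               "location": {"lat": 0.0, "lng": 0.0}}},
]

# Nested dictionaries become dotted column names; lists are kept as they are.
flat = pd.json_normalize(items)

# Take the first category name, or None when the list is empty.
flat["category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["venue.name", "category", "venue.location.lat", "venue.location.lng"]])
```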


In [104]:
venuestr = resultstr['response']['groups'][0]['items']
    
nearby_venuestr = json_normalize(venuestr) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venuestr = nearby_venuestr.loc[:, filtered_columns]

# filter the category for each row
nearby_venuestr['venue.categories'] = nearby_venuestr.apply(get_category_type, axis=1)

# clean columns
nearby_venuestr.columns = [col.split(".")[-1] for col in nearby_venuestr.columns]

nearby_venuestr



Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Japango,Sushi Restaurant,43.655268,-79.385165
3,Poke Guys,Poke Place,43.654895,-79.385052
4,Indigo,Bookstore,43.653515,-79.380696
5,Chatime 日出茶太,Bubble Tea Shop,43.655542,-79.384684
6,Textile Museum of Canada,Art Museum,43.654396,-79.3865
7,CF Toronto Eaton Centre,Shopping Mall,43.654447,-79.380952
8,Old City Hall,Monument / Landmark,43.652009,-79.381744
9,LUSH,Cosmetics Shop,43.653557,-79.3804


In [105]:
print('{} venues were returned by Foursquare for New York.'.format(nearby_venuesny.shape[0]))
print('{} venues were returned by Foursquare for Toronto.'.format(nearby_venuestr.shape[0]))

100 venues were returned by Foursquare for New York.
72 venues were returned by Foursquare for Toronto.


## 100 venues were returned by Foursquare for New York City (the request is capped at `limit=100`; the response reports 113 results in total).
## 72 venues were returned by Foursquare for Toronto.
## But which city is the better place to set up a new Gold's Gym?

In [109]:
nearby_venuesny = nearby_venuesny.loc[nearby_venuesny['categories'] == 'Gym']
print ("Gyms in New York")
nearby_venuesny


Gyms in New York


Unnamed: 0,name,categories,lat,lng
7,CrossFit 212 TriBeCa,Gym,40.714537,-74.005999
21,Equinox Tribeca,Gym,40.714099,-74.009686
48,New York by Gehry Gym,Gym,40.710655,-74.005709
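
The filter above matches only the exact category `'Gym'`, so a venue labelled `'Gym / Fitness Center'` (such as The Class by Taryn Toomey in the New York table) is not counted. A broader match is sketched below on three rows copied from that table; whether such venues should count as competitors is a judgment call that the notebook does not settle.

```python
import pandas as pd

# Three rows copied from the New York venue table above.
venues = pd.DataFrame({
    "name": ["The Class by Taryn Toomey", "CrossFit 212 TriBeCa", "Takahachi Bakery"],
    "categories": ["Gym / Fitness Center", "Gym", "Bakery"],
})

# Exact match, as used in the notebook.
exact = venues[venues["categories"] == "Gym"]

# Case-insensitive substring match; na=False treats missing categories as non-matches.
broad = venues[venues["categories"].str.contains("gym", case=False, na=False)]

print(len(exact), len(broad))
```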


In [110]:
nearby_venuestr = nearby_venuestr.loc[nearby_venuestr['categories'] == 'Gym']
print ('Gyms in Toronto')
nearby_venuestr

Gyms in Toronto


Unnamed: 0,name,categories,lat,lng


In [111]:
print('{} gyms within a 500 m radius of the New York City coordinates.'.format(nearby_venuesny.shape[0]))
print('{} gyms within a 500 m radius of the Toronto coordinates.'.format(nearby_venuestr.shape[0]))

3 gyms within a 500 m radius of the New York City coordinates.
0 gyms within a 500 m radius of the Toronto coordinates.


### 3 gyms within a 500 m radius of the New York City coordinates.
### 0 gyms within a 500 m radius of the Toronto coordinates.

## So Toronto is the better city to set up the new Gold's Gym: the search found no venues in the 'Gym' category within 500 m of the Toronto coordinates, compared with 3 in New York.