# The Battle of the Neighborhoods 


## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

I will compare the Neighbourhoods of New York and Toronto for the availability of sports classes (gyms)
The problem: to determine similarity and (or) dissimilarity. 
I will:
* retrieve the top 100 venues that are in the Neighbourhoods of a city within a radius of 1000 meter; 
* explore, analyse and cluster the Neighbourhoods of the two cities; 
* use and compare the top ten venues around the Neighbourhood 
New York and Toronto are very diverse and are the financial capitals of their respective countries. 

I will use data science to determine how similar or dissimilar the Neighborhoods are of the two cities New York and Toronto and come up with findings. 


## Data <a name="data"></a>

Data with a description of the data that will be used to solve the problem and the source of the data
* For Toronto:
* https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, 
* - dataset containing the Borough, Neighbourhood with Latitude and Longitude for Toronto
* For New York:
* https://geo.nyu.edu/catalog/nyu_2451_34572 
* - dataset containing the Borough, Neighbourhood with Latitude and Longitude coordinates for New York city



In [1]:
! conda install -c conda-forge beautifulsoup4 --yes

import pandas as pd
import requests 

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    beautifulsoup4-4.8.1       |           py36_0         149 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following packages will be UPDATED:

    beautifulsoup4:  4.7.1-py36_1      --> 4.8.1-py36_0         conda-forge
    certifi:         2019.9.11-py36_0  --> 2019.9.11-py36_0     conda-forge

The following packages will be DOWNGRADED:

    ca-certificates: 2019.10.

**Use the BeautifulSoup package for web scraping**

In [2]:
from urllib.request import urlopen
from bs4 import BeautifulSoup

def getHTMLContent(link):
    html = urlopen(link)
    soup = BeautifulSoup(html, 'html.parser')
    return soup

**Scrape the following Wikipedia page into a dataframe**

In [3]:
content = getHTMLContent('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
tables = content.find_all('table')

table = content.find('table', {'class': 'wikitable sortable'})
rows = table.find_all('tr')

data_content = []
for row in rows:
    cells = row.find_all('td')
    #Ignore cells with a borough that is Not assigned.
    if len(cells) > 1 and cells[1].get_text()!='Not assigned':
        country_info = [cell.text.strip('\n') for cell in cells]
        data_content.append(country_info)

dataset = pd.DataFrame(data_content)

# Define column headings
headers = rows[0].find_all('th')
headers = [header.get_text().strip('\n') for header in headers]
dataset.columns = headers


#If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
for index, row in dataset.iterrows() :
    if row['Neighbourhood']== 'Not assigned':
        row['Neighbourhood']=row['Borough']
    
dataset = dataset.rename(columns={"Neighbourhood": "Neighborhood"})
dataformap =  dataset  


dataset.head(15)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [4]:
#use the .shape method to print the number of rows of your dataframe
print(dataset.shape)

(211, 3)


**Reading the Geospactial data into a dataframe**

In [5]:
geodata = pd.read_csv('https://cocl.us/Geospatial_data')
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


**Merging the Geospactial data into the Neighborhood dataframe**

In [6]:

geodata.rename(columns={'Postal Code':'Postcode'}, inplace=True)
datacluster = pd.merge(dataformap, geodata, on='Postcode', how='left')

neighborhoods=datacluster[datacluster['Borough'].str.contains("Toronto")]
neighborhoods.reset_index()
#neighborhoods

Unnamed: 0,index,Postcode,Borough,Neighborhood,Latitude,Longitude
0,2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
1,3,M5A,Downtown Toronto,Regent Park,43.654260,-79.360636
2,13,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937
3,14,M5B,Downtown Toronto,Garden District,43.657162,-79.378937
4,27,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
5,36,M4E,East Toronto,The Beaches,43.676357,-79.293031
6,37,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
7,41,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
8,42,M6G,Downtown Toronto,Christie,43.669542,-79.422564
9,49,M5H,Downtown Toronto,Adelaide,43.650571,-79.384568


In [7]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 4 boroughs and 74 neighborhoods.


## Methodology <a name="methodology"></a>

## Methodology section 
represents the main component of the report with exploratory data analysis and inferential statistical testing that you performed, 
if any, and what machine learnings were used and why
The data of 100 venues near a Neighborhood using the Foursquare API. Then we will gather data for the Top ten most common venues for a Neighborhood. 
* Created the dataset for the two cities that has the Borough and the Neighborhoods. 
* Appended the data with the Latitude and Longitude values retrieved from the geo dataset. 
* Used the Foursquare API to retrieve the venues near the Neighborhoods, the top 10 most common venues for a Neighborhood. 
* Applied one-hot encoding and normalized of data of the venues. 
* Merged the data for the New York and Toronto cities 
* Applied K-means clustering algorithm. 
Examined the map and the data generated with cluster labels. 









## Analysis <a name="analysis"></a>

 **The following code is to explore and analyze neighborhoods for Toronto**

In [8]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          91 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.50-py_0   conda-forge
    geopy:         1.20.0-py_0 conda-forge


Downloading and Extracting Packages
geographiclib-1.50   | 34 KB     | ##################################### | 100% 
geopy-1.20.0         | 57 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


ModuleNotFoundError: No module named 'folium'

In [None]:
address = 'Toronto City'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

In [None]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [None]:
radius = 1000 # define radius
LIMIT = 100 # limit of number of venues returned by Foursquare API

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### **Explore Neighborhoods of Toronto with FourSquare API Venues data**

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
toronto_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )


In [None]:
print(toronto_venues.shape)
toronto_venues.head()

In [None]:
toronto_venues.groupby('Neighborhood').count()

In [None]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

 **Analyze Neighborhoods**

In [None]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

In [None]:
toronto_onehot.shape

In [None]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

In [None]:
toronto_grouped.shape

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

**Explore and analyze Neighborhood of New York**

In [None]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')




In [None]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
  

In [None]:
neighborhoods_data = newyork_data['features']  

In [None]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoodsny = pd.DataFrame(columns=column_names)



In [None]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoodsny = neighborhoodsny.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [None]:
neighborhoodsny.head()

In [None]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoodsny['Borough'].unique()),
        neighborhoodsny.shape[0]
    )
)

In [None]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

In [None]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoodsny['Latitude'], neighborhoodsny['Longitude'], neighborhoodsny['Borough'], neighborhoodsny['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [None]:
ny_venues = getNearbyVenues(names=neighborhoodsny['Neighborhood'],
                                   latitudes=neighborhoodsny['Latitude'],
                                   longitudes=neighborhoodsny['Longitude']
                                  )

In [None]:
print(ny_venues.shape)
ny_venues.head()

In [None]:
ny_venues.groupby('Neighborhood').count()

In [None]:
print('There are {} uniques categories.'.format(len(ny_venues['Venue Category'].unique())))

In [None]:
# one hot encoding
ny_onehot = pd.get_dummies(ny_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ny_onehot['Neighborhood'] = ny_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ny_onehot.columns[-1]] + list(ny_onehot.columns[:-1])
ny_onehot = ny_onehot[fixed_columns]

ny_onehot.head()

In [None]:
ny_onehot.shape

In [None]:
ny_grouped = ny_onehot.groupby('Neighborhood').mean().reset_index()
ny_grouped

In [None]:
ny_grouped.shape

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sortedny = pd.DataFrame(columns=columns)
neighborhoods_venues_sortedny['Neighborhood'] = ny_grouped['Neighborhood']

for ind in np.arange(ny_grouped.shape[0]):
    neighborhoods_venues_sortedny.iloc[ind, 1:] = return_most_common_venues(ny_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sortedny.head()

In [None]:
print(toronto_grouped.shape)
print(ny_grouped.shape)

### **Cluster Neighborhoods of Toronto & New York**

In [None]:
# set number of clusters
kclusters = 5

final_grouped= toronto_grouped.append(ny_grouped)
print(final_grouped.shape)
final_grouped_clustering = final_grouped.drop('Neighborhood', 1)
final_grouped_clustering = final_grouped_clustering.fillna(0)
#toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
#kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(final_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


In [None]:
final_venues_sorted = neighborhoods_venues_sorted.append(neighborhoods_venues_sortedny)
final_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
final_venues_sorted.head()

In [None]:
print(final_venues_sorted['Cluster Labels'].unique())

In [None]:
#merge the data of 2 cities
toronto_merged = neighborhoods
ny_merged = neighborhoodsny

final_merged=toronto_merged.append(ny_merged)

final_merged.head() 

In [None]:
final_merged = final_merged.join(final_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
final_merged.head()

In [None]:
final_merged['Cluster Labels'].replace(np.nan, -1, inplace=True)
final_merged[['Cluster Labels']] = final_merged[['Cluster Labels']].astype("int")
print(final_merged['Cluster Labels'].unique())
final_merged.head()

**Generate Map for New York with clusters**

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(final_merged['Latitude'], final_merged['Longitude'], final_merged['Neighborhood'], final_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Generate Map for Toronto with clusters**

In [None]:
address = 'Toronto City'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(final_merged['Latitude'], final_merged['Longitude'], final_merged['Neighborhood'], final_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### **Examine Clusters** 

In [None]:
final_merged.loc[final_merged['Cluster Labels'] == 0, final_merged.columns[[1] + list(range(5, final_merged.shape[1]))]]

In [None]:
final_merged.loc[final_merged['Cluster Labels'] == 1, final_merged.columns[[1] + list(range(5, final_merged.shape[1]))]]

In [None]:
final_merged.loc[final_merged['Cluster Labels'] == 2, final_merged.columns[[1] + list(range(5, final_merged.shape[1]))]]

In [None]:
final_merged.loc[final_merged['Cluster Labels'] == 3, final_merged.columns[[1] + list(range(5, final_merged.shape[1]))]]

In [60]:
final_merged.loc[final_merged['Cluster Labels'] == 4, final_merged.columns[[1] + list(range(5, final_merged.shape[1]))]]

Unnamed: 0,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
112,43.72802,4,Park,Swim School,Bus Line,Wings Joint,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
122,43.696948,4,Park,Trail,Jewelry Store,Sushi Restaurant,Wings Joint,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
123,43.696948,4,Park,Trail,Jewelry Store,Sushi Restaurant,Wings Joint,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
183,43.679563,4,Park,Playground,Trail,Wings Joint,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
27,40.806551,4,Park,South American Restaurant,Home Service,Grocery Store,Pool,Boat or Ferry,Bus Stop,Event Service,Event Space,Exhibit
35,40.881395,4,Park,Bank,Thai Restaurant,Tennis Stadium,Tennis Court,Pharmacy,Pizza Place,Falafel Restaurant,Empanada Restaurant,English Restaurant
169,40.659816,4,Park,Playground,Trail,Wings Joint,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
192,40.597711,4,Park,Women's Store,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant
203,40.597069,4,Park,Women's Store,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant
256,40.63563,4,Park,Bagel Shop,Deli / Bodega,Bus Stop,Filipino Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory


 ## Results  <a name="results"></a>

* Purpose of this project was to compare the Neighbourhoods of the two cities and determine how similar or dissimilar they are. By using Foursquare API we were able to leverage the venues data to compare Neighbourhoods. The K-Means algorithm was very useful for Clustering similar data points. 
* The stakeholders can use this approach to compare Neighbourhoods effectively. 



## Conclusion <a name="conclusion"></a>

* Our analysis shows that Toronto and New York are similar in many ways. 
* Similarities: Both the cities are on waterfronts. Neighbourhoods have proximity to Restaurants with all types of cuisine, Bars, Parks, Culture Centers. Very ethnically diverse. Both the cities propose a lot of opportunities for training and sport activities
* Dissimilarities: New York neighbourhoods have more gyms for sports activities compared to Toronto. 

