# BATTLE OF THE CITIES -- J.R. Gutierrez

## Intro / Business Problem

#### John and his wife Libby want to take a vacation to either New York or Toronto. However, they aren't sure which city would offer the experience they are seeking. New York and Toronto are two very powerful, diverse cities in their respective countries. While they are similar in many ways, they also differ greatly in their food, hotels and tourist attractions. John and Libby can't find enough information online to settle these concerns.

## Background

#### John happens to be a data scientist, and will explore New York and Toronto by segmenting and clustering the neighborhoods of the most exciting area in each city using the Foursquare API and the K-Means algorithm. These clusters of data will include the restaurants, hotels, parks, galleries and other tourist attractions. The areas selected are Manhattan in New York and Downtown Toronto in Toronto.

## Data

#### John will use the ML algorithm K-means to group neighborhoods with similar venue profiles, with a focus on locating the hottest tourist spots and venues of each. This way the two cities can be analyzed and compared on the same terms.
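
As a rough illustration of what K-means does with venue profiles, the sketch below runs the two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points) on a few made-up frequency vectors. The function name `kmeans_step` and the toy numbers are assumptions for illustration only; the notebook itself relies on scikit-learn's `KMeans`.

```python
import numpy as np

def kmeans_step(points, centroids):
    """One K-means iteration: assign points, then recompute centroids."""
    # distances[i, k] = Euclidean distance from point i to centroid k
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    new_centroids = np.array([
        points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(len(centroids))
    ])
    return labels, new_centroids

# Toy venue-frequency vectors (columns could be e.g. coffee shop, park, airport)
points = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
])

# Start from two of the points and iterate a fixed number of times
centroids = points[[0, 2]].copy()
for _ in range(10):
    labels, centroids = kmeans_step(points, centroids)

print(labels)
print(centroids)
```

Neighborhoods whose venue mix is similar end up with the same label, which is the property the later clustering cells depend on.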

#### For Downtown Toronto, John will extract data from a Wikipedia page to break down the neighborhoods within it. Then he will clean and wrangle the data as needed, for example by removing duplicates and "Not assigned" values. John will geocode Downtown Toronto and use the Foursquare API to explore its neighborhoods. The neighborhoods are then further characterized and clustered.

#### Manhattan will go through much the same process. John will retrieve a saved data file, previously explored through the Foursquare API, from which he extracted and sorted all the boroughs of New York. Afterwards, Manhattan's neighborhoods and their venues will be characterized and clustered as well.

In [5]:
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# To see the full dataframe...
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

import json # library to handle JSON files

!conda install -c conda-forge beautifulsoup4 --yes 
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Imported')


In [10]:
from bs4 import BeautifulSoup

#Extract Toronto Data
req = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(req.text, 'html.parser')
table=soup.find('table', attrs={'class':'wikitable sortable'})

In [17]:

#get headers:
headers=table.findAll('th')
for i, head in enumerate(headers): headers[i]=str(headers[i]).replace("<th>","").replace("</th>","").replace("\n","")

#Find all rows and skip the header row:
rows=table.findAll('tr')
rows=rows[1:]

# skip all meta symbols and line feeds between rows:
for i, row in enumerate(rows): rows[i] = str(rows[i]).replace("\n</td></tr>","").replace("<tr>\n<td>","")

# make dataframe, expand rows and drop the old one:
df=pd.DataFrame(rows)
df[headers] = df[0].str.split("</td>\n<td>", n = 2, expand = True) 
df.drop(columns=[0],inplace=True)

# skip not assigned boroughs:
df = df.drop(df[(df.Borough == "Not assigned")].index)
# give "Not assigned" Neighborhoods same name as Borough:
df.Neighbourhood.replace("Not assigned", df.Borough, inplace=True)

# copy Borough value to Neighborhood if NaN:
df.Neighbourhood.fillna(df.Borough, inplace=True)
# drop duplicate rows:
df=df.drop_duplicates()

# extract titles from columns
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('title')
    ].str.extract('title=\"([^\"]*)',expand=False))

df.update(
    df.Borough.loc[
        lambda x: x.str.contains('title')
    ].str.extract('title=\"([^\"]*)',expand=False))

# delete Toronto annotation from Neighbourhood:
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('Toronto')
    ].str.replace(", Toronto",""))
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('Toronto')
    ].str.replace(r"\(Toronto\)", "", regex=True))

# combine multiple neighborhoods with the same post code
df2 = pd.DataFrame({'Postcode':df.Postcode.unique()})
df2['Borough']=pd.DataFrame(list(set(df['Borough'].loc[df['Postcode'] == x['Postcode']])) for i, x in df2.iterrows())
df2['Neighborhood']=pd.Series(list(set(df['Neighbourhood'].loc[df['Postcode'] == x['Postcode']])) for i, x in df2.iterrows())
df2['Neighborhood']=df2['Neighborhood'].apply(lambda x: ', '.join(x))
df2.dtypes

df2.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,Queen's Park
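
The cleaning steps above (drop unassigned boroughs, fall back to the borough name, combine neighborhoods that share a post code) can also be written with a single `groupby`. This is a minimal sketch on invented rows, not the scraped table; the post codes and names below are placeholders.

```python
import pandas as pd

# Invented rows that mimic the shape of the scraped table
raw = pd.DataFrame({
    "Postcode": ["X1A", "X2A", "X3A", "X3A", "X4A"],
    "Borough": ["Not assigned", "Downtown", "North", "North", "Downtown"],
    "Neighbourhood": ["Not assigned", "Old Town", "Manor", "Heights", "Not assigned"],
})

# 1. drop rows whose borough is not assigned
cleaned = raw[raw["Borough"] != "Not assigned"].copy()

# 2. an unassigned neighbourhood takes the name of its borough
unnamed = cleaned["Neighbourhood"] == "Not assigned"
cleaned.loc[unnamed, "Neighbourhood"] = cleaned.loc[unnamed, "Borough"]

# 3. one row per post code, neighbourhood names joined with commas
combined = (
    cleaned.drop_duplicates()
    .groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
    .rename(columns={"Neighbourhood": "Neighborhood"})
)
print(combined)
```

Grouping on both `Postcode` and `Borough` assumes each post code belongs to exactly one borough, which matches the table shown above.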


In [18]:
#add Geo-spatial data
dfll= pd.read_csv("http://cocl.us/Geospatial_data")
dfll.rename(columns={'Postal Code':'Postcode'}, inplace=True)
# merge on the shared 'Postcode' column
toronto_data=pd.merge(df2, dfll, on='Postcode')
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494


In [19]:
# Filtering
# keep only the rows for Downtown Toronto
downtown_toronto_data = toronto_data[toronto_data['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
# eliminate 'Postcode' column
downtown_toronto_data=downtown_toronto_data.drop(['Postcode'], axis=1)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,Regent Park,43.65426,-79.360636
1,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


### Now we'll get the New York data from a JSON file downloaded from the web

In [20]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [21]:
# Load the Data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

### The relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [22]:
neighborhoods_data = newyork_data['features']
#Let's take a look at the first item on the list
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
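
For a feature list shaped like the one above, `pandas.json_normalize` can flatten the nested keys in one call. The sketch below uses two features with only the fields needed here; it assumes pandas 1.0 or later, where `json_normalize` is available at the top level.

```python
import pandas as pd

# Two features trimmed to the fields used in this notebook
features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.91066, 40.876551]},
     "properties": {"name": "Marble Hill", "borough": "Manhattan"}},
]

# Nested dictionaries become dotted column names such as 'properties.name'
flat = pd.json_normalize(features)

# GeoJSON stores points as [longitude, latitude]
neighborhoods_sketch = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": flat["geometry.coordinates"].apply(lambda c: c[1]),
    "Longitude": flat["geometry.coordinates"].apply(lambda c: c[0]),
})
print(neighborhoods_sketch)
```

The name `neighborhoods_sketch` is used only to avoid clashing with the notebook's own `neighborhoods` dataframe.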

### Transform the data into a *pandas* dataframe

In [23]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

### Loop through the data and collect one row per neighborhood.

In [24]:
# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

In [25]:
# Creating new Dataframe manhattan_data
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


## Foursquare API

In [26]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'L5WTISNZNIB1MWT5UBFOJKUHFB25E3Z3CFVJDCZ2TGCHTQM3' # your Foursquare ID
CLIENT_SECRET = '1ZDGHO2NDC1QSA45DOI3WQCKLD4SJ0FJKQJQXD300TSJH41M' # your Foursquare Secret
VERSION = '20180604'
limit = 20
print('Your credentials:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentials:
CLIENT_ID:L5WTISNZNIB1MWT5UBFOJKUHFB25E3Z3CFVJDCZ2TGCHTQM3
CLIENT_SECRET:1ZDGHO2NDC1QSA45DOI3WQCKLD4SJ0FJKQJQXD300TSJH41M
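
A common alternative to hard-coding and printing API credentials is to read them from environment variables and show only a masked form. The sketch below is one way to do that; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client id and secret from a mapping (the environment by default)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret

def mask(value, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

# Demonstration with a plain dictionary standing in for the environment
demo_env = {"FOURSQUARE_CLIENT_ID": "EXAMPLEID123", "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```

In a real session the function would be called without arguments so that it reads `os.environ`.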


In [27]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'

# Nominatim expects an explicit user_agent; the value below is an arbitrary placeholder
geolocator = Nominatim(user_agent="battle_of_the_cities")
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161


In [28]:
# Let's get the geographical coordinates of Manhattan.
address = 'Manhattan, NY'

# Nominatim expects an explicit user_agent; the value below is an arbitrary placeholder
geolocator = Nominatim(user_agent="battle_of_the_cities")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


## VISUALIZATION
#### John wants to visualize the selected neighborhoods so that he can confirm the coordinates of each one.

In [29]:
# Map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  
    
map_downtown_toronto

In [30]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a marker cluster object for the neighborhoods in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto

In [31]:
# Manhattan and its neighborhoods.
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [32]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(map_manhattan)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_manhattan

## Exploring Downtown Toronto

In [33]:
# Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
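
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises an exception if a response has no groups or a venue has an empty category list. The sketch below shows a more defensive way to pull the same fields out of an already decoded payload. The helper name `extract_venues` is hypothetical, and the sample payload is a hand-written stand-in that mirrors only the keys the notebook reads.

```python
def extract_venues(payload, name, lat, lng):
    """Return one tuple per venue, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payload; the second venue is invented to show the empty-category case
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Roselle Desserts",
               "location": {"lat": 43.653447, "lng": -79.362017},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Unnamed Kiosk",
               "location": {"lat": 43.6540, "lng": -79.3610},
               "categories": []}},
]}]}}

venue_rows = extract_venues(sample_payload, "Regent Park", 43.65426, -79.360636)
empty_rows = extract_venues({"response": {}}, "Regent Park", 43.65426, -79.360636)
print(venue_rows)
```

Venues without a category come back with `None`, which can then be dropped or labeled before one-hot encoding.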

In [34]:
# Run the above function on each neighborhood and create a new dataframe called downtown_toronto_venues.
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude'],
                                  )

Regent Park
Queen's Park 
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide, Richmond, King
Union Station , Harbourfront East, Toronto Islands
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Grange Park , Kensington Market, Chinatown
South Niagara, Bathurst Quay, King and Spadina, Railway Lands, CN Tower, Harbourfront West, Island airport
Rosedale
Stn A PO Boxes 25 The Esplanade
St. James Town, Cabbagetown
Underground city, First Canadian Place
Church and Wellesley


In [35]:
# Let's check the size of the resulting dataframe
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(358, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park,43.65426,-79.360636,Cooper Koo Family YMCA,43.653191,-79.357947,Gym / Fitness Center
3,Regent Park,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Regent Park,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
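
Since `getNearbyVenues` asks for venues within `radius=500` metres, a quick sanity check is to compute the great-circle distance between a neighborhood centre and one of its venues. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the function name `haversine_m` is an illustrative choice, and the coordinates are taken from the first row of the table above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Regent Park centre vs. Roselle Desserts
distance = haversine_m(43.65426, -79.360636, 43.653447, -79.362017)
print(round(distance), "metres")
```

The result is well inside the 500 metre search radius, as expected for a venue returned for that neighborhood.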


In [36]:
# Let's check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, Richmond, King",20,20,20,20,20,20
Berczy Park,20,20,20,20,20,20
Central Bay Street,20,20,20,20,20,20
Christie,18,18,18,18,18,18
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",20,20,20,20,20,20
"Design Exchange, Toronto Dominion Centre",20,20,20,20,20,20
"Grange Park , Kensington Market, Chinatown",20,20,20,20,20,20
Queen's Park,20,20,20,20,20,20
Regent Park,20,20,20,20,20,20


In [37]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 121 unique categories.


In [38]:
#JOHN ANALYZES THE NEIGHBORHOODS

# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bar,Beer Bar,Belgian Restaurant,Boat or Ferry,Bookstore,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Electronics Store,Ethiopian Restaurant,Farmers Market,Fish Market,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Gas Station,Gastropub,General Entertainment,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hobby Shop,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Liquor Store,Lounge,Market,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Opera House,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [39]:
# Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()
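
The one-hot encoding followed by a per-neighborhood mean yields, for each neighborhood, the share of its venues in each category. As a cross-check, `pandas.crosstab` with `normalize="index"` computes the same table directly from the two original columns. The toy frame below is invented, and it deliberately includes a category called "Neighborhood": the one-hot header printed above appears to contain such a category, in which case assigning to `['Neighborhood']` overwrites that category column instead of adding a new one.

```python
import numpy as np
import pandas as pd

# Invented venues; note the category that is itself named "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Neighborhood", "Bakery", "Park"],
})

# Route 1: one-hot encode, then average per neighborhood.
# Grouping by a NumPy array keeps the key separate from the category columns.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
grouped = onehot.groupby(venues["Neighborhood"].to_numpy()).mean()

# Route 2: row-normalized cross tabulation of the two columns
freq = pd.crosstab(venues["Neighborhood"], venues["Venue Category"], normalize="index")

print(freq)
```

Because the grouping key never becomes a column of the one-hot frame, the "Neighborhood" category keeps its own frequency column.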

In [40]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, Richmond, King----
                venue  freq
0  Seafood Restaurant  0.10
1    Asian Restaurant  0.10
2          Steakhouse  0.10
3          Food Court  0.05
4           Speakeasy  0.05


----Berczy Park----
            venue  freq
0  Farmers Market  0.10
1        Beer Bar  0.10
2    Cocktail Bar  0.05
3    Liquor Store  0.05
4    Concert Hall  0.05


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.25
1  Japanese Restaurant  0.10
2                  Spa  0.05
3     Sushi Restaurant  0.05
4   Chinese Restaurant  0.05


----Christie----
                venue  freq
0       Grocery Store  0.22
1                Café  0.17
2                Park  0.11
3  Athletics & Sports  0.06
4           Nightclub  0.06


----Church and Wellesley----
                   venue  freq
0         Breakfast Spot  0.05
1  General Entertainment  0.05
2     Salon / Barbershop  0.05
3        Bubble Tea Shop  0.05
4                   Park  0.05


----Commerce Court, Vict

In [41]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [42]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    # the first three ranks use 'st', 'nd', 'rd'; the rest use 'th'
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, Richmond, King",Seafood Restaurant,Steakhouse,Asian Restaurant,Opera House,Pizza Place,Bar,Lounge,Plaza,Speakeasy,Hotel
1,Berczy Park,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Concert Hall,Bakery,Liquor Store,Italian Restaurant,Steakhouse,Coffee Shop
2,Central Bay Street,Coffee Shop,Japanese Restaurant,Park,Ice Cream Shop,Modern European Restaurant,Bubble Tea Shop,Ramen Restaurant,Sandwich Place,Italian Restaurant,Spa
3,Christie,Grocery Store,Café,Park,Gas Station,Italian Restaurant,Restaurant,Diner,Baby Store,Candy Store,Athletics & Sports
4,Church and Wellesley,Park,Ice Cream Shop,Mexican Restaurant,Pizza Place,Juice Bar,Bubble Tea Shop,Ramen Restaurant,Breakfast Spot,Restaurant,Bookstore
5,"Commerce Court, Victoria Hotel",Café,Gastropub,Coffee Shop,Bakery,Gym / Fitness Center,Deli / Bodega,Museum,Pub,Restaurant,Salad Place
6,"Design Exchange, Toronto Dominion Centre",Coffee Shop,Café,Restaurant,Gym,Bakery,Gym / Fitness Center,Hotel,Japanese Restaurant,Deli / Bodega,Pub
7,"Grange Park , Kensington Market, Chinatown",Café,Vietnamese Restaurant,Mexican Restaurant,Bakery,Fish Market,Farmers Market,Dessert Shop,Coffee Shop,Organic Grocery,Cocktail Bar
8,Queen's Park,Coffee Shop,Park,Gym,Seafood Restaurant,Arts & Crafts Store,Beer Bar,Burger Joint,Burrito Place,Creperie,Diner
9,Regent Park,Coffee Shop,Bakery,Park,Breakfast Spot,Farmers Market,Performing Arts Venue,Gym / Fitness Center,Restaurant,Historic Site,Chocolate Shop
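
The column labels and the top-venue lookup used for the table above can each be expressed as a small helper. This is a sketch with hypothetical function names (`ordinal`, `top_categories`) and toy frequencies: the suffix helper also handles ranks such as 11th or 21st, which matters only if `num_top_venues` is raised beyond 20, and `Series.nlargest` does the sort-and-slice in one call.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def top_categories(freq_row, n):
    """Names of the n most frequent categories, highest first."""
    return freq_row.nlargest(n).index.tolist()

# Toy frequency row for a single neighborhood
row = pd.Series({"Coffee Shop": 0.25, "Japanese Restaurant": 0.10, "Spa": 0.05, "Park": 0.0})

labels_for_columns = [f"{ordinal(i)} Most Common Venue" for i in range(1, 4)]
top = dict(zip(labels_for_columns, top_categories(row, 3)))
print(top)
```

When several categories share the same frequency, their relative order depends on the sorting routine, so tied entries can appear in a different order between two listings.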


## Clustering Neighborhoods

In [43]:
# set number of clusters
kclusters = 5

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 3, 4, 2, 1, 1, 1, 2, 3], dtype=int32)
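
The value `kclusters = 5` is fixed by hand. One common way to sanity-check such a choice is to fit K-means for several values of k and compare the within-cluster sum of squares (`inertia_`), looking for the point where the improvement flattens. The sketch below does this on synthetic two-dimensional blobs, not on the venue data; the blob centres, spread, and range of k are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic blobs of 20 points each
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.randn(20, 2) for c in centers])

# Fit K-means for k = 1..5 and record the inertia of each fit
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia drops sharply up to k = 3 and changes little afterwards; with only 19 neighborhoods the curve for the venue data may be much less clear-cut.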

In [44]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
downtown_toronto_merged = downtown_toronto_data.copy()  # copy so the original dataframe is left unchanged

# add clustering labels
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Regent Park,43.65426,-79.360636,2,Coffee Shop,Bakery,Park,Breakfast Spot,Farmers Market,Performing Arts Venue,Gym / Fitness Center,Restaurant,Historic Site,Chocolate Shop
1,Downtown Toronto,Queen's Park,43.662301,-79.389494,2,Coffee Shop,Park,Gym,Seafood Restaurant,Arts & Crafts Store,Beer Bar,Burger Joint,Burrito Place,Creperie,Diner
2,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,3,Clothing Store,Café,Pizza Place,Burrito Place,Coffee Shop,Plaza,Burger Joint,Diner,Electronics Store,Restaurant
3,Downtown Toronto,St. James Town,43.651494,-79.375418,4,Gastropub,Restaurant,Hotel,Coffee Shop,Food Truck,Italian Restaurant,Japanese Restaurant,Creperie,Cosmetics Shop,Middle Eastern Restaurant
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,2,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Concert Hall,Bakery,Liquor Store,Italian Restaurant,Steakhouse,Coffee Shop
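
One detail worth checking in this step: `kmeans.labels_` follows the row order of the frame passed to `fit` (the grouped frame, which `groupby` sorts by neighborhood name), while `downtown_toronto_data` keeps its original order. Assigning the array as a new column pairs labels with rows purely by position. The sketch below, on a three-row toy example with made-up labels, shows one way to attach labels by neighborhood name instead.

```python
import numpy as np
import pandas as pd

# Original order, as in the neighborhood dataframe
data = pd.DataFrame({
    "Neighborhood": ["Regent Park", "Queen's Park", "Berczy Park"],
    "Latitude": [43.65426, 43.662301, 43.644771],
})

# Alphabetical order, as produced by groupby('Neighborhood')
grouped = pd.DataFrame({"Neighborhood": ["Berczy Park", "Queen's Park", "Regent Park"]})

# Made-up cluster labels, in the same order as `grouped`
labels = np.array([0, 1, 2])

# Key the labels by neighborhood name, then join on that key
label_by_name = pd.Series(labels, index=grouped["Neighborhood"], name="Cluster Labels")
merged = data.join(label_by_name, on="Neighborhood")
print(merged)
```

Here Regent Park receives label 2 and Berczy Park label 0, whereas a positional assignment would have swapped them. A neighborhood with no venues would be absent from the grouped frame and would get a missing label after the join.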


In [45]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Toronto's Clusters

#### Cluster 1

In [46]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,"Commerce Court, Victoria Hotel",Café,Gastropub,Coffee Shop,Bakery,Gym / Fitness Center,Deli / Bodega,Museum,Pub,Restaurant,Salad Place


#### Cluster 2

In [47]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Central Bay Street,Coffee Shop,Japanese Restaurant,Park,Ice Cream Shop,Modern European Restaurant,Bubble Tea Shop,Ramen Restaurant,Sandwich Place,Italian Restaurant,Spa
6,Christie,Grocery Store,Café,Park,Gas Station,Italian Restaurant,Restaurant,Diner,Baby Store,Candy Store,Athletics & Sports
7,"Adelaide, Richmond, King",Seafood Restaurant,Steakhouse,Asian Restaurant,Opera House,Pizza Place,Bar,Lounge,Plaza,Speakeasy,Hotel
11,"University of Toronto, Harbord",Bookstore,Bakery,Japanese Restaurant,Restaurant,Sushi Restaurant,Dessert Shop,Sandwich Place,Bar,Italian Restaurant,Beer Bar
13,"South Niagara, Bathurst Quay, King and Spadina...",Airport Service,Airport Lounge,Airport Terminal,Rental Car Location,Bar,Airport,Airport Food Court,Airport Gate,Harbor / Marina,Sculpture Garden
14,Rosedale,Park,Playground,Trail,Deli / Bodega,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym
16,"St. James Town, Cabbagetown",Restaurant,Café,Italian Restaurant,Bakery,Indian Restaurant,Diner,Market,Caribbean Restaurant,Pet Store,Jewelry Store


#### Cluster 3

In [48]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Regent Park,Coffee Shop,Bakery,Park,Breakfast Spot,Farmers Market,Performing Arts Venue,Gym / Fitness Center,Restaurant,Historic Site,Chocolate Shop
1,Queen's Park,Coffee Shop,Park,Gym,Seafood Restaurant,Arts & Crafts Store,Beer Bar,Burger Joint,Burrito Place,Creperie,Diner
4,Berczy Park,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Concert Hall,Bakery,Liquor Store,Italian Restaurant,Steakhouse,Coffee Shop
8,"Union Station , Harbourfront East, Toronto Isl...",Park,Plaza,Sports Bar,Performing Arts Venue,Café,Lake,Salad Place,Japanese Restaurant,Skating Rink,Ice Cream Shop
12,"Grange Park , Kensington Market, Chinatown",Café,Vietnamese Restaurant,Mexican Restaurant,Bakery,Fish Market,Farmers Market,Dessert Shop,Coffee Shop,Organic Grocery,Cocktail Bar
15,Stn A PO Boxes 25 The Esplanade,Farmers Market,Cocktail Bar,Beer Bar,French Restaurant,Concert Hall,Jazz Club,Vegetarian / Vegan Restaurant,Café,Fountain,Steakhouse
17,"Underground city, First Canadian Place",Café,Coffee Shop,Steakhouse,Restaurant,Gym,Gym / Fitness Center,Deli / Bodega,Pizza Place,Pub,Salad Place
18,Church and Wellesley,Park,Ice Cream Shop,Mexican Restaurant,Pizza Place,Juice Bar,Bubble Tea Shop,Ramen Restaurant,Breakfast Spot,Restaurant,Bookstore


#### Cluster 4

In [49]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Ryerson, Garden District",Clothing Store,Café,Pizza Place,Burrito Place,Coffee Shop,Plaza,Burger Joint,Diner,Electronics Store,Restaurant
9,"Design Exchange, Toronto Dominion Centre",Coffee Shop,Café,Restaurant,Gym,Bakery,Gym / Fitness Center,Hotel,Japanese Restaurant,Deli / Bodega,Pub


#### Cluster 5

In [50]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Gastropub,Restaurant,Hotel,Coffee Shop,Food Truck,Italian Restaurant,Japanese Restaurant,Creperie,Cosmetics Shop,Middle Eastern Restaurant


### Exploring Manhattan Neighborhoods

In [51]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
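
The list comprehension above assumes every item returned by Foursquare has at least one category. Below is a minimal sketch of a more defensive row builder, assuming the same response shape as above (`item['venue']['categories'][0]['name']`). The helper name `venue_rows` and the sample payload are hypothetical and not part of the original notebook.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style 'items' into one tuple per venue.

    Hypothetical helper: venues without a category are kept and
    labelled 'Unknown' instead of raising an IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((
            name,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Made-up payload in the same shape as the 'items' list parsed above
sample_items = [
    {'venue': {'name': 'Corner Bakery',
               'location': {'lat': 40.0, 'lng': -73.0},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 40.1, 'lng': -73.1},
               'categories': []}},
]

rows = venue_rows('Marble Hill', 40.876551, -73.91066, sample_items)
print(rows)
```

Keeping the parsing in a separate function also makes it possible to test it without calling the API.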

In [52]:
# Run the function above on each Manhattan neighborhood and collect the results in manhattan_venues
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'])

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [53]:
# Let's check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


In [54]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 216 unique categories.


In [55]:
# Analyzing the neighborhoods
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Christmas Market,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Duty-free Shop,English Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Filipino Restaurant,Fish Market,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health Food Store,Heliport,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Leather Goods Store,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,New American Restaurant,Nightclub,Noodle House,Opera House,Optical Shop,Outdoor Sculpture,Outdoors & Recreation,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,School,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Club,Steakhouse,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Tiki Bar,Tourist Information Center,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [56]:
# Group rows by neighborhood and take the mean frequency of each venue category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
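
To make the one-hot and group-by steps concrete, here is a small sketch on made-up data (the neighborhoods `A` and `B` and their venues are illustrative only). Averaging the 0/1 indicator columns per neighborhood yields each category's share of that neighborhood's venues, which is what the `freq` values further down represent.

```python
import pandas as pd

# Toy data (made up): three venues in neighborhood A, one in B
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Park', 'Bakery', 'Bakery'],
})

# One column per category, one row per venue; dtype=float keeps the mean numeric
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix="", prefix_sep="", dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Averaging the indicators gives each category's share of venues per neighborhood
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Note that `groupby` returns the neighborhoods in sorted order, not in the order they appeared in the input.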

In [57]:
# John prints each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
            venue  freq
0            Park  0.15
1   Memorial Site  0.10
2      Food Court  0.10
3   Burrito Place  0.05
4  Sandwich Place  0.05


----Carnegie Hill----
                  venue  freq
0           Coffee Shop  0.10
1    Italian Restaurant  0.10
2  Gym / Fitness Center  0.10
3                   Gym  0.10
4      Community Center  0.05


----Central Harlem----
                 venue  freq
0   African Restaurant  0.10
1  American Restaurant  0.10
2    French Restaurant  0.10
3            Jazz Club  0.05
4             Beer Bar  0.05


----Chelsea----
               venue  freq
0  Indian Restaurant  0.05
1            Theater  0.05
2              Hotel  0.05
3        Men's Store  0.05
4  French Restaurant  0.05


----Chinatown----
                venue  freq
0  Chinese Restaurant  0.15
1      Sandwich Place  0.10
2         Pizza Place  0.05
3           Roof Deck  0.05
4        Noodle House  0.05


----Civic Center----
                venue  freq
0  Falaf

In [58]:
# Helper that returns the most common venue categories for one neighborhood row (used below to build a pandas dataframe)
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
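
A quick illustration of what the helper returns, using a made-up row laid out like a row of `manhattan_grouped` (the neighborhood name and frequencies are illustrative only). The function is repeated here so the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the cell above: skip the Neighborhood label, sort by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequency row: first entry is the label, the rest are category shares
toy_row = pd.Series({'Neighborhood': 'Example',
                     'Park': 0.15,
                     'Gym': 0.05,
                     'Food Court': 0.10})

top2 = return_most_common_venues(toy_row, 2)
print(list(top2))
```

When several categories share the same frequency, their relative order depends on the sort and is not guaranteed, so tied categories may appear in a different order from one listing to another.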

In [59]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted1 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted1['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted1.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted1

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Memorial Site,Food Court,Gym,Burrito Place,Cupcake Shop,Sandwich Place,Smoke Shop,Coffee Shop,Grocery Store
1,Carnegie Hill,Gym,Italian Restaurant,Coffee Shop,Gym / Fitness Center,Café,Bookstore,Gourmet Shop,Shoe Store,Bagel Shop,Community Center
2,Central Harlem,French Restaurant,African Restaurant,American Restaurant,Cycle Studio,Jazz Club,Ethiopian Restaurant,Library,Music Venue,Cocktail Bar,Juice Bar
3,Chelsea,Cupcake Shop,Italian Restaurant,Men's Store,Café,French Restaurant,Chinese Restaurant,Speakeasy,Coffee Shop,Beer Bar,Tapas Restaurant
4,Chinatown,Chinese Restaurant,Sandwich Place,Greek Restaurant,Spa,Roof Deck,Cocktail Bar,New American Restaurant,Noodle House,Bakery,Museum
5,Civic Center,Falafel Restaurant,Yoga Studio,Bar,Dance Studio,Park,Monument / Landmark,Molecular Gastronomy Restaurant,Sandwich Place,Martial Arts Dojo,Spa
6,Clinton,Gym / Fitness Center,Theater,Pie Shop,French Restaurant,Pizza Place,Comedy Club,Movie Theater,Sandwich Place,Café,Building
7,East Harlem,Mexican Restaurant,Thai Restaurant,Gym,Café,Cuban Restaurant,French Restaurant,Sandwich Place,Latin American Restaurant,Steakhouse,Street Art
8,East Village,Coffee Shop,Ice Cream Shop,Bar,Japanese Restaurant,Korean Restaurant,Moroccan Restaurant,Dog Run,Dessert Shop,Park,Pet Café
9,Financial District,Gym / Fitness Center,Jewelry Store,Coffee Shop,Falafel Restaurant,Steakhouse,Doctor's Office,New American Restaurant,Monument / Landmark,Gym,Salad Place


## CLUSTERING NEIGHBORHOODS
John now applies k-means clustering, an unsupervised machine learning technique, to group neighborhoods with similar venue profiles. This will help John and Libby compare Downtown Toronto and Manhattan. John will then pick out the hot spots that appear in the resulting clusters.

### Manhattan

In [60]:
# Run k-means to cluster the neighborhoods into 5 clusters.
# set number of clusters
kclusters = 5

# keep only the numeric frequency columns for clustering
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 1, 1, 3, 4, 1, 2, 0, 2,
       1, 3, 0, 1, 2, 2, 1, 3, 3, 3, 1, 3, 1, 2, 2, 2, 2, 2], dtype=int32)
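
The label array above can be tallied to see how the 40 neighborhoods are spread across the five clusters. A small sketch using only the standard library; the labels are copied from the output above.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [3, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 1, 1, 3, 4, 1, 2, 0, 2,
          1, 3, 0, 1, 2, 2, 1, 3, 3, 3, 1, 3, 1, 2, 2, 2, 2, 2]

# Count how many neighborhoods received each cluster label
cluster_sizes = Counter(labels)
for label in sorted(cluster_sizes):
    print(f"label {label}: {cluster_sizes[label]} neighborhoods")
```

Labels 0 and 4 together cover only 3 of the 40 neighborhoods, which is worth keeping in mind when reading the cluster tables.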

In [61]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
# (work on a copy so that manhattan_data itself is not modified)
manhattan_merged = manhattan_data.copy()

In [62]:
# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans.labels_
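
`kmeans.labels_` follows the row order of the matrix passed to `fit`, here `manhattan_grouped`, which `groupby` sorts alphabetically by neighborhood. `manhattan_data` is in a different order (it starts with Marble Hill), so a positional assignment can pair a label with the wrong neighborhood. Below is a minimal sketch of attaching labels by name instead, using made-up toy frames (`grouped`, `data`, and the label values are illustrative, not the notebook's real data).

```python
import pandas as pd

# Toy stand-ins: 'grouped' is alphabetical (like manhattan_grouped),
# 'data' keeps the source order (like manhattan_data)
grouped = pd.DataFrame({'Neighborhood': ['Chelsea', 'Inwood', 'Marble Hill']})
data = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Chelsea', 'Inwood'],
    'Latitude': [40.88, 40.74, 40.87],
})

# Hypothetical labels, in the same row order as 'grouped'
labels = [4, 2, 3]

# Pair each label with its neighborhood name, then merge on that key
label_table = grouped[['Neighborhood']].assign(**{'Cluster Labels': labels})
merged = data.merge(label_table, on='Neighborhood', how='left')
print(merged)
```

Merging on the neighborhood name keeps each label with the row it was computed for, regardless of how either frame is sorted.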

In [63]:
# join the top-10 venue table onto manhattan_merged, which already holds latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted1.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,3,Coffee Shop,Yoga Studio,Gym,Seafood Restaurant,Sandwich Place,Pizza Place,Pharmacy,Miscellaneous Shop,Kids Store,Donut Shop
1,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Sandwich Place,Greek Restaurant,Spa,Roof Deck,Cocktail Bar,New American Restaurant,Noodle House,Bakery,Museum
2,Manhattan,Washington Heights,40.851903,-73.9369,1,Wine Shop,Café,Coffee Shop,Frozen Yogurt Shop,Restaurant,Ramen Restaurant,Cocktail Bar,Pool,Liquor Store,Bakery
3,Manhattan,Inwood,40.867684,-73.92121,2,Park,Bakery,Yoga Studio,Deli / Bodega,Mexican Restaurant,Café,Frozen Yogurt Shop,Latin American Restaurant,Bistro,Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,3,Yoga Studio,Cocktail Bar,Mexican Restaurant,Bakery,Cosmetics Shop,Pizza Place,Coffee Shop,Caribbean Restaurant,Café,Burger Joint


In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Analyzing Manhattan Clusters
### John examines the clusters and their particular venues.

#### Cluster 1

In [67]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Lower East Side,Japanese Restaurant,Coffee Shop,Cocktail Bar,Yoga Studio,Performing Arts Venue,Clothing Store,Chinese Restaurant,Filipino Restaurant,Bubble Tea Shop,Mediterranean Restaurant
24,West Village,Italian Restaurant,Cosmetics Shop,Gourmet Shop,Chinese Restaurant,Board Shop,Mediterranean Restaurant,New American Restaurant,Park,Coffee Shop,Cocktail Bar
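
The same selection expression is repeated for every cluster label. A small helper could wrap it; this is a sketch only, and the name `cluster_view` and the toy frame are hypothetical. It assumes the column layout shown above (Borough, Neighborhood, Latitude, Longitude, Cluster Labels, then the ranked venue columns).

```python
import pandas as pd

def cluster_view(merged, label, first_venue_col=5):
    """Rows for one cluster: Neighborhood plus the ranked venue columns.

    Hypothetical helper; first_venue_col is the position of the
    first 'Most Common Venue' column in the merged frame.
    """
    keep = merged.columns[[1] + list(range(first_venue_col, merged.shape[1]))]
    return merged.loc[merged['Cluster Labels'] == label, keep]

# Made-up frame with the same column layout, trimmed to two venue columns
toy = pd.DataFrame({
    'Borough': ['Manhattan'] * 3,
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [40.7, 40.8, 40.9],
    'Longitude': [-74.0, -73.9, -73.9],
    'Cluster Labels': [0, 1, 0],
    '1st Most Common Venue': ['Park', 'Gym', 'Bar'],
    '2nd Most Common Venue': ['Deli', 'Spa', 'Pub'],
})

view = cluster_view(toy, 0)
print(view)
```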


#### Cluster 2

In [68]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Sandwich Place,Greek Restaurant,Spa,Roof Deck,Cocktail Bar,New American Restaurant,Noodle House,Bakery,Museum
2,Washington Heights,Wine Shop,Café,Coffee Shop,Frozen Yogurt Shop,Restaurant,Ramen Restaurant,Cocktail Bar,Pool,Liquor Store,Bakery
8,Upper East Side,Hotel,Bakery,French Restaurant,Chocolate Shop,Spa,Bookstore,Coffee Shop,Sushi Restaurant,Jazz Club,Bar
9,Yorkville,Italian Restaurant,Deli / Bodega,Coffee Shop,Beer Store,Sushi Restaurant,Pub,Monument / Landmark,Diner,Gym,Gym / Fitness Center
10,Lenox Hill,Thai Restaurant,Taco Place,Gym,Middle Eastern Restaurant,Smoke Shop,Salad Place,Liquor Store,French Restaurant,Restaurant,Gift Shop
11,Roosevelt Island,Sandwich Place,Bus Line,Gym / Fitness Center,Kosher Restaurant,Residential Building (Apartment / Condo),Liquor Store,Coffee Shop,Greek Restaurant,Japanese Restaurant,School
14,Clinton,Gym / Fitness Center,Theater,Pie Shop,French Restaurant,Pizza Place,Comedy Club,Movie Theater,Sandwich Place,Café,Building
15,Midtown,Chinese Restaurant,Hotel,Coffee Shop,Smoke Shop,Mediterranean Restaurant,Cuban Restaurant,Spa,Food Truck,Sporting Goods Shop,French Restaurant
18,Greenwich Village,Italian Restaurant,Café,Yoga Studio,Beer Bar,Jazz Club,Gourmet Shop,French Restaurant,Food Truck,Optical Shop,Coffee Shop
22,Little Italy,Café,Ice Cream Shop,Wine Bar,Sandwich Place,Coffee Shop,Spanish Restaurant,Salon / Barbershop,Salad Place,Gourmet Shop,Clothing Store


#### Cluster 3

In [70]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Inwood,Park,Bakery,Yoga Studio,Deli / Bodega,Mexican Restaurant,Café,Frozen Yogurt Shop,Latin American Restaurant,Bistro,Restaurant
12,Upper West Side,Wine Bar,Italian Restaurant,Peruvian Restaurant,Bakery,Chinese Restaurant,Southern / Soul Food Restaurant,Bookstore,Movie Theater,Pub,Juice Bar
13,Lincoln Square,Theater,Performing Arts Venue,Indie Movie Theater,Concert Hall,Fountain,Library,Gift Shop,Circus,Gym / Fitness Center,Opera House
19,East Village,Coffee Shop,Ice Cream Shop,Bar,Japanese Restaurant,Korean Restaurant,Moroccan Restaurant,Dog Run,Dessert Shop,Park,Pet Café
21,Tribeca,Park,American Restaurant,Yoga Studio,Dog Run,Salad Place,Sushi Restaurant,Spa,Greek Restaurant,Poke Place,Coffee Shop
26,Morningside Heights,Bookstore,American Restaurant,Park,Sandwich Place,Coffee Shop,Ice Cream Shop,Pub,Farmers Market,Café,Salad Place
27,Gramercy,Pizza Place,Coffee Shop,Yoga Studio,Thai Restaurant,Food Truck,Spa,Liquor Store,Cocktail Bar,Bike Rental / Bike Share,Gourmet Shop
35,Turtle Bay,Karaoke Bar,Sushi Restaurant,Greek Restaurant,Cocktail Bar,Boxing Gym,Café,Duty-free Shop,Residential Building (Apartment / Condo),Museum,Gift Shop
36,Tudor City,Park,Deli / Bodega,Thai Restaurant,Convenience Store,Pizza Place,Salad Place,Café,Bridge,Boxing Gym,Spanish Restaurant
37,Stuyvesant Town,Park,Bar,Pet Service,Harbor / Marina,Fountain,Boat or Ferry,Gas Station,German Restaurant,Cocktail Bar,Baseball Field


#### Cluster 4

In [71]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Coffee Shop,Yoga Studio,Gym,Seafood Restaurant,Sandwich Place,Pizza Place,Pharmacy,Miscellaneous Shop,Kids Store,Donut Shop
4,Hamilton Heights,Yoga Studio,Cocktail Bar,Mexican Restaurant,Bakery,Cosmetics Shop,Pizza Place,Coffee Shop,Caribbean Restaurant,Café,Burger Joint
5,Manhattanville,Italian Restaurant,Coffee Shop,Bar,Mexican Restaurant,Lounge,Gastropub,Climbing Gym,Ramen Restaurant,Bike Trail,Supermarket
6,Central Harlem,French Restaurant,African Restaurant,American Restaurant,Cycle Studio,Jazz Club,Ethiopian Restaurant,Library,Music Venue,Cocktail Bar,Juice Bar
7,East Harlem,Mexican Restaurant,Thai Restaurant,Gym,Café,Cuban Restaurant,French Restaurant,Sandwich Place,Latin American Restaurant,Steakhouse,Street Art
16,Murray Hill,Italian Restaurant,Hotel,Shanghai Restaurant,Speakeasy,Museum,Restaurant,Jewish Restaurant,Sushi Restaurant,Burger Joint,Jazz Club
23,Soho,Women's Store,Clothing Store,Men's Store,Yoga Studio,Tea Room,Dance Studio,Optical Shop,Dessert Shop,Salon / Barbershop,Miscellaneous Shop
29,Financial District,Gym / Fitness Center,Jewelry Store,Coffee Shop,Falafel Restaurant,Steakhouse,Doctor's Office,New American Restaurant,Monument / Landmark,Gym,Salad Place
30,Carnegie Hill,Gym,Italian Restaurant,Coffee Shop,Gym / Fitness Center,Café,Bookstore,Gourmet Shop,Shoe Store,Bagel Shop,Community Center
31,Noho,Rock Club,French Restaurant,Cocktail Bar,Italian Restaurant,Deli / Bodega,Sandwich Place,Gourmet Shop,Grocery Store,Gym,Coffee Shop


#### Cluster 5

In [73]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Chelsea,Cupcake Shop,Italian Restaurant,Men's Store,Café,French Restaurant,Chinese Restaurant,Speakeasy,Coffee Shop,Beer Bar,Tapas Restaurant
