# PROBLEM & BACKGROUND

## Toronto and New York are among the world's leading cities. Both are known for their diversity and for their financial importance to their respective countries. Many people compare Downtown Toronto with Manhattan because of the role each plays in its city. However, after visiting both cities, I think Brooklyn is a closer match to Downtown Toronto in terms of neighborhood layout. In this notebook I use K-Means clustering to break down the similarities and differences between Downtown Toronto and Brooklyn, NY, and to judge which of the two is more worth visiting.

# DATA 

### We will use Foursquare API data for the neighborhoods of both cities. The data also include information about the venues in each neighborhood. (The datasets were prepared in previous work.) We will use K-Means clustering to group neighborhoods with similar venue footprints, and we will judge the similarity and dissimilarity between the two areas on that basis.

# METHODOLOGY
### Analyze both boroughs separately by running K-Means clustering on their venue footprints, then compare the resulting footprints

## PREPROCESSING

In [12]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Visualization
import matplotlib.pyplot
import seaborn as sns
# To see the full dataframe...
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

import json # library to handle JSON files
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

# Any results you write to the current directory are saved as output.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [13]:
# Geographical Coordinates
# geographical data was exported to CSV from the dataframe in last week's project
df1=pd.read_csv('geo_toronto.csv')
df1.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1A,Not assigned,M2A,43.64869,-79.38544
1,M3A,North York,Parkwoods,43.752935,-79.335641
2,M4A,North York,Victoria Village,43.728102,-79.31189
3,M5A,Downtown Toronto,Regent Park / Harbourfront,43.650964,-79.353041
4,M6A,North York,Lawrence Manor / Lawrence Heights,43.723265,-79.451211


In [14]:
# Filtering
# keep only the Downtown Toronto rows
downtown_toronto_data = df1[df1['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
# drop the 'PostalCode' column
downtown_toronto_data=downtown_toronto_data.drop(['PostalCode'], axis=1)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,Regent Park / Harbourfront,43.650964,-79.353041
1,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.66179,-79.38939
2,Downtown Toronto,"Garden District, Ryerson",43.657491,-79.377529
3,Downtown Toronto,St. James Town,43.651734,-79.375554
4,Downtown Toronto,Berczy Park,43.645196,-79.373855


### Now we move on to the New York boroughs. We select Brooklyn as the borough and analyze its neighborhoods later

In [20]:
# Get previous NYC geo data
neighborhoods=pd.read_csv("nyc_geo.csv")
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [21]:
# Creating new Dataframe of brooklyn
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


## Foursquare API 

In [22]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'UIIVSPUR522RRUZRVQHQ5HFV4PVQHRSMJWVMC2NQCOGTD2ZA' # your Foursquare ID
CLIENT_SECRET = 'X2AS1MI5BRDFQOVR235XGGJZSLMVNDLR4RHQ3K10M52YVQ4P' # your Foursquare Secret
VERSION = '20180604'
limit = 20
print('Your credentials:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentials:
CLIENT_ID:UIIVSPUR522RRUZRVQHQ5HFV4PVQHRSMJWVMC2NQCOGTD2ZA
CLIENT_SECRET:X2AS1MI5BRDFQOVR235XGGJZSLMVNDLR4RHQ3K10M52YVQ4P
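Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. A minimal sketch of one alternative, assuming the keys are stored in environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical and not part of the original notebook:

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (names are assumptions)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(value, visible=4):
    """Show only the first few characters so the full secret never appears in output."""
    return value[:visible] + "*" * (len(value) - visible)
```

With this in place, a cell could call `CLIENT_ID, CLIENT_SECRET = load_credentials()` and print `mask(CLIENT_SECRET)` rather than the full value.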


In [23]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'

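geolocator = Nominatim(user_agent="toronto_explorer") # recent geopy versions require a user_agent; this value is an arbitrary placeholder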
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161


In [24]:
# Let's get the geographical coordinates of Brooklyn.
address = 'Brooklyn, NY'

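geolocator = Nominatim(user_agent="brooklyn_explorer") # recent geopy versions require a user_agent; this value is an arbitrary placeholder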
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


# VISUALIZATION 

### We visualize the data at several stages. First, we map the neighborhoods of each selected borough to confirm that their coordinates are correct. Then, after clustering the neighborhoods, we map the clusters so that we can name them. Naming the clusters matters because the names identify the kinds of places that characterize each cluster.

## (Before Clustering)

## Downtown Toronto

In [26]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# create a marker cluster object to group the neighborhood markers
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto

## Brooklyn

In [30]:
# create map of Brooklyn using latitude and longitude values
Brooklyn_map = folium.Map(location=[latitude, longitude], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(Brooklyn_map)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
Brooklyn_map

# ANALYSIS

### We analyze the neighborhoods of both boroughs by one-hot encoding the venue categories. From the one-hot encoding, we calculate the mean frequency of occurrence of each category and pick the top ten venue categories for each neighborhood
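As a rough illustration of that idea, here is a minimal sketch on a toy table. The rows are made up; only the column names `Neighborhood` and `Venue Category` follow the notebook:

```python
import pandas as pd

# Toy venue table (made-up rows, same column names as the notebook)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bar", "Park", "Park"],
})

# One-hot encode the category, then average per neighborhood:
# each value becomes the share of that neighborhood's venues in the category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
freq = onehot.groupby(venues["Neighborhood"]).mean()

# Top categories per neighborhood, most frequent first
top2 = {hood: row.sort_values(ascending=False).index[:2].tolist()
        for hood, row in freq.iterrows()}
print(freq)
print(top2)
```

Each row of `freq` sums to 1, so neighborhoods with different numbers of returned venues stay comparable.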

## Exploring Neighborhoods in Downtown Toronto

In [31]:
# Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
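The one-line lookup `requests.get(url).json()["response"]['groups'][0]['items']` raises an exception if a request returns no groups or a venue has no category. A minimal sketch of a more defensive parse step, assuming the same response layout that the function above indexes into; `parse_explore_items` is a hypothetical helper, and the payload is a made-up example rather than real API output:

```python
def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from an explore-style payload.

    Assumes the layout used above: payload["response"]["groups"][0]["items"],
    each item holding a "venue" dict. Missing pieces yield an empty result
    or a None category instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((venue.get("name"), location.get("lat"),
                     location.get("lng"), category))
    return rows


# Made-up payload for illustration only
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.65, "lng": -79.35},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "No Category Place",
               "location": {"lat": 43.66, "lng": -79.36},
               "categories": []}},
]}]}}
print(parse_explore_items(sample))
```

Keeping the parsing separate from the HTTP call also makes it possible to test the extraction logic without network access.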

In [32]:
# Run the above function on each neighborhood and create a new dataframe called downtown_toronto_venues.
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude'],
                                  )

Regent Park / Harbourfront
Queen's Park / Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond / Adelaide / King
Harbourfront East / Union Station / Toronto Islands
Toronto Dominion Centre / Design Exchange
Commerce Court / Victoria Hotel
University of Toronto / Harbord
Kensington Market / Chinatown / Grange Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst
Rosedale
Stn A PO Boxes
St. James Town / Cabbagetown
First Canadian Place / Underground city
Church and Wellesley


In [33]:
# Let's check the size of the resulting dataframe
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(343, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park / Harbourfront,43.650964,-79.353041,Souk Tabule,43.653756,-79.35439,Mediterranean Restaurant
1,Regent Park / Harbourfront,43.650964,-79.353041,Young Centre for the Performing Arts,43.650825,-79.357593,Performing Arts Venue
2,Regent Park / Harbourfront,43.650964,-79.353041,SOMA chocolatemaker,43.650622,-79.358127,Chocolate Shop
3,Regent Park / Harbourfront,43.650964,-79.353041,Cluny Bistro & Boulangerie,43.650565,-79.357843,French Restaurant
4,Regent Park / Harbourfront,43.650964,-79.353041,BATLgrounds,43.647088,-79.351306,Athletics & Sports


In [34]:
# Let's check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,20,20,20,20,20,20
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst,20,20,20,20,20,20
Central Bay Street,20,20,20,20,20,20
Christie,12,12,12,12,12,12
Church and Wellesley,20,20,20,20,20,20
Commerce Court / Victoria Hotel,20,20,20,20,20,20
First Canadian Place / Underground city,20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
Harbourfront East / Union Station / Toronto Islands,7,7,7,7,7,7
Kensington Market / Chinatown / Grange Park,20,20,20,20,20,20


In [35]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 120 unique categories.


## Analyzing Each Neighborhood

In [36]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beer Bar,Bookstore,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Distribution Center,Dumpling Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Gaming Cafe,Garden,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hobby Shop,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Noodle House,Opera House,Organic Grocery,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pizza Place,Playground,Plaza,Pub,Ramen Restaurant,Record Shop,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Spa,Speakeasy,Steakhouse,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Tech Startup,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park / Harbourfront,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
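In the output above, `Neighborhood` sits in the middle of the columns and `Yoga Studio` comes first. This suggests that one venue category is itself named "Neighborhood": the assignment then overwrites that category column in place, and the reorder step moves the last category to the front instead of the neighborhood name. A minimal sketch of one way to avoid the collision, on made-up rows; the `category` prefix is an assumption for illustration:

```python
import pandas as pd

# Made-up rows, including a venue whose category is literally "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Neighborhood", "Park", "Yoga Studio"],
})

# Prefix the dummy columns so no category can collide with 'Neighborhood'
onehot = pd.get_dummies(venues["Venue Category"], prefix="category", dtype=int)

# Insert the neighborhood name as the first column explicitly
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot.columns.tolist())
```

Using `DataFrame.insert` with position 0 avoids relying on where the new column happens to land.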


In [37]:
# Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [38]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
            venue  freq
0        Beer Bar  0.10
1  Farmers Market  0.10
2    Cocktail Bar  0.10
3    Liquor Store  0.05
4      Food Truck  0.05


----CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst----
                 venue  freq
0                 Park  0.10
1   Italian Restaurant  0.10
2  Peruvian Restaurant  0.05
3            Speakeasy  0.05
4          Coffee Shop  0.05


----Central Bay Street----
                       venue  freq
0                   Tea Room  0.10
1                Coffee Shop  0.10
2  Middle Eastern Restaurant  0.05
3            Bubble Tea Shop  0.05
4                 Steakhouse  0.05


----Christie----
           venue  freq
0           Café  0.25
1  Grocery Store  0.25
2           Park  0.08
3     Baby Store  0.08
4     Playground  0.08


----Church and Wellesley----
                venue  freq
0         Coffee Shop  0.10
1                Café  0.05
2    Ramen Restaurant  0.05
3          Restaurant  0.05
4  Salon /

In [39]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Cocktail Bar,Beer Bar,Farmers Market,Bakery,Liquor Store,Seafood Restaurant,Concert Hall,Café,Restaurant,Food Truck
1,CN Tower / King and Spadina / Railway Lands / ...,Italian Restaurant,Park,French Restaurant,Peruvian Restaurant,Hotel,Ramen Restaurant,Mexican Restaurant,Restaurant,Sandwich Place,Seafood Restaurant
2,Central Bay Street,Coffee Shop,Tea Room,Gastropub,Spa,Middle Eastern Restaurant,Clothing Store,Comic Shop,Chinese Restaurant,Japanese Restaurant,Dessert Shop
3,Christie,Café,Grocery Store,Athletics & Sports,Candy Store,Playground,Baby Store,Park,Coffee Shop,Cosmetics Shop,Concert Hall
4,Church and Wellesley,Coffee Shop,Ramen Restaurant,Dance Studio,Pizza Place,Ice Cream Shop,Breakfast Spot,Pub,Men's Store,Park,Restaurant
5,Commerce Court / Victoria Hotel,Café,Coffee Shop,Gastropub,Gym,Museum,Deli / Bodega,Japanese Restaurant,Pub,Ice Cream Shop,Hotel
6,First Canadian Place / Underground city,Restaurant,Coffee Shop,Café,Bakery,Pizza Place,Japanese Restaurant,Pub,Gym / Fitness Center,Gym,Steakhouse
7,"Garden District, Ryerson",Coffee Shop,Clothing Store,Sandwich Place,Movie Theater,Music Venue,Comic Shop,Japanese Restaurant,Café,Burrito Place,Plaza
8,Harbourfront East / Union Station / Toronto Is...,Harbor / Marina,Theme Park,Park,Farm,Vietnamese Restaurant,Creperie,Diner,Dessert Shop,Deli / Bodega,Dance Studio
9,Kensington Market / Chinatown / Grange Park,Café,Vietnamese Restaurant,Mexican Restaurant,Record Shop,Gaming Cafe,Farmers Market,Dumpling Restaurant,Dessert Shop,Coffee Shop,Noodle House
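The column labels above rely on an indexing error to fall back to the "th" suffix. A small sketch of an explicit helper that builds the same labels; `ordinal` is a hypothetical name, and it also covers cases such as 11th to 13th and 21st, which ten columns never reach:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```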


## Clustering Neighborhoods

In [41]:
# set number of clusters
kclusters = 5

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 0, 3, 0, 0, 0, 0, 1, 4], dtype=int32)
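The value `kclusters = 5` is fixed by hand. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. A minimal sketch on synthetic data (random blobs, not the notebook's venue frequencies); the data and the range of k are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic, well-separated groups of 2-D points
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for each candidate k and record the silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score is the best-separated clustering
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real data, `X` would be the frequency table passed to `KMeans` above, and k must stay below the number of neighborhoods.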

In [42]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
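downtown_toronto_merged = downtown_toronto_data.copy()  # copy so the source dataframe is not modified

# add clustering labels
# NOTE: kmeans.labels_ follows the row order of downtown_toronto_grouped (sorted by neighborhood name),
# while this dataframe keeps the CSV order, so this positional assignment can attach labels to the wrong neighborhoods
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_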

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Regent Park / Harbourfront,43.650964,-79.353041,4,Pub,Café,Athletics & Sports,Distribution Center,Performing Arts Venue,Seafood Restaurant,Bank,Bakery,Mediterranean Restaurant,Chocolate Shop
1,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.66179,-79.38939,4,Coffee Shop,Burger Joint,Fried Chicken Joint,Hobby Shop,Distribution Center,Discount Store,Juice Bar,Diner,Creperie,Middle Eastern Restaurant
2,Downtown Toronto,"Garden District, Ryerson",43.657491,-79.377529,0,Coffee Shop,Clothing Store,Sandwich Place,Movie Theater,Music Venue,Comic Shop,Japanese Restaurant,Café,Burrito Place,Plaza
3,Downtown Toronto,St. James Town,43.651734,-79.375554,3,Coffee Shop,Japanese Restaurant,Restaurant,Gastropub,Creperie,Italian Restaurant,Hotel,Church,Café,Gym
4,Downtown Toronto,Berczy Park,43.645196,-79.373855,0,Cocktail Bar,Beer Bar,Farmers Market,Bakery,Liquor Store,Seafood Restaurant,Concert Hall,Café,Restaurant,Food Truck
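`kmeans.labels_` follows the row order of `downtown_toronto_grouped`, which `groupby` sorts by neighborhood name, while `downtown_toronto_data` keeps the original CSV order. A minimal sketch, on made-up rows, of attaching the labels to the grouped names first and then merging on `Neighborhood`, so each label stays with its neighborhood; the toy frames and label values are illustrative only:

```python
import pandas as pd

# Original order (as in the geo CSV) and grouped order (sorted by name)
data = pd.DataFrame({"Neighborhood": ["Regent Park", "Berczy Park", "Christie"],
                     "Latitude": [43.65, 43.64, 43.67]})
grouped = pd.DataFrame({"Neighborhood": ["Berczy Park", "Christie", "Regent Park"]})
labels = [0, 1, 2]  # stand-in for kmeans.labels_, one per grouped row

# Pair each label with the neighborhood it was computed for
labelled = grouped[["Neighborhood"]].assign(**{"Cluster Labels": labels})

# Merge by name; a left join keeps neighborhoods that returned no venues (NaN label)
merged = data.merge(labelled, on="Neighborhood", how="left")
print(merged)
```

Merging by name also copes with the case where a neighborhood returns no venues and is therefore missing from the grouped table.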


In [43]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

### Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster
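One way to find the categories that distinguish a cluster is to average the frequency vectors of its member neighborhoods and list the largest entries. A minimal sketch on made-up frequencies; the frame `grouped`, its values, and the labels are assumptions, not the notebook's data:

```python
import pandas as pd

# Made-up per-neighborhood category frequencies
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Coffee Shop": [0.50, 0.40, 0.00, 0.10],
    "Park":        [0.10, 0.20, 0.70, 0.60],
    "Hotel":       [0.40, 0.40, 0.30, 0.30],
})
labels = pd.Series([0, 0, 1, 1], name="Cluster Labels")

# Mean frequency of every category within each cluster
profile = grouped.drop(columns="Neighborhood").groupby(labels).mean()

# Top two categories per cluster, most frequent first
top = {cluster: row.nlargest(2).index.tolist() for cluster, row in profile.iterrows()}
print(profile)
print(top)
```

A profile like this gives a quantitative basis for the cluster names instead of reading them off the top-ten tables by eye.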

## Cluster 0 (Coffee Shop, Cafe, Restaurants & Grocery Store)

In [44]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Garden District, Ryerson",Coffee Shop,Clothing Store,Sandwich Place,Movie Theater,Music Venue,Comic Shop,Japanese Restaurant,Café,Burrito Place,Plaza
4,Berczy Park,Cocktail Bar,Beer Bar,Farmers Market,Bakery,Liquor Store,Seafood Restaurant,Concert Hall,Café,Restaurant,Food Truck
5,Central Bay Street,Coffee Shop,Tea Room,Gastropub,Spa,Middle Eastern Restaurant,Clothing Store,Comic Shop,Chinese Restaurant,Japanese Restaurant,Dessert Shop
6,Christie,Café,Grocery Store,Athletics & Sports,Candy Store,Playground,Baby Store,Park,Coffee Shop,Cosmetics Shop,Concert Hall
7,Richmond / Adelaide / King,Asian Restaurant,Speakeasy,Bar,Restaurant,Gym / Fitness Center,Plaza,Seafood Restaurant,Lounge,Café,Steakhouse
10,Commerce Court / Victoria Hotel,Café,Coffee Shop,Gastropub,Gym,Museum,Deli / Bodega,Japanese Restaurant,Pub,Ice Cream Shop,Hotel
11,University of Toronto / Harbord,Bookstore,Park,Café,Japanese Restaurant,Yoga Studio,Museum,College Arts Building,Comfort Food Restaurant,Dessert Shop,Italian Restaurant
14,Rosedale,Park,Grocery Store,Playground,Candy Store,Vietnamese Restaurant,Discount Store,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall
17,First Canadian Place / Underground city,Restaurant,Coffee Shop,Café,Bakery,Pizza Place,Japanese Restaurant,Pub,Gym / Fitness Center,Gym,Steakhouse


## Cluster 1 (Harbor, Park)

In [45]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Harbourfront East / Union Station / Toronto Is...,Harbor / Marina,Theme Park,Park,Farm,Vietnamese Restaurant,Creperie,Diner,Dessert Shop,Deli / Bodega,Dance Studio


## Cluster 2 (Restaurant, Park, Hotel)

In [46]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,CN Tower / King and Spadina / Railway Lands / ...,Italian Restaurant,Park,French Restaurant,Peruvian Restaurant,Hotel,Ramen Restaurant,Mexican Restaurant,Restaurant,Sandwich Place,Seafood Restaurant


## Cluster 3 (Coffee Shop, Cafe, Park & Japanese Restaurant)

In [47]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Coffee Shop,Japanese Restaurant,Restaurant,Gastropub,Creperie,Italian Restaurant,Hotel,Church,Café,Gym


## Cluster 4 (Seafood, Steakhouse, Hotel & Cafe)

In [48]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Regent Park / Harbourfront,Pub,Café,Athletics & Sports,Distribution Center,Performing Arts Venue,Seafood Restaurant,Bank,Bakery,Mediterranean Restaurant,Chocolate Shop
1,Queen's Park / Ontario Provincial Government,Coffee Shop,Burger Joint,Fried Chicken Joint,Hobby Shop,Distribution Center,Discount Store,Juice Bar,Diner,Creperie,Middle Eastern Restaurant
9,Toronto Dominion Centre / Design Exchange,Coffee Shop,Café,Gastropub,Bakery,Deli / Bodega,Japanese Restaurant,Pub,Hotel,Gym / Fitness Center,Gym
12,Kensington Market / Chinatown / Grange Park,Café,Vietnamese Restaurant,Mexican Restaurant,Record Shop,Gaming Cafe,Farmers Market,Dumpling Restaurant,Dessert Shop,Coffee Shop,Noodle House
15,Stn A PO Boxes,Hotel,Concert Hall,Coffee Shop,Sushi Restaurant,Café,Brazilian Restaurant,Restaurant,Opera House,Speakeasy,Burrito Place
16,St. James Town / Cabbagetown,Restaurant,Café,Gift Shop,General Entertainment,Farm,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Diner
18,Church and Wellesley,Coffee Shop,Ramen Restaurant,Dance Studio,Pizza Place,Ice Cream Shop,Breakfast Spot,Pub,Men's Store,Park,Restaurant


## Exploring Neighborhoods in Brooklyn

In [49]:
# Let's create a function to repeat the same process for all the neighborhoods in Brooklyn
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [51]:
# Run the above function on each neighborhood and create a new dataframe called brooklyn_venues
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude'],
                                  )

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [52]:
# Let's check how many venues were returned for each neighborhood
brooklyn_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bath Beach,20,20,20,20,20,20
Bay Ridge,20,20,20,20,20,20
Bedford Stuyvesant,20,20,20,20,20,20
Bensonhurst,20,20,20,20,20,20
Bergen Beach,6,6,6,6,6,6
Boerum Hill,20,20,20,20,20,20
Borough Park,20,20,20,20,20,20
Brighton Beach,20,20,20,20,20,20
Broadway Junction,15,15,15,15,15,15
Brooklyn Heights,20,20,20,20,20,20


In [53]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(brooklyn_venues['Venue Category'].unique())))

There are 231 unique categories.


## Analyzing Brooklyn

In [54]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")

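# add neighborhood column back to dataframe
# (the venue categories include one literally named 'Neighborhood', so this
# overwrites that dummy column in place instead of appending a new last column)
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 

# move neighborhood column to the first column
# (because of the name clash above, columns[-1] is 'Yoga Studio' here, so that is the
# column that moves; the later groupby('Neighborhood') still puts 'Neighborhood' first)
fixed_columns = [brooklyn_onehot.columns[-1]] + list(brooklyn_onehot.columns[:-1])
brooklyn_onehot = brooklyn_onehot[fixed_columns]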

brooklyn_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Airport Terminal,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Garden,Beer Store,Big Box Store,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Bus Line,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Electronics Store,Entertainment Service,Event Service,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,History Museum,Home Service,Hookah Bar,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Intersection,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Lighthouse,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Non-Profit,Noodle House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Outlet Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Print Shop,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Turkish Restaurant,Used Bookstore,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [57]:
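# Group rows by neighborhood and take the mean of each one-hot column,
# i.e. the share of each neighborhood's venues that falls in each category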
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
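
To see what this grouping step computes, here is a small sketch on made-up data (the neighborhoods `A` and `B` and their venues are hypothetical). Averaging a 0/1 dummy column within a neighborhood gives the fraction of that neighborhood's venues in the category, so every row of the grouped table sums to 1.

```python
import pandas as pd

# six made-up venues in two made-up neighborhoods
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Café', 'Park', 'Park', 'Park'],
})

# one column per category, 1.0 where the venue belongs to it
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix="", prefix_sep="").astype(float)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# mean of the dummies per neighborhood = category share per neighborhood
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Neighborhood `A` has two bars out of four venues, so its `Bar` frequency is 0.5; neighborhood `B` has only parks, so its `Park` frequency is 1.0.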

In [58]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in brooklyn_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = brooklyn_grouped[brooklyn_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bath Beach----
                venue  freq
0     Bubble Tea Shop  0.10
1  Italian Restaurant  0.10
2        Dessert Shop  0.05
3  Turkish Restaurant  0.05
4        Burger Joint  0.05


----Bay Ridge----
                       venue  freq
0              Grocery Store  0.10
1           Greek Restaurant  0.10
2                        Spa  0.10
3                 Bagel Shop  0.05
4  Middle Eastern Restaurant  0.05


----Bedford Stuyvesant----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.10
2                  Bar  0.10
3            BBQ Joint  0.05
4  Fried Chicken Joint  0.05


----Bensonhurst----
                venue  freq
0      Ice Cream Shop  0.10
1          Donut Shop  0.10
2  Italian Restaurant  0.10
3    Sushi Restaurant  0.10
4   Hotpot Restaurant  0.05


----Bergen Beach----
                venue  freq
0     Harbor / Marina  0.33
1      Baseball Field  0.17
2  Athletics & Sports  0.17
3          Playground  0.17
4          Donut Shop  0.17

In [55]:
# Helper that returns the most common venue categories for one row of brooklyn_grouped
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [60]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Italian Restaurant,Bubble Tea Shop,Peruvian Restaurant,Sushi Restaurant,Restaurant,Rental Car Location,Burger Joint,Bakery,German Restaurant,Coffee Shop
1,Bay Ridge,Spa,Grocery Store,Greek Restaurant,Bagel Shop,Breakfast Spot,Sports Bar,Lounge,Bookstore,Caucasian Restaurant,Taco Place
2,Bedford Stuyvesant,Coffee Shop,Bar,Café,Juice Bar,Gourmet Shop,Cocktail Bar,New American Restaurant,Gift Shop,Bagel Shop,BBQ Joint
3,Bensonhurst,Donut Shop,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Grocery Store,Liquor Store,Bakery,Bagel Shop,Cosmetics Shop,Shabu-Shabu Restaurant
4,Bergen Beach,Harbor / Marina,Playground,Athletics & Sports,Donut Shop,Baseball Field,Event Service,Flower Shop,Fish Market,Fish & Chips Shop,Field
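
The printed top-5 lists and the top-10 table order tied categories differently (for example Bath Beach's Bubble Tea Shop and Italian Restaurant, both at 0.10), presumably because `sort_values` does not use a stable sort by default. The following is a sketch of a vectorized alternative that breaks ties by column order; the function name `top_categories`, the `Top N` column labels and the toy frequencies are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# toy frequency table in the same layout as brooklyn_grouped
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar':  [0.50, 0.00],
    'Café': [0.25, 0.10],
    'Park': [0.25, 0.90],
})

def top_categories(grouped, n):
    """Return the n most frequent categories per neighborhood, ties in column order."""
    freqs = grouped.set_index('Neighborhood')
    # stable sort on negated frequencies: descending order, ties keep column order
    order = np.argsort(-freqs.to_numpy(), axis=1, kind='stable')[:, :n]
    names = freqs.columns.to_numpy()[order]
    cols = ['Top {}'.format(i + 1) for i in range(n)]
    return pd.DataFrame(names, index=freqs.index, columns=cols).reset_index()

toy_top = top_categories(toy_grouped, 2)
print(toy_top)
```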


## CLUSTERING NEIGHBORHOODS

### Next, we apply k-means clustering, an unsupervised machine learning technique, to group neighborhoods with similar venue profiles. Looking at the clusters from a tourist's perspective makes it easier to spot which groups of neighborhoods concentrate tourist-oriented venues.

## Brooklyn

In [70]:
# Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

brooklyn_grouped_clustering = brooklyn_grouped.drop('Neighborhood', axis=1)
#print(brooklyn_grouped_clustering.head())
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 4, 0, 0, 4, 0, 4, 2, 4], dtype=int32)
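
To make concrete what a k-means fit does with the frequency vectors, here is a minimal NumPy sketch of Lloyd's algorithm on toy data. It is not scikit-learn's implementation (which, among other things, uses k-means++ initialization by default); the function name `lloyd_kmeans`, the naive "first k rows" initialization and the toy matrix are assumptions for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=20):
    """Plain Lloyd's algorithm; initial centroids are simply the first k rows."""
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid moves to the mean of its assigned rows
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# toy frequency vectors, columns = (Bar, Café, Park)
X = np.array([
    [0.8, 0.2, 0.0],   # nightlife-heavy
    [0.0, 0.1, 0.9],   # park-heavy
    [0.7, 0.3, 0.0],   # nightlife-heavy
    [0.1, 0.0, 0.9],   # park-heavy
])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)      # rows 0 and 2 share one label, rows 1 and 3 the other
print(centroids)
```

Each label is only an arbitrary cluster id: rows with the same label have similar venue profiles, but the numbers themselves carry no ranking.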

In [62]:
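# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
brooklyn_merged = brooklyn_data

# add clustering labels
# caution: kmeans.labels_ follows the row order of brooklyn_grouped (alphabetical by
# neighborhood), while brooklyn_data keeps its original order, so this positional
# assignment pairs each label with the right neighborhood only if the two orders match
brooklyn_merged['Cluster Labels'] = kmeans.labels_

# join the top-venue table onto brooklyn_data (which carries latitude/longitude) by neighborhood
brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')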

brooklyn_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brooklyn,Bay Ridge,40.625801,-74.030621,4,Spa,Grocery Store,Greek Restaurant,Bagel Shop,Breakfast Spot,Sports Bar,Lounge,Bookstore,Caucasian Restaurant,Taco Place
1,Brooklyn,Bensonhurst,40.611009,-73.99518,4,Donut Shop,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Grocery Store,Liquor Store,Bakery,Bagel Shop,Cosmetics Shop,Shabu-Shabu Restaurant
2,Brooklyn,Sunset Park,40.645103,-74.010316,4,Pizza Place,Latin American Restaurant,Mexican Restaurant,Bank,Bakery,Fried Chicken Joint,Mobile Phone Shop,Italian Restaurant,Grocery Store,Breakfast Spot
3,Brooklyn,Greenpoint,40.730201,-73.954241,0,Bar,Mexican Restaurant,Café,Yoga Studio,Gymnastics Gym,Grocery Store,Laundry Service,Furniture / Home Store,French Restaurant,Pizza Place
4,Brooklyn,Gravesend,40.59526,-73.973471,0,Pizza Place,Lounge,Bakery,Deli / Bodega,Donut Shop,Fish Market,Martial Arts Dojo,Furniture / Home Store,Baseball Field,Eastern European Restaurant
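
The cluster labels come out in the row order of the grouped table, which is sorted by neighborhood name, while the neighborhood data keeps its own order. A minimal sketch of attaching labels by name rather than by position, using toy stand-ins (`toy_data`, `toy_grouped` and `toy_labels` are hypothetical and much smaller than the real tables):

```python
import pandas as pd

# toy stand-ins: data in its original order, grouped table in alphabetical order
toy_data = pd.DataFrame({'Neighborhood': ['Sunset Park', 'Bay Ridge', 'Greenpoint']})
toy_grouped = pd.DataFrame({'Neighborhood': ['Bay Ridge', 'Greenpoint', 'Sunset Park']})
toy_labels = [0, 1, 2]   # pretend k-means output, one label per row of toy_grouped

# attach each label to the neighborhood name it was computed for ...
label_table = pd.DataFrame({'Neighborhood': toy_grouped['Neighborhood'],
                            'Cluster Labels': toy_labels})

# ... then merge by name, so differing row orders no longer matter
toy_merged = toy_data.merge(label_table, on='Neighborhood', how='left')
print(toy_merged)
```

With a positional assignment, Sunset Park would have received label 0 here; merging on the name gives it label 2, the one computed for it.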


In [63]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## EXAMINE CLUSTERS

### Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.
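
One way to find discriminating categories more systematically than by reading the tables is to compare each cluster's average category frequency with the overall average. The following is a sketch on toy data; the names `profiles`, `excess` and `defining`, and the numbers, are assumptions for illustration.

```python
import pandas as pd

# toy frequency table and toy cluster labels (one label per row)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Bar':  [0.6, 0.5, 0.0, 0.1],
    'Café': [0.3, 0.4, 0.1, 0.0],
    'Park': [0.1, 0.1, 0.9, 0.9],
})
toy_labels = pd.Series([0, 0, 1, 1], name='Cluster Labels')

freqs = toy_grouped.drop(columns='Neighborhood')

# average category frequency within each cluster
profiles = freqs.groupby(toy_labels).mean()

# how far each cluster sits above or below the overall average per category
excess = profiles - freqs.mean()

# the most over-represented category in each cluster
defining = excess.idxmax(axis=1)
print(defining)
```

Categories with a large positive `excess` are candidates for naming a cluster; categories that are common everywhere end up near zero and do not distinguish anything.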

## Brooklyn

### Lounge, Bar, Restaurants and Cafes

In [67]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Greenpoint,Bar,Mexican Restaurant,Café,Yoga Studio,Gymnastics Gym,Grocery Store,Laundry Service,Furniture / Home Store,French Restaurant,Pizza Place
4,Gravesend,Pizza Place,Lounge,Bakery,Deli / Bodega,Donut Shop,Fish Market,Martial Arts Dojo,Furniture / Home Store,Baseball Field,Eastern European Restaurant
6,Sheepshead Bay,Dessert Shop,Turkish Restaurant,Sandwich Place,Outlet Store,Russian Restaurant,Karaoke Bar,Buffet,Restaurant,Yoga Studio,Grocery Store
14,Brownsville,Restaurant,Moving Target,Chinese Restaurant,Park,Playground,Spanish Restaurant,Trail,Bus Station,Discount Store,Pool
18,Brooklyn Heights,Yoga Studio,Pet Store,Thai Restaurant,Pizza Place,Playground,History Museum,Diner,Cosmetics Shop,Coffee Shop,Mexican Restaurant
19,Cobble Hill,Cocktail Bar,Bar,Italian Restaurant,Playground,Japanese Restaurant,Pilates Studio,Restaurant,Seafood Restaurant,Men's Store,Deli / Bodega
33,Bath Beach,Italian Restaurant,Bubble Tea Shop,Peruvian Restaurant,Sushi Restaurant,Restaurant,Rental Car Location,Burger Joint,Bakery,German Restaurant,Coffee Shop
36,Gerritsen Beach,Pizza Place,Bar,Deli / Bodega,Cocktail Bar,Skating Rink,Skate Park,Department Store,Seafood Restaurant,Café,Event Space
39,Sea Gate,Beach,Spa,Bus Line,Bus Station,Lighthouse,Factory,Food,Flower Shop,Fish Market,Fish & Chips Shop
40,Downtown,Coffee Shop,Grocery Store,Yoga Studio,Gift Shop,Diner,Creperie,Chinese Restaurant,Sandwich Place,Shopping Mall,Movie Theater


### Restaurants

In [72]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
47,Prospect Park South,Caribbean Restaurant,Mexican Restaurant,Latin American Restaurant,Grocery Store,Department Store,Clothing Store,Food Truck,Electronics Store,Bar,Miscellaneous Shop


### Stores and restaurants

In [73]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Flatbush,Coffee Shop,Mexican Restaurant,Pharmacy,Caribbean Restaurant,Plaza,Lounge,Sandwich Place,Chinese Restaurant,Bank,Bagel Shop
12,Windsor Terrace,Café,Diner,Grocery Store,Bar,Italian Restaurant,Bookstore,Chinese Restaurant,Beer Store,Sushi Restaurant,Coffee Shop
17,Bedford Stuyvesant,Coffee Shop,Bar,Café,Juice Bar,Gourmet Shop,Cocktail Bar,New American Restaurant,Gift Shop,Bagel Shop,BBQ Joint
24,Park Slope,Bagel Shop,Yoga Studio,Toy / Game Store,Cosmetics Shop,Coffee Shop,Organic Grocery,Burger Joint,Sporting Goods Shop,Frozen Yogurt Shop,Furniture / Home Store
25,Cypress Hills,Fried Chicken Joint,Pizza Place,Fast Food Restaurant,Ice Cream Shop,Latin American Restaurant,Women's Store,Bank,Supermarket,Liquor Store,Gas Station
27,Starrett City,Pizza Place,Bus Stop,Chinese Restaurant,Caribbean Restaurant,Donut Shop,Shopping Mall,Cosmetics Shop,Bus Station,American Restaurant,Pharmacy
28,Canarsie,Chinese Restaurant,Caribbean Restaurant,Grocery Store,Gym,Asian Restaurant,Falafel Restaurant,Event Space,Factory,Women's Store,Event Service
29,Flatlands,Pharmacy,Fast Food Restaurant,Fried Chicken Joint,Caribbean Restaurant,Deli / Bodega,Paper / Office Supplies Store,Lounge,Bar,Dry Cleaner,Electronics Store
56,Rugby,Caribbean Restaurant,Bank,Grocery Store,Pharmacy,Mobile Phone Shop,Supermarket,Seafood Restaurant,Fried Chicken Joint,Sandwich Place,Salon / Barbershop
58,New Lots,Pizza Place,Pharmacy,Metro Station,Fried Chicken Joint,Grocery Store,Discount Store,Park,Chinese Restaurant,Breakfast Spot,Furniture / Home Store


### Center Activity

In [74]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,East Flatbush,Department Store,Food & Drink Shop,Moving Target,Park,Caribbean Restaurant,Print Shop,Supermarket,Hardware Store,Pharmacy,Fast Food Restaurant
32,Coney Island,Park,Caribbean Restaurant,Beach,Baseball Stadium,Brewery,Theme Park Ride / Attraction,Cosmetics Shop,Music Venue,Café,Monument / Landmark
38,Clinton Hill,Thai Restaurant,Italian Restaurant,Yoga Studio,Bar,Pet Store,Pizza Place,Diner,Convenience Store,Pub,Restaurant
48,Georgetown,Bank,Pharmacy,Bagel Shop,Miscellaneous Shop,Pet Store,Sandwich Place,Supermarket,Supplement Shop,Frozen Yogurt Shop,Mexican Restaurant
52,Ocean Parkway,Playground,Sake Bar,Supermarket,Steakhouse,Restaurant,Donut Shop,Bus Station,Liquor Store,Jewish Restaurant,General Entertainment
66,Homecrest,Chinese Restaurant,Donut Shop,Sandwich Place,Bank,American Restaurant,Pizza Place,Vietnamese Restaurant,Tattoo Parlor,Café,Mexican Restaurant


### Going Out Places

In [75]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bay Ridge,Spa,Grocery Store,Greek Restaurant,Bagel Shop,Breakfast Spot,Sports Bar,Lounge,Bookstore,Caucasian Restaurant,Taco Place
1,Bensonhurst,Donut Shop,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Grocery Store,Liquor Store,Bakery,Bagel Shop,Cosmetics Shop,Shabu-Shabu Restaurant
2,Sunset Park,Pizza Place,Latin American Restaurant,Mexican Restaurant,Bank,Bakery,Fried Chicken Joint,Mobile Phone Shop,Italian Restaurant,Grocery Store,Breakfast Spot
5,Brighton Beach,Russian Restaurant,Varenyky restaurant,Beach,Korean Restaurant,Eastern European Restaurant,Supermarket,Non-Profit,Restaurant,Sushi Restaurant,Other Great Outdoors
7,Manhattan Terrace,Pizza Place,Donut Shop,Ice Cream Shop,Mobile Phone Shop,Jazz Club,Organic Grocery,Coffee Shop,Steakhouse,Bank,Convenience Store
9,Crown Heights,Pizza Place,Museum,Café,Pharmacy,Supermarket,Fried Chicken Joint,Burger Joint,Candy Store,Bakery,Bagel Shop
11,Kensington,Thai Restaurant,Pizza Place,Ice Cream Shop,Grocery Store,Japanese Restaurant,Lingerie Store,Mexican Restaurant,Liquor Store,Bakery,Bagel Shop
13,Prospect Heights,Bar,Cocktail Bar,Café,Yoga Studio,Pizza Place,Playground,Coffee Shop,Caribbean Restaurant,Cajun / Creole Restaurant,Mexican Restaurant
15,Williamsburg,Bar,Coffee Shop,Yoga Studio,Latin American Restaurant,Taco Place,Tapas Restaurant,Event Space,Steakhouse,Liquor Store,Bagel Shop
16,Bushwick,Mexican Restaurant,Bar,Thrift / Vintage Store,Coffee Shop,French Restaurant,Sandwich Place,Bagel Shop,Latin American Restaurant,Italian Restaurant,Used Bookstore


# RESULTS

### After clustering the neighborhoods of each area, both areas turn out to have similar kinds of tourist-oriented venues, such as cafes, food places, clubs, museums, parks, department stores, stadiums, and harbors.
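
A similarity claim like this could be backed by a number, for instance the cosine similarity between the category-share vectors of the two areas. The sketch below uses made-up shares purely to show the computation; `area_a` and `area_b` are hypothetical and are not results from this analysis.

```python
import numpy as np
import pandas as pd

# made-up category shares for two areas (illustrative only, not results)
area_a = pd.Series({'Café': 0.30, 'Bar': 0.20, 'Park': 0.10, 'Theater': 0.40})
area_b = pd.Series({'Café': 0.25, 'Bar': 0.25, 'Park': 0.20, 'Pizza Place': 0.30})

# align on the union of categories, treating absent categories as 0
a, b = area_a.align(area_b, join='outer', fill_value=0.0)

# cosine similarity: 1 means identical profiles, 0 means no shared categories
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(round(cosine, 3))
```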

# Observations & Recommendations

### Downtown Toronto seems more appealing to visitors based on its most common venues, many of which are tourist attractions. Brooklyn has every kind of tourist attraction as well and is in some ways even more diverse than Downtown Toronto, but its footprint indicates that the most common venues are restaurants, which points to a predominantly residential lifestyle.

### Therefore, I recommend Downtown Toronto for a short visit, since its most common venues suit tourists: theaters, opera houses, food places, clubs, museums, parks, etc. However, you cannot go wrong with Brooklyn and its many restaurants, stores, museums, parks, bridges, etc., especially for a longer stay.