# Capstone Final Project

## Introduction

Toronto and Mumbai are the financial capitals of Canada and India, respectively. The two cities differ in many respects, such as demography, climate and culture. At the same time, both are prominent tourist destinations because they are diverse and multicultural and offer a wide range of experiences. In this study we group the neighbourhoods of Toronto and Mumbai to compare their similarities and differences.

## Business Problem

Our main aim is to group the neighbourhoods of Toronto and Mumbai to help stakeholders make informed decisions when planning to travel or relocate to either city. There are two types of stakeholders in this case: a) Tourists and travel agents, who can browse the venues, categories and locations and plan according to the experience they want. b) Migrants, that is, people who want to migrate or relocate to these cities, who can find details about their chosen neighbourhood, such as department stores, banks, parks and medical shops.

## Data

We obtained neighbourhood data, including postal codes, latitude and longitude, for both cities from public websites. We also used the FourSquare API to get the venues in each neighbourhood. Our data sources are listed below:

a) Toronto Data: The Wikipedia page https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969 lists the boroughs and neighbourhoods of Toronto along with their postal codes. We obtained the latitude and longitude of each postal code from https://cocl.us/Geospatial_data .

b) Mumbai Data: We obtained the neighbourhoods of Mumbai along with their postal codes, latitude and longitude from the website https://geographic.org/streetview/india/maharashtra/konkan/mumbai.html . After cleaning and sorting, we saved the data in Excel format at the GitHub location https://github.com/omarjlinaresh/Coursera_Capstone/blob/main/Mumbai%20Neighbourhood.xls .

c) FourSquare API Data: To obtain the venues of a neighbourhood, we used the FourSquare API. FourSquare is a location data provider with information on venues and events within an area, including venue name, category, reviews and images. We obtained the list of venues within 500 metres of each neighbourhood's coordinates. We then arranged the data by venue name and category for exploratory data analysis and clustering of the neighbourhoods, and used the same data to find the top 5 venue categories for each neighbourhood.

## Methodology & Coding

### Grouping the Neighbourhoods of Toronto

In [1]:
import requests
import pandas as pd
import numpy as np

In [2]:
# the current revision of the Wikipedia page uses a different format for the postal code table, so we use an older revision
url = "https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969"
old_url = requests.get(url)
old_url

<Response [200]>

In [3]:
# parse every HTML table on the page into a list of pandas DataFrames
raw_data = pd.read_html(old_url.text)
raw_data

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z      Not assigned   
 176         M6Z      Not assigned   
 177         M7Z      Not assigned   
 178         M8Z         Etobicoke   
 179         M9Z      Not assigned   
 
                                          Neighbourhood  
 0                                         Not assigned  
 1                                         Not assigned  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                       Not assigned  
 176                                       Not assigned  
 177                

In [4]:
# keep only the first table in the list, which holds the postal codes
raw_data = raw_data[0]
raw_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [5]:
# we will remove the not assigned rows and reset the index
df = raw_data[raw_data["Borough"] != "Not assigned"]
df = df.reset_index()
df.head()

Unnamed: 0,index,Postal Code,Borough,Neighbourhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,5,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [6]:
# removing the old index
df.drop(['index'], axis = 'columns', inplace = True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
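
The filter, reset and drop steps in cells 5 and 6 can also be written as one call. A minimal sketch on a small hand-made frame (the three rows are copied from the tables above; `toy` and `cleaned` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Three rows copied from the scraped table above, used only for illustration.
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# drop=True discards the old index instead of adding it as an 'index' column,
# so no separate drop() call is needed afterwards.
cleaned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(cleaned)
```

The result has a fresh 0-based index and the same three columns as the input.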


In [7]:
df.shape

(103, 3)

In [8]:
# now we will download latitude & longitude of all the postal codes, using the link below
data = pd.read_csv("https://cocl.us/Geospatial_data")
data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
data.shape

(103, 3)

In [10]:
# both datasets have 103 postal codes, so we assume they cover the same codes and join them on 'Postal Code'
combined_data = df.join(data.set_index('Postal Code'), on='Postal Code', how='inner')
combined_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [11]:
combined_data.shape

(103, 5)
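
Equal row counts alone do not prove that the two tables hold the same postal codes. In the run above the inner join kept all 103 rows, which is consistent with the assumption. A more explicit check, sketched here on toy data, is an outer merge with `indicator=True`. The frames `left` and `right` are stand-ins for `df` and `data`, and the `M9Z` row with zero coordinates is a deliberately mismatched placeholder, not real data.

```python
import pandas as pd

# Toy stand-ins for df and data; the real frames have 103 rows each.
left = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M5A"],
                     "Borough": ["North York", "North York", "Downtown Toronto"]})
right = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"],   # M9Z: placeholder mismatch
                      "Latitude": [43.753259, 43.725882, 0.0],
                      "Longitude": [-79.329656, -79.315572, 0.0]})

# An outer merge with indicator=True adds a '_merge' column that says whether
# each postal code was found in both tables, only the left, or only the right.
check = left.merge(right, on="Postal Code", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Postal Code", "_merge"]])
```

An empty `unmatched` frame on the real data would confirm that every postal code has coordinates.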

In [12]:
from geopy.geocoders import Nominatim

In [13]:
# getting the lat & lon of Toronto using geocoder
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The coordinates of Toronto are 43.6534817, -79.3839347.


In [14]:
import folium

In [15]:
# Creating the map of Toronto
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers to map
# (the loop variables are named lat/lng so that the city-centre latitude/longitude set above are not overwritten)
for lat, lng, borough, neighbourhood in zip(combined_data['Latitude'], combined_data['Longitude'], combined_data['Borough'], combined_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Toronto)  
    
map_Toronto

In [16]:
CLIENT_ID = 'X44MXRRXXIV00CMCETPJWZYUGN435BVLP4MECN0VUDBFMQRG' 
CLIENT_SECRET = 'HVFUVOSULDT4MWDJOXBSLXTO1GFC3F4ZWKHRAR41Q0RGSY2T'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: X44MXRRXXIV00CMCETPJWZYUGN435BVLP4MECN0VUDBFMQRG
CLIENT_SECRET:HVFUVOSULDT4MWDJOXBSLXTO1GFC3F4ZWKHRAR41Q0RGSY2T
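
Hard-coding and printing API keys in a shared notebook exposes them to anyone who reads it. One common alternative, sketched here under assumptions, is to read them from environment variables. The helper name `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names below are hypothetical; use whatever suits your setup.
    """
    client_id = environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret
```

With the variables set in the shell, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two literal assignments in cell 16.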


In [17]:
# we are creating a function to get all the venue categories for all the neighbourhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
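
The list comprehension in `getNearbyVenues` assumes that every response contains `response -> groups[0] -> items` and that every venue has at least one category; a missing key or an empty category list would raise an exception part-way through the loop. The sketch below shows one defensive way to parse an already-decoded response. The helper `extract_venue_rows` and the hand-made `fake_payload` are hypothetical; the payload mirrors only the fields that the function above reads.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one decoded explore response into (name, lat, lng, venue, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        # Skip venues without a category rather than raising IndexError.
        if not categories:
            continue
        rows.append((name, lat, lng, venue.get("name"), categories[0]["name"]))
    return rows


# Hand-made payload with the same nesting that getNearbyVenues indexes into.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park", "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised Place", "categories": []}},
]}]}}

parsed = extract_venue_rows("Parkwoods", 43.753259, -79.329656, fake_payload)
empty = extract_venue_rows("Parkwoods", 43.753259, -79.329656, {"response": {}})
print(parsed)
print(empty)
```

A neighbourhood with no venues then yields an empty list instead of stopping the whole collection run.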

In [18]:
# collecting the venues in Toronto for all neighbourhoods, within a 500 m radius
venues_in_toronto = getNearbyVenues(combined_data['Neighbourhood'], combined_data['Latitude'], combined_data['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [19]:
venues_in_toronto.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant


In [20]:
venues_in_toronto.shape

(1326, 5)

In [21]:
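# for each venue category, show the maximum of every other column (for text columns, the alphabetically last value); note that this is not a frequency count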
venues_in_toronto.groupby('Venue Category').max()

Venue Category,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Accessories Store,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Ardene Shoes Outlet
Adult Boutique,Church and Wellesley,43.665860,-79.383160,Seduction
Airport,Downsview,43.737473,-79.394420,Toronto Downsview Airport (YZD)
Airport Food Court,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café
Airport Gate,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8
...,...,...,...,...
Wine Bar,"Little Portugal, Trinity",43.653206,-79.400049,Paris Paris Bar
Wine Shop,"Dufferin, Dovercourt Village",43.669005,-79.442259,Macedo
Wings Joint,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Wingporium
Women's Store,Caledonia-Fairbanks,43.689026,-79.453512,Maximum Woman
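
`groupby('Venue Category').max()` returns the largest value of each column per category, so it does not show how often a category occurs. If a frequency count is the goal, `value_counts()` is one option. A minimal sketch on made-up categories (the `sample` frame is illustrative, not data from the notebook):

```python
import pandas as pd

# Made-up venue categories, used only to show the counting step.
sample = pd.DataFrame({
    "Venue Category": ["Park", "Coffee Shop", "Coffee Shop", "Park", "Coffee Shop", "Bus Stop"],
})

# value_counts() returns the number of rows per category,
# sorted from most to least frequent.
category_counts = sample["Venue Category"].value_counts()
print(category_counts.head(3))
```

On the real data the equivalent call would be `venues_in_toronto['Venue Category'].value_counts()`.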


In [22]:
# now we will get dummies (one 0/1 column per category) for all the venue categories

toronto_venue_cat = pd.get_dummies(venues_in_toronto[['Venue Category']], prefix="", prefix_sep="")
toronto_venue_cat

Unnamed: 0,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1321,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1322,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1323,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1324,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
# Adding the Neighborhood column in the encoded dataset

toronto_venue_cat['Neighbourhood'] = venues_in_toronto['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [toronto_venue_cat.columns[-1]] + list(toronto_venue_cat.columns[:-1])
toronto_venue_cat = toronto_venue_cat[fixed_columns]

toronto_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
toronto_grouped = toronto_venue_cat.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
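
Taking the mean of the 0/1 columns per neighbourhood gives the share of that neighbourhood's venues in each category; a value such as 0.041667 is consistent with one venue out of 24. The sketch below reproduces the idea on made-up data (the neighbourhoods `A` and `B` and their venues are illustrative):

```python
import pandas as pd

# Made-up venue list: neighbourhood A has 4 venues, B has 2.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Coffee Shop", "Coffee Shop", "Bank", "Park", "Park"],
})

# One 0/1 column per category. dtype=int keeps the 0/1 display on newer
# pandas versions, where get_dummies returns booleans by default.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of a 0/1 column is the fraction of rows equal to 1, i.e. the share
# of the neighbourhood's venues that fall in that category.
shares = onehot.groupby("Neighbourhood").mean()
print(shares)
```

Each row of `shares` sums to 1, which is why the grouped table can be compared across neighbourhoods with different numbers of venues.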


In [25]:
# first we will make a function to get the most common venue categories
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
# now plug in the function to get top 5 venues for each neighborhood
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Latin American Restaurant,Clothing Store,Breakfast Spot,Lounge,Yoga Studio
1,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Playground,Sandwich Place,Pub
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pharmacy,Mobile Phone Shop,Diner
3,Bayview Village,Café,Chinese Restaurant,Bank,Japanese Restaurant,Department Store
4,"Bedford Park, Lawrence Manor East",Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Pharmacy
...,...,...,...,...,...,...
90,"Willowdale, Willowdale West",Pizza Place,Coffee Shop,Supermarket,Discount Store,Pharmacy
91,Woburn,Coffee Shop,Soccer Field,Korean BBQ Restaurant,Yoga Studio,Deli / Bodega
92,Woodbine Heights,Skating Rink,Bus Stop,Curling Ice,Park,Beer Store
93,York Mills West,Park,Convenience Store,Yoga Studio,Deli / Bodega,Escape Room
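
When a neighbourhood has fewer than five distinct venue categories, sorting the full row fills the remaining slots with categories whose share is zero, so some of the entries in the table above may be fillers rather than venues that are actually present. A hedged variant that ranks only non-zero shares is sketched below; `top_venues_nonzero` and the example row are hypothetical and not part of the notebook.

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Return up to num_top_venues category names with a share above zero."""
    shares = row.iloc[1:].astype(float)      # skip the 'Neighbourhood' entry
    shares = shares[shares > 0]              # drop categories that never occur
    return list(shares.sort_values(ascending=False).index[:num_top_venues])


# Made-up row in the same layout as toronto_grouped: name first, then shares.
row = pd.Series({"Neighbourhood": "Example", "Park": 0.5, "Bus Stop": 0.25,
                 "Bank": 0.25, "Yoga Studio": 0.0, "Zoo": 0.0})
top = top_venues_nonzero(row, 5)
print(top)
```

The function may return fewer than `num_top_venues` names, so the caller would need to pad the result before writing it into a fixed-width table.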


### Now we will build a model to cluster the neighbourhoods

In [27]:
from sklearn.cluster import KMeans

In [28]:
# set number of clusters
k_num_clusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(toronto_grouped_clustering)
kmeans

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

In [29]:
kmeans.labels_[0:50]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1])
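
The printed labels suggest that one cluster dominates. A quick way to quantify this is to count how often each label occurs. The sketch below uses only the first 50 labels exactly as printed above, so it describes that slice and not the full result:

```python
import numpy as np

# The first 50 labels exactly as printed above.
labels = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1, 1, 1,
                   1, 1, 0, 1, 1, 1])

# bincount returns how many times each label 0..4 occurs; minlength keeps
# a slot for labels that do not appear in this slice.
sizes = np.bincount(labels, minlength=5)
print(sizes)
```

In this slice, 45 of the 50 neighbourhoods carry label 1. On the full result, `np.bincount(kmeans.labels_)` gives the size of every cluster.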

In [30]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [31]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,Agincourt,Latin American Restaurant,Clothing Store,Breakfast Spot,Lounge,Yoga Studio
1,1,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Playground,Sandwich Place,Pub
2,1,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pharmacy,Mobile Phone Shop,Diner
3,1,Bayview Village,Café,Chinese Restaurant,Bank,Japanese Restaurant,Department Store
4,1,"Bedford Park, Lawrence Manor East",Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Pharmacy


In [32]:
toronto_merged = combined_data

toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Bus Stop,Yoga Studio,Deli / Bodega
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Coffee Shop,Portuguese Restaurant,Financial or Legal Service,Hockey Arena,Yoga Studio
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,1.0,Coffee Shop,Bakery,Park,Breakfast Spot,Café
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Furniture / Home Store,Accessories Store,Boutique,Coffee Shop
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Creperie,Bar,Bank
...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,1.0,Smoke Shop,River,Yoga Studio,Dance Studio,Eastern European Restaurant
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,1.0,Sushi Restaurant,Bookstore,Escape Room,Ramen Restaurant,Martial Arts School
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,1.0,Gym / Fitness Center,Auto Workshop,Pizza Place,Comic Shop,Restaurant
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,4.0,Baseball Field,Yoga Studio,Falafel Restaurant,Escape Room,Electronics Store


In [33]:
# drop the neighbourhoods that have no cluster label (no venues were returned for them, so they were never clustered)
toronto_merged_nonan = toronto_merged.dropna(subset=['Cluster Labels'])
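
The cluster labels appear as `0.0`, `1.0` and so on because the left join introduces `NaN` for neighbourhoods without a match, which forces the integer column to float. The sketch below reproduces this on made-up data and casts the labels back to integers after the rows are dropped (`places`, `labels` and `labelled` are illustrative names):

```python
import pandas as pd

# Made-up stand-ins: three neighbourhoods, but only two received a cluster label.
places = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighbourhood": ["A", "C"], "Cluster Labels": [0, 4]})

# A left join keeps every row of `places`; B has no match, so its label is NaN
# and the whole column is upcast from int to float (0.0, NaN, 4.0).
merged = places.join(labels.set_index("Neighbourhood"), on="Neighbourhood")

# After dropping the unlabelled rows the column can be cast back to int.
labelled = merged.dropna(subset=["Cluster Labels"]).copy()
labelled["Cluster Labels"] = labelled["Cluster Labels"].astype(int)
print(labelled)
```

With integer labels, the `int(cluster)` conversions in the map cell become unnecessary.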

In [34]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [35]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_nonan['Latitude'], toronto_merged_nonan['Longitude'], toronto_merged_nonan['Neighbourhood'], toronto_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)]
        ).add_to(map_clusters)
        
map_clusters

## Now let's verify each of the clusters

### Cluster 1

In [36]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 0, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,0.0,Park,Food & Drink Shop,Bus Stop,Yoga Studio,Deli / Bodega
21,York,0.0,Park,Women's Store,Pool,Yoga Studio,Distribution Center
35,East York,0.0,Intersection,Convenience Store,Coffee Shop,Park,Department Store
45,North York,0.0,Park,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store
61,Central Toronto,0.0,Park,Swim School,Bus Line,Yoga Studio,Deli / Bodega
66,North York,0.0,Park,Convenience Store,Yoga Studio,Deli / Bodega,Escape Room
85,Scarborough,0.0,Playground,Intersection,Park,Arts & Crafts Store,Yoga Studio
91,Downtown Toronto,0.0,Park,Playground,Trail,Yoga Studio,Diner


### Cluster 2

In [37]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 1, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,North York,1.0,Coffee Shop,Portuguese Restaurant,Financial or Legal Service,Hockey Arena,Yoga Studio
2,Downtown Toronto,1.0,Coffee Shop,Bakery,Park,Breakfast Spot,Café
3,North York,1.0,Clothing Store,Furniture / Home Store,Accessories Store,Boutique,Coffee Shop
4,Downtown Toronto,1.0,Coffee Shop,Sushi Restaurant,Creperie,Bar,Bank
6,Scarborough,1.0,Fast Food Restaurant,Print Shop,Yoga Studio,Dance Studio,Electronics Store
...,...,...,...,...,...,...,...
97,Downtown Toronto,1.0,Café,Restaurant,Coffee Shop,Seafood Restaurant,Gym / Fitness Center
98,Etobicoke,1.0,Smoke Shop,River,Yoga Studio,Dance Studio,Eastern European Restaurant
99,Downtown Toronto,1.0,Sushi Restaurant,Bookstore,Escape Room,Ramen Restaurant,Martial Arts School
100,East Toronto,1.0,Gym / Fitness Center,Auto Workshop,Pizza Place,Comic Shop,Restaurant


### Cluster 3

In [38]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 2, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
50,North York,2.0,Gym,Pizza Place,Dance Studio,Electronics Store,Eastern European Restaurant


### Cluster 4

In [39]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 3, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
12,Scarborough,3.0,Bar,Yoga Studio,Falafel Restaurant,Escape Room,Electronics Store
94,Etobicoke,3.0,Bar,Drugstore,Rental Car Location,Truck Stop,Discount Store


### Cluster 5

In [40]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 4, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
57,North York,4.0,Baseball Field,Construction & Landscaping,Yoga Studio,Department Store,Escape Room
101,Etobicoke,4.0,Baseball Field,Yoga Studio,Falafel Restaurant,Escape Room,Electronics Store


## Grouping the Neighbourhoods of Mumbai

In [42]:
data = pd.read_excel("https://github.com/omarjlinaresh/Coursera_Capstone/blob/main/Mumbai%20Neighbourhood.xls?raw=true")
data.head()

Unnamed: 0,Neighbourhood,Postal Code,Latitude,Longitude
0,Aareymilk Colony,400065,19.162898,72.88367
1,Agripada,400011,18.975302,72.824897
2,Airport,400099,19.095696,72.855633
3,A I Staff Colony,400029,19.176062,72.944793
4,Ambewadi,400004,18.955627,72.821715


In [43]:
# Now we will collect the venues for all these locations using FourSquare API
CLIENT_ID = 'X44MXRRXXIV00CMCETPJWZYUGN435BVLP4MECN0VUDBFMQRG' 
CLIENT_SECRET = 'HVFUVOSULDT4MWDJOXBSLXTO1GFC3F4ZWKHRAR41Q0RGSY2T'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: X44MXRRXXIV00CMCETPJWZYUGN435BVLP4MECN0VUDBFMQRG
CLIENT_SECRET:HVFUVOSULDT4MWDJOXBSLXTO1GFC3F4ZWKHRAR41Q0RGSY2T


In [44]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

In [45]:
# collecting the venues in Mumbai for all neighbourhoods, within a 500 m radius
venues_in_mumbai = getNearbyVenues(data['Neighbourhood'], data['Latitude'], data['Longitude'])

Aareymilk Colony       
Agripada             
Airport             
A I Staff Colony  
Ambewadi             
Andheri             
Andheri East       
Andheri Railway Station
Antop Hill             
Anushakti Nagar        
Asvini             
Audit Bhavan        
Azad Nagar             
Bandra             
Bandra West        
Bangur Nagar       
BARC             
Barve Nagar       
Bazargate             
Best Staff Colony
BEST STaff Quarters 
Bhandup Complex
Bhandup East    
Bhandup Ind. Estate 
Bhandup West     
Bharat Nagar    
Bhawani Shankar  
Bhawani Shankar Rd   
B.N. Bhavan      
Borivali     
Borivali East    
Borivali West      
B.P.Lane             
B P T Colony          
Central Building    
Century Mill         
C G S Colony         
Chakala Midc        
Chamarbaug          
Charkop          
Charni Road         
Chaupati          
Chembur          
Chembur Extension   
Chembur Rs          
Chinchbunder        
Chinchpokli       
Chunabhatti       
Churchgate        
Colaba  

In [46]:
venues_in_mumbai.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Aareymilk Colony,19.162898,72.88367,Film City,Event Space
1,Aareymilk Colony,19.162898,72.88367,Film City Studio No. 7,Dance Studio
2,Aareymilk Colony,19.162898,72.88367,Cafe Mosaque,Café
3,Aareymilk Colony,19.162898,72.88367,Hill top,Mountain
4,Agripada,18.975302,72.824897,Celejor,Bakery


In [47]:
venues_in_mumbai.shape

(2622, 5)

In [48]:
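# for each venue category, show the maximum of every other column (for text columns, the alphabetically last value); note that this is not a frequency count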
venues_in_mumbai.groupby('Venue Category').max()

Venue Category,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
ATM,Sahakar Bhavan,19.152227,73.110429,SBI Bank ATM
Accessories Store,Malad East,18.984800,73.110429,World of Titan
Afghan Restaurant,Psm Colony,19.102440,72.918038,Zaffran
Airport,Santacruz P&t Colony,19.101213,72.862256,Sahar Cargo Airport
Airport Lounge,Vidyanagari,19.099541,72.875948,GVK First and Business Lounge
...,...,...,...,...
Wine Bar,New Yogakshema,19.178201,72.943314,Shloka dining bar
Wine Shop,Orlem,19.195491,72.923188,S.K Wines
Women's Store,Santacruz Central,19.260510,73.910998,Veromoda
Yoga Studio,S. C. Court,19.128686,72.867794,Usha Kunj


In [49]:
# now we will get dummies (one 0/1 column per category) for all the venue categories

mumbai_venue_cat = pd.get_dummies(venues_in_mumbai[['Venue Category']], prefix="", prefix_sep="")
mumbai_venue_cat

Unnamed: 0,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Tunnel,Vegetarian / Vegan Restaurant,Video Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2617,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2618,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2619,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2620,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [50]:
# Adding the Neighborhood column in the encoded dataset

mumbai_venue_cat['Neighbourhood'] = venues_in_mumbai['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [mumbai_venue_cat.columns[-1]] + list(mumbai_venue_cat.columns[:-1])
mumbai_venue_cat = mumbai_venue_cat[fixed_columns]

mumbai_venue_cat.head()


Unnamed: 0,Neighbourhood,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tunnel,Vegetarian / Vegan Restaurant,Video Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,Aareymilk Colony,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Aareymilk Colony,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Aareymilk Colony,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Aareymilk Colony,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Agripada,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [51]:
mumbai_grouped = mumbai_venue_cat.groupby('Neighbourhood').mean().reset_index()
mumbai_grouped.head()

Unnamed: 0,Neighbourhood,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tunnel,Vegetarian / Vegan Restaurant,Video Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,A I Staff Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aareymilk Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Agripada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Airport,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0
4,Ambewadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
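
The grouped values above are fractions rather than counts: averaging a 0/1 column per neighbourhood gives the share of that neighbourhood's venues falling in each category. The sketch below illustrates this on a small made-up frame. The names `toy_venues`, `toy_onehot` and `toy_grouped` are hypothetical, and it assumes the encoded frame was built with one-hot columns like those `pd.get_dummies` produces.

```python
import pandas as pd

# Hypothetical toy data: one row per venue, as a venue search would return.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Zoo", "Bank"],
})

# One-hot encode the category, then put the neighbourhood label back in front.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# The mean of each 0/1 column is the share of venues in that category.
toy_grouped = toy_onehot.groupby("Neighbourhood").mean().reset_index()
print(toy_grouped)
```

In this toy case neighbourhood A has two cafés out of three venues, so its `Café` share is two thirds, in the same way that `Ambewadi` shows 0.090909 for one category above.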


In [52]:
# first, define a function that returns the most common venue categories for a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
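
As a quick check of how this function behaves, the sketch below applies it to a single made-up row laid out like a row of `mumbai_grouped`. The row `toy_row` and its values are hypothetical; the function body is repeated only so the example runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name) and rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: name first, then one frequency per venue category.
toy_row = pd.Series({"Neighbourhood": "Toy Town", "ATM": 0.0, "Café": 0.7, "Zoo": 0.3})

# Asking for three categories when only two have venues.
top_three = return_most_common_venues(toy_row, 3)
print(list(top_three))
```

Note that `ATM` is returned even though its frequency is zero. When a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories, which may explain why the same trailing categories recur across many rows of the results.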

In [53]:
# now plug in the function to get top 5 venues for each neighborhood
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
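for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use the suffixes listed in indicators
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and above fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))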

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = mumbai_grouped['Neighbourhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,A I Staff Colony,Athletics & Sports,Indian Restaurant,Café,Lounge,Pool
1,Aareymilk Colony,Café,Mountain,Dance Studio,Event Space,Zoo
2,Agripada,Athletics & Sports,Soccer Field,Bakery,Platform,Bank
3,Airport,Hotel,Coffee Shop,Café,Indian Restaurant,Bar
4,Ambewadi,Snack Place,Juice Bar,Fast Food Restaurant,Breakfast Spot,Coffee Shop
...,...,...,...,...,...,...
224,Worli,Ice Cream Shop,Indian Restaurant,Spa,Seafood Restaurant,Diner
225,Worli Colony,Pizza Place,Bakery,Art Gallery,Thai Restaurant,Donut Shop
226,Worli Naka,Restaurant,Indian Restaurant,Stadium,Smoke Shop,Hotel
227,Worli Police Camp,Café,Ice Cream Shop,Convenience Store,Gourmet Shop,Bus Station


In [54]:
# set number of clusters
k_num_clusters = 5

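# keep only the numeric frequency columns (positional axis arguments were removed in pandas 2.0)
mumbai_grouped_clustering = mumbai_grouped.drop(columns='Neighbourhood')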

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(mumbai_grouped_clustering)
kmeans

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

In [55]:
kmeans.labels_[0:50]

array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 3, 0, 1, 1, 1, 0, 0, 1,
       2, 0, 2, 3, 4, 4, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1,
       0, 1, 1, 0, 0, 0])
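
Before inspecting individual clusters, it can help to see how many neighbourhoods fall into each one. The following is a minimal sketch using only the 50 labels printed above, copied by hand into a hypothetical `labels` array; in the notebook itself one would presumably pass the full `kmeans.labels_` instead.

```python
import numpy as np

# The first 50 labels printed above, copied here so the sketch is self-contained.
labels = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 3, 0, 1, 1, 1, 0, 0, 1,
                   2, 0, 2, 3, 4, 4, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1,
                   0, 1, 1, 0, 0, 0])

# np.bincount returns how often each non-negative integer label occurs.
cluster_sizes = np.bincount(labels, minlength=5)
for cluster_id, size in enumerate(cluster_sizes):
    print(f"Cluster {cluster_id + 1}: {size} neighbourhoods")
```

Even in this partial sample, labels 0 and 1 dominate while labels 2, 3 and 4 are rare, which is consistent with the small cluster tables shown later.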

In [56]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [57]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,A I Staff Colony,Athletics & Sports,Indian Restaurant,Café,Lounge,Pool
1,0,Aareymilk Colony,Café,Mountain,Dance Studio,Event Space,Zoo
2,0,Agripada,Athletics & Sports,Soccer Field,Bakery,Platform,Bank
3,1,Airport,Hotel,Coffee Shop,Café,Indian Restaurant,Bar
4,1,Ambewadi,Snack Place,Juice Bar,Fast Food Restaurant,Breakfast Spot,Coffee Shop


In [58]:
mumbai_merged = data

mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

mumbai_merged

Unnamed: 0,Neighbourhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Aareymilk Colony,400065,19.162898,72.883670,0.0,Café,Mountain,Dance Studio,Event Space,Zoo
1,Agripada,400011,18.975302,72.824897,0.0,Athletics & Sports,Soccer Field,Bakery,Platform,Bank
2,Airport,400099,19.095696,72.855633,1.0,Hotel,Coffee Shop,Café,Indian Restaurant,Bar
3,A I Staff Colony,400029,19.176062,72.944793,1.0,Athletics & Sports,Indian Restaurant,Café,Lounge,Pool
4,Ambewadi,400004,18.955627,72.821715,1.0,Snack Place,Juice Bar,Fast Food Restaurant,Breakfast Spot,Coffee Shop
...,...,...,...,...,...,...,...,...,...,...
234,Worli,400018,19.000633,72.816812,1.0,Ice Cream Shop,Indian Restaurant,Spa,Seafood Restaurant,Diner
235,Worli Colony,400030,19.006054,72.821421,0.0,Pizza Place,Bakery,Art Gallery,Thai Restaurant,Donut Shop
236,Worli Naka,400018,18.984683,72.819052,1.0,Restaurant,Indian Restaurant,Stadium,Smoke Shop,Hotel
237,Worli Police Camp,400030,19.005591,72.815207,0.0,Café,Ice Cream Shop,Convenience Store,Gourmet Shop,Bus Station


In [59]:
# drop neighbourhoods that received no cluster label; the map code below calls int() on each label, which fails on NaN
mumbai_merged_nonan = mumbai_merged.dropna(subset=['Cluster Labels'])
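
The `Cluster Labels` column shows values such as `0.0` and `1.0` because the join leaves NaN for neighbourhoods without a match, and pandas upcasts an integer column to float to hold NaN. The sketch below reproduces this on made-up frames. `toy_data` and `toy_labels` are hypothetical stand-ins for `data` and `neighborhoods_venues_sorted`, and the final cast back to integers is an optional step that the notebook does not perform.

```python
import pandas as pd

# Hypothetical stand-ins for `data` and `neighborhoods_venues_sorted`.
toy_data = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
toy_labels = pd.DataFrame({"Neighbourhood": ["A", "C"], "Cluster Labels": [0, 3]})

# A left join keeps every neighbourhood; "B" has no match, so its label is NaN
# and the whole column is upcast from int to float.
toy_merged = toy_data.join(toy_labels.set_index("Neighbourhood"), on="Neighbourhood")

# Drop the unmatched rows, then restore the integer dtype.
toy_clean = toy_merged.dropna(subset=["Cluster Labels"]).copy()
toy_clean["Cluster Labels"] = toy_clean["Cluster Labels"].astype(int)
print(toy_clean)
```

With integer labels, later code could index colours or compare labels without wrapping each value in `int()`.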

In [61]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(mumbai_merged_nonan['Latitude'], mumbai_merged_nonan['Longitude'], mumbai_merged_nonan['Neighbourhood'], mumbai_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)]
        ).add_to(map_clusters)
        
map_clusters

### Examining the clusters

### Cluster 1

In [62]:
mumbai_merged_nonan.loc[mumbai_merged_nonan['Cluster Labels'] == 0, mumbai_merged_nonan.columns[[0] + list(range(4, mumbai_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Aareymilk Colony,0.0,Café,Mountain,Dance Studio,Event Space,Zoo
1,Agripada,0.0,Athletics & Sports,Soccer Field,Bakery,Platform,Bank
6,Andheri East,0.0,Hotel,Pizza Place,Hotel Bar,Bistro,Coffee Shop
8,Antop Hill,0.0,Gym / Fitness Center,Train Station,Zoo,Diner,Fish Market
10,Asvini,0.0,Stables,Zoo,Dog Run,Flea Market,Fish Market
...,...,...,...,...,...,...,...
232,Wadala Rs,0.0,Gym,Pizza Place,Soccer Stadium,Bakery,Cupcake Shop
233,Wadala Truck Terminal,0.0,Pizza Place,Asian Restaurant,Coffee Shop,Multiplex,Snack Place
235,Worli Colony,0.0,Pizza Place,Bakery,Art Gallery,Thai Restaurant,Donut Shop
237,Worli Police Camp,0.0,Café,Ice Cream Shop,Convenience Store,Gourmet Shop,Bus Station


### Cluster 2

In [63]:
mumbai_merged_nonan.loc[mumbai_merged_nonan['Cluster Labels'] == 1, mumbai_merged_nonan.columns[[0] + list(range(4, mumbai_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Airport,1.0,Hotel,Coffee Shop,Café,Indian Restaurant,Bar
3,A I Staff Colony,1.0,Athletics & Sports,Indian Restaurant,Café,Lounge,Pool
4,Ambewadi,1.0,Snack Place,Juice Bar,Fast Food Restaurant,Breakfast Spot,Coffee Shop
5,Andheri,1.0,Camera Store,Indian Restaurant,Shopping Mall,Electronics Store,Sandwich Place
7,Andheri Railway Station,1.0,Indian Restaurant,Fast Food Restaurant,Food Court,Vegetarian / Vegan Restaurant,Restaurant
...,...,...,...,...,...,...,...
229,V.P. Road,1.0,Bar,Dessert Shop,Indian Restaurant,Seafood Restaurant,Modern European Restaurant
230,V.W.T.C.,1.0,Indian Restaurant,Italian Restaurant,Shopping Mall,Snack Place,Gym
231,Wadala,1.0,Plaza,Indian Restaurant,Seafood Restaurant,Gourmet Shop,Maharashtrian Restaurant
234,Worli,1.0,Ice Cream Shop,Indian Restaurant,Spa,Seafood Restaurant,Diner


### Cluster 3

In [64]:
mumbai_merged_nonan.loc[mumbai_merged_nonan['Cluster Labels'] == 2, mumbai_merged_nonan.columns[[0] + list(range(4, mumbai_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,Bhandup East,2.0,ATM,Comfort Food Restaurant,Flower Shop,Flea Market,Fish Market
24,Bhandup West,2.0,ATM,Comfort Food Restaurant,Flower Shop,Flea Market,Fish Market


### Cluster 4

In [65]:
mumbai_merged_nonan.loc[mumbai_merged_nonan['Cluster Labels'] == 3, mumbai_merged_nonan.columns[[0] + list(range(4, mumbai_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
25,Bharat Nagar,3.0,Indian Restaurant,Electronics Store,Train Station,Zoo,Dog Run
32,B.P.Lane,3.0,Indian Restaurant,Food,American Restaurant,Café,Restaurant
66,Falkland Road,3.0,Indian Restaurant,Zoo,Dog Run,Flea Market,Fish Market
89,J.B. Nagar,3.0,Indian Restaurant,Hotel,Zoo,Dog Run,Flea Market
100,Kandivali West,3.0,Indian Restaurant,Theater,Department Store,Shopping Mall,Gym / Fitness Center
115,Madhavbaug,3.0,Indian Restaurant,Snack Place,Fast Food Restaurant,Zoo,Dog Run
120,Mahul Road,3.0,Indian Restaurant,Food Court,Ice Cream Shop,Fast Food Restaurant,Gastropub
127,Mandvi,3.0,Indian Restaurant,Dessert Shop,BBQ Joint,Restaurant,Market
131,Marol Naka,3.0,Indian Restaurant,Hotel,Coffee Shop,Zoo,Dog Run
160,Null Bazar,3.0,Indian Restaurant,Breakfast Spot,Antique Shop,BBQ Joint,Market


### Cluster 5

In [66]:
mumbai_merged_nonan.loc[mumbai_merged_nonan['Cluster Labels'] == 4, mumbai_merged_nonan.columns[[0] + list(range(4, mumbai_merged_nonan.shape[1]))]]


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
26,Bhawani Shankar,4.0,Boat or Ferry,Department Store,Donut Shop,Flower Shop,Flea Market
27,Bhawani Shankar Rd,4.0,Boat or Ferry,Department Store,Donut Shop,Flower Shop,Flea Market
124,Malad West Dely,4.0,Boat or Ferry,Department Store,Donut Shop,Flower Shop,Flea Market
187,Sandeepany Sadhanalya,4.0,Boat or Ferry,Department Store,Donut Shop,Flower Shop,Flea Market
211,T.F.Donar,4.0,Boat or Ferry,Department Store,Donut Shop,Flower Shop,Flea Market
