# Introduction

## Objective

In this project, we classify areas of Delhi and Mumbai using venue data from the Foursquare API together with machine-learning segmentation and clustering. The aim is to segment the areas of both cities based on the most common venue types that Foursquare reports for them.

Using segmentation and clustering, we hope to determine:

- the similarity or dissimilarity of the two cities
- the character of each area within a city, for example residential, tourist-oriented, or other

# Data

The data is acquired from the following two sources:

-- For Mumbai (https://www.mapsofindia.com/pincode/india/maharashtra/mumbai/)

-- For Delhi (https://www.whatsuplife.in/delhi/blog/zip-pin-postal-code-pincodes-delhi/)

Both pages are converted to CSV by parsing their HTML.

The data consists of areas together with their pincodes for each city. We then fetch the latitude and longitude of each area, store them in a DataFrame for analysis, and also save them to a separate CSV file to avoid scraping again.

This data (Area, Pincode, City, Latitude, Longitude) will help identify common places using the Foursquare API.
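The "save once, reuse later" idea can be sketched as a small helper. This is only an illustration: `load_or_build` and `build_fn` are hypothetical names that do not appear in the notebook, and the sketch assumes the frame has a `Pincode` column that should be read back as text.

```python
from pathlib import Path

import pandas as pd


def load_or_build(csv_path, build_fn):
    """Return the cached CSV if it exists; otherwise build the frame and cache it."""
    path = Path(csv_path)
    if path.exists():
        # Read pincodes as strings so they are not converted to integers.
        return pd.read_csv(path, dtype={"Pincode": str})
    frame = build_fn()               # e.g. the scraping and geocoding steps
    frame.to_csv(path, index=False)  # index=False avoids an extra index column on reload
    return frame
```

With a helper like this, the slow scraping and geocoding code would only run when the CSV file is missing.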

In [1]:
# importing all necessary libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
from geopy.geocoders import Nominatim
import folium

In [2]:
# process html for Mumbai and make csv

url = 'https://www.mapsofindia.com/pincode/india/maharashtra/mumbai/'
request = requests.get(url).text
type(request)

page = BeautifulSoup(request, 'lxml')


In [3]:
div = page.find('div', class_='tables table2 sliderespon')
table = page.find('table')

In [4]:
columns = []
for column in table.find_all('th'):
    columns.append(column.text.replace('\n',''))
    
columns

['Pincode Details']

In [5]:
rows = []  # avoid the name `all`, which would shadow the Python built-in
for tr in table.find_all('tr'):    
    row=[]
    for col in tr.find_all('td'):
        row.append(col.text.replace('\n',''))    
    rows.append(row)

In [6]:
df = pd.DataFrame(rows, columns=['Area', 'Pincode', 'State', 'District'])
df.drop(0, inplace=True) # row 0 holds only None values (its <tr> yielded no <td> cells)
df.drop(1, inplace=True) # row 1 is dropped as well; it is presumably the header row, as in the Delhi table below
df.sort_values('Pincode').head()

Unnamed: 0,Area,Pincode,State,District
94,M.P.t.,400001,Maharashtra,Mumbai
19,Bazargate,400001,Maharashtra,Mumbai
119,Mumbai.,400001,Maharashtra,Mumbai
160,Stock Exchange,400001,Maharashtra,Mumbai
161,Tajmahal,400001,Maharashtra,Mumbai


In [7]:
df = df.reset_index(drop=True) # reset the index, then drop the State column
df = df.drop(['State'], axis=1)
df.head()

Unnamed: 0,Area,Pincode,District
0,A I staff colony,400029,Mumbai
1,Aareymilk Colony,400065,Mumbai
2,Agripada,400011,Mumbai
3,Airport,400099,Mumbai
4,Ambewadi,400004,Mumbai


In [8]:
[pincode for pincode in df['Pincode']]

['400029 ',
 '400065 ',
 '400011 ',
 '400099 ',
 '400004 ',
 '400053 ',
 '400069 ',
 '400058 ',
 '400037 ',
 '400005 ',
 '400053 ',
 '400003 ',
 '400051 ',
 '400003 ',
 '400050 ',
 '400051 ',
 '400090 ',
 '400001 ',
 '400012 ',
 '400007 ',
 '400028 ',
 '400028 ',
 '400091 ',
 '400066 ',
 '400092 ',
 '400013 ',
 '400020 ',
 '400030 ',
 '400093 ',
 '400012 ',
 '400067 ',
 '400004 ',
 '400004 ',
 '400009 ',
 '400011 ',
 '400020 ',
 '400005 ',
 '400033 ',
 '400026 ',
 '400026 ',
 '400014 ',
 '400014 ',
 '400068 ',
 '400068 ',
 '400052 ',
 '400066 ',
 '400013 ',
 '400017 ',
 '400017 ',
 '400010 ',
 '400026 ',
 '400008 ',
 '400004 ',
 '400028 ',
 '400062 ',
 '400063 ',
 '400062 ',
 '400051 ',
 '400026 ',
 '400007 ',
 '400058 ',
 '400012 ',
 '400011 ',
 '400034 ',
 '400057 ',
 '400032 ',
 '400005 ',
 '400056 ',
 '400095 ',
 '400099 ',
 '400059 ',
 '400008 ',
 '400011 ',
 '400060 ',
 '400102 ',
 '400049 ',
 '400033 ',
 '400002 ',
 '400008 ',
 '400101 ',
 '400067 ',
 '400016 ',
 '400068 ',
 '40
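The values above carry a trailing space (for example `'400029 '`), which is easy to forget when filtering later. A minimal sketch of normalising such a column with pandas string methods, using a small made-up frame rather than the scraped data:

```python
import pandas as pd

# Toy frame with the same trailing-space pattern as the scraped pincodes.
toy = pd.DataFrame({"Area": ["Agripada", "Airport"], "Pincode": ["400011 ", "400099 "]})

# str.strip() removes leading and trailing whitespace from every value.
toy["Pincode"] = toy["Pincode"].str.strip()

print(toy["Pincode"].tolist())
```

The notebook itself keeps the raw values, so its later filters compare against strings such as `'400003 '`.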

In [9]:
# process the html for delhi 

url = 'https://www.whatsuplife.in/delhi/blog/zip-pin-postal-code-pincodes-delhi/'
request = requests.get(url).text
page = BeautifulSoup(request,'lxml')

In [10]:
div = page.find('div', class_ = 'post-content description ')
    

In [11]:
district = []
for span in div.find_all('span'):
    district.append(span.text.replace('\xa0',' '))

In [12]:
district = district[:7]
district

['CENTRAL DELHI',
 'SOUTH DELHI',
 'WEST DELHI',
 'NORTH DELHI',
 'EAST DELHI',
 'SOUTH WEST DELHI',
 'NORTH WEST DELHI']

In [13]:
all_d = []
for d in district:
    url_d = ('https://www.mapsofindia.com/pincode/india/delhi/{}/').format(d.replace(' ','-'))
    print(url_d)
    request_d = requests.get(url_d).text
    page_d = BeautifulSoup(request_d,'lxml')    
    div_d = page_d.find('div', class_='tables table2 sliderespon')
    table_d = page_d.find('table')
    for tr_d in table_d.find_all('tr'):    
        row_d=[]
        for col_d in tr_d.find_all('td'):
            row_d.append(col_d.text.replace('\n',''))    
        all_d.append(row_d)

https://www.mapsofindia.com/pincode/india/delhi/CENTRAL-DELHI/
https://www.mapsofindia.com/pincode/india/delhi/SOUTH-DELHI/
https://www.mapsofindia.com/pincode/india/delhi/WEST-DELHI/
https://www.mapsofindia.com/pincode/india/delhi/NORTH-DELHI/
https://www.mapsofindia.com/pincode/india/delhi/EAST-DELHI/
https://www.mapsofindia.com/pincode/india/delhi/SOUTH-WEST-DELHI/
https://www.mapsofindia.com/pincode/india/delhi/NORTH-WEST-DELHI/


In [49]:
all_d

[[],
 ['Location', 'Pincode ', 'State ', 'District '],
 ['A.G.c.r.', '110002 ', 'Delhi', 'Central Delhi '],
 ['A.K.market', '110055 ', 'Delhi', 'Central Delhi '],
 ['Ajmeri Gate extn.', '110002 ', 'Delhi', 'Central Delhi '],
 ['Anand Parbat', '110005 ', 'Delhi', 'Central Delhi '],
 ['Anand Parbat indl. area', '110005 ', 'Delhi', 'Central Delhi '],
 ['Bank Street', '110005 ', 'Delhi', 'Central Delhi '],
 ['Baroda House', '110001 ', 'Delhi', 'Central Delhi '],
 ['Bengali Market', '110001 ', 'Delhi', 'Central Delhi '],
 ['Bhagat Singh market', '110001 ', 'Delhi', 'Central Delhi '],
 ['Connaught Place', '110001 ', 'Delhi', 'Central Delhi '],
 ['Constitution House', '110001 ', 'Delhi', 'Central Delhi '],
 ['Dada Ghosh bhawan', '110008 ', 'Delhi', 'Central Delhi '],
 ['Darya Ganj', '110002 ', 'Delhi', 'Central Delhi '],
 ['Delhi High court', '110003 ', 'Delhi', 'Central Delhi '],
 ['Desh Bandhu gupta road', '110005 ', 'Delhi', 'Central Delhi '],
 ['Election Commission', '110001 ', 'Delhi', '

In [58]:
dfd = pd.DataFrame(all_d, columns=['Area', 'Pincode', 'State', 'District'])

In [74]:
#dfd = dfd.dropna(axis=1, how='all')
dfd = dfd[pd.notnull(dfd['Area'])]
dfd = dfd[dfd['Area'] != 'Location']

In [76]:

dfd.head()

Unnamed: 0,Area,Pincode,State,District
2,A.G.c.r.,110002,Delhi,Central Delhi
3,A.K.market,110055,Delhi,Central Delhi
4,Ajmeri Gate extn.,110002,Delhi,Central Delhi
5,Anand Parbat,110005,Delhi,Central Delhi
6,Anand Parbat indl. area,110005,Delhi,Central Delhi


In [77]:
dfd = dfd.reset_index(drop=True) # reset the index, then drop the State column
dfd = dfd.drop(['State'], axis=1)
dfd.head()

Unnamed: 0,Area,Pincode,District
0,A.G.c.r.,110002,Central Delhi
1,A.K.market,110055,Central Delhi
2,Ajmeri Gate extn.,110002,Central Delhi
3,Anand Parbat,110005,Central Delhi
4,Anand Parbat indl. area,110005,Central Delhi


In [18]:
def get_latitude_longitude(area):
    """Geocode '<area>, India' with Nominatim; return [latitude, longitude] or [None, None]."""
    address = '{}, India'.format(area)
    geo_locator = Nominatim(user_agent="get_latlong")
    location = geo_locator.geocode(address, timeout=30)

    # geocode() returns None when the address cannot be resolved
    if location is None:
        latitude = None
        longitude = None
    else:
        latitude = location.latitude
        longitude = location.longitude

    return [latitude, longitude]

In [19]:
loc = get_latitude_longitude('110018 ')

In [20]:
loc

[28.6338964119353, 77.0820948893626]
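Because every call goes to a public geocoding service, it can help to space requests out and to avoid repeating identical queries. The sketch below is an illustration under stated assumptions: `geocode_all` is a hypothetical helper, and `lookup` stands for any function with the same return shape as `get_latitude_longitude`.

```python
import time


def geocode_all(areas, lookup, delay_seconds=1.0):
    """Geocode each distinct area once, pausing after every real lookup."""
    cache = {}
    results = []
    for area in areas:
        if area not in cache:
            cache[area] = lookup(area)  # only unseen names trigger a lookup
            time.sleep(delay_seconds)   # pause so the service is not flooded
        results.append(cache[area])
    return results
```

A call such as `geocode_all([a + ", Mumbai" for a in df['Area']], get_latitude_longitude)` would return one `[latitude, longitude]` pair per area, in the original order.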

In [28]:
m_latlong = []
# geocode every Mumbai area; print a running counter as a simple progress indicator
for i, area in enumerate(df['Area']):
    m_latlong.append(get_latitude_longitude(area + ", Mumbai"))
    print(i)

0
1
2
...
181


In [32]:
dfd['Area'][106] +", Delhi"

'Madanpur Khadar, Delhi'

In [None]:
from tqdm import tqdm
d_latlong = [get_latitude_longitude(area) for area in tqdm(dfd['Area'])]

In [44]:
dfd.iloc[50:60]['Area']

50                         Sat Nagar
51                 Secretariat North
52                    Shastri Bhawan
53                      South Avenue
54                     Supreme Court
55             Swami Ram tirth nagar
56                      Udyog Bhawan
57    Union Public service commissio
58                              None
60              Abul Fazal enclave-i
Name: Area, dtype: object

In [78]:
#above line was not working for some reason
#dfd.iloc[532]['Pincode']
d_latlong = []
i=0

In [79]:
# the tqdm version above did not work, so fall back to a plain loop with a counter
for area in dfd['Area']:
    d_latlong.append(get_latitude_longitude(area + ", Delhi"))
    print(i)
    i=i+1

0
1
2
...
276
27

In [81]:
# combine the coordinates returned by Nominatim (via geopy) with the area tables
df_m_ll = pd.DataFrame(m_latlong, columns=['Latitude', 'Longitude'])
df_d_ll = pd.DataFrame(d_latlong, columns=['Latitude', 'Longitude'])

df['Latitude'] = df_m_ll['Latitude']
df['Longitude'] = df_m_ll['Longitude']

dfd['Latitude'] = df_d_ll['Latitude']
dfd['Longitude'] = df_d_ll['Longitude']

In [86]:
df = df[np.isfinite(df['Latitude'])]
dfd = dfd[np.isfinite(dfd['Latitude'])]

In [82]:
#saving to csv file for future use.
df.to_csv('mumbai_pincodes.csv')
dfd.to_csv('delhi_pincodes.csv')

In [87]:
df

Unnamed: 0,Area,Pincode,District,Latitude,Longitude
2,Agripada,400011,Mumbai,18.975302,72.824898
3,Airport,400099,Mumbai,19.090201,72.863808
4,Ambewadi,400004,Mumbai,19.186776,72.859313
5,Andheri,400053,Mumbai,19.120371,72.848043
6,Andheri East,400069,Mumbai,19.115883,72.854202
7,Andheri Railway station,400058,Mumbai,19.120371,72.848043
8,Antop Hill,400037,Mumbai,19.020761,72.865256
9,Asvini,400005,Mumbai,18.900867,72.815941
10,Azad Nagar,400053,Mumbai,19.165798,72.955893
11,B P t colony,400003,Mumbai,19.101937,72.861599


In [88]:
#create map for mumbai for lat long
from geopy.geocoders import Nominatim
import folium

address = 'Mumbai, India'
geolocator = Nominatim(user_agent="get_latlong")  # recent geopy versions require an explicit user_agent
location = geolocator.geocode(address, timeout=10)
latitude = location.latitude
longitude = location.longitude

# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['District'], df['Area']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  
    
map_mumbai

  


In [90]:
#create map for delhi for lat long
from geopy.geocoders import Nominatim
import folium

address = 'Delhi, India'
geolocator = Nominatim(user_agent="get_latlong")  # recent geopy versions require an explicit user_agent
location = geolocator.geocode(address, timeout=10)
latitude = location.latitude
longitude = location.longitude

# create map of Delhi using latitude and longitude values
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(dfd['Latitude'], dfd['Longitude'], dfd['District'], dfd['Area']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

  


# Methodology

Above, we converted the area names into their latitude and longitude values.
Next, we use the Foursquare API to explore the neighborhoods of both cities, Mumbai and Delhi.

After that, we write a function to extract the most common venue categories in each neighborhood,
and then use these features to group the neighborhoods into clusters.

The K-means clustering algorithm is used for this task, and the Folium library is used to visualize the neighborhoods in Mumbai and Delhi and their emerging clusters.
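As a rough picture of what K-means does, the sketch below alternates between assigning points to their nearest center and moving each center to the mean of its points. It is a toy NumPy illustration with a hypothetical name (`kmeans_sketch`), not the scikit-learn implementation used later in this notebook.

```python
import numpy as np


def kmeans_sketch(points, k, n_iter=20, seed=0):
    """Toy K-means: returns (labels, centers) after a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)          # assignment step
        for j in range(k):
            members = points[labels == j]
            if len(members):                   # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)  # update step
    return labels, centers
```

In this project the points would be the per-area venue-category frequencies, so areas with a similar mix of venues end up in the same cluster.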

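Grouping the areas by pincode (see the counts below) shows that 110001 contains the most areas in Delhi (15). In Mumbai, 400003 contains 4 areas, just behind 400001, 400004, 400028 and 400012 with 5 each. The rest of the analysis follows pincodes 400003 and 110001.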

In [92]:
# count the areas per pincode in Mumbai and list the pincodes with the most areas
df.groupby('Pincode').count().sort_values('Area', ascending=False).head()

Unnamed: 0_level_0,Area,District,Latitude,Longitude
Pincode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
400001,5,5,5,5
400004,5,5,5,5
400028,5,5,5,5
400012,5,5,5,5
400003,4,4,4,4


In [93]:
# count the areas per pincode in Delhi and list the pincodes with the most areas
dfd.groupby('Pincode').count().sort_values('Area', ascending=False).head()

Unnamed: 0_level_0,Area,District,Latitude,Longitude
Pincode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
110001,15,15,15,15
110010,11,11,11,11
110028,11,11,11,11
110015,11,11,11,11
110019,7,7,7,7
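The same kind of ranking can also be obtained with `value_counts`, which returns a single count per pincode instead of one count per column. A minimal sketch on a made-up frame (the rows are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Area": ["A", "B", "C", "D", "E"],
    "Pincode": ["110001", "110001", "110001", "110010", "110010"],
})

# value_counts() counts rows per pincode and sorts them in descending order.
areas_per_pincode = toy["Pincode"].value_counts()
print(areas_per_pincode.head())
```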


In [96]:
#we picked Pincode 400003 (Mumbai) and 110001 (Delhi), both among the pincodes with the most areas in their city
# we'll only follow them
m_df = df[df['Pincode'] == '400003 '].reset_index(drop=True)
d_df = dfd[dfd['Pincode'] == '110001 '].reset_index(drop=True)

In [97]:
m_df.head()

Unnamed: 0,Area,Pincode,District,Latitude,Longitude
0,B P t colony,400003,Mumbai,19.101937,72.861599
1,Mandvi,400003,Mumbai,18.955056,72.834792
2,Masjid,400003,Mumbai,19.053051,72.832407
3,Null Bazar,400003,Mumbai,18.928665,72.832264


In [98]:
d_df.head()

Unnamed: 0,Area,Pincode,District,Latitude,Longitude
0,Baroda House,110001,Central Delhi,28.615804,77.23002
1,Bengali Market,110001,Central Delhi,28.629465,77.232185
2,Connaught Place,110001,Central Delhi,28.631383,77.219792
3,Election Commission,110001,Central Delhi,28.448733,77.028976
4,Janpath,110001,Central Delhi,28.610086,77.218247


In [99]:
#get the geographical coordinates of 400003, mumbai
address = '400003, Mumbai'
geolocator = Nominatim(user_agent="get_latlong")  # recent geopy versions require an explicit user_agent
location = geolocator.geocode(address, timeout=10)
latitude = location.latitude
longitude = location.longitude

# create map of the 400003 area using latitude and longitude values
map_400003 = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(m_df['Latitude'], m_df['Longitude'], m_df['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_400003)  
    
map_400003



In [100]:
#get the geographical coordinates of 110001, delhi
address = '110001, Delhi'
geolocator = Nominatim(user_agent="get_latlong")  # recent geopy versions require an explicit user_agent
location = geolocator.geocode(address, timeout=10)
latitude = location.latitude
longitude = location.longitude

# create map of the 110001 area using latitude and longitude values
map_110001 = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(d_df['Latitude'], d_df['Longitude'], d_df['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_110001)  
    
map_110001



### Using the Foursquare API to get venues in the surrounding area of both 400003 (Mumbai) and 110001 (Delhi)

In [101]:
m_df.loc[0, 'Latitude']

19.101937

In [102]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

#Define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'

#explore the first neighborhood in our dataframe
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = m_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = m_df.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = m_df.loc[0, 'Area'] # neighborhood name

#get the top 100 venues within a radius of 1000 meters of the first area
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the results
results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for 400003 Mumbai.'.format(nearby_venues.shape[0]))
nearby_venues.head()

19 venues were returned by Foursquare for 400003 Mumbai.


Unnamed: 0,name,categories,lat,lng
0,Peshawari,Indian Restaurant,19.103954,72.869879
1,ITC Maratha,Hotel,19.104023,72.869638
2,Pan Asian,Asian Restaurant,19.104424,72.869751
3,Hit & Run,Falafel Restaurant,19.107787,72.863333
4,Dum Pukth,Indian Restaurant,19.10407,72.869822
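The request URL can also be assembled from a parameter dictionary, with the credentials read from the environment instead of being written into the notebook. This is a sketch under assumptions: `build_explore_url` is a hypothetical helper, and `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumed environment variable names.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # endpoint used above


def build_explore_url(lat, lng, radius=1000, limit=100, version="20180604"):
    """Build the explore URL, reading credentials from environment variables."""
    params = {
        # Assumed variable names; empty strings are used when they are not set.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode() percent-encodes the values and joins them with '&'.
    return EXPLORE_ENDPOINT + "?" + urlencode(params)
```

The resulting string could be passed to `requests.get` in the same way as the hand-formatted URL.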


In [103]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

#Define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'

#explore the first neighborhood in our dataframe
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = d_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = d_df.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = d_df.loc[0, 'Area'] # neighborhood name

#get the top 100 venues within a radius of 1000 meters of the first area
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the results
results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for 110001 Delhi.'.format(nearby_venues.shape[0]))
nearby_venues.head()

24 venues were returned by Foursquare for 110001 Delhi.


Unnamed: 0,name,categories,lat,lng
0,India Gate | इंडिया गेट (India Gate),Monument / Landmark,28.612796,77.229207
1,Amar Jawan Jyoti | अमर जवान ज्योति (Amar Jawan...,Sculpture Garden,28.61298,77.228247
2,Andhra Bhavan Canteen,Indian Restaurant,28.617095,77.225721
3,Gulati Restaurant,Indian Restaurant,28.60801,77.229989
4,National Gallery Of Modern Art | राष्ट्रीय आधु...,Art Gallery,28.609411,77.234585


In [107]:
# function to repeat the same process for all areas
def getNearbyVenues(names, latitudes, longitudes, radius=1000):    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
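The expression `v['venue']['categories'][0]['name']` raises an `IndexError` when a venue comes back with an empty category list. A defensive variant is sketched below, assuming items shaped like the ones used above; `first_category_name` is a hypothetical helper name.

```python
def first_category_name(item, default=None):
    """Return the first category name of a Foursquare item, or `default` if none is listed."""
    categories = item.get("venue", {}).get("categories", [])
    return categories[0]["name"] if categories else default


# Made-up items that mimic the structure of the explore response.
with_category = {"venue": {"name": "Peshawari", "categories": [{"name": "Indian Restaurant"}]}}
without_category = {"venue": {"name": "Unnamed place", "categories": []}}

print(first_category_name(with_category))
print(first_category_name(without_category, default="Unknown"))
```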

In [108]:
#run the above function on each neighborhood and create a new dataframe
m_venues = getNearbyVenues(names=m_df['Area'],
                                   latitudes=m_df['Latitude'],
                                   longitudes=m_df['Longitude']
                                  )

#check the size of the resulting dataframe
print(m_venues.shape)
m_venues.head()

B P t colony
Mandvi
Masjid
Null Bazar
(263, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,B P t colony,19.101937,72.861599,Peshawari,19.103954,72.869879,Indian Restaurant
1,B P t colony,19.101937,72.861599,ITC Maratha,19.104023,72.869638,Hotel
2,B P t colony,19.101937,72.861599,Pan Asian,19.104424,72.869751,Asian Restaurant
3,B P t colony,19.101937,72.861599,Hit & Run,19.107787,72.863333,Falafel Restaurant
4,B P t colony,19.101937,72.861599,Dum Pukth,19.10407,72.869822,Indian Restaurant


In [105]:
#run the above function on each neighborhood and create a new dataframe
d_venues = getNearbyVenues(names=d_df['Area'],
                                   latitudes=d_df['Latitude'],
                                   longitudes=d_df['Longitude']
                                  )

#check the size of the resulting dataframe
print(d_venues.shape)
d_venues.head()

Baroda House
Bengali Market
Connaught Place
Election Commission
Janpath
Krishi Bhawan
North Avenue
Parliament House
Patiala House
Pragati Maidan
Rail Bhawan
Sansad Marg
Secretariat North
Shastri Bhawan
Supreme Court
(417, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Baroda House,28.615804,77.23002,India Gate | इंडिया गेट (India Gate),28.612796,77.229207,Monument / Landmark
1,Baroda House,28.615804,77.23002,Amar Jawan Jyoti | अमर जवान ज्योति (Amar Jawan...,28.61298,77.228247,Sculpture Garden
2,Baroda House,28.615804,77.23002,Andhra Bhavan Canteen,28.617095,77.225721,Indian Restaurant
3,Baroda House,28.615804,77.23002,Gulati Restaurant,28.60801,77.229989,Indian Restaurant
4,Baroda House,28.615804,77.23002,National Gallery Of Modern Art | राष्ट्रीय आधु...,28.609411,77.234585,Art Gallery


In [109]:
#check how many venues were returned for each area
print('There are {} unique categories in Mumbai.'.format(len(m_venues['Venue Category'].unique())))
m_venues.groupby('Area').count()

There are 83 unique categories in Mumbai.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
B P t colony,19,19,19,19,19,19
Mandvi,44,44,44,44,44,44
Masjid,100,100,100,100,100,100
Null Bazar,100,100,100,100,100,100


In [110]:
#check how many venues were returned for each area
print('There are {} unique categories in Delhi.'.format(len(d_venues['Venue Category'].unique())))
d_venues.groupby('Area').count()

There are 78 unique categories in Delhi.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Baroda House,24,24,24,24,24,24
Bengali Market,28,28,28,28,28,28
Connaught Place,81,81,81,81,81,81
Election Commission,4,4,4,4,4,4
Janpath,23,23,23,23,23,23
Krishi Bhawan,24,24,24,24,24,24
North Avenue,8,8,8,8,8,8
Parliament House,11,11,11,11,11,11
Patiala House,24,24,24,24,24,24
Pragati Maidan,14,14,14,14,14,14


## Analyzing

In [113]:
# one hot encoding
m_onehot = pd.get_dummies(m_venues[['Venue Category']], prefix="", prefix_sep="")
m_onehot.head()

Unnamed: 0,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Bookstore,Boutique,...,Snack Place,Stadium,Steakhouse,Tea Room,Tennis Court,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [115]:
# add neighborhood column back to dataframe
m_onehot['Area'] = m_venues['Area'] 
m_onehot.head()

Unnamed: 0,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Bookstore,Boutique,...,Stadium,Steakhouse,Tea Room,Tennis Court,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Women's Store,Area
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,B P t colony
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,B P t colony
2,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,B P t colony
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,B P t colony
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,B P t colony


In [121]:
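# move neighborhood column to the first column
# (positional: each run moves whichever column is last, so a second run puts 'Women's Store' in front of 'Area', as in the output below)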
fixed_columns = [m_onehot.columns[-1]] + list(m_onehot.columns[:-1])
m_onehot = m_onehot[fixed_columns]

m_onehot.head()

Unnamed: 0,Women's Store,Area,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,...,Smoke Shop,Snack Place,Stadium,Steakhouse,Tea Room,Tennis Court,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant
0,0,B P t colony,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,B P t colony,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,B P t colony,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,B P t colony,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,B P t colony,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
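A name-based reorder keeps `Area` first no matter how often the cell is run. The sketch below uses a toy frame with made-up columns, and `area_first` is a hypothetical helper name.

```python
import pandas as pd

toy = pd.DataFrame({"Bar": [0, 1], "Café": [1, 0], "Area": ["X", "Y"]})


def area_first(frame):
    """Return the frame with 'Area' as the first column; other columns keep their order."""
    return frame[["Area"] + [col for col in frame.columns if col != "Area"]]


once = area_first(toy)
twice = area_first(once)  # applying it again changes nothing

print(list(once.columns))
```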


In [122]:
#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(m_onehot.shape[0]))

263 rows were returned after one hot encoding.


In [123]:
#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
m_grouped = m_onehot.groupby('Area').mean().reset_index()
m_grouped.head()

Unnamed: 0,Area,Women's Store,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,...,Smoke Shop,Snack Place,Stadium,Steakhouse,Tea Room,Tennis Court,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant
0,B P t colony,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Mandvi,0.0,0.022727,0.022727,0.022727,0.0,0.0,0.022727,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Masjid,0.01,0.0,0.0,0.01,0.0,0.03,0.01,0.04,0.06,...,0.01,0.03,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01
3,Null Bazar,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.04,0.03,...,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01


In [120]:
#examine the new dataframe size after grouping
print('{} rows were returned after grouping.'.format(m_grouped.shape[0]))

4 rows were returned after grouping.
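To see why the mean of the one-hot columns gives a frequency per category, here is a minimal sketch on four made-up venues. The frame and its values are invented for illustration.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Area": ["X", "X", "X", "Y"],
    "Venue Category": ["Café", "Café", "Hotel", "Bar"],
})

# One indicator column per category (dtype=float keeps the columns numeric).
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Area", toy_venues["Area"])

# The mean of 0/1 indicators per area is the share of that area's venues in each category.
grouped = onehot.groupby("Area").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, so areas with very different venue counts become directly comparable.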


In [124]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in m_grouped['Area']:
    print("----"+hood+"----")
    temp = m_grouped[m_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----B P t colony----
                  venue  freq
0     Indian Restaurant  0.26
1                 Hotel  0.16
2                   Bar  0.11
3                  Café  0.05
4  Gym / Fitness Center  0.05


----Mandvi----
               venue  freq
0  Indian Restaurant  0.32
1    Harbor / Marina  0.07
2     Ice Cream Shop  0.07
3       Dessert Shop  0.07
4              Hotel  0.05


----Masjid----
               venue  freq
0               Café  0.11
1  Indian Restaurant  0.09
2                Bar  0.06
3        Coffee Shop  0.04
4             Bakery  0.04


----Null Bazar----
                venue  freq
0   Indian Restaurant  0.12
1                Café  0.09
2  Chinese Restaurant  0.05
3  Seafood Restaurant  0.04
4              Bakery  0.04




In [125]:
#put into a pandas dataframe

#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#create the new dataframe and display the top venues for each neighborhood
num_top_venues = 8

# ordinal suffixes for 1st, 2nd, 3rd; every later rank falls back to 'th'
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Area'] = m_grouped['Area']

for ind in np.arange(m_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(m_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,B P t colony,Indian Restaurant,Hotel,Bar,Gym / Fitness Center,Falafel Restaurant,Diner,Asian Restaurant,Hotel Bar
1,Mandvi,Indian Restaurant,Ice Cream Shop,Harbor / Marina,Dessert Shop,Convenience Store,Hotel,Restaurant,Arcade
2,Masjid,Café,Indian Restaurant,Bar,Pizza Place,Bakery,Coffee Shop,Italian Restaurant,Asian Restaurant
3,Null Bazar,Indian Restaurant,Café,Chinese Restaurant,Bakery,Seafood Restaurant,Fast Food Restaurant,Hotel,Bar
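
As an aside, the same ranking can be sketched more compactly with `Series.nlargest`. The helper name `top_venues`, the `Rank N` column names, and the toy frequencies below are assumptions made for illustration; they are not part of the notebook above. Ties may be ordered differently than with `sort_values`.

```python
import pandas as pd

def top_venues(grouped, n):
    """Return one row per area listing its n most frequent venue categories."""
    freqs = grouped.set_index('Area')
    # nlargest keeps the n highest frequencies in descending order
    rows = {area: row.nlargest(n).index.tolist() for area, row in freqs.iterrows()}
    cols = ['Rank {}'.format(i + 1) for i in range(n)]
    table = pd.DataFrame.from_dict(rows, orient='index', columns=cols)
    return table.rename_axis('Area').reset_index()

# Hypothetical frequency table in the same shape as m_grouped
toy_grouped = pd.DataFrame({
    'Area': ['A', 'B'],
    'Bar': [0.2, 0.6],
    'Hotel': [0.5, 0.1],
    'Park': [0.3, 0.3],
})

toy_top = top_venues(toy_grouped, 2)
print(toy_top)
```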


## K-means Clustering

In [126]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

# drop the label column so that only the numeric venue frequencies remain
m_grouped_clustering = m_grouped.drop(columns='Area')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(m_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

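#create a new dataframe that includes the cluster label for each area
m_merged = m_df.copy()  # copy so the original m_df is left unchanged

# add clustering labels, matched by Area name rather than by row position
# (kmeans.labels_ follows the row order of m_grouped, which is sorted by Area)
m_merged['Cluster Labels'] = m_merged['Area'].map(dict(zip(m_grouped['Area'], kmeans.labels_)))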



In [127]:
m_merged

Unnamed: 0,Area,Pincode,District,Latitude,Longitude,Cluster Labels
0,B P t colony,400003,Mumbai,19.101937,72.861599,0
1,Mandvi,400003,Mumbai,18.955056,72.834792,2
2,Masjid,400003,Mumbai,19.053051,72.832407,1
3,Null Bazar,400003,Mumbai,18.928665,72.832264,1
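
The value `kclusters = 3` is fixed by hand above. One common way to sanity-check such a choice is the silhouette score, sketched below on synthetic data. The helper `silhouette_by_k` and the three synthetic blobs are assumptions made for illustration, not part of the analysis. The score is only defined for k between 2 and the number of rows minus 1, so with only four grouped Mumbai areas it could compare just k = 2 and k = 3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

# Synthetic stand-in for the venue-frequency matrix: three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = silhouette_by_k(demo, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

A higher score means tighter, better separated clusters. On the real data the same call would take `m_grouped_clustering` in place of `demo`.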


In [129]:
# join m_merged with areas_venues_sorted to add the most common venues for each area
m_merged = m_merged.join(areas_venues_sorted.set_index('Area'), on='Area')

m_merged.head()

Unnamed: 0,Area,Pincode,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,B P t colony,400003,Mumbai,19.101937,72.861599,0,Indian Restaurant,Hotel,Bar,Gym / Fitness Center,Falafel Restaurant,Diner,Asian Restaurant,Hotel Bar
1,Mandvi,400003,Mumbai,18.955056,72.834792,2,Indian Restaurant,Ice Cream Shop,Harbor / Marina,Dessert Shop,Convenience Store,Hotel,Restaurant,Arcade
2,Masjid,400003,Mumbai,19.053051,72.832407,1,Café,Indian Restaurant,Bar,Pizza Place,Bakery,Coffee Shop,Italian Restaurant,Asian Restaurant
3,Null Bazar,400003,Mumbai,18.928665,72.832264,1,Indian Restaurant,Café,Chinese Restaurant,Bakery,Seafood Restaurant,Fast Food Restaurant,Hotel,Bar


In [130]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Finally, let's visualize the resulting clusters
# create map centered on Mumbai (latitude 19.0760, longitude 72.8777)
m_clusters = folium.Map(location=[19.0760, 72.8777], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(m_merged['Latitude'], m_merged['Longitude'], m_merged['Area'], m_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(m_clusters)
       
m_clusters

In [131]:
# one hot encoding
d_onehot = pd.get_dummies(d_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
d_onehot['Area'] = d_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [d_onehot.columns[-1]] + list(d_onehot.columns[:-1])
d_onehot = d_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(d_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
d_grouped = d_onehot.groupby('Area').mean().reset_index()

#examine the new dataframe size after grouping by Area
print('{} rows were returned after grouping.'.format(d_grouped.shape[0]))

417 rows were returned after one hot encoding.
15 rows were returned after grouping.


In [132]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in d_grouped['Area']:
    print("----"+hood+"----")
    temp = d_grouped[d_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Baroda House----
                venue  freq
0   Indian Restaurant  0.21
1                Pool  0.08
2          Smoke Shop  0.08
3  Chinese Restaurant  0.08
4          Playground  0.04


----Bengali Market----
         venue  freq
0      Theater  0.11
1         Café  0.11
2  Art Gallery  0.07
3        Hotel  0.07
4       Bakery  0.07


----Connaught Place----
               venue  freq
0  Indian Restaurant  0.14
1               Café  0.09
2              Hotel  0.06
3                Bar  0.05
4        Coffee Shop  0.05


----Election Commission----
                        venue  freq
0                      Lawyer  0.25
1  Tourist Information Center  0.25
2                 Golf Course  0.25
3                       Hotel  0.25
4                      Arcade  0.00


----Janpath----
               venue  freq
0  Indian Restaurant  0.13
1              Hotel  0.09
2          Hotel Bar  0.09
3     History Museum  0.09
4        Music Venue  0.04


----Krishi Bhawan----
               venue  

In [133]:
#create the new dataframe and display the top venues for each neighborhood
num_top_venues = 8

# ordinal suffixes for 1st, 2nd, 3rd; every later rank falls back to 'th'
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Area'] = d_grouped['Area']

for ind in np.arange(d_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(d_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Baroda House,Indian Restaurant,Smoke Shop,Pool,Chinese Restaurant,Sculpture Garden,Food & Drink Shop,Concert Hall,Furniture / Home Store
1,Bengali Market,Theater,Café,Hotel,Indian Restaurant,Art Gallery,Bakery,Salon / Barbershop,Historic Site
2,Connaught Place,Indian Restaurant,Café,Hotel,Coffee Shop,Chinese Restaurant,Bar,Lounge,Fast Food Restaurant
3,Election Commission,Tourist Information Center,Hotel,Lawyer,Golf Course,Wine Bar,Government Building,Flea Market,Food & Drink Shop
4,Janpath,Indian Restaurant,Hotel,History Museum,Hotel Bar,Restaurant,Japanese Restaurant,Government Building,Jewelry Store


In [135]:
# set number of clusters
kclusters = 3

# drop the label column so that only the numeric venue frequencies remain
d_grouped_clustering = d_grouped.drop(columns='Area')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(d_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

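#create a new dataframe that includes the cluster as well as the top venues for each area
d_merged = d_df.copy()  # copy so the original d_df is left unchanged

# add clustering labels, matched by Area name rather than by row position
# (kmeans.labels_ follows the row order of d_grouped, which is sorted by Area)
d_merged['Cluster Labels'] = d_merged['Area'].map(dict(zip(d_grouped['Area'], kmeans.labels_)))

# join d_merged with areas_venues_sorted to add the most common venues for each area
d_merged = d_merged.join(areas_venues_sorted.set_index('Area'), on='Area')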

d_merged.head() # check the last columns!

Unnamed: 0,Area,Pincode,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Baroda House,110001,Central Delhi,28.615804,77.23002,0,Indian Restaurant,Smoke Shop,Pool,Chinese Restaurant,Sculpture Garden,Food & Drink Shop,Concert Hall,Furniture / Home Store
1,Bengali Market,110001,Central Delhi,28.629465,77.232185,0,Theater,Café,Hotel,Indian Restaurant,Art Gallery,Bakery,Salon / Barbershop,Historic Site
2,Connaught Place,110001,Central Delhi,28.631383,77.219792,0,Indian Restaurant,Café,Hotel,Coffee Shop,Chinese Restaurant,Bar,Lounge,Fast Food Restaurant
3,Election Commission,110001,Central Delhi,28.448733,77.028976,1,Tourist Information Center,Hotel,Lawyer,Golf Course,Wine Bar,Government Building,Flea Market,Food & Drink Shop
4,Janpath,110001,Central Delhi,28.610086,77.218247,0,Indian Restaurant,Hotel,History Museum,Hotel Bar,Restaurant,Japanese Restaurant,Government Building,Jewelry Store


In [137]:
#Finally, let's visualize the resulting clusters
# create map
d_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(d_merged['Latitude'], d_merged['Longitude'], d_merged['Area'], d_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(d_clusters)
       
d_clusters

## Result

In [138]:
#Cluster 1 for Mumbai
m_merged.loc[m_merged['Cluster Labels'] == 0, m_merged.columns[[2] + list(range(5, m_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Mumbai,0,Indian Restaurant,Hotel,Bar,Gym / Fitness Center,Falafel Restaurant,Diner,Asian Restaurant,Hotel Bar


In [139]:
#Cluster 2 for Mumbai
m_merged.loc[m_merged['Cluster Labels'] == 1, m_merged.columns[[2] + list(range(5, m_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
2,Mumbai,1,Café,Indian Restaurant,Bar,Pizza Place,Bakery,Coffee Shop,Italian Restaurant,Asian Restaurant
3,Mumbai,1,Indian Restaurant,Café,Chinese Restaurant,Bakery,Seafood Restaurant,Fast Food Restaurant,Hotel,Bar


In [140]:
#Cluster 3 for Mumbai
m_merged.loc[m_merged['Cluster Labels'] == 2, m_merged.columns[[2] + list(range(5, m_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
1,Mumbai,2,Indian Restaurant,Ice Cream Shop,Harbor / Marina,Dessert Shop,Convenience Store,Hotel,Restaurant,Arcade


In [141]:
#Cluster 1 for Delhi
d_merged.loc[d_merged['Cluster Labels'] == 0, d_merged.columns[[2] + list(range(5, d_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Central Delhi,0,Indian Restaurant,Smoke Shop,Pool,Chinese Restaurant,Sculpture Garden,Food & Drink Shop,Concert Hall,Furniture / Home Store
1,Central Delhi,0,Theater,Café,Hotel,Indian Restaurant,Art Gallery,Bakery,Salon / Barbershop,Historic Site
2,Central Delhi,0,Indian Restaurant,Café,Hotel,Coffee Shop,Chinese Restaurant,Bar,Lounge,Fast Food Restaurant
4,Central Delhi,0,Indian Restaurant,Hotel,History Museum,Hotel Bar,Restaurant,Japanese Restaurant,Government Building,Jewelry Store
5,Central Delhi,0,Indian Restaurant,Hotel,Restaurant,Spa,History Museum,Hotel Bar,Lounge,Hotel Pool
8,Central Delhi,0,Art Gallery,Indian Restaurant,Pool,Sculpture Garden,Snack Place,Park,Concert Hall,Furniture / Home Store
9,Central Delhi,0,Theater,Pool,Coffee Shop,Chinese Restaurant,Udupi Restaurant,Train Station,Art Gallery,Art Museum
10,Central Delhi,0,Hotel,Spa,Indian Restaurant,Chinese Restaurant,History Museum,Restaurant,Wine Bar,Music Venue
11,Central Delhi,0,Hotel,Indian Restaurant,Café,Chinese Restaurant,Bar,Coffee Shop,Lounge,Italian Restaurant
13,Central Delhi,0,Indian Restaurant,Hotel,Restaurant,History Museum,Spa,Government Building,Music Venue,Hotel Pool


In [142]:
#Cluster 2 for Delhi
d_merged.loc[d_merged['Cluster Labels'] == 1, d_merged.columns[[2] + list(range(5, d_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
3,Central Delhi,1,Tourist Information Center,Hotel,Lawyer,Golf Course,Wine Bar,Government Building,Flea Market,Food & Drink Shop


In [143]:
#Cluster 3 for Delhi
d_merged.loc[d_merged['Cluster Labels'] == 2, d_merged.columns[[2] + list(range(5, d_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
6,Central Delhi,2,Indian Restaurant,Asian Restaurant,Theater,Stadium,Spiritual Center,Garden,Smoke Shop,Historic Site
7,Central Delhi,2,Government Building,Garden,Hotel Bar,Food & Drink Shop,Hostel,Music Venue,History Museum,Spiritual Center
12,Central Delhi,2,Music Venue,Museum,Tea Room,Spiritual Center,Garden,Government Building,Wine Bar,Golf Course


## Discussion

Based on the clusters for each city above, we believe the classification of each cluster could be improved by counting the most common venue categories in each city. Looking at each cluster, we cannot clearly determine what it represents using only the Foursquare most-common-venue data.

However, for the sake of this project, we assumed the following label for each cluster:


-- Cluster 1: Mumbai: Tourism
-- Cluster 2: Mumbai: Residential
-- Cluster 3: Mumbai: Mix
-- Cluster 1: Delhi: Residential
-- Cluster 2: Delhi: Tourism
-- Cluster 3: Delhi: Sport
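
The labels above were assigned by inspection. A minimal sketch of one way to support such labels is to average the venue frequencies inside each cluster and read off the dominant categories. The helper `cluster_profiles` and the toy numbers are hypothetical; on the real data the inputs would be a grouped table such as `m_grouped` and the matching `kmeans.labels_`.

```python
import numpy as np
import pandas as pd

def cluster_profiles(grouped, labels, n=3):
    """Average venue frequencies within each cluster and list the top n categories."""
    freqs = grouped.set_index('Area')
    # labels must be in the same row order as `grouped`
    means = freqs.groupby(np.asarray(labels)).mean()
    return {int(cluster): row.nlargest(n).index.tolist()
            for cluster, row in means.iterrows()}

# Hypothetical frequency table and cluster labels
toy_grouped = pd.DataFrame({
    'Area': ['A', 'B', 'C'],
    'Hotel': [0.6, 0.5, 0.0],
    'Museum': [0.3, 0.4, 0.1],
    'Grocery Store': [0.1, 0.1, 0.9],
})
toy_labels = [0, 0, 1]

profiles = cluster_profiles(toy_grouped, toy_labels, n=2)
print(profiles)
```

In this toy case cluster 0 is dominated by hotels and museums, which would suggest a tourism label, while cluster 1 is dominated by grocery stores.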


What is lacking at this point is a systematic, quantitative way to identify and distinguish different districts and to describe the correlation between the most common venues recorded in Foursquare. The reality is more complex, however: similar cities may or may not have similar common venues. A further step in this classification would be to find a method to extract these common venues and integrate the spatial correlations between different areas or districts.

We believe that the classification we propose is an encouraging step towards a quantitative and systematic comparison of different cities. Further studies are needed to relate the acquired data to more meaningful and objective results.
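
As one possible starting point for such a quantitative comparison, the sketch below computes the cosine similarity between two city-level venue profiles. The function names and the toy venue lists are assumptions made for illustration; this is not a method used in the analysis above. On the real data the inputs would be frames such as `d_venues` with a 'Venue Category' column.

```python
import numpy as np
import pandas as pd

def city_profile(venues):
    """Share of each venue category among all venues of one city."""
    return venues['Venue Category'].value_counts(normalize=True)

def cosine_similarity(p, q):
    """Cosine similarity between two category-share Series (1.0 = same mix)."""
    # align on the union of categories; a category missing from a city counts as 0
    p, q = p.align(q, fill_value=0.0)
    p, q = p.to_numpy(), q.to_numpy()
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

# Hypothetical venue lists for three cities
city_a = pd.DataFrame({'Venue Category': ['Hotel', 'Hotel', 'Bar', 'Bakery']})
city_b = pd.DataFrame({'Venue Category': ['Hotel', 'Bar', 'Bar', 'Park']})
city_c = pd.DataFrame({'Venue Category': ['Stadium', 'Stadium']})

sim_ab = cosine_similarity(city_profile(city_a), city_profile(city_b))
sim_ac = cosine_similarity(city_profile(city_a), city_profile(city_c))
sim_aa = cosine_similarity(city_profile(city_a), city_profile(city_a))
print(sim_ab, sim_ac, sim_aa)
```

A value near 1 means the two cities have a similar mix of venue categories, and a value of 0 means they share none.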

## Conclusion

Using the Foursquare API, we can capture data on common places all around the world. With it, we return to our main objectives, which were to determine:

the similarity or dissimilarity of both cities
classification of areas located inside the city, whether residential, tourism places, or others

In conclusion, both Mumbai and Delhi are centers of attraction for Indians. However, declaring the two cities similar or dissimilar based on commonly visited venues is quite difficult. The cities are similar in some venues and dissimilar in others. As for classification based on common venues, we again need a more systematic, quantitative way to identify and declare this. A comparison can be made, but we currently have no method or quantitative data to settle it. We hope that in the future a method to determine this can be established and explored for reference.

Thank you.