## Introduction/Business Problem 

Mumbai and Delhi and two most important metro cities of India. There has been a war for supremacy in terms of quality of life, jobs, education, entertainment and recreation that these cities offer to its residents. In this project I attempt to analyze the neighbourhoods in each of these two cities and try to understand what is popular in them and what they have to offer to someone who is contemplating to make a choice on seeking a life in either of the metro cities. The deciding factor for most would be on how lively, supportive, vibrant and unique each of the cities can be when compared to each other. The business problem assumes there are people who would be interested in this study, which can reveal or create a projection of potential life and activities if the subject moves to live in one of these metro cities. The decision to choose one over the other would depend on popular venues in the neighborhoods in each of these metro cities. 

## Data

The dataset which we will use for our study is a csv downloaded from https://data.gov.in/catalog/all-india-pincode-directory.
This dataset has pincodes for all the post offices in India. We will read the csv, curate it to remove the data related to all other cities, towns and places which are not Mumbai or Delhi. We will then cleanup the unnecessary columns in the csv, which if not relevant or useful for our current study.
Postoffice names will be used as the neighbourhood names in each of the regions such as Mumbai or Delhi.
Neighbourhood names with same pincode will be combined as single row.
Foursquare API will be used to find the longitude and latitude of each of the neighbourhoods in both Mumbai and Delhi.
This will form the dataset we will use for this study.

<h3>Downloading dependencies</h3>

In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#import folium # map rendering library

print('Libraries imported.')


Libraries imported.


<h3>Read the dataset which is a csv downloaded from https://data.gov.in/catalog/all-india-pincode-directory</h3>

In [13]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,officename,pincode,officeType,Deliverystatus,divisionname,regionname,circlename,Taluk,Districtname,statename,Telephone,Related Suboffice,Related Headoffice,longitude,latitude
0,Achalapur B.O,504273,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Asifabad,Adilabad,TELANGANA,,Rechini S.O,Mancherial H.O,,
1,Ada B.O,504293,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Asifabad,Adilabad,TELANGANA,,Asifabad S.O,Mancherial H.O,,
2,Adegaon B.O,504307,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Boath,Adilabad,TELANGANA,,Echoda S.O,Adilabad H.O,,
3,Adilabad Collectorate S.O,504001,S.O,Non-Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,Adilabad,TELANGANA,08732-226703,,Adilabad H.O,,
4,Adilabad H.O,504001,H.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,Adilabad,TELANGANA,08732-226738,,,,


In [14]:
len(df_data_1['pincode'].unique())


19100

In [15]:
df_data_1.tail()

Unnamed: 0,officename,pincode,officeType,Deliverystatus,divisionname,regionname,circlename,Taluk,Districtname,statename,Telephone,Related Suboffice,Related Headoffice,longitude,latitude
154792,Uttar Sautanchak B.O,721649,B.O,Delivery,Tamluk,South Bengal,West Bengal,Nandakumar,East Midnapore,WEST BENGAL,,Mirikpur S.O,Tamluk H.O,,
154793,Uttarjianda B.O,721151,B.O,Delivery,Tamluk,South Bengal,West Bengal,Panskura-i,East Midnapore,WEST BENGAL,,Bhogpur S.O (East Midnapore),Tamluk H.O,,
154794,Uttarkotebarh B.O,721626,B.O,Delivery,Tamluk,South Bengal,West Bengal,Bhagawanpur,East Midnapore,WEST BENGAL,,Kajlagarh S.O,Tamluk H.O,,
154795,Uttarmechogram B.O,721139,B.O,Delivery,Tamluk,South Bengal,West Bengal,Panskura-i,East Midnapore,WEST BENGAL,,Panskura S.O,Tamluk H.O,,
154796,Uttarsonamui B.O,721648,B.O,Delivery,Tamluk,South Bengal,West Bengal,Nandakumar,East Midnapore,WEST BENGAL,,Byabattarhat S.O,Tamluk H.O,,


In [17]:
df_data_2 = df_data_1
df_data_2.drop(columns=['Taluk'], axis=1, inplace=True)
df_data_2.head()

Unnamed: 0,officename,pincode,officeType,Deliverystatus,divisionname,regionname,circlename,Districtname,statename,Telephone,Related Suboffice,Related Headoffice,longitude,latitude
0,Achalapur B.O,504273,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,TELANGANA,,Rechini S.O,Mancherial H.O,,
1,Ada B.O,504293,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,TELANGANA,,Asifabad S.O,Mancherial H.O,,
2,Adegaon B.O,504307,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,TELANGANA,,Echoda S.O,Adilabad H.O,,
3,Adilabad Collectorate S.O,504001,S.O,Non-Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,TELANGANA,08732-226703,,Adilabad H.O,,
4,Adilabad H.O,504001,H.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,TELANGANA,08732-226738,,,,


In [18]:
df_data_2.drop(columns=['Related Suboffice', 'Related Headoffice', 'divisionname', 'Deliverystatus', 'officeType', 'Telephone', 'Districtname', 'circlename'], axis=1, inplace=True)
df_data_2.head()

Unnamed: 0,officename,pincode,regionname,statename,longitude,latitude
0,Achalapur B.O,504273,Hyderabad,TELANGANA,,
1,Ada B.O,504293,Hyderabad,TELANGANA,,
2,Adegaon B.O,504307,Hyderabad,TELANGANA,,
3,Adilabad Collectorate S.O,504001,Hyderabad,TELANGANA,,
4,Adilabad H.O,504001,Hyderabad,TELANGANA,,


In [23]:
df_data_3 = df_data_2[df_data_2.regionname == 'Mumbai']
df_data_3.head()

Unnamed: 0,officename,pincode,regionname,statename,longitude,latitude
81395,Antop Hill S.O,400037,Mumbai,MAHARASHTRA,,
81396,B P T Colony S.O,400037,Mumbai,MAHARASHTRA,,
81397,B.P.Lane S.O,400003,Mumbai,MAHARASHTRA,,
81398,BEST STaff Quarters S.O,400012,Mumbai,MAHARASHTRA,,
81399,C G S Colony S.O,400037,Mumbai,MAHARASHTRA,,


In [24]:
df_data_4 = df_data_2[df_data_2.regionname == 'Delhi']
df_data_4.head()

Unnamed: 0,officename,pincode,regionname,statename,longitude,latitude
32383,Anand Vihar S.O,110092,Delhi,DELHI,,
32384,Azad Nagar S.O (East Delhi),110051,Delhi,DELHI,,
32385,Babarpur S.O (North East Delhi),110032,Delhi,DELHI,,
32386,Badarpur Khadar B.O,110090,Delhi,DELHI,,
32387,Balbir Nagar S.O,110032,Delhi,DELHI,,


In [25]:
df_data_3.shape

(1123, 6)

In [26]:
df_data_4.shape

(545, 6)

In [28]:
df_data_3 = df_data_3.append(df_data_4)
df_data_3.shape

(1668, 6)

In [29]:
df_data_3.head(10)

Unnamed: 0,officename,pincode,regionname,statename,longitude,latitude
81395,Antop Hill S.O,400037,Mumbai,MAHARASHTRA,,
81396,B P T Colony S.O,400037,Mumbai,MAHARASHTRA,,
81397,B.P.Lane S.O,400003,Mumbai,MAHARASHTRA,,
81398,BEST STaff Quarters S.O,400012,Mumbai,MAHARASHTRA,,
81399,C G S Colony S.O,400037,Mumbai,MAHARASHTRA,,
81400,Chamarbaug S.O,400012,Mumbai,MAHARASHTRA,,
81401,Chinchbunder H.O,400009,Mumbai,MAHARASHTRA,,
81402,Cotton Exchange S.O,400033,Mumbai,MAHARASHTRA,,
81403,Dadar Colony S.O,400014,Mumbai,MAHARASHTRA,,
81404,Dadar H.O,400014,Mumbai,MAHARASHTRA,,


In [30]:
df_data_3.rename(columns = {'officename': 'neighbourhood'}, inplace=True)
df_data_3.head()


Unnamed: 0,neighbourhood,pincode,regionname,statename,longitude,latitude
81395,Antop Hill S.O,400037,Mumbai,MAHARASHTRA,,
81396,B P T Colony S.O,400037,Mumbai,MAHARASHTRA,,
81397,B.P.Lane S.O,400003,Mumbai,MAHARASHTRA,,
81398,BEST STaff Quarters S.O,400012,Mumbai,MAHARASHTRA,,
81399,C G S Colony S.O,400037,Mumbai,MAHARASHTRA,,


In [31]:
df_data_3 = df_data_3.replace(' S.O','', regex=True)
df_data_3.head()

Unnamed: 0,neighbourhood,pincode,regionname,statename,longitude,latitude
81395,Antop Hill,400037,Mumbai,MAHARASHTRA,,
81396,B P T Colony,400037,Mumbai,MAHARASHTRA,,
81397,B.P.Lane,400003,Mumbai,MAHARASHTRA,,
81398,BEST STaff Quarters,400012,Mumbai,MAHARASHTRA,,
81399,C G S Colony,400037,Mumbai,MAHARASHTRA,,


In [32]:
df_data_3 = df_data_3.replace(' H.O','', regex=True)
df_data_3.head()

Unnamed: 0,neighbourhood,pincode,regionname,statename,longitude,latitude
81395,Antop Hill,400037,Mumbai,MAHARASHTRA,,
81396,B P T Colony,400037,Mumbai,MAHARASHTRA,,
81397,B.P.Lane,400003,Mumbai,MAHARASHTRA,,
81398,BEST STaff Quarters,400012,Mumbai,MAHARASHTRA,,
81399,C G S Colony,400037,Mumbai,MAHARASHTRA,,


In [39]:
df_data_3 = df_data_3.replace(' B.O','', regex=True)
df_data_3.head()

Unnamed: 0,neighbourhood,pincode,regionname,longitude,latitude
81395,Antop Hill,400037,Mumbai,,
81396,B P T Colony,400037,Mumbai,,
81397,B.P.Lane,400003,Mumbai,,
81398,BEST STaff Quarters,400012,Mumbai,,
81399,C G S Colony,400037,Mumbai,,


In [34]:
df_data_3.drop(columns=['statename'], axis =1, inplace = True)
df_data_3.head()

Unnamed: 0,neighbourhood,pincode,regionname,longitude,latitude
81395,Antop Hill,400037,Mumbai,,
81396,B P T Colony,400037,Mumbai,,
81397,B.P.Lane,400003,Mumbai,,
81398,BEST STaff Quarters,400012,Mumbai,,
81399,C G S Colony,400037,Mumbai,,


### Let us group the neighbourhoods based on the pincode value and the regionname.
#### The neighbourhood values are concatenated using a comma when the rows are combined for pincodes with same value and region.

In [40]:
change = lambda a: ", ".join(a) 

df_grouped = df_data_3.groupby(['pincode','regionname'])
df_grouped_agg = df_grouped.agg({'neighbourhood': change}).reset_index()
df_grouped_agg.head()


Unnamed: 0,pincode,regionname,neighbourhood
0,110001,Delhi,"Baroda House, Bengali Market, Bhagat Singh Mar..."
1,110002,Delhi,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh..."
2,110003,Delhi,"Delhi High Court Extension Counter, Delhi High..."
3,110004,Delhi,Rashtrapati Bhawan
4,110005,Delhi,"Anand Parbat Indl. Area, Anand Parbat, Bank St..."


In [41]:
df_grouped_agg.shape

(335, 3)

In [42]:
df_grouped_agg.tail()

Unnamed: 0,pincode,regionname,neighbourhood
330,421506,Mumbai,Additional Ambernath
331,421601,Mumbai,"Aghai, Alyani, Ambarje, Andad, Asangaon, Atgao..."
332,421602,Mumbai,"Kasara (Thane), Mokhawane, Shirol, Vashala, Vi..."
333,421603,Mumbai,"Bhatsanagar, Birwadi"
334,421605,Mumbai,"Khadavali, Manda, Phalegaon, Titwala"


In [44]:
!pip install folium
import folium # map rendering library

Collecting folium
  Downloading https://files.pythonhosted.org/packages/43/77/0287320dc4fd86ae8847bab6c34b5ec370e836a79c7b0c16680a3d9fd770/folium-0.8.3-py2.py3-none-any.whl (87kB)
[K    100% |████████████████████████████████| 92kB 7.0MB/s eta 0:00:01
[?25hRequirement not upgraded as not directly required: jinja2 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: requests in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: numpy in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: six in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
Requirement not upgraded a

In [45]:
address = 'Mumbai'

geolocator = Nominatim(user_agent="mumbai_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Mumbai are {}, {}.'.format(latitude, longitude))


The geograpical coordinate of Mumbai are 18.9387711, 72.8353355.


In [46]:
address_d = 'Delhi'

geolocator = Nominatim(user_agent="delhi_explorer")
location_d = geolocator.geocode(address_d)
latitude_d = location_d.latitude
longitude_d = location_d.longitude
print('The geograpical coordinate of Mumbai are {}, {}.'.format(latitude_d, longitude_d))

The geograpical coordinate of Mumbai are 28.6517178, 77.2219388.


### Let us now get the latitude and longitude for the neighbourhoods in Mumbai and create a new dataframe that holds the neighbourhood values along with their corresponding latitude and longitude values.

In [53]:
# define the dataframe columns
column_names = ['neighbourhood', 'latitude', 'longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,neighbourhood,latitude,longitude


In [54]:
for index, row in df_grouped_agg.iterrows():
    region = row['regionname'] 
    neighborhood_name = row['neighbourhood']
    
        
    address = neighborhood_name+", "+region
    geolocator = Nominatim(user_agent="ind_explorer")
    n_location = geolocator.geocode(address)
    try:
        n_latitude = n_location.latitude
        n_longitude = n_location.longitude
    except:
        address = region
        geolocator = Nominatim(user_agent="ind_explorer")
        n_location = geolocator.geocode(address)
        n_latitude = n_location.latitude
        n_longitude = n_location.longitude
    #print('The geograpical coordinate of {} are {}, {}.'.format(address, n_latitude, n_longitude))
    neighborhoods = neighborhoods.append({#'Borough': borough,
                                          'neighbourhood': neighborhood_name,
                                          'latitude': n_latitude,
                                          'longitude': n_longitude}, ignore_index=True)
    
neighborhoods.head()


Unnamed: 0,neighbourhood,latitude,longitude
0,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939
1,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939
2,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939
3,Rashtrapati Bhawan,28.614458,77.199594
4,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939


In [55]:
neighborhoods.tail()

Unnamed: 0,neighbourhood,latitude,longitude
330,Additional Ambernath,18.938771,72.835335
331,"Aghai, Alyani, Ambarje, Andad, Asangaon, Atgao...",18.938771,72.835335
332,"Kasara (Thane), Mokhawane, Shirol, Vashala, Vi...",18.938771,72.835335
333,"Bhatsanagar, Birwadi",18.938771,72.835335
334,"Khadavali, Manda, Phalegaon, Titwala",18.938771,72.835335


In [56]:
neighborhoods.shape

(335, 3)

In [57]:
neighborhoods.drop('neighbourhood',axis=1, inplace=True)
neighborhoods = pd.concat([df_grouped_agg, neighborhoods], axis=1)
neighborhoods.head()


Unnamed: 0,pincode,regionname,neighbourhood,latitude,longitude
0,110001,Delhi,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939
1,110002,Delhi,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939
2,110003,Delhi,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939
3,110004,Delhi,Rashtrapati Bhawan,28.614458,77.199594
4,110005,Delhi,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939


In [60]:
neighborhoods_m = neighborhoods[neighborhoods.regionname == 'Mumbai']
neighborhoods_m.head()

Unnamed: 0,pincode,regionname,neighbourhood,latitude,longitude
95,400001,Mumbai,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335
96,400002,Mumbai,"Kalbadevi, Ramwadi, S. C. Court, Thakurdwar",18.938771,72.835335
97,400003,Mumbai,"B.P.Lane, Mandvi (Mumbai), Masjid, Null Bazar",18.938771,72.835335
98,400004,Mumbai,"Ambewadi (Mumbai), Charni Road, Chaupati, Girg...",18.938771,72.835335
99,400005,Mumbai,"Asvini, Colaba Bazar, Colaba, Holiday Camp, V....",18.938771,72.835335


### Plot the neighbourhoods in Mumbai based on the pincode

In [61]:
# create map using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, region, neighborhood in zip(neighborhoods_m['latitude'], neighborhoods_m['longitude'], neighborhoods_m['regionname'], neighborhoods_m['neighbourhood']):
    label = '{}, {}'.format(neighborhood, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  
    
map_mumbai

In [62]:
neighborhoods_d = neighborhoods[neighborhoods.regionname == 'Delhi']
neighborhoods_d.head()

Unnamed: 0,pincode,regionname,neighbourhood,latitude,longitude
0,110001,Delhi,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939
1,110002,Delhi,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939
2,110003,Delhi,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939
3,110004,Delhi,Rashtrapati Bhawan,28.614458,77.199594
4,110005,Delhi,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939


### Plot the neighbourhoods in Delhi based on their pin code

In [64]:


# create map using latitude and longitude values
map_delhi = folium.Map(location=[latitude_d, longitude_d], zoom_start=10)

# add markers to map
for lat, lng, region, neighborhood in zip(neighborhoods_d['latitude'], neighborhoods_d['longitude'], neighborhoods_d['regionname'], neighborhoods_d['neighbourhood']):
    label = '{}, {}'.format(neighborhood, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#4286cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

In [65]:
neighborhoods_d.shape

(95, 5)

In [66]:
# @hidden-cell
CLIENT_ID = 'PWE1OYYQJVRPR30NEY4DAP00OTIMREKUGQVZ5IMEFYJLE5Q4' # your Foursquare ID
CLIENT_SECRET = '0BFI4OURUS4QRA0IMLBH31CINF0HGHJOMTMROEJW0DCIUQ3K' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: PWE1OYYQJVRPR30NEY4DAP00OTIMREKUGQVZ5IMEFYJLE5Q4
CLIENT_SECRET:0BFI4OURUS4QRA0IMLBH31CINF0HGHJOMTMROEJW0DCIUQ3K


In [67]:
mumbai_data = neighborhoods[neighborhoods['regionname'] == 'Mumbai'].reset_index(drop=True)
mumbai_data.head()
mumbai_data.shape

(240, 5)

In [94]:
mumbai_data.rename(columns = {'neighbourhood': 'Neighborhood'}, inplace=True)


In [95]:
mumbai_data.loc[0, 'Neighborhood']


'Bazargate, Elephanta Caves Po, M.P.T., Stock Exchange, Tajmahal, Town Hall (Mumbai), Mumbai G.P.O. '

In [69]:
neighborhood_latitude = mumbai_data.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = mumbai_data.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = mumbai_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))


Latitude and longitude values of Bazargate, Elephanta Caves Po, M.P.T., Stock Exchange, Tajmahal, Town Hall (Mumbai), Mumbai G.P.O.  are 18.9387711, 72.8353355.


### Let us find the common venues around the neighbourhoods of Mumbai using teh Foursquare API

In [70]:
limit=100
radius=500
url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, limit)
results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '5cd919fddb04f559d7f875e3'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4caa0096d971b1f7ccca23e1-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d145941735',
         'name': 'Chinese Restaurant',
         'pluralName': 'Chinese Restaurants',
         'primary': True,
         'shortName': 'Chinese'}],
       'id': '4caa0096d971b1f7ccca23e1',
       'location': {'address': 'Waudby Road',
        'cc': 'IN',
        'city': 'Mumbai',
        'country': 'India',
        'distance': 253,
        'formattedAddress': ['Waudby Road', 'Mumbai', 'Mahārāshtra', 'India'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 18.938715239156295,
          'ln

In [71]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [72]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Royal China,Chinese Restaurant,18.938715,72.832933
1,Town House Cafe,Bar,18.93855,72.833464
2,Cafe Excelsior,Café,18.937701,72.833566
3,Chhatrapati Shivaji Maharaj Terminus,Train Station,18.940297,72.835384
4,Sher-E-Punjab,Indian Restaurant,18.937944,72.837853


In [73]:
nearby_venues.shape

(19, 4)

In [74]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


In [96]:
mumbai_venues = getNearbyVenues(names=mumbai_data['Neighborhood'],
                                   latitudes=mumbai_data['latitude'],
                                   longitudes=mumbai_data['longitude']
                                  )

mumbai_venues


Bazargate, Elephanta Caves Po, M.P.T., Stock Exchange, Tajmahal, Town Hall (Mumbai), Mumbai G.P.O. 
Kalbadevi, Ramwadi, S. C. Court, Thakurdwar
B.P.Lane, Mandvi (Mumbai), Masjid, Null Bazar
Ambewadi (Mumbai), Charni Road, Chaupati, Girgaon, Madhavbaug, Opera House
Asvini, Colaba Bazar, Colaba, Holiday Camp, V.W.T.C.
Malabar Hill
Bharat Nagar (Mumbai), Grant Road, N.S.Patkar Marg, S V Marg, Tardeo
Falkland Road, J.J.Hospital, Kamathipura, M A Marg, Mumbai Central
Chinchbunder, Noor Baug, Princess Dock
Dockyard Road, Mazgaon Dock, Mazgaon Road, Mazgaon, V K Bhavan
Agripada, BPC  Jacob Circle, Chinchpokli, Haines Road, Jacob Circle
BEST STaff Quarters, Chamarbaug, Haffkin Institute, Lal Baug, Parel Naka, Parel Rly Work Shop, Parel
Delisle Road
Dadar Colony, Dadar, Naigaon (Mumbai)
Sewri
Kapad Bazar, Mahim Bazar, Mahim East, Mahim, Mori Road
Dharavi Road, Dharavi
Worli Naka, Worli
Matunga
Central Building, Churchgate, Marine Lines
Nariman Point, New Yogakshema
Chunabhatti, Raoli Camp, Sion

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Royal China,18.938715,72.832933,Chinese Restaurant
1,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Town House Cafe,18.93855,72.833464,Bar
2,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Cafe Excelsior,18.937701,72.833566,Café
3,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Chhatrapati Shivaji Maharaj Terminus,18.940297,72.835384,Train Station
4,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Sher-E-Punjab,18.937944,72.837853,Indian Restaurant
5,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Cannon Pav Bhaji,18.941034,72.835673,Food Truck
6,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Dakshin Bar And Kitchen,18.936489,72.83749,Seafood Restaurant
7,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Sterling Cineplex,18.938296,72.833104,Multiplex
8,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Pancham Puriwala,18.938214,72.835697,Indian Restaurant
9,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Café Universal,18.936021,72.837453,Irani Cafe


In [76]:
print(mumbai_venues.shape)
mumbai_venues.head()


(4298, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Royal China,18.938715,72.832933,Chinese Restaurant
1,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Town House Cafe,18.93855,72.833464,Bar
2,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Cafe Excelsior,18.937701,72.833566,Café
3,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Chhatrapati Shivaji Maharaj Terminus,18.940297,72.835384,Train Station
4,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,Sher-E-Punjab,18.937944,72.837853,Indian Restaurant


In [77]:
mumbai_venues.groupby('Neighborhood').count()


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"A I Staff Colony, Santacruz P&t Colony",19,19,19,19,19,19
"Aareymilk Colony, Nagari Niwara, S R P F Camp",19,19,19,19,19,19
"Abitghar, Abje, Alonde, Baliwali, Dohe, Gargoan, Gorhe, Hamarapur, Kalambe, Kanchad, Khare Ambiwali, Khariwali, Kone, Mala, Malwada, Maniwali, Moj, Parali, Pik, Posheri, Sonale, Tuse, Utawali, Varale, Vilkos, Wada, Waki",19,19,19,19,19,19
"Acchad, Dabhadi, Dhaniwari, Dhundalwadi, Dongri, Girgaon, Kochai, Kurze, Modgaon, Talasari, Udhave, Uplat, Vevji, Zari",19,19,19,19,19,19
"Achloli, Khardi, Kolose, Konzar, Mandle, Nate, Pachad, Sandoshi, Savrat, Taloshi, Tetghar, Varandoli, Wagheri",19,19,19,19,19,19
"Adai, Awre, Chawk, Chirner, Dighode, Gavan, Jasai, Jci Kamothe, Jui, Kelevane, Khanda Colony, Koproli, Kundevahal, Lodhivali, Morbe, Nere, Nhava, Panvel City, Panvel, Pargaon, Somatane, Ulwa, Vahal, Vaje, Vasheni, Wavandhal, Wavarle",19,19,19,19,19,19
"Adavale Budruk, Borghar, Chambhargani, Deopur, Devla, Dhamandevi, Divil, Golegani, Kapade Budruk, Kondhavi, Kotwal Budruk, Lohare, Mahargul, Ombali, Paithan, Palchil, Poladpur, Sadavali, Sakhar, Turbhe Khurd, Wakan",19,19,19,19,19,19
Additional Ambernath,19,19,19,19,19,19
"Adgaon, Borlipanchatan, Sarve, Velas, Wadwali",19,19,19,19,19,19
"Adhi, Devali, Goregaon (Raigarh(MH)), Harkol, Lonere, Mangarul, Nagaon, Nandvi, Panhalghar, Purar, Usarghar, Wadvali",19,19,19,19,19,19


In [78]:
print('There are {} uniques categories.'.format(len(mumbai_venues['Venue Category'].unique())))


There are 116 uniques categories.


In [79]:
# one hot encoding
mumbai_onehot = pd.get_dummies(mumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Neighborhood'] = mumbai_venues['Neighborhood'] 



Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,American Restaurant,Aquarium,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Campground,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,Concert Hall,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Event Space,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gourmet Shop,Gym,Gym / Fitness Center,Harbor / Marina,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Men's Store,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Mughlai Restaurant,Multiplex,Music Venue,Neighborhood,Nightclub,Other Great Outdoors,Park,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Sports Bar,Steakhouse,Supermarket,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Toll Plaza,Trail,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [80]:
# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.head()

Unnamed: 0,Women's Store,Yoga Studio,Accessories Store,Afghan Restaurant,American Restaurant,Aquarium,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Campground,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,Concert Hall,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Event Space,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gourmet Shop,Gym,Gym / Fitness Center,Harbor / Marina,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Men's Store,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Mughlai Restaurant,Multiplex,Music Venue,Neighborhood,Nightclub,Other Great Outdoors,Park,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Sports Bar,Steakhouse,Supermarket,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Toll Plaza,Trail,Train Station,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [81]:
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped.head()


Unnamed: 0,Neighborhood,Women's Store,Yoga Studio,Accessories Store,Afghan Restaurant,American Restaurant,Aquarium,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Campground,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,Concert Hall,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Event Space,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gourmet Shop,Gym,Gym / Fitness Center,Harbor / Marina,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Men's Store,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Mughlai Restaurant,Multiplex,Music Venue,Nightclub,Other Great Outdoors,Park,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Sports Bar,Steakhouse,Supermarket,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Toll Plaza,Trail,Train Station,Vegetarian / Vegan Restaurant
0,"A I Staff Colony, Santacruz P&t Colony",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.210526,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
1,"Aareymilk Colony, Nagari Niwara, S R P F Camp",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.210526,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
2,"Abitghar, Abje, Alonde, Baliwali, Dohe, Gargoa...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.210526,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
3,"Acchad, Dabhadi, Dhaniwari, Dhundalwadi, Dongr...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.210526,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
4,"Achloli, Khardi, Kolose, Konzar, Mandle, Nate,...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.210526,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0


In [82]:
num_top_venues = 3

for hood in mumbai_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = mumbai_grouped[mumbai_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----A I Staff Colony, Santacruz P&t Colony----
               venue  freq
0  Indian Restaurant  0.21
1         Irani Cafe  0.11
2             Lounge  0.05


----Aareymilk Colony, Nagari Niwara, S R P F Camp----
               venue  freq
0  Indian Restaurant  0.21
1         Irani Cafe  0.11
2             Lounge  0.05


----Abitghar, Abje, Alonde, Baliwali, Dohe, Gargoan, Gorhe, Hamarapur, Kalambe, Kanchad, Khare Ambiwali, Khariwali, Kone, Mala, Malwada, Maniwali, Moj, Parali, Pik, Posheri, Sonale, Tuse, Utawali, Varale, Vilkos, Wada, Waki----
               venue  freq
0  Indian Restaurant  0.21
1         Irani Cafe  0.11
2             Lounge  0.05


----Acchad, Dabhadi, Dhaniwari, Dhundalwadi, Dongri, Girgaon, Kochai, Kurze, Modgaon, Talasari, Udhave, Uplat, Vevji, Zari----
               venue  freq
0  Indian Restaurant  0.21
1         Irani Cafe  0.11
2             Lounge  0.05


----Achloli, Khardi, Kolose, Konzar, Mandle, Nate, Pachad, Sandoshi, Savrat, Taloshi, Tetghar, Varandoli

In [83]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [97]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"A I Staff Colony, Santacruz P&t Colony",Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
1,"Aareymilk Colony, Nagari Niwara, S R P F Camp",Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
2,"Abitghar, Abje, Alonde, Baliwali, Dohe, Gargoa...",Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
3,"Acchad, Dabhadi, Dhaniwari, Dhundalwadi, Dongr...",Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
4,"Achloli, Khardi, Kolose, Konzar, Mandle, Nate,...",Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant


### Let us now cluster the Mumbai neighbourhoods based on the top venues discovered around them
#### We will use k-Means clustering 

In [98]:
# set number of clusters
kclusters = 5

mumbai_grouped_n_clustering = mumbai_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_grouped_n_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [99]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mumbai_merged = mumbai_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

mumbai_merged.head() # check the last columns!

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,400001,Mumbai,"Bazargate, Elephanta Caves Po, M.P.T., Stock E...",18.938771,72.835335,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
1,400002,Mumbai,"Kalbadevi, Ramwadi, S. C. Court, Thakurdwar",18.938771,72.835335,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
2,400003,Mumbai,"B.P.Lane, Mandvi (Mumbai), Masjid, Null Bazar",18.938771,72.835335,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
3,400004,Mumbai,"Ambewadi (Mumbai), Charni Road, Chaupati, Girg...",18.938771,72.835335,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
4,400005,Mumbai,"Asvini, Colaba Bazar, Colaba, Holiday Camp, V....",18.938771,72.835335,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant


In [101]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mumbai_merged['latitude'], mumbai_merged['longitude'], mumbai_merged['Neighborhood'], mumbai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### First cluster of Mumbai neighbourhoods are grouped based on the most common venues being Cafes, Indian  restuarants and Fish and Chip shops

In [105]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 0, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]


Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
12,Mumbai,0,Indian Restaurant,Coffee Shop,Music Venue,Vegetarian / Vegan Restaurant,Fish & Chips Shop
128,Mumbai,0,Indian Restaurant,Bus Station,Fish & Chips Shop,Vegetarian / Vegan Restaurant,Flea Market
132,Mumbai,0,Indian Restaurant,Café,Vegetarian / Vegan Restaurant,Flea Market,College Auditorium
173,Mumbai,0,Indian Restaurant,Café,Bus Station,Breakfast Spot,Vegetarian / Vegan Restaurant


### Second cluster of Mumbai neighbourhoods are grouped based on the most common venues being Irani Cafes, Indian  restuarants and Seafood and Fast Food joints

In [106]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 1, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]



Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
1,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
2,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
3,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
4,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
6,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
7,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
8,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
9,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant
10,Mumbai,1,Indian Restaurant,Irani Cafe,Café,Seafood Restaurant,Fast Food Restaurant


### Third cluster of Mumbai neighbourhoods are grouped based on the most common venues being Cafes, Bakeries, Book stores, Harbor and recreational spots

In [107]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 2, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]


Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Mumbai,2,Park,Fast Food Restaurant,Gym,Restaurant,Bakery
14,Mumbai,2,Harbor / Marina,Bus Station,Flea Market,Train Station,Afghan Restaurant
16,Mumbai,2,Indian Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Bookstore,Movie Theater
18,Mumbai,2,Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Bar,Train Station
22,Mumbai,2,Ice Cream Shop,Indian Restaurant,Dessert Shop,Bakery,Seafood Restaurant
23,Mumbai,2,Theater,Bakery,Indian Restaurant,Chinese Restaurant,Sandwich Place
35,Mumbai,2,Asian Restaurant,Train Station,Ice Cream Shop,Bowling Alley,Snack Place
37,Mumbai,2,Café,Lounge,Chinese Restaurant,Bar,Hotel
38,Mumbai,2,Indian Restaurant,Café,Chinese Restaurant,Bakery,Pub
41,Mumbai,2,Indian Restaurant,Fast Food Restaurant,Lounge,Sandwich Place,Bakery


### Fourth cluster of Mumbai neighbourhoods are grouped based on the most common venues being Theme parks and Pizza places

In [108]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 3, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]


Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
203,Mumbai,3,Theme Park,Pizza Place,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Cocktail Bar


### Fifth cluster of Mumbai neighbourhoods are grouped based on the most common venues being Boats, Ferries and College auditoriums

In [188]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 4, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
80,Mumbai,4,Moving Target,Boat or Ferry,Vegetarian / Vegan Restaurant,Fish & Chips Shop,College Auditorium


## Now let us try and find the top venues in neighbourhoods of Delhi
#### We will also see what are the top venues in each of the neighbourhood in Delhi and how the neighborhoods can be clustered based on k-Means clustering, given the similarity between the neighborhoods in the same cluster or dissimilarity in types of venues in neighbourhoods belonging to different clusters

In [110]:
delhi_data = neighborhoods[neighborhoods['regionname'] == 'Delhi'].reset_index(drop=True)
delhi_data.head()
delhi_data.shape
delhi_data.rename(columns = {'neighbourhood': 'Neighborhood'}, inplace=True)
delhi_data.loc[0, 'Neighborhood']

'Baroda House, Bengali Market, Bhagat Singh Market, Connaught Place, Constitution House, Election Commission, Janpath, Krishi Bhawan, Lady Harding Medical College, North Avenue, Parliament House, Patiala House, Pragati Maidan Camp, Pragati Maidan, Rail Bhawan, Sansad Marg, Sansadiya Soudh, Secretariat North, Shastri Bhawan, Supreme Court, New Delhi G.P.O. '

In [111]:
neighborhood_latitude = delhi_data.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = delhi_data.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = delhi_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))



Latitude and longitude values of Baroda House, Bengali Market, Bhagat Singh Market, Connaught Place, Constitution House, Election Commission, Janpath, Krishi Bhawan, Lady Harding Medical College, North Avenue, Parliament House, Patiala House, Pragati Maidan Camp, Pragati Maidan, Rail Bhawan, Sansad Marg, Sansadiya Soudh, Secretariat North, Shastri Bhawan, Supreme Court, New Delhi G.P.O.  are 28.6517178, 77.2219388.


### Let us now use the Foursquare API to discover the venues closest to the neighborhoods in Delhi

In [112]:
limit=100
radius=500
url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, limit)
results_d = requests.get(url).json()
results_d

{'meta': {'code': 200, 'requestId': '5cd921e86a60712128d78ff2'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5476dcf6498eb0f50cbad8c1-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/artstore_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d127951735',
         'name': 'Arts & Crafts Store',
         'pluralName': 'Arts & Crafts Stores',
         'primary': True,
         'shortName': 'Arts & Crafts'}],
       'id': '5476dcf6498eb0f50cbad8c1',
       'location': {'cc': 'IN',
        'country': 'India',
        'distance': 166,
        'formattedAddress': ['India'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 28.652778,
          'lng': 77.223145}],
        'lat': 28.652778,
        'lng': 77.223145},
       'name': 'bishan kite

In [113]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [164]:
venues = results_d['response']['groups'][0]['items']
    
nearby_venues_d = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_d =nearby_venues_d.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_d['venue.categories'] = nearby_venues_d.apply(get_category_type, axis=1)

# clean columns
nearby_venues_d.columns = [col.split(".")[-1] for col in nearby_venues_d.columns]

nearby_venues_d.head()

Unnamed: 0,name,categories,lat,lng
0,bishan kite merchant,Arts & Crafts Store,28.652778,77.223145


In [165]:
delhi_venues = getNearbyVenues(names=delhi_data['Neighborhood'],
                                   latitudes=delhi_data['latitude'],
                                   longitudes=delhi_data['longitude']
                                  )

delhi_venues

Baroda House, Bengali Market, Bhagat Singh Market, Connaught Place, Constitution House, Election Commission, Janpath, Krishi Bhawan, Lady Harding Medical College, North Avenue, Parliament House, Patiala House, Pragati Maidan Camp, Pragati Maidan, Rail Bhawan, Sansad Marg, Sansadiya Soudh, Secretariat North, Shastri Bhawan, Supreme Court, New Delhi G.P.O. 
A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandhi Smarak Nidhi, I.P.Estate, Indraprastha, Minto Road
Delhi High Court Extension Counter, Delhi High Court, Pandara Road, Aliganj (South Delhi), C G O Complex, Golf Links, Kasturba Nagar (South Delhi), Lodi Road, Pragati Vihar, Safdarjung Air Port
Rashtrapati Bhawan
Anand Parbat Indl. Area, Anand Parbat, Bank Street (Central Delhi), Desh Bandhu Gupta Road, Guru Gobind Singh Marg, Karol Bagh, Master Prithvi Nath Marg, Sat Nagar
Delhi G.P.O. , Baratooti, Chandni Chowk, Chawri Bazar, Dareeba, Delhi Sadar Bazar, S.T. Road, Hauz Qazi, Jama Masjid
C.C.I., Delhi University, Gulabi Bagh, Jawahar Na

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store
1,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store
2,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store
3,Rashtrapati Bhawan,28.614458,77.199594,Mughal Gardens | मुगल गार्डन (Mughal Gardens),28.617427,77.199727,Garden
4,Rashtrapati Bhawan,28.614458,77.199594,"Rashtrapati Bhawan, President's Estate Yeni De...",28.61435,77.199715,Museum
5,Rashtrapati Bhawan,28.614458,77.199594,Rashtrapati Bhawan,28.614153,77.19596,Museum
6,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store
7,"Delhi G.P.O. , Baratooti, Chandni Chowk, Chawr...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store
8,"C.C.I., Delhi University, Gulabi Bagh, Jawahar...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store
9,"Dada Ghosh Bhawan, Patel Nagar East, Patel Nag...",28.651718,77.221939,bishan kite merchant,28.652778,77.223145,Arts & Crafts Store


In [166]:
delhi_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"505 A B Workshop, A F Palam, Aps Colony, Bazar Road, C.V.D., COD (South West Delhi), Delhi Cantt, Dhaula Kuan, Kirby Place, Pinto Park, R R Hospital, Signal Enclave, Station Road (South West Delhi), Subroto Park",1,1,1,1,1,1
"A F Rajokari, Rajokari",1,1,1,1,1,1
"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandhi Smarak Nidhi, I.P.Estate, Indraprastha, Minto Road",1,1,1,1,1,1
"A.K.Market, Multani Dhanda, Pahar Ganj, Swami Ram Tirth Nagar",1,1,1,1,1,1
"Abul Fazal Enclave-I, Jamia Nagar, New Friends Colony, Sukhdev Vihar, Zakir Nagar",1,1,1,1,1,1
"Adrash Nagar, Bhalaswa, Jahangir Puri A Block, Jahangir Puri D Block, Jahangir Puri H Block, N.S.Mandi",1,1,1,1,1,1
"Air Force Station Tugalkabad, BSF Camp Tigri, Dakshinpuri Phase-I, Dakshinpuri Phase-II, Dakshinpuri Phase-III, Deoli, Dr. Ambedkar Nagar (South Delhi), Hamdard Nagar, Khanpur (South Delhi), Pushpa Bhawan, Talimabad",1,1,1,1,1,1
"Alaknanda, Chittranjan Park, Kalkaji, Nehru Place",1,1,1,1,1,1
"Ali, Madanpur Khadar, Sarita Vihar",1,1,1,1,1,1
"Alipur, Bakhtawar Pur, Bakoli, Hiranki, Kadipur, Khampur, Mukhmelpur, Nangal Poona, Palla, Tajpur Kalan",1,1,1,1,1,1


In [167]:
print('There are {} uniques categories.'.format(len(delhi_venues['Venue Category'].unique())))

There are 14 uniques categories.


In [168]:
# one hot encoding
delhi_onehot = pd.get_dummies(delhi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
delhi_onehot['Neighborhood'] = delhi_venues['Neighborhood'] 

In [169]:
# move neighborhood column to the first column
fixed_columns = [delhi_onehot.columns[-1]] + list(delhi_onehot.columns[:-1])
delhi_onehot = delhi_onehot[fixed_columns]

delhi_onehot.head()

Unnamed: 0,Neighborhood,ATM,Arts & Crafts Store,Burger Joint,Café,Department Store,Garden,Gym,Hotel,Indian Restaurant,Multiplex,Museum,Pizza Place,Shopping Mall,Water Park
0,"Baroda House, Bengali Market, Bhagat Singh Mar...",0,1,0,0,0,0,0,0,0,0,0,0,0,0
1,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,"Delhi High Court Extension Counter, Delhi High...",0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Rashtrapati Bhawan,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,Rashtrapati Bhawan,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [170]:
delhi_grouped = delhi_onehot.groupby('Neighborhood').mean().reset_index()
delhi_grouped.head()

Unnamed: 0,Neighborhood,ATM,Arts & Crafts Store,Burger Joint,Café,Department Store,Garden,Gym,Hotel,Indian Restaurant,Multiplex,Museum,Pizza Place,Shopping Mall,Water Park
0,"505 A B Workshop, A F Palam, Aps Colony, Bazar...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"A F Rajokari, Rajokari",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"A.K.Market, Multani Dhanda, Pahar Ganj, Swami ...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Abul Fazal Enclave-I, Jamia Nagar, New Friends...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [171]:
num_top_venues = 3

for hood in delhi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = delhi_grouped[delhi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----505 A B Workshop, A F Palam, Aps Colony, Bazar Road, C.V.D., COD (South West Delhi), Delhi Cantt, Dhaula Kuan, Kirby Place, Pinto Park, R R Hospital, Signal Enclave, Station Road (South West Delhi), Subroto Park----
                 venue  freq
0  Arts & Crafts Store   1.0
1                  ATM   0.0
2         Burger Joint   0.0


----A F Rajokari, Rajokari----
                 venue  freq
0  Arts & Crafts Store   1.0
1                  ATM   0.0
2         Burger Joint   0.0


----A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandhi Smarak Nidhi, I.P.Estate, Indraprastha, Minto Road----
                 venue  freq
0  Arts & Crafts Store   1.0
1                  ATM   0.0
2         Burger Joint   0.0


----A.K.Market, Multani Dhanda, Pahar Ganj, Swami Ram Tirth Nagar----
                 venue  freq
0  Arts & Crafts Store   1.0
1                  ATM   0.0
2         Burger Joint   0.0


----Abul Fazal Enclave-I, Jamia Nagar, New Friends Colony, Sukhdev Vihar, Zakir Nagar----
          

In [172]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = delhi_grouped['Neighborhood']

for ind in np.arange(delhi_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(delhi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"505 A B Workshop, A F Palam, Aps Colony, Bazar...",Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
1,"A F Rajokari, Rajokari",Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
2,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
3,"A.K.Market, Multani Dhanda, Pahar Ganj, Swami ...",Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
4,"Abul Fazal Enclave-I, Jamia Nagar, New Friends...",Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum


In [178]:
neighborhoods_venues_sorted.shape
delhi_data.shape

(94, 6)

### Create the clusters for the neighbourhoods in Delhi based on the top venues discovered

In [174]:
# set number of clusters
kclusters = 5

delhi_grouped_n_clustering = delhi_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(delhi_grouped_n_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [179]:
#neighborhoods_venues_n_sorted = neighborhoods_venues_sorted
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

delhi_merged = delhi_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
delhi_merged = delhi_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

delhi_merged.head() # check the last columns!

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,110001,Delhi,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
1,110002,Delhi,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
2,110003,Delhi,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
3,110004,Delhi,Rashtrapati Bhawan,28.614458,77.199594,4.0,Museum,Garden,Water Park,Shopping Mall,Pizza Place
4,110005,Delhi,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum


In [181]:
delhi_merged = delhi_merged.dropna(how = 'any')
delhi_merged.shape

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,110001,Delhi,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
1,110002,Delhi,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
2,110003,Delhi,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
3,110004,Delhi,Rashtrapati Bhawan,28.614458,77.199594,4.0,Museum,Garden,Water Park,Shopping Mall,Pizza Place
4,110005,Delhi,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
5,110006,Delhi,"Delhi G.P.O. , Baratooti, Chandni Chowk, Chawr...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
6,110007,Delhi,"C.C.I., Delhi University, Gulabi Bagh, Jawahar...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
7,110008,Delhi,"Dada Ghosh Bhawan, Patel Nagar East, Patel Nag...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
8,110009,Delhi,"Dr.Mukerjee Nagar, G.T.B.Nagar, Gujranwala Col...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
9,110010,Delhi,"505 A B Workshop, A F Palam, Aps Colony, Bazar...",28.651718,77.221939,0.0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum


In [182]:
delhi_merged.shape

(94, 11)

In [183]:
# using dictionary to convert specific columns 
convert_dict = {'Cluster Labels': int}
  
delhi_merged = delhi_merged.astype(convert_dict)
delhi_merged.shape

(94, 11)

In [184]:
delhi_merged.head()

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,110001,Delhi,"Baroda House, Bengali Market, Bhagat Singh Mar...",28.651718,77.221939,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
1,110002,Delhi,"A.G.C.R., Ajmeri Gate Extn., Darya Ganj, Gandh...",28.651718,77.221939,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
2,110003,Delhi,"Delhi High Court Extension Counter, Delhi High...",28.651718,77.221939,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
3,110004,Delhi,Rashtrapati Bhawan,28.614458,77.199594,4,Museum,Garden,Water Park,Shopping Mall,Pizza Place
4,110005,Delhi,"Anand Parbat Indl. Area, Anand Parbat, Bank St...",28.651718,77.221939,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum


### Let us now show the clusters for the neighborhoods in Delhi created based on similarity of venues around them

In [185]:
# create map
map_clusters_d = folium.Map(location=[latitude_d, longitude_d], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(delhi_merged['latitude'], delhi_merged['longitude'], delhi_merged['Neighborhood'], delhi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_d)
       
map_clusters_d

### Let us explore the clusters in the Delhi neighbourhood

### First cluster of Delhi neighbourhoods are grouped based on the most common venues being Arts and Crafts stores, Water Parks and Shopping Malls.

In [186]:
delhi_merged.loc[delhi_merged['Cluster Labels'] == 0, delhi_merged.columns[[1] + list(range(5, delhi_merged.shape[1]))]]

Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
1,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
2,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
4,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
5,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
6,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
7,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
8,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
9,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum
10,Delhi,0,Arts & Crafts Store,Water Park,Shopping Mall,Pizza Place,Museum


### Second cluster of Delhi neighbourhoods are grouped based on the most common venues being Shopping Malls, Multiplexes, Gym and Department Stores.

In [127]:
delhi_merged.loc[delhi_merged['Cluster Labels'] == 1, delhi_merged.columns[[1] + list(range(5, delhi_merged.shape[1]))]]

Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
86,Delhi,1.0,Shopping Mall,Department Store,Multiplex,Indian Restaurant,Gym


### Third cluster of Delhi neighbourhoods are grouped based on the most common venues being ATMs, Museums and Water Park.

In [128]:
delhi_merged.loc[delhi_merged['Cluster Labels'] == 2, delhi_merged.columns[[1] + list(range(5, delhi_merged.shape[1]))]]

Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
77,Delhi,2.0,ATM,Water Park,Shopping Mall,Pizza Place,Museum


### Fourth cluster of Delhi neighbourhoods are grouped based on the most common venues being Hotel, Water Park and Pizza place

In [129]:
delhi_merged.loc[delhi_merged['Cluster Labels'] == 3, delhi_merged.columns[[1] + list(range(5, delhi_merged.shape[1]))]]

Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
94,Delhi,3.0,Hotel,Water Park,Pizza Place,Shopping Mall,Museum


### Fifth cluster of Delhi neighbourhoods are grouped based on the most common venues being Museum, Gardens and Water Park.

In [130]:
delhi_merged.loc[delhi_merged['Cluster Labels'] == 4, delhi_merged.columns[[1] + list(range(5, delhi_merged.shape[1]))]]

Unnamed: 0,regionname,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Delhi,4.0,Museum,Garden,Water Park,Shopping Mall,Pizza Place


#### In this project we have attempted to load the dataset for two of India's prime metro cities and have tried to analyze the neighbourhood regions in these metro cities based on the type of popular and top venues they have. We have clustered the neighbourhoods based on the most common top venues in each of the neighbourhood. Our intention with this project was to analyze and understand the difference in type of life and venues in these metros which can offer decision points for anybody who is considering to settle in either of the metro cities and can get a peek into what type of experience and recreational facilities he will be provided with.
Given our cluster information we see that Mumbai and its neighbourhoods are a great place for a foodie. There are a lot of resturants, cafes, bars etc in Mumbai neighbourhoods. Also due to the proximity of Mumbai to seashore, Mumbai neighbourhoods also offer for harbours, seafoods, boats and ferry rides.
On the other hand we see how dissimilar life in Delhi neighbourhoods would be compare to Mumbai neighbourhoods. Delhi neighbourhoods and good for those who like Arts and Crafts, Museums, Water Parks and Pizza places.
Thus with this project we have analyzed the kind of life each of these big metro cities have to offer based on the popular venues in their neighbourhood.
Mumbai would be my choice to settle!