# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Introduction: Business Problem

Before you move to a city, it can be hard to judge the character of the city and its individual neighborhoods. The sheer size and population of Indian cities can be overwhelming, and it might be tempting to paint all busy centres with the same brush. In reality, though, your experience can differ drastically depending on where you go. Mumbai and Delhi epitomise this: one is a modern, cosmopolitan and commercial centre, while the other, in places, feels like a gateway to a bygone era. Here, we let the data identify similar neighborhoods in **Delhi and Mumbai** based on their surrounding amenities.

We will use data science to identify the most promising neighborhoods based on the similarity between the two cities. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data

Based on the definition of our problem, the factor that will influence our decision is:
* the number and type of existing amenities in each neighborhood (venues of any category, such as restaurants, cafés, shops and parks)

For this study, we need data about the neighborhoods in each of these metro cities.


**Mumbai** : https://www.mapsofindia.com/pincode/india/maharashtra/mumbai/

**Delhi**  : https://www.movingsolutions.in/blog/2020/02/05/pin-codes-of-delhi-locations/ 

The following data sources will be needed to extract or generate the required information:
* the number, type and location of amenities in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of each neighborhood will be obtained with **geopy**'s Nominatim geocoder

In [1]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import in pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

### Web Scraping for Mumbai neighborhood data

In [2]:
table_m = pd.read_html("https://www.mapsofindia.com/pincode/india/maharashtra/mumbai/",header=0,index_col=0)
Borough = ['Mumbai']

In [3]:
mumbai = table_m[0]
mumbai = mumbai.reset_index()
mumbai.rename(columns = {'Pincode Details':'Neighborhood','Pincode Details.1':'Pincode','Pincode Details.2': 'State','Pincode Details.3': 'District'}, inplace = True)    
mumbai=mumbai.drop(columns=[ 'State'])
mumbai=mumbai.drop(mumbai.index[0])
mumbai.head()

Unnamed: 0,Neighborhood,Pincode,District
1,A I staff colony,400029,Mumbai
2,Aareymilk Colony,400065,Mumbai
3,Agripada,400011,Mumbai
4,Airport,400099,Mumbai
5,Ambewadi,400004,Mumbai


### Neighborhood Candidates

Let's create latitude & longitude coordinates for the centroid of each candidate neighborhood, using geopy's Nominatim geocoder.

Let's first test the geocoder on a single, well-known neighborhood (Borivali, Mumbai).

In [4]:
# define a sample to get coordinates
locator = Nominatim(user_agent='myGeocoder')
location = locator.geocode('Borivali, Mumbai')  # one query string; the second positional argument of geocode() is exactly_one, not a city
print('latitude = {}, longitude = {}'.format(location.latitude, location.longitude))

latitude = 19.229068, longitude = 72.8573628


In [5]:
import time

# define a function to get coordinates, retrying up to 5 more times when the geocoder returns nothing
def get_latlng(Neighborhood):
    lat_lng_coords = locator.geocode(Neighborhood)

    retries = 0
    while lat_lng_coords is None and retries < 5:
        time.sleep(1)  # Nominatim's usage policy allows at most one request per second
        lat_lng_coords = locator.geocode(Neighborhood)
        retries += 1
    if lat_lng_coords is None:
        lat_lng_coords = 'Nan'  # sentinel checked by the cells below
    return lat_lng_coords
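The retry idea in `get_latlng` can also be written as a small helper that is testable without network access. This is a sketch, not the notebook's code: `geocode_with_retry` is a hypothetical name, the geocoder is passed in as a function, and returning `None` instead of the `'Nan'` string is a design choice made here for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def geocode_with_retry(geocode: Callable[[str], Optional[T]], query: str,
                       attempts: int = 5, delay: float = 1.0) -> Optional[T]:
    """Call geocode(query) up to `attempts` times, stopping at the first non-None result.

    `delay` seconds are slept between attempts (Nominatim's public usage policy
    asks for at most one request per second). Returns None if every attempt fails.
    """
    for attempt in range(attempts):
        result = geocode(query)
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)
    return None


# Usage with a fake geocoder that fails twice before succeeding (no network needed).
calls = []


def flaky_geocode(query):
    calls.append(query)
    return (19.229068, 72.8573628) if len(calls) >= 3 else None


demo_coords = geocode_with_retry(flaky_geocode, "Borivali, Mumbai", delay=0)
print(demo_coords, len(calls))
```

In the notebook, the same helper would be called as `geocode_with_retry(locator.geocode, 'Borivali, Mumbai')`.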

In [6]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng('{},Mumbai'.format(Neighborhood)) for Neighborhood in mumbai["Neighborhood"].tolist() ]

In [7]:
latitude = []
longitude = []
for data in coords:
    if data != 'Nan':
        latitude.append(data.latitude)
        longitude.append(data.longitude)
    else :
        latitude.append(0)
        longitude.append(0)

In [8]:
mumbai['Latitude']=latitude
mumbai['Longitude']=longitude
mumbai = mumbai[mumbai['Latitude'] != 0]
mumbai.head()

Unnamed: 0,Neighborhood,Pincode,District,Latitude,Longitude
3,Agripada,400011,Mumbai,18.975302,72.824898
4,Airport,400099,Mumbai,19.090201,72.863808
5,Ambewadi,400004,Mumbai,19.186776,72.859313
6,Andheri,400053,Mumbai,19.119698,72.84642
7,Andheri East,400069,Mumbai,19.115883,72.854202
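Storing `0` as a placeholder and filtering on `!= 0` works here because no neighborhood lies at (0, 0), but `NaN` marks a value as missing explicitly and lets pandas drop it with `dropna`. The sketch below is an alternative, not the notebook's code; it uses a stand-in `FakeLocation` tuple (hypothetical) instead of real geopy results so it runs offline.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Stand-in for geopy's Location objects, which expose .latitude and .longitude.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

# Geocoder results in the shape the notebook produces: a Location, or the 'Nan' sentinel.
demo_results = [FakeLocation(18.975302, 72.824898), "Nan", FakeLocation(19.090201, 72.863808)]

demo_hoods = pd.DataFrame({"Neighborhood": ["Agripada", "Unknown Place", "Airport"]})
# getattr falls back to NaN for anything without coordinates ('Nan' strings or None).
demo_hoods["Latitude"] = [getattr(r, "latitude", np.nan) for r in demo_results]
demo_hoods["Longitude"] = [getattr(r, "longitude", np.nan) for r in demo_results]

# Drop neighborhoods the geocoder could not place.
demo_hoods = demo_hoods.dropna(subset=["Latitude", "Longitude"])
print(demo_hoods)
```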


### Web Scraping for Delhi neighborhood data

In [9]:
tables_d = pd.read_html("https://www.movingsolutions.in/blog/2020/02/05/pin-codes-of-delhi-locations/",header=0,index_col=0)
Borough_d = ['East Delhi', 'West delhi', 'South Delhi', 'North Delhi', 'New Delhi', 'Central Delhi', 'West Delhi', 'South West Delhi']

In [10]:
# tag each scraped table with its district, then stack them into one dataframe
for i, district in enumerate(Borough_d):
    tables_d[i]['District'] = district
Delhi = pd.concat(tables_d[:len(Borough_d)])

Delhi = Delhi.reset_index()
Delhi.rename(columns = {'Location':'Neighborhood', 'Pin code':'Pincode', 'District':'Borough'}, inplace = True)    
Delhi.head()

Unnamed: 0,Neighborhood,Pincode,Borough
0,Anand Vihar,110092,East Delhi
1,Azad Nagar,110051,East Delhi
2,Babarpur,110032,East Delhi
3,Balbir Nagar,110032,East Delhi
4,Bhajan Pura,110053,East Delhi


In [11]:
location = locator.geocode('adarsh nagar,delhi, india')
print('Latitude = {}, Longitude = {}'.format(location.latitude, location.longitude))

Latitude = 28.7144008, Longitude = 77.1672884


In [12]:
# call the function to get the coordinates, store in a new list using list comprehension
coords_d = [ get_latlng('{}, Delhi, india'.format(Neighborhood)) for Neighborhood in Delhi["Neighborhood"].tolist() ]

In [13]:
latitude = []
longitude = []
for data in coords_d:
    if data != 'Nan':
        latitude.append(data.latitude)
        longitude.append(data.longitude)
    else :
        latitude.append(0)
        longitude.append(0)

In [14]:
Delhi['Latitude']=latitude
Delhi['Longitude']=longitude

In [15]:
Delhi = Delhi[Delhi['Latitude'] != 0]
Delhi.head()

Unnamed: 0,Neighborhood,Pincode,Borough,Latitude,Longitude
0,Anand Vihar,110092,East Delhi,28.641115,77.312502
1,Azad Nagar,110051,East Delhi,28.662682,77.279515
2,Babarpur,110032,East Delhi,28.687431,77.279755
3,Balbir Nagar,110032,East Delhi,28.68379,77.290754
5,Bhola Nath nagar,110032,East Delhi,28.669127,77.285241


## Merging Delhi and Mumbai dataframes

In [16]:
# sort=True keeps the current (alphabetical) column order and silences pandas' FutureWarning
data = pd.concat([mumbai, Delhi], axis=0, sort=True)
data.drop(['Borough', 'District', 'Pincode'], axis=1, inplace=True)
data.head()

Unnamed: 0,Latitude,Longitude,Neighborhood
3,18.975302,72.824898,Agripada
4,19.090201,72.863808,Airport
5,19.186776,72.859313,Ambewadi
6,19.119698,72.84642,Andheri
7,19.115883,72.854202,Andheri East


### Obtaining the coordinates of Mumbai for creating a map

In [17]:
address = 'Mumbai, IN'

location = locator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai are 18.9387711, 72.8353355.


In [18]:
# center the map on Mumbai; Delhi markers are also added but lie outside this initial view
# (named map_neighborhoods to avoid shadowing Python's built-in map)
map_neighborhoods = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker with a popup label for every neighborhood
for lat, lng, neighborhood in zip(data['Latitude'], data['Longitude'], data['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_neighborhoods)

map_neighborhoods

### Foursquare

Now that we have our location candidates, let's use the Foursquare API to get information on the amenities in each neighborhood.

In [19]:
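# The original credentials cell was removed by Watson Studio for sharing.
# Define your own Foursquare credentials here (placeholder values shown).
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = 'YYYYMMDD'  # Foursquare API version date; the value used originally is not shown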



In [20]:
#Let's get the geographical coordinates of Anand Vihar
neighborhood_latitude = data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Anand Vihar are 28.641115, 77.3125024.


In [21]:
LIMIT = 100
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
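The cell above builds the query string with `str.format`. An alternative sketch lets `urllib.parse.urlencode` handle the escaping (the comma in `ll` becomes `%2C`, which standard URL decoding turns back into a comma). `build_explore_url` is a hypothetical helper and the credential and version values are placeholders; passing the same dictionary as `requests.get(base_url, params=params)` would have the same effect.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a Foursquare v2 venues/explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,  # search radius in meters
        "limit": limit,    # maximum number of venues returned
    }
    return EXPLORE_URL + "?" + urlencode(params)


demo_url = build_explore_url("MY_ID", "MY_SECRET", "YYYYMMDD",
                             28.641115, 77.3125024, 1000, 100)
print(demo_url)
print(parse_qs(urlparse(demo_url).query)["ll"])
```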

In [22]:
results = requests.get(url).json()

In [23]:
# borrow get_category_type from the Foursquare lab: it extracts the (first) category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,PVR EDM,Movie Theater,28.641323,77.317026
1,Pizza Hut,Pizza Place,28.641347,77.317054
2,Barista Lavazza,Café,28.646416,77.320005
3,"Big Cinemas, Imax",Multiplex,28.64594,77.319942
4,Lemon Tree Hotel,Hotel,28.641373,77.316549


In [25]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))


15 venues were returned by Foursquare.


## Methodology

For most people, the deciding factor is how lively, supportive, vibrant and unique each city's neighborhoods are compared with those of the other city. This study assumes that its audience is people who want to project what life and activities would look like in these metro-city neighborhoods if they moved to one of them. The choice of one neighborhood over another therefore depends on the popular venues it offers. Concretely, we retrieve up to 100 Foursquare venues within 300 m of each neighborhood's coordinates, one-hot encode their categories, average these per neighborhood, and group neighborhoods with similar venue profiles using k-means clustering.
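A toy version of the encoding step described above, on made-up venues, shows what the averaged one-hot vectors mean: each value is the share of a neighborhood's venues that fall in that category, so every row sums to 1. These rows are what the clustering step compares. The data below is invented for illustration.

```python
import pandas as pd

# Toy venue table in the same long format as all_venues (one row per venue).
demo_venues = pd.DataFrame({
    "Neighborhood": ["Agripada", "Agripada", "Agripada", "Airport", "Airport"],
    "Venue Category": ["Bakery", "Coffee Shop", "Bakery", "Airport", "Coffee Shop"],
})

# One-hot encode the categories (one 0/1 column per category) ...
demo_onehot = pd.get_dummies(demo_venues["Venue Category"], dtype=int)
demo_onehot["Neighborhood"] = demo_venues["Neighborhood"]

# ... then average per neighborhood: each value becomes a category frequency.
demo_freq = demo_onehot.groupby("Neighborhood").mean()
print(demo_freq.round(2))
```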

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
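        # return only relevant information for each nearby venue,
        # skipping venues returned without any category (categories[0] would raise IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])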

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
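Separating the parsing of one explore response from the HTTP request makes that step testable offline. The sketch below uses a hypothetical `parse_explore_items` helper and hand-written items; it produces the same tuples as the comprehension in `getNearbyVenues` and skips venues whose `categories` list is missing or empty.

```python
def parse_explore_items(name, lat, lng, items):
    """Turn the 'items' list of one Foursquare explore response into row tuples.

    Each tuple follows the column order used in getNearbyVenues:
    (Neighborhood, Neighborhood Latitude, Neighborhood Longitude,
     Venue, Venue Latitude, Venue Longitude, Venue Category).
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to report, so skip instead of raising IndexError
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


# Hand-written items in the shape of response['groups'][0]['items'].
demo_explore_items = [
    {"venue": {"name": "Celejor", "categories": [{"name": "Bakery"}],
               "location": {"lat": 18.975844, "lng": 72.823679}}},
    {"venue": {"name": "No-category venue", "categories": [],
               "location": {"lat": 18.9750, "lng": 72.8240}}},
]
demo_rows = parse_explore_items("Agripada", 18.975302, 72.824898, demo_explore_items)
print(demo_rows)
```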

In [28]:
all_venues = getNearbyVenues(names=data['Neighborhood'],
                   latitudes=data['Latitude'],
                   longitudes=data['Longitude']
                 )

Agripada
Airport
Ambewadi
Andheri
Andheri East
Andheri Railway station
Antop Hill
Asvini
Azad Nagar
B P t colony
B.N. bhavan
B.P.lane
Bandra West
Bandra(east)
Bangur Nagar
Bazargate
Best Staff colony
Bharat Nagar
Bhawani Shankar
Bhawani Shankar rd
Borivali
Borivali East
Central Building
Century Mill
Chakala Midc
Chamarbaug
Charkop
Charni Road
Chinchbunder
Chinchpokli
Churchgate
Colaba
Cotton Exchange
Cumballa Hill
Dadar
Dadar Colony
Dahisar
Danda
Daulat Nagar
Delisle Road
Dharavi
Dharavi Road
Dockyard Road
Dr Deshmukh marg
Falkland Road
Girgaon
Gokhale Road
Goregaon
Goregaon East
Government Colony
Gowalia Tank
Grant Road
Haines Road
Hanuman Road
Irla
Ins Hamla
International Airport
J.B. nagar
J.J.hospital
Jacob Circle
Jogeshwari East
Jogeshwari West
Juhu
Kalbadevi
Kamathipura
Kandivali East
Kandivali West
Ketkipada
Kharodi
Kherwadi
Kidwai Nagar
Lal Baug
Liberty Garden
M A marg
M.P.t.
Madh
Madhavbaug
Mahim
Mahim East
Malabar Hill
Malad
Malad East
Mandapeshwar
Mandvi
Mantralaya
Marine Li

In [29]:
all_venues.shape

(1870, 7)

In [30]:

all_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agripada,18.975302,72.824898,Celejor,18.975844,72.823679,Bakery
1,Agripada,18.975302,72.824898,cafe coffee day,18.976988,72.824051,Coffee Shop
2,Agripada,18.975302,72.824898,Baby gardens,18.973466,72.82451,Garden
3,Agripada,18.975302,72.824898,YMCA Swimming Pool,18.974498,72.824721,Pool
4,Agripada,18.975302,72.824898,YMCA Working Mens Hostel,18.973614,72.824427,Hostel


In [31]:
all_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agripada,6,6,6,6,6,6
Airport,4,4,4,4,4,4
Alaknanda,4,4,4,4,4,4
Ali,5,5,5,5,5,5
Aliganj,4,4,4,4,4,4
Amar Colony,5,5,5,5,5,5
Ambewadi,3,3,3,3,3,3
Anand Parbat,2,2,2,2,2,2
Anand Vihar,2,2,2,2,2,2
Andheri,6,6,6,6,6,6


In [32]:
print('There are {} unique categories.'.format(len(all_venues['Venue Category'].unique())))

There are 216 unique categories.


In [33]:
# one hot encoding
all_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
all_onehot['Neighborhood'] = all_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [all_onehot.columns[-1]] + list(all_onehot.columns[:-1])
all_onehot = all_onehot[fixed_columns]

all_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Camera Store,Candy Store,Castle,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Entertainment Service,Event Space,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gastropub,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Karnataka Restaurant,Kids Store,Korean Restaurant,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Night Market,Nightclub,North Indian Restaurant,Office,Opera House,Other Great Outdoors,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Record Shop,Resort,Rest Area,Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Ski Chalet,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Spiritual Center,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Tibetan Restaurant,Tourist Information Center,Track,Trail,Train,Train Station,Travel Lounge,Tunnel,University,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Women's Store,Yoga Studio
0,Agripada,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Agripada,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Agripada,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Agripada,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Agripada,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [34]:
all_onehot.shape

(1870, 217)

In [35]:
all_grouped = all_onehot.groupby('Neighborhood').mean().reset_index()
all_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Camera Store,Candy Store,Castle,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Entertainment Service,Event Space,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gastropub,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Karnataka Restaurant,Kids Store,Korean Restaurant,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Night Market,Nightclub,North Indian Restaurant,Office,Opera House,Other Great Outdoors,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Record Shop,Resort,Rest Area,Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Ski Chalet,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Spiritual Center,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Tibetan Restaurant,Tourist Information Center,Track,Trail,Train,Train Station,Travel Lounge,Tunnel,University,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Women's Store,Yoga Studio
0,Agripada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Airport,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alaknanda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Ali,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aliganj,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Amar Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Ambewadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0
7,Anand Parbat,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Anand Vihar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Andheri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Cluster the neighborhoods of Mumbai and Delhi based on the similarity of their top common venues

In [36]:
num_top_venues = 5

# print the five most frequent venue categories for every neighborhood
for hood in all_grouped['Neighborhood']:
    print("----" + hood + "----")
    # transpose the neighborhood's row into a two-column (venue, freq) table
    temp = all_grouped[all_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:].copy()  # drop the 'Neighborhood' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agripada----
         venue  freq
0       Hostel  0.17
1  Coffee Shop  0.17
2         Pool  0.17
3         Bank  0.17
4       Bakery  0.17


----Airport----
             venue  freq
0          Airport  0.50
1  Airport Service  0.25
2    Jewelry Store  0.25
3              ATM  0.00
4    Moving Target  0.00


----Alaknanda----
                venue  freq
0         Coffee Shop  0.25
1   Food & Drink Shop  0.25
2  Chinese Restaurant  0.25
3        Burger Joint  0.25
4                 ATM  0.00


----Ali----
               venue  freq
0  Indian Restaurant   0.4
1             Hostel   0.2
2              Hotel   0.2
3      Movie Theater   0.2
4                ATM   0.0


----Aliganj----
               venue  freq
0  Indian Restaurant  0.50
1             Bakery  0.25
2        Record Shop  0.25
3                ATM  0.00
4      Moving Target  0.00


----Amar Colony----
               venue  freq
0        Coffee Shop   0.2
1             Market   0.2
2  Indian Restaurant   0.2
3       Dance S

In [37]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
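When a neighborhood has fewer than `num_top_venues` categories with non-zero frequency, `return_most_common_venues` fills the remaining slots with zero-frequency categories in whatever order the sort leaves them. Agripada is one example. Six categories share its venues (about 0.17 each in the listing above), so its 7th to 10th "most common" venues appear to be zero-frequency filler. The sketch below shows one possible variant. The helper `top_nonzero_venues` is hypothetical, not part of the original notebook. It breaks ties alphabetically and keeps only categories that actually occur.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Hypothetical helper: up to num_top_venues categories with non-zero frequency.

    `row` is laid out like a row of all_grouped: the neighborhood name first,
    then one frequency per venue category.
    """
    freqs = row.iloc[1:].astype(float)
    # sort by descending frequency, breaking ties alphabetically so the
    # result does not depend on the sort algorithm
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, freq in ranked if freq > 0][:num_top_venues]

# toy row: two tied categories and two that never occur
row = pd.Series({"Neighborhood": "Hood A", "Bakery": 0.25, "Bank": 0.0,
                 "Coffee Shop": 0.5, "Farm": 0.0, "Hostel": 0.25})
print(top_nonzero_venues(row, 10))  # ['Coffee Shop', 'Bakery', 'Hostel']
```

This helper can return fewer than `num_top_venues` names. Filling `neighborhoods_venues_sorted` with it would therefore need padding (for example, with empty strings) so every row has the same length.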

In [38]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names according to the number of top venues (1st, 2nd, 3rd, 4th, ...)
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = all_grouped['Neighborhood']

# fill in each neighborhood's top venues, ranked by frequency
for ind in np.arange(all_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(all_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agripada,Garden,Coffee Shop,Hostel,Bank,Bakery,Pool,Dessert Shop,Farm,Event Space,Entertainment Service
1,Airport,Airport,Airport Service,Jewelry Store,Yoga Studio,Dim Sum Restaurant,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
2,Alaknanda,Coffee Shop,Burger Joint,Chinese Restaurant,Food & Drink Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
3,Ali,Indian Restaurant,Hostel,Hotel,Movie Theater,Yoga Studio,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service
4,Aliganj,Indian Restaurant,Record Shop,Bakery,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
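The loop in the cell above fills the table one row at a time. The same ranking can be computed for all neighborhoods at once with `numpy.argsort`. Here is a sketch on a toy frequency table laid out like `all_grouped` (hypothetical data and column names).

```python
import numpy as np
import pandas as pd

# Toy frequency table laid out like all_grouped: a Neighborhood column
# followed by one column per venue category.
freq = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B"],
    "Bakery": [0.5, 0.0],
    "Bank": [0.25, 0.0],
    "Hotel": [0.25, 0.75],
    "Park": [0.0, 0.25],
})

num_top = 3
values = freq.drop(columns="Neighborhood").to_numpy()
categories = np.array(freq.columns[1:])

# argsort on the negated values ranks categories from most to least frequent;
# kind="stable" keeps tied categories in their column order.
order = np.argsort(-values, axis=1, kind="stable")[:, :num_top]

suffix = {1: "st", 2: "nd", 3: "rd"}
top_table = pd.DataFrame(
    categories[order],
    columns=[f"{i}{suffix.get(i, 'th')} Most Common Venue" for i in range(1, num_top + 1)],
)
top_table.insert(0, "Neighborhood", freq["Neighborhood"])
print(top_table)
```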


In [39]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# keep only the numeric venue-frequency columns
all_grouped_clustering = all_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(all_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 3, 3, 4, 3, 1, 0, 0, 0, 0, 3, 3, 0, 0, 0, 4, 2, 1, 4, 3,
       0, 0, 3, 1, 0, 0, 4, 4, 0, 0, 3, 0, 1, 3, 0, 4, 0, 0, 3, 1, 0, 3,
       4, 0, 1, 4, 0, 0, 4, 0, 0, 4, 0, 4, 0, 4, 4, 0, 3, 4, 4, 0, 0, 0,
       0, 2, 3, 0, 0, 0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 3, 0, 0, 0, 3, 0, 3, 0, 0, 0, 3,
       0, 0, 0, 0, 0, 4, 0, 4, 3, 0, 3, 0, 1, 3, 4, 0, 3, 0, 4, 0, 0, 0,
       4, 1, 4, 3, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 4, 0, 0, 3, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 4, 1, 3, 2, 0, 3, 3, 3, 4, 0, 0,
       3, 4, 0, 0, 1, 3, 3, 4, 4, 0, 0, 0, 0, 0, 0, 4, 4, 0, 0, 3, 4, 0,
       0, 4, 2, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 4, 0, 0, 0, 3, 4, 4, 0, 4, 0, 0,
       2, 3, 3, 4, 0, 0, 0, 0, 0, 0, 3, 3, 0, 1, 3, 1, 3, 4, 0, 0, 0, 3,
       0, 0, 4, 0, 4, 4, 4, 0, 0, 4, 3, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       3, 0, 0, 0, 4, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0,
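The cell above fixes `kclusters = 5` without showing how that value was chosen. A common heuristic is the elbow method. You fit k-means for a range of k and look for the point where the within-cluster sum of squares (scikit-learn's `inertia_`) stops dropping sharply. Here is a sketch on synthetic data. In the notebook, you would pass `all_grouped_clustering` instead of `X`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for all_grouped_clustering: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

inertias = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 1))
```

On sparse venue-frequency data the elbow is often much less pronounced than on these blobs. Silhouette scores (`sklearn.metrics.silhouette_score`) are a common second check.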

In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [42]:
# join each neighborhood's coordinates (in data) with its cluster label and top venues
all_merged = data.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
all_merged.head() # check the last columns!

Unnamed: 0,Latitude,Longitude,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,18.975302,72.824898,Agripada,0.0,Garden,Coffee Shop,Hostel,Bank,Bakery,Pool,Dessert Shop,Farm,Event Space,Entertainment Service
4,19.090201,72.863808,Airport,0.0,Airport,Airport Service,Jewelry Store,Yoga Studio,Dim Sum Restaurant,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
5,19.186776,72.859313,Ambewadi,3.0,Women's Store,Entertainment Service,Indian Restaurant,Dessert Shop,Fast Food Restaurant,Farmers Market,Farm,Event Space,Electronics Store,Duty-free Shop
6,19.119698,72.84642,Andheri,0.0,Fast Food Restaurant,Food Court,Bakery,Indian Restaurant,Restaurant,Dim Sum Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
7,19.115883,72.854202,Andheri East,0.0,Indian Restaurant,Gym Pool,Hotel,Smoke Shop,Light Rail Station,Bar,Electronics Store,Camera Store,Shopping Mall,Gym / Fitness Center


In [53]:
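# drop neighborhoods with no venue data (and therefore no cluster label), then restore integer labels
all_merged.dropna(inplace=True)
all_merged['Cluster Labels'] = all_merged['Cluster Labels'].astype(int)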
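`DataFrame.join` performs a left join by default. Any neighborhood in `data` that is missing from `neighborhoods_venues_sorted` therefore gets NaN in every joined column. That NaN forces `Cluster Labels` to a float dtype, which is why the labels in the `all_merged.head()` output print as 0.0 and 3.0. Here is a toy illustration with hypothetical neighborhoods.

```python
import pandas as pd

coords = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B", "Hood C"],
    "Latitude": [19.0, 19.1, 28.6],
    "Longitude": [72.8, 72.9, 77.2],
})
labels = pd.DataFrame({"Neighborhood": ["Hood A", "Hood C"], "Cluster Labels": [0, 3]})

# Left join: Hood B has no match, so its label becomes NaN and the
# integer column is upcast to float64.
merged = coords.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].dtype)  # float64

# Dropping the unmatched rows and casting back restores integer labels.
merged = merged.dropna().astype({"Cluster Labels": int})
print(merged)
```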

In [54]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map; latitude and longitude are assumed to be set in an earlier cell
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label (0 to kclusters-1)
for lat, lon, poi, cluster in zip(all_merged['Latitude'], all_merged['Longitude'], all_merged['Neighborhood'], all_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
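A single `location` with `zoom_start=11` frames only one city, but `all_merged` holds neighborhoods from both Mumbai and Delhi. The sketch below computes a bounding box from the data, shown here on toy coordinates. That box could then be passed to folium's `Map.fit_bounds`. Inspecting the box also surfaces geocoding outliers, such as the longitude 100.340014 in the Cluster 5 table, which lies far east of both cities. The helper name `bounding_box` is hypothetical.

```python
import pandas as pd

# Toy coordinates standing in for all_merged (values are illustrative).
points = pd.DataFrame({
    "Latitude": [18.98, 19.09, 28.53, 28.65],
    "Longitude": [72.82, 72.86, 77.25, 77.29],
})

def bounding_box(df, lat_col="Latitude", lon_col="Longitude"):
    """Return [[south, west], [north, east]], the two-corner format fit_bounds accepts."""
    return [
        [df[lat_col].min(), df[lon_col].min()],
        [df[lat_col].max(), df[lon_col].max()],
    ]

bounds = bounding_box(points)
center = [points["Latitude"].mean(), points["Longitude"].mean()]
print(bounds, center)
```

In the notebook, a call such as `map_clusters.fit_bounds(bounding_box(all_merged))` would then zoom the map to cover every marker.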

## Cluster 1

In [55]:
all_merged.loc[all_merged['Cluster Labels'] == 0, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]

Unnamed: 0,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,72.824898,Coffee Shop,Hostel,Bank,Bakery,Pool,Dessert Shop,Farm,Event Space,Entertainment Service
4,72.863808,Airport Service,Jewelry Store,Yoga Studio,Dim Sum Restaurant,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
6,72.84642,Food Court,Bakery,Indian Restaurant,Restaurant,Dim Sum Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
7,72.854202,Gym Pool,Hotel,Smoke Shop,Light Rail Station,Bar,Electronics Store,Camera Store,Shopping Mall,Gym / Fitness Center
8,72.84642,Food Court,Bakery,Indian Restaurant,Restaurant,Dim Sum Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
11,72.840038,Ice Cream Shop,Electronics Store,Metro Station,Fast Food Restaurant,Lake,Bar,Indian Restaurant,Snack Place,German Restaurant
17,72.833678,Multiplex,Yoga Studio,Design Studio,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
18,72.832264,Fast Food Restaurant,Dessert Shop,Café,Chinese Restaurant,Gift Shop,Coffee Shop,Restaurant,Boutique,Lounge
20,72.856474,Yoga Studio,Dim Sum Restaurant,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
22,72.840432,Electronics Store,Indian Restaurant,Movie Theater,Breakfast Spot,Plaza,Bar,Bus Station,Flower Shop,Women's Store
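The positional selection `all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]` picks Longitude and the 2nd to 10th venues. As a result, the cluster tables omit both the neighborhood name and its 1st Most Common Venue (compare their headers with the `all_merged.head()` output). Selecting columns by name avoids this. Here is a sketch with a hypothetical helper `cluster_table`, run on a toy frame with the same column layout.

```python
import pandas as pd

def cluster_table(df, label):
    """Rows of `df` in cluster `label`: the neighborhood name plus its ranked venue columns."""
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    return df.loc[df["Cluster Labels"] == label, ["Neighborhood"] + venue_cols]

# toy frame with the same column layout as all_merged (only two venue columns)
toy = pd.DataFrame({
    "Latitude": [18.98, 19.09],
    "Longitude": [72.82, 72.86],
    "Neighborhood": ["Hood A", "Hood B"],
    "Cluster Labels": [0, 1],
    "1st Most Common Venue": ["Garden", "Airport"],
    "2nd Most Common Venue": ["Coffee Shop", "Airport Service"],
})
print(cluster_table(toy, 0))
```

In the notebook, `cluster_table(all_merged, 0)` would stand in for the Cluster 1 cell, and so on for the other labels.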


## Cluster 2

In [56]:
all_merged.loc[all_merged['Cluster Labels'] == 1, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]

Unnamed: 0,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
176,72.875934,Snack Place,College Academic Building,Dim Sum Restaurant,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store
5,77.285241,Dessert Shop,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
36,77.288363,Fried Chicken Joint,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
53,77.289025,Light Rail Station,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
70,77.019182,Dessert Shop,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
71,77.097591,Juice Bar,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
87,77.105571,Pharmacy,Pizza Place,Design Studio,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
161,77.300518,Dessert Shop,Fast Food Restaurant,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
221,77.18014,Fast Food Restaurant,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
271,77.213108,Government Building,Design Studio,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant


## Cluster 3

In [57]:
all_merged.loc[all_merged['Cluster Labels'] == 2, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]

Unnamed: 0,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
92,77.11936,Yoga Studio,Design Studio,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
96,77.030574,Yoga Studio,Design Studio,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
129,77.303024,Furniture / Home Store,Yoga Studio,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
200,77.185256,Yoga Studio,Design Studio,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant
256,77.156957,Yoga Studio,Design Studio,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop,Dumpling Restaurant


## Cluster 4

In [58]:
all_merged.loc[all_merged['Cluster Labels'] == 3, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]

Unnamed: 0,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,72.859313,Entertainment Service,Indian Restaurant,Dessert Shop,Fast Food Restaurant,Farmers Market,Farm,Event Space,Electronics Store,Duty-free Shop
9,72.865256,Grocery Store,Trail,Indian Restaurant,Yoga Studio,Design Studio,Farm,Event Space,Entertainment Service,Electronics Store
16,72.849811,Chinese Restaurant,Spa,Pizza Place,Fast Food Restaurant,Dessert Shop,Farm,Event Space,Entertainment Service,Electronics Store
21,72.83737,Multicuisine Indian Restaurant,Indian Restaurant,Buffet,Yoga Studio,Dim Sum Restaurant,Farmers Market,Farm,Event Space,Entertainment Service
30,72.8408,Pharmacy,Vegetarian / Vegan Restaurant,Yoga Studio,Design Studio,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
42,72.839427,Chinese Restaurant,Fast Food Restaurant,Track,Shopping Mall,Bakery,Breakfast Spot,Restaurant,Playground,Ice Cream Shop
47,72.831012,Coffee Shop,Lounge,Seafood Restaurant,Dessert Shop,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
58,72.83203,Chinese Restaurant,Convenience Store,Electronics Store,Seafood Restaurant,Fast Food Restaurant,Snack Place,Coffee Shop,Golf Course,Department Store
63,72.832289,Pizza Place,Cupcake Shop,Indian Restaurant,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store
65,72.830243,Market,Cheese Shop,Yoga Studio,Dessert Shop,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store


## Cluster 5

In [59]:
all_merged.loc[all_merged['Cluster Labels'] == 4, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]

Unnamed: 0,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,100.340014,Indian Restaurant,Hostel,Halal Restaurant,Coffee Shop,Bakery,Public Art,Vegetarian / Vegan Restaurant,Gift Shop,French Restaurant
15,72.830267,Indian Restaurant,Park,Arcade,Coffee Shop,Chinese Restaurant,Dessert Shop,Bookstore,College Auditorium,Bagel Shop
27,72.835335,Coffee Shop,Bar,Café,Train Station,Multiplex,Sandwich Place,Chinese Restaurant,Electronics Store,Dessert Shop
31,72.825865,Bakery,Food Truck,Café,Yoga Studio,Diner,Fast Food Restaurant,Farmers Market,Farm,Event Space
34,72.837053,Sandwich Place,Café,Indian Restaurant,Yoga Studio,Dessert Shop,Farm,Event Space,Entertainment Service,Electronics Store
39,72.806538,Café,Bakery,Sandwich Place,Donut Shop,Theater,Department Store,Other Great Outdoors,Cosmetics Shop,Concert Hall
43,72.859621,Café,Train Station,Design Studio,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
45,72.825031,Mediterranean Restaurant,Scenic Lookout,Asian Restaurant,Dim Sum Restaurant,Farmers Market,Farm,Event Space,Entertainment Service,Electronics Store
51,72.809547,Pet Store,Sandwich Place,Bakery,Café,Yoga Studio,Event Space,Entertainment Service,Electronics Store,Duty-free Shop
71,72.835335,Coffee Shop,Bar,Café,Train Station,Multiplex,Sandwich Place,Chinese Restaurant,Electronics Store,Dessert Shop


## Conclusion

In this project, we loaded neighborhood data for two of India's prime metro cities, Mumbai and Delhi, and analyzed their neighborhoods based on the types of popular venues they have. We then clustered the neighborhoods by the most common venues in each one. Our intention was to understand how similar life is in these metros. The results can offer decision points for anyone considering settling in either city, giving them a peek into the kind of experience and facilities they can expect.

Given our cluster information for both Mumbai and Delhi, we see that Mumbai's neighborhoods are a great place for a foodie, with many restaurants, cafés, bars, and so on. Because Mumbai is close to the seashore, its neighborhoods also offer harbors, seafood, and boat and ferry rides. Mumbai is therefore well suited to international visitors and expats because of the variety of its food outlets. Life in Delhi's neighborhoods, on the other hand, looks quite different. Delhi is inland, and its neighborhoods suit those who like arts and crafts stores, museums, water parks, and pizza places. Delhi also has relatively few restaurants serving foreign cuisine.

Thus, with this project, we have analyzed the kind of life each of these big metro cities has to offer, based on the popular venues in their neighborhoods.