# Where to Open an Indian Restaurant in Mumbai, India

## Introduction

    Selecting the right area for a new restaurant is critical to its viability. A restaurant opened in an unfavourable area is less likely to survive and costs its owner money. In a metropolitan city like Mumbai, location matters all the more. People in Mumbai tend to live near others of the same ethnicity and therefore tend to visit restaurants that serve their own cuisine.

## Business Problem

    The objective of this study is to cluster neighborhoods by the types of venues found in them, so that an entrepreneur can make a better decision about the locality and the type of restaurant to open.

## Data

    To solve this problem, the following data is required:
    
    # List of neighborhoods in Mumbai, India (https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Mumbai). This defines the scope of this project.
    
    # Latitude and longitude of the neighborhoods, to plot them on a map and to fetch the nearby venues.
    
    # Venue data to form clusters and make recommendations.

## Methodology
    First, the neighborhoods of Mumbai are scraped from Wikipedia using Beautiful Soup. Their coordinates are then fetched with the Nominatim geocoder from the geopy package. These coordinates are used to find the venues within a 500-metre radius of each neighborhood through the Foursquare API.
    The neighborhoods are then clustered by their nearby venues, and recommendations are made from the clusters.

## Importing packages required

In [1]:
import numpy as np
import pandas as pd
import requests
import folium
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
import html5lib

## Scraping the neighborhoods of Mumbai from the Wikipedia page using BeautifulSoup

In [2]:
source = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Mumbai").text
soup = BeautifulSoup(source, 'html5lib')  # name the parser explicitly; html5lib is imported above

In [3]:
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Category:Neighbourhoods in Mumbai - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"Xh1KXQpAMFcAAGbAvFgAAABV","wgCSPNonce":!1,"wgCanonicalNamespace":"Category","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":14,"wgPageName":"Category:Neighbourhoods_in_Mumbai","wgTitle":"Neighbourhoods in Mumbai","wgCurRevisionId":833825366,"wgRevisionId":833825366,"wgArticleId":12900201,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Commo

In [4]:
lst = []
for ul in soup.find_all('div', class_ = 'mw-category-group'):
    for li in ul.find_all('li'):
        name = li.a.text
        lst.append(name)

lst = lst[1:]
lst

['Aarey Milk Colony',
 'Agripada',
 'Altamount Road',
 'Amboli, Mumbai',
 'Amrut Nagar',
 'Antop Hill',
 'Anushakti Nagar',
 'Asalfa',
 'Badhwar Park',
 'Baiganwadi',
 'Ballard Estate',
 'Bandra',
 'Bandra Kurla Complex',
 'Bangur Nagar',
 'Bhuleshwar',
 'Bori Bunder',
 'Breach Candy',
 'Byculla',
 'C.G.S. colony',
 'Cavel',
 'Chandanwadi, Mumbai',
 'Chandivali',
 'Chinchpokli',
 'Chira Bazaar',
 'Chor Bazaar',
 'Churchgate',
 'Colaba',
 'Cotton Green',
 'Cuffe Parade',
 'Cumbala Hill',
 'Currey Road railway station',
 'D.N. Nagar',
 'Dadar',
 'Dadar Parsi Colony',
 'Dagdi Chawl',
 'Dava Bazaar',
 'Dedh galli',
 'Deonar',
 'Dharavi',
 'Dhobitalao',
 'Dindoshi',
 'Dongri',
 'Fanas Wadi',
 'Ferry Wharf',
 'Fort (Mumbai precinct)',
 'Four Bungalows',
 'Gavangaon',
 'Ghodapdeo',
 'Girgaon',
 'Gokuldham',
 'Gopalrao Deshmukh Marg',
 'Gorai',
 'Gowalia Tank',
 'Guru Tegh Bahadur Nagar',
 'Hindu colony',
 'Hiranandani Gardens, Mumbai',
 'I.C. Colony',
 'Irla',
 'Jagruti Nagar',
 'JB Nagar',
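
The extraction loop above can be tried offline on a small, hand-written HTML fragment. The sketch below is only an illustration: the fragment imitates the `mw-category-group` structure that the loop targets, and it uses the standard-library `html.parser` module instead of Beautiful Soup so that it runs without extra packages. The class name `CategoryLinkParser` and the sample markup are assumptions, not part of the notebook.

```python
from html.parser import HTMLParser

# Hand-written sample that imitates the category-page markup (assumption).
SAMPLE_HTML = """
<div class="mw-category-group"><h3>A</h3>
  <ul>
    <li><a href="/wiki/Agripada">Agripada</a></li>
    <li><a href="/wiki/Antop_Hill">Antop Hill</a></li>
  </ul>
</div>
<div class="other"><ul><li><a href="/wiki/X">Ignored</a></li></ul></div>
<div class="mw-category-group"><h3>B</h3>
  <ul><li><a href="/wiki/Bandra">Bandra</a></li></ul>
</div>
"""


class CategoryLinkParser(HTMLParser):
    """Collect link text found inside <div class="mw-category-group"> blocks."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # nesting depth of <div> inside a category group
        self.in_link = False    # True while inside an <a> within a group
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth > 0 or "mw-category-group" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth > 0:
            self.in_link = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.names[-1] += data


parser = CategoryLinkParser()
parser.feed(SAMPLE_HTML)
neighborhood_names = parser.names
print(neighborhood_names)
```

Unlike `li.a.text`, which reads only the first link of each list item, this sketch collects every link inside a category group; on the real page the two may differ if an item carries more than one link.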
 

### Getting the coordinates of Neighborhoods

In [5]:
address = 'Mumbai, IN'

geolocator = Nominatim(user_agent="mumbai_explorer")
location = geolocator.geocode(address)

lat = location.latitude
lng = location.longitude

print("Latitude and Longitude of {} are {}, {}".format(address,lat,lng))

Latitude and Longitude of Mumbai, IN are 18.9387711, 72.8353355


In [6]:
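Latitude = []
Longitude = []
Area = []
for address in lst:
    # Note: the query omits the city, so an ambiguous name can resolve
    # outside Mumbai (see 'Bandra, IN' in the output below).
    add = address + ', IN'

    # Reuse the geolocator created in the previous cell
    location = geolocator.geocode(add)

    if location is None:
        continue
    Area.append(add)
    # Read the coordinates directly so that lat/lng keep the city centre
    Latitude.append(location.latitude)
    Longitude.append(location.longitude)

    print("Latitude and Longitude of {} are {}, {}".format(add, location.latitude, location.longitude))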

Latitude and Longitude of Aarey Milk Colony, IN are 19.1561292, 72.8707223
Latitude and Longitude of Agripada, IN are 18.9753024, 72.8248975
Latitude and Longitude of Altamount Road, IN are 18.9663618, 72.809148
Latitude and Longitude of Amboli, Mumbai, IN are 19.1319915, 72.8499596
Latitude and Longitude of Amrut Nagar, IN are 19.1741818, 73.0204924
Latitude and Longitude of Antop Hill, IN are 19.0207608, 72.8652556
Latitude and Longitude of Anushakti Nagar, IN are 19.0395778, 72.9221562
Latitude and Longitude of Badhwar Park, IN are 18.91904145, 72.8264976296761
Latitude and Longitude of Baiganwadi, IN are 19.06189785, 72.92496935499268
Latitude and Longitude of Ballard Estate, IN are 18.9366512, 72.8391325
Latitude and Longitude of Bandra, IN are 26.049014149999998, 85.62282824383587
Latitude and Longitude of Bandra Kurla Complex, IN are 19.067115, 72.8657245
Latitude and Longitude of Bangur Nagar, IN are 19.1688142, 72.8336777
Latitude and Longitude of Bhuleshwar, IN are 18.9537706
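
The output above shows 'Bandra, IN' resolved to roughly 26.05, 85.62, far from the city coordinates printed for 'Mumbai, IN'. One possible safeguard, sketched here under assumptions, is to discard geocoding results that fall outside a rough bounding box around the Mumbai area. The box limits and the helper name `in_mumbai_bbox` are illustrative choices, not values taken from the notebook or from any official boundary.

```python
# Approximate bounding box around the Mumbai metropolitan area.
# These limits are an assumption chosen for illustration, not official boundaries.
LAT_MIN, LAT_MAX = 18.85, 19.50
LNG_MIN, LNG_MAX = 72.75, 73.10


def in_mumbai_bbox(lat, lng):
    """Return True when the point lies inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX


# Coordinates copied from the geocoding output above.
geocoded = [
    ("Agripada, IN", 18.9753024, 72.8248975),
    ("Bandra, IN", 26.049014149999998, 85.62282824383587),
    ("Bandra Kurla Complex, IN", 19.067115, 72.8657245),
]

kept = [name for name, lat, lng in geocoded if in_mumbai_bbox(lat, lng)]
rejected = [name for name, lat, lng in geocoded if not in_mumbai_bbox(lat, lng)]
print("kept:", kept)
print("rejected:", rejected)
```

A check of this kind could sit next to the `location is None` test in the loop, so that misplaced neighborhoods never reach the venue search.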

In [7]:
df = pd.DataFrame(list(zip(Area, Latitude, Longitude)), columns=['Area','Latitude', 'Longitude'])

In [8]:
df

Unnamed: 0,Area,Latitude,Longitude
0,"Aarey Milk Colony, IN",19.156129,72.870722
1,"Agripada, IN",18.975302,72.824898
2,"Altamount Road, IN",18.966362,72.809148
3,"Amboli, Mumbai, IN",19.131992,72.849960
4,"Amrut Nagar, IN",19.174182,73.020492
...,...,...,...
112,"Virar, IN",19.455298,72.811816
113,"Wadala, IN",19.026919,72.875934
114,"Walkeshwar, IN",18.955343,72.807947
115,"Yashodham, IN",19.174371,72.862940


### Creating a map of Mumbai and its neighborhood areas using Folium



In [9]:
map_mumbai = folium.Map(location=[lat, lng], zoom_start=10)

for latitude, longitude, area in zip(df['Latitude'], df['Longitude'], df['Area']):
    # Label each marker with its own area name, not the whole Area list
    label = folium.Popup(area, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6).add_to(map_mumbai)

map_mumbai

In [1]:
# Foursquare credentials (actual values removed)

CLIENT_ID = '##'
CLIENT_SECRET = '##'
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ##
CLIENT_SECRET:##


In [21]:
neighborhood_lat = df.loc[116, 'Latitude']
neighborhood_lng = df.loc[116, 'Longitude']
neighborhood_name = df.loc[116, 'Area']

print('Latitude and Longitude of {} are {} and {}'.format(neighborhood_name, neighborhood_lat, neighborhood_lng))

Latitude and Longitude of Zaveri Bazaar, IN are 18.9494725 and 72.8307161


### Exploring venues within a 500-metre radius of a neighborhood, limited to 100 venues



In [22]:

LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
neighborhood_lat,
neighborhood_lng,
radius,
LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=##&client_secret=##&v=20180605&ll=18.9494725,72.8307161&radius=500&limit=100'
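
As a hedged alternative to formatting the query string by hand, the sketch below builds the same kind of URL with the standard-library `urllib.parse.urlencode`, which escapes special characters and avoids the stray `&` after the `?`. The function name `build_explore_url` is hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials, as in the notebook.
example_url = build_explore_url("##", "##", "20180605", 18.9494725, 72.8307161, 500, 100)
print(example_url)
```

Passing the same dictionary through the `params` argument of `requests.get` should have an equivalent effect and keeps the credentials out of any hand-built string.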

In [23]:
results = requests.get(url).json()

results

{'meta': {'code': 200, 'requestId': '5e1d764ca2e538001b7f5cd5'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Mumbai',
  'headerFullLocation': 'Mumbai',
  'headerLocationGranularity': 'city',
  'totalResults': 15,
  'suggestedBounds': {'ne': {'lat': 18.953972504500005,
    'lng': 72.8354650714802},
   'sw': {'lat': 18.944972495499993, 'lng': 72.8259671285198}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b0587d8f964a52006a422e3',
       'name': 'Bhagat Tarachand Restaurant',
       'location': {'address': 'Sheikh Memon Street',
        'lat': 18.951801999721898,
        'lng': 72.83048598110027,
        'labeledLatLngs': [{'label': 'display',
          'lat': 18.95180199972

In [25]:
# Function that extracts the category name

def get_category_type(row):
    
    try:
        category_list = row['categories']
    except KeyError:
        category_list = row['venue.categories']
        
    if len(category_list) == 0:
        return None
    else:
        return category_list[0]['name']

In [47]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)

filtered_column = ['venue.name','venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.distance']

nearby_venues = nearby_venues.loc[:,filtered_column]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

In [48]:
nearby_venues

Unnamed: 0,name,categories,lat,lng,distance
0,Bhagat Tarachand Restaurant,Indian Restaurant,18.951802,72.830486,260
1,Shree Thaker Bhojnalay,Indian Restaurant,18.951217,72.828326,317
2,Crawford Market,Market,18.946761,72.831225,306
3,Parsi Dairy Farm,Cheese Shop,18.946756,72.831183,306
4,"Badshah - Falooda, Ice Cream, Syrups",Ice Cream Shop,18.947303,72.83347,377
5,Surti Hotel,Food,18.952567,72.829905,354
6,Badshah,Fast Food Restaurant,18.947244,72.833456,380
7,Sadanand Hotel,Indian Restaurant,18.94797,72.834094,393
8,Patel Restaurant,Restaurant,18.949798,72.834655,416
9,Jaffar Bhai Delhi Darbar,Indian Restaurant,18.95,72.834625,415


In [50]:
print('{} venues are returned by foursquare'.format(nearby_venues.shape[0]))

15 venues are returned by foursquare


In [53]:
# Creating a function to repeat the above process for every neighborhood

def GetNearByVenues(names, latitudes, longitudes, radius = 500):
    
    venues_list = []
    for name, latitude, longitude in zip(names, latitudes, longitudes):    
        print(name)
        
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        latitude,
        longitude,
        radius,
        LIMIT)
    
        results = requests.get(url).json()['response']['groups'][0]['items']
    
        for v in results:
            venues_list.append([(
            name,
            latitude,
            longitude,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'],
            )])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
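
The function above indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for a venue that has no category. The sketch below shows one defensive way to flatten items of the same shape; the helper name `venue_rows` and the sample items are assumptions made for illustration (the second venue is invented to exercise the empty-category case).

```python
def venue_rows(neighborhood, lat, lng, items):
    """Flatten explore-style items into one tuple per venue."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Hand-made items shaped like the response shown earlier (values partly invented).
sample_items = [
    {"venue": {"name": "Bhagat Tarachand Restaurant",
               "location": {"lat": 18.951802, "lng": 72.830486},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 18.95, "lng": 72.83},
               "categories": []}},
]

rows = venue_rows("Zaveri Bazaar, IN", 18.9494725, 72.8307161, sample_items)
for row in rows:
    print(row)
```

Rows built this way can be passed to `pd.DataFrame` with the same seven column names used above.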

In [54]:
mumbai_venues = GetNearByVenues(names = df['Area'],
                               latitudes=df['Latitude'],
                               longitudes=df['Longitude'])

Aarey Milk Colony, IN
Agripada, IN
Altamount Road, IN
Amboli, Mumbai, IN
Amrut Nagar, IN
Antop Hill, IN
Anushakti Nagar, IN
Badhwar Park, IN
Baiganwadi, IN
Ballard Estate, IN
Bandra, IN
Bandra Kurla Complex, IN
Bangur Nagar, IN
Bhuleshwar, IN
Bori Bunder, IN
Breach Candy, IN
Byculla, IN
C.G.S. colony, IN
Cavel, IN
Chandivali, IN
Chinchpokli, IN
Chor Bazaar, IN
Churchgate, IN
Colaba, IN
Cotton Green, IN
Cuffe Parade, IN
Cumbala Hill, IN
Currey Road railway station, IN
D.N. Nagar, IN
Dadar, IN
Deonar, IN
Dharavi, IN
Dhobitalao, IN
Dindoshi, IN
Dongri, IN
Fanas Wadi, IN
Ferry Wharf, IN
Four Bungalows, IN
Ghodapdeo, IN
Girgaon, IN
Gokuldham, IN
Gopalrao Deshmukh Marg, IN
Gorai, IN
Gowalia Tank, IN
Guru Tegh Bahadur Nagar, IN
Hindu colony, IN
Hiranandani Gardens, Mumbai, IN
I.C. Colony, IN
Irla, IN
Jagruti Nagar, IN
JB Nagar, IN
Kala Ghoda, IN
Kalbadevi, IN
Kamathipura, IN
Kannamwar Nagar, IN
Kemps Corner, IN
Khar Danda, IN
Kherwadi, IN
Koliwada, IN
Kopar Road, IN
Lalbaug, IN
Lallubhai Comp

In [59]:
print(mumbai_venues.shape)
mumbai_venues.head(10)



(1619, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Aarey Milk Colony, IN",19.156129,72.870722,Panchvati Fast Food Corner,19.157628,72.874506,Fast Food Restaurant
1,"Agripada, IN",18.975302,72.824898,Celejor,18.975844,72.823679,Bakery
2,"Agripada, IN",18.975302,72.824898,cafe coffee day,18.976988,72.824051,Coffee Shop
3,"Agripada, IN",18.975302,72.824898,HDFC Bank,18.973795,72.822895,Bank
4,"Agripada, IN",18.975302,72.824898,YMCA,18.972187,72.823491,Athletics & Sports
5,"Agripada, IN",18.975302,72.824898,YMCA Ground,18.972006,72.824011,Soccer Field
6,"Agripada, IN",18.975302,72.824898,Mumbai Central Platform No. 2,18.975996,72.820417,Platform
7,"Altamount Road, IN",18.966362,72.809148,Doolally Taproom,18.963809,72.807695,Brewery
8,"Altamount Road, IN",18.966362,72.809148,Crossword,18.963474,72.807773,Bookstore
9,"Altamount Road, IN",18.966362,72.809148,Gustoso,18.964198,72.807726,Pizza Place


In [60]:
mumbai_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Aarey Milk Colony, IN",1,1,1,1,1,1
"Agripada, IN",6,6,6,6,6,6
"Altamount Road, IN",19,19,19,19,19,19
"Amboli, Mumbai, IN",5,5,5,5,5,5
"Amrut Nagar, IN",2,2,2,2,2,2
...,...,...,...,...,...,...
"Virar, IN",4,4,4,4,4,4
"Wadala, IN",4,4,4,4,4,4
"Walkeshwar, IN",17,17,17,17,17,17
"Yashodham, IN",37,37,37,37,37,37


In [66]:
print("There are {} unique venue categories".format(len(mumbai_venues['Venue Category'].unique())))

There are 181 unique venue categories


In [72]:
# One Hot Encoding

mumbai_one_hot = pd.get_dummies(mumbai_venues[['Venue Category']], prefix = "", prefix_sep="")

mumbai_one_hot['Neighborhood'] = mumbai_venues['Neighborhood']

# Moving Neighborhood column to the first column

fixed_columns = [mumbai_one_hot.columns[-1]] + list(mumbai_one_hot.columns[:-1])

mumbai_one_hot = mumbai_one_hot[fixed_columns]

mumbai_one_hot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Train,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,"Aarey Milk Colony, IN",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Agripada, IN",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Agripada, IN",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Agripada, IN",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Agripada, IN",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [73]:
mumbai_grouped = mumbai_one_hot.groupby('Neighborhood').mean().reset_index()

In [76]:
mumbai_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Train,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,"Aarey Milk Colony, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
1,"Agripada, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
2,"Altamount Road, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
3,"Amboli, Mumbai, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
4,"Amrut Nagar, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99,"Virar, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
100,"Wadala, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
101,"Walkeshwar, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
102,"Yashodham, IN",0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0
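
Taking the mean of the one-hot columns per neighborhood gives the share of that neighborhood's venues that fall in each category. The pure-Python sketch below recomputes that share for two neighborhoods, using the venue rows listed in the `mumbai_venues` table above; it illustrates the arithmetic and is not the notebook's pandas code.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs copied from the Aarey Milk Colony
# and Agripada rows of the venue table above.
venues = [
    ("Aarey Milk Colony, IN", "Fast Food Restaurant"),
    ("Agripada, IN", "Bakery"),
    ("Agripada, IN", "Coffee Shop"),
    ("Agripada, IN", "Bank"),
    ("Agripada, IN", "Athletics & Sports"),
    ("Agripada, IN", "Soccer Field"),
    ("Agripada, IN", "Platform"),
]

counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Relative frequency = count / number of venues in the neighborhood,
# which is what the mean of a 0/1 indicator column works out to.
frequencies = {
    hood: {cat: n / sum(cat_counts.values()) for cat, n in cat_counts.items()}
    for hood, cat_counts in counts.items()
}

print(round(frequencies["Agripada, IN"]["Bakery"], 2))
print(frequencies["Aarey Milk Colony, IN"]["Fast Food Restaurant"])
```

With six venues in Agripada, each category gets 1/6, which rounds to 0.17; the single venue in Aarey Milk Colony gets a frequency of 1.0.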


In [78]:
num_top_venues = 5

for hood in mumbai_grouped['Neighborhood']:
    print("---"+hood+"---")
    
    temp = mumbai_grouped[mumbai_grouped['Neighborhood']==hood].T.reset_index()
    
    temp.columns = ['venue','freq']
    
    temp = temp.iloc[1:]
    
    temp['freq'] = temp['freq'].astype('float')
    
    temp = temp.round({'freq':2})
    
    print(temp.sort_values('freq', ascending = False).reset_index(drop=True).head(num_top_venues))
    
    print('\n')

---Aarey Milk Colony, IN---
                            venue  freq
0            Fast Food Restaurant   1.0
1                             ATM   0.0
2      Modern European Restaurant   0.0
3              Mughlai Restaurant   0.0
4  Multicuisine Indian Restaurant   0.0


---Agripada, IN---
                venue  freq
0              Bakery  0.17
1            Platform  0.17
2        Soccer Field  0.17
3         Coffee Shop  0.17
4  Athletics & Sports  0.17


---Altamount Road, IN---
               venue  freq
0               Café  0.16
1         Restaurant  0.11
2  Indian Restaurant  0.05
3            Theater  0.05
4              Diner  0.05


---Amboli, Mumbai, IN---
                  venue  freq
0            Playground   0.2
1     Indian Restaurant   0.2
2           Pizza Place   0.2
3  Gym / Fitness Center   0.2
4    Chinese Restaurant   0.2


---Amrut Nagar, IN---
                     venue  freq
0               Restaurant   0.5
1              Candy Store   0.5
2                   

                      venue  freq
0                      Café  0.50
1         Indian Restaurant  0.25
2  Mediterranean Restaurant  0.25
3                       ATM  0.00
4   North Indian Restaurant  0.00


---Kherwadi, IN---
                     venue  freq
0        Indian Restaurant  0.50
1              Pizza Place  0.25
2                     Café  0.25
3                      ATM  0.00
4  North Indian Restaurant  0.00


---Kopar Road, IN---
                     venue  freq
0               Smoke Shop   0.5
1                    Diner   0.5
2                      ATM   0.0
3  North Indian Restaurant   0.0
4       Mughlai Restaurant   0.0


---Lalbaug, IN---
                            venue  freq
0               Indian Restaurant  0.50
1                           Plaza  0.25
2        Maharashtrian Restaurant  0.25
3                     Opera House  0.00
4  Multicuisine Indian Restaurant  0.00


---Land's End, Bandra, IN---
             venue  freq
0      Coffee Shop  0.11
1   Scenic 

4                 ATM  0.00


---Uttan, IN---
               venue  freq
0              Beach  0.33
1         Playground  0.17
2  Indian Restaurant  0.17
3             Resort  0.17
4        Bus Station  0.17


---Versova, Mumbai, IN---
                venue  freq
0                 Pub  0.19
1  Chinese Restaurant  0.12
2                Café  0.12
3         Coffee Shop  0.08
4                 Bar  0.08


---Vidyavihar, IN---
                  venue  freq
0        Cricket Ground   0.2
1         Train Station   0.2
2    Athletics & Sports   0.2
3                   Bar   0.2
4  Fast Food Restaurant   0.2


---Virar, IN---
                  venue  freq
0                  Lake  0.25
1              Platform  0.25
2         Train Station  0.25
3  Fast Food Restaurant  0.25
4                   ATM  0.00


---Wadala, IN---
                venue  freq
0  Light Rail Station  0.25
1                Pool  0.25
2            Gym Pool  0.25
3      Baseball Field  0.25
4                 ATM  0.00


In [79]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [80]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Aarey Milk Colony, IN",Fast Food Restaurant,Zoo,Food Court,Food,Flower Shop,Flea Market,Fish Market,Field,Farmers Market,Electronics Store
1,"Agripada, IN",Platform,Coffee Shop,Bank,Athletics & Sports,Soccer Field,Bakery,Electronics Store,Food,Flower Shop,Flea Market
2,"Altamount Road, IN",Café,Restaurant,Breakfast Spot,Bakery,Concert Hall,Pizza Place,Coffee Shop,Dessert Shop,Diner,Salon / Barbershop
3,"Amboli, Mumbai, IN",Playground,Gym / Fitness Center,Chinese Restaurant,Indian Restaurant,Pizza Place,Gym Pool,Flower Shop,Fish Market,Field,Fast Food Restaurant
4,"Amrut Nagar, IN",Candy Store,Restaurant,Zoo,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market
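
The `indicators` list with a fallback to 'th' works for the ten columns used here, but it would label a 21st column as '21th'. A small helper such as the hypothetical `ordinal` below is one way to generate the suffix for any position; it is a sketch, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"          # 11th, 12th, 13th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], columns[4], columns[10])
```

For the ten columns above it produces the same headers as the original loop.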


### Clustering the neighborhoods on the basis of the most common venues nearby

In [81]:
# set number of clusters
kclusters = 5

mumbai_grouped_clustering = mumbai_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 0, 1, 0, 1, 0, 3, 0], dtype=int32)
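
To make the clustering step less of a black box, the sketch below implements the basic k-means loop (Lloyd's algorithm) in plain Python on a few invented frequency vectors. It is a teaching illustration only: scikit-learn's `KMeans` uses a different initialisation (k-means++ by default) and an optimised implementation, whereas this sketch takes fixed starting centers, and the toy values are assumptions rather than rows of `mumbai_grouped`.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; `centroids` are the starting centers."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy frequency vectors: [Indian Restaurant, Café, Beach] (invented values).
toy = [
    [0.50, 0.25, 0.0],
    [0.60, 0.10, 0.0],
    [0.05, 0.50, 0.0],
    [0.00, 0.40, 0.0],
    [0.00, 0.00, 1.0],
]
labels, centers = kmeans(toy, centroids=[toy[0], toy[2], toy[4]])
print(labels)
```

The last toy vector, with all of its weight on a single category, ends up alone in its own cluster, which is similar to how a neighborhood with a single venue can form a one-member cluster.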

In [82]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [86]:
mumbai_merged = df

# join neighborhoods_venues_sorted onto df to add the cluster label and top venues for each neighborhood
mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Area')

mumbai_merged.head() # check the last columns!

Unnamed: 0,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Aarey Milk Colony, IN",19.156129,72.870722,1.0,Fast Food Restaurant,Zoo,Food Court,Food,Flower Shop,Flea Market,Fish Market,Field,Farmers Market,Electronics Store
1,"Agripada, IN",18.975302,72.824898,1.0,Platform,Coffee Shop,Bank,Athletics & Sports,Soccer Field,Bakery,Electronics Store,Food,Flower Shop,Flea Market
2,"Altamount Road, IN",18.966362,72.809148,0.0,Café,Restaurant,Breakfast Spot,Bakery,Concert Hall,Pizza Place,Coffee Shop,Dessert Shop,Diner,Salon / Barbershop
3,"Amboli, Mumbai, IN",19.131992,72.84996,0.0,Playground,Gym / Fitness Center,Chinese Restaurant,Indian Restaurant,Pizza Place,Gym Pool,Flower Shop,Fish Market,Field,Fast Food Restaurant
4,"Amrut Nagar, IN",19.174182,73.020492,1.0,Candy Store,Restaurant,Zoo,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market


In [87]:
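# Neighborhoods with no venues returned have no cluster label (NaN) after the join.
# They are assigned to cluster 0 here, so cluster 0 also contains areas without venue data.
mumbai_merged['Cluster Labels'] = mumbai_merged['Cluster Labels'].fillna(0)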

In [88]:
mumbai_merged['Cluster Labels'].values

array([1., 1., 0., 0., 1., 0., 1., 0., 3., 0., 0., 0., 1., 3., 0., 1., 0.,
       0., 1., 1., 1., 3., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       0., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1.,
       0., 3., 1., 0., 1., 0., 3., 0., 1., 3., 0., 0., 0., 0., 0., 0., 0.,
       4., 2., 1., 0., 3., 0., 0., 0., 3., 0., 0., 0., 3., 0., 1., 0., 3.,
       0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 3., 0., 0., 1., 0.,
       0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 1., 3.])

In [91]:
mumbai_merged['Cluster Labels'] = mumbai_merged['Cluster Labels'].astype(int)

In [92]:
mumbai_merged.dtypes

Area                       object
Latitude                  float64
Longitude                 float64
Cluster Labels              int64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

In [94]:
# create map centred on Mumbai (coordinates printed for 'Mumbai, IN' earlier)
map_clusters = folium.Map(location=[18.9387711, 72.8353355], zoom_start=11)

# set color scheme for the clusters: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(mumbai_merged['Latitude'], mumbai_merged['Longitude'], mumbai_merged['Area'], mumbai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [111]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 0, mumbai_merged.columns[[0] + [3] + list(range(4, mumbai_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Altamount Road, IN",0,Café,Restaurant,Breakfast Spot,Bakery,Concert Hall,Pizza Place,Coffee Shop,Dessert Shop,Diner,Salon / Barbershop
3,"Amboli, Mumbai, IN",0,Playground,Gym / Fitness Center,Chinese Restaurant,Indian Restaurant,Pizza Place,Gym Pool,Flower Shop,Fish Market,Field,Fast Food Restaurant
5,"Antop Hill, IN",0,Indian Restaurant,Gym / Fitness Center,Bus Station,Bar,Zoo,Electronics Store,Food,Flower Shop,Flea Market,Fish Market
7,"Badhwar Park, IN",0,Indian Restaurant,Coffee Shop,Diner,Café,Hotel,Toy / Game Store,Furniture / Home Store,Chinese Restaurant,Middle Eastern Restaurant,German Restaurant
9,"Ballard Estate, IN",0,Indian Restaurant,Irani Cafe,Hostel,BBQ Joint,Plaza,Clothing Store,Chinese Restaurant,Café,Seafood Restaurant,Flea Market
...,...,...,...,...,...,...,...,...,...,...,...,...
107,"Thakkar Bappa Colony, IN",0,Indian Restaurant,Chinese Restaurant,Grocery Store,Shoe Store,Zoo,Donut Shop,Flower Shop,Flea Market,Fish Market,Field
108,"Uran, IN",0,,,,,,,,,,
109,"Uttan, IN",0,Beach,Playground,Indian Restaurant,Bus Station,Resort,Donut Shop,Flower Shop,Flea Market,Fish Market,Field
110,"Versova, Mumbai, IN",0,Pub,Café,Chinese Restaurant,Coffee Shop,Bar,Italian Restaurant,Asian Restaurant,Market,South Indian Restaurant,Bistro


In [107]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 1, mumbai_merged.columns[[0] + [3] + list(range(4, mumbai_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Aarey Milk Colony, IN",1,Fast Food Restaurant,Zoo,Food Court,Food,Flower Shop,Flea Market,Fish Market,Field,Farmers Market,Electronics Store
1,"Agripada, IN",1,Platform,Coffee Shop,Bank,Athletics & Sports,Soccer Field,Bakery,Electronics Store,Food,Flower Shop,Flea Market
4,"Amrut Nagar, IN",1,Candy Store,Restaurant,Zoo,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market
6,"Anushakti Nagar, IN",1,Convenience Store,Lake,Fast Food Restaurant,Playground,Cricket Ground,Creperie,Food,Flower Shop,Flea Market,Fish Market
12,"Bangur Nagar, IN",1,Coffee Shop,Clothing Store,Department Store,Zoo,Donut Shop,Food,Flower Shop,Flea Market,Fish Market,Field
15,"Breach Candy, IN",1,Sandwich Place,Bar,Gym / Fitness Center,Indian Restaurant,Park,Pizza Place,Department Store,Chinese Restaurant,Dessert Shop,Racetrack
18,"Cavel, IN",1,Fast Food Restaurant,Pharmacy,Discount Store,Mexican Restaurant,Bar,Café,Donut Shop,Boutique,American Restaurant,Dhaba
19,"Chandivali, IN",1,Snack Place,Hotel,Flea Market,Gym Pool,Gym,Department Store,Multiplex,Zoo,Donut Shop,Fish Market
20,"Chinchpokli, IN",1,Spa,Coffee Shop,Multiplex,Fast Food Restaurant,Donut Shop,Food,Flower Shop,Flea Market,Fish Market,Field
24,"Cotton Green, IN",1,Plaza,Whisky Bar,Bakery,Donut Shop,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant


In [108]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 2, mumbai_merged.columns[[0] + [3] + list(range(4, mumbai_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
69,"Mahul, IN",2,ATM,Food Court,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market,Electronics Store


In [109]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 3, mumbai_merged.columns[[0] + [3] + list(range(4, mumbai_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Baiganwadi, IN",3,Indian Restaurant,Zoo,Donut Shop,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant
13,"Bhuleshwar, IN",3,Indian Restaurant,Food,Ice Cream Shop,Arcade,Dessert Shop,Bus Station,Fast Food Restaurant,Zoo,Electronics Store,Flower Shop
21,"Chor Bazaar, IN",3,Indian Restaurant,Dessert Shop,BBQ Joint,Antique Shop,Market,Arcade,Ice Cream Shop,Restaurant,Electronics Store,Food
52,"Kalbadevi, IN",3,Indian Restaurant,Snack Place,Jewelry Store,Movie Theater,Market,Café,Cheese Shop,Food,Flower Shop,Flea Market
57,"Kherwadi, IN",3,Indian Restaurant,Pizza Place,Café,Zoo,Donut Shop,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant
60,"Lalbaug, IN",3,Indian Restaurant,Plaza,Maharashtrian Restaurant,Donut Shop,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant
72,"Mandvi, Mumbai, IN",3,Indian Restaurant,BBQ Joint,Dessert Shop,Café,Juice Bar,Chinese Restaurant,Indian Sweet Shop,Smoke Shop,Arcade,Convenience Store
76,"Marol, IN",3,Indian Restaurant,Diner,Snack Place,Hotel,Flea Market,Asian Restaurant,Donut Shop,Food,Flower Shop,Fish Market
80,"Mira Road, IN",3,Indian Restaurant,Gym / Fitness Center,Train Station,Zoo,Discount Store,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant
84,"Navy Nagar, IN",3,Indian Restaurant,Garden,Zoo,Donut Shop,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant


In [110]:
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 4, mumbai_merged.columns[[0] + [3] + list(range(4, mumbai_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,"Mahim, IN",4,Beach,Zoo,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Field,Fast Food Restaurant


## Results
    Indian Restaurant is the most common venue in the areas of cluster 3. Clusters 2 and 4 each contain a single neighborhood (Mahul and Mahim, respectively) with little nearby, which suggests that these areas are underdeveloped. The areas of cluster 0 and cluster 1 are well diversified in terms of common venues.
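
The reading of the clusters can be checked with a small tally of the most common venue per cluster. The sketch below does this for a handful of rows copied from the cluster tables above; it is a sample for illustration, not the full result set, and the variable names are assumptions.

```python
from collections import Counter, defaultdict

# (area, cluster label, most common venue) copied from a few rows of the
# cluster tables above; this is a sample, not the full result set.
sample = [
    ("Altamount Road, IN", 0, "Café"),
    ("Antop Hill, IN", 0, "Indian Restaurant"),
    ("Aarey Milk Colony, IN", 1, "Fast Food Restaurant"),
    ("Cavel, IN", 1, "Fast Food Restaurant"),
    ("Mahul, IN", 2, "ATM"),
    ("Baiganwadi, IN", 3, "Indian Restaurant"),
    ("Kalbadevi, IN", 3, "Indian Restaurant"),
    ("Lalbaug, IN", 3, "Indian Restaurant"),
    ("Mahim, IN", 4, "Beach"),
]

top_by_cluster = defaultdict(Counter)
for area, cluster, top_venue in sample:
    top_by_cluster[cluster][top_venue] += 1

summary = {}
for cluster in sorted(top_by_cluster):
    venue, count = top_by_cluster[cluster].most_common(1)[0]
    size = sum(top_by_cluster[cluster].values())
    summary[cluster] = (venue, count, size)
    print("Cluster {}: {} is the top venue in {} of {} sampled areas".format(
        cluster, venue, count, size))
```

Running the same tally over the full `mumbai_merged` table would show how strongly each cluster is dominated by a single venue type.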

## Conclusion
    The purpose of this project was to identify a suitable locality for opening an Indian restaurant in Mumbai, India. From the clusters formed, the ideal location for an Indian restaurant would be in the areas of cluster 3, where it is the most common venue, whereas for a fast food restaurant the areas of cluster 0 would be preferred.