# **Opening a New Restaurant in Mumbai**

Mumbai is the financial and cultural capital of India. It is home to people from every Indian state and is also a major tourist attraction. Even more intriguing than its coastal location is the sheer variety of cuisines it hosts.

Vegetarian or non-vegetarian, the dishes in Mumbai boast rich taste, heat and bold flavors. Mumbai's cuisine covers a large assortment of authentic dishes and zesty seafood. Staple foods for its residents include rice, aromatic fish curries, Indian breads (chapatis and rotis), vegetable curries, pulses and desserts. Coconuts, cashew nuts, peanuts and peanut oil are major ingredients in many of Mumbai's traditional dishes.

With such a large, food-loving population, Mumbai is an attractive location for anyone looking to enter the restaurant business. The same density, however, means that competition is fierce. It is therefore sensible to analyse the city's neighborhoods and settle on the location that is likely to be the most profitable.


## Installing the Libraries

In [1]:
```python
!pip install geopy
!pip install geocoder
!pip install folium

import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files

from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle HTTP requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium  # map rendering library

print('Libraries imported.')
```

```
Libraries imported.
```


<a id='item1'></a>


## 1. Download and Explore Dataset


In [2]:
```python
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai')[-1]
df.head(10)
```

| | Area | Location | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amboli | Andheri,Western Suburbs | 19.1293 | 72.8434 |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 |
| 2 | D.N. Nagar | Andheri,Western Suburbs | 19.124085 | 72.831373 |
| 3 | Four Bungalows | Andheri,Western Suburbs | 19.124714 | 72.82721 |
| 4 | Lokhandwala | Andheri,Western Suburbs | 19.130815 | 72.82927 |
| 5 | Marol | Andheri,Western Suburbs | 19.119219 | 72.882743 |
| 6 | Sahar | Andheri,Western Suburbs | 19.098889 | 72.867222 |
| 7 | Seven Bungalows | Andheri,Western Suburbs | 19.129052 | 72.817018 |
| 8 | Versova | Andheri,Western Suburbs | 19.12 | 72.82 |
| 9 | Mira Road | Mira-Bhayandar,Western Suburbs | 19.284167 | 72.871111 |


In [3]:
```python
df.rename(columns={'Area': 'Neighborhood'}, inplace=True)
df.head(10)
```

| | Neighborhood | Location | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amboli | Andheri,Western Suburbs | 19.1293 | 72.8434 |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 |
| 2 | D.N. Nagar | Andheri,Western Suburbs | 19.124085 | 72.831373 |
| 3 | Four Bungalows | Andheri,Western Suburbs | 19.124714 | 72.82721 |
| 4 | Lokhandwala | Andheri,Western Suburbs | 19.130815 | 72.82927 |
| 5 | Marol | Andheri,Western Suburbs | 19.119219 | 72.882743 |
| 6 | Sahar | Andheri,Western Suburbs | 19.098889 | 72.867222 |
| 7 | Seven Bungalows | Andheri,Western Suburbs | 19.129052 | 72.817018 |
| 8 | Versova | Andheri,Western Suburbs | 19.12 | 72.82 |
| 9 | Mira Road | Mira-Bhayandar,Western Suburbs | 19.284167 | 72.871111 |


#### Use the geopy library to get the latitude and longitude of Mumbai


To create an instance of the geocoder, we need to define a `user_agent`. We will name our agent <em>ny_explorer</em>, as shown below.


In [4]:
```python
address = 'Mumbai, IN'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai City are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Mumbai City are 19.0759899, 72.8773928.
```


#### Creating a map of Mumbai with neighborhoods superimposed on top.


In [5]:
```python
# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker for each neighborhood
for lat, lng, loc, neighborhood in zip(df['Latitude'], df['Longitude'], df['Location'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, loc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mumbai)

map_mumbai
```

**Folium** is a great visualization library. Feel free to zoom into the map above and click on each circle marker to reveal the name of the neighborhood and its region.


### As the previous table shows, some Location values contain two place names separated by a comma. We keep only the last one, the broader region, to make the data easier to work with

In [6]:
```python
# keep only the last comma-separated part (the region) and trim stray spaces
df['Location'] = df['Location'].apply(lambda x: x.split(',')[-1].strip())
df.head(10)
```

| | Neighborhood | Location | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amboli | Western Suburbs | 19.1293 | 72.8434 |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 |
| 2 | D.N. Nagar | Western Suburbs | 19.124085 | 72.831373 |
| 3 | Four Bungalows | Western Suburbs | 19.124714 | 72.82721 |
| 4 | Lokhandwala | Western Suburbs | 19.130815 | 72.82927 |
| 5 | Marol | Western Suburbs | 19.119219 | 72.882743 |
| 6 | Sahar | Western Suburbs | 19.098889 | 72.867222 |
| 7 | Seven Bungalows | Western Suburbs | 19.129052 | 72.817018 |
| 8 | Versova | Western Suburbs | 19.12 | 72.82 |
| 9 | Mira Road | Western Suburbs | 19.284167 | 72.871111 |


#### We can find how many neighborhoods are present under each Location

In [7]:
```python
df['Location'].value_counts()
```

```
South Mumbai       39
Western Suburbs    36
Eastern Suburbs    12
Harbour Suburbs     4
Mumbai              2
Name: Location, dtype: int64
```

### As we can see, South Mumbai contains the most neighborhoods

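The split-and-count steps above can be sketched without pandas. Trimming whitespace matters: `'Andheri, Western Suburbs'.split(',')[-1]` keeps a leading space, and a counter such as `value_counts` would then treat `' Western Suburbs'` and `'Western Suburbs'` as two different regions. The helper names `region_of` and `count_regions` are hypothetical, for illustration only.

```python
from collections import Counter


def region_of(location):
    """Keep the last comma-separated part of a Location value, without stray spaces."""
    return location.split(",")[-1].strip()


def count_regions(locations):
    """Count how many neighborhoods fall under each region (like value_counts)."""
    return Counter(region_of(loc) for loc in locations)


# Location values in the same shape as the table above
sample = ["Andheri,Western Suburbs", "Mira-Bhayandar,Western Suburbs", "South Mumbai"]
print(count_regions(sample))
```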
## 2. Foursquare API

### We will now use the Foursquare API to collect the venue details

#### Define Foursquare Credentials and Version


In [8]:
```python
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID (do not publish real keys)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20201214'  # API version date
```

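Rather than typing the keys into the notebook, you can read them from environment variables, which keeps them out of any copy you share. A minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from environment variables (hypothetical names)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set
        raise RuntimeError(f"Set the {missing.args[0]} environment variable first") from None


# In the notebook:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```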
### Exploring only the first neighborhood

In [9]:
```python
neighborhood_name = df.loc[0, 'Neighborhood']  # neighborhood name
neighborhood_latitude = df.loc[0, 'Latitude']  # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'Longitude']  # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name,
                                                               neighborhood_latitude,
                                                               neighborhood_longitude))
```

```
Latitude and longitude values of Amboli are 19.1293, 72.8434.
```


#### Now, let's get the top 100 venues within a radius of 1,000 meters


In [10]:
```python
LIMIT = 100  # maximum number of venues to return
radius = 1000  # search radius in meters

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT)
```

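The request URL above is assembled by string formatting. As an alternative, the same query can be built with the standard library's `urllib.parse.urlencode`, which percent-encodes every value. The helper name `build_explore_url` is hypothetical, and the parameter names are simply the ones already used in the URL above; this sketch does not check which parameters the Foursquare endpoint currently accepts.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Build a venues/explore request URL with encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date
        "ll": f"{lat},{lng}",  # latitude,longitude as one comma-separated value
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


print(build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20201214", 19.1293, 72.8434))
```

`requests` can also do this encoding itself when the dictionary is passed as `requests.get(EXPLORE_URL, params=params)`.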
In [11]:
```python
results = requests.get(url).json()
results
```

```
{'meta': {'code': 200, 'requestId': '5fd741b70223702a3a782e56'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Jogeshwari West',
  'headerFullLocation': 'Jogeshwari West, Mumbai',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 33,
  'suggestedBounds': {'ne': {'lat': 19.13830000900001,
    'lng': 72.8529082359012},
   'sw': {'lat': 19.120299990999992, 'lng': 72.83389176409881}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5174e2be498e39cf0d1c20cb',
       'name': 'Shawarma Factory',
       'location': {'address': 'Dadabhai Road',
        'crossStreet': 'Off JP Road, Near Navrang Cinema',
        'lat': 19.124590572173467,
        'lng': 72.8403981304492,
...
```

In [12]:
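```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # rows produced by json_normalize use the prefixed column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```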

Now we are ready to clean the JSON and structure it into a _pandas_ DataFrame.


In [13]:
```python
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the first category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns: 'venue.location.lat' -> 'lat', etc.
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()
```

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Shawarma Factory | Falafel Restaurant | 19.124591 | 72.840398 |
| 1 | Jaffer Bhai's Delhi Darbar | Mughlai Restaurant | 19.137714 | 72.845909 |
| 2 | Cafe Arfa | Indian Restaurant | 19.12893 | 72.84714 |
| 3 | 5 Spice , Bandra | Chinese Restaurant | 19.130421 | 72.847206 |
| 4 | Pizza Express | Pizza Place | 19.131893 | 72.834668 |


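The JSON response shown earlier nests each venue under `response` → `groups[0]` → `items` → `venue`. A dependency-free sketch of the same flattening follows; unlike `get_category_type`, it also tolerates a missing `categories` key. The helper names are hypothetical, and the `sample` dictionary is hand-built to mirror that structure (the second venue is invented to show an empty category list).

```python
def first_category_name(venue):
    """Return the name of a venue's first category, or None if it has none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None


def extract_venues(explore_json):
    """Flatten venues/explore JSON into (name, category, lat, lng) tuples."""
    groups = explore_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        rows.append((venue["name"], first_category_name(venue),
                     location.get("lat"), location.get("lng")))
    return rows


sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Shawarma Factory",
                           "categories": [{"name": "Falafel Restaurant"}],
                           "location": {"lat": 19.124590572173467, "lng": 72.8403981304492}}},
                # hypothetical venue with an empty category list
                {"venue": {"name": "Example Venue", "categories": [],
                           "location": {"lat": 19.13, "lng": 72.84}}},
            ]
        }]
    }
}

for row in extract_venues(sample):
    print(row)
```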
And how many venues were returned by Foursquare?


In [14]:
```python
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
```

```
33 venues were returned by Foursquare.
```


<a id='item2'></a>


In [15]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    # note: the default radius (100 m) is much smaller than the 1,000 m used above
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue,
        # skipping venues that have no category
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue'].get('categories')])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```

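Because `getNearbyVenues` defaults to a 100 m radius, it sees far fewer venues per neighborhood than the 1,000 m query above. The haversine formula gives a quick way to check how far a returned venue lies from a neighborhood's coordinates. This sketch uses coordinates copied from the outputs in this notebook (Amboli, a venue from the 100 m run, and Shawarma Factory from the 1,000 m run) and a spherical Earth of mean radius 6,371 km; Foursquare applies its own radius on the server, so this is only a rough cross-check.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


amboli = (19.1293, 72.8434)
playground = (19.129238, 72.842795)                # Joe & Helen D'mello Ground (100 m run)
shawarma = (19.124590572173467, 72.8403981304492)  # Shawarma Factory (1,000 m run)

for name, point in [("Joe & Helen D'mello Ground", playground), ("Shawarma Factory", shawarma)]:
    d = haversine_m(*amboli, *point)
    print(f"{name}: {d:.0f} m, within 100 m: {d <= 100}")
```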
In [16]:
```python
mumbai_venues = getNearbyVenues(names=df['Neighborhood'],
                                latitudes=df['Latitude'],
                                longitudes=df['Longitude'])
```

```
Amboli
Chakala, Andheri
D.N. Nagar
Four Bungalows
Lokhandwala
Marol
Sahar
Seven Bungalows
Versova
Mira Road
Bhayandar
Uttan
Bandstand Promenade
Kherwadi
Pali Hill
I.C. Colony
Gorai
Dahisa
Aarey Milk Colony
Bangur Nagar
Jogeshwari West
Juhu
Charkop
Poisar
Mahavir Nagar
Thakur village
Pali Naka
Khar Danda
Dindoshi
Sunder Nagar
Kalina
Naigaon
Nalasopara
Virar
Irla
Vile Parle
Bhandup
Amrut Nagar
Asalfa
Pant Nagar
Kanjurmarg
Nehru Nagar
Nahur
Chandivali
Hiranandani Gardens
Indian Institute of Technology Bombay campus
Vidyavihar
Vikhroli
Chembur
Deonar
Mankhurd
Mahul
Agripada
Altamount Road
Bhuleshwar
Breach Candy
Carmichael Road
Cavel
Churchgate
Cotton Green
Cuffe Parade
Cumbala Hill
Currey Road
Dhobitalao
Dongri
Kala Ghoda
Kemps Corner
Lower Parel
Mahalaxmi
Mahim
Malabar Hill
Marine Drive
Marine Lines
Mumbai Central
Nariman Point
Prabhadevi
Sion
Walkeshwar
Worli
C.G.S. colony
Dagdi Chawl
Navy Nagar
Hindu colony
Ballard Estate
Chira Bazaar
Fanas Wadi
Chor Bazaar
Matunga
Parel
Gowalia Tank
D
```

#### Let's check the size of the resulting dataframe


In [17]:
```python
print(mumbai_venues.shape)
mumbai_venues.head()
```

```
(137, 7)
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Amboli | 19.1293 | 72.8434 | Joe & Helen D'mello Ground | 19.129238 | 72.842795 | Playground |
| 1 | Amboli | 19.1293 | 72.8434 | V33 | 19.129068 | 72.84367 | Gym |
| 2 | Chakala, Andheri | 19.111388 | 72.860833 | The Mirador Mumbai | 19.111462 | 72.860667 | Asian Restaurant |
| 3 | Chakala, Andheri | 19.111388 | 72.860833 | Grapevine Gourmet Cuisine | 19.11184 | 72.860749 | Vegetarian / Vegan Restaurant |
| 4 | Seven Bungalows | 19.129052 | 72.817018 | Tanjore Tiffin Room | 19.128438 | 72.81715 | South Indian Restaurant |


Let's check how many venues were returned for each neighborhood


In [18]:
```python
mumbai_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Amboli | 2 | 2 | 2 | 2 | 2 | 2 |
| Bandstand Promenade | 4 | 4 | 4 | 4 | 4 | 4 |
| Bhuleshwar | 1 | 1 | 1 | 1 | 1 | 1 |
| Breach Candy | 1 | 1 | 1 | 1 | 1 | 1 |
| C.G.S. colony | 3 | 3 | 3 | 3 | 3 | 3 |
| Cavel | 1 | 1 | 1 | 1 | 1 | 1 |
| Chakala, Andheri | 2 | 2 | 2 | 2 | 2 | 2 |
| Chembur | 3 | 3 | 3 | 3 | 3 | 3 |
| Cotton Green | 2 | 2 | 2 | 2 | 2 | 2 |
| Currey Road | 1 | 1 | 1 | 1 | 1 | 1 |


#### Let's find out how many unique categories can be curated from all the returned venues


In [19]:
```python
print('There are {} unique categories.'.format(len(mumbai_venues['Venue Category'].unique())))
```

```
There are 62 unique categories.
```


<a id='item3'></a>


## 3. Analyze Each Neighborhood


In [20]:
```python
# one hot encoding
mumbai_onehot = pd.get_dummies(mumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Neighborhood'] = mumbai_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.head()
```

```
Unnamed: 0,Women's Store,Airport Lounge,Arcade,Asian Restaurant,Bakery,Bar,Bistro,Boutique,Burger Joint,Bus Station,Café,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Cupcake Shop,Department Store,Dessert Shop,Diner,Fast Food Restaurant,Food & Drink Shop,Food Truck,Gastropub,Gourmet Shop,Gym,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Jewelry Store,Juice Bar,Lounge,Market,Men's Store,Middle Eastern Restaurant,Movie Theater,Multiplex,Music Store,Neighborhood,North Indian Restaurant,Performing Arts Venue,Pizza Place,Platform,Playground,Plaza,Pub,Restaurant,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spanish Restaurant,Street Food Gathering,Tea Room,Train Station,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Amboli,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Chakala, Andheri",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Chakala, Andheri",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Seven Bungalows,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
```


And let's examine the new dataframe size.


In [21]:
```python
mumbai_onehot.shape
```

```
(137, 62)
```

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category


In [22]:
```python
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped
```

```
Unnamed: 0,Neighborhood,Women's Store,Airport Lounge,Arcade,Asian Restaurant,Bakery,Bar,Bistro,Boutique,Burger Joint,Bus Station,Café,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Cupcake Shop,Department Store,Dessert Shop,Diner,Fast Food Restaurant,Food & Drink Shop,Food Truck,Gastropub,Gourmet Shop,Gym,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Jewelry Store,Juice Bar,Lounge,Market,Men's Store,Middle Eastern Restaurant,Movie Theater,Multiplex,Music Store,North Indian Restaurant,Performing Arts Venue,Pizza Place,Platform,Playground,Plaza,Pub,Restaurant,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spanish Restaurant,Street Food Gathering,Tea Room,Train Station,Vegetarian / Vegan Restaurant
0,Amboli,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bandstand Promenade,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bhuleshwar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,Breach Candy,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,C.G.S. colony,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Cavel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Chakala, Andheri",0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
7,Chembur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cotton Green,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Currey Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
```

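In the one-hot output above, `Women's Store` ended up as the first column while `Neighborhood` sits alphabetically among the category columns. This suggests Foursquare returned a venue category literally named "Neighborhood": `pd.get_dummies` created a column with that name, so assigning `mumbai_onehot['Neighborhood']` overwrote it instead of appending a new last column, and the reordering step then moved `Women's Store` (the alphabetically last category) to the front. The category information for those venues is lost in the process. The plain-Python sketch below is not the notebook's pandas code; it keeps neighborhood names as dictionary keys, separate from category names, and computes the same quantity as the grouped means: the share of each neighborhood's venues that fall in each category. The rows come from the venues table above, except the `"Example Hood"` row, which is hypothetical.

```python
from collections import Counter, defaultdict


def category_frequencies(venue_rows):
    """Map neighborhood -> {category: share of that neighborhood's venues}.

    venue_rows is an iterable of (neighborhood, venue_category) pairs. Because
    neighborhood names are dictionary keys, a category that happens to be called
    "Neighborhood" cannot overwrite them.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venue_rows:
        counts[neighborhood][category] += 1

    frequencies = {}
    for neighborhood, category_counts in counts.items():
        total = sum(category_counts.values())
        frequencies[neighborhood] = {cat: n / total for cat, n in category_counts.items()}
    return frequencies


rows = [
    ("Amboli", "Playground"),
    ("Amboli", "Gym"),
    ("Chakala, Andheri", "Asian Restaurant"),
    ("Chakala, Andheri", "Vegetarian / Vegan Restaurant"),
    ("Example Hood", "Neighborhood"),  # hypothetical venue whose category is "Neighborhood"
]
freqs = category_frequencies(rows)
print(freqs["Amboli"])
```

In pandas, renaming the category column before adding the neighborhood names, for example with `mumbai_onehot.rename(columns={'Neighborhood': 'Neighborhood (venue)'})`, would be one way to keep both.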

#### Let's confirm the new size


In [23]:
```python
mumbai_grouped.shape
```

```
(39, 62)
```

#### Let's print each neighborhood along with the top 5 most common venues


In [24]:
```python
num_top_venues = 5

for hood in mumbai_grouped['Neighborhood']:
    print("----" + hood + "----")
    # transpose the neighborhood's row into a (venue, freq) table
    temp = mumbai_grouped[mumbai_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]  # drop the Neighborhood row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```
----Amboli----
           venue  freq
0     Playground   0.5
1            Gym   0.5
2  Women's Store   0.0
3      Juice Bar   0.0
4         Lounge   0.0


----Bandstand Promenade----
                   venue  freq
0         Scenic Lookout  0.25
1     Chinese Restaurant  0.25
2  Performing Arts Venue  0.25
3               Boutique  0.25
4          Women's Store  0.00


----Bhuleshwar----
                   venue  freq
0  Street Food Gathering   1.0
1          Women's Store   0.0
2             Playground   0.0
3              Juice Bar   0.0
4                 Lounge   0.0


----Breach Candy----
                venue  freq
0   Indian Restaurant   1.0
1  Italian Restaurant   0.0
2           Juice Bar   0.0
3              Lounge   0.0
4              Market   0.0


----C.G.S. colony----
            venue  freq
0  Airport Lounge  0.33
1  Ice Cream Shop  0.33
2      Smoke Shop  0.33
3      Playground  0.00
4       Juice Bar  0.00


----Cavel----
                venue  freq
0       Jewelry Store
```

#### Let's put that into a _pandas_ dataframe


First, let's write a function to sort the venues in descending order.


In [25]:
```python
def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and sort the category frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

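In the printed top-5 lists above, neighborhoods with fewer than five venue categories are padded with zero-frequency categories (Amboli's 3rd to 5th entries are all 0.0), and equal frequencies come back in no guaranteed order: Bandstand Promenade lists Scenic Lookout first in the printout but Boutique first in the table below. A minimal sketch of a variant that drops zeros and breaks ties alphabetically; the helper name `top_venues` is hypothetical, and it works on a plain dict of category → frequency rather than a pandas row, so it is not a drop-in replacement for `return_most_common_venues`.

```python
def top_venues(freqs, k=5):
    """Return up to k categories with non-zero frequency, most frequent first.

    Ties are broken alphabetically so the result is deterministic, and
    zero-frequency categories are dropped instead of padding the list.
    """
    nonzero = [(cat, f) for cat, f in freqs.items() if f > 0]
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in nonzero[:k]]


amboli = {"Playground": 0.5, "Gym": 0.5, "Women's Store": 0.0, "Juice Bar": 0.0, "Lounge": 0.0}
print(top_venues(amboli))
```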
Now let's create the new dataframe and display the top 5 venues for each neighborhood.


In [26]:
```python
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Amboli | Playground | Gym | Cocktail Bar | Hookah Bar | Gourmet Shop |
| 1 | Bandstand Promenade | Boutique | Scenic Lookout | Chinese Restaurant | Performing Arts Venue | Gastropub |
| 2 | Bhuleshwar | Street Food Gathering | Vegetarian / Vegan Restaurant | Cocktail Bar | Hookah Bar | Gym |
| 3 | Breach Candy | Indian Restaurant | Ice Cream Shop | Hookah Bar | Gym | Gourmet Shop |
| 4 | C.G.S. colony | Airport Lounge | Ice Cream Shop | Smoke Shop | Vegetarian / Vegan Restaurant | Fast Food Restaurant |

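The `indicators` list with `try`/`except` labels positions 1 to 3 as "st", "nd", "rd" and falls back to "th" for everything else. That is correct for the first 20 positions but would produce "21th" or "22th" if `num_top_venues` grew that large. A small helper that follows the general English rule; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 6)]
print(columns)
```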

<a id='item4'></a>


## 4. Cluster Neighborhoods


Run _k_-means to cluster the neighborhoods into 5 clusters.



In [27]:
```python
k = 5

mumbai_clustering = mumbai_grouped.drop('Neighborhood', axis=1)
# a fixed random_state makes the run reproducible; n_init=10 pins the number of restarts,
# whose default has changed across scikit-learn versions
kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=40).fit(mumbai_clustering)

kmeans.labels_
```

```
array([1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 2, 1, 1, 1, 1, 3, 1, 1, 3, 1,
       1, 1, 3, 3, 4, 1, 1, 3, 1, 3, 1, 1, 1, 1, 2, 2, 0], dtype=int32)
```

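`KMeans` above is scikit-learn's implementation. Its core idea, Lloyd's algorithm, alternates two steps: assign each point (here, each neighborhood's vector of category frequencies) to its nearest centroid, then move every centroid to the mean of the points assigned to it, until nothing changes. A minimal plain-Python sketch of that loop; it starts from randomly chosen input points rather than the k-means++ initialization requested above and performs a single run, so it is an illustration, not a substitute for scikit-learn.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's-algorithm k-means on a list of equal-length tuples.

    Returns (labels, centroids). Initial centroids are k input points chosen
    at random (not k-means++).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
print(labels, centroids)
```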
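Let's create a new dataframe that includes the cluster label as well as the top 5 venues for each neighborhood. Because the join keeps every row of `df`, neighborhoods for which Foursquare returned no venues (54 of the 93, judging by the counts above) end up without a cluster label.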


In [28]:
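```python
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mumbai_merged = df
# left join: keep every neighborhood in df and attach its cluster label and top venues
mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
```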

Next, let's examine the resulting clusters.


<a id='item5'></a>


## 5. Examine Clusters


Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.


#### Cluster 1


In [30]:
```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 0, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

| | Location | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 77 | South Mumbai | Coffee Shop | Vegetarian / Vegan Restaurant | Ice Cream Shop | Hookah Bar | Gym |


#### Cluster 2


In [31]:
```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 1, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

| | Location | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Western Suburbs | Playground | Gym | Cocktail Bar | Hookah Bar | Gourmet Shop |
| 1 | Western Suburbs | Vegetarian / Vegan Restaurant | Asian Restaurant | Coffee Shop | Hookah Bar | Gym |
| 7 | Western Suburbs | Indian Restaurant | Bistro | Gym | Café | Pub |
| 12 | Western Suburbs | Boutique | Scenic Lookout | Chinese Restaurant | Performing Arts Venue | Gastropub |
| 13 | Western Suburbs | Gourmet Shop | Indian Restaurant | Snack Place | Bar | Burger Joint |
| 15 | Western Suburbs | Bakery | Bar | Fast Food Restaurant | Cheese Shop | Dessert Shop |
| 17 | Western Suburbs | Train Station | Hookah Bar | Coffee Shop | Hotel | Gym |
| 25 | Western Suburbs | Food Truck | Café | Vegetarian / Vegan Restaurant | Coffee Shop | Hookah Bar |
| 26 | Western Suburbs | Indian Restaurant | Bar | Chinese Restaurant | Arcade | Seafood Restaurant |
| 30 | Western Suburbs | Train Station | Men's Store | Middle Eastern Restaurant | Clothing Store | Vegetarian / Vegan Restaurant |


#### Cluster 3


In [32]:
```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 2, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

| | Location | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 11 | Western Suburbs | Bus Station | Vegetarian / Vegan Restaurant | Coffee Shop | Hookah Bar | Gym |
| 46 | Eastern Suburbs | Restaurant | Bus Station | Vegetarian / Vegan Restaurant | Coffee Shop | Hookah Bar |
| 49 | Harbour Suburbs | Restaurant | Vegetarian / Vegan Restaurant | Cocktail Bar | Hookah Bar | Gym |


#### Cluster 4


In [33]:
```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 3, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

| | Location | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 9 | Western Suburbs | Indian Restaurant | Ice Cream Shop | Hookah Bar | Gym | Gourmet Shop |
| 14 | Western Suburbs | Indian Restaurant | Bakery | Cupcake Shop | Coffee Shop | Hookah Bar |
| 27 | Western Suburbs | Indian Restaurant | Platform | Food Truck | Hookah Bar | Gym |
| 34 | Western Suburbs | Indian Restaurant | Bar | Market | Coffee Shop | Hookah Bar |
| 39 | Eastern Suburbs | Indian Restaurant | Multiplex | Coffee Shop | Hookah Bar | Gym |
| 48 | Harbour Suburbs | Indian Restaurant | Gastropub | Diner | Ice Cream Shop | Hookah Bar |
| 55 | South Mumbai | Indian Restaurant | Ice Cream Shop | Hookah Bar | Gym | Gourmet Shop |
| 87 | South Mumbai | Indian Restaurant | Dessert Shop | Lounge | Hotel | Asian Restaurant |
| 90 | South Mumbai | Indian Restaurant | Market | Cheese Shop | Coffee Shop | Hookah Bar |


#### Cluster 5


In [34]:
```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 4, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

| | Location | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 73 | South Mumbai | Fast Food Restaurant | Vegetarian / Vegan Restaurant | Ice Cream Shop | Hookah Bar | Gym |


# Result

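Across the five clusters above, Cluster 4 (cluster label 3) stands out: in every one of its neighborhoods, the most common venue is an Indian restaurant, which suggests established demand for dining out in these areas. The other clusters are dominated by a more varied mix of venues, such as gyms, coffee shops, bus stations and stores, so they appear less suitable for opening a restaurant.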

In [39]:
```python
new_restaurant = mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 3, mumbai_merged.columns[[0, 1, 2, 3] + list(range(5, mumbai_merged.shape[1]))]]
new_restaurant.head()
```

| | Neighborhood | Location | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 9 | Mira Road | Western Suburbs | 19.284167 | 72.871111 | Indian Restaurant | Ice Cream Shop | Hookah Bar | Gym | Gourmet Shop |
| 14 | Pali Hill | Western Suburbs | 19.068 | 72.826 | Indian Restaurant | Bakery | Cupcake Shop | Coffee Shop | Hookah Bar |
| 27 | Khar Danda | Western Suburbs | 19.068598 | 72.840042 | Indian Restaurant | Platform | Food Truck | Hookah Bar | Gym |
| 34 | Irla | Western Suburbs | 19.108056 | 72.838056 | Indian Restaurant | Bar | Market | Coffee Shop | Hookah Bar |
| 39 | Pant Nagar | Eastern Suburbs | 19.08 | 72.91 | Indian Restaurant | Multiplex | Coffee Shop | Hookah Bar | Gym |


In [40]:
```python
map_res_locations = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, loc, neighborhood in zip(new_restaurant['Latitude'], new_restaurant['Longitude'],
                                       new_restaurant['Location'], new_restaurant['Neighborhood']):
    label = '{}, {}'.format(neighborhood, loc)
    # pin marker with a tooltip and a popup showing the coordinates
    folium.Marker([lat, lng], popup='{} has geographical coordinates ({:.4f}, {:.4f})'.format(label, lat, lng),
                  icon=folium.Icon(color='lightred'), tooltip=label).add_to(map_res_locations)

    # circle marker at the same spot
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup(label, parse_html=True),
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_res_locations)

map_res_locations
```

# Conclusion

Based on this analysis, we recommend the neighborhoods in Cluster 4 (cluster label 3), such as Mira Road, Pali Hill, Khar Danda, Irla and Pant Nagar, as locations for opening a new restaurant.