## IBM Applied Data Science Capstone Project

## Week 5

In [111]:
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

In [2]:
import requests

Data scraping

In [5]:
url = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai'

In [6]:
wiki_url = requests.get(url)

In [7]:
wiki_data = pd.read_html(wiki_url.text)

In [8]:
wiki_data

[                Area                 Location   Latitude  Longitude
 0             Amboli  Andheri,Western Suburbs  19.129300  72.843400
 1   Chakala, Andheri          Western Suburbs  19.111388  72.860833
 2         D.N. Nagar  Andheri,Western Suburbs  19.124085  72.831373
 3     Four Bungalows  Andheri,Western Suburbs  19.124714  72.827210
 4        Lokhandwala  Andheri,Western Suburbs  19.130815  72.829270
 ..               ...                      ...        ...        ...
 88             Parel             South Mumbai  18.990000  72.840000
 89      Gowalia Tank      Tardeo,South Mumbai  18.962450  72.809703
 90       Dava Bazaar             South Mumbai  18.946882  72.831362
 91           Dharavi                   Mumbai  19.040208  72.850850
 92             Thane                   Mumbai  19.200000  72.970000
 
 [93 rows x 4 columns]]

In [11]:
wiki_data[0]

Unnamed: 0,Area,Location,Latitude,Longitude
0,Amboli,"Andheri,Western Suburbs",19.129300,72.843400
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.827210
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.829270
...,...,...,...,...
88,Parel,South Mumbai,18.990000,72.840000
89,Gowalia Tank,"Tardeo,South Mumbai",18.962450,72.809703
90,Dava Bazaar,South Mumbai,18.946882,72.831362
91,Dharavi,Mumbai,19.040208,72.850850


Putting the data into the dataframe

In [19]:
df = pd.DataFrame(wiki_data[0])
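`pd.read_html` returns a list with every table it finds on the page, so indexing with `[0]` assumes the neighbourhood table comes first. A minimal sketch of a more defensive selection, assuming we would rather pick the table by its column names. `pick_table` is a hypothetical helper, and the sample frames are made-up stand-ins for the scraped list.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first DataFrame that contains every required column."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required)}")

# Illustrative stand-ins for the list returned by pd.read_html
tables = [
    pd.DataFrame({"Key": ["a"], "Value": ["b"]}),
    pd.DataFrame({
        "Area": ["Amboli"],
        "Location": ["Andheri,Western Suburbs"],
        "Latitude": [19.1293],
        "Longitude": [72.8434],
    }),
]

df_demo = pick_table(tables, ["Area", "Location", "Latitude", "Longitude"])
print(df_demo)
```

If the page layout changes, this fails with a clear error rather than silently loading the wrong table.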

In [20]:
df

Unnamed: 0,Area,Location,Latitude,Longitude
0,Amboli,"Andheri,Western Suburbs",19.129300,72.843400
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.827210
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.829270
...,...,...,...,...
88,Parel,South Mumbai,18.990000,72.840000
89,Gowalia Tank,"Tardeo,South Mumbai",18.962450,72.809703
90,Dava Bazaar,South Mumbai,18.946882,72.831362
91,Dharavi,Mumbai,19.040208,72.850850


Importing geopy to look up the coordinates of Mumbai itself, which will serve as the centre of the maps. The coordinates of the individual areas come from the scraped table.

In [12]:
from geopy.geocoders import Nominatim

In [13]:
address = 'Mumbai, Maharashtra'

geolocator = Nominatim(user_agent="mumbai_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Mumbai are {}, {}.'.format(latitude, longitude))

The coordinates of Mumbai are 19.0759899, 72.8773928.
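As a quick sanity check on scraped coordinates, one can measure how far each area lies from the geocoded city centre. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper, not part of geopy. The two points are the city coordinates printed above and the Thane row of the scraped table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

mumbai = (19.0759899, 72.8773928)  # geocoder result printed above
thane = (19.2, 72.97)              # Thane row of the scraped table

distance = haversine_km(*mumbai, *thane)
print(f"Thane is about {distance:.1f} km from the city centre")
```

An area that turns out to be hundreds of kilometres away would point to a mistyped latitude or longitude in the source table.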


Installing folium for mapping.

In [15]:
pip install folium

Collecting folium
  Using cached folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Using cached branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
Note: you may need to restart the kernel to use updated packages.


In [16]:
import folium

In [22]:
map_Mumbai = folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers to map
# (the loop variables are named lat, lng and loc so they do not overwrite the
# city-level latitude and longitude, which are reused later as the map centre)
for lat, lng, loc, area in zip(df['Latitude'], df['Longitude'], df['Location'], df['Area']):
    label = '{}, {}'.format(area, loc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Mumbai)

map_Mumbai

Initializing the API credentials to use foursquare API to get nearby venues.

In [23]:
CLIENT_ID = 'SWJ0LEG2LUAIZKH5KCHUYP2P1MKSXJN5RZDWGFTRIYY1QS1S' 
CLIENT_SECRET = '0SL4GC0A2FDWL3YXFZDXECAHHBZNGRN11RYXGGIYOPKPXDPQ'
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: SWJ0LEG2LUAIZKH5KCHUYP2P1MKSXJN5RZDWGFTRIYY1QS1S
CLIENT_SECRET: 0SL4GC0A2FDWL3YXFZDXECAHHBZNGRN11RYXGGIYOPKPXDPQ
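Hard-coding and printing API keys makes them easy to leak when a notebook is shared. A minimal sketch of one alternative, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and both helper functions, are hypothetical choices for illustration.

```python
import os

REQUIRED_KEYS = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # hypothetical names

def load_foursquare_credentials(env=None):
    """Read the client id and secret from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return env[REQUIRED_KEYS[0]], env[REQUIRED_KEYS[1]]

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demo mapping with made-up values; in practice call load_foursquare_credentials()
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123456",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-abcdef",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```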


Creating a function that returns the venues near each given location.

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
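The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups, or a venue without categories, would raise an exception and stop the whole loop. A hedged sketch of a more forgiving extraction step follows. `extract_venues` is a hypothetical helper, and the sample payload is hand-written to mirror only the keys the function above already relies on.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one response into (neighbourhood, lat, lng, venue, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"), category))
    return rows

# Hand-written payload in the shape the function above indexes into
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Cafe Arfa",
                           "categories": [{"name": "Indian Restaurant"}]}},
                {"venue": {"name": "Unlabelled Place", "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample, "Amboli", 19.1293, 72.8434)
empty = extract_venues({"response": {}}, "Amboli", 19.1293, 72.8434)
print(rows)
print(empty)
```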

In [26]:
venues_in_Mumbai = getNearbyVenues(df['Area'], df['Latitude'], df['Longitude'])

Amboli
Chakala, Andheri
D.N. Nagar
Four Bungalows
Lokhandwala
Marol
Sahar
Seven Bungalows
Versova
Mira Road
Bhayandar
Uttan
Bandstand Promenade
Kherwadi
Pali Hill
I.C. Colony
Gorai
Dahisar
Aarey Milk Colony
Bangur Nagar
Jogeshwari West
Juhu
Charkop
Poisar
Mahavir Nagar
Thakur village
Pali Naka
Khar Danda
Dindoshi
Sunder Nagar
Kalina
Naigaon
Nalasopara
Virar
Irla
Vile Parle
Bhandup
Amrut Nagar
Asalfa
Pant Nagar
Kanjurmarg
Nehru Nagar
Nahur
Chandivali
Hiranandani Gardens
Indian Institute of Technology Bombay campus
Vidyavihar
Vikhroli
Chembur
Deonar
Mankhurd
Mahul
Agripada
Altamount Road
Bhuleshwar
Breach Candy
Carmichael Road
Cavel
Churchgate
Cotton Green
Cuffe Parade
Cumbala Hill
Currey Road
Dhobitalao
Dongri
Kala Ghoda
Kemps Corner
Lower Parel
Mahalaxmi
Mahim
Malabar Hill
Marine Drive
Marine Lines
Mumbai Central
Nariman Point
Prabhadevi
Sion
Walkeshwar
Worli
C.G.S. colony
Dagdi Chawl
Navy Nagar
Hindu colony
Ballard Estate
Chira Bazaar
Fanas Wadi
Chor Bazaar
Matunga
Parel
Gowalia Tank


In [28]:
venues_in_Mumbai

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Amboli,19.1293,72.8434,Cafe Arfa,Indian Restaurant
1,Amboli,19.1293,72.8434,"5 Spice , Bandra",Chinese Restaurant
2,Amboli,19.1293,72.8434,Subway,Sandwich Place
3,Amboli,19.1293,72.8434,Cafe Coffee Day,Coffee Shop
4,Amboli,19.1293,72.8434,Apple Service Centre,IT Services
...,...,...,...,...,...
1142,Thane,19.2000,72.9700,thane asead bus depot,Bus Station
1143,Thane,19.2000,72.9700,Khopat Bus Stand,Bus Station
1144,Thane,19.2000,72.9700,Vandana Talkies,Indie Movie Theater
1145,Thane,19.2000,72.9700,Royal Challenge II,Indian Restaurant


In [29]:
venues_in_Mumbai.groupby('Neighbourhood').head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Amboli,19.1293,72.8434,Cafe Arfa,Indian Restaurant
1,Amboli,19.1293,72.8434,"5 Spice , Bandra",Chinese Restaurant
2,Amboli,19.1293,72.8434,Subway,Sandwich Place
3,Amboli,19.1293,72.8434,Cafe Coffee Day,Coffee Shop
4,Amboli,19.1293,72.8434,Apple Service Centre,IT Services
...,...,...,...,...,...
1137,Thane,19.2000,72.9700,Bombay Barbeque,BBQ Joint
1138,Thane,19.2000,72.9700,Borivali Biryani Centre,Indian Restaurant
1139,Thane,19.2000,72.9700,FishLand,Seafood Restaurant
1140,Thane,19.2000,72.9700,Fish Land,Seafood Restaurant


In [30]:
venues_in_Mumbai.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
ATM,Mankhurd,19.050000,72.930000,Axis Bank ATM
Afghan Restaurant,Amrut Nagar,19.102077,72.912835,Zaffran
American Restaurant,Sunder Nagar,19.175000,72.912835,Thank God It's Friday
Amphitheater,Khar Danda,19.068598,72.840042,The Habitat
Antique Shop,Chor Bazaar,18.960321,72.827176,Chor Bazaar (Thieves' Market)
...,...,...,...,...
Whisky Bar,Parel,18.990000,72.840000,Best Punjab
Wine Bar,Nariman Point,18.930000,72.823000,Opium Den
Wine Shop,Dava Bazaar,18.946882,72.831362,Peekay Wines
Women's Store,Matunga,19.130815,72.844763,Trios


So there are 163 distinct venue categories among the venues returned for these Mumbai neighbourhoods.
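The category count cannot be read off the truncated table above, and the notebook does not show how it was obtained. One plausible way to get such a number is `nunique` on the category column, sketched here on a small made-up frame with the same column names.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names
venues_demo = pd.DataFrame({
    "Neighbourhood": ["Amboli", "Amboli", "Amboli", "Thane", "Thane"],
    "Venue Category": ["Indian Restaurant", "Coffee Shop", "Indian Restaurant",
                       "Bus Station", "Indian Restaurant"],
})

n_categories = venues_demo["Venue Category"].nunique()          # distinct categories
category_counts = venues_demo["Venue Category"].value_counts()  # venues per category

print(n_categories)
print(category_counts)
```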

Now we keep only the venues whose category contains the term 'Restaurant'.

In [31]:
resto_data = venues_in_Mumbai[venues_in_Mumbai['Venue Category'].str.contains('Restaurant', regex = False)]
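The filter is a case-sensitive substring match, so food venues whose category does not contain the word, such as 'BBQ Joint', are left out. The sketch below contrasts the exact match with a case-insensitive one on a small made-up series; 'Family restaurant' is an invented label used only to show the difference.

```python
import pandas as pd

categories = pd.Series([
    "Indian Restaurant",
    "Coffee Shop",
    "Family restaurant",  # invented label with a lowercase 'r'
    "Restaurant",
    "BBQ Joint",
])

# Same call as in the notebook: literal, case-sensitive substring match
exact = categories[categories.str.contains("Restaurant", regex=False)]

# Case-insensitive variant
relaxed = categories[categories.str.contains("restaurant", case=False, regex=False)]

print(list(exact))
print(list(relaxed))
```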

In [32]:
resto_data

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Amboli,19.129300,72.843400,Cafe Arfa,Indian Restaurant
1,Amboli,19.129300,72.843400,"5 Spice , Bandra",Chinese Restaurant
5,Amboli,19.129300,72.843400,Spices & Chillies,Asian Restaurant
8,Amboli,19.129300,72.843400,Delhi Zaika,Halal Restaurant
12,"Chakala, Andheri",19.111388,72.860833,Faaso's,Fast Food Restaurant
...,...,...,...,...,...
1138,Thane,19.200000,72.970000,Borivali Biryani Centre,Indian Restaurant
1139,Thane,19.200000,72.970000,FishLand,Seafood Restaurant
1140,Thane,19.200000,72.970000,Fish Land,Seafood Restaurant
1141,Thane,19.200000,72.970000,Harish Lunch Home,Seafood Restaurant


In [34]:
from sklearn.cluster import KMeans

Next, we use k-means to cluster the restaurants based only on their coordinates. Note that the features here are the neighbourhood coordinates attached to each venue, not the venue's own position. Later, we will cluster on the restaurant-category frequencies instead and compare how the two maps differ.

In [35]:
k = 3
# keep only the two coordinate columns as clustering features
resto_clustering = resto_data.drop(columns=['Neighbourhood', 'Venue', 'Venue Category'])
kmeans = KMeans(n_clusters=k, random_state=0).fit(resto_clustering)
resto_data.insert(0, 'Cluster Labels', kmeans.labels_)
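The notebook fixes the number of clusters at 3 without justification. A common heuristic, sketched here as an assumption rather than as the author's method, is to fit k-means for several values of k and look for the point where the inertia (within-cluster sum of squares) stops dropping sharply. The points are synthetic, drawn around three made-up centres.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic lat/lon points scattered around three made-up centres
rng = np.random.default_rng(0)
centres = np.array([[19.13, 72.84], [18.95, 72.83], [19.20, 72.97]])
points = np.vstack([c + rng.normal(scale=0.005, size=(30, 2)) for c in centres])

inertias = {}
for n_clusters in range(1, 6):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(points)
    inertias[n_clusters] = model.inertia_

for n_clusters, inertia in inertias.items():
    print(n_clusters, round(inertia, 5))
```

On data like this the inertia falls steeply up to three clusters and only marginally afterwards. Real venue data is usually less clear-cut.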

In [36]:
resto_data

Unnamed: 0,Cluster Labels,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,2,Amboli,19.129300,72.843400,Cafe Arfa,Indian Restaurant
1,2,Amboli,19.129300,72.843400,"5 Spice , Bandra",Chinese Restaurant
5,2,Amboli,19.129300,72.843400,Spices & Chillies,Asian Restaurant
8,2,Amboli,19.129300,72.843400,Delhi Zaika,Halal Restaurant
12,2,"Chakala, Andheri",19.111388,72.860833,Faaso's,Fast Food Restaurant
...,...,...,...,...,...,...
1138,1,Thane,19.200000,72.970000,Borivali Biryani Centre,Indian Restaurant
1139,1,Thane,19.200000,72.970000,FishLand,Seafood Restaurant
1140,1,Thane,19.200000,72.970000,Fish Land,Seafood Restaurant
1141,1,Thane,19.200000,72.970000,Harish Lunch Home,Seafood Restaurant


In [44]:
# create map
map_clusters = folium.Map(location=[19.0759899,72.8773928],zoom_start=10)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighborhood, cluster in zip(resto_data['Neighbourhood Latitude'], resto_data['Neighbourhood Longitude'], resto_data['Neighbourhood'], resto_data['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [52]:
resto_data.drop(columns=['Cluster Labels'])

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Amboli,19.129300,72.843400,Cafe Arfa,Indian Restaurant
1,Amboli,19.129300,72.843400,"5 Spice , Bandra",Chinese Restaurant
5,Amboli,19.129300,72.843400,Spices & Chillies,Asian Restaurant
8,Amboli,19.129300,72.843400,Delhi Zaika,Halal Restaurant
12,"Chakala, Andheri",19.111388,72.860833,Faaso's,Fast Food Restaurant
...,...,...,...,...,...
1138,Thane,19.200000,72.970000,Borivali Biryani Centre,Indian Restaurant
1139,Thane,19.200000,72.970000,FishLand,Seafood Restaurant
1140,Thane,19.200000,72.970000,Fish Land,Seafood Restaurant
1141,Thane,19.200000,72.970000,Harish Lunch Home,Seafood Restaurant


One-hot encoding the different venue categories.

In [53]:
resto_venue_cat = pd.get_dummies(resto_data[['Venue Category']], prefix="", prefix_sep="")
resto_venue_cat

Unnamed: 0,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,German Restaurant,...,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mughlai Restaurant,New American Restaurant,North Indian Restaurant,Restaurant,Seafood Restaurant,South Indian Restaurant,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1138,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1139,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1140,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1141,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


Adding the 'Neighbourhood' column to the one-hot encoded table.

In [55]:
resto_venue_cat['Neighbourhood'] = resto_data['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [resto_venue_cat.columns[-1]] + list(resto_venue_cat.columns[:-1])
resto_venue_cat = resto_venue_cat[fixed_columns]

resto_venue_cat

Unnamed: 0,Vegetarian / Vegan Restaurant,Neighbourhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Maharashtrian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mughlai Restaurant,New American Restaurant,North Indian Restaurant,Restaurant,Seafood Restaurant,South Indian Restaurant
0,0,Amboli,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,Amboli,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,Amboli,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,Amboli,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12,0,"Chakala, Andheri",0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1138,0,Thane,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1139,0,Thane,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
1140,0,Thane,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
1141,0,Thane,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0


Grouping the rows by neighbourhood and taking the mean of each one-hot column, which gives the relative frequency of each restaurant category in that neighbourhood.

In [80]:
resto_grouped = resto_venue_cat.groupby('Neighbourhood').mean().reset_index()
resto_grouped.head()

Unnamed: 0,Neighbourhood,Vegetarian / Vegan Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Maharashtrian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mughlai Restaurant,New American Restaurant,North Indian Restaurant,Restaurant,Seafood Restaurant,South Indian Restaurant
0,Agripada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altamount Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Amboli,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Amrut Nagar,0.0,0.066667,0.066667,0.066667,0.0,0.066667,0.0,0.066667,0.133333,...,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0
4,Bandstand Promenade,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
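To see why the mean of one-hot columns is a frequency, here is a small worked example using the four Amboli restaurant rows shown earlier (Indian, Chinese, Asian, Halal). Each category occurs once in four venues, so each mean is 0.25, matching the Amboli row above. The `dtype=int` argument is an assumption, added so the dummies are 0/1 integers on recent pandas versions.

```python
import pandas as pd

amboli = pd.DataFrame({
    "Neighbourhood": ["Amboli"] * 4,
    "Venue Category": ["Indian Restaurant", "Chinese Restaurant",
                       "Asian Restaurant", "Halal Restaurant"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(amboli["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", amboli["Neighbourhood"])

# Mean of 0/1 indicators = share of venues in each category
freq = onehot.groupby("Neighbourhood").mean().reset_index()
print(freq)
```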


Defining a function that returns the most common venue categories for a given neighbourhood row.

In [57]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
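One property of this function is worth keeping in mind: it always returns `num_top_venues` names, so a neighbourhood with only one or two restaurant categories has its list padded with categories whose frequency is zero. A hedged sketch of a variant that keeps only categories that actually occur follows. `top_nonzero_categories` is a hypothetical helper and the row is made up.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Most frequent categories in a row, ignoring those with zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the leading 'Neighbourhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    return list(freqs.index[:num_top_venues])

# Made-up row in the same layout as a row of resto_grouped
row = pd.Series({
    "Neighbourhood": "Demo",
    "Afghan Restaurant": 0.0,
    "Chinese Restaurant": 0.25,
    "Greek Restaurant": 0.0,
    "Indian Restaurant": 0.75,
})

print(top_nonzero_categories(row, 10))
```

The result can be shorter than `num_top_venues`, so it would need padding before being written into the fixed-width table built in the next cell.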

In [60]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd come from the indicators list
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # everything past the third position falls back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = resto_grouped['Neighbourhood']

for ind in np.arange(resto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(resto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agripada,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
1,Altamount Road,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
2,Amboli,Halal Restaurant,Asian Restaurant,Chinese Restaurant,Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant
3,Amrut Nagar,Indian Restaurant,Fast Food Restaurant,Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Chinese Restaurant,Mediterranean Restaurant,Falafel Restaurant,Italian Restaurant
4,Bandstand Promenade,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
68,Uttan,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
69,Vidyavihar,Restaurant,Fast Food Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
70,Vile Parle,Indian Restaurant,Japanese Restaurant,Seafood Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant
71,Walkeshwar,Fast Food Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
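The column headers above are built with a `try`/`except` around a three-item suffix list, which is correct for up to 20 columns but would produce '21th' beyond that. A small sketch of a general ordinal helper follows; `ordinal` is a hypothetical name, not a pandas or NumPy function.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```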


In [61]:
k_num_clusters = 3

resto_grouped_clustering = resto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(resto_grouped_clustering)
kmeans

KMeans(n_clusters=3, random_state=0)

In [62]:
kmeans.labels_

array([0, 0, 1, 1, 1, 0, 1, 0, 1, 2, 0, 2, 2, 2, 0, 0, 0, 2, 1, 1, 0, 0,
       1, 1, 0, 2, 1, 2, 0, 1, 2, 1, 2, 1, 2, 0, 1, 1, 0, 2, 1, 1, 1, 1,
       0, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 0, 2, 1, 2, 1, 0, 1, 1, 2, 0, 1,
       1, 1, 0, 2, 0, 1, 1], dtype=int32)
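A quick way to check how balanced the clusters are is to count the labels. The sketch below copies the label array printed above and counts it with `np.bincount`. The first two counts agree with the `cluster1.shape` and `cluster2.shape` values reported later in the notebook.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above
labels = np.array([
    0, 0, 1, 1, 1, 0, 1, 0, 1, 2, 0, 2, 2, 2, 0, 0, 0, 2, 1, 1, 0, 0,
    1, 1, 0, 2, 1, 2, 0, 1, 2, 1, 2, 1, 2, 0, 1, 1, 0, 2, 1, 1, 1, 1,
    0, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 0, 2, 1, 2, 1, 0, 1, 1, 2, 0, 1,
    1, 1, 0, 2, 0, 1, 1,
])

counts = np.bincount(labels, minlength=3)
print(counts)  # [20 35 18]
```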

In [63]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [110]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Agripada,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
1,0,Altamount Road,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
2,1,Amboli,Halal Restaurant,Asian Restaurant,Chinese Restaurant,Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant
3,1,Amrut Nagar,Indian Restaurant,Fast Food Restaurant,Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Chinese Restaurant,Mediterranean Restaurant,Falafel Restaurant,Italian Restaurant
4,1,Bandstand Promenade,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant


In order to plot these clustered neighbourhoods, we need to add the coordinates of the neighbourhoods.

The coordinates are in the dataframe 'df'. 

In [64]:
df

Unnamed: 0,Area,Location,Latitude,Longitude
0,Amboli,"Andheri,Western Suburbs",19.129300,72.843400
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.827210
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.829270
...,...,...,...,...
88,Parel,South Mumbai,18.990000,72.840000
89,Gowalia Tank,"Tardeo,South Mumbai",18.962450,72.809703
90,Dava Bazaar,South Mumbai,18.946882,72.831362
91,Dharavi,Mumbai,19.040208,72.850850


Renaming the 'Area' column in df to 'Neighbourhood' so that it can be joined with the 'Neighbourhood' column of the 'neighborhoods_venues_sorted' table.

In [72]:
df = df.rename(columns = {'Area' : 'Neighbourhood'})

In [73]:
df

Unnamed: 0,Neighbourhood,Location,Latitude,Longitude
0,Amboli,"Andheri,Western Suburbs",19.129300,72.843400
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.827210
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.829270
...,...,...,...,...
88,Parel,South Mumbai,18.990000,72.840000
89,Gowalia Tank,"Tardeo,South Mumbai",18.962450,72.809703
90,Dava Bazaar,South Mumbai,18.946882,72.831362
91,Dharavi,Mumbai,19.040208,72.850850


Joining the two tables.

In [75]:
resto_merged = df.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

resto_merged

Unnamed: 0,Neighbourhood,Location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amboli,"Andheri,Western Suburbs",19.129300,72.843400,1.0,Halal Restaurant,Asian Restaurant,Chinese Restaurant,Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833,2.0,Restaurant,Fast Food Restaurant,Asian Restaurant,Falafel Restaurant,Seafood Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373,0.0,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.827210,2.0,Vegetarian / Vegan Restaurant,Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.829270,0.0,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,Parel,South Mumbai,18.990000,72.840000,1.0,Asian Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
89,Gowalia Tank,"Tardeo,South Mumbai",18.962450,72.809703,2.0,Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
90,Dava Bazaar,South Mumbai,18.946882,72.831362,1.0,Indian Restaurant,Restaurant,American Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,Asian Restaurant,Bengali Restaurant
91,Dharavi,Mumbai,19.040208,72.850850,2.0,Fast Food Restaurant,South Indian Restaurant,Seafood Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant


Neighbourhoods for which no restaurants were found have no cluster label after the join, so we drop those rows from the merged table.

In [76]:
resto_merged_nonan = resto_merged.dropna(subset=['Cluster Labels'])
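The cluster labels in the merged table show up as `1.0`, `2.0`, and so on, because the left join introduces `NaN` for unmatched neighbourhoods and that forces the column to float. A minimal sketch of the same join, drop, and cast sequence on made-up data follows; the neighbourhood names and coordinates are invented.

```python
import pandas as pd

# Made-up stand-ins for df and neighborhoods_venues_sorted
areas = pd.DataFrame({
    "Neighbourhood": ["North", "Centre", "South"],
    "Latitude": [19.2, 19.1, 18.9],
    "Longitude": [72.9, 72.8, 72.8],
})
labels = pd.DataFrame({
    "Neighbourhood": ["North", "South"],  # 'Centre' has no restaurants
    "Cluster Labels": [1, 0],
})

merged = areas.join(labels.set_index("Neighbourhood"), on="Neighbourhood")

# Drop unmatched rows, then restore integer labels
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)

print(merged)
print(clean)
```

With integer labels, later code can index a color list directly without wrapping each label in `int(...)`.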

In [92]:
resto_merged_nonan

Unnamed: 0,Neighbourhood,Location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amboli,"Andheri,Western Suburbs",19.129300,72.843400,1.0,Halal Restaurant,Asian Restaurant,Chinese Restaurant,Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833,2.0,Restaurant,Fast Food Restaurant,Asian Restaurant,Falafel Restaurant,Seafood Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373,0.0,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.827210,2.0,Vegetarian / Vegan Restaurant,Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.829270,0.0,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,Parel,South Mumbai,18.990000,72.840000,1.0,Asian Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
89,Gowalia Tank,"Tardeo,South Mumbai",18.962450,72.809703,2.0,Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
90,Dava Bazaar,South Mumbai,18.946882,72.831362,1.0,Indian Restaurant,Restaurant,American Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,Asian Restaurant,Bengali Restaurant
91,Dharavi,Mumbai,19.040208,72.850850,2.0,Fast Food Restaurant,South Indian Restaurant,Seafood Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant


Now, we will show the different clusters on the map.


In [77]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(resto_merged_nonan['Latitude'], resto_merged_nonan['Longitude'], resto_merged_nonan['Neighbourhood'], resto_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters)
        
map_clusters

In [99]:
cluster1 = resto_merged_nonan.loc[resto_merged_nonan['Cluster Labels'] == 0, resto_merged_nonan.columns[[0] + [1]+ list(range(5, resto_merged_nonan.shape[1]))]]

In [100]:
cluster1.shape

(20, 12)

In [101]:
cluster1

Unnamed: 0,Neighbourhood,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,D.N. Nagar,"Andheri,Western Suburbs",Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
4,Lokhandwala,"Andheri,Western Suburbs",Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
11,Uttan,"Mira-Bhayandar,Western Suburbs",Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
19,Bangur Nagar,"Goregaon,Western Suburbs",Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
21,Juhu,Western Suburbs,Indian Restaurant,Japanese Restaurant,Seafood Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant
35,Vile Parle,Western Suburbs,Indian Restaurant,Japanese Restaurant,Seafood Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant
39,Pant Nagar,"Ghatkopar,Eastern Suburbs",Indian Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
42,Nahur,"Mulund,Eastern Suburbs",Indian Restaurant,Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
45,Indian Institute of Technology Bombay campus,"Powai,Eastern Suburbs",Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
48,Chembur,Harbour Suburbs,Indian Restaurant,Fast Food Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant


In [102]:
cluster2 = resto_merged_nonan.loc[resto_merged_nonan['Cluster Labels'] == 1, resto_merged_nonan.columns[[0] + [1] + list(range(5, resto_merged_nonan.shape[1]))]]

In [103]:
cluster2.shape

(35, 12)

In [105]:
cluster2

Unnamed: 0,Neighbourhood,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amboli,"Andheri,Western Suburbs",Halal Restaurant,Asian Restaurant,Chinese Restaurant,Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant
5,Marol,"Andheri,Western Suburbs",Indian Restaurant,Restaurant,Asian Restaurant,Chinese Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant,Dim Sum Restaurant
12,Bandstand Promenade,"Bandra,Western Suburbs",Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant
13,Kherwadi,"Bandra,Western Suburbs",Indian Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Chinese Restaurant,German Restaurant,Seafood Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant
16,Gorai,"Borivali (West),Western Suburbs",Indian Restaurant,Seafood Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
17,Dahisar,Western Suburbs,Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
20,Jogeshwari West,Western Suburbs,Fast Food Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
24,Mahavir Nagar,"Kandivali West,Western Suburbs",Indian Restaurant,Fast Food Restaurant,Italian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant
25,Thakur village,"Kandivali East,Western Suburbs",Indian Restaurant,Restaurant,Japanese Restaurant,Fast Food Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant
26,Pali Naka,"Khar,Western Suburbs",Indian Restaurant,Seafood Restaurant,Restaurant,Asian Restaurant,Chinese Restaurant,French Restaurant,South Indian Restaurant,Goan Restaurant,Afghan Restaurant,American Restaurant


In [107]:
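# Rows labelled 2; keep Neighbourhood and Location (columns 0-1) plus the ranked venue columns (5 onward)
cluster3 = resto_merged_nonan.loc[resto_merged_nonan['Cluster Labels'] == 2, resto_merged_nonan.columns[[0, 1] + list(range(5, resto_merged_nonan.shape[1]))]]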

In [108]:
cluster3.shape

(18, 12)

In [109]:
cluster3

Unnamed: 0,Neighbourhood,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Chakala, Andheri",Western Suburbs,Restaurant,Fast Food Restaurant,Asian Restaurant,Falafel Restaurant,Seafood Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Bengali Restaurant
3,Four Bungalows,"Andheri,Western Suburbs",Vegetarian / Vegan Restaurant,Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
7,Seven Bungalows,"Andheri,Western Suburbs",Chinese Restaurant,Seafood Restaurant,South Indian Restaurant,North Indian Restaurant,Dim Sum Restaurant,Indian Restaurant,Goan Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant
9,Mira Road,"Mira-Bhayandar,Western Suburbs",Vegetarian / Vegan Restaurant,Chinese Restaurant,Seafood Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant,Fast Food Restaurant
14,Pali Hill,"Bandra,Western Suburbs",Fast Food Restaurant,Italian Restaurant,New American Restaurant,Middle Eastern Restaurant,Indian Restaurant,Greek Restaurant,South Indian Restaurant,German Restaurant,Afghan Restaurant,American Restaurant
15,I.C. Colony,"Borivali (West),Western Suburbs",Chinese Restaurant,Fast Food Restaurant,Indian Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Dim Sum Restaurant
22,Charkop,"Kandivali West,Western Suburbs",Chinese Restaurant,South Indian Restaurant,Seafood Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Dim Sum Restaurant,Falafel Restaurant,Fast Food Restaurant
28,Dindoshi,"Malad,Western Suburbs",Restaurant,Fast Food Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant
32,Nalasopara,"Vasai,Western Suburbs",Seafood Restaurant,South Indian Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant,Fast Food Restaurant
43,Chandivali,"Powai,Eastern Suburbs",Indian Chinese Restaurant,South Indian Restaurant,Greek Restaurant,Afghan Restaurant,American Restaurant,Asian Restaurant,Bengali Restaurant,Chinese Restaurant,Dim Sum Restaurant,Falafel Restaurant
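
The three cluster selections above repeat the same expression with a different label. A small helper could factor that out. This is only a sketch: `select_cluster` and its parameters are hypothetical names, and it assumes the frame is laid out as above (two identifying columns first, venue columns from position 5 onward). The demo frame is a tiny stand-in for `resto_merged_nonan`, with placeholder coordinates.

```python
import pandas as pd

def select_cluster(df, label, label_col="Cluster Labels", keep_first=2, skip_until=5):
    """Return the rows of one cluster, keeping the first `keep_first` columns
    and every column from position `skip_until` onward."""
    cols = list(df.columns[:keep_first]) + list(df.columns[skip_until:])
    return df.loc[df[label_col] == label, cols]

# Tiny stand-in for resto_merged_nonan (assumed column order).
demo = pd.DataFrame({
    "Neighbourhood": ["Juhu", "Amboli", "Charkop"],
    "Location": ["Western Suburbs", "Andheri,Western Suburbs",
                 "Kandivali West,Western Suburbs"],
    "Latitude": [0.0, 0.0, 0.0],    # placeholder values, not real coordinates
    "Longitude": [0.0, 0.0, 0.0],   # placeholder values, not real coordinates
    "Cluster Labels": [0, 1, 2],
    "1st Most Common Venue": ["Indian Restaurant", "Halal Restaurant",
                              "Chinese Restaurant"],
})

cluster2_demo = select_cluster(demo, 1)
print(cluster2_demo)
```

With the real frame, `select_cluster(resto_merged_nonan, 1)` should give the same result as the `cluster2` cell, provided the column order matches the assumption.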


## Observations:

As we can observe, cluster1 is dominated by Indian restaurants: they are the most common venue, followed by South Indian restaurants. Anyone planning to open an Indian or South Indian restaurant in these neighbourhoods would face intense competition from the start, so such a venture is unlikely to be profitable.

Cluster2 has a mix of restaurant types, so any restaurant opened in these neighbourhoods will face a fair amount of competition, but opening one here would not be the worst decision.

Cluster3, as we can see, is dominated by restaurants that are neither Indian nor South Indian. Hence, an Indian or South Indian restaurant could be profitable here. The neighbourhoods in this cluster already have plenty of Chinese, Italian, and seafood restaurants, so investing in those kinds of restaurants is not advisable.

In a nutshell, we can say the following overall:
    i) Neighbourhoods in cluster1 - Any restaurant except Indian or South Indian.
    ii) Neighbourhoods in cluster2 - Every kind of restaurant already has competition. Indian restaurants are still plentiful here, so they are best avoided.
    iii) Neighbourhoods in cluster3 - Indian restaurants are scarce here, so they have good odds of making a profit. Chinese restaurants are already abundant, so it would be best to avoid opening another one.
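
One way to back these observations with numbers is to tabulate, for each cluster, the share of neighbourhoods whose top-ranked venue falls in each category. The sketch below is illustrative only: `top_venue_share` is a hypothetical helper, and `sample` holds just a few rows copied from the cluster tables above, so its shares do not describe the full clusters. In practice the full `resto_merged_nonan` frame would be passed in.

```python
import pandas as pd

# A handful of rows copied from the cluster tables above (not the full data).
sample = pd.DataFrame({
    "Neighbourhood": ["Uttan", "Juhu", "Chembur",
                      "Amboli", "Marol", "Gorai", "Dahisar",
                      "Four Bungalows", "Seven Bungalows", "Charkop", "Nalasopara"],
    "Cluster Labels": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "1st Most Common Venue": [
        "Indian Restaurant", "Indian Restaurant", "Indian Restaurant",
        "Halal Restaurant", "Indian Restaurant", "Indian Restaurant", "Restaurant",
        "Vegetarian / Vegan Restaurant", "Chinese Restaurant",
        "Chinese Restaurant", "Seafood Restaurant",
    ],
})

def top_venue_share(df, label_col="Cluster Labels", venue_col="1st Most Common Venue"):
    """Rows are cluster labels, columns are venue categories, and each value is
    the fraction of that cluster's neighbourhoods with that top-ranked venue."""
    return pd.crosstab(df[label_col], df[venue_col], normalize="index")

shares = top_venue_share(sample)
print(shares.round(2))
```

A row concentrated in a single category suggests a saturated market for that cuisine, while a spread-out row matches the "mixed" description given for cluster2.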

## Conclusion

I have completed the process of identifying the business problem, specifying the data required, preprocessing the data, visualizing the results, applying k-means to group the neighbourhoods into 3 clusters based on the similarity of their venue frequencies, and arriving at a solution to the business problem. The aim of the project is to provide recommendations to interested investors on the best kinds of restaurants to open in a given location, or on the best location for a particular kind of restaurant.