# Finding the Best Place to Open a Mughlai Restaurant in Kolkata

## Introduction

### Background

Kolkata, formerly known as Calcutta, is the educational, commercial and cultural centre of eastern India, and is the third most populous metropolitan city in the country. Kolkata is a pioneer in drama, arts, theatre and literature, with several Nobel laureates contributing to its culture. But one thing people seem to forget is the food of Kolkata. Somewhere between Rossogollas and lip-smacking fish preparations, many rich Bengali delicacies go unnoticed. One such delicacy is the Kolkata Biryani. One can also blame the greater popularity of its southern counterpart, Hyderabadi Biryani, which may have long prevented the Kolkata Biryani from flourishing in all its glory. But Kolkata Biryani is slowly working its way out of the canals of the City of Joy and gaining the recognition it has long deserved.

### Business Problem

The aim of this project is to find the best place in Kolkata to open a Mughlai restaurant. Leveraging venue data from Foursquare's 'Places API' and the 'k-means clustering' unsupervised machine learning algorithm, we will try to answer the question: if someone wants to open a Mughlai restaurant in Kolkata, which is the best location for it?

## Data

For this project, we need the following data:

* The list of neighborhoods in Kolkata, India. This will help us narrow the choice down to a specific location for the new restaurant.

* The latitude and longitude coordinates of the Kolkata neighborhoods. This will help us plot and visualize each location on the map of Kolkata.

* Data about the venues in these neighborhoods, specifically the Mughlai restaurants, which will help us cluster the neighborhoods.

### Data Sources

1. We collected the neighborhood data of Kolkata from Wikipedia using web scraping, then obtained the latitude and longitude coordinates using the Python Geocoder package.
2. For the venue data we used the Foursquare API, making RESTful API calls to retrieve data about venues in the different neighborhoods.

## Methodology

In this section we will download the required data using web scraping and the Foursquare API. We will then analyze the data and perform k-means clustering to find the best place to open a Mughlai restaurant in Kolkata.

In the cell below I have imported all the Python packages and libraries required for the analysis and clustering.

In [2]:
# Download and import all the required Python packages

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!pip3 install geocoder
import geocoder

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from bs4 import BeautifulSoup

#!pip3 install folium==0.5.0 #--download and install folium package and then comment it out before re running the code block
import folium # map rendering library

print('All Libraries have been imported.')

All Libraries have been imported.


To kick off this project, we need the geographical coordinates of the city of Kolkata. We could look these up on the web, but here I have used the Nominatim geocoder from geopy to retrieve them.

In [3]:
city = 'Kolkata, IN'

geolocator = Nominatim(user_agent="kol_explorer")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Kolkata are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Kolkata are 22.5414185, 88.35769124388872.


### Fetching Neighborhood details of Kolkata from Wikipedia using Web scraping

Using BeautifulSoup and requests we have extracted the neighborhood details of Kolkata city.

In [18]:
# Send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Kolkata").text
# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(data, 'html.parser')
# Create a list to store neighbourhood data
neighborhoodList = []
# Append the text of each list item to the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)
# Create a new DataFrame from the list
kol_df = pd.DataFrame({"Neighborhood": neighborhoodList})
# Drop the first entry; take a copy so later column assignments do not act on a slice
kol_ndf = kol_df.iloc[1:].copy()
print(kol_ndf)

                        Neighborhood
1                         Abhirampur
2                           Agarpara
3                         Ajoy Nagar
4                            Alipore
5                          Amodghata
6                             Amtala
7                 Anandapur, Kolkata
8                              Andul
9                          Ankurhati
10                            Argari
11                          Ariadaha
12                             Asuti
13                     B. B. D. Bagh
14                          Babughat
15                         Badartala
16                          Bagbazar
17                        Baghajatin
18                          Baguiati
19                        Baidyabati
20                      Balaram Pota
21           Balarampur, Budge Budge
22             Bally, Bally-Jagachha
23                     Bally, Howrah
24                        Ballygunge
25          Ballygunge Circular Road
26                       Bamangachhi
...

This is the data extracted from Wikipedia about the neighborhoods of Kolkata. From Wikipedia we are only able to get the neighborhood names, but that is not sufficient. We need the geographical coordinates of each place in order to use this data with the Foursquare API.

In [19]:
# Define a function to get the coordinates of a neighborhood
def get_latlng(neighborhood, max_retries=5):
    # Retry a bounded number of times instead of looping forever
    for _ in range(max_retries):
        g = geocoder.arcgis('{}, Kolkata, India'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # No result: return placeholders so the row can be dropped later
    return [None, None]

# Call the function for every neighborhood and collect the results in a list
coords = [get_latlng(neighborhood) for neighborhood in kol_ndf["Neighborhood"].tolist()]

# Create a temporary dataframe holding the coordinates
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Copy the coordinates into the neighborhood dataframe.
# Note: pandas aligns this assignment on the index. df_coords is indexed from 0
# while kol_ndf starts at 1, so rows shift by one and the last row receives NaN.
kol_ndf['Latitude'] = df_coords['Latitude']
kol_ndf['Longitude'] = df_coords['Longitude']
print(kol_ndf)

                        Neighborhood   Latitude  Longitude
1                         Abhirampur  22.684050  88.391650
2                           Agarpara  22.489660  88.396400
3                         Ajoy Nagar  22.526600  88.335100
4                            Alipore  22.988010  88.388380
5                          Amodghata  22.505220  88.399030
6                             Amtala  22.514410  88.410320
7                 Anandapur, Kolkata  22.570530  88.371240
8                              Andul  22.610380  88.240010
9                          Ankurhati  22.570530  88.371240
10                            Argari  22.666470  88.366150
11                          Ariadaha  22.472170  88.255460
12                             Asuti  22.567630  88.344530
13                     B. B. D. Bagh  22.567290  88.341060
14                          Babughat  22.555080  88.246843
15                         Badartala  22.604020  88.366370
16                          Bagbazar  22.483950  88.3754


Using the geocoder package we were able to extract the geographical coordinates of each location and add them to the dataframe.
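Geocoded coordinates are worth a quick sanity check before they are used for venue queries. The following is a minimal sketch, not part of the original notebook: it computes the great-circle (haversine) distance from the city centre printed earlier and flags rows that fall outside a cut-off. The function name `haversine_km` and the 30 km threshold are assumptions chosen for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# City centre as printed earlier in the notebook
CITY = (22.5414185, 88.35769124388872)
MAX_KM = 30  # hypothetical cut-off, tune as needed

# Two rows taken from the geocoded table above
samples = {"Ajoy Nagar": (22.5266, 88.3351), "Alipore": (22.98801, 88.38838)}
for name, (lat, lng) in samples.items():
    d = haversine_km(CITY[0], CITY[1], lat, lng)
    flag = "check" if d > MAX_KM else "ok"
    print(f"{name}: {d:.1f} km from centre ({flag})")
```

Rows flagged with `check` are candidates for manual review or re-geocoding rather than automatic removal.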

In [20]:
kol_fdf = kol_ndf[:-1]
kol_fdf

Unnamed: 0,Neighborhood,Latitude,Longitude
1,Abhirampur,22.68405,88.39165
2,Agarpara,22.48966,88.3964
3,Ajoy Nagar,22.5266,88.3351
4,Alipore,22.98801,88.38838
5,Amodghata,22.50522,88.39903
6,Amtala,22.51441,88.41032
7,"Anandapur, Kolkata",22.57053,88.37124
8,Andul,22.61038,88.24001
9,Ankurhati,22.57053,88.37124
10,Argari,22.66647,88.36615


After viewing the data we found one NaN value, so we dropped that row from the data and created our final neighborhood dataset.
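One plausible source of a single trailing NaN is pandas index alignment: assigning a Series to a column matches rows by index label, not by position. The sketch below uses toy frames (the names `names`, `coords`, `by_index` and `by_position` are illustrative, not from the notebook) to show the difference between aligned and positional assignment.

```python
import pandas as pd

# Toy stand-ins: one frame indexed from 1, one indexed from 0
names = pd.DataFrame({"Neighborhood": ["A", "B", "C"]}, index=[1, 2, 3])
coords = pd.DataFrame({"Latitude": [10.0, 20.0, 30.0]})

# Assigning a Series aligns on index labels:
# label 1 receives 20.0, label 2 receives 30.0, label 3 has no match and becomes NaN
by_index = names.copy()
by_index["Latitude"] = coords["Latitude"]

# Assigning the underlying array ignores the index and keeps row order
by_position = names.copy()
by_position["Latitude"] = coords["Latitude"].to_numpy()

print(by_index)
print(by_position)
```

If the positional behaviour is what is intended, assigning the array (or building the coordinate frame with the same index) avoids both the shift and the NaN.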

In [7]:
# create map of Kolkata using latitude and longitude values
map_kolkata = folium.Map(location=[latitude, longitude], zoom_start=10.5)

# add markers to map
for lat, lng, name in zip(kol_fdf['Latitude'], kol_fdf['Longitude'], kol_fdf['Neighborhood']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kolkata)  
    
map_kolkata

Using folium we have visualized the map of Kolkata city and its neighborhoods.

### Exploring the Neighborhoods of Kolkata using Foursquare API

In [33]:
# The code was removed by Watson Studio for sharing.
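The credentials cell is hidden in this export, but the request loop further down uses the names `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`. A minimal sketch of one way to define them without hard-coding secrets is to read them from environment variables. The environment variable names and the default version date below are assumptions, not values from the original notebook.

```python
import os

# Hypothetical environment variable names; set them in your shell before running
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
# The v parameter is a date string in YYYYMMDD form; this default is only a placeholder
VERSION = os.environ.get("FOURSQUARE_VERSION", "20180605")

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API calls will fail.")
```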

We have used a radius of 2 km and a limit of 100 results to extract the venues and their details in each neighborhood of Kolkata using the Foursquare API.

In [36]:
radius = 2000 #2 KM
LIMIT = 100
venues = []
for lat, long, neighborhood in zip(kol_fdf['Latitude'], kol_fdf['Longitude'], kol_fdf['Neighborhood']):
    # Create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,long,radius,LIMIT)
    # Make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    # Return only relevant information for each nearby venue
    for venue in results:
        venues.append((neighborhood,lat,long,venue['venue']['name'],
        venue['venue']['location']['lat'],venue['venue']['location']['lng'],
        venue['venue']['categories'][0]['name']))
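The loop above indexes straight into the response, so a neighborhood with no groups or a venue with no category would raise an error. As a hedged sketch, the parsing step could be isolated in a helper that tolerates missing fields. The function name `extract_venues` and the sample payload are hypothetical; the payload is hand-written to mirror only the fields the loop reads and is not real API output.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into tuples matching the venues list."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Hand-written sample shaped like the fields read above (not real API output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Biryani House",
               "location": {"lat": 22.57, "lng": 88.36},
               "categories": [{"name": "Mughlai Restaurant"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 22.58, "lng": 88.37},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Example Neighborhood", 22.57, 88.36)
empty = extract_venues({}, "Example Neighborhood", 22.57, 88.36)
print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the logic without network access.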

In [37]:
venues_df = pd.DataFrame(venues)
# Defining the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head()

(4285, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abhirampur,22.68405,88.39165,Agarpara Railway Station,22.682886,88.385364,Train Station
1,Abhirampur,22.68405,88.39165,MedPlus,22.694171,88.402894,Pharmacy
2,Abhirampur,22.68405,88.39165,Shamvu da's tea shop,22.694959,88.379661,Bakery
3,Abhirampur,22.68405,88.39165,Events Bengal,22.694172,88.404686,Event Service
4,Abhirampur,22.68405,88.39165,MedPlus,22.700223,88.385307,Pharmacy


After extracting the venue details we created the venues dataframe, which has 4285 observations and 7 features.

In [40]:
# Let's check how many venues were returned for each neighbourhood
venues_df.groupby(["Neighborhood"]).count()
# Let's check how many unique categories there are among the returned venues
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 147 unique categories.


We were able to extract 147 unique venue categories, such as Bakery and Pharmacy, as well as Mughlai Restaurant.

### Analyzing Neighborhoods

In [41]:
# One hot encoding
kol_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")
# Adding neighborhood column back to dataframe
kol_onehot['Neighborhoods'] = venues_df['Neighborhood']
# Moving neighbourhood column to the first column
fixed_columns = [kol_onehot.columns[-1]] + list(kol_onehot.columns[:-1])
kol_onehot = kol_onehot[fixed_columns]
print(kol_onehot.shape)

(4285, 148)


After applying one-hot encoding we have 4285 observations and 148 columns: one indicator column per venue category plus the neighborhood name.
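The encoding step, and the per-neighborhood aggregation it feeds, can be illustrated on a toy frame. This is a sketch with made-up rows, not data from the notebook; `dtype=int` is passed so the indicator columns are integers regardless of pandas version.

```python
import pandas as pd

# Toy venue list: three venues in neighborhood A, two in B
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "VenueCategory": ["Bakery", "Bakery", "Mughlai Restaurant",
                      "Pharmacy", "Mughlai Restaurant"],
})

# One indicator column per category
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="", dtype=int)
# Put the neighborhood name back as the first column
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])

# Summing the indicators per neighborhood gives venue counts per category
grouped = onehot.groupby("Neighborhoods").sum().reset_index()
print(grouped)
```

Each row of `grouped` is one neighborhood and each remaining column holds the number of venues of that category found there.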

In [42]:
kol_grouped=kol_onehot.groupby(["Neighborhoods"]).sum().reset_index()
print(kol_grouped.shape)
kol_grouped

(184, 148)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Awadhi Restaurant,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Garden,Bengali Restaurant,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Event Service,Falafel Restaurant,Fast Food Restaurant,Field,Film Studio,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gift Shop,Golf Course,Grocery Store,Gym,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kerala Restaurant,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Military Base,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Optical Shop,Park,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Port,Pub,Racetrack,Residential Building (Apartment / Condo),Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Sports Club,Stadium,Steakhouse,Supermarket,Taxi Stand,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theme Park,Theme Restaurant,Tibetan Restaurant,Toll Booth,Train Station,Vegetarian / Vegan Restaurant,Watch Shop,Zoo
0,Abhirampur,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
1,Agarpara,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ajoy Nagar,0,0,0,0,0,1,0,0,0,1,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,6,1,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,0,1,1,1,0,0,0,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,2,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1
3,Alipore,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Amodghata,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Amtala,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
6,"Anandapur, Kolkata",0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,3,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0
7,Andul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Ankurhati,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,3,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0
9,Argari,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0


In [43]:
len((kol_grouped[kol_grouped["Mughlai Restaurant"] > 0]))

67

After grouping the rows by neighborhood and summing the occurrences of each category, we found that 67 of the 184 neighborhoods have at least one Mughlai restaurant within the search radius, which is quite high for a single cuisine in one city. Now we have to find places where the number of Mughlai restaurants is low or moderate, so that the new restaurant can operate without much competition from other Mughlai restaurants.
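The filter in the cell above counts neighborhoods, not restaurants. A small sketch on made-up numbers shows the difference between the two quantities; the frame `toy_grouped` is illustrative only.

```python
import pandas as pd

# Toy per-neighborhood counts (illustrative values)
toy_grouped = pd.DataFrame({
    "Neighborhoods": ["A", "B", "C", "D"],
    "Mughlai Restaurant": [0, 2, 1, 0],
})

# Neighborhoods with at least one Mughlai restaurant in range
neighborhoods_with_any = int((toy_grouped["Mughlai Restaurant"] > 0).sum())
# Total restaurant hits across all neighborhoods
total_hits = int(toy_grouped["Mughlai Restaurant"].sum())

print(neighborhoods_with_any, total_hits)
```

Because the 2 km search circles of adjacent neighborhoods can overlap, the same restaurant may be counted in more than one neighborhood, so the total is an upper bound on distinct restaurants.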

### Clustering Neighborhoods

In [47]:
# Creating a dataframe for Mughlai Restaurant data only
kol_rest = kol_grouped[["Neighborhoods","Mughlai Restaurant"]]

We have created a new dataframe containing only the Neighborhoods and Mughlai Restaurant columns, as these are the only two needed for further analysis.

Now we have to cluster these neighborhoods by their number of Mughlai restaurants. From there we can identify the clusters with fewer Mughlai restaurants, so that we can set up a restaurant there.

In [48]:
# Setting the number of clusters
kclusters = 5
# Keep only the numeric feature used for clustering
kol_clustering = kol_rest.drop(columns=["Neighborhoods"])
# Run k-means clustering algorithm
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kol_clustering)
# Checking cluster labels generated for the first 10 rows in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 0, 2, 2, 2, 0, 2, 0], dtype=int32)

We have set the number of clusters to 5 and applied k-means clustering to segregate the neighborhoods into different clusters.
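The notebook fixes the number of clusters at 5 without showing how that value was chosen. One common way to sanity-check such a choice is the elbow method: fit k-means for several values of k and compare the within-cluster sum of squares (`inertia_`). The sketch below is an assumption about how this could be done, run on made-up restaurant counts rather than the notebook's data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up restaurant counts per neighborhood, shaped as a single-feature matrix
counts = np.array([0] * 10 + [1] * 5 + [2] * 3 + [5] * 2 + [9]).reshape(-1, 1)

inertias = []
for k in range(1, 6):
    # n_init is set explicitly so the result does not depend on version defaults
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(counts)
    inertias.append(model.inertia_)

for k, inertia in zip(range(1, 6), inertias):
    print(k, round(inertia, 2))
```

The value of k after which the inertia stops dropping sharply is a reasonable candidate. With a single integer feature, k can never usefully exceed the number of distinct counts.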

In [49]:
# Creating a new dataframe that includes the cluster label for each neighborhood
kol_merged = kol_rest.copy()
# Add the clustering labels
kol_merged["Cluster Labels"] = kmeans.labels_
kol_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
kol_merged.head(5)

Unnamed: 0,Neighborhood,Mughlai Restaurant,Cluster Labels
0,Abhirampur,0,0
1,Agarpara,0,0
2,Ajoy Nagar,1,2
3,Alipore,0,0
4,Amodghata,1,2


In this view we can see the number of Mughlai restaurants in each neighborhood along with its cluster label.

In [53]:
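# Adding latitude and longitude values to the existing dataframe
# Note: these assignments align rows by index label, not by neighborhood name,
# so they are only correct if kol_merged and kol_fdf share the same index
kol_merged['Latitude'] = kol_fdf['Latitude']
kol_merged['Longitude'] = kol_fdf['Longitude']
# Sorting the results by Cluster Labels
kol_merged.sort_values(["Cluster Labels"], inplace=True)
# Drop the first row of the sorted dataframe
kol_merged = kol_merged.iloc[1:]
kol_merged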


Unnamed: 0,Neighborhood,Mughlai Restaurant,Cluster Labels,Latitude,Longitude
105,"Durganagar, Kolkata",0,0,22.51084,88.37258
106,Duttapukur,0,0,22.56958,88.34257
107,East Kolkata,0,0,22.53186,88.32676
112,"Fort William, India",0,0,22.61993,88.39418
113,Ganye Gangadharpur,0,0,22.60947,88.41606
114,Garden Reach,0,0,22.56987,88.35171
104,"Dunlop, Kolkata",0,0,22.93472,88.37143
115,Garfa,0,0,22.65041,88.41566
118,Garshyamnagar,0,0,22.53529,88.32268
119,Gayespur,0,0,22.57053,88.37124


We added the geographical data of each neighborhood and sorted the dataframe by cluster label. Using this we can visualize the clusters on the map of Kolkata.
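When two dataframes do not share the same index, attaching coordinates by neighborhood name is more robust than column assignment. The following is a minimal sketch using `DataFrame.merge` on toy frames. The frame names and values are illustrative and this is a suggested alternative, not the method used in the cell above.

```python
import pandas as pd

# Toy cluster table, deliberately in a different order and with a different index
clusters = pd.DataFrame({"Neighborhood": ["B", "A"], "Cluster Labels": [1, 0]})
# Toy coordinate table indexed from 1, with an extra neighborhood
coords = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"],
     "Latitude": [10.0, 20.0, 30.0],
     "Longitude": [80.0, 81.0, 82.0]},
    index=[1, 2, 3],
)

# A left join keeps every clustered neighborhood and matches on the name column
merged = clusters.merge(coords, on="Neighborhood", how="left")
print(merged)
```

Neighborhoods without a match would receive NaN coordinates, which makes missing geocodes easy to spot and drop explicitly.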

In [54]:
# Creating the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# Setting color scheme for the clusters: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# Add markers to the map
for lat, lon, poi, cluster in zip(kol_merged['Latitude'], kol_merged['Longitude'], kol_merged['Neighborhood'], kol_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

We have visualized the clusters on the map of Kolkata, and we can see that cluster 0 occurs more often than any other cluster.

In [57]:
# Print the number of neighborhoods in each cluster
for label in range(kclusters):
    print(len(kol_merged.loc[kol_merged['Cluster Labels'] == label]))


115
15
40
4
8


Finally, we have collected the results of our clustering.

## Results

From the result of the k-means clustering we can group the neighborhoods into 5 clusters:
* Cluster 0: It categorizes the neighborhoods which have zero or a very low number of Mughlai restaurants.
* Cluster 1: It categorizes the neighborhoods which have a low number of Mughlai restaurants.
* Cluster 2: It categorizes the neighborhoods which have a moderate number of Mughlai restaurants.
* Cluster 3: It categorizes the neighborhoods which have a high number of Mughlai restaurants.
* Cluster 4: It categorizes the neighborhoods which have a very high number of Mughlai restaurants.

We have extracted the size of each cluster, and cluster 0 has the largest number of neighborhoods (115), where the number of Mughlai restaurants is very low to none.
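K-means assigns label numbers arbitrarily, so a higher label does not necessarily mean more restaurants. A short sketch of how the interpretation of each cluster could be confirmed is to summarize the restaurant count per label. The toy frame below is illustrative; with the notebook's data the same aggregation would presumably be applied to `kol_merged`.

```python
import pandas as pd

# Toy data: restaurant counts with arbitrary cluster labels
toy_merged = pd.DataFrame({
    "Mughlai Restaurant": [0, 0, 0, 1, 1, 2, 5],
    "Cluster Labels":     [0, 0, 0, 2, 2, 1, 3],
})

# Size and count range of each cluster, ordered from fewest to most restaurants
summary = (
    toy_merged.groupby("Cluster Labels")["Mughlai Restaurant"]
    .agg(["count", "min", "max", "mean"])
    .sort_values("mean")
)
print(summary)
```

Reading the `min` and `max` columns row by row shows which label corresponds to which level of restaurant density.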
    

## Discussion

Cluster 1 and cluster 2 also contain neighborhoods with a low to moderate number of Mughlai restaurants, but compared to cluster 0 they contain fewer neighborhoods (15 and 40 respectively). We cannot discard those clusters, though: the presence of a few Mughlai restaurants suggests that people in those neighborhoods already have a taste for Mughlai food. We could turn that to our advantage and make a profit by opening a Mughlai restaurant there as well.

## Conclusion

Even though Kolkata as a whole has a large number of Mughlai restaurants, cluster 0, with its lower number of Mughlai restaurants, shows a great opportunity to build a successful and profitable business. The appetite for Mughlai cuisine keeps increasing among the people of Kolkata, so anyone who wants to take advantage of the low number of Mughlai restaurants in cluster 0 can set up a Mughlai restaurant there. Keep in mind that we have only used the number of occurrences of Mughlai restaurants to analyze and categorize the data and arrive at these results. Other factors, such as the cost of the project, the population of the area, the average income of the population, and the type of outlet (takeaway or dine-in), could be added for a more precise prediction.

## Thank you