<a href="https://colab.research.google.com/github/kshbds/Coursera_Capstone/blob/master/Capstone_The_Battle_of_neighborhoods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Capstone Project: The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The main objective of this project is to determine a location in city Kolkata suitable for opening a new restaurant. Here we will determine the location based on mainly two criteria: The location should be close to the centre of the City and there should be very few or no restaurants nearby.  
  
We will use our data science methodogies and analysis techniques to generate a few most promissing locations based on this criteria. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on the definition of our business problem we will be requiring the data of different locations of city Kolkata. To find out the prime and popular locations of Kolkata based on their pin code we have used the below website resources:  
https://www.mapsofindia.com/  
http://pincode.india-server.com/  
  
From the above resources we have created an CSV file of all popular locations of Kolkata along with Pin codes and stored the data in google drive that it can be used easily in our project. The link to the CSV file is:
https://drive.google.com/uc?export=download&id=1qCRjM5S6RzSNFT1MvUAszuBTxBlSZEvI
  
Besides the above data, following data sources will be needed to extract/generate the required information:  
•	Nomination Geocoder to get the latitude and longitude information of all the locations including the centre of the City Kolkata.  
•	Foursquare API to get all the restaurants located around the different locations of City Kolkata.



_Install and Import required libraries_

In [0]:
# import pandas
import pandas as pd

# import folium
import folium

# import nomination for geocoder
from geopy.geocoders import Nominatim

# library to add delay time while processing
import time

# library to handle requests
import requests 

# import math library
from math import radians, cos, sin, asin, sqrt

# import numpy
import numpy as np


_Download the location details file with pin code details for city Kolkata_

In [0]:
# download the location details file from my own google drive
!wget -O location_kol.csv "https://drive.google.com/uc?export=download&id=1qCRjM5S6RzSNFT1MvUAszuBTxBlSZEvI"

--2020-04-20 12:49:49--  https://drive.google.com/uc?export=download&id=1qCRjM5S6RzSNFT1MvUAszuBTxBlSZEvI
Resolving drive.google.com (drive.google.com)... 173.194.214.138, 173.194.214.100, 173.194.214.139, ...
Connecting to drive.google.com (drive.google.com)|173.194.214.138|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-10-5k-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/p3brb47go26fm2gmdukmasvs33ml9q0h/1587386925000/05692303969036401461/*/1qCRjM5S6RzSNFT1MvUAszuBTxBlSZEvI?e=download [following]
--2020-04-20 12:49:49--  https://doc-10-5k-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/p3brb47go26fm2gmdukmasvs33ml9q0h/1587386925000/05692303969036401461/*/1qCRjM5S6RzSNFT1MvUAszuBTxBlSZEvI?e=download
Resolving doc-10-5k-docs.googleusercontent.com (doc-10-5k-docs.googleusercontent.com)... 172.217.193.132, 2607:f8b0:400c:c03::84
Connecting to doc-10-5k-docs.googleusercontent.com (d

_Load the location CSV file into a pandas dataframe and display the first 10 rows_

In [0]:
location_df = pd.read_csv('location_kol.csv')
location_df.head()

Unnamed: 0,City,PostalCode,Location
0,Kolkata,700020,A.J.C.Bose Road
1,Kolkata,700046,Abinash Chaowdhury Lane
2,Kolkata,700005,Ahritola
3,Kolkata,700035,Alambazar
4,Kolkata,700003,Amrita Bazar Partika


_Get the latitude & longitude details of the locations using Nomination._  
_Take the populated latitude & longitude values in a list and add to the dataframe._

In [0]:
# list for latitude and longitude values
latList = []
lngList = []

geolocator = Nominatim(user_agent="kol_explorer")

for ind in location_df.index:
    location = None
    location = geolocator.geocode(location_df['Location'][ind] + ', ' + location_df['City'][ind])
    if location is None:
        latList.append(" ")
        lngList.append(" ")
    else:
        latList.append(location.latitude)
        lngList.append(location.longitude)
        
    time.sleep(1)
    
location_df['Latitude'] = latList 
location_df['Longitude'] = lngList

location_df.head(10)

Unnamed: 0,City,PostalCode,Location,Latitude,Longitude
0,Kolkata,700020,A.J.C.Bose Road,,
1,Kolkata,700046,Abinash Chaowdhury Lane,,
2,Kolkata,700005,Ahritola,,
3,Kolkata,700035,Alambazar,,
4,Kolkata,700003,Amrita Bazar Partika,,
5,Kolkata,700007,Archana,,
6,Kolkata,700030,Ashokegarh,,
7,Kolkata,700014,Asylum Lane,,
8,Kolkata,700044,Badartala,,
9,Kolkata,700086,Baghajatin,22.484,88.3756


In [0]:
print('The dataframe has {} unique postal codes.'.format(len(location_df['PostalCode'].unique())))

The dataframe has 81 unique postal codes.


_Dropping the rows from the datafrmae for which the latitude/longitude is not populated._  
_Printing the numer of unique postal codes in datafrmae after the drop._

In [0]:
df = location_df[location_df.Latitude != " "].reset_index(drop=True)
print('The dataframe has {} unique postal codes.'.format(len(df['PostalCode'].unique())))

The dataframe has 63 unique postal codes.


In [0]:
df.head()

Unnamed: 0,City,PostalCode,Location,Latitude,Longitude
0,Kolkata,700086,Baghajatin,22.484,88.3756
1,Kolkata,700022,Bakery Road,22.5466,88.3275
2,Kolkata,700019,Ballygunge,22.5259,88.366
3,Kolkata,700019,Ballygunge Sc College,22.5259,88.366
4,Kolkata,700007,Barabazar,22.5902,88.3512


_Create a map of City Kolkata with all the prime locations superimposed on top._

In [0]:
city = 'Kolkata, West Bengal'

# get the latitude and longitude of Kolkata
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude

# create map of Kolkata using latitude and longitude values
map_kolkata = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, location, city in zip(df['Latitude'], df['Longitude'], df['Location'], df['City']):
    label = '{}, {}'.format(location, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kolkata)  

folium.Marker([latitude, longitude], popup='Kolkata').add_to(map_kolkata)   

map_kolkata

_We need to calculate the location's distance from the centre of city Kolkata and add the value to the dataframe that we can choose the best location close to the center of the City._

In [0]:
#method to calculate distance of location from Centre of the City

def calc_xy_distance(x1, y1, x2, y2):
    
    # radians which converts from degrees to radians. 
    lon1 = radians(y1) 
    lon2 = radians(y2) 
    lat1 = radians(x1) 
    lat2 = radians(x2)

    # Haversine formula
    dlon = lon2 - lon1  
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2

    c = 2 * asin(sqrt(a))

    # Radius of earth in kilometers. Use 3956 for miles 
    r = 6371

    # calculate the result 
    return(c * r)

In [0]:
# list for calculating distance from centre
dist = []

location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude

for ind in df.index:
    dist.append(calc_xy_distance(latitude, longitude, df['Latitude'][ind], df['Longitude'][ind]))

df['Distance from Centre'] = dist 

df.head(10)

Unnamed: 0,City,PostalCode,Location,Latitude,Longitude,Distance from Centre
0,Kolkata,700086,Baghajatin,22.484,88.3756,7.09744
1,Kolkata,700022,Bakery Road,22.5466,88.3275,3.013952
2,Kolkata,700019,Ballygunge,22.5259,88.366,2.371353
3,Kolkata,700019,Ballygunge Sc College,22.5259,88.366,2.371353
4,Kolkata,700007,Barabazar,22.5902,88.3512,5.011128
5,Kolkata,700036,Baranagar,22.6385,88.3804,10.634008
6,Kolkata,700018,Bartala,22.5822,88.3558,4.089925
7,Kolkata,700006,Beadon Street,22.5898,88.3605,4.955307
8,Kolkata,700034,Behala,22.4977,88.3177,6.656152
9,Kolkata,700010,Beleghata,22.5629,88.3963,4.494522


In [0]:
df.shape

(120, 6)

_Credentials and version for Foursquare API_

In [0]:
#@title Credentials and version for Foursquare API
# @hidden_cell

CLIENT_ID = '3VNQTPNXCIDP5QPK5Z3Z0AVQUDZSNX3BMKNMQFKJNCPCDCZR' # Foursquare ID
CLIENT_SECRET = 'TMGOIWOQ3250VDKAOU4ZTLDA2KJ1M521SSBP3JTHBLXCB1RZ' # Foursquare Secret

VERSION = '20200412' # Foursquare API version

_Function to explore nearby vanues of all the locations in Kolkata using Foursqure API._

In [0]:
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Location Latitude', 
                  'Location Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

_Use above function on each location and create a new dataframe called kolkata_venues._

In [0]:
kolkata_venues = getNearbyVenues(names=df['Location'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

kolkata_venues.head()

Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Baghajatin,22.483994,88.375583,Cafe Coffee Day,22.490273,88.372894,Café
1,Baghajatin,22.483994,88.375583,State Bank of India ATM,22.485421,88.379349,ATM
2,Baghajatin,22.483994,88.375583,State Bank of India ATM,22.484447,88.382992,ATM
3,Baghajatin,22.483994,88.375583,Calcutta Bistro,22.491861,88.372135,Bistro
4,Bakery Road,22.546598,88.327455,Ordnance Club,22.550051,88.328683,Food


In [0]:
kolkata_venues.shape

(2226, 7)

_From all the venues in Kolkata we will filter out the restaurants data depeneding upon if it is a Restaurant/BBQ/Dhaba/Diner/Pizza place._  
_The new dataframe for all the restaurants will be kolkata_restaurant._

In [0]:
kolkata_restaurant = kolkata_venues[kolkata_venues['Venue Category'].str.contains('Restaurant|BBQ|Dhaba|Diner|Pizza')]
kolkata_restaurant.reset_index(inplace = True, drop = True)
kolkata_restaurant.head(5)

Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ballygunge,22.525881,88.366047,6 Ballygunge Place,22.527712,88.368677,Bengali Restaurant
1,Ballygunge,22.525881,88.366047,Chowman,22.526977,88.368509,Chinese Restaurant
2,Ballygunge,22.525881,88.366047,China Whites,22.52196,88.365482,Chinese Restaurant
3,Ballygunge,22.525881,88.366047,Cream Centre,22.527096,88.368444,Vegetarian / Vegan Restaurant
4,Ballygunge,22.525881,88.366047,Bohemian,22.529426,88.368692,Bengali Restaurant


_As we have taken radius as 1000m to get the venues around a location, there may be duplicate restaurants in our dataframe._  
_So removing the duplicate venues depending upon the shortest diatance of the venue from a location._

In [0]:
# Calculating distance of venue from each location and adding to dataframe
dist = []

for ind in kolkata_restaurant.index:
    dist.append(calc_xy_distance(kolkata_restaurant['Venue Latitude'][ind], kolkata_restaurant['Venue Longitude'][ind], kolkata_restaurant['Location Latitude'][ind], kolkata_restaurant['Location Longitude'][ind]))

kolkata_restaurant['Distance bw LocVenue'] = dist 

kolkata_restaurant.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Distance bw LocVenue
0,Ballygunge,22.525881,88.366047,6 Ballygunge Place,22.527712,88.368677,Bengali Restaurant,0.338246
1,Ballygunge,22.525881,88.366047,Chowman,22.526977,88.368509,Chinese Restaurant,0.2807
2,Ballygunge,22.525881,88.366047,China Whites,22.52196,88.365482,Chinese Restaurant,0.439843
3,Ballygunge,22.525881,88.366047,Cream Centre,22.527096,88.368444,Vegetarian / Vegan Restaurant,0.280873
4,Ballygunge,22.525881,88.366047,Bohemian,22.529426,88.368692,Bengali Restaurant,0.47868


In [0]:
# Sort the dataframe by Distance bw LocVenue
kolkata_restaurant.sort_values(by='Distance bw LocVenue')

# Keeping the first row only as it is the nearest to the location 
kolkata_restaurant.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'], keep='first', inplace = True)
kolkata_restaurant.reset_index(inplace = True, drop = True)

kolkata_restaurant.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.


Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Distance bw LocVenue
0,Ballygunge,22.525881,88.366047,6 Ballygunge Place,22.527712,88.368677,Bengali Restaurant,0.338246
1,Ballygunge,22.525881,88.366047,Chowman,22.526977,88.368509,Chinese Restaurant,0.2807
2,Ballygunge,22.525881,88.366047,China Whites,22.52196,88.365482,Chinese Restaurant,0.439843
3,Ballygunge,22.525881,88.366047,Cream Centre,22.527096,88.368444,Vegetarian / Vegan Restaurant,0.280873
4,Ballygunge,22.525881,88.366047,Bohemian,22.529426,88.368692,Bengali Restaurant,0.47868


In [0]:
kolkata_restaurant.shape

(206, 8)

_Create a map of City Kolkata with all the Restaurants superimposed on top._

In [0]:
# get the latitude and longitude of Kolkata
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude

# create map of Kolkata using latitude and longitude values
map_kolkata = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, venue, location in zip(kolkata_restaurant['Venue Latitude'], kolkata_restaurant['Venue Longitude'], kolkata_restaurant['Venue'], kolkata_restaurant['Location']):
    label = '{}, {}'.format(venue, location)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='blue',
        fill_opacity=0.9,
        parse_html=False).add_to(map_kolkata)  
        
folium.Marker([latitude, longitude], popup='Kolkata').add_to(map_kolkata)    

map_kolkata

_It is looking pretty good. This is the end of our data section. We got all the restaurant's details from the city of Kolkata in dataframes._

## Methodology <a name="methodology"></a>

In this project we will mainly focus on detecting location of city Kolkata that have low restaurant count. As our desired location should be close to the centre of the city we will here consider the locatins having less than or equal to 5km disstance from the city centre.  

We already have collected all the restaurants's data based on every locations in the first step. Second step in our analysis will be calculation and exploration of 'restaurant count' across different locations of Kolkata.  

In third and final step we will focus on most promising locations and within those create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will take into consideration locations with no more than two restaurants nearby and it should be within 5km of the centre of the city.

For clustering the considered locations here we will be using **k-means clustering**  and will create a total of 10 clusters.

## Analysis <a name="analysis"></a>

Now we will perform some basic explanatory data analysis and derive some additional info from our data. Let's count the number of restaurants in every location candidate:

_Get the Counts of restaurants in each location an create a new dataframe._

In [0]:
kolkata_restaurant_count = kolkata_restaurant['Location'].value_counts()

kolkata_restaurant_count = kolkata_restaurant_count.reset_index()
kolkata_restaurant_count.columns=['Location', 'Restaurant Count']
kolkata_restaurant_count.head()

Unnamed: 0,Location,Restaurant Count
0,Elliot Road,31
1,T.G.Road,21
2,Indian Research,13
3,Circus Avenue,12
4,Ballygunge,12


_Creating a final dataframe by merging dataframe df & kolkata_restaurant_count_

In [0]:
kolkata_restaurant_final = df[['Location', 'Latitude', 'Longitude', 'Distance from Centre']]
kolkata_restaurant_final = kolkata_restaurant_final.join(kolkata_restaurant_count.set_index('Location'), on='Location')
kolkata_restaurant_final

Unnamed: 0,Location,Latitude,Longitude,Distance from Centre,Restaurant Count
0,Baghajatin,22.484,88.3756,7.097440,
1,Bakery Road,22.5466,88.3275,3.013952,
2,Ballygunge,22.5259,88.366,2.371353,12.0
3,Ballygunge Sc College,22.5259,88.366,2.371353,
4,Barabazar,22.5902,88.3512,5.011128,1.0
...,...,...,...,...,...
115,Tollygunge,22.5002,88.349,5.084959,
116,Ultadanga Main Road,22.5953,88.3825,6.145030,
117,Ultadanga,22.5961,88.3853,6.353698,
118,Viveknagar (Kolkata),22.5007,88.375,5.312937,


_Replacing the NaN values with zeroes._

In [0]:
kolkata_restaurant_final.fillna(0, inplace=True)
kolkata_restaurant_final.head()

Unnamed: 0,Location,Latitude,Longitude,Distance from Centre,Restaurant Count
0,Baghajatin,22.483994,88.375583,7.09744,0.0
1,Bakery Road,22.546598,88.327455,3.013952,0.0
2,Ballygunge,22.525881,88.366047,2.371353,12.0
3,Ballygunge Sc College,22.525881,88.366047,2.371353,0.0
4,Barabazar,22.590188,88.351242,5.011128,1.0


_Let's now filter out the above dataframe to exclude the locations having more than 2 restaurants nearby and if the location is more than 5 km away from the centre of the city._

In [0]:
kolkata_restaurant_final = kolkata_restaurant_final[(kolkata_restaurant_final['Distance from Centre'] <= 5.0) & (kolkata_restaurant_final['Restaurant Count'] <= 2)]
kolkata_restaurant_final.head()

Unnamed: 0,Location,Latitude,Longitude,Distance from Centre,Restaurant Count
1,Bakery Road,22.546598,88.327455,3.013952,0.0
3,Ballygunge Sc College,22.525881,88.366047,2.371353,0.0
7,Beadon Street,22.58984,88.360548,4.955307,2.0
9,Beleghata,22.562861,88.396255,4.494522,0.0
17,Bowbazar (Kolkata),22.568269,88.365164,2.683557,0.0


_Run k-means to cluster the neighborhood into 10 clusters._

In [0]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 10
df_clustering = kolkata_restaurant_final.drop('Location', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 9, 7, 1, 9, 7, 6, 2, 1, 9], dtype=int32)

_Add the cluster labels to the kolkata_restaurant_final dataframe and create a new dataframe to visualize our preferred locations for a restaurant._

In [0]:
top_locations = kolkata_restaurant_final

# add clustering labels
top_locations.insert(5, 'Cluster_Labels', kmeans.labels_)
top_locations.head()

Unnamed: 0,Location,Latitude,Longitude,Distance from Centre,Restaurant Count,Cluster_Labels
1,Bakery Road,22.546598,88.327455,3.013952,0.0,2
3,Ballygunge Sc College,22.525881,88.366047,2.371353,0.0,9
7,Beadon Street,22.58984,88.360548,4.955307,2.0,7
9,Beleghata,22.562861,88.396255,4.494522,0.0,1
17,Bowbazar (Kolkata),22.568269,88.365164,2.683557,0.0,9


In [0]:
top_locations.shape

(56, 6)

_Let's visualize the resulting clusters and that will be our final result._

In [0]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# get the latitude and longitude of Kolkata
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)


# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# few circles indicating distance of 1km, 2km and 3km from the centre of th City      
folium.Circle([latitude, longitude], radius=1000, color='green', fill=True, fill_opacity=0.1).add_to(map_clusters)
folium.Circle([latitude, longitude], radius=2000, color='green', fill=True, fill_opacity=0.1).add_to(map_clusters)
folium.Circle([latitude, longitude], radius=3000, color='green', fill=True, fill_opacity=0.1).add_to(map_clusters)

# add markers to the map
markers_colors = []
for lat, lon, loc, cluster in zip(top_locations['Latitude'], top_locations['Longitude'], top_locations['Location'], top_locations['Cluster_Labels']):
    label = folium.Popup(str(loc) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

# display the centre of th City
folium.Marker([latitude, longitude], popup='Kolkata').add_to(map_clusters)  

map_clusters

_From the above map we can see that for cluster_label '5' all the locations are nearest to the centre._  
_Let's display all the locations for cluster_label '5' sorted by distance from centre._

In [0]:
kolkata_restaurant_final.loc[kolkata_restaurant_final['Cluster_Labels'] == 5].sort_values(by = 'Distance from Centre')

Unnamed: 0,Location,Latitude,Longitude,Distance from Centre,Restaurant Count,Cluster_Labels
102,Shakespeare Sarani,22.544095,88.360447,0.404542,0.0,5
71,Mall Road,22.548219,88.353772,0.438766,0.0,5
69,Little Russel Street,22.548238,88.350431,0.723346,0.0,5
97,Russel Street,22.548238,88.350431,0.723346,0.0,5
73,Middleton Row,22.55067,88.352234,0.747845,0.0,5


_This concludes our analysis. Now let's display top 5 address of best location for the restaurant which we can present to stakeholders._

In [0]:
print('===================================================================')
print('Top 5 addresses for the new restaurant near to the centre of Kolkata')
print('===================================================================\n')
print(df['Location'].iloc[102] + ', ' + df['City'].iloc[102] + ' - ' + df['PostalCode'].iloc[102].astype(str))
print(df['Location'].iloc[71] + ', ' + df['City'].iloc[71] + ' - ' + df['PostalCode'].iloc[71].astype(str))
print(df['Location'].iloc[69] + ', ' + df['City'].iloc[69] + ' - ' + df['PostalCode'].iloc[69].astype(str))
print(df['Location'].iloc[97] + ', ' + df['City'].iloc[97] + ' - ' + df['PostalCode'].iloc[97].astype(str))
print(df['Location'].iloc[73] + ', ' + df['City'].iloc[73] + ' - ' + df['PostalCode'].iloc[73].astype(str))

Top 5 addresses for the new restaurant near to the centre of Kolkata

Shakespeare Sarani, Kolkata - 700017
Mall Road, Kolkata - 700080
Little Russel Street, Kolkata - 700071
Russel Street, Kolkata - 700071
Middleton Row, Kolkata - 700071


## Results and Discussion <a name="results"></a>

From our analysis we can see that there are almost ~200 restaurants present in Kolkata around 120 prime locations. So the density of the restaurants present throughout the city is light and there are almost 56 locations where there are not more than 2 restaurants nearby (Considering the location is not more than 5km away from the centre of the city). From our visualized data we have got one cluster (cluster 5) nearest to the centre of the city and there are total 5 locations for which no restaurants are present nearby. As per our business problem these 5 locations can be the most promising locations to open a new restaurant. As we already have the postal code details for all the locations so we were able to derive the exact address of the locations which can be present to Stakeholders for their interest.  

After considering our conditions to find out the best location we have created 10 clusters here for all the suitable locations. If we compare our raw data with the visualized data we can see that the locations were clustered depending upon the distance from the city centre beside their nearby restaurant count. As for example if we look into the locations for cluster 2 we can see that all the locations are almost more than 3km away from the centre and they are spread out through the different areas of the city. If the neighbourhoods for these locations are popular and have popular venues around them, these can also be considered as promising locations for our stakeholders. 

## Conclusion <a name="conclusion"></a>

The goal of this project was to identify the best location in city Kolkata nearest to city centre to open a new restaurant depending upon the restaurant density in that location in order to help stakeholders to narrow down their search for the best location. We have collected postal code data for Kolkata to identify all the locations of the city and used Foursquare data to get all the restaurants details for those locations. We have analysed the data and extracted the locations distance from the centre of the city and also calculated the density of restaurants for each and every location. Finally we have used clustering technique to cluster the filtered out top locations to achieve our goal. While using Foursquare API to get the nearby venues for locations we could notice that it could not able to return all of the venues accurately for some locations. If this could be improved our analysis could return better results.  

However we were able to find out the promising locations best on our requirements. The final decission to determine the optimal restaurant location will be made by the stakeholders based on the characteristics of the neighborhood of every recomended location taking into consideration of other important factors like the popularity of the area, number of shool/college/office nearby, availibility of water/electricity, real estate availability, price, transport availibility etc.