# Final Capstone Project - The Battle of Neighborhoods

## Table of contents
* Introduction: Business Problem
* Data
* Methodology
* Results
* Discussion
* Conclusion

### Introduction: Business Problem 

#### Area:
New Delhi is India's capital and a tourist hub that attracts thousands of visitors every day. It serves as the seat of all three branches of the Government of India. Its population density is about 19,300 people per square mile (7,400 per square kilometer). The wider Delhi territory is divided into multiple districts, including **Central Delhi, East Delhi, New Delhi, North Delhi, North East Delhi, North West Delhi, South Delhi, South West Delhi, West Delhi**.

New Delhi presents itself as a promising location for profitable businesses, including restaurants.

#### Problem:
A Food & Beverage entrepreneur is looking for a suitable location in **Central Delhi** to open a new restaurant and expand their business. The company already operates restaurants in other cities of India. The client is particularly interested in identifying a suitable neighborhood in Central Delhi for the restaurant business.

We will focus our analysis on identifying neighborhoods where business prospects are good and competition is low. The objective is to shortlist 4-5 such places so that the client can pick the best one based on other factors.

#### Target Audience:
Any business person interested in opening a restaurant in Central Delhi.

### Data 

First Dataset: List of neighbourhoods in Central Delhi: The website http://zip.nowmsg.com lists the postal codes of each district together with their locations. I use the web scraping library BeautifulSoup to extract the places listed on the Central Delhi page (https://zip.nowmsg.com/city.asp?country=IN&state=Delhi&county=Central%20Delhi) and then read the coordinates from each place's detail page. The final DataFrame has four columns: County, Place, Latitude and Longitude.

Second Dataset: List of different venues in the neighbourhoods of Central Delhi:
This dataset will be formed using the Foursquare API. I will use the Foursquare location data to explore different venues in each neighbourhood of Central Delhi. Using the Foursquare location data, I can get information about these venues and analyze the neighbourhoods of Central Delhi easily based on this information.

I will use the geographical coordinates from above dataset to generate maps.

I will use these two datasets to solve the business problem of finding the best place to open a restaurant within Central Delhi. The steps are: identify the neighbourhoods in Central Delhi, fetch the venues around them, group the neighbourhoods into clusters, identify the cluster with the lowest concentration of restaurants, and finally filter the neighbourhoods in that cluster to narrow down the target areas.

#### Install Packages & Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from bs4 import BeautifulSoup
import requests

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

# Matplotlib and associated plotting modules
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.cm as cm

import matplotlib.colors as colors

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# import k-means from clustering stage
from sklearn.cluster import KMeans

from sklearn.preprocessing import StandardScaler, normalize, scale
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error, r2_score

import re
print('Libraries imported!')


#### Web Scraping

In [175]:
response_obj = requests.get('https://zip.nowmsg.com/city.asp?country=IN&state=Delhi&county=Central%20Delhi').text

In [176]:
soup = BeautifulSoup(response_obj,'html.parser')

In [177]:
lst = []
# every place name on the county page is read from its own div
for row in soup.select('div[class="col-md-3 my-padding-6"]'):
    place = row.text.strip()
    url = 'https://zip.nowmsg.com/postal_code.asp?country=IN&state=Delhi&county=Central%20Delhi&city='+place
    inside = requests.get(url).text
    s = BeautifulSoup(inside,'html.parser')
    s_list = s.select('table[class="table table-hover"]')
    tmp = str(s_list)
    # the first two decimal numbers in the table are taken as latitude and longitude
    patt = re.compile(r'\b\d+\.\d+')
    lnl = patt.findall(tmp)
    if len(lnl) < 2:
        continue  # skip places whose page has no coordinates
    lat = lnl[0]
    lon = lnl[1]
    lst.append(['Central Delhi', place,lat,lon])

s_df = pd.DataFrame(lst, columns=["County","Place","Latitude","Longitude"])

In [178]:
s_df.head()

Unnamed: 0,County,Place,Latitude,Longitude
0,Central Delhi,A.G.C.R.,28.6453,77.2456
1,Central Delhi,A.K.Market,28.6417,77.2132
2,Central Delhi,Ajmeri Gate Extn.,28.6453,77.2456
3,Central Delhi,Anand Parbat,28.6431,77.2197
4,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833
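
The scraping loop above takes the first two decimal numbers in the table markup as latitude and longitude. A minimal sketch of that step in isolation is shown below. The HTML fragment is made up for illustration (it is not copied from zip.nowmsg.com) and `extract_lat_lon` is a hypothetical helper, not part of the notebook.

```python
import re

# Same pattern as in the scraping loop: word boundary, digits, a dot, digits.
DECIMAL = re.compile(r'\b\d+\.\d+')

def extract_lat_lon(html):
    """Return (lat, lon) as floats from the first two decimals found, or None."""
    numbers = DECIMAL.findall(html)
    if len(numbers) < 2:
        return None  # the page carried no usable coordinates
    return float(numbers[0]), float(numbers[1])

# Illustrative fragment only; the real page layout may differ.
sample = '<table><tr><td>110002</td><td>28.6453</td><td>77.2456</td></tr></table>'
print(extract_lat_lon(sample))
print(extract_lat_lon('<table></table>'))
```

Returning `None` for a page without coordinates makes it easy to skip that place instead of failing on a missing list index.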


In [179]:
s_df.to_csv('file1.csv')

In [180]:
s_df1=pd.read_csv("file1.csv")
s_df1.drop(s_df1.columns[[0]], axis = 1, inplace = True) 
s_df1.head()

Unnamed: 0,County,Place,Latitude,Longitude
0,Central Delhi,A.G.C.R.,28.6453,77.2456
1,Central Delhi,A.K.Market,28.6417,77.2132
2,Central Delhi,Ajmeri Gate Extn.,28.6453,77.2456
3,Central Delhi,Anand Parbat,28.6431,77.2197
4,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833


#### Creating a map of Central Delhi with all neighborhoods marked

In [181]:
address = 'Central Delhi, IN'

geolocator = Nominatim(user_agent="tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Central Delhi are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Central Delhi are 28.69854835, 77.21939141568413.


In [182]:
map1 = folium.Map(location=[latitude, longitude], zoom_start=11)

# instantiate a feature group that holds one marker per neighborhood
incidents = folium.map.FeatureGroup()

for lat, lng in zip(s_df1['Latitude'], s_df1['Longitude']):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5,
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
map1.add_child(incidents)

#### Getting venue details from FourSquare

In [183]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your own secret; do not publish real credentials
VERSION = '20200320' # Foursquare API version
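
API credentials are usually safer outside the notebook itself. The sketch below reads them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this example, not something Foursquare or the notebook prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping such as os.environ.

    The key names are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None

# Usage with a stand-in mapping instead of the real environment:
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print(client_id)
```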

In [184]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [185]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Central_venues = getNearbyVenues(names=s_df1['Place'],
                               latitudes=s_df1['Latitude'],
                               longitudes=s_df1['Longitude']
                              )

In [186]:
Central_venues.to_csv('Central_venues.csv')
Central_venues=pd.read_csv("Central_venues.csv")

In [187]:
Central_venues.drop(Central_venues.columns[[0]], axis = 1, inplace = True)
Central_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,A.G.C.R.,28.6453,77.2456,Aap Ki Pasand,28.644786,77.24055,Food & Drink Shop
1,A.K.Market,28.6417,77.2132,The Drunkyard Cafe,28.641451,77.215506,Tibetan Restaurant
2,A.K.Market,28.6417,77.2132,The exotic roof top Restaurant,28.641039,77.213634,Indian Restaurant
3,A.K.Market,28.6417,77.2132,쉼터,28.641495,77.213152,Korean Restaurant
4,A.K.Market,28.6417,77.2132,Sita Ram Diwan Chand Chole Bhature,28.642324,77.210417,Food


In [188]:
Central_venues.shape

(1137, 7)
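
`getNearbyVenues` indexes straight into the JSON response, so an empty `groups` list or a venue without categories would raise an error. The sketch below shows the same flattening step with those two cases guarded. The payload is invented and only mimics the nesting used above (`response` → `groups` → `items` → `venue`); `flatten_items` is a hypothetical helper.

```python
def flatten_items(name, lat, lng, payload):
    """Turn one explore-style payload into flat rows, one per venue."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0]['items'] if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # A venue without categories gets None instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Made-up payload for illustration; this is not real API output.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe', 'location': {'lat': 28.64, 'lng': 77.21},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorised Spot', 'location': {'lat': 28.65, 'lng': 77.22},
               'categories': []}},
]}]}}

for row in flatten_items('Demo Place', 28.6417, 77.2132, sample):
    print(row)
print(flatten_items('Empty Place', 0.0, 0.0, {'response': {}}))
```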

### Methodology

In [189]:
Central_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A.G.C.R.,1,1,1,1,1,1
A.K.Market,30,30,30,30,30,30
Ajmeri Gate Extn.,1,1,1,1,1,1
Anand Parbat,11,11,11,11,11,11
Anand Parbat Indl. Area,4,4,4,4,4,4
Bank Street (Central Delhi),4,4,4,4,4,4
Baroda House,34,34,34,34,34,34
Bengali Market,34,34,34,34,34,34
Bhagat Singh Market,34,34,34,34,34,34
Connaught Place,11,11,11,11,11,11


#### Preprocessing the Central_venues dataset with one-hot encoding so that it can be clustered:

In [190]:
# one hot encoding
Central_onehot = pd.get_dummies(Central_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Central_onehot['Neighborhood'] = Central_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Central_onehot.columns[-1]] + list(Central_onehot.columns[:-1])
Central_onehot = Central_onehot[fixed_columns]

Central_onehot.head()

Unnamed: 0,Neighborhood,Asian Restaurant,BBQ Joint,Bakery,Bar,Breakfast Spot,Café,Chaat Place,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Deli / Bodega,Dessert Shop,Fast Food Restaurant,Food,Food & Drink Shop,Hostel,Hotel,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Jazz Club,Korean Restaurant,Lounge,Mobile Phone Shop,Motel,Multiplex,Pharmacy,Pizza Place,Platform,Plaza,Pub,Restaurant,Road,Sandwich Place,Snack Place,Tibetan Restaurant
0,A.G.C.R.,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,A.K.Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,A.K.Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,A.K.Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,A.K.Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [191]:
Central_grouped = Central_onehot.groupby('Neighborhood').mean().reset_index()
Central_grouped.head()

Unnamed: 0,Neighborhood,Asian Restaurant,BBQ Joint,Bakery,Bar,Breakfast Spot,Café,Chaat Place,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Deli / Bodega,Dessert Shop,Fast Food Restaurant,Food,Food & Drink Shop,Hostel,Hotel,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Jazz Club,Korean Restaurant,Lounge,Mobile Phone Shop,Motel,Multiplex,Pharmacy,Pizza Place,Platform,Plaza,Pub,Restaurant,Road,Sandwich Place,Snack Place,Tibetan Restaurant
0,A.G.C.R.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,A.K.Market,0.0,0.0,0.033333,0.033333,0.033333,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.066667,0.333333,0.0,0.033333,0.133333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.033333,0.033333
2,Ajmeri Gate Extn.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Anand Parbat,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.181818,0.0,0.090909,0.090909,0.363636,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909
4,Anand Parbat Indl. Area,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
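
To make the grouped values above easier to read, here is a small sketch of the same two steps on toy data. The neighborhood names and categories are invented. The point it illustrates is that the mean of 0/1 indicator columns is the share of a neighborhood's venues in each category.

```python
import pandas as pd

# Toy venue list; names and categories are invented for illustration.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Hotel', 'Hotel', 'Café', 'Indian Restaurant', 'Hotel'],
})

# One indicator column per category, cast to int so every pandas version yields 0/1.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of 0/1 indicators is the fraction of venues in each category.
share = onehot.groupby('Neighborhood').mean()
print(share)
```

Each row of `share` sums to 1, which is why a neighborhood with a single venue shows 1.0 in one column.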


Our interest lies in venues in the 'Restaurant' category. We therefore keep every venue category whose name contains 'Restaurant', which also identifies the restaurant subcategories present in each neighborhood.

In [192]:
col=['Neighborhood']
for column in Central_onehot.columns:
    if 'Restaurant' in column:
        col.append(column)

In [193]:
Central_restaurants=Central_onehot[col]
Central_restaurants=Central_restaurants.groupby('Neighborhood').sum().reset_index()
Central_restaurants.head()

Unnamed: 0,Neighborhood,Asian Restaurant,Chinese Restaurant,Fast Food Restaurant,Indian Chinese Restaurant,Indian Restaurant,Korean Restaurant,Restaurant,Tibetan Restaurant
0,A.G.C.R.,0,0,0,0,0,0,0,0
1,A.K.Market,0,0,1,1,4,1,1,1
2,Ajmeri Gate Extn.,0,0,0,0,0,0,0,0
3,Anand Parbat,0,0,2,0,0,0,0,1
4,Anand Parbat Indl. Area,0,0,0,0,0,0,0,0


In [194]:
# numeric_only keeps the Neighborhood text column out of the row total
Central_restaurants['Total']=Central_restaurants.sum(axis=1, numeric_only=True)
Central_restaurants= Central_restaurants.drop('Neighborhood',axis=1)
Central_restaurants.head()

Unnamed: 0,Asian Restaurant,Chinese Restaurant,Fast Food Restaurant,Indian Chinese Restaurant,Indian Restaurant,Korean Restaurant,Restaurant,Tibetan Restaurant,Total
0,0,0,0,0,0,0,0,0,0
1,0,0,1,1,4,1,1,1,9
2,0,0,0,0,0,0,0,0,0
3,0,0,2,0,0,0,0,1,3
4,0,0,0,0,0,0,0,0,0
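
The restaurant filter and the `Total` column can be sketched on toy data as follows. The counts are invented, and summing only the selected restaurant columns is one possible way to keep the text column out of the total.

```python
import pandas as pd

# Invented counts, shaped like the one-hot frame after summing per neighborhood.
counts = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Hotel': [3, 0, 1],
    'Indian Restaurant': [4, 0, 0],
    'Fast Food Restaurant': [1, 0, 2],
})

# Keep only categories whose name contains 'Restaurant' (case-sensitive).
restaurant_cols = [c for c in counts.columns if 'Restaurant' in c]
restaurants = counts[['Neighborhood'] + restaurant_cols].copy()

# Row-wise sum over the restaurant columns only.
restaurants['Total'] = restaurants[restaurant_cols].sum(axis=1)
print(restaurants)
```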


#### Using the K-Means clustering algorithm to cluster the neighborhoods by restaurant counts:

In [195]:
# set number of clusters
kclusters = 5


# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,random_state=0).fit(Central_restaurants)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 1, 4, 1, 1, 0, 0, 0, 4], dtype=int32)
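
The notebook fixes the number of clusters at 5. One common heuristic for checking such a choice is to compare the inertia (within-cluster sum of squares) across several values of k and look for the point where the improvement flattens. The sketch below does this on invented data with three obvious groups; it illustrates the heuristic and is not the procedure used in this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented restaurant counts for nine neighborhoods in three obvious groups.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [20, 0], [21, 0], [20, 1]], dtype=float)

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the inertia falls sharply up to k = 3 and changes little afterwards, which is the "elbow" the heuristic looks for.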

In [196]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [197]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # past 'rd', fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighborhood'] = Central_grouped['Neighborhood']

for ind in np.arange(Central_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(Central_grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,A.G.C.R.,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
1,A.K.Market,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
2,Ajmeri Gate Extn.,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
3,Anand Parbat,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop
4,Anand Parbat Indl. Area,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
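
The column names above are built with a short suffix list and a fallback to 'th', which works for the five columns used here but would label a 21st column '21th'. A small sketch of a more general suffix rule follows; `ordinal` and `demo_columns` are hypothetical names introduced for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top = 5
demo_columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top + 1)
]
print(demo_columns)
```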


In [198]:
# add clustering labels
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [199]:
venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,A.G.C.R.,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
1,3,A.K.Market,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
2,1,Ajmeri Gate Extn.,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
3,4,Anand Parbat,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop
4,1,Anand Parbat Indl. Area,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar


In [200]:
Central_merged = s_df1

Central_merged = Central_merged.join(venues_sorted.set_index('Neighborhood'), on='Place')

Central_merged

Unnamed: 0,County,Place,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Central Delhi,A.G.C.R.,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
1,Central Delhi,A.K.Market,28.6417,77.2132,3.0,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
2,Central Delhi,Ajmeri Gate Extn.,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
3,Central Delhi,Anand Parbat,28.6431,77.2197,4.0,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop
4,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
5,Central Delhi,Bank Street (Central Delhi),28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
6,Central Delhi,Baroda House,28.6369,77.2183,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
7,Central Delhi,Bengali Market,28.6369,77.2183,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
8,Central Delhi,Bhagat Singh Market,28.6369,77.2183,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
9,Central Delhi,Connaught Place,28.6431,77.2197,4.0,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop


In [201]:
# drop places that returned no venues and therefore have no cluster label
Central_merged1=Central_merged.dropna().reset_index()

In [202]:
Central_merged1.head()

Unnamed: 0,index,County,Place,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,Central Delhi,A.G.C.R.,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
1,1,Central Delhi,A.K.Market,28.6417,77.2132,3.0,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
2,2,Central Delhi,Ajmeri Gate Extn.,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
3,3,Central Delhi,Anand Parbat,28.6431,77.2197,4.0,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop
4,4,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar


Mapping the Central Delhi neighbourhoods, with a different colour for each cluster:

In [203]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Central_merged1['Latitude'], Central_merged1['Longitude'], Central_merged1['Place'], Central_merged1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7
    ).add_to(map_clusters)
map_clusters


#### Cluster-wise segmentation of the dataset

In [204]:
df0=Central_merged1.loc[Central_merged1['Cluster Labels'] == 0, Central_merged1.columns[[2] + list(range(5, Central_merged1.shape[1]))]]
df0.head()

Unnamed: 0,Place,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Baroda House,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
7,Bengali Market,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
8,Bhagat Singh Market,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
10,Constitution House,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega
14,Election Commission,0.0,Indian Restaurant,Café,Chinese Restaurant,Bar,Deli / Bodega


In [205]:
df1=Central_merged1.loc[Central_merged1['Cluster Labels'] == 1, Central_merged1.columns[[2] + list(range(5, Central_merged1.shape[1]))]]
df1.head()

Unnamed: 0,Place,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,A.G.C.R.,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
2,Ajmeri Gate Extn.,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
4,Anand Parbat Indl. Area,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
5,Bank Street (Central Delhi),1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
12,Darya Ganj,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant


In [206]:
df2=Central_merged1.loc[Central_merged1['Cluster Labels'] == 2, Central_merged1.columns[[2] + list(range(5, Central_merged1.shape[1]))]]
df2.head()

Unnamed: 0,Place,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
11,Dada Ghosh Bhawan,2.0,Hotel,Restaurant,Hostel,Pizza Place,Café
18,Hauz Qazi,2.0,Hotel,Restaurant,Hostel,Pizza Place,Café
21,Jama Masjid,2.0,Hotel,Restaurant,Hostel,Pizza Place,Café
33,Patel Nagar (Central Delhi),2.0,Hotel,Restaurant,Hostel,Pizza Place,Café
34,Patel Nagar East,2.0,Hotel,Restaurant,Hostel,Pizza Place,Café


In [207]:
df3=Central_merged1.loc[Central_merged1['Cluster Labels'] == 3, Central_merged1.columns[[2] + list(range(5, Central_merged1.shape[1]))]]
df3.head()

Unnamed: 0,Place,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,A.K.Market,3.0,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
28,Multani Dhanda,3.0,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
30,Pahar Ganj,3.0,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant
32,Parliament House,3.0,Hotel,Indian Restaurant,Café,Hostel,Fast Food Restaurant
49,Swami Ram Tirth Nagar,3.0,Hotel,Indian Restaurant,Café,Hostel,Tibetan Restaurant


In [208]:
df4=Central_merged1.loc[Central_merged1['Cluster Labels'] == 4, Central_merged1.columns[[2] + list(range(5, Central_merged1.shape[1]))]]
df4.head()

Unnamed: 0,Place,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Anand Parbat,4.0,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop
9,Connaught Place,4.0,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop
23,Karol Bagh,4.0,Fast Food Restaurant,Snack Place,Hotel,BBQ Joint,Bakery
31,Pandara Road,4.0,Hotel,Fast Food Restaurant,Tibetan Restaurant,Pizza Place,Food & Drink Shop


### Results

In [209]:
print('Total number of neighbourhoods in cluster 0 is',Central_restaurants.loc[df0.index,:].shape[0])
print('Total number of restaurants in this cluster is', Central_restaurants.loc[df0.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(Central_restaurants.loc[df0.index,:]['Total'].sum()/Central_restaurants.loc[df0.index,:].shape[0]) )

Total number of neighbourhoods in cluster 0 is 18
Total number of restaurants in this cluster is 216
Ratio of Restaurant/Neighbourhood in this cluster is 12.0


In [210]:
print('Total number of neighbourhoods in cluster 1 is',Central_restaurants.loc[df1.index,:].shape[0])
print('Total number of restaurants in this cluster is', Central_restaurants.loc[df1.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(Central_restaurants.loc[df1.index,:]['Total'].sum()/Central_restaurants.loc[df1.index,:].shape[0]) )

Total number of neighbourhoods in cluster 1 is 16
Total number of restaurants in this cluster is 1
Ratio of Restaurant/Neighbourhood in this cluster is 0.0625


In [211]:
print('Total number of neighbourhoods in cluster 2 is',Central_restaurants.loc[df2.index,:].shape[0])
print('Total number of restaurants in this cluster is', Central_restaurants.loc[df2.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(Central_restaurants.loc[df2.index,:]['Total'].sum()/Central_restaurants.loc[df2.index,:].shape[0]) )

Total number of neighbourhoods in cluster 2 is 8
Total number of restaurants in this cluster is 56
Ratio of Restaurant/Neighbourhood in this cluster is 7.0


In [212]:
print('Total number of neighbourhoods in cluster 3 is',Central_restaurants.loc[df3.index,:].shape[0])
print('Total number of restaurants in this cluster is', Central_restaurants.loc[df3.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(Central_restaurants.loc[df3.index,:]['Total'].sum()/Central_restaurants.loc[df3.index,:].shape[0]) )

Total number of neighbourhoods in cluster 3 is 5
Total number of restaurants in this cluster is 46
Ratio of Restaurant/Neighbourhood in this cluster is 9.2


In [213]:
print('Total number of neighbourhoods in cluster 4 is',Central_restaurants.loc[df4.index,:].shape[0])
print('Total number of restaurants in this cluster is', Central_restaurants.loc[df4.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(Central_restaurants.loc[df4.index,:]['Total'].sum()/Central_restaurants.loc[df4.index,:].shape[0]) )

Total number of neighbourhoods in cluster 4 is 4
Total number of restaurants in this cluster is 13
Ratio of Restaurant/Neighbourhood in this cluster is 3.25
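
The five cells above repeat the same computation once per cluster. As a sketch, the same summary can be produced in one step with a group-by. The data below is invented and the column names `neighbourhoods`, `restaurants` and `ratio` are chosen for this example only.

```python
import pandas as pd

# Invented per-neighborhood data: a cluster label and a restaurant total.
demo = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 1, 1, 2],
    'Total': [12, 10, 0, 1, 0, 7],
})

# Count neighborhoods and sum restaurants per cluster, then take the ratio.
summary = demo.groupby('Cluster Labels')['Total'].agg(
    neighbourhoods='count', restaurants='sum')
summary['ratio'] = summary['restaurants'] / summary['neighbourhoods']
print(summary)
print('Lowest-density cluster:', summary['ratio'].idxmin())
```

Applied to the real data, this would require the cluster labels and the restaurant totals to sit in the same DataFrame, aligned by neighborhood.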


### Discussion

Foursquare data shows that there are many restaurants in Central Delhi. To identify the areas with lower density, I used the K-means clustering algorithm to segment the neighbourhood dataset into 5 clusters.

I then derived the restaurants-per-neighbourhood ratio to identify the cluster with the lowest restaurant density.

Cluster 1 has 16 neighbourhoods but just 1 restaurant. I drilled down further by dropping the neighbourhoods where similar venues offering food, such as hotels and cafés, are common. That left 6 neighbourhoods where, according to the Foursquare data, a new restaurant would face no restaurant competition.

In [214]:
Selected_df=Central_restaurants.loc[df1.index,:]
Selected_df.shape

(16, 9)

In [218]:
Selected_df1=Central_merged1.loc[Selected_df.index,:].reset_index()
Selected_df1

Unnamed: 0,level_0,index,County,Place,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,0,Central Delhi,A.G.C.R.,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
1,2,2,Central Delhi,Ajmeri Gate Extn.,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
2,4,4,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
3,5,5,Central Delhi,Bank Street (Central Delhi),28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
4,12,12,Central Delhi,Darya Ganj,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
5,13,15,Central Delhi,Desh Bandhu Gupta Road,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
6,15,17,Central Delhi,Foreign Post Delhi IBC,28.6563,77.1366,1.0,Multiplex,Fast Food Restaurant,Convenience Store,Pizza Place,Tibetan Restaurant
7,16,18,Central Delhi,Gandhi Smarak Nidhi,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant
8,17,19,Central Delhi,Guru Gobind Singh Marg,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
9,19,22,Central Delhi,I.P.Estate,28.6453,77.2456,1.0,Food & Drink Shop,Tibetan Restaurant,Hotel,Food,Fast Food Restaurant


In the dataset above, the neighbourhoods at index 0, 1, 4, 6, 7, 9, 10, 12, 13 and 15 list restaurants, cafés or food joints among their most common venues more than once, so they are not suitable for a new restaurant. We remove these rows from the DataFrame:

In [219]:
Selected_df1.drop([0,1,4,6,7,9,10,12,13,15],axis=0,inplace=True)
Selected_df1

Unnamed: 0,level_0,index,County,Place,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,4,4,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
3,5,5,Central Delhi,Bank Street (Central Delhi),28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
5,13,15,Central Delhi,Desh Bandhu Gupta Road,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
8,17,19,Central Delhi,Guru Gobind Singh Marg,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
11,26,30,Central Delhi,Master Prithvi Nath Marg,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
14,45,51,Central Delhi,Sat Nagar,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
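
The rows above were dropped by hand-picked index. The sketch below expresses a comparable rule in code: count how many of the most-common-venue columns mention a food keyword and keep rows with at most one hit. The keyword list is an assumed reading of the rule, and the two demo rows are copied from the tables above.

```python
import pandas as pd

FOOD_KEYWORDS = ('Restaurant', 'Café', 'Food')  # assumed interpretation of the rule

# Two rows taken from the tables above, reduced to the venue columns.
demo = pd.DataFrame({
    'Place': ['A.G.C.R.', 'Sat Nagar'],
    '1st Most Common Venue': ['Food & Drink Shop', 'Ice Cream Shop'],
    '2nd Most Common Venue': ['Tibetan Restaurant', 'Chaat Place'],
    '3rd Most Common Venue': ['Hotel', 'Mobile Phone Shop'],
    '4th Most Common Venue': ['Food', 'Pharmacy'],
    '5th Most Common Venue': ['Fast Food Restaurant', 'Bar'],
})

venue_cols = [c for c in demo.columns if c.endswith('Most Common Venue')]

def food_hits(row):
    """Count venue columns whose category mentions any food keyword."""
    return sum(any(k in str(row[c]) for k in FOOD_KEYWORDS) for c in venue_cols)

demo['Food Hits'] = demo.apply(food_hits, axis=1)
kept = demo[demo['Food Hits'] <= 1].reset_index(drop=True)
print(kept[['Place', 'Food Hits']])
```

A rule like this stays valid if the data is refreshed and the row order changes, which fixed index numbers do not.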


In [220]:
Selected_df1.drop(Selected_df1.columns[[0,1]], axis = 1, inplace = True)
Selected_df1.reset_index(drop=True,inplace=True)
Selected_df1

Unnamed: 0,County,Place,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Central Delhi,Anand Parbat Indl. Area,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
1,Central Delhi,Bank Street (Central Delhi),28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
2,Central Delhi,Desh Bandhu Gupta Road,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
3,Central Delhi,Guru Gobind Singh Marg,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
4,Central Delhi,Master Prithvi Nath Marg,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar
5,Central Delhi,Sat Nagar,28.6551,77.1833,1.0,Ice Cream Shop,Chaat Place,Mobile Phone Shop,Pharmacy,Bar


#### These places look well suited to a new restaurant, since Foursquare lists no competing restaurants in these neighbourhoods. All six share the same coordinates (28.6551, 77.1833) in the scraped data, so they lie in the same area.

In [221]:
Selected_df1.drop(Selected_df1.columns[[2,3,4,5,6,7,8,9]], axis = 1, inplace = True)
Selected_df1

Unnamed: 0,County,Place
0,Central Delhi,Anand Parbat Indl. Area
1,Central Delhi,Bank Street (Central Delhi)
2,Central Delhi,Desh Bandhu Gupta Road
3,Central Delhi,Guru Gobind Singh Marg
4,Central Delhi,Master Prithvi Nath Marg
5,Central Delhi,Sat Nagar


### Conclusion

The objective of this project was to identify neighbourhoods in Central Delhi where restaurant competition is low, making them attractive locations for new entrants. By deriving the restaurant density distribution from Foursquare data and applying clustering, I drilled down to the cluster with the lowest density and narrowed it to 6 neighbourhoods that look promising for starting a new restaurant. The client can review these locations and narrow them down further based on factors such as availability of premises and rent. This concludes the project - The Battle of Neighborhoods.