# Coursera - IBM Applied Data Science Capstone - Final Project

By *Sadhiman Das*

## Problem Statement

The objective of this capstone project is to analyze different neighborhoods in the city of Bangalore, Karnataka, India in order to open a new restaurant. Using data science methodology and machine learning techniques such as clustering, this project aims to answer the business question:

*In the city of Bangalore, India, if an entrepreneur is looking to open a new restaurant, what type of restaurant would you recommend opening based on the location?*

## Importing all the required libraries

In [1]:
from tqdm.notebook import tqdm

import numpy as np

import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json

from geopy.geocoders import Nominatim
import geocoder

import requests
from bs4 import BeautifulSoup

from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans

import folium

print("Libraries imported.")

Libraries imported.


## Getting a list of neighbourhoods in Bangalore from Wikipedia

In [2]:
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore', header = 0)
# keep only the tables that share the first table's columns, then stack them
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
expected_columns = list(tables[0].columns)
matching_tables = [table for table in tables if list(table.columns) == expected_columns]
bangalore_neighbourhoods = pd.concat(matching_tables, ignore_index=True)
bangalore_neighbourhoods = bangalore_neighbourhoods.drop(['Image', 'Summary'], axis=1)
bangalore_neighbourhoods.head()

Unnamed: 0,Name
0,Cantonment area
1,Domlur
2,Indiranagar
3,Jeevanbheemanagar
4,Malleswaram


## Getting the latitude and longitude of each neighbourhood

In [3]:
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Bangalore, Karnataka, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
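
The loop in `get_latlng` retries until ArcGIS returns coordinates, so a neighbourhood name that never resolves would keep the notebook running indefinitely. A minimal sketch of a bounded retry is shown below. The names `get_latlng_with_retries`, `geocode`, `max_attempts`, `delay_seconds` and `flaky_geocode` are illustrative assumptions, not part of the notebook or of the `geocoder` library; the stand-in geocoder lets the sketch run without network access.

```python
import time


def get_latlng_with_retries(query, geocode, max_attempts=5, delay_seconds=0.0):
    """Return [lat, lng] from geocode(query), or None after max_attempts failures.

    `geocode` is any callable that returns [lat, lng] or None. In the notebook
    it could be `lambda q: geocoder.arcgis(q).latlng` (assumption, not run here).
    """
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause before the next attempt
    return None


# Hypothetical stand-in geocoder: fails twice, then succeeds.
calls = {"count": 0}


def flaky_geocode(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [12.94329, 77.65602]


result = get_latlng_with_retries("Domlur, Bangalore, Karnataka, India", flaky_geocode)
gave_up = get_latlng_with_retries("Nowhere", lambda q: None, max_attempts=2)
print(result, gave_up)
```

Returning `None` after the last attempt leaves it to the caller to drop or inspect the neighbourhoods that could not be geocoded.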

In [4]:
coords = [ get_latlng(neighborhood) for neighborhood in tqdm(bangalore_neighbourhoods["Name"].tolist()) ]

In [5]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

Merging the list of latitudes and longitudes with the list of Neighbourhoods in Bangalore

In [6]:
bangalore_neighbourhoods['Latitude'] = df_coords['Latitude']
bangalore_neighbourhoods['Longitude'] = df_coords['Longitude']

In [7]:
bangalore_neighbourhoods.head()

Unnamed: 0,Name,Latitude,Longitude
0,Cantonment area,12.99435,77.59839
1,Domlur,12.94329,77.65602
2,Indiranagar,13.03006,77.49526
3,Jeevanbheemanagar,12.96601,77.65767
4,Malleswaram,13.006322,77.568416


## Finding out the geographical coordinates of Bangalore

In [8]:
address = 'Bangalore, India'

geolocator = Nominatim(user_agent="bangalore_project")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Bangalore, India is {}, {}.'.format(latitude, longitude))

The geographical coordinate of Bangalore, India is 12.9791198, 77.5912997.


## Plotting the neighbourhoods on a map using Folium

In [9]:
map_bangalore = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, neighborhood in zip(bangalore_neighbourhoods['Latitude'], bangalore_neighbourhoods['Longitude'], bangalore_neighbourhoods['Name']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bangalore)  
    
map_bangalore

## Declaring my Foursquare Credentials

In [10]:
import os

# define Foursquare Credentials and Version
# The credentials are read from environment variables so that they are not
# stored or printed in the notebook (the variable names are illustrative).
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


## Getting all the nearby venues for each neighbourhood using Foursquare API

I have restricted the venue category to Food in the API call (the `categoryId` parameter in the URL below).

In [11]:
radius = 2000
LIMIT = 50

venues = []

for lat, long, neighborhood in zip(tqdm(bangalore_neighbourhoods['Latitude']), bangalore_neighbourhoods['Longitude'], bangalore_neighbourhoods['Name']):
    
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&categoryId=4d4b7105d754a06374d81259&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
  
    results = requests.get(url).json()["response"]['groups'][0]['items']

    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
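
The cell above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises an exception if a response lacks any of those keys or a venue has no category. A minimal sketch of a more defensive flattening step follows, assuming the payload shape used in the cell above. The function name `extract_venues` and the sample payload are hypothetical and hand-written for illustration; they are not real API output.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style payload into row tuples.

    Missing groups, items or categories are skipped instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        if not categories:
            continue  # without a category the venue cannot be one-hot encoded later
        location = venue.get("location", {})
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"],
        ))
    return rows


# Hand-written sample payload (illustrative only).
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Venue A",
                           "location": {"lat": 12.99, "lng": 77.59},
                           "categories": [{"name": "Indian Restaurant"}]}},
                {"venue": {"name": "Venue B",
                           "location": {"lat": 12.98, "lng": 77.60},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample_payload, "Cantonment area", 12.99435, 77.59839)
empty_rows = extract_venues({}, "Cantonment area", 12.99435, 77.59839)
print(rows)
print(empty_rows)
```

In the notebook, the loop body could call `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))` so that a single malformed response does not stop the whole run.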

Converting the results returned from Foursquare into a dataframe

In [12]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2293, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Cantonment area,12.99435,77.59839,Ujwal Bar & Restaurant,12.99228,77.594473,Indian Restaurant
1,Cantonment area,12.99435,77.59839,Millers 46,12.991666,77.594207,Steakhouse
2,Cantonment area,12.99435,77.59839,Jayamahal Palace Hotel,12.996839,77.597163,Indian Restaurant
3,Cantonment area,12.99435,77.59839,Desserted,12.993039,77.589376,French Restaurant
4,Cantonment area,12.99435,77.59839,Pasta Street - Cunningham Road,12.988385,77.593891,Italian Restaurant


The number of venues in each area

In [13]:
venues_df.groupby(["Neighborhood"]).count().head()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Anjanapura,4,4,4,4,4,4
Arekere,50,50,50,50,50,50
BTM Layout,50,50,50,50,50,50
Banashankari,50,50,50,50,50,50
Banaswadi,50,50,50,50,50,50


In [14]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 79 unique categories.


## Finding out the most popular places using one-hot encoding

In [15]:
bangalore_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

bangalore_onehot['Neighborhood'] = venues_df['Neighborhood'] 

fixed_columns = [bangalore_onehot.columns[-1]] + list(bangalore_onehot.columns[:-1])
bangalore_onehot = bangalore_onehot[fixed_columns]

bangalore_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Andhra Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bengali Restaurant,Bistro,Breakfast Spot,Burger Joint,Burrito Place,Cafeteria,Café,Caribbean Restaurant,Chaat Place,Chettinad Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Halal Restaurant,Hyderabadi Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Maharashtrian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mughlai Restaurant,Multicuisine Indian Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Pakistani Restaurant,Parsi Restaurant,Pizza Place,Punjabi Restaurant,Rajasthani Restaurant,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South Indian Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Cantonment area,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Cantonment area,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,Cantonment area,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Cantonment area,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Cantonment area,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [16]:
bangalore_grouped = bangalore_onehot.groupby('Neighborhood').mean().reset_index()
bangalore_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Andhra Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bengali Restaurant,Bistro,Breakfast Spot,Burger Joint,Burrito Place,Cafeteria,Café,Caribbean Restaurant,Chaat Place,Chettinad Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Halal Restaurant,Hyderabadi Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Kebab Restaurant,Kerala Restaurant,Korean Restaurant,Maharashtrian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mughlai Restaurant,Multicuisine Indian Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Pakistani Restaurant,Parsi Restaurant,Pizza Place,Punjabi Restaurant,Rajasthani Restaurant,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South Indian Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Anjanapura,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arekere,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.02,0.02,0.0,0.14,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.08,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.02,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0
2,BTM Layout,0.0,0.0,0.02,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.3,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0
3,Banashankari,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.06,0.02,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.34,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.02,0.02,0.04,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Banaswadi,0.0,0.0,0.02,0.02,0.06,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.28,0.0,0.02,0.0,0.0,0.0,0.04,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0
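
Taking the mean of the one-hot columns per neighbourhood turns the 0/1 indicators into the share of that neighbourhood's venues in each category. This is why Anjanapura, with only 4 venues, shows values of 0.25 and 0.5 above. A small sketch on made-up data illustrates the step; the toy neighbourhoods `A` and `B` and their venues are assumptions for illustration only.

```python
import pandas as pd

# Toy data: neighbourhood A has 4 venues, neighbourhood B has 2.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Diner", "Diner", "Bakery", "Pizza Place", "Diner", "Bakery"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of 0/1 indicators is the fraction of venues in each category.
shares = onehot.groupby("Neighborhood").mean()
print(shares)
```

Each row of `shares` sums to 1, so the clustering later compares the mix of venue types and not the raw number of venues.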


In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Top 10 venues for each neighbourhood

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bangalore_grouped['Neighborhood']

for ind in np.arange(bangalore_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bangalore_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Anjanapura,North Indian Restaurant,Snack Place,Asian Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant
1,Arekere,Indian Restaurant,Café,Fast Food Restaurant,Pizza Place,South Indian Restaurant,Chinese Restaurant,Rajasthani Restaurant,Kebab Restaurant,Mexican Restaurant,Middle Eastern Restaurant
2,BTM Layout,Indian Restaurant,Fast Food Restaurant,Bakery,Café,Snack Place,Diner,Asian Restaurant,Chinese Restaurant,Pizza Place,Vegetarian / Vegan Restaurant
3,Banashankari,Indian Restaurant,Fast Food Restaurant,Café,Pizza Place,South Indian Restaurant,Breakfast Spot,Chinese Restaurant,Sandwich Place,Italian Restaurant,Seafood Restaurant
4,Banaswadi,Indian Restaurant,Café,Bakery,Fast Food Restaurant,Korean Restaurant,Chinese Restaurant,BBQ Joint,Kerala Restaurant,Pizza Place,Vegetarian / Vegan Restaurant
5,Basavanagudi,Indian Restaurant,Fast Food Restaurant,Pizza Place,Restaurant,Sandwich Place,Snack Place,Breakfast Spot,Bakery,Italian Restaurant,Chinese Restaurant
6,Basaveshwaranagar,Indian Restaurant,Fast Food Restaurant,Bakery,Café,Pizza Place,Sandwich Place,Asian Restaurant,Chinese Restaurant,Restaurant,Snack Place
7,Begur,Indian Restaurant,Pizza Place,Bakery,Asian Restaurant,BBQ Joint,South Indian Restaurant,Fast Food Restaurant,Fish & Chips Shop,Diner,Donut Shop
8,Bellandur,Indian Restaurant,Café,Fast Food Restaurant,Bakery,Pizza Place,Sandwich Place,Food Court,Japanese Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
9,Bommanahalli,Indian Restaurant,Bakery,Café,Pizza Place,Snack Place,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Sandwich Place,North Indian Restaurant,Steakhouse
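
One caveat about the table above: `return_most_common_venues` always returns 10 names, even when a neighbourhood has fewer than 10 categories. Anjanapura has only three categories with non-zero frequency, so its 4th to 10th entries are zero-frequency categories in whatever order the sort leaves the tied values. A hedged sketch of a variant that keeps only categories that actually occur is shown below; the function name `top_categories` and the sample row are illustrative assumptions.

```python
import pandas as pd


def top_categories(row, num_top=10):
    """Return up to num_top category names with non-zero frequency, most common first."""
    present = row[row > 0].sort_values(ascending=False)
    return list(present.index[:num_top])


# Frequencies shaped like the Anjanapura row, with two zero-frequency categories added.
sample_row = pd.Series({
    "Asian Restaurant": 0.25,
    "Bakery": 0.0,
    "Deli / Bodega": 0.0,
    "North Indian Restaurant": 0.5,
    "Snack Place": 0.25,
})

result = top_categories(sample_row)
print(result)
```

The returned list can be shorter than `num_top`, so a caller filling fixed-width columns would need to pad it, for example with `None`.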


## Clustering the neighbourhoods in Bangalore using the most common food destinations

I have set the number of clusters to 5.

In [37]:
kclusters = 5

bangalore_grouped_clustering = bangalore_grouped.drop(columns='Neighborhood')  # positional axis argument was removed in pandas 2.0

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bangalore_grouped_clustering)

kmeans.labels_

array([4, 3, 1, 1, 3, 1, 1, 1, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 3, 1, 3, 1,
       3, 0, 3, 1, 1, 3, 3, 1, 3, 2, 3, 0, 3, 3, 3, 3, 3, 0, 3, 1, 1, 3,
       3, 0, 3, 1, 3, 1, 3, 0, 2, 1, 1, 3, 3, 2, 0, 1, 1, 0, 0, 3, 0])
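
The notebook fixes the number of clusters at 5 without comparing alternatives. One common way to sanity-check such a choice is the silhouette score, sketched below on synthetic data. This is not the method used in this project; the three synthetic blobs and all variable names are assumptions for illustration. In the notebook, `X` would be `bangalore_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([centre + rng.normal(scale=0.5, size=(20, 2)) for centre in centres])

# Fit k-means for several values of k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real venue-frequency data the scores are usually much closer together than on this toy example, so the result is a guide and not a definitive answer.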

In [38]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

bangalore_merged = bangalore_neighbourhoods

bangalore_merged = bangalore_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Name')

bangalore_merged.head() # check the last columns!

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Cantonment area,12.99435,77.59839,1,Indian Restaurant,Café,Chinese Restaurant,Middle Eastern Restaurant,BBQ Joint,Bakery,Snack Place,Pizza Place,Asian Restaurant,Burger Joint
1,Domlur,12.94329,77.65602,3,Indian Restaurant,Café,Fast Food Restaurant,Restaurant,Chinese Restaurant,Pizza Place,Bakery,Food Court,Cafeteria,Snack Place
2,Indiranagar,13.03006,77.49526,0,Restaurant,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Breakfast Spot,Gastropub,Falafel Restaurant,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant
3,Jeevanbheemanagar,12.96601,77.65767,3,Indian Restaurant,Café,Restaurant,Burger Joint,Chinese Restaurant,Deli / Bodega,Middle Eastern Restaurant,Bakery,Korean Restaurant,BBQ Joint
4,Malleswaram,13.006322,77.568416,3,Indian Restaurant,Vegetarian / Vegan Restaurant,Donut Shop,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Café,Bakery,Breakfast Spot,Pizza Place


## Plotting the clusters on the map

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bangalore_merged['Latitude'], bangalore_merged['Longitude'], bangalore_merged['Name'], bangalore_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Analysis of each cluster

### Cluster 1

In [40]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==0].head()

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Indiranagar,13.03006,77.49526,0,Restaurant,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Breakfast Spot,Gastropub,Falafel Restaurant,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant
17,Varthur,12.94348,77.74703,0,Indian Restaurant,Pizza Place,Café,Food Court,Restaurant,Donut Shop,Chinese Restaurant,Rajasthani Restaurant,Fast Food Restaurant,Sandwich Place
18,Whitefield,12.97523,77.75238,0,Café,Indian Restaurant,Restaurant,Eastern European Restaurant,Chinese Restaurant,Pizza Place,Food Court,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant
33,Yeshwanthpur,13.02954,77.54022,0,Fast Food Restaurant,Restaurant,Indian Restaurant,Bakery,Punjabi Restaurant,Mediterranean Restaurant,Seafood Restaurant,Burger Joint,Food Court,Food
54,Kothnur,13.06434,77.64855,0,Bakery,Pizza Place,Restaurant,North Indian Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Mediterranean Restaurant,Sandwich Place,Café


In [23]:
print("First preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==0]['1st Most Common Venue'].mode().iloc[0])
print("Second preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==0]['2nd Most Common Venue'].mode().iloc[0])
print("Third preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==0]['3rd Most Common Venue'].mode().iloc[0])

First preference: Pizza Place
Second preference: Indian Restaurant
Third preference: Fast Food Restaurant


As the data shows, the most common venue type in this cluster of neighbourhoods is the Pizza Place, followed by the Indian Restaurant and the Fast Food Restaurant. Opening any of these types of food joint is likely to be profitable in these neighbourhoods.

### Cluster 2

In [41]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 1].head()

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Cantonment area,12.99435,77.59839,1,Indian Restaurant,Café,Chinese Restaurant,Middle Eastern Restaurant,BBQ Joint,Bakery,Snack Place,Pizza Place,Asian Restaurant,Burger Joint
6,Sadashivanagar,13.01483,77.57771,1,Indian Restaurant,Café,Chinese Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Italian Restaurant,Snack Place,Burger Joint,Breakfast Spot
7,Seshadripuram,12.99355,77.57988,1,Indian Restaurant,Chinese Restaurant,Restaurant,Breakfast Spot,Café,Donut Shop,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Karnataka Restaurant,Asian Restaurant
10,Vasanth Nagar,12.99073,77.58861,1,Indian Restaurant,Chinese Restaurant,Café,Bakery,Donut Shop,Karnataka Restaurant,Restaurant,Rajasthani Restaurant,Asian Restaurant,BBQ Joint
16,Marathahalli,12.95466,77.70752,1,Indian Restaurant,Fast Food Restaurant,Pizza Place,Café,Vegetarian / Vegan Restaurant,BBQ Joint,Chinese Restaurant,Andhra Restaurant,Restaurant,Burger Joint


In [42]:
print("First preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==1]['1st Most Common Venue'].mode().iloc[0])
print("Second preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==1]['2nd Most Common Venue'].mode().iloc[0])
print("Third preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==1]['3rd Most Common Venue'].mode().iloc[0])

First preference: Indian Restaurant
Second preference: Fast Food Restaurant
Third preference: Pizza Place


As the data shows, the most common venue type in this cluster of neighbourhoods is the Indian Restaurant, followed by the Fast Food Restaurant and the Pizza Place. Opening any of these types of food joint is likely to be profitable in these neighbourhoods.

### Cluster 3

In [43]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 2].head()

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Ramamurthy Nagar,13.02382,77.67785,2,Indian Restaurant,Asian Restaurant,Pizza Place,South Indian Restaurant,Gastropub,Falafel Restaurant,German Restaurant,Dim Sum Restaurant,Diner,Donut Shop
48,Uttarahalli,12.89757,77.5283,2,Indian Restaurant,Andhra Restaurant,Restaurant,Fish & Chips Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant
52,Gottigere,12.85568,77.58557,2,Indian Restaurant,Italian Restaurant,Food Truck,Fish & Chips Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant
57,Kengeri,12.9087,77.48714,2,Indian Restaurant,Café,Creperie,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant


In [44]:
print("First preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==2]['1st Most Common Venue'].mode().iloc[0])
print("Second preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==2]['2nd Most Common Venue'].mode().iloc[0])
print("Third preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==2]['3rd Most Common Venue'].mode().iloc[0])

First preference: Indian Restaurant
Second preference: Andhra Restaurant
Third preference: Creperie


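The most common venue type in these neighbourhoods is the same as in cluster 2, i.e. the Indian Restaurant. The second and third places go to Andhra Restaurant and Creperie, but each of the four neighbourhoods in this cluster has a different second and third most common venue, so `mode()` is simply returning the alphabetically first of four tied values. Andhra Restaurants and Creperies are therefore weak signals, not clear favourites, and anyone looking to invest in these neighbourhoods should weigh them accordingly.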

### Cluster 4

In [45]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 3].head()

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Domlur,12.94329,77.65602,3,Indian Restaurant,Café,Fast Food Restaurant,Restaurant,Chinese Restaurant,Pizza Place,Bakery,Food Court,Cafeteria,Snack Place
3,Jeevanbheemanagar,12.96601,77.65767,3,Indian Restaurant,Café,Restaurant,Burger Joint,Chinese Restaurant,Deli / Bodega,Middle Eastern Restaurant,Bakery,Korean Restaurant,BBQ Joint
4,Malleswaram,13.006322,77.568416,3,Indian Restaurant,Vegetarian / Vegan Restaurant,Donut Shop,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Café,Bakery,Breakfast Spot,Pizza Place
5,Pete area,12.96618,77.5869,3,Indian Restaurant,Café,Vegetarian / Vegan Restaurant,Italian Restaurant,Breakfast Spot,Japanese Restaurant,Mexican Restaurant,Bakery,Sushi Restaurant,South Indian Restaurant
8,Shivajinagar,12.9872,77.60401,3,Indian Restaurant,Café,Italian Restaurant,Chinese Restaurant,Burger Joint,Japanese Restaurant,French Restaurant,Steakhouse,Breakfast Spot,Modern European Restaurant


In [46]:
print("First preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==3]['1st Most Common Venue'].mode().iloc[0])
print("Second preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==3]['2nd Most Common Venue'].mode().iloc[0])
print("Third preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==3]['3rd Most Common Venue'].mode().iloc[0])

First preference: Indian Restaurant
Second preference: Café
Third preference: Fast Food Restaurant


The mix of venues in these neighbourhoods is similar to the general trend in Bangalore: Indian Restaurants, Cafés and Fast Food Restaurants are all very common here.

### Cluster 5

In [47]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 4].head()

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Anjanapura,12.85811,77.55909,4,North Indian Restaurant,Snack Place,Asian Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant


In [48]:
print("First preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==4]['1st Most Common Venue'].mode().iloc[0])
print("Second preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==4]['2nd Most Common Venue'].mode().iloc[0])
print("Third preference: "+bangalore_merged.loc[bangalore_merged['Cluster Labels'] ==4]['3rd Most Common Venue'].mode().iloc[0])

First preference: North Indian Restaurant
Second preference: Snack Place
Third preference: Asian Restaurant


There is only one neighbourhood in this cluster, Anjanapura, because its venue mix differs sharply from the general trend in Bangalore. Foursquare returned only 4 venues for it, so its category frequencies rest on very little data. North Indian Restaurant, Snack Place and Asian Restaurant are the venue types found in this area.
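
The per-cluster cells above repeat the same three `mode()` calls. A sketch of computing the summary for all clusters in one pass is given below. It also reports how many neighbourhoods share the winning value, which makes ties visible. The helper name `mode_with_support` and the toy data are assumptions for illustration; in the notebook the input would be `bangalore_merged`.

```python
import pandas as pd


def mode_with_support(values):
    """Return 'winner (count/total)', breaking ties alphabetically as mode().iloc[0] does."""
    counts = values.value_counts()
    top_count = int(counts.iloc[0])  # value_counts sorts by count, largest first
    winner = sorted(counts[counts == top_count].index)[0]
    return f"{winner} ({top_count}/{len(values)})"


# Toy stand-in for bangalore_merged with two clusters.
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1],
    "1st Most Common Venue": ["Pizza Place", "Pizza Place", "Bakery",
                              "Indian Restaurant", "Indian Restaurant"],
    "2nd Most Common Venue": ["Diner", "Bakery", "Diner", "Diner", "Bakery"],
})

rank_columns = ["1st Most Common Venue", "2nd Most Common Venue"]
summary = toy.groupby("Cluster Labels")[rank_columns].agg(mode_with_support)
print(summary)
```

A support of 1 out of several neighbourhoods, as in the second column for cluster 1 here, signals that the reported value is only a tie-break and should not be read as a clear preference.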

## Discussion

Judging from the results and the map, most of the neighborhoods in Bangalore fall into cluster 2 and cluster 4 (23 and 28 of the 65 neighborhoods respectively, going by the `kmeans.labels_` output). The most common restaurant type across the city is the Indian Restaurant. Starting a new Indian-themed restaurant, or investing in one, is likely to be profitable in most of the neighborhoods and is the top choice according to our analysis. Cafes and Snack Places are also good choices of theme for a new restaurant in the city. However, because these types of restaurants are already so frequent, a newcomer may face strong competition. <br>
In my view, a new restaurant owner has two options when choosing the theme of the restaurant:<br>
1)	Go with a theme that is popular in the neighborhood and face tough competition from other restaurants.<br>
2)	Go with a theme that is not yet popular in the neighborhood. This is riskier, but if it succeeds the owner will face less competition.
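
The cluster sizes behind the discussion above can be read directly from the `kmeans.labels_` array printed earlier. A short sketch, with the array copied verbatim from that output (the variable names are illustrative):

```python
import numpy as np

# Labels copied from the kmeans.labels_ output earlier in the notebook.
labels = np.array([4, 3, 1, 1, 3, 1, 1, 1, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 3, 1, 3, 1,
                   3, 0, 3, 1, 1, 3, 3, 1, 3, 2, 3, 0, 3, 3, 3, 3, 3, 0, 3, 1, 1, 3,
                   3, 0, 3, 1, 3, 1, 3, 0, 2, 1, 1, 3, 3, 2, 0, 1, 1, 0, 0, 3, 0])

# counts[i] is the number of neighbourhoods with label i.
counts = np.bincount(labels, minlength=5)
for label, count in enumerate(counts):
    print(f"Cluster {label + 1} (label {label}): {count} neighbourhoods")

# Share of neighbourhoods in cluster 2 (label 1) and cluster 4 (label 3).
share = (counts[1] + counts[3]) / counts.sum()
print(f"Clusters 2 and 4 together: {share:.0%}")
```

Note that the headings "Cluster 1" to "Cluster 5" correspond to labels 0 to 4, which is why the loop prints `label + 1`.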


## Limitations and Suggestions for Future Research

In this project we consider only one factor, the frequency of occurrence of each restaurant type. Other factors, such as the population and income of residents, could also influence the choice of location for a new restaurant. However, to the best of this researcher's knowledge, such data are not available at the neighborhood level this project requires. Future research could devise a methodology to estimate such data and use them in the clustering algorithm to determine the preferred locations for a new restaurant. Further factors such as restaurant ratings and price range would also prove helpful in determining the best type of restaurant for each neighborhood, and can be added once the relevant data become available. <br>
In addition, this project used the free Sandbox Tier account of the Foursquare API, which limits the number of API calls and the number of results returned. Future research could use a paid account to lift these limits and obtain more results.


## Conclusion 

In this project, we went through the process of identifying the business problem, specifying the data required, extracting and preparing the data, clustering the neighborhoods into 5 clusters based on the similarity of their venue mix, and lastly providing recommendations to the relevant stakeholders, i.e. new restaurant owners and investors, regarding the best theme for a new restaurant in each neighborhood in Bangalore. <br>
To answer the business question raised in the problem statement, the answer proposed by this project is: <br>
**The most popular type of restaurant in Bangalore is the Indian Restaurant. Opening a new Indian Restaurant, or investing in one, is likely to prove beneficial.**
