# Capstone Project - The Battle of Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of Contents

* [Introduction](#introduction)
* [Problem Definition](#problemdefinition)
* [Data](#data)
* [Methodology](#methodology)
* [Data Cleaning](#datacleaning)
* [Analysis](#analysis)
* [Results](#result)
* [Conclusions](#conclusion)

### Introduction <a name="introduction"></a>

Bangalore (officially Bengaluru) is the capital and largest city of the Indian state of Karnataka. It has a population of more than 8 million and a metropolitan population of around 11 million, making it the third most populous city and fifth most populous urban agglomeration in India. Located in southern India on the Deccan Plateau at over 900 m (3,000 ft) above sea level, the highest elevation among India's major cities, Bangalore is known for its pleasant climate throughout the year.

The diversity of cuisine available reflects the social and economic diversity of Bengaluru. South Indian, North Indian, Arabic, Italian, American and Chinese food are all very popular in the city. Our focus is on Italian restaurants, and our goal is to identify a suitable location for a new one.

Italian food was once an unfamiliar segment of the Indian food industry, and it took time to establish a presence in the market. Earlier, people were familiar mainly with the native flavours their palates were used to, but the globalization of the food industry has gradually changed this.
In the early days, chefs and recipes were imported from Italy because there was no local expertise to reproduce the authentic taste. Diners took a long time to adapt to these flavours, and many outlets and restaurants closed as a result. The trend has since changed: Italian food is now a popular segment, and Italian has become the second favourite international cuisine in the country.

### Problem Definition <a name="problemdefinition"></a>

In this project, we visualize all major parts of Bangalore and try to identify an optimal location for an Italian restaurant. Because Bangalore already has a large number of restaurants, we look for locations that are not already crowded with restaurants in the vicinity. We would also prefer locations as close to the city center as possible.

Specifically, this report targets stakeholders who want to set up an Italian restaurant in Bangalore, Karnataka, India.
We will use data science techniques to shortlist the most promising neighbourhoods based on these criteria. The advantages of each area are then set out clearly so that stakeholders can choose the best final location.


### Data <a name="data" ></a>

For the analysis we need the following data:

Bangalore restaurant data containing each restaurant's name, locality, rating and location details.

**Data Source:** ***ZOMATO_KAGGLE_DATASET***

**Data description:** This data contains all the information needed for our analysis. It consists of restaurant data from different countries, including India, Sri Lanka, Brazil, Indonesia, New Zealand and the USA. Each country has its own unique "Country Code"; for example, the Country Code for India is 1. For each country, the data covers popular restaurants in its cities. For India there are a total of 8652 restaurants from various cities. Since our focus is on Bangalore, we need to extract the data specific to that city.

The data has the following features: Restaurant ID, Restaurant Name, Country Code, City, Address, Locality, Locality Verbose, Longitude, Latitude, Cuisines, Average cost for two, Currency, Has table Booking, Has online delivery, Is delivering now, Switch to order menu, Price range, Aggregate rating, Rating colour, Rating text and Votes.
From these we select the features relevant to our analysis: Restaurant Name, Locality, Longitude, Latitude, Cuisines, Aggregate Rating, Rating Text and Votes.
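
The extraction step described above can be sketched as follows. This is a minimal illustration on a hand-made frame, assuming the column names listed in the data description; the rows, restaurant names and values are invented for the example and are not taken from the dataset.

```python
import pandas as pd

# Toy rows standing in for the multi-country dataset (illustrative values only).
restaurants = pd.DataFrame({
    "Restaurant Name": ["Trattoria Uno", "Dilli Darbar", "Main Street Diner"],
    "Country Code": [1, 1, 216],
    "City": ["Bangalore", "New Delhi", "Albany"],
    "Locality": ["Indiranagar", "Connaught Place", "Downtown"],
    "Cuisines": ["Italian, Pizza", "North Indian", "American"],
    "Aggregate rating": [4.2, 3.9, 3.5],
    "Votes": [120, 300, 45],
    "Currency": ["Indian Rupees(Rs.)", "Indian Rupees(Rs.)", "Dollar($)"],
})

selected_columns = ["Restaurant Name", "Locality", "Cuisines", "Aggregate rating", "Votes"]

# Keep rows for India (Country Code 1) and the city of interest, then only the chosen columns.
is_bangalore = (restaurants["Country Code"] == 1) & (restaurants["City"] == "Bangalore")
bangalore = restaurants.loc[is_bangalore, selected_columns].reset_index(drop=True)

print(bangalore)
```

Filtering with a boolean mask and `.loc` keeps the row filter and the column selection in one step.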

**To get nearby places in each locality of Bangalore city:**

**Data Source:** ***FOURSQUARE.API***

**Data description:** This API provides information about venues in the neighbourhoods of Bangalore. For example, to get the top venues near a neighbourhood such as Koramangala, we query the API using our Foursquare credentials.






### Methodology <a name="methodology" ></a>

We use the Foursquare API to get venues in each neighbourhood of Bangalore. The venue data consists of each venue's name, geographical location and category. Using this data, we count the Italian restaurants in each locality and find the most suitable region for opening a restaurant.

In [2]:
# first, let's import the necessary modules

import pandas as pd
import numpy as np

import requests  # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#! pip install geopy
from geopy.geocoders import Nominatim # To get the geographical location of a given address

In [3]:
# Let's load the data into the dataframe
bang_data = pd.read_csv("Bangaloredata.csv")
bang_data.head()

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city)
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1/5,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1/5,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8/5,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7/5,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8/5,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari


In [4]:
# Let's determine the size of the dataframe

bang_data.shape

(51717, 17)

### Data Cleaning <a name="datacleaning"></a>

In [5]:
# Let's filter the columns for our analysis

filtered_col = ["name", "location", "address", "rest_type", "cuisines", "rate", "votes"]
bang_data = bang_data[filtered_col]
bang_data.head()

Unnamed: 0,name,location,address,rest_type,cuisines,rate,votes
0,Jalsa,Banashankari,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Casual Dining,"North Indian, Mughlai, Chinese",4.1/5,775
1,Spice Elephant,Banashankari,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Casual Dining,"Chinese, North Indian, Thai",4.1/5,787
2,San Churro Cafe,Banashankari,"1112, Next to KIMS Medical College, 17th Cross...","Cafe, Casual Dining","Cafe, Mexican, Italian",3.8/5,918
3,Addhuri Udupi Bhojana,Banashankari,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Quick Bites,"South Indian, North Indian",3.7/5,88
4,Grand Village,Basavanagudi,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Casual Dining,"North Indian, Rajasthani",3.8/5,166


In [6]:
# Let's check for the missing data

bang_data.isnull().sum()

name            0
location       21
address         0
rest_type     227
cuisines       45
rate         7775
votes           0
dtype: int64

In [7]:
# dropna() removes every row with a missing value in any column (not only the 21 rows without a location)

bang_data = bang_data.dropna()
bang_data.shape

(43780, 7)
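
The row count above falls from 51717 to 43780, far more than the 21 rows with a missing location, because `dropna()` with no arguments drops a row if any column is missing. A minimal sketch on toy data contrasts this with dropping only on `location`; the rows are invented for the example.

```python
import pandas as pd

# Toy frame: one row lacks a location, another lacks a rating.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "location": ["BTM", None, "HSR", "Ulsoor"],
    "rate": ["4.1/5", "3.9/5", None, "3.8/5"],
})

dropped_any = toy.dropna()                           # removes rows missing ANY value
dropped_location = toy.dropna(subset=["location"])   # removes only rows missing a location

print(len(dropped_any), len(dropped_location))
```

Which variant is appropriate depends on whether the rating is needed later. If only the neighbourhood names are used, restricting the drop to `location` keeps more restaurants.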

In [8]:
# Create a dataframe for our neighborhoods data
neighborhood = bang_data["location"].unique()
neighborhoods = pd.DataFrame(neighborhood, columns=["Neighborhood"])

# Let's rename a neighborhood named Rammurthy Nagar, as it may cause an error in our analysis
neighborhoods.replace("Rammurthy Nagar", "Ramamurthy Nagar", inplace=True)
neighborhoods.head()

Unnamed: 0,Neighborhood
0,Banashankari
1,Basavanagudi
2,Mysore Road
3,Jayanagar
4,Kumaraswamy Layout


In [9]:
# Let's print out the unique neighborhoods that are available for analysis

neighborhoods["Neighborhood"].unique()

array(['Banashankari', 'Basavanagudi', 'Mysore Road', 'Jayanagar',
       'Kumaraswamy Layout', 'Rajarajeshwari Nagar', 'Vijay Nagar',
       'Uttarahalli', 'JP Nagar', 'South Bangalore', 'City Market',
       'Bannerghatta Road', 'BTM', 'Kanakapura Road', 'Bommanahalli',
       'CV Raman Nagar', 'Electronic City', 'Wilson Garden',
       'Shanti Nagar', 'Koramangala 5th Block', 'Richmond Road', 'HSR',
       'Marathahalli', 'Koramangala 7th Block', 'Bellandur',
       'Sarjapur Road', 'Whitefield', 'East Bangalore',
       'Old Airport Road', 'Indiranagar', 'Koramangala 1st Block',
       'Frazer Town', 'MG Road', 'Brigade Road', 'Lavelle Road',
       'Church Street', 'Ulsoor', 'Residency Road', 'Shivajinagar',
       'Infantry Road', 'St. Marks Road', 'Cunningham Road',
       'Race Course Road', 'Commercial Street', 'Vasanth Nagar', 'Domlur',
       'Koramangala 8th Block', 'Ejipura', 'Jeevan Bhima Nagar',
       'Old Madras Road', 'Seshadripuram', 'Kammanahalli',
       'Koramanga

This is the data frame of unique neighbourhoods extracted from the restaurant data. To use the Foursquare API we need each neighbourhood's geographical coordinates (latitude and longitude). To get them, we use the Nominatim geocoder from the geopy package, which converts an address into latitude and longitude.

In [10]:
# Defining a function to get coordinates

def get_latlng(neighborhood):
    address = "{}, Bangalore, Karnataka, India".format(neighborhood)
    geolocator = Nominatim(user_agent="my_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    
    return (latitude, longitude)
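
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` above raises an `AttributeError`. The sketch below shows one way to collect failures instead of crashing and to avoid requesting the same name twice. The helper `geocode_all` and the offline stand-in `fake_lookup` are hypothetical names introduced for this example. With geopy, `lookup` could wrap `geolocator.geocode` and return `(latitude, longitude)` or `None`.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]


def geocode_all(
    names: List[str],
    lookup: Callable[[str], Optional[Coordinates]],
) -> Tuple[Dict[str, Coordinates], List[str]]:
    """Geocode each name once; return (found coordinates, names that failed)."""
    found: Dict[str, Coordinates] = {}
    missing: List[str] = []
    for name in names:
        if name in found or name in missing:
            continue  # skip duplicates so each address is requested only once
        result = lookup("{}, Bangalore, Karnataka, India".format(name))
        if result is None:
            missing.append(name)   # remember the failure instead of raising
        else:
            found[name] = result
    return found, missing


# Hypothetical stand-in for a real geocoder call, so the sketch runs offline.
def fake_lookup(address: str) -> Optional[Coordinates]:
    known = {"Banashankari, Bangalore, Karnataka, India": (12.9152186, 77.5736205)}
    return known.get(address)


found, missing = geocode_all(["Banashankari", "Nowhere Layout", "Banashankari"], fake_lookup)
print(found)
print(missing)
```

Names that end up in `missing` can then be corrected by hand, as was done for Ramamurthy Nagar.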

In [11]:
coords = [get_latlng(neighborhood) for neighborhood in neighborhoods["Neighborhood"].to_list()]
coords

[(12.9152186, 77.5736205),
 (12.9417261, 77.5755021),
 (12.9486566, 77.5357025),
 (12.9292731, 77.5824229),
 (12.9081487, 77.5553179),
 (12.9274413, 77.5155224),
 (12.96601575, 77.61252417144667),
 (12.9055682, 77.5455438),
 (12.91226375, 77.59045672324466),
 (13.0646907, 77.49626895712257),
 (12.965717999999999, 77.5762705372058),
 (12.887979, 77.5970812),
 (12.911275849999999, 77.60456543431182),
 (12.9177484, 77.5735837),
 (12.9089453, 77.6239038),
 (12.993612899999999, 77.66361850290086),
 (12.9791198, 77.5912997),
 (12.9489339, 77.5968273),
 (12.9575547, 77.5979099),
 (12.9438501, 77.6189683),
 (12.9657739, 77.6038697),
 (12.9116225, 77.6388622),
 (12.9552572, 77.6984163),
 (12.9311839, 77.6234063),
 (12.93577245, 77.66676103753434),
 (12.9242381, 77.6289059),
 (12.9696365, 77.7497448),
 (13.0215466, 77.7640586),
 (12.9606699, 77.6422283),
 (12.9732913, 77.6404672),
 (12.9355425, 77.6128717),
 (12.996845, 77.6130165),
 (12.9741854, 77.6124135),
 (12.9750605, 77.6080323),
 (12.9750

Now that we have the latitude and longitude for all the neighbourhoods, we merge the coordinates into the neighbourhoods data frame.

In [47]:
# Create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=["Latitude", "Longitude"])

# Merge the coordinates into the original dataframe
neighborhoods["Latitude"] = df_coords["Latitude"]
neighborhoods["Longitude"] = df_coords["Longitude"]
neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Banashankari,12.915219,77.573621
1,Basavanagudi,12.941726,77.575502
2,Mysore Road,12.948657,77.535702
3,Jayanagar,12.929273,77.582423
4,Kumaraswamy Layout,12.908149,77.555318
...,...,...,...
87,Sahakara Nagar,13.062147,77.580061
88,Jalahalli,13.046453,77.548380
89,Nagarbhavi,12.954674,77.512172
90,Peenya,13.032942,77.527325


In [15]:
# Get the coordinates of Bangalore City

address = "Bangalore, Karnataka, India"
geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bangalore, Karnataka, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bangalore, Karnataka, India are 12.9791198, 77.5912997.
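
The problem definition prefers locations close to the city center. One way to quantify that preference, sketched here as an assumption rather than as part of the original analysis, is the great-circle (haversine) distance between the city coordinates and a neighbourhood. The function name `haversine_km` is introduced for this example. The two coordinate pairs are the values for Bangalore and Banashankari shown in this notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


centre = (12.9791198, 77.5912997)        # Bangalore, as geocoded above
banashankari = (12.9152186, 77.5736205)  # first entry of the coordinates list

distance = haversine_km(*centre, *banashankari)
print("Banashankari is about {:.1f} km from the city center".format(distance))
```

Applied to every neighbourhood, this would give a distance column that could be used to rank candidate locations alongside the restaurant counts.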


After gathering the data, we visualize the neighbourhoods on a map using the Folium package.

In [16]:
bang_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# Adding markers to map

for lat, long, neighborhood in zip(neighborhoods["Latitude"], neighborhoods["Longitude"], neighborhoods["Neighborhood"]):
    label = "{}".format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
                [lat, long],
                radius = 5,
                popup = label,
                color = 'blue',
                fill = True,
                fill_color = "#3186cc",
                fill_opacity = 0.7,
                parse_html=False).add_to(bang_map)

bang_map

Figure: Bangalore City containing all the Neighborhoods

### Use the Foursquare API to explore the neighbourhoods

In [17]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
radius = 2000 # search radius in metres around each neighbourhood
LIMIT = 100 # maximum number of venues returned per request
venues = []

for lat, long, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    # Create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,long,radius,LIMIT)
    # Make the GET request
    results = requests.get(url).json()["response"]["groups"][0]["items"]
    # Keep only the relevant information for each nearby venue
    for venue in results:
        venues.append((neighborhood, lat, long, venue['venue']['name'],
                       venue['venue']['location']['lat'], venue['venue']['location']['lng'],
                       venue['venue']['categories'][0]['name']))
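
The loop above assumes every response has at least one group and every venue has at least one category. If either is missing, the chained lookups raise an error. Below is a minimal sketch of a more tolerant parser. `extract_venues` is a hypothetical helper, and `sample` is a hand-written payload that only mimics the nesting used above; it is not real API output.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # no category listed
        location = venue.get("location", {})
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written payload with the same nesting as the response parsed above.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Corner House",
               "location": {"lat": 12.922647, "lng": 77.573560},
               "categories": [{"name": "Ice Cream Shop"}]}},
    {"venue": {"name": "Unnamed Stall",
               "location": {"lat": 12.92, "lng": 77.57},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Banashankari", 12.915219, 77.573621)
print(rows)
```

Rows with a `None` category can be filtered out later, before the one-hot encoding step.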

After extracting all the venues, we convert the venues list into a new DataFrame.

In [48]:
venue_df = pd.DataFrame(venues)

#Define the column names
venue_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'Venue Name', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
print(venue_df.shape)
venue_df

(6467, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,Venue Name,Venue Latitude,Venue Longitude,Venue Category
0,Banashankari,12.915219,77.573621,Shivaji Military Hotel,12.917919,77.573925,Indian Restaurant
1,Banashankari,12.915219,77.573621,Corner House,12.922647,77.573560,Ice Cream Shop
2,Banashankari,12.915219,77.573621,Stoned Monkey,12.923579,77.569689,Ice Cream Shop
3,Banashankari,12.915219,77.573621,Natural Ice Cream,12.923863,77.576513,Ice Cream Shop
4,Banashankari,12.915219,77.573621,Davanagere benne dosa,12.908932,77.572983,Breakfast Spot
...,...,...,...,...,...,...,...
6462,KR Puram,13.007516,77.695935,Tandoor Box,12.992469,77.702759,BBQ Joint
6463,KR Puram,13.007516,77.695935,ABB Cafeteria,12.993985,77.705566,Cafeteria
6464,KR Puram,13.007516,77.695935,City Kitchen,13.023871,77.690793,Indian Restaurant
6465,KR Puram,13.007516,77.695935,Namdhari Fresh,12.991481,77.702810,Grocery Store


In [19]:
# Let's check how many venues were returned for each neighbourhood
venue_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,Venue Name,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
BTM,87,87,87,87,87,87
Banashankari,100,100,100,100,100,100
Banaswadi,77,77,77,77,77,77
Bannerghatta Road,89,89,89,89,89,89
Basavanagudi,100,100,100,100,100,100
...,...,...,...,...,...,...
West Bangalore,65,65,65,65,65,65
Whitefield,53,53,53,53,53,53
Wilson Garden,58,58,58,58,58,58
Yelahanka,19,19,19,19,19,19


In [20]:
# Let's check how many unique categories there are among the returned venues
print('There are {} unique categories.'.format(len(venue_df['Venue Category'].unique())))

# Displaying the Venue Category names
venue_df['Venue Category'].unique()

There are 215 unique categories.


array(['Indian Restaurant', 'Ice Cream Shop', 'Breakfast Spot',
       'South Indian Restaurant', 'Pizza Place', 'Café',
       'Performing Arts Venue', 'Snack Place', 'Fast Food Restaurant',
       'Gym / Fitness Center', 'Salad Place', 'Coffee Shop',
       'Seafood Restaurant', 'Brewery', "Women's Store",
       'Italian Restaurant', 'Burger Joint', 'Restaurant', 'Bookstore',
       'Sandwich Place', 'Indian Chinese Restaurant',
       'Mexican Restaurant', 'Hyderabadi Restaurant',
       'Chinese Restaurant', 'Spa', 'Hookah Bar', 'Bakery', 'Pub',
       'Bengali Restaurant', 'Gastropub', 'Hotel', 'Jewelry Store',
       'Pharmacy', 'Boarding House', 'Middle Eastern Restaurant', 'Diner',
       'Shopping Plaza', 'Dessert Shop', 'Food Truck', 'Bistro',
       'Juice Bar', 'Farmers Market', 'Shop & Service', 'Boutique',
       'Botanical Garden', 'Park', 'Gym', 'Szechuan Restaurant', 'Lounge',
       'Asian Restaurant', 'Department Store', 'Andhra Restaurant',
       'Movie Theater', 

In [21]:
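# One hot encoding
bang_onehot = pd.get_dummies(venue_df["Venue Category"], prefix="", prefix_sep="")

# Adding neighborhood column back to dataframe.
# Note: the shape printed below (215 columns for 215 categories) indicates that one venue
# category is itself named "Neighborhood", so this assignment overwrites that dummy column
# instead of appending a new one.
bang_onehot["Neighborhood"] = venue_df["Neighborhood"]

# Moving the last column to the front (intended for the neighbourhood column; the groupby
# in the next cell puts "Neighborhood" first regardless)
fixed_columns = [bang_onehot.columns[-1]] + list(bang_onehot.columns[:-1])
bang_onehot = bang_onehot[fixed_columns]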

print(bang_onehot.shape)

(6467, 215)
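
The one-hot frame above has 215 columns for 215 categories even after a neighbourhood column is assigned. This suggests that one venue category is itself called "Neighborhood" and is overwritten by the assignment. A minimal sketch on toy data shows one way to make that collision explicit. The rows are invented for the example.

```python
import pandas as pd

# Toy venues; one of them has the category literally named "Neighborhood".
toy_venues = pd.DataFrame({
    "Neighborhood": ["BTM", "BTM", "HSR"],
    "Venue Category": ["Italian Restaurant", "Neighborhood", "Pizza Place"],
})

onehot = pd.get_dummies(toy_venues["Venue Category"], prefix="", prefix_sep="")
columns_before = onehot.shape[1]  # 3 dummy columns, one of them named "Neighborhood"

# Drop the colliding dummy first, then insert the real neighbourhood names at position 0.
onehot = onehot.drop(columns=["Neighborhood"], errors="ignore")
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Sum the category indicators per neighbourhood.
grouped = onehot.groupby("Neighborhood").sum().reset_index()
print(grouped)
```

Using `DataFrame.insert` places the column first directly, so no separate reordering step is needed.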


In [22]:
# Next, let’s group rows of neighbourhood by taking the sum of the frequency of occurrence of each category.
bang_grouped = bang_onehot.groupby(["Neighborhood"]).sum().reset_index()
print(bang_grouped.shape)
bang_grouped.columns

(91, 215)


Index(['Neighborhood', 'Yoga Studio', 'Accessories Store', 'Afghan Restaurant',
       'Airport Terminal', 'American Restaurant', 'Andhra Restaurant',
       'Arcade', 'Art Gallery', 'Art Museum',
       ...
       'Toy / Game Store', 'Track Stadium', 'Trail', 'Train Station',
       'Udupi Restaurant', 'Vegetarian / Vegan Restaurant',
       'Vietnamese Restaurant', 'Wine Bar', 'Wine Shop', "Women's Store"],
      dtype='object', length=215)

In [23]:
# Let's check how many neighborhoods have at least one Italian Restaurant
len((bang_grouped[bang_grouped["Italian Restaurant"] > 0]))

63

Of the 91 neighbourhoods, 63 have at least one Italian Restaurant within the search radius. We now look for locations where the number of Italian Restaurants is low, so that a new restaurant faces less competition.
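
The expression above counts neighbourhoods (rows with a non-zero value), not restaurants. The sketch below separates the two quantities on a small frame whose four counts are taken from the tables in this notebook. Because the 2000 m search circles of adjacent neighbourhoods can overlap, the column sum may count the same venue more than once, so it should be read as an upper bound on distinct restaurants.

```python
import pandas as pd

# Four neighbourhoods with the Italian Restaurant counts shown in this notebook.
counts = pd.DataFrame({
    "Neighborhood": ["BTM", "Banashankari", "Yelahanka", "Whitefield"],
    "Italian Restaurant": [2, 3, 0, 1],
})

# Number of neighbourhoods with at least one Italian Restaurant nearby.
neighborhoods_with_italian = int((counts["Italian Restaurant"] > 0).sum())

# Sum of per-neighbourhood counts (may double-count venues shared by overlapping circles).
total_italian_venues = int(counts["Italian Restaurant"].sum())

print(neighborhoods_with_italian, total_italian_venues)
```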

In [24]:
# Let's create a dataframe for Italian restaurants only

Italian_rest = bang_grouped[["Neighborhood", "Italian Restaurant"]]
Italian_rest.head()

Unnamed: 0,Neighborhood,Italian Restaurant
0,BTM,2
1,Banashankari,3
2,Banaswadi,2
3,Bannerghatta Road,2
4,Basavanagudi,2


### Analysis <a name="analysis" ></a>

Next, we group the neighbourhoods into clusters. The results show which neighbourhoods have a higher concentration of Italian Restaurants and which have fewer. Based on the number of Italian Restaurants in each neighbourhood, we can then answer the question of which neighbourhoods are most suitable for opening a new one.

We set the number of clusters to 4 and run the algorithm. After applying the K-Means clustering algorithm, all the neighbourhoods get segregated and form different clusters.
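
The choice of 4 clusters is stated without justification. A common check, sketched here under assumptions and not part of the original analysis, is to compare the K-Means inertia (within-cluster sum of squares) for several values of k and look for the point where improvement flattens. The counts below are illustrative and are not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative per-neighbourhood restaurant counts, one feature per row.
counts = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5]).reshape(-1, 1)

candidate_ks = range(1, 5)
inertias = []
for k in candidate_ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(counts)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

for k, inertia in zip(candidate_ks, inertias):
    print(k, round(inertia, 2))
```

With a single integer feature, the clusters essentially become ranges of counts, so a small k is usually sufficient.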

In [25]:
#Clustering

# Setting the number of clusters
kclusters = 4
Italian_clustering = Italian_rest.drop(columns=["Neighborhood"])
kmeans = KMeans(n_clusters=kclusters,random_state=0).fit(Italian_clustering)

In [26]:
# Checking cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 3, 0, 0, 0, 2, 1, 1, 0, 1])
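
K-Means label numbers are arbitrary: label 0 does not mean "fewest restaurants". A minimal sketch of renumbering the labels so that they follow the order of the cluster centres is given below. `relabel_by_centre` is a hypothetical helper. The `centres` values are assumed for illustration, chosen to mirror the clusters described in this notebook. In practice they would come from `kmeans.cluster_centers_`.

```python
import numpy as np


def relabel_by_centre(labels, centres):
    """Renumber clusters so that label 0 has the smallest centre, 1 the next, and so on."""
    order = np.argsort(np.asarray(centres).ravel())  # old labels sorted by centre value
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))             # old label -> new label
    return remap[np.asarray(labels)]


# Assumed centres: label 0 ~ 2 restaurants, 1 ~ 1, 2 ~ 0, 3 ~ 3 or more.
centres = [[2.0], [1.0], [0.0], [3.2]]
labels = [0, 3, 0, 0, 0, 2, 1, 1, 0, 1]  # the first ten labels shown above

ordered = relabel_by_centre(labels, centres)
print(ordered)
```

After relabelling, a higher label always means more Italian Restaurants, which makes the cluster descriptions easier to follow.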

In [27]:
# Creating a new dataframe that includes the cluster label for each neighborhood

Italian_df = Italian_rest.copy()

# Add the clustering labels to the new dataframe

Italian_df["Cluster Labels"] = kmeans.labels_

Italian_df

Unnamed: 0,Neighborhood,Italian Restaurant,Cluster Labels
0,BTM,2,0
1,Banashankari,3,3
2,Banaswadi,2,0
3,Bannerghatta Road,2,0
4,Basavanagudi,2,0
...,...,...,...
86,West Bangalore,0,2
87,Whitefield,1,1
88,Wilson Garden,0,2
89,Yelahanka,0,2


Here the Italian Restaurant column gives the number of Italian Restaurants found in that neighbourhood, and Cluster Labels gives the cluster number (0, 1, 2 or 3).

In [28]:
# Adding latitude and longitude values to dataframe
result_df = pd.merge(Italian_df, neighborhoods, on="Neighborhood")
print(result_df.shape)
result_df.head()

(91, 5)


Unnamed: 0,Neighborhood,Italian Restaurant,Cluster Labels,Latitude,Longitude
0,BTM,2,0,12.911276,77.604565
1,Banashankari,3,3,12.915219,77.573621
2,Banaswadi,2,0,13.014162,77.651854
3,Bannerghatta Road,2,0,12.887979,77.597081
4,Basavanagudi,2,0,12.941726,77.575502


In [43]:
# Let's sort the dataframe with respect to cluster labels

result_df.sort_values(["Cluster Labels"], inplace=True)
result_df.reset_index()

Unnamed: 0,index,Neighborhood,Italian Restaurant,Cluster Labels,Latitude,Longitude
0,0,BTM,2,0,12.911276,77.604565
1,37,Kanakapura Road,2,0,12.917748,77.573584
2,8,Brigade Road,2,0,12.975060,77.608032
3,69,Richmond Road,2,0,12.965774,77.603870
4,15,Cunningham Road,2,0,12.987112,77.594877
...,...,...,...,...,...,...
86,58,New BEL Road,3,3,13.034638,77.568173
87,73,Sankey Road,3,3,13.009888,77.574004
88,76,Shanti Nagar,3,3,12.957555,77.597910
89,1,Banashankari,3,3,12.915219,77.573621


In [30]:
# Creating the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# Setting colour scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []

for lat, long, neighborhood, cluster in zip(result_df['Latitude'], result_df['Longitude'], result_df['Neighborhood'], result_df['Cluster Labels']):
    label = folium.Popup(str(neighborhood) + " - cluster" + str(cluster), parse_html=True)
    
    folium.CircleMarker(
                [lat, long],
                popup = label,
                color = rainbow[cluster-1],
                fill = True,
                fill_color = rainbow[cluster-1],
                fill_opacity = 0.7,
                parse_html=False).add_to(map_clusters)
    
map_clusters

Examining the clusters

In [31]:
# Create a dataframe for each individual clusters
cluster_1 = result_df[result_df['Cluster Labels'] == 0].reset_index()
cluster_2 = result_df[result_df['Cluster Labels'] == 1].reset_index()
cluster_3 = result_df[result_df['Cluster Labels'] == 2].reset_index()
cluster_4 = result_df[result_df['Cluster Labels'] == 3].reset_index()

In [32]:
# Let's check the number of neighborhoods in each cluster
ncluster_1 = len(result_df.loc[result_df['Cluster Labels'] == 0])
ncluster_2 = len(result_df.loc[result_df['Cluster Labels'] == 1])
ncluster_3 = len(result_df.loc[result_df['Cluster Labels'] == 2])
ncluster_4 = len(result_df.loc[result_df['Cluster Labels'] == 3])
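
The four separate counts can also be obtained in one step, together with the range of restaurant counts inside each cluster. This makes it easy to verify descriptions such as "3 or more". The sketch below uses a toy frame. The cluster labels follow this notebook, but the count of 4 for HSR is an invented value for illustration.

```python
import pandas as pd

# Toy result frame; the HSR count is illustrative, not taken from the data.
toy = pd.DataFrame({
    "Neighborhood": ["BTM", "Banashankari", "Whitefield", "Yelahanka", "HSR"],
    "Italian Restaurant": [2, 3, 1, 0, 4],
    "Cluster Labels": [0, 3, 1, 2, 3],
})

# One row per cluster: how many neighbourhoods, and the min/max restaurant count.
summary = (toy.groupby("Cluster Labels")["Italian Restaurant"]
              .agg(["count", "min", "max"]))
print(summary)
```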

In [33]:
# Let's print the neighborhoods in Cluster 1 (label 0) and how many there are
print(cluster_1["Neighborhood"].unique())
print()
print("There are {} places in cluster 0 with 2 Italian Restaurants".format(ncluster_1))

['BTM' 'Infantry Road' 'Jayanagar' 'KR Puram' 'Kammanahalli'
 'Thippasandra' 'Koramangala 5th Block' 'Koramangala 6th Block' 'Hennur'
 'Koramangala 8th Block' 'Lavelle Road' 'MG Road' 'St. Marks Road'
 'Old Airport Road' 'Old Madras Road' 'Race Course Road' 'Richmond Road'
 'Langford Town' 'Electronic City' 'Kanakapura Road' 'Commercial Street'
 'Church Street' 'Brigade Road' 'Banaswadi' 'Domlur' 'Bannerghatta Road'
 'Cunningham Road' 'Basavanagudi' 'Ejipura']

There are 29 places in cluster 0 with 2 Italian Restaurants


In [44]:
# Let's print the neighborhoods in Cluster 2 (label 1) and how many there are

print(cluster_2["Neighborhood"].unique())
print()
print("There are {} places in cluster 1 with 1 Italian Restaurant".format(ncluster_2))

['CV Raman Nagar' 'Whitefield' 'Koramangala 1st Block' 'Bommanahalli'
 'Bellandur' 'Kalyan Nagar' 'Rajajinagar' 'Jalahalli' 'JP Nagar'
 'Vasanth Nagar' 'ITPL Main Road, Whitefield' 'Sadashiv Nagar'
 'Sahakara Nagar' 'Shivajinagar' 'HBR Layout' 'Majestic' 'Brookefield']

There are 17 places in cluster 1 with 1 Italian Restaurant


In [45]:
# Let's print the neighborhoods in Cluster 3 (label 2) and how many there are

print(cluster_3["Neighborhood"].unique())
print()
print("There are {} places in cluster 2 with 0 Italian Restaurants".format(ncluster_3))

['North Bangalore' 'Frazer Town' 'RT Nagar' 'South Bangalore'
 'Rajarajeshwari Nagar' 'Ramamurthy Nagar' 'Residency Road'
 'Wilson Garden' 'Sanjay Nagar' 'Peenya' 'Nagawara' 'Kumaraswamy Layout'
 'Mysore Road' 'West Bangalore' 'Hebbal' 'East Bangalore'
 'Varthur Main Road, Whitefield' 'Kaggadasapura' 'City Market'
 'Nagarbhavi' 'Seshadripuram' 'Uttarahalli' 'Ulsoor' 'Yelahanka'
 'Magadi Road' 'Basaveshwara Nagar' 'Marathahalli' 'Kengeri']

There are 28 places in cluster 2 with 0 Italian Restaurants


In [46]:
# Let's print the neighborhoods in Cluster 4 (label 3) and how many there are

print(cluster_4["Neighborhood"].unique())
print()
print("There are {} places in cluster 3 with 3 or more Italian Restaurants".format(ncluster_4))

['Vijay Nagar' 'Shanti Nagar' 'Koramangala 7th Block' 'Sankey Road'
 'New BEL Road' 'Malleshwaram' 'Koramangala 4th Block'
 'Koramangala 3rd Block' 'Koramangala 2nd Block' 'Jeevan Bhima Nagar'
 'Indiranagar' 'Hosur Road' 'HSR' 'Central Bangalore' 'Banashankari'
 'Sarjapur Road' 'Yeshwantpur']

There are 17 places in cluster 3 with 3 or more Italian Restaurants


### Results <a name="result" ></a>

The results from the K-means clustering show that we can categorize the neighbourhoods into 4 clusters based on the frequency of occurrence for “Italian Restaurant”:
* Cluster 0: Neighbourhoods with 2 Italian Restaurants
* Cluster 1: Neighbourhoods with 1 Italian Restaurant
* Cluster 2: Neighbourhoods with 0 Italian Restaurants
* Cluster 3: Neighbourhoods with 3 or more Italian Restaurants

We visualize the results of the clustering on the map, with cluster 0 in red, cluster 1 in purple, cluster 2 in mint green and cluster 3 in yellow.

### Conclusions <a name="conclusion" ></a>

Italian Restaurants are spread throughout the city, as the map shows. Neighbourhoods with a moderate number of them are concentrated in the central part of the city: the cluster 0 neighbourhoods, each with 2 Italian Restaurants, are mostly located there. The neighbourhoods with the most Italian Restaurants are in cluster 3, where competition is very high. The neighbourhoods in cluster 1 have exactly 1 Italian Restaurant each and therefore moderate competition. Finally, the neighbourhoods in cluster 2 have no Italian Restaurants, and some of them are in the central part of the city.

Therefore, this project recommends that restaurant owners capitalize on these findings by opening Italian Restaurants in cluster 2 neighbourhoods, where there is no competition. Restaurant owners who can stand out and serve good-quality Italian food can also consider neighbourhoods in cluster 1, where competition is moderate.

The same approach can be applied to larger datasets to distinguish venues by category. For example, if a city has 400 Chinese Restaurants, we can segregate its neighbourhoods into clusters in the same way. The method applies not only to restaurants but also to shopping malls, movie theatres and more. In this project we consider only one factor, the frequency of occurrence of Italian Restaurants; other factors, such as the population and income of residents, could also influence the location decision for a new Italian Restaurant.

Setting up a restaurant also requires considering factors such as the cost of rent, the surroundings of the restaurant and the kind of people in the locality. In a more affluent area, for instance, people may eat out more often and spend more. If we choose a place where competition is low, we still need to consider the people living in that locality. If they spend freely and enjoy eating out, the restaurant is more likely to succeed. If they prefer not to eat out, it is better to consider another place with low competition and a good crowd.