# Opening a New Restaurant in Indore, India
## Week 5 Final Report
* Build a dataframe of neighborhoods in Indore, India by scraping the data from a Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from the Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster in which to open a new restaurant


## **1. Importing Libraries**

In [1]:
import pandas as pd
import numpy as np
import json

from geopy.geocoders import Nominatim
import geocoder

import requests
from bs4 import BeautifulSoup

from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium
print("Libraries imported.")

Libraries imported.


## **2. Scrape data from a Wikipedia page into a DataFrame**

In [2]:
data=requests.get('https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Indore').text

In [3]:
soup = BeautifulSoup(data, 'html.parser')

In [4]:
neighborhoodList = []
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [5]:
idf = pd.DataFrame({"Neighborhood": neighborhoodList})

idf

Unnamed: 0,Neighborhood
0,"Azad Nagar, Indore"
1,Bhanwar Kua
2,Bijalpur
3,"Chandan Nagar, Indore"
4,"Chandra Nagar, Indore"
5,Dewas Naka
6,Geeta Bhawan
7,Indore G.P.O.
8,Khandwa Naka
9,L.I.G. Colony
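
Several of the scraped titles carry a Wikipedia disambiguation suffix such as ", Indore". Since the geocoding step appends ", Indore, India" to each name, those titles end up naming the city twice in the query. Whether that changes the geocoder's results was not checked here, but a small cleaning step is one way to avoid it. The helper name `clean_neighborhood_name` is hypothetical and not part of the original notebook.

```python
def clean_neighborhood_name(name, city="Indore"):
    """Strip a trailing ', <city>' disambiguation suffix from a scraped title."""
    suffix = ", " + city
    name = name.strip()
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


# A few titles taken from the scraped list above
scraped = ["Azad Nagar, Indore", "Bhanwar Kua", "Indore G.P.O."]
cleaned = [clean_neighborhood_name(n) for n in scraped]
print(cleaned)
```

Only an exact trailing ", Indore" is removed, so names that merely contain the city name (for example "Indore G.P.O.") are left untouched.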


## **3. Get the geographical coordinates**

In [6]:
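def get_latlng(neighborhood, max_attempts=5):
    """Geocode a neighborhood with ArcGIS and return [latitude, longitude]."""
    query = '{}, Indore, India'.format(neighborhood)
    # retry a bounded number of times instead of looping forever on failure
    for _ in range(max_attempts):
        coords = geocoder.arcgis(query).latlng
        if coords is not None:
            return coords
    raise RuntimeError('Could not geocode {!r}'.format(neighborhood))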
    



In [7]:
coords = [ get_latlng(neighborhood) for neighborhood in idf["Neighborhood"].tolist() ]

In [8]:
coords

[[22.69658000000004, 75.88767000000007],
 [22.687640000000044, 75.86291000000006],
 [22.66324000000003, 75.84332000000006],
 [22.709380000000067, 75.82394000000005],
 [22.74147000000005, 75.89447000000007],
 [22.771764333333323, 75.90165866666665],
 [22.71822000000003, 75.88766000000004],
 [24.977950000000078, 77.85809000000006],
 [22.689258247414326, 75.87025921921465],
 [22.734780000000057, 75.88485000000003],
 [22.741695000000004, 75.89988900000002],
 [22.70479000000006, 75.87657000000007],
 [22.727690000000052, 75.88578000000007],
 [22.67416000000003, 75.84224000000006],
 [22.734780000000057, 75.88485000000003],
 [22.755351114631523, 75.89687741798681],
 [22.76022000000006, 75.87815000000006],
 [22.756410000000074, 75.89247000000006]]

In [9]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])


In [10]:
idf['Latitude'] = df_coords['Latitude']
idf['Longitude'] = df_coords['Longitude']

In [11]:
print(idf.shape)
idf

(18, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Azad Nagar, Indore",22.69658,75.88767
1,Bhanwar Kua,22.68764,75.86291
2,Bijalpur,22.66324,75.84332
3,"Chandan Nagar, Indore",22.70938,75.82394
4,"Chandra Nagar, Indore",22.74147,75.89447
5,Dewas Naka,22.771764,75.901659
6,Geeta Bhawan,22.71822,75.88766
7,Indore G.P.O.,24.97795,77.85809
8,Khandwa Naka,22.689258,75.870259
9,L.I.G. Colony,22.73478,75.88485
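
One row in the table above stands out: "Indore G.P.O." is placed at roughly (24.98, 77.86), while every other neighborhood sits near latitude 22.7 and longitude 75.9. A simple sanity check is to measure each point's great-circle distance from the city centre and flag anything implausibly far away. The sketch below uses the haversine formula with the centre coordinates that the Nominatim lookup in section 4 prints; the 50 km threshold and the name `MAX_KM` are assumptions chosen for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


CITY_CENTER = (22.7203616, 75.8681996)  # Nominatim result printed in section 4
MAX_KM = 50  # assumed cut-off, not from the original notebook

# Two rows copied from the table above
points = {
    "Azad Nagar, Indore": (22.69658, 75.88767),
    "Indore G.P.O.": (24.97795, 77.85809),
}

suspect = [
    name
    for name, (lat, lon) in points.items()
    if haversine_km(CITY_CENTER[0], CITY_CENTER[1], lat, lon) > MAX_KM
]
print(suspect)
```

Rows flagged this way could be re-geocoded with a different query or dropped before the venue lookup.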


In [12]:
# save the DataFrame as CSV file
idf.to_csv("idf.csv", index=False)

## **4. Map of Indore with neighborhoods superimposed on top**

In [13]:
#indore map
geolocator = Nominatim(user_agent="i_explorer")
city ="Indore"
country ="India"
loc = geolocator.geocode(city+','+ country)
latitude = loc.latitude
longitude = loc.longitude
print("latitude is:", latitude, "\nlongitude is:", longitude)
map_indore = folium.Map([latitude, longitude], zoom_start=10)
map_indore

latitude is: 22.7203616 
longitude is: 75.8681996


In [14]:
#adding markers to map
for lat, lng, neighborhood in zip(idf['Latitude'], idf['Longitude'], idf['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_indore)  
    
map_indore

In [15]:
#save map as html file
map_indore.save('map_indore.html')

## **5. Use the Foursquare API to explore the neighborhoods**

In [16]:
CLIENT_ID = '' 
CLIENT_SECRET = ''
VERSION = '20180605'

In [19]:
radius = 1000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(idf['Latitude'], idf['Longitude'], idf['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
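
The loop above indexes straight into the response (`["response"]['groups'][0]['items']` and `['categories'][0]`), so a failed request or a venue without a category would raise an exception. The sketch below shows one more defensive way to build the URL and flatten a response. The response shape is assumed from the keys the notebook itself reads; the function names and the sample payload are hypothetical, and the second venue in the payload is made up purely to exercise the missing-category path.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def parse_explore_items(payload, neighborhood, lat, lng):
    """Flatten one explore response into rows, tolerating missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written payload for illustration; not a recorded API response
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe Coffee Day",
               "location": {"lat": 22.690024, "lng": 75.861290},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Example Venue",  # hypothetical venue with no category
               "location": {"lat": 22.69, "lng": 75.86},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20180605", 22.68764, 75.86291, 1000, 100)
rows = parse_explore_items(sample_payload, "Bhanwar Kua", 22.68764, 75.86291)
print(rows)
```

Rows whose category is `None` can be filtered out before one-hot encoding, and an empty or malformed response simply yields no rows.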

In [20]:
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VLatitude', 'VLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df

(200, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VLatitude,VLongitude,VenueCategory
0,"Azad Nagar, Indore",22.69658,75.88767,Indore Residency Club,22.693080,75.882831,Pool
1,"Azad Nagar, Indore",22.69658,75.88767,Stitched Poetry,22.689815,75.884521,Clothing Store
2,"Azad Nagar, Indore",22.69658,75.88767,Residency Club,22.703288,75.882173,Garden Center
3,Bhanwar Kua,22.68764,75.86291,Cafe Coffee Day,22.690024,75.861290,Coffee Shop
4,Bhanwar Kua,22.68764,75.86291,Titto's,22.691448,75.864910,Lounge
...,...,...,...,...,...,...,...
195,"Vijay Nagar, Indore",22.75641,75.89247,"Cafe Chokolade, Vijay Nagar",22.754156,75.900630,Café
196,"Vijay Nagar, Indore",22.75641,75.89247,Ambrosia @ Hotel Fortune Landmark,22.753143,75.883591,Restaurant
197,"Vijay Nagar, Indore",22.75641,75.89247,Country Inn By Carlson-Indore,22.747788,75.894629,Hotel
198,"Vijay Nagar, Indore",22.75641,75.89247,indore airport,22.757456,75.882893,Airport


In [21]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VLatitude,VLongitude,VenueCategory
"Azad Nagar, Indore",3,3,3,3,3,3
Bhanwar Kua,6,6,6,6,6,6
"Chandan Nagar, Indore",6,6,6,6,6,6
"Chandra Nagar, Indore",19,19,19,19,19,19
Dewas Naka,4,4,4,4,4,4
Geeta Bhawan,22,22,22,22,22,22
Khandwa Naka,4,4,4,4,4,4
L.I.G. Colony,8,8,8,8,8,8
"Malviya Nagar, Indore",22,22,22,22,22,22
Navlakha,6,6,6,6,6,6


In [22]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 45 unique categories.


In [23]:
venues_df['VenueCategory'].unique()

array(['Pool', 'Clothing Store', 'Garden Center', 'Coffee Shop', 'Lounge',
       'Tea Room', 'Chinese Restaurant', 'Restaurant', 'Café',
       'Pizza Place', 'Motel', 'Fast Food Restaurant', "Women's Store",
       'Bus Station', 'Health & Beauty Service', 'Hookah Bar',
       'Sandwich Place', 'Shopping Mall', 'Pub', 'Indian Restaurant',
       'Multiplex', 'Hotel', 'Bakery', 'Asian Restaurant',
       'Furniture / Home Store', 'Convenience Store', 'Plaza',
       'Movie Theater', 'Hot Dog Joint', 'Snack Place',
       'Italian Restaurant', 'Arts & Entertainment', 'Cricket Ground',
       'Ice Cream Shop', 'Dessert Shop', 'Market', 'Electronics Store',
       'Stadium', 'Burger Joint', 'Punjabi Restaurant', 'Gym',
       'Mediterranean Restaurant', 'BBQ Joint', 'Airport',
       'Grocery Store'], dtype=object)

In [24]:
#"Neighborhood" in venues_df['VenueCategory'].unique()

## **6. Analyze Each Neighborhood**

In [25]:
ionehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ionehot['Neighborhood'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ionehot.columns[-1]] + list(ionehot.columns[:-1])
ionehot = ionehot[fixed_columns]

print(ionehot.shape)
ionehot.head()

(200, 46)


Unnamed: 0,Neighborhood,Airport,Arts & Entertainment,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Bus Station,Café,Chinese Restaurant,...,Pool,Pub,Punjabi Restaurant,Restaurant,Sandwich Place,Shopping Mall,Snack Place,Stadium,Tea Room,Women's Store
0,"Azad Nagar, Indore",0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,"Azad Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Azad Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bhanwar Kua,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bhanwar Kua,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
# combine selected restaurant categories into a single 'rest' column
# (note: 'Asian Restaurant' and 'Punjabi Restaurant' are not part of this sum)
ionehot['rest'] = (ionehot['Mediterranean Restaurant'] + ionehot['Restaurant']
                   + ionehot['Indian Restaurant'] + ionehot['Chinese Restaurant']
                   + ionehot['Fast Food Restaurant'] + ionehot['Italian Restaurant'])
ionehot

Unnamed: 0,Neighborhood,Airport,Arts & Entertainment,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Bus Station,Café,Chinese Restaurant,...,Pub,Punjabi Restaurant,Restaurant,Sandwich Place,Shopping Mall,Snack Place,Stadium,Tea Room,Women's Store,rest
0,"Azad Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Azad Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Azad Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bhanwar Kua,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bhanwar Kua,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,"Vijay Nagar, Indore",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
196,"Vijay Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,1
197,"Vijay Nagar, Indore",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
198,"Vijay Nagar, Indore",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### **Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [28]:
i_grouped = ionehot.groupby(["Neighborhood"]).mean().reset_index()

print(i_grouped.shape)
i_grouped

(16, 47)


Unnamed: 0,Neighborhood,Airport,Arts & Entertainment,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Bus Station,Café,Chinese Restaurant,...,Pub,Punjabi Restaurant,Restaurant,Sandwich Place,Shopping Mall,Snack Place,Stadium,Tea Room,Women's Store,rest
0,"Azad Nagar, Indore",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bhanwar Kua,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,...,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.333333
2,"Chandan Nagar, Indore",0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667
3,"Chandra Nagar, Indore",0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,...,0.052632,0.0,0.0,0.052632,0.157895,0.0,0.0,0.0,0.0,0.210526
4,Dewas Naka,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Geeta Bhawan,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,...,0.0,0.0,0.0,0.045455,0.045455,0.045455,0.0,0.045455,0.0,0.272727
6,Khandwa Naka,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,L.I.G. Colony,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.375,0.0,...,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.25
8,"Malviya Nagar, Indore",0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,...,0.045455,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.227273
9,Navlakha,0.0,0.0,0.0,0.0,0.166667,0.0,0.333333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
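
Taking the mean of a one-hot column within a neighborhood gives the share of that neighborhood's venues that fall in the category, which is why each row sums to roughly 1 across the individual categories. The sketch below reproduces that calculation with the standard library on a tiny made-up venue list, so the arithmetic is easy to follow. The neighborhoods "A" and "B" and the function name `category_shares` are hypothetical.

```python
from collections import Counter, defaultdict


def category_shares(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up (neighborhood, category) pairs for illustration only
toy_venues = [
    ("A", "Café"), ("A", "Restaurant"), ("A", "Restaurant"), ("A", "Hotel"),
    ("B", "Pool"), ("B", "Café"),
]

shares = category_shares(toy_venues)
print(shares["A"]["Restaurant"])  # 2 of A's 4 venues are restaurants
```

A neighborhood with few venues, such as one with only three or four results, can therefore swing between 0.0 and 0.25 or more on a single venue, which is worth keeping in mind when comparing frequencies.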


In [29]:
len(i_grouped[i_grouped["rest"] > 0])

12

**Create a new DataFrame for Restaurant data only**

In [30]:
i_rest = i_grouped[["Neighborhood","rest"]]
i_rest

Unnamed: 0,Neighborhood,rest
0,"Azad Nagar, Indore",0.0
1,Bhanwar Kua,0.333333
2,"Chandan Nagar, Indore",0.166667
3,"Chandra Nagar, Indore",0.210526
4,Dewas Naka,0.0
5,Geeta Bhawan,0.272727
6,Khandwa Naka,0.0
7,L.I.G. Colony,0.25
8,"Malviya Nagar, Indore",0.227273
9,Navlakha,0.0


## **7. Cluster Neighborhoods**

In [31]:
kclusters = 3

i_clustering = i_rest.drop(columns=["Neighborhood"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(i_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 0, 0, 1, 0, 1, 0, 0, 1])

In [32]:
# create a new dataframe that holds the restaurant frequency and the cluster label for each neighborhood
i_merged = i_rest.copy()

# add clustering labels
i_merged["Cluster Labels"] = kmeans.labels_

In [33]:
i_merged.rename(columns={"rest": "Restaurant"}, inplace=True)
i_merged

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels
0,"Azad Nagar, Indore",0.0,1
1,Bhanwar Kua,0.333333,2
2,"Chandan Nagar, Indore",0.166667,0
3,"Chandra Nagar, Indore",0.210526,0
4,Dewas Naka,0.0,1
5,Geeta Bhawan,0.272727,0
6,Khandwa Naka,0.0,1
7,L.I.G. Colony,0.25,0
8,"Malviya Nagar, Indore",0.227273,0
9,Navlakha,0.0,1


In [34]:
i_merged = i_merged.join(idf.set_index("Neighborhood"), on="Neighborhood")

print(i_merged.shape)
i_merged

(16, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,"Azad Nagar, Indore",0.0,1,22.69658,75.88767
1,Bhanwar Kua,0.333333,2,22.68764,75.86291
2,"Chandan Nagar, Indore",0.166667,0,22.70938,75.82394
3,"Chandra Nagar, Indore",0.210526,0,22.74147,75.89447
4,Dewas Naka,0.0,1,22.771764,75.901659
5,Geeta Bhawan,0.272727,0,22.71822,75.88766
6,Khandwa Naka,0.0,1,22.689258,75.870259
7,L.I.G. Colony,0.25,0,22.73478,75.88485
8,"Malviya Nagar, Indore",0.227273,0,22.741695,75.899889
9,Navlakha,0.0,1,22.70479,75.87657


In [35]:
print(i_merged.shape)
i_merged.sort_values(["Cluster Labels"], inplace=True)
i_merged

(16, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
2,"Chandan Nagar, Indore",0.166667,0,22.70938,75.82394
3,"Chandra Nagar, Indore",0.210526,0,22.74147,75.89447
5,Geeta Bhawan,0.272727,0,22.71822,75.88766
7,L.I.G. Colony,0.25,0,22.73478,75.88485
8,"Malviya Nagar, Indore",0.227273,0,22.741695,75.899889
10,Palasia,0.272727,0,22.72769,75.88578
12,Rau Colony,0.25,0,22.73478,75.88485
0,"Azad Nagar, Indore",0.0,1,22.69658,75.88767
4,Dewas Naka,0.0,1,22.771764,75.901659
6,Khandwa Naka,0.0,1,22.689258,75.870259


**Finally, let's visualize the resulting clusters**

In [36]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(i_merged['Latitude'], i_merged['Longitude'], i_merged['Neighborhood'], i_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [37]:
#save map as html file
map_clusters.save('map_clusters.html')

## **8. Examine Clusters**

### **A. Cluster 0**

In [38]:
i_merged.loc[i_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
2,"Chandan Nagar, Indore",0.166667,0,22.70938,75.82394
3,"Chandra Nagar, Indore",0.210526,0,22.74147,75.89447
5,Geeta Bhawan,0.272727,0,22.71822,75.88766
7,L.I.G. Colony,0.25,0,22.73478,75.88485
8,"Malviya Nagar, Indore",0.227273,0,22.741695,75.899889
10,Palasia,0.272727,0,22.72769,75.88578
12,Rau Colony,0.25,0,22.73478,75.88485


### **B. Cluster 1**

In [39]:
i_merged.loc[i_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,"Azad Nagar, Indore",0.0,1,22.69658,75.88767
4,Dewas Naka,0.0,1,22.771764,75.901659
6,Khandwa Naka,0.0,1,22.689258,75.870259
9,Navlakha,0.0,1,22.70479,75.87657


### **C. Cluster 2**

In [40]:
i_merged.loc[i_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
1,Bhanwar Kua,0.333333,2,22.68764,75.86291
11,"Rajendra Nagar, Indore",0.333333,2,22.67416,75.84224
13,Satya Sai Chouraha,0.346154,2,22.755351,75.896877
14,"Sukhliya, Indore",0.5,2,22.76022,75.87815
15,"Vijay Nagar, Indore",0.384615,2,22.75641,75.89247
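
K-means label numbers are arbitrary, so "cluster 0" is not necessarily the lowest group. A quick way to interpret the three clusters is to average the restaurant frequency within each label and rank the labels by that average. The sketch below does this with the standard library, using the frequency and label values from the three tables above; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (restaurant frequency, cluster label) pairs copied from the cluster tables above
rows = [
    (0.166667, 0), (0.210526, 0), (0.272727, 0), (0.25, 0),
    (0.227273, 0), (0.272727, 0), (0.25, 0),
    (0.0, 1), (0.0, 1), (0.0, 1), (0.0, 1),
    (0.333333, 2), (0.333333, 2), (0.346154, 2), (0.5, 2), (0.384615, 2),
]

by_cluster = defaultdict(list)
for share, label in rows:
    by_cluster[label].append(share)

# average restaurant frequency per cluster label
cluster_means = {label: mean(values) for label, values in by_cluster.items()}

# labels ordered from lowest to highest average frequency
ranked = sorted(cluster_means, key=cluster_means.get)

for label in ranked:
    print(label, len(by_cluster[label]), round(cluster_means[label], 3))
```

Ranking by the cluster mean makes the ordering explicit instead of relying on the label numbers that a particular `random_state` happens to produce.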


## **Observations:**
1. Cluster 2 shows the highest share of restaurants among venues (about 0.33 to 0.50).
2. Cluster 0 shows a moderate share of restaurants (about 0.17 to 0.27) and covers the largest number of neighborhoods.
3. Cluster 1 shows no restaurants among the venues returned for its neighborhoods (a share of 0.0).  

**Based on these observations, anyone planning a restaurant in Indore is likely to benefit most from a location with the fewest existing restaurants, so that residents gain a dining option close to home.
This project therefore recommends that property developers and restaurant owners open new restaurants in the cluster 1 neighborhoods, which have little to no competition.
Those with a unique selling proposition that stands out from the competition can also consider the cluster 0 neighborhoods, which have moderate competition. Finally, developers are advised to avoid the cluster 2 neighborhoods, which already have a high concentration of restaurants and face intense competition.**