# Opening a New Restaurant in Indore, India
## Week 5 Final Report
* Build a DataFrame of neighborhoods in Indore, India, by scraping a Wikipedia category page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from the Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster in which to open a new restaurant


In [None]:
!conda install -c conda-forge folium=0.5.0 --yes
#!conda install -c conda-forge geopy --yes 
!conda install -c conda-forge geocoder --yes


## **1. Importing Libraries**

In [None]:
import pandas as pd
import numpy as np
import json

from geopy.geocoders import Nominatim
import geocoder

import requests
from bs4 import BeautifulSoup

from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium
print("Libraries imported.")

## **2. Scrape data from the Wikipedia page into a DataFrame**

In [None]:
data=requests.get('https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Indore').text

In [None]:
soup = BeautifulSoup(data, 'html.parser')

In [None]:
neighborhoodList = []
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [None]:
idf = pd.DataFrame({"Neighborhood": neighborhoodList})

idf

## **3. Get the geographical coordinates**

In [None]:
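#import geocoder
def get_latlng(neighborhood, max_attempts=5):
    # retry a few times, since the geocoder can intermittently return no result
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Indore, India'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # fail loudly instead of looping forever when no coordinates come back
    raise RuntimeError('Could not geocode {}'.format(neighborhood))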
    



In [None]:
coords = [ get_latlng(neighborhood) for neighborhood in idf["Neighborhood"].tolist() ]

In [None]:
coords

In [None]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])


In [None]:
idf['Latitude'] = df_coords['Latitude']
idf['Longitude'] = df_coords['Longitude']

In [None]:
print(idf.shape)
idf

In [None]:
# save the DataFrame as CSV file
idf.to_csv("idf.csv", index=False)

## **4. Map of Indore with neighborhoods superimposed on top**

In [None]:
#indore map
geolocator = Nominatim(user_agent="i_explorer")
city ="Indore"
country ="India"
loc = geolocator.geocode(city+','+ country)
latitude = loc.latitude
longitude = loc.longitude
print("latitude is :-" ,latitude,"\nlongitude is:-" ,longitude)
map_indore = folium.Map([latitude, longitude], zoom_start=10)
map_indore

In [None]:
#adding markers to map
for lat, lng, neighborhood in zip(idf['Latitude'], idf['Longitude'], idf['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_indore)  
    
map_indore

In [None]:
#save map as html file
map_indore.save('map_indore.html')

## **5. Use the Foursquare API to explore the neighborhoods**

In [None]:
# placeholders: substitute your own Foursquare credentials and keep them out of shared notebooks
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'

In [None]:
radius = 15000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(idf['Latitude'], idf['Longitude'], idf['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
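
The loop above indexes directly into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups, or a venue without a category, would raise an exception partway through the run. Below is a minimal sketch of a more defensive extraction step. The helper name `extract_venues`, the `"Uncategorized"` fallback, and the sample payload are hypothetical and introduced only for illustration. Only the nesting of the payload mirrors what the loop above reads.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into the tuples collected in `venues`."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use a placeholder instead of raising IndexError when no category is listed.
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written, hypothetical payload that mimics the structure indexed by the loop above.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Venue One",
                           "location": {"lat": 22.72, "lng": 75.86},
                           "categories": [{"name": "Indian Restaurant"}]}},
                {"venue": {"name": "Venue Two",
                           "location": {"lat": 22.73, "lng": 75.87},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample, "Example Area", 22.72, 75.86)
empty = extract_venues({}, "Example Area", 22.72, 75.86)
```

In the notebook loop, `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))` would take the place of the inner `for venue in results` block.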

In [None]:
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VLatitude', 'VLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df

In [None]:
venues_df.groupby(["Neighborhood"]).count()

In [None]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

In [None]:
venues_df['VenueCategory'].unique()

In [None]:
#"Neighborhood" in venues_df['VenueCategory'].unique()

## **6. Analyze Each Neighborhood**

In [None]:
ionehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ionehot['Neighborhood'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ionehot.columns[-1]] + list(ionehot.columns[:-1])
ionehot = ionehot[fixed_columns]

print(ionehot.shape)
ionehot.head()

In [None]:
# combine the restaurant categories into a single 'rest' column
ionehot['rest']=ionehot['Mediterranean Restaurant']+ionehot['Restaurant']+ionehot['Indian Restaurant']+ionehot['Chinese Restaurant']+ionehot['Fast Food Restaurant']
#+ionehot['Italian Restaurant']
#+ionehot['Greek Restaurant']
ionehot
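
Summing hard-coded column names raises a `KeyError` whenever one of those categories is missing from the Foursquare results. One possible alternative, sketched below on a made-up one-hot frame, is to sum every column whose name contains "Restaurant". This is an assumption about what should count as a restaurant: it would also include categories that the explicit sum above leaves out. The frame `onehot` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical one-hot frame standing in for `ionehot`.
onehot = pd.DataFrame({
    "Neighborhood": ["Area A", "Area A", "Area B"],
    "Indian Restaurant": [1, 0, 0],
    "Chinese Restaurant": [0, 0, 1],
    "Hotel": [0, 1, 0],
})

# Pick the restaurant columns by name rather than listing them by hand.
restaurant_cols = [col for col in onehot.columns if "Restaurant" in col]

# Row-wise sum: 1 if the venue falls in any restaurant category, else 0.
onehot["rest"] = onehot[restaurant_cols].sum(axis=1)
```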

### **Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [None]:
i_grouped = ionehot.groupby(["Neighborhood"]).mean().reset_index()

print(i_grouped.shape)
i_grouped
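
Each venue contributes a 1 to exactly one category column, so the per-neighborhood mean of a column is the share of that neighborhood's venues that fall in that category. The sketch below reproduces the same arithmetic with the standard library only. The venue pairs are made up for illustration and are not taken from the Foursquare results.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) pairs standing in for rows of venues_df.
venues = [
    ("Area A", "Indian Restaurant"),
    ("Area A", "Cafe"),
    ("Area A", "Hotel"),
    ("Area A", "Indian Restaurant"),
    ("Area B", "Hotel"),
    ("Area B", "Park"),
]

# Count venues per category within each neighborhood.
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Divide by the neighborhood's venue total: this is what mean() yields on one-hot columns.
frequencies = {
    neighborhood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
    for neighborhood, cats in counts.items()
}
```

Here two of the four venues in "Area A" are Indian restaurants, so its frequency for that category is 0.5, while "Area B" has no entry for it, which corresponds to a 0 in the grouped DataFrame.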

In [None]:
len(i_grouped[i_grouped["rest"] > 0])

**Create a new DataFrame for Restaurant data only**

In [None]:
i_rest = i_grouped[["Neighborhood","rest"]]
i_rest

## **7. Cluster Neighborhoods**

In [None]:
kclusters = 3

i_clustering = i_rest.drop(["Neighborhood"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(i_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
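
With a single feature (the `rest` column), k-means amounts to repeatedly assigning each value to its nearest centroid and then moving each centroid to the mean of its members. The sketch below illustrates that loop in plain Python. It is not scikit-learn's implementation, which by default uses k-means++ initialization. The function `kmeans_1d`, its deterministic starting centroids, and the sample values are assumptions made for this example.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values with Lloyd's algorithm; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Hypothetical restaurant shares for seven neighborhoods.
shares = [0.00, 0.02, 0.01, 0.20, 0.22, 0.50, 0.52]
labels, centroids = kmeans_1d(shares, k=3)
```

The label numbers only identify groups. They carry no ordering, so which label ends up with the highest restaurant share depends on the initialization.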

In [None]:
# create a new dataframe that includes the cluster label for each neighborhood
i_merged = i_rest.copy()

# add clustering labels
i_merged["Cluster Labels"] = kmeans.labels_

In [None]:
i_merged.rename(columns={"rest": "Restaurant"}, inplace=True)
i_merged

In [None]:
i_merged = i_merged.join(idf.set_index("Neighborhood"), on="Neighborhood")

print(i_merged.shape)
i_merged

In [None]:
print(i_merged.shape)
i_merged.sort_values(["Cluster Labels"], inplace=True)
i_merged

**Finally, let's visualize the resulting clusters**

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(i_merged['Latitude'], i_merged['Longitude'], i_merged['Neighborhood'], i_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
#save map as html file
map_clusters.save('map_clusters.html')

## **8. Examine Clusters**

### **A. Cluster 0**

In [None]:
i_merged.loc[i_merged['Cluster Labels'] == 0]

### **B. Cluster 1**

In [None]:
i_merged.loc[i_merged['Cluster Labels'] == 1]

### **C. Cluster 2**

In [None]:
i_merged.loc[i_merged['Cluster Labels'] == 2]

## **Observations:**
1. Cluster 2 shows the highest number of restaurants.
2. Cluster 0 shows an average number of restaurants in most of its areas.
3. Cluster 1 shows the lowest number of restaurants.  
**Based on these observations, a new restaurant in Indore is best placed where existing restaurants are fewest, so that residents gain a dining option close to home.
Therefore, this project recommends that property developers and restaurant owners open new restaurants in the neighborhoods of cluster 1, which have little to no competition.
Those with a unique selling proposition that sets them apart from the competition can also open new restaurants in the neighborhoods of cluster 0, which have moderate competition. Lastly, property developers are advised to avoid the neighborhoods of cluster 2, which already have a high concentration of restaurants and face intense competition.**
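
Because k-means label numbers are arbitrary, the ordering of clusters by restaurant concentration can be checked programmatically rather than by reading the tables. The sketch below ranks clusters by their mean restaurant share using only the standard library. The `(label, share)` pairs are hypothetical stand-ins for the `Cluster Labels` and `Restaurant` columns of `i_merged`, not the actual results.

```python
from collections import defaultdict

# Hypothetical (cluster label, restaurant share) pairs standing in for i_merged rows.
rows = [(0, 0.10), (0, 0.12), (1, 0.00), (1, 0.02), (2, 0.30), (2, 0.34)]

# Collect the restaurant shares of each cluster.
shares_by_cluster = defaultdict(list)
for label, share in rows:
    shares_by_cluster[label].append(share)

# Mean restaurant share per cluster, sorted from least to most competition.
ranking = sorted(
    (sum(shares) / len(shares), label) for label, shares in shares_by_cluster.items()
)
least_competitive = ranking[0][1]
most_competitive = ranking[-1][1]
```

With the notebook's DataFrame, `i_merged.groupby("Cluster Labels")["Restaurant"].mean().sort_values()` should give the same ranking in one line.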