# Capstone Assignment | Restaurant locations in Mumbai
### Prabhav Pratyaksh, August 2021



## Table of Contents
* [Introduction](#Introduction)
* [Methodology](#Methodology)
  * [Importing Libraries](#ImportingLibraries)
  * [Webscraping Wikipedia Page](#Webscraping)
  * [Geocoding Stations](#Geocoding)
  * [Geographical Visualization using Folium](#Geographical)
  * [Foursquare API call](#Foursquare)
  * [Heatmap Visualization using Folium](#Heatmap)

* [Results & Discussion](#Results)
* [Conclusion](#Conclusion)



## Introduction <a name ="Introduction"></a>

Mumbai is India's second largest city and its financial capital. The city is home to around 12 million people and continues to attract thousands of migrants from rest of the country, thereby earning the moniker 'City of Dreams'.
The backbone of the city is indubitably its railway network, called the Mumbai Suburban Railway. The network consists of 6 lines and has an average daily ridership of around 7 million. It would not be wrong to say that the daily lives of the people in Mumbai heavily depends on the functioning of the railway network. As such, the network plays an important role in the economic progress of the city.

For this case study, we will adopt the perspective of a budding chef who wants to decide on the location of his new restaurant. Like the rest of the people in the city, he is acutely aware of the importance of railways in shuffling the people to and fro throughout the city. As such, he decides that any location that is close enough to the railway station will attract more customers and thereby make his restaurant business profitable.

Therefore, he decides to apply some data analytics to hone in on the location of his new restaurant. To do this, he decides on a list of simple criteria that he believes will deliver the most profits. These are:

1. The restaurant needs to be located near the stations on the Western line of the railway network. This is because of the higher ridership on the line compared to other lines

2. The restaurnt needs to be located within the Mumbai city only, and not the adjoining suburban regions. For this purpose, he will only consider locations south of Borivali (refer map above)

3. The restaurant should be within the walkable distance of the station. Therefore, locations within 500m of the station will be considered

4. The restaurant will serve Indian cuisine, Chinese cuisine, and also provide some fast food snacks

5. And finally, it would be ideal if the restaurant did not have any competing restaurants nearby, since they would also be vying for the same customers and will drive down his business

The data required for this exercise will be pretty trivial. We will use Foursquare API in conjunction with the list of the stations on the Western Line.

To obtain the list of the stations we will utilise [this Wikipedia page ](https://en.wikipedia.org/wiki/List_of_Mumbai_Suburban_Railway_stations) to get the list of stations. Once we get that, we can use the Foursquare API to request data for different venues around the station within a 500m radius.

## Methodology <a name ="Methodology"></a>

### Importing Libraries <a name ="ImportingLibraries"></a>

For this project, we will require BeautifulSoup , geocoder, and Folium libraries apart from the commonly used libraries

In [2]:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
!pip install geocoder
import geocoder
!pip install geopy
from geopy.geocoders import Nominatim 
import folium
import re
from folium import plugins
from folium.plugins import HeatMap
print("All libraries installed")

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
[?25l[K     |â–ˆâ–ˆâ–ˆâ–Ž                            | 10 kB 25.8 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–‹                         | 20 kB 30.0 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ                      | 30 kB 32.7 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–Ž                  | 40 kB 22.1 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–‹               | 51 kB 17.9 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ            | 61 kB 12.7 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–Ž        | 71 kB 14.1 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–‹     | 81 kB 15.5 MB/s eta 0:00:01[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ 

 ##### The below code gets our Foursquare client credentials and has been hidden

In [3]:
#@title
CLIENT_ID = 'HUV1GGBRIKOBC5R35UPGYMS2QB3QFPZJQP1UKEXV2QYKZNWJ' # your Foursquare ID
CLIENT_SECRET = '4QH5W3BP2QPOI2KJRYM3ZC4VJFWXWZM3CTPW3KGHWEFRHFF2' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

#Foursquare credentials
client_id = CLIENT_ID
client_secret = CLIENT_SECRET
version = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

{
    "tags": [
        "remove-cell"
    ]
}

Your credentials:
CLIENT_ID: HUV1GGBRIKOBC5R35UPGYMS2QB3QFPZJQP1UKEXV2QYKZNWJ
CLIENT_SECRET:4QH5W3BP2QPOI2KJRYM3ZC4VJFWXWZM3CTPW3KGHWEFRHFF2


{'tags': ['remove-cell']}

### Webscraping Wikipedia page for list of stations <a name ="Webscraping"></a>





In [4]:
response2 = requests.get("https://en.wikipedia.org/wiki/List_of_Mumbai_Suburban_Railway_stations").text
soup2 = BeautifulSoup(response2,'html5lib')

We filter for the tables, since our data is stored in a tabular form

In [5]:
table2 = soup2.find_all('table')
table3 = table2[1]

Finally, we scrape for the list of stations and convert that to a dataframe

In [6]:
station_df2 = pd.DataFrame(columns=['Station Name','Line'])
for rows in table3.find_all('tr'):
      cell = rows.find_all('td')
      if cell:
        station = cell[1].text
        line = cell[4].text
        station_df2 = station_df2.append({'Station Name':station,'Line':line}, ignore_index= True)

station_df2

Unnamed: 0,Station Name,Line
0,Airoli,Trans-Harbour Line
1,Aman Lodge,Central Line
2,Ambernath,Central Line
3,Ambivli,Central Line
4,Andheri,Western LineHarbour LineLine 1 (Mumbai Metro)
...,...,...
141,Nhava Sheva,Nerulâ€“Uran line
142,Ranjanpada,Nerulâ€“Uran line
143,Sagar Sangam,Nerulâ€“Uran line
144,Targhar,Nerulâ€“Uran line


Now, since our analysis is only focused on Western part of the city, we filter for stations on Western Line only. Post that, we append the word 'Station' to each entry. This is done to ensure that Nominatim geocoder can search for the coordinate data without any ambiguity.

In [7]:
station_df = station_df2[station_df2["Line"].str.contains("Western Line")]
#station_df

# Since we now know the stations, we can append 'Station' in to the name (this will help in searching the location on geocoder more  efficiently)
df = station_df[["Station Name"]]
#df

df['Station Name'] =df['Station Name'].astype(str) + " Station"
# df

df.reset_index(drop=True, inplace=True)
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Station Name
0,Andheri Station
1,Bandra Station
2,Bhayandar Station
3,Boisar Station
4,Borivali Station
5,Charni Road Station
6,Churchgate Station
7,Dadar Station
8,Dahanu Road Station
9,Dahisar Station


### Geocoding Stations <a name ="Geocoding"></a>

Now that we have the list of relevant stations, we use the geocoder to get the coordinates for these stations

In [8]:
geolocator = Nominatim(user_agent="mumbai_agent")

df["Latitude"] = df['Station Name'].apply(lambda x: geolocator.geocode(x)).apply(lambda x: x.latitude)
df["Longitude"] = df['Station Name'].apply(lambda x: geolocator.geocode(x)).apply(lambda x: x.longitude)

df

Unnamed: 0,Station Name,Latitude,Longitude
0,Andheri Station,19.119698,72.84642
1,Bandra Station,19.054928,72.840592
2,Bhayandar Station,19.310268,72.853097
3,Boisar Station,19.786338,72.79258
4,Borivali Station,19.229068,72.857363
5,Charni Road Station,18.952456,72.81744
6,Churchgate Station,18.935957,72.82734
7,Dadar Station,19.019282,72.842876
8,Dahanu Road Station,19.991524,72.743408
9,Dahisar Station,19.24945,72.859621


We notice that 3 stations have (obviously) incorrect data, because the names can be found in multiple countries. As such, we need to manually search in Nominatim and replace values in the dataframe.

In [9]:
df2 = df.copy() # Backup because I don't want to repeat the Nominatim call in case something goes wrong

In [10]:
# Manual search on Nominatim using the full location

df2.iloc[17,1:3] = (geolocator.geocode("Mahalaxmi Station, Mumbai").latitude,geolocator.geocode("Mahalaxmi Station, Mumbai").longitude)
df2.iloc[19,1:3] = (geolocator.geocode("Malad Station, Mumbai").latitude,geolocator.geocode("Malad Station, Mumbai").longitude)
df2.iloc[29,1:3] = (geolocator.geocode("Santacruz Station, Mumbai").latitude,geolocator.geocode("Santacruz Station, Mumbai").longitude)

df2

Unnamed: 0,Station Name,Latitude,Longitude
0,Andheri Station,19.119698,72.84642
1,Bandra Station,19.054928,72.840592
2,Bhayandar Station,19.310268,72.853097
3,Boisar Station,19.786338,72.79258
4,Borivali Station,19.229068,72.857363
5,Charni Road Station,18.952456,72.81744
6,Churchgate Station,18.935957,72.82734
7,Dadar Station,19.019282,72.842876
8,Dahanu Road Station,19.991524,72.743408
9,Dahisar Station,19.24945,72.859621


### Geographic Visualization using Folium <a name ="Geographical"></a>

In [11]:
# Placing Mumbai on a map
location = geolocator.geocode("Mumbai, India")
lat = location.latitude
lon = location.longitude
print(f"The coordinates of Mumbai are {lat,lon}")

The coordinates of Mumbai are (19.0759899, 72.8773928)


In [12]:
# Making a Folium map of all the stations in the dataframe

map_mum = folium.Map(location=[lat, lon], zoom_start=11)

# add markers to map
for Latitude, Longitude, station in zip(df2['Latitude'], df2['Longitude'], df2['Station Name']):
    label = '{}'.format(station)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [Latitude, Longitude],
        radius=500,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mum)  
    
map_mum

We see a map of Mumbai where the locations are marked along the Western Line of the suburban railway network.
However, we also see one station pretty far away from the city (Umroli) and the one station mistakenly placed on a separate line (Naigaon). This is because of inherent limitations of Nominatim database that does not have the exact location coordinates for these 2 stations *within* Mumbai. As such, to simplify our exercise, we can remove these 2 stations from our dataframe.

In [13]:
df2.drop([24,31], axis = 0, inplace=True)
df2.reset_index(drop=True, inplace=True)
df2

Unnamed: 0,Station Name,Latitude,Longitude
0,Andheri Station,19.119698,72.84642
1,Bandra Station,19.054928,72.840592
2,Bhayandar Station,19.310268,72.853097
3,Boisar Station,19.786338,72.79258
4,Borivali Station,19.229068,72.857363
5,Charni Road Station,18.952456,72.81744
6,Churchgate Station,18.935957,72.82734
7,Dadar Station,19.019282,72.842876
8,Dahanu Road Station,19.991524,72.743408
9,Dahisar Station,19.24945,72.859621


Now, we plot the map again using the modified dataframe

In [14]:
map_mum = folium.Map(location=[lat, lon], zoom_start=11)

# add markers to map
for Latitude, Longitude, station in zip(df2['Latitude'], df2['Longitude'], df2['Station Name']):
    label = '{}'.format(station)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [Latitude, Longitude],
        radius=500,
        popup=label,
        color='black',
        fill=True,
        fill_color='#DE71D1',
        fill_opacity=0.18,
        parse_html=False).add_to(map_mum)  

map_mum

Now we are ready to get venues from Foursquare for all the specified locations in the dataframe

### Foursquare API call <a name ="Foursquare"></a>

In [15]:
# Defining a function to extract 100 nearby venues within a 500m radius

def get_nearby_venues(names, lats, lngs, radius=500, limit=100):
    venues_list = []
    for name, lat, lng in zip(names, lats, lngs):
        # specify the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id,
            client_secret,
            version,
            lat,
            lng,
            radius,
            limit)
        # make the request, store the response
        results = requests.get(url).json()['response']['groups'][0]['items']
        # extract relevant information from each venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    # populate the dataframe with venues list
    venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    venues.columns = ['Station',
                      'Station Latitude',
                      'Station Longitude',
                      'Venue',
                      'Venue Latitude',
                      'Venue Longitude',
                      'Venue Category']
    return(venues)

In [16]:
# Using the above function on our dataframe

all_venues = get_nearby_venues(
    df2['Station Name'],
    df['Latitude'],
    df['Longitude']
)
all_venues

Unnamed: 0,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Andheri Station,19.119698,72.846420,Merwans Cake shop,19.119300,72.845418,Bakery
1,Andheri Station,19.119698,72.846420,Narayan Sandwich,19.121398,72.850270,Sandwich Place
2,Andheri Station,19.119698,72.846420,McDonald's,19.119691,72.846102,Fast Food Restaurant
3,Andheri Station,19.119698,72.846420,Cafe Alfa,19.119667,72.843560,Indian Restaurant
4,Andheri Station,19.119698,72.846420,Vaibhav Restaurant,19.118235,72.847991,Indian Restaurant
...,...,...,...,...,...,...,...
361,Virar Station,19.382668,72.832025,Celebrity Hotel,19.382698,72.828143,Indian Restaurant
362,Virar Station,19.382668,72.832025,Kraft Bakery,19.382571,72.829485,Bakery
363,Virar Station,19.382668,72.832025,Woodland,19.380517,72.829270,Clothing Store
364,Virar Station,19.382668,72.832025,Globalnet Computers,19.381249,72.827964,Electronics Store


In [17]:
all_venues["Station"].nunique()

28

Do note that we only have data for 28 stations from Foursquare

We can check the categories that are populated in the Venue Category column of the dataframe

In [18]:
all_venues['Venue Category'].unique()

array(['Bakery', 'Sandwich Place', 'Fast Food Restaurant',
       'Indian Restaurant', 'Restaurant', 'Food Court', 'Platform',
       'CafÃ©', 'Convenience Store', 'Train Station', 'Design Studio',
       'Italian Restaurant', 'Paper / Office Supplies Store', 'Brewery',
       'Punjabi Restaurant', 'Pier', 'Pub', 'Lake', 'Ice Cream Shop',
       'Food Truck', 'Bike Rental / Bike Share', 'Pizza Place',
       'Department Store', 'Chinese Restaurant', 'Clothing Store',
       'Burger Joint', 'Snack Place', 'Liquor Store', 'Juice Bar',
       'Harbor / Marina', 'Breakfast Spot', 'Beach', 'Gastropub',
       'Opera House', 'Indie Movie Theater',
       'Multicuisine Indian Restaurant', 'Asian Restaurant',
       'Coffee Shop', 'Gym', 'Grocery Store', 'Aquarium', 'Theater',
       'Music Venue', 'Cricket Ground', 'College Academic Building',
       'General Entertainment', 'Hotel', 'Gym / Fitness Center', 'Lounge',
       'Donut Shop', 'Hockey Arena', 'Japanese Restaurant', 'Flea Market',
 

We convert the previous dataframe into a one-hot encoded table

In [19]:
# one hot encoding
all_venues_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
all_venues_onehot['Station'] = all_venues['Station'] 
all_venues_onehot = all_venues_onehot.groupby('Station').mean().reset_index()

all_venues_onehot.head()

Unnamed: 0,Station,American Restaurant,Amphitheater,Aquarium,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Beach,Beer Bar,Beer Garden,Bengali Restaurant,Bike Rental / Bike Share,Bookstore,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,CafÃ©,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Academic Building,College Technology Building,Concert Hall,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Donut Shop,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,...,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Venue,Nightclub,Office,Opera House,Paper / Office Supplies Store,Park,Pier,Pizza Place,Platform,Playground,Plaza,Post Office,Pub,Punjabi Restaurant,Recreation Center,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Waterfront,Wine Shop,Women's Store,Yoga Studio
0,Andheri Station,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bandra Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0
2,Bhayandar Station,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0
3,Borivali Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Charni Road Station,0.0,0.0,0.027778,0.027778,0.0,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0


Then, using the encoded table, we make another table that shows the 10 most common venues near the stations

In [20]:
# Displaying top venues in each neighbourhood
def top_venues(row, num_venues):
    row_cats = row.iloc[1:]
    row_cats_sorted = row_cats.sort_values(ascending=False)
    return row_cats_sorted.index.values[0:num_venues]

num_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
cols = ['Station']
for i in np.arange(num_venues):
    try:
        cols.append(f"{i+1}{indicators[i]} Most Common Venue")
    except:
        cols.append(f"{i+1}th Most Common Venue")

# create a dataframe of 10 most common venues by neighborhood
all_common = pd.DataFrame(columns=cols)
all_common['Station'] = all_venues_onehot['Station']

for i in np.arange(all_venues_onehot.shape[0]):
    all_common.iloc[i, 1:] = top_venues(all_venues_onehot.iloc[i, :], num_venues)

all_common

Unnamed: 0,Station,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Andheri Station,Indian Restaurant,Fast Food Restaurant,Restaurant,Food Court,Sandwich Place,Bakery,CafÃ©,Platform,Falafel Restaurant,Electronics Store
1,Bandra Station,Indian Restaurant,CafÃ©,Paper / Office Supplies Store,Brewery,Platform,Pier,Italian Restaurant,Pub,Convenience Store,Design Studio
2,Bhayandar Station,Ice Cream Shop,Bakery,Bike Rental / Bike Share,Food Truck,Train Station,Lake,Pizza Place,Farmers Market,Falafel Restaurant,Electronics Store
3,Borivali Station,Ice Cream Shop,Clothing Store,Chinese Restaurant,Restaurant,CafÃ©,Department Store,Burger Joint,Liquor Store,Food Truck,Snack Place
4,Charni Road Station,Indian Restaurant,Juice Bar,Ice Cream Shop,Restaurant,Harbor / Marina,Coffee Shop,CafÃ©,Opera House,Music Venue,Multicuisine Indian Restaurant
5,Churchgate Station,Indian Restaurant,Hotel,Ice Cream Shop,Cricket Ground,Fast Food Restaurant,Train Station,Movie Theater,CafÃ©,Cantonese Restaurant,Chinese Restaurant
6,Dadar Station,Indian Restaurant,Fast Food Restaurant,Movie Theater,Coffee Shop,CafÃ©,Flea Market,Maharashtrian Restaurant,Lounge,Plaza,Farmers Market
7,Dahanu Road Station,Train Station,Mobile Phone Shop,Fast Food Restaurant,Yoga Studio,Flea Market,Cricket Ground,Deli / Bodega,Department Store,Design Studio,Dessert Shop
8,Dahisar Station,Restaurant,Indian Restaurant,Train Station,Pizza Place,CafÃ©,Yoga Studio,Fast Food Restaurant,Cricket Ground,Deli / Bodega,Department Store
9,Goregaon Station,Indian Restaurant,Fast Food Restaurant,Bar,Design Studio,Seafood Restaurant,CafÃ©,Snack Place,Bookstore,Mobile Phone Shop,Vegetarian / Vegan Restaurant


Now, we only need locations for restaurants since other places won't compete with restaurants

In [21]:
df3 = all_venues.loc[all_venues["Venue Category"].str.contains("Restaurant"),:]
df3.reset_index(drop=True, inplace=True)
df3

Unnamed: 0,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Andheri Station,19.119698,72.846420,McDonald's,19.119691,72.846102,Fast Food Restaurant
1,Andheri Station,19.119698,72.846420,Cafe Alfa,19.119667,72.843560,Indian Restaurant
2,Andheri Station,19.119698,72.846420,Vaibhav Restaurant,19.118235,72.847991,Indian Restaurant
3,Andheri Station,19.119698,72.846420,Amar Restaurant,19.118193,72.845210,Restaurant
4,Andheri Station,19.119698,72.846420,McDonald's,19.118411,72.848002,Fast Food Restaurant
...,...,...,...,...,...,...,...
112,Ram Mandir Station,19.014881,72.827956,Royale Lebanese Wraps,19.017249,72.830762,Falafel Restaurant
113,Ram Mandir Station,19.014881,72.827956,Downtown Dhaba,19.018531,72.830612,Indian Restaurant
114,Vile Parle Station,19.882991,72.763166,Annapurna Vadapav Center,19.884150,72.762802,Fast Food Restaurant
115,Virar Station,19.382668,72.832025,Dhuri Food Plaza,19.383414,72.830915,Multicuisine Indian Restaurant


Let's say our restaurant will serve only fast foods, Chinese, and Indian items. Let's filter for those

In [22]:
df4 = df3[(df3['Venue Category'] == 'Indian Restaurant') | (df3['Venue Category'] == 'Fast Food Restaurant') | (df3['Venue Category'] == 'Chinese Restaurant') ]
df4.groupby(["Venue Category"]).count()

Unnamed: 0_level_0,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Chinese Restaurant,12,12,12,12,12,12
Fast Food Restaurant,21,21,21,21,21,21
Indian Restaurant,49,49,49,49,49,49


### Heatmap Visualization using Folium <a name = "Heatmap"></a>

Now we need to add the locations of these restaurants to the map of the stations (plotted earlier). We will use a heatmap layer to display where the concentration of the restaurants is the highest and the lowest.

In [23]:
heat_df = df4[['Venue Latitude', 'Venue Longitude']]

In [24]:
heat_data = [[row['Venue Latitude'],row['Venue Longitude']] for index, row in heat_df.iterrows()]
HeatMap(heat_data,name = 'heat map', min_opacity= 0.75, radius= 15, overlay=True).add_to(map_mum)
#  gradient= {0.15: "#150AE9", 0.35: "#AFC209", 0.58: "#D02727"}

mp = folium.map.LayerControl(position = 'topright', collapsed = True)    
mp.add_to(map_mum)

map_mum

This map can help us look at the potential locations for our restaurant. It shows a 500m circle around the stations and through the heatmap, displays the concentration of competing restaurants in the vicinity of the stations.

Do note that the heatmap is added a separate layer on top of out of original map. We can toggle this layer on/off for greater clarity by using the widget on the top right corner.

Now, we come to the final phase of our analysis. The map rendered above should give us a quick visual idea of which stations are ideal locations for setting up our restaurant. 
Starting from the southern end of the line, **Mahalaxmi** looks like a good candidate. However, we will not iclude this as Foursquare does not have venue data for this location. Moving northwards, **Matunga Road** looks like a promising candidates. Further noth, we see **Santacruz**, **Vile Parle**, **Ram Mandir** and finally **Malad** with very few competing restaurants around (we are limiting our analysis to the city centre only, i.e. not including suburban townships around Mumbai. Therefore, stations north of Borivali are not considered).
We can explore these 5 stations in a bit more detail.

## Results & Discussion <a name ="Results"></a>

Now that we have narrowed down the location to these 5 stations, we can look at these more closely. To do that, we will filter for these stations from our original dataframe to look at the venue categories around these.

In [25]:
locations = ['Matunga Road Station', 'Santacruz Station','Vile Parle Station', 'Ram Mandir Station', 'Malad Station']
df5 = all_venues[all_venues["Station"].isin(locations)]
df6 = all_common[all_common["Station"].isin(locations)]

In [26]:
df6

Unnamed: 0,Station,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Malad Station,American Restaurant,Pizza Place,Gas Station,Bar,Post Office,Burger Joint,Farmers Market,Cricket Ground,Deli / Bodega,Department Store
19,Matunga Road Station,Train Station,Department Store,Indian Restaurant,Gym,Bar,Market,Concert Hall,Sandwich Place,South Indian Restaurant,Chinese Restaurant
22,Ram Mandir Station,Indian Restaurant,CafÃ©,Chinese Restaurant,Bakery,Electronics Store,Yoga Studio,Falafel Restaurant,Mobile Phone Shop,Coffee Shop,Dessert Shop
23,Santacruz Station,Train Station,Multiplex,Tea Room,Yoga Studio,Flea Market,Cricket Ground,Deli / Bodega,Department Store,Design Studio,Dessert Shop
26,Vile Parle Station,Fast Food Restaurant,Yoga Studio,Concert Hall,Cricket Ground,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Donut Shop,Electronics Store


In [29]:
df5["Venue Category"].value_counts()

Indian Restaurant                6
Train Station                    4
Chinese Restaurant               3
CafÃ©                             3
Electronics Store                2
Pizza Place                      2
Bar                              2
Bakery                           2
Fast Food Restaurant             2
Multiplex                        2
Department Store                 2
Burger Joint                     1
South Indian Restaurant          1
Yoga Studio                      1
Vegetarian / Vegan Restaurant    1
Gas Station                      1
Market                           1
American Restaurant              1
Bus Station                      1
Post Office                      1
Coffee Shop                      1
Sandwich Place                   1
Concert Hall                     1
Theater                          1
Mobile Phone Shop                1
Furniture / Home Store           1
Smoke Shop                       1
Tea Room                         1
Falafel Restaurant 

Note that for **Ram Mandir** station and **Vile Parle** station, we have Indian and Fast Food restaurants as the most common venue. Hence, we can remove these from our scope. Therefore, we have **Malad**, **Matunga Road**, and **Santacruz** stations as the best locations for opening our restaurant.


## Conclusion <a name ="Conclusion"></a>

We started with a premise of finding suitable locations for our new restaurant. To simplify this task, we came up with certain assumptions and based on those, looked for ideal locations. Leveraging Foursquare API and the Folium library, we did manage to find suitable locations.

Of course, this was a very simplistic analysis. We could have used even more complex criteria to scout for locations such as real estate prices, building regulations, zoning requirements, median income level in the neighbourhood, availability of raw goods (logistic supply chain) and utilities, etc. However, as a starting point, this analysis helped us to narrow down potential locations and allow a starting point from which more complex analysis could be done. Much of this code and the resulting analysis can be reused in different modules that aim to do more complex analysis.


Thank you for taking the time to review this notebook! ðŸ˜ƒ