# <center><font color='white'>__I am about to open a restaurant in South Mumbai!__</font></center>
## <center><font color='yellow'>__But which type of restaurant should I set up?__</font></center>
### <center><font color='green'>__Data science can help us answer this question__</font></center>
***
## Table of Contents

* [Introduction]()
* [API Based Data Collection]()
* [Data Preprocessing/ Cleaning]()
* [Let's Visualize Data]()

# Introduction

**Competition** can be an important factor in deciding what type of restaurant to set up in a particular locality. Multiple factors come into play, such as the pre-existing restaurants in the area, their ratings, locations, prices, etc.
For anyone who wants to switch to the hotel business and start a food franchise in **South Mumbai, India**, data science can help analyse this question.<br><br>
The goal of this project is to suggest the types of food franchise that could be set up in South Mumbai, based on historical data about the existing restaurants. Using the **FourSquare API** and the **Zomato API**, it is possible to extract data on existing food businesses in the locality and draw a conclusion about which type of restaurant has a **higher probability of being profitable**.<br><br>
Once the data about existing businesses in the area has been collected from the two APIs, we can clean it and look for possible correlations between a business's rating, location, price, etc. Once the data is curated, the user can be given a suggestion about which types of restaurant are lacking in which parts of South Mumbai, and this can **help the user come closer to making a decision**. <br><br>

# API Based Data Collection
_Two main APIs cover most of the data requirements for this project:_
* **FourSquare API**: This API collects all the venues within a specified radius. We are going to analyse a radius of up to 8 km to 10 km, depending on the data requirement and on the number of API calls that can be made.
* **Zomato API**: The venues fetched from the FourSquare API are used as input to this API, which returns each venue's rating, price range, etc.
### Now let us view South Mumbai using the folium Map library

**South Bombay or South Mumbai** is the Mumbai City district, the southernmost precinct of Greater Mumbai. It extends from Colaba to Mahim. It comprises the city's main business localities, making it the wealthiest urban precinct in India. Property prices in South Mumbai are by far the highest in India and among the highest in the world.<br>
Let us view the map of South Mumbai using the folium library in Python.

In [9]:

import folium

TARGET_LATITUDE = 18.940
TARGET_LONGITUDE = 72.826
TARGET = 'South Mumbai'

target_map = folium.Map(location=[TARGET_LATITUDE, TARGET_LONGITUDE], zoom_start=13)
folium.Marker(location=[TARGET_LATITUDE, TARGET_LONGITUDE]).add_to(target_map)
target_map



### Now we can begin fetching the venues in South Mumbai within a fixed radius of the target latitude and longitude (the code below sets `radius = 4000`, i.e. 4 km).<br>
**The Foursquare API has an explore endpoint that returns venue recommendations within a given radius of the given coordinates. We will use it to find all the venues we need.**
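
Because the endpoint returns results one page at a time, collecting every venue means repeating the request with a growing offset until a page comes back empty. The sketch below illustrates only that loop. `fetch_page` is a hypothetical stand-in for the real HTTP call, and the page size of 50 and the total of 109 fake items are assumptions chosen for the illustration.

```python
def fetch_page(offset, limit, total=109):
    """Hypothetical stand-in for one API call: returns up to `limit` fake venue ids."""
    return list(range(offset, min(offset + limit, total)))


def fetch_all(limit=50):
    """Collect every page, advancing the offset by the number of items received."""
    collected = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:            # an empty page means nothing is left to fetch
            break
        collected.extend(page)
        offset += len(page)     # step by what was returned, so no item is skipped
    return collected


venues = fetch_all()
print(len(venues))
```

Advancing the offset by the number of items actually returned, rather than by a fixed amount, avoids skipping results when the service returns fewer items than the requested limit.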

We first import all the required libraries, except folium, which was imported above.

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import requests
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0

In [11]:
FOURSQUARE_CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # add your own client ID here
FOURSQUARE_CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # add your own client secret here
FOURSQUARE_VERSION = '20200412'
TOTAL_VENUES = 100 # Foursquare may return only about 50 venues per call when categoryId is given
radius = 4000 # 4 kilometres
offset = 0
categoryId = "4d4b7105d754a06374d81259" # this ID represents the Food category of venues
target_venues = 0 # running total of fetched venues; it should reach 100 or more for a better prediction rate at the end
foursquare_venues = pd.DataFrame(columns = ['name', 'categories', 'lat', 'lng']) # final venues dataframe
fetched_venues = 0 # number of venues returned by the latest call

In [4]:
def get_category_type(row):
    try:
        category_list = row['categories']
    except KeyError: # the raw Foursquare frame names the column 'venue.categories'
        category_list = row['venue.categories']
    if len(category_list) == 0:
        return None
    else:
        return category_list[0]['name']

In [7]:
offset = 0
while True:
    url = ('https://api.foursquare.com/v2/venues/explore?categoryId={}&client_id={}'
           '&client_secret={}&v={}&ll={},{}&radius={}&limit={}&offset={}').format(
               categoryId, FOURSQUARE_CLIENT_ID, FOURSQUARE_CLIENT_SECRET,
               FOURSQUARE_VERSION, TARGET_LATITUDE, TARGET_LONGITUDE,
               radius, TOTAL_VENUES, offset)
    result = requests.get(url).json()
    # In the JSON, the required list is under the key 'items'
    items = result['response']['groups'][0]['items']
    fetched_venues = len(items)
    target_venues = target_venues + fetched_venues
    if fetched_venues == 0:
        break  # no more venues are available

    venues = json_normalize(items)
    # Out of everything, only these columns are required
    required_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    venues = venues.loc[:, required_columns]

    # Each 'categories' entry is a list of dictionaries; get_category_type
    # reduces it to a single category name
    venues['venue.categories'] = venues.apply(get_category_type, axis=1)

    # Simplify the column names, e.g. venue.name becomes name
    venues.columns = [col.split('.')[-1] for col in venues.columns]

    # Append the venues to the final dataframe
    foursquare_venues = pd.concat([foursquare_venues, venues], axis=0, sort=False)

    # Each call returns a limited number of venues, so call the URL again with the offset moved by 100
    if target_venues > 100:
        print('break')
        print()
        break
    else:
        offset = offset + 100
        print("offset changed")

# Rebuild the index after the loop so it is unique whichever way the loop ends
foursquare_venues.reset_index(drop=True, inplace=True)
foursquare_venues.shape


offset changed
offset changed


(109, 4)

In [8]:
# Changing the offset can introduce duplicate rows; a simple drop_duplicates removes them
foursquare_venues.drop_duplicates(inplace=True)
foursquare_venues.shape

(109, 4)

In [9]:
print("Total Venues: {}".format(len(foursquare_venues)))

Total Venues: 109


**Thus we have extracted 109 unique venues in South Mumbai using the Foursquare API.**<br><br>

## Zomato API
The Zomato API provides various forms of access. Depending on what a project needs, such as cuisines, daily menus, reviews, etc., all of this data can be accessed through **Zomato's REST API**.<br>
Accessing the data through the API requires a **user access key**, which is obtained from the developer website by submitting the necessary information. For this analysis, Zomato's **search API** will be used, which searches for a particular venue based on its name, latitude, longitude, etc. Since these three main search values are already available from the Foursquare API, they can be used to determine the venue's other attributes.<br><br>
**Inputs to the API call include:**
* _name_ of the venue itself
* _Latitude_ and _Longitude_ of the venue
* Since we have the precise latitude and longitude, the _count_ value will be 1, because we need that one venue and not a collection
* The _start_ value is the offset into the list of results, which helps when count is more than 1 or when a collection of restaurants is required. We do not need it here, so it is kept at _0_.
* _sort_ is based on real_distance, so that each time we get the venue we are searching for based on its location coordinates.
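
The inputs listed above can be assembled into a query string as in the following sketch. It only builds the URL and does not contact the API. The helper name `build_search_url` is hypothetical, and `urlencode` from the standard library is used here on the assumption that venue names may contain spaces, apostrophes or `&`, which must be percent-encoded to survive as a single `q` value.

```python
from urllib.parse import urlencode


def build_search_url(name, lat, lng, start=0, count=1):
    """Build a search URL with every parameter percent-encoded (illustrative helper)."""
    base = 'https://developers.zomato.com/api/v2.1/search'
    params = {
        'q': name,                 # venue name from the Foursquare data
        'start': start,            # offset into the result list
        'count': count,            # one venue is enough when coordinates are precise
        'lat': lat,
        'lon': lng,
        'sort': 'real_distance',   # nearest match first
    }
    return base + '?' + urlencode(params)


url = build_search_url("Pete's Pizzeria & Kitchen", 18.9881, 72.8274)
print(url)
```

Without the encoding step, the `&` inside a name such as "Pete's Pizzeria & Kitchen" would be read as a parameter separator and the search term would be cut short.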



In [12]:
zomato_start = 0
error_list = [] # stores indexes of venues whose data cannot be provided by Zomato
zomato_count = 1
zomato_cols = ['name', 'lat', 'lng', 'avg_2_cost','price_range','agg_rating', 'rating_text','address', 'votes', 'review_count', 'cuisine']
header = {'user-key': 'YOUR_ZOMATO_API_KEY'} # add your own key here
# rows collected from the API, one per Foursquare venue
final_venue = []
url = 'https://developers.zomato.com/api/v2.1/search'
for index, row in foursquare_venues.iterrows():
    # Passing the values via params lets requests encode names that contain spaces or '&'
    params = {'q': row['name'], 'start': zomato_start, 'count': zomato_count,
              'lat': row['lat'], 'lon': row['lng']}
    try:
        restaurants = requests.get(url, params=params, headers=header).json()['restaurants']
    except Exception:
        # A failed request or an unexpected response is handled like "no match" below
        print('Error Index: {}'.format(index))
        restaurants = []

    if len(restaurants) > 0:
        print(index)
        restaurant = restaurants[0]['restaurant']
        venue = [restaurant['name'],
                 restaurant['location']['latitude'],
                 restaurant['location']['longitude'],
                 restaurant['average_cost_for_two'],
                 restaurant['price_range'],
                 restaurant['user_rating']['aggregate_rating'],
                 restaurant['user_rating']['rating_text'],
                 restaurant['location']['address'],
                 restaurant['user_rating']['votes'],
                 restaurant['all_reviews_count'],
                 restaurant['cuisines']]
        print(venue)
        final_venue.append(venue)
    else:
        print("error")
        error_list.append(index) # no info is present regarding this venue
        final_venue.append([0] * len(zomato_cols)) # placeholder row, one value per column

zomato_df = pd.DataFrame(final_venue, columns=zomato_cols)
print(error_list)

0
['Food For Thought', '18.9312431990', '72.8315844387', 1000, 3, '4.3', 'Very Good', '45/47, Kitabkhana, Somaiya Bhavan, Mahatma Gandhi Road, Flora Fountain, Fort, Mumbai', 1699, 472, 'Cafe']
1
['Royal China', '18.9380073532', '72.8329124674', 2500, 4, '4.4', 'Very Good', 'Hazarimal Somani Marg, Near Sterling Cinema, Fort, Mumbai', 2265, 513, 'Asian, Chinese']
2
['Shree Thaker Bhojanalay', '18.9512018821', '72.8282779455', 1200, 3, '4.9', 'Excellent', '31, Dadisheth Agyari Lane, Off Kalbadevi Road, Kalbadevi, Mumbai', 1776, 852, 'Gujarati']
3
['The Bayview Restaurant - Marine Plaza', '18.9311731109', '72.8230456263', 2500, 4, '4.2', 'Very Good', 'Hotel Marine Plaza, 29, Marine Drive, Churchgate, Mumbai', 1067, 591, 'Continental, North Indian, Salad']
4
['The Oriental Blossom - Marine Plaza', '18.9311731109', '72.8230456263', 3200, 4, '3.8', 'Good', 'Hotel Marine Plaza, 29, Marine Drive, Churchgate, Mumbai', 484, 206, 'Chinese, Seafood']
5
['Pizza By The Bay', '18.9334111600', '72.8239

error
43
['The Pantry', '18.9287200096', '72.8321329504', 1400, 3, '4.1', 'Very Good', 'Yeshwant Chambers, Military Square Lane, Kala Ghoda, Fort, Mumbai', 3700, 1027, 'Cafe, Healthy Food, Fast Food']
44
['The Butler And The Bayleaf', '19.1069174000', '72.8240719000', 2000, 4, '4.2', 'Very Good', '5, 1st Floor, Juhu - Tara Road, Opposite St. Joseph Church, Juhu, Mumbai', 261, 160, 'North Indian, Salad, Street Food']
45
['Surti', '18.9524135269', '72.8297404200', 400, 1, '3.6', 'Good', '3/01, Bhuleshwar Corner, Opposite Cotton Exchange, Kalbadevi, Mumbai', 184, 110, 'North Indian, South Indian, Chinese']
46
['Vinay Health Home', '18.9530724602', '72.8172295913', 300, 1, '4.7', 'Excellent', 'Jawar Mansion, Dr BA Jaikar Marg, Charni Road, Mumbai', 1074, 580, 'Maharashtrian, South Indian, Fast Food']
47
['Shalimar', '18.9581662624', '72.8324266523', 700, 2, '4.0', 'Very Good', 'Vazir Building, Shalimar Corner, Bhendi Bazaar, Sandhurst Road, Mohammad Ali Road, Mumbai', 11552, 1402, 'North I

85
['Amrut Punjab', '18.9677043564', '72.8199785203', 600, 2, '4.0', 'Very Good', 'Central Avenue Building, C Wing, DB Marg, Mumbai Central, Mumbai', 233, 42, 'North Indian, Chinese']
86
["Pete's Pizzeria & Kitchen", '18.9881806000', '72.8274666700', 400, 1, '3.7', 'Good', '102, Tantia Jogani Industrial Estate, Opposite Kasturba Hospital, J R Boricha Marg, Lower Parel, Mumbai', 1521, 106, 'Pizza']
87
['Sigdi', '19.0615560000', '72.8334580000', 650, 2, '3.6', 'Good', '29th Road Off, Waterfield Road, Linking Road, Bandra West, Mumbai', 12537, 415, 'Mughlai, North Indian, Chinese, Biryani']
88
['Celejor', '19.0403843729', '72.8412631899', 300, 1, '3.8', 'Good', 'L J Road, Mahim, Mumbai', 312, 119, 'Bakery, Desserts']
89
['Gourmet Gusto', '19.1006182356', '72.8403120115', 800, 2, '3.8', 'Good', 'Shop 8/B, Seva Sadan, Lajpat Rai Road, Near HDFC Bank, Vile Parle West, Mumbai', 232, 160, 'Cafe, Italian, Fast Food']
90
['Chinese Palace', '18.9725764026', '72.8142000362', 900, 2, '4.0', 'Very G

* Since only a limited number of calls can be made to the API, __saving__ the dataframes as CSV files is an essential step.
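
The same save-once, reload-later idea can be applied to any expensive call. The sketch below is a generic illustration and not part of the notebook's own workflow, which stores the dataframes as CSV: the helper `load_or_fetch`, the JSON file name and the fake fetch function are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def load_or_fetch(path, fetch):
    """Return cached data from `path` if it exists; otherwise call `fetch()` and cache it."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding='utf-8'))
    data = fetch()
    path.write_text(json.dumps(data), encoding='utf-8')
    return data


calls = []

def fake_fetch():
    """Stand-in for a rate-limited API call; records how often it is invoked."""
    calls.append(1)
    return [{'name': 'Example Cafe', 'agg_rating': '4.0'}]


cache_file = Path(tempfile.mkdtemp()) / 'zomato_cache.json'
first = load_or_fetch(cache_file, fake_fetch)    # calls fake_fetch and writes the file
second = load_or_fetch(cache_file, fake_fetch)   # reads the file, no second call
print(len(calls))
```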

In [11]:
error_list = [24, 42]
print(error_list)

[24, 42]


Out of the **109** rows fetched through the Foursquare API, only __2__ venues have no data registered in the Zomato database, so the API returns no information for them. The indexes of these unknown venues are stored in **error_list** and will be removed in the next stage.
<br><br>
# Data Preprocessing/ Cleaning
This stage includes combining data from multiple sources and filtering out the irrelevant data. We start by removing from both dataframes every venue that has no Zomato data to back it up, using the **error_list** built during the Zomato API calls.<br>
Once that is done, we have to combine both dataframes into one, based on one or more common columns such as name, latitude or longitude. This is explained below, after the dataframes are cleaned.

In [12]:
zomato_df.drop(index=error_list, axis=0, inplace=True)
foursquare_venues.drop(index=error_list, axis=0, inplace=True)
print("Indexes {} are removed from both dataframes".format(error_list))

Indexes [24, 42] are removed from both dataframes


### The code below is optional

In [13]:
# foursquare_venues.to_csv('FourSquare.csv')
# zomato_df.to_csv('Zomato.csv')

# Checkpoint here!
# On the first run, uncomment the two to_csv lines above to save the data.
# On later runs, keep them commented and load the saved files with the two lines below.

foursquare_venues = pd.read_csv('FourSquare.csv')
zomato_df = pd.read_csv('Zomato.csv')

The next processing task is to merge both dataframes into one, so that we deal with a single common dataframe. Merging two dataframes effectively and accurately requires one or more common columns between them.
First, let us plot both the Zomato and Foursquare data on a map and see how closely they overlap.

In [14]:
fMap = folium.Map(location=[TARGET_LATITUDE, TARGET_LONGITUDE], zoom_start = 15,tiles='CartoDB dark_matter')
#plotting lat and lng from foursquare api

for index, row in foursquare_venues.iterrows():
    label = folium.Popup(row['name'], parse_html=True)
    folium.CircleMarker(
        [row['lat'], row['lng']],
        radius = 5,
        popup = label, 
        color='blue',
        fill=True, 
        fill_color='purple',
        fill_opacity=0.5).add_to(fMap)
#plotting lat and lng from Zomato api
for index, row in zomato_df.iterrows():
    label = folium.Popup(row['name'], parse_html=True)
    folium.CircleMarker(
        [row['lat'], row['lng']],
        radius = 5,
        popup = label, 
        color='red',
        fill=True, 
        fill_color='green',
        fill_opacity=0.7).add_to(fMap)

fMap
    

In [15]:
# keep only 4 digits after the decimal point
def reduce(val):
    return float("{:.4f}".format(val))

In [16]:


# to make the coordinates from both sources comparable, we round latitude and longitude to 4 decimal places

foursquare_venues['lat'] =  foursquare_venues['lat'].apply(reduce)
foursquare_venues['lng'] = foursquare_venues['lng'].apply(reduce)

#convert lat lng datatype in zomato dataframe from string to float
zomato_df['lat'] = pd.to_numeric(zomato_df['lat'])
zomato_df['lng'] = pd.to_numeric(zomato_df['lng'])

zomato_df['lat'] = zomato_df['lat'].apply(reduce)
zomato_df['lng'] = zomato_df['lng'].apply(reduce)


In [17]:
zomato_df[['lat', 'lng']].head()


Unnamed: 0,lat,lng
0,18.9312,72.8316
1,18.938,72.8329
2,18.9512,72.8283
3,18.9312,72.823
4,18.9312,72.823


In [18]:

f1Map = folium.Map(location=[TARGET_LATITUDE, TARGET_LONGITUDE], zoom_start = 12,tiles='CartoDB dark_matter')
#plotting lat and lng from foursquare api

for index, row in foursquare_venues.iterrows():
    label = folium.Popup(row['name'], parse_html=True)
    folium.CircleMarker(
        [row['lat'], row['lng']],
        radius = 5,
        popup = label, 
        color='blue',
        fill=True, 
        fill_color='#3186cc',
        fill_opacity=0.5).add_to(f1Map)
#plotting lat and lng from Zomato api
for index, row in zomato_df.iterrows():
    label = folium.Popup(row['name'], parse_html=True)
    folium.CircleMarker(
        [row['lat'], row['lng']],
        radius = 5,
        popup = label, 
        color='red',
        fill=True, 
        fill_color='green',
        fill_opacity=0.7).add_to(f1Map)

f1Map

__Here we can observe that reducing the digits after the decimal point does not noticeably change the position of the markers on the map, so the above step may not be strictly necessary.__<br>
But in order to combine both datasets, we require a unique identifier between them.
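
To put the four-decimal rounding in perspective, a coordinate difference can be converted into metres. The following is a rough sketch that assumes a spherical Earth with a mean radius of 6,371 km; the function name and the choice of 18.94 degrees as the reference latitude (the target latitude used for the map) are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres; a spherical approximation


def degrees_to_metres(dlat, dlng, at_latitude):
    """Approximate north-south and east-west distances for small coordinate differences."""
    north_south = math.radians(dlat) * EARTH_RADIUS_M
    # Lines of longitude converge towards the poles, hence the cosine factor
    east_west = math.radians(dlng) * EARTH_RADIUS_M * math.cos(math.radians(at_latitude))
    return north_south, east_west


ns, ew = degrees_to_metres(0.0001, 0.0001, at_latitude=18.94)
print(round(ns, 1), round(ew, 1))
```

Under these assumptions, one unit in the fourth decimal place is roughly 11 m north-south and slightly less east-west at Mumbai's latitude, so rounding to four places moves a marker by at most about half of that, which is consistent with the markers barely shifting on the map.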

# **Note**: 
The two dataframes extracted through the APIs have to be curated into one dataframe for analysis. From the Foursquare dataset we need the name and the category of each restaurant, and from the Zomato dataset we need price_range, agg_rating, rating_text, address, votes, review_count and the cuisines offered by each restaurant.<br>
Both dataframes are arranged in the same order and share the same indexes, i.e. every venue received from the FourSquare API has its corresponding information at the same index in the Zomato dataframe.<br>
__But__ the Zomato API does not always return the same latitude and longitude, or even the same venue. As the map above shows, many venues from the two dataframes overlap, but some do not. Also, the names of the same venue in the two dataframes often do not match exactly _(a letter may differ, or the name may contain special characters)_. In some cases, **for a particular venue provided by the Foursquare API, the Zomato API returns a completely different venue**. Therefore, in order to curate a final dataframe for evaluation, we need to combine the venues from both dataframes while eliminating the duplicates and mismatched pairs.<br><br>
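
Small spelling differences such as accents and punctuation can be smoothed out before names are compared. The sketch below shows one possible normalisation using only the standard library; it is an optional illustration rather than a step performed in this notebook, and the function name `normalise_name` is hypothetical.

```python
import re
import unicodedata


def normalise_name(name):
    """Lower-case a venue name, strip accents, and replace punctuation with spaces."""
    decomposed = unicodedata.normalize('NFKD', name)
    # After NFKD, an accented letter is a base letter followed by a combining mark
    without_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r'[^a-z0-9]+', ' ', without_accents.lower())
    return cleaned.strip()


foursquare_name = normalise_name('The Oriental Blossom, Marine Plaza')
zomato_name = normalise_name('The Oriental Blossom - Marine Plaza')
print(foursquare_name == zomato_name)
```

With this normalisation the two spellings of The Oriental Blossom taken from the data above compare equal, and a category label such as "Café" reduces to "cafe".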

* ## Process of Elimination and Merging.
The name and the location coordinates are the two major attributes that can be used to determine whether the venues in both dataframes are the same.


In [19]:
final = pd.merge(foursquare_venues, zomato_df, left_index= True, right_index=True)
final.drop(['Unnamed: 0_x', 'Unnamed: 0.1_x'], axis=1, inplace=True)
final

Unnamed: 0,name_x,categories,lat_x,lng_x,Unnamed: 0_y,Unnamed: 0.1_y,name_y,lat_y,lng_y,avg_2_cost,price_range,agg_rating,rating_text,address,votes,review_count,cuisine
0,Food for Thought,Café,18.9320,72.8317,0,0,Food For Thought,18.9312,72.8316,1000,3,4.3,Very Good,"45/47, Kitabkhana, Somaiya Bhavan, Mahatma Gan...",1699,472,Cafe
1,Royal China,Chinese Restaurant,18.9387,72.8329,1,1,Royal China,18.9380,72.8329,2500,4,4.4,Very Good,"Hazarimal Somani Marg, Near Sterling Cinema, F...",2265,513,"Asian, Chinese"
2,Shree Thaker Bhojnalay,Indian Restaurant,18.9512,72.8283,2,2,Shree Thaker Bhojanalay,18.9512,72.8283,1200,3,4.9,Excellent,"31, Dadisheth Agyari Lane, Off Kalbadevi Road,...",1776,852,Gujarati
3,The Bayview - Hotel Marine Plaza,Italian Restaurant,18.9319,72.8230,3,3,The Bayview Restaurant - Marine Plaza,18.9312,72.8230,2500,4,4.2,Very Good,"Hotel Marine Plaza, 29, Marine Drive, Churchga...",1067,591,"Continental, North Indian, Salad"
4,"The Oriental Blossom, Marine Plaza",Asian Restaurant,18.9316,72.8231,4,4,The Oriental Blossom - Marine Plaza,18.9312,72.8230,3200,4,3.8,Good,"Hotel Marine Plaza, 29, Marine Drive, Churchga...",484,206,"Chinese, Seafood"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
102,Mafco,Fast Food Restaurant,18.9659,72.8037,104,104,Mafco,19.0662,72.8952,550,2,2.9,Average,"GM Road, Chembur, Mumbai",82,37,"North Indian, Chinese"
103,Mafco Farm Fair ¤,Chinese Restaurant,18.9659,72.8036,105,105,Farm Cafe,19.1613,72.9444,750,2,3.6,Good,"Shop G 11, Goregaon-Mulund Link Road, Near For...",2351,635,"Cafe, Fast Food"
104,Howra,Café,18.9663,72.8041,106,106,Howra Burger,19.0543,72.8283,600,2,4.0,Very Good,"Shop 6, Dunhill Apartment, Waroda Road, Hill R...",3744,697,"American, Burger"
105,Ministry of Salads,Salad Place,18.9669,72.8036,107,107,Bombay Salad Co.,19.0648,72.8308,900,2,4.7,Excellent,"Shop 1, 16th Road, Linking Road, Bandra West, ...",3234,1542,"Salad, Healthy Food, Juices"


### Merging is based on two factors:
* First, we add columns holding the difference between the latitudes and between the longitudes of the two venues at each index. If both the latitude difference and the longitude difference are __less than or equal to__ 0.0004, the two rows are treated as the same venue.
* Names in the two dataframes can be identical or can differ. If the venue returned by the Zomato API is a completely different one, the names will not match, and the chance of two venues with the same name lying extremely close to each other is very low.
* For every pair of names in a row *(name_x, name_y)*, we split each name into individual words and compare the words. If each name is a single word, the two words must be equal. When a name has __2 or more__ words, one option is to accept the pair when half or more of the words match and the latitude and longitude are also similar. The code below instead requires every word of the shorter name to appear in the longer one, and keeps a row if either the location test or the name test passes.
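
The half-or-more word rule combined with a location check can be sketched as follows. This is an illustrative variant and not the function used in the cells below. The names `word_overlap` and `is_same_venue`, and the `tol` and `min_overlap` parameters, are assumptions made for the example; the sample names and coordinate differences are taken from the merged table above.

```python
def word_overlap(name_a, name_b):
    """Fraction of the shorter name's words that also occur in the longer name."""
    words_a = name_a.lower().split()
    words_b = name_b.lower().split()
    shorter, longer = sorted([words_a, words_b], key=len)
    if not shorter:
        return 0.0
    return sum(word in longer for word in shorter) / len(shorter)


def is_same_venue(name_a, name_b, lat_diff, lng_diff, tol=0.0004, min_overlap=0.5):
    """Hypothetical rule: coordinates within `tol` degrees AND enough words in common."""
    close = abs(lat_diff) <= tol and abs(lng_diff) <= tol
    return close and word_overlap(name_a, name_b) >= min_overlap


same = is_same_venue('Shree Thaker Bhojnalay', 'Shree Thaker Bhojanalay', 0.0, 0.0)
different = is_same_venue('Howra', 'Howra Burger', -0.0880, -0.0242)
print(same, different)
```

Requiring both conditions is stricter than accepting either one: "Howra" and "Howra Burger" share every word of the shorter name, but the coordinates are far apart, so this variant rejects the pair.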

#### Let's start by creating columns in the final dataframe that show the difference in the latitude and longitude values respectively

In [20]:
final['lat_diff'] = 0
final['lng_diff'] = 0
final['lat_diff'] = final['lat_x'] - final['lat_y']
final['lng_diff'] = final['lng_x'] - final['lng_y']
final

Unnamed: 0,name_x,categories,lat_x,lng_x,Unnamed: 0_y,Unnamed: 0.1_y,name_y,lat_y,lng_y,avg_2_cost,price_range,agg_rating,rating_text,address,votes,review_count,cuisine,lat_diff,lng_diff
0,Food for Thought,Café,18.9320,72.8317,0,0,Food For Thought,18.9312,72.8316,1000,3,4.3,Very Good,"45/47, Kitabkhana, Somaiya Bhavan, Mahatma Gan...",1699,472,Cafe,0.0008,0.0001
1,Royal China,Chinese Restaurant,18.9387,72.8329,1,1,Royal China,18.9380,72.8329,2500,4,4.4,Very Good,"Hazarimal Somani Marg, Near Sterling Cinema, F...",2265,513,"Asian, Chinese",0.0007,0.0000
2,Shree Thaker Bhojnalay,Indian Restaurant,18.9512,72.8283,2,2,Shree Thaker Bhojanalay,18.9512,72.8283,1200,3,4.9,Excellent,"31, Dadisheth Agyari Lane, Off Kalbadevi Road,...",1776,852,Gujarati,0.0000,0.0000
3,The Bayview - Hotel Marine Plaza,Italian Restaurant,18.9319,72.8230,3,3,The Bayview Restaurant - Marine Plaza,18.9312,72.8230,2500,4,4.2,Very Good,"Hotel Marine Plaza, 29, Marine Drive, Churchga...",1067,591,"Continental, North Indian, Salad",0.0007,0.0000
4,"The Oriental Blossom, Marine Plaza",Asian Restaurant,18.9316,72.8231,4,4,The Oriental Blossom - Marine Plaza,18.9312,72.8230,3200,4,3.8,Good,"Hotel Marine Plaza, 29, Marine Drive, Churchga...",484,206,"Chinese, Seafood",0.0004,0.0001
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
102,Mafco,Fast Food Restaurant,18.9659,72.8037,104,104,Mafco,19.0662,72.8952,550,2,2.9,Average,"GM Road, Chembur, Mumbai",82,37,"North Indian, Chinese",-0.1003,-0.0915
103,Mafco Farm Fair ¤,Chinese Restaurant,18.9659,72.8036,105,105,Farm Cafe,19.1613,72.9444,750,2,3.6,Good,"Shop G 11, Goregaon-Mulund Link Road, Near For...",2351,635,"Cafe, Fast Food",-0.1954,-0.1408
104,Howra,Café,18.9663,72.8041,106,106,Howra Burger,19.0543,72.8283,600,2,4.0,Very Good,"Shop 6, Dunhill Apartment, Waroda Road, Hill R...",3744,697,"American, Burger",-0.0880,-0.0242
105,Ministry of Salads,Salad Place,18.9669,72.8036,107,107,Bombay Salad Co.,19.0648,72.8308,900,2,4.7,Excellent,"Shop 1, 16th Road, Linking Road, Bandra West, ...",3234,1542,"Salad, Healthy Food, Juices",-0.0979,-0.0272


In [21]:
# This function checks whether the names of the venues gathered from the two APIs are the same.
# It converts both names to lower case and then checks that every word of the shorter name is present in the longer name.
# If both names have the same number of words, each word of one is checked against the other.

def checkName(a, b):
    list_A = [x.lower() for x in a.split()]
    list_B = [x.lower() for x in b.split()]
    # Order the two word lists by length; min()/max() on lists would compare them alphabetically instead
    min_list, max_list = sorted([list_A, list_B], key=len)
    # True only when every word of the shorter name occurs in the longer name
    return all(word in max_list for word in min_list)

In [22]:
result = []
for index, row in final.iterrows():
    # Keep the row if both coordinates agree within 0.0004 degrees, or if the names match
    close = abs(row['lat_diff']) <= 0.0004 and abs(row['lng_diff']) <= 0.0004
    if close or checkName(row['name_x'], row['name_y']):
        result.append(index)
print(len(result))

77


In [23]:
# result holds the indexes of the rows in final that make up the dataframe
# to be visualized and evaluated

final_df = final.loc[result, :].copy() # copy so the in-place edits below do not act on a slice of final
final_df.reset_index(inplace=True, drop=True)
final_df.drop(columns=['Unnamed: 0_y', 'Unnamed: 0.1_y', 'name_y', 'lat_y', 'lng_y', 'price_range', 'votes' ,'lat_diff', 'lng_diff'], inplace=True)
final_df.columns = ['name', 'categories', 'lat', 'lng', 'avg_cost_for_2', 'agg_rating', 'rating_test', 'address', 'review_count', 'cuisines']
final_df

Unnamed: 0,name,categories,lat,lng,avg_cost_for_2,agg_rating,rating_test,address,review_count,cuisines
0,Food for Thought,Café,18.9320,72.8317,1000,4.3,Very Good,"45/47, Kitabkhana, Somaiya Bhavan, Mahatma Gan...",472,Cafe
1,Royal China,Chinese Restaurant,18.9387,72.8329,2500,4.4,Very Good,"Hazarimal Somani Marg, Near Sterling Cinema, F...",513,"Asian, Chinese"
2,Shree Thaker Bhojnalay,Indian Restaurant,18.9512,72.8283,1200,4.9,Excellent,"31, Dadisheth Agyari Lane, Off Kalbadevi Road,...",852,Gujarati
3,"The Oriental Blossom, Marine Plaza",Asian Restaurant,18.9316,72.8231,3200,3.8,Good,"Hotel Marine Plaza, 29, Marine Drive, Churchga...",206,"Chinese, Seafood"
4,Pizza By The Bay,Pizza Place,18.9335,72.8239,2000,4.3,Very Good,"143, Soona Mahal, Marine Drive, Churchgate, Mu...",3163,Italian
...,...,...,...,...,...,...,...,...,...,...
72,The Sun,Vegetarian / Vegan Restaurant,18.9550,72.7985,1000,2.9,Average,"93, Beach View, Bhulabhai Desai Road, Breach C...",66,"North Indian, Fast Food, Beverages"
73,Santosh sagar,Indian Restaurant,18.9550,72.7977,550,4.1,Very Good,"Shop 6, Napeansea Road, Matru Ashish, Malabar ...",153,"South Indian, Chinese, Street Food, Fast Food"
74,Delifresh,Bakery,18.9650,72.8040,650,3.3,Average,"Dhunabad, 106 Bhulabhai Desai Road, Kemps Corn...",3,"Bakery, Cafe, Desserts"
75,Mafco,Fast Food Restaurant,18.9659,72.8037,550,2.9,Average,"GM Road, Chembur, Mumbai",37,"North Indian, Chinese"


In [26]:
f2Map = folium.Map(location=[TARGET_LATITUDE, TARGET_LONGITUDE], zoom_start = 15,tiles='CartoDB dark_matter')
#plotting lat and lng from the final merged dataframe

for index, row in final_df.iterrows():
    label = folium.Popup(row['name'], parse_html=True)
    folium.CircleMarker(
        [row['lat'], row['lng']],
        radius = 5,
        popup = label, 
        color='yellow',
        fill=True, 
        fill_color='green',
        fill_opacity=0.3).add_to(f2Map)

f2Map
    

In [27]:
print("No. of restaurants scraped: {}" .format(final_df.shape[0]))

No. of restaurants scraped: 77


__We now have a total of 77 restaurants retained out of the 107 venue pairs left after removing the two venues without Zomato data. Using only the location (_lat, lng_) difference of 0.0004, we get only 45 restaurant venues, but including the name comparison recovers more of the data from the API.__
<br>

 __The logic of combining data points by analysing the difference in latitude and longitude comes from Saptashwa Bhattacharyya's article *Exploring the Tokyo Neighbourhoods: Data-Science in Real Life*, in which he explores the neighbourhoods of Tokyo and finds the trending food choices among the working class in the major sub-areas.__

# Let Us Now Visualize the Data Gathered.