# A recommendation system for setting up a new restaurant / eatery focussed primarily on the corporate demographic (based on employee count and location of corporate offices in Bangalore, India)

## Brief Introduction

## Part 1: Description of problem

### India is an extremely densely populated country (one of the most dense), with more than 1.34 billion residents.
### Obviously it is difficult to start a business here due to high real estate costs. 
### So, an entrepreneur aiming at a corporate centric market should know the best places to set up shop.

### A large population of Bangalore lies in this corporate demographic (more than 200 corporations), and 800+ startups, so eating snack foods out is more popular and convenient than ever, hence goal is to find the best places in Bangalore to setup a new food shop/ restaurant.
### The example chosen is for Bangalore, but this project can be used for various different locations like Chennai, Mumbai, Delhi etc.

### The objective is to find the optimal location for setting up a new business (based on location of offices, eateries in Bangalore, India). 

### Target audience: 
### Entrepreneurs and small-scale businessmen/women interested in the food/ snacks industry, aiming at the corporate demographic for maximising profits.

 #   

## Part 2: Data that is needed, and process

### 1. We need a list of the corporate offices in Bangalore. Their latitude and longitude will be calculated using Geopy Nominatim (a Python Library).

This data can be found on Wikipedia, as well as many other websites.

For instance: https://en.wikipedia.org/wiki/Category:Companies_based_in_Bangalore

### 2. **Then we can use the FourSquare API to find the number of eateries in a 1km radius around each office.** The API will provide us with Postal Code, Neighborhood, Venue, Venue Summary and Venue Category.

Foursquare is a local search-and-discovery service mobile app which provides search results for its users (Wikipedia). It has more than 60 million users.

### 3. Processing the Retrieved data and creating a structured DataFrame for all the venues, grouped by offices. 

### 4. Selecting relevant venues (food related only).

### **The offices with highest ratio of `(no. of employees)/(no. of eateries)` would be the best places to start a restaurant.** (supply and demand)

We can also create clusters of most highly student populated areas

### Imports

In [1]:
import requests  # library to handle requests
import pandas as pd  # library for data analsysis
import numpy as np  # library to handle data in a vectorized manner
import random  # library for random number generation
import string # Manipulation of name for Folium Map
# module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# libraries for displaying images
from IPython.display import Image
from IPython.core.display import HTML

# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize
import folium  # plotting library

from sklearn.cluster import KMeans # for clustering

print('Libraries imported.')

Libraries imported.


### Declaring API keys

In [2]:
CLIENT_ID = '1NW3OMZEIVJXCFGGIYLXLLH4CQWIYX3GSO4ERVDST4FXYI4E'  # your Foursquare ID
# your Foursquare Secret
CLIENT_SECRET = 'NNWGYNG2RFCIPLNG0ER4BNC15GPE2CSY10UA32BJFCBYOO0Y'
VERSION = '20180604'
LIMIT = 40
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 1NW3OMZEIVJXCFGGIYLXLLH4CQWIYX3GSO4ERVDST4FXYI4E
CLIENT_SECRET:NNWGYNG2RFCIPLNG0ER4BNC15GPE2CSY10UA32BJFCBYOO0Y


### Listing the offices we will study

In [3]:
offices = ['Amazon','Infosys','Dell','HP','Tech Mahindra','SAP','Samsung R&D','Accenture','Wipro','TCS', 'IBM','Oracle',
           'Cognizant','Capgemini','Cisco','Mindtree','HCL','Mu Sigma','Robert Bosch','Thomson Reuters','Honeywell','CGI',
           'Mphasis','EY','Deloitte','Nokia','Intel','Huawei','Goldman Sachs','Flipkart','KPMG','Zomato','Swiggy','HP']
print(len(offices))

34


In [4]:
for i in offices:
    print(i)

Amazon
Infosys
Dell
HP
Tech Mahindra
SAP
Samsung R&D
Accenture
Wipro
TCS
IBM
Oracle
Cognizant
Capgemini
Cisco
Mindtree
HCL
Mu Sigma
Robert Bosch
Thomson Reuters
Honeywell
CGI
Mphasis
EY
Deloitte
Nokia
Intel
Huawei
Goldman Sachs
Flipkart
KPMG
Zomato
Swiggy
HP


### Adding 'Bangalore' to each institute to help Nominatim find latitude and longitude easily

In [5]:
offices = [x+" Bangalore" for x in offices]
print(offices[:5])

['Amazon Bangalore', 'Infosys Bangalore', 'Dell Bangalore', 'HP Bangalore', 'Tech Mahindra Bangalore']


### Function to get latitude and longitude of each institute

In [6]:
def coords(office):
    d = {}
    d['office'] = office
    geolocator = Nominatim(user_agent='myapplication')
    try:
        location = geolocator.geocode(office).raw
        d['latitude'] = location['lat']
        d['longitude'] = location['lon']
        return d
    except Exception as e:
        print("Office %s not found"%office)
        return -1

In [7]:
offices_list = []
for i in offices:
    details = coords(i)
    if(details!=-1):
        offices_list.append(coords(i))
print(offices_list)
print(len(offices_list))

Office HCL Bangalore not found
Office Mu Sigma Bangalore not found
Office Robert Bosch Bangalore not found
Office Thomson Reuters Bangalore not found
Office Mphasis Bangalore not found
Office Goldman Sachs Bangalore not found
Office Zomato Bangalore not found
Office Swiggy Bangalore not found
[{'office': 'Amazon Bangalore', 'latitude': '12.9795028', 'longitude': '77.6959454'}, {'office': 'Infosys Bangalore', 'latitude': '12.84508845', 'longitude': '77.6649530443891'}, {'office': 'Dell Bangalore', 'latitude': '12.9383649', 'longitude': '77.629499'}, {'office': 'HP Bangalore', 'latitude': '12.99646925', 'longitude': '77.6888267135983'}, {'office': 'Tech Mahindra Bangalore', 'latitude': '12.85094555', 'longitude': '77.6778630567074'}, {'office': 'SAP Bangalore', 'latitude': '12.9608001', 'longitude': '77.6372345'}, {'office': 'Samsung R&D Bangalore', 'latitude': '13.0043687', 'longitude': '77.5524019'}, {'office': 'Accenture Bangalore', 'latitude': '12.9678934', 'longitude': '77.724013051

### We were able to find the locations of 27 organizations

### Latitude and Longitude of Bangalore, India

In [9]:
ban_lat = 12.9716
ban_lon = 77.5946

### Plotting all the institutes that we are considering

In [16]:
office_map = folium.Map(location = [ban_lat, ban_lon], zoom_start=11, tiles = "Stamen Terrain")

for d in offices_list:
    folium.CircleMarker(
    [float(d['latitude']), float(d['longitude'])],
        radius = 5, 
        popup = d['office'].translate(str.maketrans('', '', string.punctuation)),
        fill = True,
        color = '#0012EE',
        fill_color = 'red',
        fill_opacity = 0.5
    ).add_to(office_map)
    
office_map

TypeError: 'int' object is not subscriptable

### Using the FourSquare Venues API to find all food related categories

In [None]:
url = 'https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,)

results = requests.get(url).json()

In [None]:
food_categs = []
for i in results['response']['categories'][3]['categories']:
    food_categs.append(i['name'])
print(len(food_categs))

We have 91 food related categories

In [None]:
food_categs[:5]

### Function to generate a random color hex code (for use in map making):

In [None]:
import random
def randomcol():
    r = lambda: random.randint(0,255)
    return('#%02X%02X%02X' % (r(),r(),r()))

### Function to extract the category of a venue from a dataframe

In [None]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Main plotting function for all institutes and all the food venues within 1500 metres

In [None]:
main_map = folium.Map(location=[mum_lat, mum_lon], zoom_start=11, tiles = "Stamen Terrain")
radius = 1500

eatery_data = []

def fullplot():
    for i in institutes:
        name = i['institute']
        print(name, end=' ')
        lat = i['latitude']
        lon = i['longitude']
        
        # Using the foursquare venues API to find nearby venues for an institute
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lon,
            radius,
            100)
        results = requests.get(url).json()
        
        venues = results['response']['groups'][0]['items']

        nearby_venues = json_normalize(venues)  # flatten JSON

        # filter columns
        filtered_columns = ['venue.name', 'venue.categories',
                            'venue.location.lat', 'venue.location.lng']
        nearby_venues = nearby_venues.loc[:, filtered_columns]

        # filter the category for each row
        nearby_venues['venue.categories'] = nearby_venues.apply(
            get_category_type, axis=1)

        # clean columns
        nearby_venues.columns = [col.split(".")[-1]
                                 for col in nearby_venues.columns]
        nearby_venues = nearby_venues[nearby_venues['categories'].isin(
            food_categs)]
        
        print(", Venues: ",nearby_venues.shape[0])

        venues_i = []
        for index, row in nearby_venues.iterrows():
            d = {}
            d['name'] = row['name'].translate(str.maketrans('', '', string.punctuation))
            d['lat'] = row['lat']
            d['lng'] = row['lng']
            venues_i.append(d)
        
        # Generating a random color
        color = randomcol()
        
        # Plotting venues
        for d in venues_i:
            folium.CircleMarker(
                [float(d['lat']), float(d['lng'])],
                radius=1.5,
                popup=d['name'].translate(str.maketrans('', '', string.punctuation)),
                fill=True,
                color=color,
                fill_color='blue',
                fill_opacity=0.5
            ).add_to(main_map)
            eatery_data.append(d)
        
        # Plotting institute
        folium.CircleMarker(
            [float(lat), float(lon)],
            radius=5,
            popup=name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=color,
            fill_color='red'
        ).add_to(main_map)
        
# Calling the function that adds markers to the map
fullplot()

# Printing our map
main_map

**The above map is completely interactive**

**There is a problem with this map. Some of the venues overlap. So instead of grouping locations based on colleges, we can use clustering**

### Institute locations

In [None]:
instdf = pd.DataFrame(institutes)
instdf.head(10)

### Eateries locations

In [None]:
eaterydf = pd.DataFrame(eatery_data)
print(len(eaterydf))
eaterydf.head(10)

## Clustering

### Let us cluster the educational institutes first.

In [None]:
num_ci = 6
kmeans_inst = KMeans(n_clusters=num_ci, random_state=0).fit(instdf.loc[:, ['latitude', 'longitude']])
id_label_inst = kmeans_inst.labels_

In [None]:
map2 = folium.Map(location=[mum_lat, mum_lon], zoom_start=11, tiles="Stamen Terrain")

for i in range(num_ci):
    cluster = np.where(id_label_inst == i)[0]
    col = randomcol()
    for la, lo, name in zip(instdf.latitude[cluster].values, instdf.longitude[cluster].values, instdf.institute[cluster].values):
        folium.CircleMarker(
            [float(la), float(lo)],
            radius=5,
            popup=name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=col,
            fill_color='red'
        ).add_to(map2)

map2

### Now clustering the eateries into 10 groups

In [None]:
num_ce = 10
kmeans_eat = KMeans(n_clusters=num_ce, random_state=0).fit(eaterydf.loc[:, ['lat', 'lng']])
id_label_eat = kmeans_eat.labels_

### Visualizing the clusters

In [None]:
map3 = folium.Map(location=[mum_lat, mum_lon], zoom_start=11, tiles = "Stamen Terrain")

for i in range(num_ci):
    cluster = np.where(id_label_inst == i)[0]
    col = randomcol()
    for la, lo, name in zip(instdf.latitude[cluster].values, instdf.longitude[cluster].values, instdf.institute[cluster].values):
        folium.CircleMarker(
            [float(la), float(lo)],
            radius=5,
            popup=name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=col,
            fill_color='red'
        ).add_to(map3)

for i in range(num_ce):
    cluster = np.where(id_label_eat == i)[0]
    col = randomcol()
    for la, lo, name in zip(eaterydf.lat[cluster].values, eaterydf.lng[cluster].values, eaterydf.name[cluster].values):
        folium.CircleMarker(
            [float(la), float(lo)],
            radius=1,
            popup=str(i)+ ":"+ name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=col,
            fill_color='blue',
            fill_opacity=0.5
        ).add_to(map3)

map3

Each eatery has a label denoting the cluster it is in.

### Analysis

From the map, we can see that some clusters are much smaller than others. Let us look into the counts for each cluster

In [None]:
from itertools import groupby

counts = [(i, len(list(c))) for i,c in groupby(sorted(id_label_eat))]
print(counts)

We can see that cluster 1 has a very small number of eateries around it. This is near *Tolani College.*

### Inferences from our analysis

Cluster 1: **Tolani College** has very few eateries around it. 
It is a well reputed college in Mumbai and has a large student population.

**A good eatery aimed at students, or even at the general public would do well here due to sheer lack of competition in the vicinity.**

## Hence we find that the best place for a food shop aimed at students in the age range of 13 to 20 year old will be near Tolani College in Andheri East,  Mumbai

## 2nd Result

We also find that 2 clusters have approximately *200* eateries near them.

**These could be great localities to advertise an upcoming new fast food shop or restaurant.**

### The first cluster is near BD Somani College,  Elphinstone College, Jai Hind College and Sophia College. (Cluster 0)

### The second is near Mithibai College, Narsee Monjee College and Jamnabai Narsee School. (Cluster 9)


### Hence these places would be good for advertising an upcoming new eatery.