# A recommendation system for food stalls aimed at students (based on the locations of schools/colleges and eateries in Mumbai, India)

## Brief Introduction

## Part 1: Description of problem

### Mumbai, India is one of the most densely populated cities in the world, with more than 18 million residents.

### Starting a business here is tough because real estate costs are high. An entrepreneur targeting a student-centric market (the 13 - 20 year old demographic) therefore needs to know the best places to set up shop.

### A large share of Mumbai's population falls in this student demographic (the city has more than 50 schools), and eating snack foods out is more popular and convenient than ever. We will therefore look for the best places in Mumbai to set up a food shop/restaurant.

### Target audience: 
### Entrepreneurs and small-scale businessmen/women interested in the food/ snacks industry, aiming at the student demographic


## Part 2: Data that is needed, and process

### 1. We need a list of the most populated schools in Mumbai. Their latitude and longitude will be calculated using geopy Nominatim. 

This data can be found on Wikipedia, as well as the school websites.

For instance: https://en.wikipedia.org/wiki/List_of_educational_institutions_in_Mumbai

### 2. Then we can use the Foursquare API to find the eateries within a 1.5 km radius of each school. The API will provide us with Postal Code, Neighborhood, Venue, Venue Summary and Venue Category.

Foursquare is a local search-and-discovery service mobile app which provides search results for its users (Wikipedia). It has more than 60 million users.

### 3. We can also use the Foursquare API to list all food-related categories, which we will use as a filter.

### 4. Processing the retrieved data and creating a structured DataFrame for all the venues, grouped by school.

### 5. Selecting relevant venues (food related only).

### The schools with least number of eateries around them would be the best places to start a food stall/ restaurant. (supply and demand)

### 6. Clustering the eateries to find the colleges with least competition around them.

### Thank you for your time, I would greatly appreciate any feedback (sidjain1412@gmail.com)


### Imports

In [29]:
import requests  # library to handle requests
import pandas as pd  # library for data analysis
import numpy as np  # library to handle data in a vectorized manner
import random  # library for random number generation
import string # Manipulation of name for Folium Map
# module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# libraries for displaying images
from IPython.display import Image
from IPython.core.display import HTML

# transforming a JSON file into a pandas DataFrame
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas releases
import folium  # plotting library

from sklearn.cluster import KMeans # for clustering

print('Libraries imported.')

Libraries imported.


### Declaring API keys

In [15]:
CLIENT_ID = 'WEMY4AM5NRBMPJ55IDUZ1XRYOHE52FANWWSHMCT2S0I1JUG3'  # your Foursquare ID
# your Foursquare Secret
CLIENT_SECRET = 'GAGO1KZFQ1DI3IKT1DG42DNQLGHPEBSJIE0QMDRXBJIHGJB1'
VERSION = '20180604'
LIMIT = 40
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WEMY4AM5NRBMPJ55IDUZ1XRYOHE52FANWWSHMCT2S0I1JUG3
CLIENT_SECRET:GAGO1KZFQ1DI3IKT1DG42DNQLGHPEBSJIE0QMDRXBJIHGJB1


### Listing the schools/ colleges we will study

In [6]:
insts = ['Lilavatibai podar santacruz', 'Narsee Monjee College', 'University of Mumbai', 'Jai Hind College',
         'Mithibai College', 'Ramnarain Ruia College', 'Sophia College', "St. Andrew's College",
         'St. Xaviers College', 'Wilson College', 'IIT Bombay', 'Arya Vidya Mandir', 'BD Somani', 'Cambridge School',
         'Don Bosco High School', 'Hiranandani Foundation School Powai', 'Oberoi International', 'Vibgyor High School',
        'Tolani College of Commerce', 'Elphinstone College', 'HR College of Commerce', 
         'Ramniranjan Anandilal Podar College of Commerce and Economics', 'Sathaye College',
         'Swami Vivekanand College', 'Thakur College', 'Veermata Jijabai', 'Institute of Chemical Technology',
         'Dhirubhai Ambani International School', 'Ecole Mondiale World School', 'Gokuldham High School', 
         'Hansraj Morarji Public School', 'Jamnabai Narsee School', 'JB Petit High School', 'Rajhans Vidyalaya'
        ]
print(len(insts))

34


In [43]:
for i in insts:
    print(i)

Lilavatibai podar santacruz Mumbai
Narsee Monjee College Mumbai
University of Mumbai Mumbai
Jai Hind College Mumbai
Mithibai College Mumbai
Ramnarain Ruia College Mumbai
Sophia College Mumbai
St. Andrew's College Mumbai
St. Xaviers College Mumbai
Wilson College Mumbai
IIT Bombay Mumbai
Arya Vidya Mandir Mumbai
BD Somani Mumbai
Cambridge School Mumbai
Don Bosco High School Mumbai
Hiranandani Foundation School Powai Mumbai
Oberoi International Mumbai
Vibgyor High School Mumbai
Tolani College of Commerce Mumbai
Elphinstone College Mumbai
HR College of Commerce Mumbai
Ramniranjan Anandilal Podar College of Commerce and Economics Mumbai
Sathaye College Mumbai
Swami Vivekanand College Mumbai
Thakur College Mumbai
Veermata Jijabai Mumbai
Institute of Chemical Technology Mumbai
Dhirubhai Ambani International School Mumbai
Ecole Mondiale World School Mumbai
Gokuldham High School Mumbai
Hansraj Morarji Public School Mumbai
Jamnabai Narsee School Mumbai
JB Petit High School Mumbai
Rajhans Vidyala

### Adding 'Mumbai' to each institute to help Nominatim find latitude and longitude easily

In [7]:
insts = [x+" Mumbai" for x in insts]
print(insts[:5])

['Lilavatibai podar santacruz Mumbai', 'Narsee Monjee College Mumbai', 'University of Mumbai Mumbai', 'Jai Hind College Mumbai', 'Mithibai College Mumbai']


### Function to get latitude and longitude of each institute

In [8]:
def coords(institute):
    d = {}
    d['institute'] = institute
    geolocator = Nominatim(user_agent='myapplication')
    try:
        location = geolocator.geocode(institute).raw
        d['latitude'] = location['lat']
        d['longitude'] = location['lon']
        return d
    except Exception as e:
        print("Institute %s not found"%institute)
        return -1
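
Twelve of the 34 institutes are not found with the single query string used above. One possible way to recover some of them is to try a few variants of the query before giving up. The sketch below is an illustration rather than part of the original analysis: `geocode_with_fallbacks`, its `suffixes` default and the stub geocoder are hypothetical names, and the stub's coordinates are made up so the example runs without network access. It assumes only that the geocoder returns `None` or an object with `.latitude` and `.longitude`, which is how geopy's `Nominatim(...).geocode` behaves.

```python
def geocode_with_fallbacks(name, geocode,
                           suffixes=(" Mumbai", ", Mumbai, India", "")):
    """Try several query strings for one institute and return the first hit.

    `geocode` is any callable that takes a query string and returns None or
    an object exposing `.latitude` and `.longitude`.
    """
    for suffix in suffixes:
        query = name + suffix
        location = geocode(query)
        if location is not None:
            return {
                "institute": name,
                "query": query,  # remember which variant worked
                "latitude": float(location.latitude),
                "longitude": float(location.longitude),
            }
    return None  # every variant failed


class StubLocation:
    """Stand-in for a geocoder result; the coordinates are made up."""
    latitude = 19.0
    longitude = 72.8


def stub_geocode(query):
    # Pretend that only the ", Mumbai, India" form is recognised.
    return StubLocation() if query.endswith(", Mumbai, India") else None


found = geocode_with_fallbacks("Sathaye College", stub_geocode)
missing = geocode_with_fallbacks("Sathaye College", lambda query: None)
print(found)
print(missing)
```

With the real geocoder, the call would be `geocode_with_fallbacks(name, Nominatim(user_agent='myapplication').geocode)`. Each extra variant is one more request, and the public Nominatim service limits request rates, so a short pause between calls may be needed.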

In [9]:
institutes = []
for i in insts:
    details = coords(i)
    if details != -1:
        institutes.append(details)  # reuse the result instead of geocoding twice
print(institutes)
print(len(institutes))

Institute University of Mumbai Mumbai not found
Institute HR College of Commerce Mumbai not found
Institute Ramniranjan Anandilal Podar College of Commerce and Economics Mumbai not found
Institute Sathaye College Mumbai not found
Institute Swami Vivekanand College Mumbai not found
Institute Veermata Jijabai Mumbai not found
Institute Dhirubhai Ambani International School Mumbai not found
Institute Ecole Mondiale World School Mumbai not found
Institute Gokuldham High School Mumbai not found
Institute Hansraj Morarji Public School Mumbai not found
Institute JB Petit High School Mumbai not found
Institute Rajhans Vidyalaya Mumbai not found
[{'longitude': '72.8371727', 'latitude': '19.0810735', 'institute': 'Lilavatibai podar santacruz Mumbai'}, {'longitude': '72.837347688538', 'latitude': '19.1037065', 'institute': 'Narsee Monjee College Mumbai'}, {'longitude': '72.8251531862371', 'latitude': '18.93455995', 'institute': 'Jai Hind College Mumbai'}, {'longitude': '72.8374936781393', 'latitu

We were able to find the locations of 22 of the 34 institutes.

Latitude and Longitude of Mumbai, India

In [10]:
mum_lat = 19.0760
mum_lon = 72.8777

### Plotting all the institutes that we are considering

In [45]:
inst_map = folium.Map(location = [mum_lat, mum_lon], zoom_start=11, tiles = "Stamen Terrain")

for d in institutes:
    folium.CircleMarker(
    [float(d['latitude']), float(d['longitude'])],
        radius = 5, 
        popup = d['institute'].translate(str.maketrans('', '', string.punctuation)),
        fill = True,
        color = '#0012EE',
        fill_color = 'red',
        fill_opacity = 0.5
    ).add_to(inst_map)
    
inst_map

### Using the FourSquare Venues API to find all food related categories

In [16]:
url = 'https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,)

results = requests.get(url).json()

In [17]:
food_categs = []
for i in results['response']['categories'][3]['categories']:
    food_categs.append(i['name'])
print(len(food_categs))

91


We have 91 food related categories
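
The cell above relies on the food branch sitting at index 3 of the response and collects only its direct children. A slightly more defensive variant looks the branch up by name and walks every nesting level. This is a minimal sketch under two assumptions: the top-level branch is named `"Food"`, and each category carries its children under a `"categories"` key. The helper names and the tiny sample tree are made up for illustration and are not real API output.

```python
def find_category(categories, name):
    """Depth-first search for a category dict by its 'name' field."""
    for cat in categories:
        if cat.get("name") == name:
            return cat
        found = find_category(cat.get("categories", []), name)
        if found is not None:
            return found
    return None


def collect_names(category):
    """Return the names of all descendants of `category`, at every depth."""
    names = []
    for child in category.get("categories", []):
        names.append(child["name"])
        names.extend(collect_names(child))
    return names


# Made-up tree in the assumed shape of results['response']['categories']
sample = [
    {"name": "Arts & Entertainment", "categories": []},
    {"name": "Food", "categories": [
        {"name": "Asian Restaurant", "categories": [
            {"name": "Chinese Restaurant", "categories": []},
        ]},
        {"name": "Bakery", "categories": []},
    ]},
]

food = find_category(sample, "Food")
food_names = collect_names(food)
print(food_names)  # ['Asian Restaurant', 'Chinese Restaurant', 'Bakery']
```

Including nested names matters for the later `isin(food_categs)` filter, because a venue tagged with a sub-category such as a specific regional cuisine would otherwise be dropped.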

In [18]:
food_categs[:5]

['Afghan Restaurant',
 'African Restaurant',
 'American Restaurant',
 'Asian Restaurant',
 'Australian Restaurant']

### Function to generate a random color hex code (for use in map making):

In [20]:
import random
def randomcol():
    r = lambda: random.randint(0,255)
    return('#%02X%02X%02X' % (r(),r(),r()))
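
Purely random colors can come out very similar to each other, which makes neighbouring groups hard to tell apart on the map. As an alternative sketch (the function name `distinct_colors` and its default saturation and brightness are arbitrary choices, not part of the original notebook), evenly spaced hues give one clearly different color per group:

```python
import colorsys


def distinct_colors(n, saturation=0.75, value=0.9):
    """Return n hex color codes with evenly spaced hues."""
    colors = []
    for k in range(n):
        # Hue runs over [0, 1); saturation and value stay fixed.
        r, g, b = colorsys.hsv_to_rgb(k / n, saturation, value)
        colors.append('#%02X%02X%02X' % (
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


palette = distinct_colors(6)
print(palette)
```

A cluster loop could then use `palette[i]` in place of a fresh `randomcol()` call, which also keeps the colors identical between runs.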

### Function to extract the category of a venue from a dataframe

In [21]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # column is still named 'venue.categories'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Main plotting function for all institutes and all the food venues within 1500 metres

In [47]:
main_map = folium.Map(location=[mum_lat, mum_lon], zoom_start=11, tiles = "Stamen Terrain")
radius = 1500

eatery_data = []

def fullplot():
    for i in institutes:
        name = i['institute']
        print(name, end=' ')
        lat = i['latitude']
        lon = i['longitude']
        
        # Using the foursquare venues API to find nearby venues for an institute
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lon,
            radius,
            100)
        results = requests.get(url).json()
        
        venues = results['response']['groups'][0]['items']

        nearby_venues = json_normalize(venues)  # flatten JSON

        # filter columns
        filtered_columns = ['venue.name', 'venue.categories',
                            'venue.location.lat', 'venue.location.lng']
        nearby_venues = nearby_venues.loc[:, filtered_columns]

        # filter the category for each row
        nearby_venues['venue.categories'] = nearby_venues.apply(
            get_category_type, axis=1)

        # clean columns
        nearby_venues.columns = [col.split(".")[-1]
                                 for col in nearby_venues.columns]
        nearby_venues = nearby_venues[nearby_venues['categories'].isin(
            food_categs)]
        
        print(", Venues: ",nearby_venues.shape[0])

        venues_i = []
        for index, row in nearby_venues.iterrows():
            d = {}
            d['name'] = row['name'].translate(str.maketrans('', '', string.punctuation))
            d['lat'] = row['lat']
            d['lng'] = row['lng']
            venues_i.append(d)
        
        # Generating a random color
        color = randomcol()
        
        # Plotting venues
        for d in venues_i:
            folium.CircleMarker(
                [float(d['lat']), float(d['lng'])],
                radius=1.5,
                popup=d['name'].translate(str.maketrans('', '', string.punctuation)),
                fill=True,
                color=color,
                fill_color='blue',
                fill_opacity=0.5
            ).add_to(main_map)
            eatery_data.append(d)
        
        # Plotting institute
        folium.CircleMarker(
            [float(lat), float(lon)],
            radius=5,
            popup=name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=color,
            fill_color='red'
        ).add_to(main_map)
        
# Calling the function that adds markers to the map
fullplot()

# Printing our map
main_map

Lilavatibai podar santacruz Mumbai , Venues:  46
Narsee Monjee College Mumbai , Venues:  63
Jai Hind College Mumbai , Venues:  53
Mithibai College Mumbai , Venues:  63
Ramnarain Ruia College Mumbai , Venues:  58
Sophia College Mumbai , Venues:  52
St. Andrew's College Mumbai , Venues:  62
St. Xaviers College Mumbai , Venues:  54
Wilson College Mumbai , Venues:  69
IIT Bombay Mumbai , Venues:  25
Arya Vidya Mandir Mumbai , Venues:  63
BD Somani Mumbai , Venues:  60
Cambridge School Mumbai , Venues:  31
Don Bosco High School Mumbai , Venues:  45
Hiranandani Foundation School Powai Mumbai , Venues:  53
Oberoi International Mumbai , Venues:  28
Vibgyor High School Mumbai , Venues:  26
Tolani College of Commerce Mumbai , Venues:  27
Elphinstone College Mumbai , Venues:  50
Thakur College Mumbai , Venues:  28
Institute of Chemical Technology Mumbai , Venues:  41
Jamnabai Narsee School Mumbai , Venues:  58
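
The request inside `fullplot` builds the URL by string formatting and indexes straight into the response, so a failed call surfaces as a `KeyError` or `IndexError`. The sketch below shows one way to separate the two concerns. The helper names are hypothetical, the credentials are placeholders, and the only assumption about the response is that it carries a `meta.code` status next to `response.groups`.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat, lon, radius, limit, client_id, client_secret, version):
    """Collect the query parameters for one explore request."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lon),
        "radius": radius,
        "limit": limit,
    }


def extract_items(payload):
    """Return the venue items, or [] if the call failed or found nothing."""
    if payload.get("meta", {}).get("code") != 200:
        return []
    groups = payload.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []


params = explore_params("19.0810735", "72.8371727", 1500, 100,
                        "YOUR_ID", "YOUR_SECRET", "20180604")
query = urlencode(params)
print(EXPLORE_URL + "?" + query)

ok = extract_items({"meta": {"code": 200},
                    "response": {"groups": [{"items": ["a", "b"]}]}})
failed = extract_items({"meta": {"code": 429}})
print(ok, failed)
```

In the notebook, `requests.get(EXPLORE_URL, params=params).json()` would replace the hand-built URL, and `extract_items` would replace the direct `results['response']['groups'][0]['items']` lookup.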


**The above map is completely interactive**

**There is a problem with this map: some venues fall within range of more than one institute, so the groups overlap. Instead of grouping locations by college, we can use clustering.**
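
Because nearby institutes share venues, the same eatery can be appended to `eatery_data` more than once, which inflates any later counts. A minimal sketch of removing such repeats before clustering follows. The helper name `dedupe_venues` is hypothetical, and the sample list repeats one venue on purpose for illustration.

```python
def dedupe_venues(venues, ndigits=6):
    """Keep the first occurrence of each (name, lat, lng) combination."""
    seen = set()
    unique = []
    for v in venues:
        key = (v["name"],
               round(float(v["lat"]), ndigits),
               round(float(v["lng"]), ndigits))
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique


# Illustrative input: the first venue appears twice on purpose
sample_venues = [
    {"name": "Sandwizzaa", "lat": 19.0807, "lng": 72.840414},
    {"name": "Ram Shyam", "lat": 19.07822, "lng": 72.836411},
    {"name": "Sandwizzaa", "lat": 19.0807, "lng": 72.840414},
]
unique_venues = dedupe_venues(sample_venues)
print(len(sample_venues), len(unique_venues))  # 3 2
```

Once the data is in a DataFrame, `drop_duplicates(subset=['name', 'lat', 'lng'])` would do the same job.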

### Institute locations

In [44]:
instdf = pd.DataFrame(institutes)
instdf.head(10)

| | institute | latitude | longitude |
|---|---|---|---|
| 0 | Lilavatibai podar santacruz Mumbai | 19.0810735 | 72.8371727 |
| 1 | Narsee Monjee College Mumbai | 19.1037065 | 72.837347688538 |
| 2 | Jai Hind College Mumbai | 18.93455995 | 72.8251531862371 |
| 3 | Mithibai College Mumbai | 19.1028853 | 72.8374936781393 |
| 4 | Ramnarain Ruia College Mumbai | 19.02381515 | 72.8500989494695 |
| 5 | Sophia College Mumbai | 18.970042 | 72.8070136 |
| 6 | St. Andrew's College Mumbai | 19.0566226 | 72.8287305 |
| 7 | St. Xaviers College Mumbai | 18.943156 | 72.831870310951 |
| 8 | Wilson College Mumbai | 18.9567432 | 72.810628561733 |
| 9 | IIT Bombay Mumbai | 19.1330262 | 72.9091997 |


### Eateries locations

In [46]:
eaterydf = pd.DataFrame(eatery_data)
print(len(eaterydf))
eaterydf.head(10)

1055


| | lat | lng | name |
|---|---|---|---|
| 0 | 19.0807 | 72.840414 | Sandwizzaa |
| 1 | 19.07822 | 72.836411 | Ram Shyam |
| 2 | 19.077202 | 72.837742 | Nice Fast Food Corner |
| 3 | 19.075523 | 72.831745 | Starbucks Coffee A Tata Alliance |
| 4 | 19.075315 | 72.834669 | Radhe Krishna Chat |
| 5 | 19.072488 | 72.826692 | LSD Love Sugar Dough |
| 6 | 19.078858 | 72.829909 | Le Pain Quotidien |
| 7 | 19.074134 | 72.832351 | Fellas |
| 8 | 19.077763 | 72.837744 | Yoko Sizzlers |
| 9 | 19.082411 | 72.840759 | Shabari Restaurant |


## Clustering

### Let us cluster the educational institutes first.

In [35]:
num_ci = 6
# Nominatim returns the coordinates as strings, so cast them to float before fitting
kmeans_inst = KMeans(n_clusters=num_ci, random_state=0).fit(instdf.loc[:, ['latitude', 'longitude']].astype(float))
id_label_inst = kmeans_inst.labels_

In [48]:
map2 = folium.Map(location=[mum_lat, mum_lon], zoom_start=11, tiles="Stamen Terrain")

for i in range(num_ci):
    cluster = np.where(id_label_inst == i)[0]
    col = randomcol()
    for la, lo, name in zip(instdf.latitude[cluster].values, instdf.longitude[cluster].values, instdf.institute[cluster].values):
        folium.CircleMarker(
            [float(la), float(lo)],
            radius=5,
            popup=name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=col,
            fill_color='red'
        ).add_to(map2)

map2

### Now clustering the eateries into 10 groups

In [39]:
num_ce = 10
kmeans_eat = KMeans(n_clusters=num_ce, random_state=0).fit(eaterydf.loc[:, ['lat', 'lng']])
id_label_eat = kmeans_eat.labels_

### Visualizing the clusters

In [50]:
map3 = folium.Map(location=[mum_lat, mum_lon], zoom_start=11, tiles = "Stamen Terrain")

for i in range(num_ci):
    cluster = np.where(id_label_inst == i)[0]
    col = randomcol()
    for la, lo, name in zip(instdf.latitude[cluster].values, instdf.longitude[cluster].values, instdf.institute[cluster].values):
        folium.CircleMarker(
            [float(la), float(lo)],
            radius=5,
            popup=name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=col,
            fill_color='red'
        ).add_to(map3)

for i in range(num_ce):
    cluster = np.where(id_label_eat == i)[0]
    col = randomcol()
    for la, lo, name in zip(eaterydf.lat[cluster].values, eaterydf.lng[cluster].values, eaterydf.name[cluster].values):
        folium.CircleMarker(
            [float(la), float(lo)],
            radius=1,
            popup=str(i)+ ":"+ name.translate(str.maketrans('', '', string.punctuation)),
            fill=True,
            color=col,
            fill_color='blue',
            fill_opacity=0.5
        ).add_to(map3)

map3

Each eatery has a label denoting the cluster it is in.

### Analysis

From the map, we can see that some clusters are much smaller than others. Let us look at the eatery count for each cluster.

In [41]:
from itertools import groupby

counts = [(i, len(list(c))) for i,c in groupby(sorted(id_label_eat))]
print(counts)

[(0, 210), (1, 27), (2, 45), (3, 99), (4, 78), (5, 128), (6, 54), (7, 164), (8, 59), (9, 191)]


We can see that cluster 1 contains only 27 eateries, fewer than any other cluster. This cluster is near *Tolani College*.
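
Cluster sizes depend on the chosen number of clusters, so a useful cross-check is to count the eateries within 1.5 km of each institute directly. The sketch below uses the haversine great-circle distance. The function names are hypothetical, and the sample rows are a small subset copied from the tables above, so the resulting counts only illustrate the method and are not a result of the analysis.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def eateries_within(institutes, eateries, radius_m=1500):
    """Count the eateries inside radius_m of each institute."""
    counts = {}
    for inst in institutes:
        lat, lon = float(inst["latitude"]), float(inst["longitude"])
        counts[inst["institute"]] = sum(
            1 for e in eateries
            if haversine_m(lat, lon, float(e["lat"]), float(e["lng"])) <= radius_m
        )
    return counts


# Small subset of the rows shown earlier, for illustration only
sample_institutes = [
    {"institute": "Lilavatibai podar santacruz Mumbai",
     "latitude": "19.0810735", "longitude": "72.8371727"},
    {"institute": "IIT Bombay Mumbai",
     "latitude": "19.1330262", "longitude": "72.9091997"},
]
sample_eateries = [
    {"name": "Sandwizzaa", "lat": 19.0807, "lng": 72.840414},
    {"name": "Ram Shyam", "lat": 19.07822, "lng": 72.836411},
]
nearby = eateries_within(sample_institutes, sample_eateries)
print(nearby)
```

Sorting the full result in ascending order would rank the institutes by local competition without depending on the cluster boundaries.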

### Inferences from our analysis

Cluster 1: **Tolani College** has very few eateries around it.
It is a well-reputed college in Mumbai and has a large student population.

**A good eatery aimed at students, or even at the general public would do well here due to sheer lack of competition in the vicinity.**

## Hence we find that the best place for a food shop aimed at students aged 13 to 20 is near Tolani College in Andheri East, Mumbai

## 2nd Result

We also find that two clusters have approximately *200* eateries each (210 in cluster 0 and 191 in cluster 9).

**These could be great localities to advertise an upcoming new fast food shop or restaurant.**

### The first cluster is near BD Somani College, Elphinstone College, Jai Hind College and Sophia College. (Cluster 0)

### The second is near Mithibai College, Narsee Monjee College and Jamnabai Narsee School. (Cluster 9)
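
The institute-to-cluster pairings above are read off the map. They could also be computed by assigning each institute to the eatery cluster whose centroid is closest. The sketch below is a plain-Python illustration with hypothetical helper names. The first two points are eateries from the table above; the last two points and all four labels are made up to form a second cluster.

```python
from collections import defaultdict


def centroids(points, labels):
    """Mean (lat, lng) of the points carrying each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (lat, lng), lab in zip(points, labels):
        s = sums[lab]
        s[0] += lat
        s[1] += lng
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}


def nearest_cluster(lat, lng, cents):
    """Label of the centroid with the smallest squared distance."""
    return min(cents, key=lambda lab: (cents[lab][0] - lat) ** 2
                                      + (cents[lab][1] - lng) ** 2)


# Two eateries from the table above plus two made-up points
points = [(19.0807, 72.840414), (19.07822, 72.836411),
          (18.93, 72.83), (18.94, 72.82)]
labels = [0, 0, 1, 1]  # made-up labels for illustration
cents = centroids(points, labels)

jai_hind = nearest_cluster(18.93455995, 72.8251531862371, cents)
lilavatibai = nearest_cluster(19.0810735, 72.8371727, cents)
print(jai_hind, lilavatibai)  # 1 0
```

With the fitted model from the notebook, calling `kmeans_eat.predict` on the institute coordinates should give the same kind of assignment, since k-means prediction also picks the nearest centroid.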


### Hence these places would be good for advertising an upcoming new eatery.