# Introduction and Business Problem

# Introduction
The city of Mumbai, India is relatively Big and it is packed with restaurants, night life and amazing
people. For people that are new to Hoboken, despite its small geographic size, it can be daunting to figure out what 
restaurants are worth going to and where they are.  For people that used to live in Hoboken or are visiting Mumbai, 
how do you know what the best places are to get something to eat?
    
# Business Problem
For this project, I created a simple guide on where to eat based on Foursquare likes, restaurant category and geographic location data for restaurants in Mumbai.  I will then cluster these restaurants based on their similarities so that a 
user can easily determine what type of restaurants are best to eat at based on Foursquare user feedback.

# Data Requirements and Methodology

# Data Requirements
For this project, I will be utilizing the Foursquare API to pull the following location data on restaurants in Hoboken, NJ:
•	Venue Name
•	Venue ID
•	Venue Location
•	Venue Category
•	Count of Likes

# Data Acquisition Approach
To acquire the data mentioned above, I will need to do the following:
•	Get geolocator lat and long coordinates for Mumbai, India
•	Use Foursquare API to get a list of all venues in Mumbai
o	Get venue name, venue ID, location, category, and likes


In [12]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

! conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#import beautiful soup
from urllib.request import urlopen
from bs4 import BeautifulSoup


print('Libraries imported.')

usage: conda [-h] [-V] command ...
conda: error: unrecognized arguments: # uncomment this line if you haven't completed the Foursquare API lab


Libraries imported.


usage: conda [-h] [-V] command ...
conda: error: unrecognized arguments: # uncomment this line if you haven't completed the Foursquare API lab


In [13]:
CLIENT_ID = 'D0YVHBVBS1MCYPR2AGEOK5V3VSCQMLHFRJ3Q2JG5F2Z21GS0' # your Foursquare ID
CLIENT_SECRET = 'P34NPCVIHKJQHZJJFYUEU5Q4YY1C3YIMFNORC3U4BKPGHV1A' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: D0YVHBVBS1MCYPR2AGEOK5V3VSCQMLHFRJ3Q2JG5F2Z21GS0
CLIENT_SECRET:P34NPCVIHKJQHZJJFYUEU5Q4YY1C3YIMFNORC3U4BKPGHV1A


CLIENT_ID="D0YVHBVBS1MCYPR2AGEOK5V3VSCQMLHFRJ3Q2JG5F2Z21GS0"
CLIENT_SECRET="P34NPCVIHKJQHZJJFYUEU5Q4YY1C3YIMFNORC3U4BKPGHV1A"
VERSION="20180605"

print("CLIENT_ID :"+CLIENT_ID)
print("CLIENT_SECRET :"+CLIENT_SECRET)
print("VERSION : "+VERSION)

In [14]:
LIMIT=100
radius=50000
latitude=19.0760
longitude=72.8777
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=D0YVHBVBS1MCYPR2AGEOK5V3VSCQMLHFRJ3Q2JG5F2Z21GS0&client_secret=P34NPCVIHKJQHZJJFYUEU5Q4YY1C3YIMFNORC3U4BKPGHV1A&v=20180605&ll=19.076,72.8777&radius=50000&limit=100'

In [15]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cfdd9d61ed21914bf7d2907'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Mumbai',
  'headerFullLocation': 'Mumbai',
  'headerLocationGranularity': 'city',
  'totalResults': 230,
  'suggestedBounds': {'ne': {'lat': 19.52600045000045,
    'lng': 73.35295865165908},
   'sw': {'lat': 18.625999549999552, 'lng': 72.40244134834093}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d9b39ed2ae860fc0f5a81cb',
       'name': 'Sofitel Mumbai BKC',
       'location': {'address': 'C 57 Block G Bandra Kurla Complex',
        'crossStreet': 'Bandra East',
        'lat': 19.067628414049473,
        'lng': 72.86949888814853,
        'labeledLatLngs': [{'label

In [16]:
def get_category_type(row):
    try :
        category_list=row["categories"]
    except:
        categories_list=row["venue.categories"]
        
    if len(categories_list)==0:
        return None
    else:
        return categories_list[0]["name"]

In [17]:
# Pull actual data from Foursquare App
venues= results['response']['groups'][0]['items']
nearby_venues=json_normalize(venues)
filtered_columns=['venue.name','venue.id','venue.categories','venue.location.lat','venue.location.lng']
nearby_venues

nearby_venues=nearby_venues.loc[:,filtered_columns]
nearby_venues['venue.categories']=nearby_venues.apply(get_category_type,axis=1)
nearby_venues

Unnamed: 0,venue.name,venue.id,venue.categories,venue.location.lat,venue.location.lng
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108
5,Peshawari,4b0587dbf964a52069a422e3,Indian Restaurant,19.103954,72.869879
6,MCA Club,4b5b4ecdf964a5206af328e3,Gym Pool,19.060443,72.86602
7,Trident,4b865eeaf964a520af8731e3,Hotel,19.066808,72.867468
8,Hitchki,5a21b1cea4236243c9628ad2,Bar,19.06973,72.869761
9,Khau Galli,4b4c8f5df964a5201cb626e3,Food Truck,19.080442,72.904319


In [18]:
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues

Unnamed: 0,name,id,categories,lat,lng
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108
5,Peshawari,4b0587dbf964a52069a422e3,Indian Restaurant,19.103954,72.869879
6,MCA Club,4b5b4ecdf964a5206af328e3,Gym Pool,19.060443,72.86602
7,Trident,4b865eeaf964a520af8731e3,Hotel,19.066808,72.867468
8,Hitchki,5a21b1cea4236243c9628ad2,Bar,19.06973,72.869761
9,Khau Galli,4b4c8f5df964a5201cb626e3,Food Truck,19.080442,72.904319


In [19]:
# find a list of unique categories from the API SO WE Can see what may or may not fit for restaurent
nearby_venues['categories'].unique()

array(['Hotel', 'Toy / Game Store', 'Coffee Shop', 'Movie Theater',
       'Indian Restaurant', 'Gym Pool', 'Bar', 'Food Truck',
       'Salad Place', 'Dim Sum Restaurant', 'Deli / Bodega', 'Park',
       'Theater', 'Bakery', 'Seafood Restaurant', 'Dessert Shop',
       'Restaurant', 'Café', 'Vegetarian / Vegan Restaurant',
       'Fast Food Restaurant', 'Irani Cafe', 'Snack Place',
       'General Entertainment', 'Cupcake Shop', 'Scenic Lookout',
       'Ice Cream Shop', 'Breakfast Spot', 'Chinese Restaurant',
       'Grocery Store', 'Pizza Place', 'Italian Restaurant',
       'Shopping Mall', 'Performing Arts Venue',
       'North Indian Restaurant', 'Mediterranean Restaurant', 'Brewery',
       'Punjabi Restaurant', 'Tea Room', 'Playground',
       'American Restaurant', 'Asian Restaurant', 'Pub', 'Multiplex',
       'Lounge', 'Diner', 'French Restaurant', 'Sushi Restaurant'],
      dtype=object)

In [20]:
# creating a list of categorie to remove from our dataframe because they are not restaurants
# I am sure there is a function that can be written to do this at scale but since it was a small list, I did it manually

removal_list = ['Gym / Fitness Center', 'Bakery', 'Park', "Women's Store", 'Sporting Goods Shop', 'Dog Run', 'Gaming Cafe',
               'Optical Shop', 'Yoga Studio', 'Pet Store', 'Shoe Repair', 'Jewelry Store', 'Record Shop', 'Juice Bar', 
               'Cosmetics Shop', 'Business Service', 'Salon / Barbershop', 'Liquor Store', 'Grocery Store', 'Stationery Store',
               'Pilates Studio', 'Dessert Shop', 'Bookstore', 'Concert Hall', 'Video Game Store', 'Pharmacy', 'Mobile Phone Shop',
               'Deli / Bodega']

nearby_venues2 = nearby_venues.copy()


#getting a clear dataframe of just restaurants
nearby_venues2 = nearby_venues2[~nearby_venues2['categories'].isin(removal_list)]
nearby_venues2

Unnamed: 0,name,id,categories,lat,lng
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108
5,Peshawari,4b0587dbf964a52069a422e3,Indian Restaurant,19.103954,72.869879
6,MCA Club,4b5b4ecdf964a5206af328e3,Gym Pool,19.060443,72.86602
7,Trident,4b865eeaf964a520af8731e3,Hotel,19.066808,72.867468
8,Hitchki,5a21b1cea4236243c9628ad2,Bar,19.06973,72.869761
9,Khau Galli,4b4c8f5df964a5201cb626e3,Food Truck,19.080442,72.904319


In [21]:
venue_id_list=nearby_venues2['id'].tolist()
venue_id_list

['4d9b39ed2ae860fc0f5a81cb',
 '51d2a531454ad6055f94cdce',
 '546f25c5498e2d4d056ad6eb',
 '50950f23e4b0578aae7a7a00',
 '54a3a602498e6d5ac992f927',
 '4b0587dbf964a52069a422e3',
 '4b5b4ecdf964a5206af328e3',
 '4b865eeaf964a520af8731e3',
 '5a21b1cea4236243c9628ad2',
 '4b4c8f5df964a5201cb626e3',
 '5496e94e498e349305d81639',
 '524edc34498ebd9556e084fc',
 '5275159f11d265069773cd85',
 '4ee333ba9adf3982fe44da3b',
 '4b0587caf964a520dfa122e3',
 '4bcfe10f046076b0e02e6f71',
 '4d04fc8b30a58cfaead2a2e7',
 '4dc6a998b0fb5556cd0e99a2',
 '4bc78f5793bdeee1848337ae',
 '56e3bcf6498e13373cb379c6',
 '4b0587cef964a52077a222e3',
 '4b0587c9f964a520bca122e3',
 '4b0587d9f964a52021a422e3',
 '5107ff0ce4b05e493f979b9a',
 '4b812df7f964a5208f9930e3',
 '4b0587d8f964a52008a422e3',
 '55ffb37f498e62f1de3f065b',
 '50f63f31e4b050d06e1cb448',
 '4be958699a54a593eaee0a11',
 '4bc085064cdfc9b6a5ee9221',
 '4b0587ddf964a520afa422e3',
 '514c4e2ee4b0181e4990bfc1',
 '51582b9fe4b0bad2d31794da',
 '522c3e1011d20c34ea55b14c',
 '4b0587cdf964

In [22]:
url_list=[]
like_list=[]
json_list=[]

for i in venue_id_list:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(i,CLIENT_ID, CLIENT_SECRET, VERSION)
    url_list.append(venue_url)

for link in url_list:
    result=requests.get(link).json()
    likes=result['response']['likes']['count']
    like_list.append(likes)
print(like_list)
    

[150, 22, 20, 286, 92, 100, 112, 169, 10, 97, 54, 96, 208, 189, 244, 158, 116, 59, 164, 18, 334, 212, 196, 44, 72, 162, 70, 358, 40, 8, 58, 70, 226, 262, 216, 223, 40, 46, 67, 28, 98, 81, 11, 139, 16, 21, 114, 145, 39, 698, 179, 50, 372, 115, 26, 30, 104, 59, 89, 104, 32, 44, 297, 31, 143, 50, 139, 389, 52, 12, 53, 583, 242, 108, 31, 377, 38, 52, 343, 133, 87, 180, 27, 108, 161, 35]


In [23]:
print(len(like_list))
print(len(venue_id_list))

86
86


# Data Prep Intro
The thought process behind this is that likes are a proxy for quality. The more likes there are, the better the restaurant is. This might be incorrect but API call issues (how many I can use for free) holds me back from getting price / rating data. I will then bin this data into a quality categorical variables so we can cluster appropriately.

I am also going to create new categorical variables for the restaurants to better group them based on type of cuisine. This way you can look for good Indian food or now what type of food might be best to eat in Mumbai if you are new to the area.

In [24]:
mumbai_venues=nearby_venues2.copy()
mumbai_venues

Unnamed: 0,name,id,categories,lat,lng
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108
5,Peshawari,4b0587dbf964a52069a422e3,Indian Restaurant,19.103954,72.869879
6,MCA Club,4b5b4ecdf964a5206af328e3,Gym Pool,19.060443,72.86602
7,Trident,4b865eeaf964a520af8731e3,Hotel,19.066808,72.867468
8,Hitchki,5a21b1cea4236243c9628ad2,Bar,19.06973,72.869761
9,Khau Galli,4b4c8f5df964a5201cb626e3,Food Truck,19.080442,72.904319


In [25]:
# add in the list of likes

mumbai_venues['total likes'] = like_list
mumbai_venues.head()

Unnamed: 0,name,id,categories,lat,lng,total likes
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499,150
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783,22
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576,20
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839,286
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108,92


In [26]:
# lets bin total likes
print(mumbai_venues['total likes'].max())
print(mumbai_venues['total likes'].min())
print(mumbai_venues['total likes'].median())
print(mumbai_venues['total likes'].mean())

698
8
97.5
131.77906976744185


In [27]:
# let's visualize our total likes based on a histogram

import matplotlib.pyplot as plt
mumbai_venues['total likes'].hist(bins=4)
plt.show()


<Figure size 640x480 with 1 Axes>

In [28]:
# Converted likes into bins
print(np.percentile(mumbai_venues['total likes'], 25))
print(np.percentile(mumbai_venues['total likes'], 50))
print(np.percentile(mumbai_venues['total likes'], 75))

44.0
97.5
176.5


In [29]:
poor =mumbai_venues['total likes'] <= 40
below_avg=mumbai_venues[(mumbai_venues['total likes']>40) & (mumbai_venues['total likes'] <=104)]
abv_avg=mumbai_venues[(mumbai_venues['total likes']) > 104 & (mumbai_venues['total likes'] <= 194)]
good=mumbai_venues['total likes'] > 194

In [30]:
def condition(s):
    if s['total likes']<= 40:
        return 'poor'
    if s['total likes'] <= 104:
        return 'below_avg'
    if s['total likes'] <=194:
        return 'abv_avg'
    if s['total likes'] >194:
        return 'good'
    
mumbai_venues['total_likes_cat']=mumbai_venues.apply(condition,axis=1)

In [31]:
mumbai_venues['categories'].unique()

array(['Hotel', 'Toy / Game Store', 'Coffee Shop', 'Movie Theater',
       'Indian Restaurant', 'Gym Pool', 'Bar', 'Food Truck',
       'Salad Place', 'Dim Sum Restaurant', 'Theater',
       'Seafood Restaurant', 'Restaurant', 'Café',
       'Vegetarian / Vegan Restaurant', 'Fast Food Restaurant',
       'Irani Cafe', 'Snack Place', 'General Entertainment',
       'Cupcake Shop', 'Scenic Lookout', 'Ice Cream Shop',
       'Breakfast Spot', 'Chinese Restaurant', 'Pizza Place',
       'Italian Restaurant', 'Shopping Mall', 'Performing Arts Venue',
       'North Indian Restaurant', 'Mediterranean Restaurant', 'Brewery',
       'Punjabi Restaurant', 'Tea Room', 'Playground',
       'American Restaurant', 'Asian Restaurant', 'Pub', 'Multiplex',
       'Lounge', 'Diner', 'French Restaurant', 'Sushi Restaurant'],
      dtype=object)

In [32]:
bars=['Bar','Pub']
Hang_out=['Scenic Lookout','Historic Site','Shopping Mall','Gym Pool','Multiplex','Art Gallery','Theater','Lounge','Movie Theater']
Entertainment=['Toy / Game Store','Performing Arts Venue','Playground','General Entertainment']
veg_restaurant=['Indian Restaurant','Vegetarian / Vegan Restaurant','Punjabi Restaurant','Salad Place','BBQ Joint','Restaurant']
fast_food=['Pizza Place','Fast Food Restaurant','Café','Snack Place','Coffee Shop','Food Truck','Breakfast Spot']
other_restaurant=['Italian Restaurant','North Indian Restaurant','Dim Sum Restaurant','Seafood Restaurant','Chinese Restaurant']
desert=['Ice Cream Shop','Cupcake Shop']    
Hotels=['Hotel']

def conditions2(s):
    if s['categories'] in bars:
        return 'bars'
    if s['categories'] in Hotels:
        return 'Hotels'
    if s['categories'] in Hang_out:
        return 'Hang_out'
    if s['categories'] in Entertainment:
        return 'Entertainment'
    if s['categories'] in veg_restaurant:
        return 'veg_restaurant'
    if s['categories'] in fast_food:
        return 'fast_food'
    if s['categories'] in other_restaurant:
        return 'other_restaurant'
    if s['categories'] in desert:
        return 'desert'
mumbai_venues['categories_new']=mumbai_venues.apply(conditions2,axis=1)

In [33]:
mumbai_venues

Unnamed: 0,name,id,categories,lat,lng,total likes,total_likes_cat,categories_new
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499,150,abv_avg,Hotels
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783,22,poor,Entertainment
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576,20,poor,fast_food
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839,286,good,Hang_out
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108,92,below_avg,Hotels
5,Peshawari,4b0587dbf964a52069a422e3,Indian Restaurant,19.103954,72.869879,100,below_avg,veg_restaurant
6,MCA Club,4b5b4ecdf964a5206af328e3,Gym Pool,19.060443,72.86602,112,abv_avg,Hang_out
7,Trident,4b865eeaf964a520af8731e3,Hotel,19.066808,72.867468,169,abv_avg,Hotels
8,Hitchki,5a21b1cea4236243c9628ad2,Bar,19.06973,72.869761,10,poor,bars
9,Khau Galli,4b4c8f5df964a5201cb626e3,Food Truck,19.080442,72.904319,97,below_avg,fast_food


In [34]:
# one hot encoding
# Now let's create dummy variables for our total likes and categories so we can cluster
mumbai_onehot = pd.get_dummies(mumbai_venues[['categories_new', 'total_likes_cat']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Name'] = mumbai_venues['name'] 

# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.head()


Unnamed: 0,Name,Entertainment,Hang_out,Hotels,bars,desert,fast_food,other_restaurant,veg_restaurant,abv_avg,below_avg,good,poor
0,Sofitel Mumbai BKC,0,0,1,0,0,0,0,0,1,0,0,0
1,Hamleys,1,0,0,0,0,0,0,0,0,0,0,1
2,Starbucks Coffee Capital,0,0,0,0,0,1,0,0,0,0,0,1
3,PVR Cinemas,0,1,0,0,0,0,0,0,0,0,1,0
4,JW Marriott Mumbai Sahar,0,0,1,0,0,0,0,0,0,1,0,0


In [35]:
cluster_df = mumbai_onehot.drop('Name', axis=1)

k_clusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=k_clusters, random_state=0).fit(cluster_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 1, 3, 0, 0, 2, 2, 1, 0])

In [36]:
mumbai_venues['label'] = kmeans.labels_
mumbai_venues.head()

Unnamed: 0,name,id,categories,lat,lng,total likes,total_likes_cat,categories_new,label
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499,150,abv_avg,Hotels,2
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783,22,poor,Entertainment,1
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576,20,poor,fast_food,1
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839,286,good,Hang_out,3
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108,92,below_avg,Hotels,0


In [37]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_clusters)
ys = [i+x+(i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mumbai_venues['lat'], mumbai_venues['lng'], mumbai_venues['name'], mumbai_venues['label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# characteristics (label=0)
 o good quality food
 
 o Mostly fast food or Hangout/Ready wala food

In [38]:
mumbai_venues.loc[mumbai_venues['label']==0]

Unnamed: 0,name,id,categories,lat,lng,total likes,total_likes_cat,categories_new,label
4,JW Marriott Mumbai Sahar,54a3a602498e6d5ac992f927,Hotel,19.102944,72.878108,92,below_avg,Hotels,0
5,Peshawari,4b0587dbf964a52069a422e3,Indian Restaurant,19.103954,72.869879,100,below_avg,veg_restaurant,0
9,Khau Galli,4b4c8f5df964a5201cb626e3,Food Truck,19.080442,72.904319,97,below_avg,fast_food,0
10,Bombay Salad Co.,5496e94e498e349305d81639,Salad Place,19.064715,72.83092,54,below_avg,veg_restaurant,0
11,Masala Library,524edc34498ebd9556e084fc,Indian Restaurant,19.068931,72.869738,96,below_avg,veg_restaurant,0
22,IVY Restaurant & Banquets,4dc6a998b0fb5556cd0e99a2,Restaurant,19.069663,72.900535,59,below_avg,veg_restaurant,0
28,Mehman Nawazi,5107ff0ce4b05e493f979b9a,Indian Restaurant,19.111485,72.909,44,below_avg,veg_restaurant,0
29,Candies,4b812df7f964a5208f9930e3,Fast Food Restaurant,19.070288,72.826245,72,below_avg,fast_food,0
31,SodaBottleOpenerWala,55ffb37f498e62f1de3f065b,Irani Cafe,19.063632,72.862403,70,below_avg,,0
36,Otters Club,4b0587ddf964a520afa422e3,General Entertainment,19.060571,72.821401,58,below_avg,Entertainment,0


In [39]:
# characteristics (label = 1)
# below average food
# Mostly veg_restaurant or fast_food

In [40]:
mumbai_venues.loc[mumbai_venues['label']==1]

Unnamed: 0,name,id,categories,lat,lng,total likes,total_likes_cat,categories_new,label
1,Hamleys,51d2a531454ad6055f94cdce,Toy / Game Store,19.086655,72.889783,22,poor,Entertainment,1
2,Starbucks Coffee Capital,546f25c5498e2d4d056ad6eb,Coffee Shop,19.063457,72.861576,20,poor,fast_food,1
8,Hitchki,5a21b1cea4236243c9628ad2,Bar,19.06973,72.869761,10,poor,bars,1
24,Bombay Coffee House,56e3bcf6498e13373cb379c6,Café,19.063425,72.834671,18,poor,fast_food,1
34,Punjab Sweet House And Restaurant,4be958699a54a593eaee0a11,Indian Restaurant,19.062553,72.829423,40,poor,veg_restaurant,1
35,Chembur Post Office Wada Pav,4bc085064cdfc9b6a5ee9221,Snack Place,19.05694,72.898056,8,poor,fast_food,1
43,Naturals,5353bd55498ed44d14437bd3,Ice Cream Shop,19.111204,72.837255,40,poor,desert,1
47,Ming Yang,4b5dc6e6f964a520166b29e3,Chinese Restaurant,19.043582,72.819199,28,poor,other_restaurant,1
51,Trilok Bar & Resturant,4ed139495c5c9528fa406b06,Bar,19.027733,72.851789,11,poor,bars,1
54,Bastian,573f159b498ee6b4bb61bd71,Seafood Restaurant,19.063566,72.834807,16,poor,other_restaurant,1


In [41]:
# characteristics
# Poor quality food
# Mostly all type of restaurant

In [42]:
mumbai_venues.loc[mumbai_venues['label']==2]

Unnamed: 0,name,id,categories,lat,lng,total likes,total_likes_cat,categories_new,label
0,Sofitel Mumbai BKC,4d9b39ed2ae860fc0f5a81cb,Hotel,19.067628,72.869499,150,abv_avg,Hotels,2
6,MCA Club,4b5b4ecdf964a5206af328e3,Gym Pool,19.060443,72.86602,112,abv_avg,Hang_out,2
7,Trident,4b865eeaf964a520af8731e3,Hotel,19.066808,72.867468,169,abv_avg,Hotels,2
13,Yauatcha,4ee333ba9adf3982fe44da3b,Dim Sum Restaurant,19.06137,72.86268,189,abv_avg,other_restaurant,2
18,Prithvi Theatre,4bcfe10f046076b0e02e6f71,Theater,19.106157,72.82581,158,abv_avg,Hang_out,2
20,Mahesh Lunch Home,4d04fc8b30a58cfaead2a2e7,Seafood Restaurant,19.10368,72.826794,116,abv_avg,other_restaurant,2
23,ITC Maratha,4bc78f5793bdeee1848337ae,Hotel,19.104023,72.869638,164,abv_avg,Hotels,2
30,Guru Kripa,4b0587d8f964a52008a422e3,Indian Restaurant,19.042955,72.861796,162,abv_avg,veg_restaurant,2
53,Pali Village Café,4be57f22bcef2d7fa4d603e5,Café,19.0633,72.829452,139,abv_avg,fast_food,2
56,Joey's Pizza,4cc1d37c3d7fa1cd0de39a5f,Pizza Place,19.126762,72.830001,114,abv_avg,fast_food,2


In [43]:
# characteristics
# above average food
# Mostly Hotels

In [44]:
mumbai_venues.loc[mumbai_venues['label']==3]

Unnamed: 0,name,id,categories,lat,lng,total likes,total_likes_cat,categories_new,label
3,PVR Cinemas,50950f23e4b0578aae7a7a00,Movie Theater,19.086643,72.889839,286,good,Hang_out,3
12,Starbucks Coffee: A Tata Alliance,5275159f11d265069773cd85,Coffee Shop,19.075523,72.831745,208,good,fast_food,3
15,Grand Hyatt,4b0587caf964a520dfa122e3,Hotel,19.076832,72.85127,244,good,Hotels,3
25,Café Madras,4b0587cef964a52077a222e3,Café,19.027721,72.855196,334,good,fast_food,3
26,JW Marriott Mumbai Juhu,4b0587c9f964a520bca122e3,Hotel,19.101915,72.826325,212,good,Hotels,3
27,Ram Ashraya,4b0587d9f964a52021a422e3,Vegetarian / Vegan Restaurant,19.028092,72.851729,196,good,veg_restaurant,3
32,Starbucks Coffee: A Tata Alliance,50f63f31e4b050d06e1cb448,Coffee Shop,19.116317,72.9097,358,good,fast_food,3
38,Starbucks Coffee: A Tata Alliance,51582b9fe4b0bad2d31794da,Coffee Shop,19.092329,72.85605,226,good,fast_food,3
39,Starbucks Coffee: A Tata Alliance,522c3e1011d20c34ea55b14c,Coffee Shop,19.100479,72.827438,262,good,fast_food,3
40,Prithvi Cafe,4b0587cdf964a52047a222e3,Café,19.106153,72.825866,216,good,fast_food,3
