# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

How many times have you eaten in a restaurant and wished you had never been there? Many things can 
put you off a restaurant, and food safety must be the most important of them. 

Bad-tasting food can leave you with a bad taste and a bad mood, but unsafe food can endanger your health and even 
put you in hospital. Hence, food safety should be of prime importance, along with taste and value for money.

Can we analyze the various restaurants located in the neighborhoods of London based on the food safety ratings 
they have received from an authorized government agency, their online user ratings (likes) and their price category, and publish 
this data so that customers can choose the neighborhoods where eating out is the best option in London?

## Data <a name="data"></a>

The analysis is largely based on the food hygiene ratings given by the **Food Standards Agency (FSA) of the UK** and on **Foursquare** data for the restaurants in various neighborhoods of the **city of London**.

The data about the **food safety ratings for establishments located in London** can be obtained from https://ratings.food.gov.uk/open-data/en-GB

The datasets used for this project to get online customer likes, popular categories, geolocation details, price ratings etc. were extracted using the Foursquare API Venues Platform. To retrieve the necessary data from the online platform, a URL request was built using the parameters described in the Foursquare API documentation. 

Details of the Foursquare API can be found on the Foursquare developer page: https://developer.foursquare.com/docs/api

In [2]:
#importing all libraries that may be required
import pandas as pd
import numpy as np
import folium
import requests
import geocoder
from requests.auth import HTTPBasicAuth
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import json
from pandas import json_normalize # pandas.io.json.json_normalize was deprecated in pandas 1.0
import random
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 
import matplotlib.cm as cm
import matplotlib.colors as colors
print('All libraries imported!')

All libraries imported!


### I had to manually clean/edit a bit of the XML file because:
* for some of the businesses the latitude and longitude were not provided
* some of these were not restaurants, so I deleted them
* the postcode was also missing for some, so I fixed that first and then got the latitude and longitude from the postcode using https://www.freemaptools.com/convert-uk-postcode-to-lat-lng.htm
* I could have used an API, but the amount of data was small
* finally, my XML file has all the required data and is ready to be imported below
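Finding the records that need this kind of manual fixing can be automated. The sketch below uses Python's built-in `xml.etree.ElementTree` on a small, made-up sample that uses the same tags the notebook reads (`EstablishmentDetail`, `BusinessName`, `PostCode`, and a `Geocode` holding `Latitude`/`Longitude`). It assumes `Geocode` is a direct child of each `EstablishmentDetail`. `has_text` and `find_incomplete` are hypothetical helpers that list the establishments missing a postcode or coordinates. On the real file, you would pass them the root of the parsed XML instead of the sample.

```python
import xml.etree.ElementTree as ET

# Made-up sample using the same tags the notebook reads from the FSA file
SAMPLE_XML = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Cafe A</BusinessName>
      <PostCode>EC2N 1HT</PostCode>
      <Geocode><Longitude>-0.084028</Longitude><Latitude>51.515695</Latitude></Geocode>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Cafe B</BusinessName>
      <PostCode>EC2Y 9AW</PostCode>
      <Geocode />
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Cafe C</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

def has_text(element, path):
    """True if the element at `path` exists and contains non-blank text."""
    return bool((element.findtext(path) or "").strip())

def find_incomplete(root):
    """Return (BusinessName, missing_fields) for every establishment lacking a postcode or coordinates."""
    problems = []
    for est in root.iter("EstablishmentDetail"):
        missing = []
        if not has_text(est, "PostCode"):
            missing.append("PostCode")
        # Assumes the Geocode element sits directly under EstablishmentDetail
        if not (has_text(est, "Geocode/Latitude") and has_text(est, "Geocode/Longitude")):
            missing.append("Geocode")
        if missing:
            problems.append((est.findtext("BusinessName"), missing))
    return problems

root = ET.fromstring(SAMPLE_XML.strip())
for name, missing in find_incomplete(root):
    print(name, missing)
# Cafe B ['Geocode']
# Cafe C ['PostCode', 'Geocode']
```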

In [125]:
#I have manually downloaded the food safety rating XML file from the FSA site and will load it here
from lxml import etree
inFile = 'zomato-restaurants-data\\London-GB.xml'
tree = etree.parse(inFile)
df_cols = ["BusinessName", "BusinessType", "PostCode", "FoodSafetyRating"]
rows = []

# Extract the fields we need from every EstablishmentDetail record
for est in tree.iter("EstablishmentDetail"):
    BusinessName = est.find('BusinessName').text
    BusinessType = est.find('BusinessType').text
    Post = est.find('PostCode').text
    RatingValue = est.find('RatingValue').text

    rows.append({"BusinessName": BusinessName, "BusinessType": BusinessType, "PostCode": Post, "FoodSafetyRating": RatingValue})

out_df = pd.DataFrame(rows, columns=df_cols)
out_df.head()

|   | BusinessName | BusinessType | PostCode | FoodSafetyRating |
|---|---|---|---|---|
| 0 | Babble City | Pub/bar/nightclub | EC2N 1HT | 5 |
| 1 | Bad Egg | Restaurant/Cafe/Canteen | EC2Y 9AW | 4 |
| 2 | Badolina | Restaurant/Cafe/Canteen | EC3M 7HB | 5 |
| 3 | Badolina | Takeaway/sandwich shop | EC2M 4NR | 5 |
| 4 | Bagel Mania | Takeaway/sandwich shop | EC4Y 1BT | 5 |


In [126]:
# Read each establishment's coordinates from inside its own EstablishmentDetail record,
# so the latitude/longitude lists stay aligned with the rows of out_df built above
lati = []
lngi = []
for est in tree.iter("EstablishmentDetail"):
    lati.append(est.find('.//Latitude').text)
    lngi.append(est.find('.//Longitude').text)

out_df['FSALatitude'] = lati
out_df['FSALongitude'] = lngi

    

In [127]:
out_df.head()

|   | BusinessName | BusinessType | PostCode | FoodSafetyRating | FSALatitude | FSALongitude |
|---|---|---|---|---|---|---|
| 0 | Babble City | Pub/bar/nightclub | EC2N 1HT | 5 | 51.515695 | -0.084028 |
| 1 | Bad Egg | Restaurant/Cafe/Canteen | EC2Y 9AW | 4 | 51.519437 | -0.089608 |
| 2 | Badolina | Restaurant/Cafe/Canteen | EC3M 7HB | 5 | 51.511858 | -0.084218 |
| 3 | Badolina | Takeaway/sandwich shop | EC2M 4NR | 5 | 51.517835 | -0.079643 |
| 4 | Bagel Mania | Takeaway/sandwich shop | EC4Y 1BT | 5 | 51.513942 | -0.109554 |


In [99]:
out_df.shape

(1773, 6)

In [100]:
out_df['BusinessType'].value_counts()

Restaurant/Cafe/Canteen                  841
Takeaway/sandwich shop                   344
Pub/bar/nightclub                        229
Other catering premises                  148
Retailers - other                        134
Retailers - supermarkets/hypermarkets     34
Hotel/bed & breakfast/guest house         14
Mobile caterer                            10
Hospitals/Childcare/Caring Premises        8
School/college/university                  7
Distributors/Transporters                  2
Manufacturers/packers                      1
Importers/Exporters                        1
Name: BusinessType, dtype: int64

### We will focus only on places that serve food, such as restaurants, pubs and bars

In [101]:
out_df=out_df[out_df.BusinessType.isin(['Restaurant/Cafe/Canteen','Takeaway/sandwich shop','Pub/bar/nightclub','Hotel/bed & breakfast/guest house'])]

In [102]:
out_df.shape

(1428, 6)

### Now we define the parameters for the Foursquare API and search for each restaurant from the data above

In [178]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 5 # maximum number of venues returned per search
radius = 200 # search radius in meters

In [45]:
address = 'London, UK'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

51.5073219 -0.1276474
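The search URL in the loop below is assembled with `str.format`. This means a business name containing `&` (for example "Barcelona Tapas Bar & Restaurant", which appears later in the data) gets cut at the `&`, and the rest is read as a separate parameter. A minimal sketch using only the standard library's `urllib.parse.urlencode` shows the properly escaped URL. `build_search_url` is a hypothetical helper, and the parameter names are the ones used in the notebook's own URL. Passing a dict as `params=` to `requests.get` performs the same encoding.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version, query, lat, lng, radius, limit):
    """Build a venues/search URL; urlencode escapes '&', spaces, etc. in the business name."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'query': query,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_URL + '?' + urlencode(params)

url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                       'Barcelona Tapas Bar & Restaurant', 51.511777, -0.083728, 200, 5)
print(url)
# The name survives a round trip intact instead of being split at the '&'
print(parse_qs(urlparse(url).query)['query'])  # ['Barcelona Tapas Bar & Restaurant']
```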


In [None]:
# Foursquare endpoints used below
SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'
DETAILS_URL = 'https://api.foursquare.com/v2/venues/{}'
auth_params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION}

venues_list = []
for index, row in out_df.iterrows():
    query = row['BusinessName']
    businesstype = row['BusinessType']
    foodrating = row['FoodSafetyRating']
    lat = row['FSALatitude']
    lng = row['FSALongitude']
    # Search for venues matching the FSA business name near its FSA coordinates.
    # Passing params lets requests URL-encode names containing '&', spaces, etc.
    search_params = dict(auth_params, query=query, ll='{},{}'.format(lat, lng), radius=radius, limit=LIMIT)
    results = requests.get(SEARCH_URL, params=search_params).json()
    if results.get('meta', {}).get('code') != 200:
        # e.g. code 429 'quota_exceeded': stop here instead of failing with a KeyError
        print(results.get('meta'))
        break
    venues = results['response']['venues']

    for v in venues:
        venueid = v['id']
        # The venue details endpoint returns likes, rating and price for one venue
        result = requests.get(DETAILS_URL.format(venueid), params=auth_params).json()
        venue = result.get('response', {}).get('venue', {})
        likes = venue.get('likes', {}).get('count')
        rating = venue.get('rating')
        # price tier runs from 1 (least pricey) to 4 (most pricey)
        price = venue.get('price', {}).get('tier')
        category = v['categories'][0]['name'] if v['categories'] else None
        venues_list.append((query, businesstype, foodrating, venueid,
                            v['location']['lat'], v['location']['lng'],
                            category, likes, rating, price))

# First batch of results; later batches are appended to london_venues
london_venues = pd.DataFrame(venues_list, columns=['Name',
                                                   'Business Type',
                                                   'Food Safety Rating',
                                                   'Venue ID',
                                                   'Venue Latitude',
                                                   'Venue Longitude',
                                                   'Venue Category',
                                                   'Total Likes',
                                                   'User Rating',
                                                   'Price'])
london_venues.head()

## Foursquare API limit issue
**After running the above code, I found that the Foursquare API quota was being exceeded, with the following error:**

`{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5e686096963d29001bdeba5d'}, 'response': {}}`

**Some data had been collected before the quota ran out: the first 60 entries of the FSA file returned around 169 Foursquare results matching the name.**

**Some of these results are not needed because their category is different, and I will clean them up later.
Meanwhile, it makes more sense to send fewer requests to Foursquare, spread across multiple accounts, and to limit the FSA XML records to BusinessType='Restaurant/Cafe/Canteen' only.**

**Let's see if we are able to get a better result that way :)**
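Each FSA row costs one search call plus one venue-details call per returned venue. This means the 60 entries above used roughly 60 + 169 = 229 requests before the quota ran out. Below is a minimal sketch of a helper that backs off and retries when the response carries code 429. `get_json_with_backoff` and its parameters are hypothetical, and `fetch` stands in for something like `lambda u: requests.get(u).json()`. A daily quota will not reset within a few seconds, so for `quota_exceeded`, stopping and resuming later (as done here by spreading the work across accounts) remains the practical option. Backoff mainly helps with short bursts of requests.

```python
import time

def get_json_with_backoff(fetch, url, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> dict, retrying with exponential backoff while meta.code is 429.

    Returns the last response, so the caller can still inspect meta['errorType'].
    """
    for attempt in range(max_retries + 1):
        body = fetch(url)
        if body.get('meta', {}).get('code') != 429:
            return body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # wait 1 s, 2 s, 4 s, ...
    return body

# Usage with a fake fetch, so no real requests are sent
quota_error = {'meta': {'code': 429, 'errorType': 'quota_exceeded'}, 'response': {}}
ok = {'meta': {'code': 200}, 'response': {'venues': []}}
replies = iter([quota_error, quota_error, ok])
delays = []
body = get_json_with_backoff(lambda u: next(replies), 'https://example.invalid', sleep=delays.append)
print(body['meta']['code'], delays)  # 200 [1.0, 2.0]
```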


In [157]:
out_df_R = out_df[out_df['BusinessType']=='Restaurant/Cafe/Canteen']

In [385]:
out_df_R.shape

(212, 7)

In [384]:
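# Load the saved restaurant rows (afs_only_rest.csv); per the execution counts, this ran before the shape check above (In [385])
out_df_R = pd.read_csv('zomato-restaurants-data\\afs_only_rest.csv')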

In [373]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50 # maximum number of venues returned per search
radius = 100 # search radius in meters

In [None]:
# Foursquare endpoints used below
SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'
DETAILS_URL = 'https://api.foursquare.com/v2/venues/{}'
auth_params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION}

venues_list = []
for index, row in out_df_R.iterrows():
    query = row['BusinessName']
    businesstype = row['BusinessType']
    foodrating = row['FoodSafetyRating']
    lat = row['FSALatitude']
    lng = row['FSALongitude']
    # Passing params lets requests URL-encode the business name
    search_params = dict(auth_params, query=query, ll='{},{}'.format(lat, lng), radius=radius, limit=LIMIT)
    results = requests.get(SEARCH_URL, params=search_params).json()
    if results.get('meta', {}).get('code') != 200:
        # e.g. code 429 'quota_exceeded': stop and keep what has been collected so far
        print(results.get('meta'))
        break
    venues = results['response']['venues']

    for v in venues:
        venueid = v['id']
        # The venue details endpoint returns likes, rating, price and name for one venue
        result = requests.get(DETAILS_URL.format(venueid), params=auth_params).json()
        venue = result.get('response', {}).get('venue', {})
        likes = venue.get('likes', {}).get('count')
        rating = venue.get('rating')
        price = venue.get('price', {}).get('tier')
        # In this batch the Foursquare venue name is stored instead of the FSA query
        name = venue.get('name')
        category = v['categories'][0]['name'] if v['categories'] else None
        print(name, likes, rating, price)
        # Each entry is wrapped in a list; a later cell flattens venues_list
        venues_list.append([(
            name,
            businesstype,
            foodrating,
            venueid,
            v['location']['lat'],
            v['location']['lng'],
            category,
            likes,
            rating,
            price)])

In [212]:
london_original = london_venues.copy() # keep an independent backup; plain assignment would only alias the same DataFrame

In [375]:
london_venues_r = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
london_venues_r.columns = ['Name', 
                  'Business Type', 
                  'Food Safety Rating', 
                  'Venue ID',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'Total Likes',       
                  'User Rating',
                  'Price']
london_venues_r.head()

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EAT. | Restaurant/Cafe/Canteen | 5 | 4b19280af964a52094d823e3 | 51.519854 | -0.097873 | Sandwich Place | 5.0 | 5.4 | 1.0 |
| 1 | EAT. | Restaurant/Cafe/Canteen | 5 | 4c36eec5ae2da593edf5fdc5 | 51.514508 | -0.096357 | Sandwich Place | 5.0 | 5.9 | 1.0 |
| 2 | EAT. | Restaurant/Cafe/Canteen | 5 | 4d3ad9576de7721e8a92f249 | 51.513962 | -0.095676 | Sandwich Place | 7.0 | 4.9 | 1.0 |
| 3 | EAT. | Restaurant/Cafe/Canteen | 5 | 572f63e8cd102de0088dbfd6 | 51.518014 | -0.096905 | Sandwich Place | 0.0 | 5.8 | 1.0 |
| 4 | Eatwell Restaurant | Restaurant/Cafe/Canteen | 5 | 5d10c6e269ad4f0023b5921e | 51.517162 | -0.099004 | Corporate Cafeteria | 0.0 | NaN | NaN |


In [376]:
london_venues_r.shape

(235, 10)

### Since I had to call the API several times, I append each new batch of results to the previous one below

In [377]:
london_venues = pd.concat([london_venues, london_venues_r]) # DataFrame.append was removed in pandas 2.0

In [378]:
london_venues.shape

(372, 10)

In [302]:
london_venues['Name'].value_counts()

Caffè Nero                          10
Barcelona Tapas Bar & Restaurant     3
Abokado                              3
1901 Restaurant and Bar              3
Bibimbap ToGo                        2
                                    ..
14 Hills                             1
City Firefly                         1
La Bottega Del Caffe                 1
Bierschenke                          1
Cafe Brera                           1
Name: Name, Length: 82, dtype: int64

### Dropping all rows with NaN values
**The Foursquare search API does not match names very precisely: it also returned venues around the restaurants, many of which we didn't need. So I deleted every venue with a missing value (typically no price or rating), since these were mostly not restaurants.**

In [379]:
london_venues = london_venues.dropna()

In [380]:
london_venues.shape

(237, 10)
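`dropna()` with no arguments drops a row if *any* column is missing, including `Total Likes`. That is stricter than "no price and rating". A small sketch on made-up rows shows how the `subset` and `how` parameters of `DataFrame.dropna` narrow this down. Which variant matches the intended filter is a judgement call.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like london_venues
venues = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Total Likes': [10.0, np.nan, 3.0, 0.0],
    'User Rating': [7.5, 6.1, np.nan, np.nan],
    'Price': [2.0, 1.0, 3.0, np.nan],
})

# No arguments: a row goes if ANY column is NaN (B is lost only for missing likes)
print(venues.dropna()['Name'].tolist())  # ['A']
# Only consider rating and price: drop rows missing either of them
print(venues.dropna(subset=['User Rating', 'Price'])['Name'].tolist())  # ['A', 'B']
# how='all': drop only rows missing BOTH rating and price
print(venues.dropna(subset=['User Rating', 'Price'], how='all')['Name'].tolist())  # ['A', 'B', 'C']
```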

In [321]:
london_venues

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 Lombard Street, Restaurant | Restaurant/Cafe/Canteen | 5 | 4ac518bcf964a5207da220e3 | 51.513179 | -0.088872 | Restaurant | 68.0 | 7.5 | 3.0 |
| 1 | 14 Hills | Restaurant/Cafe/Canteen | 5 | 5d0f5a7044627d0023f4e721 | 51.512021 | -0.081007 | Restaurant | 1.0 | 7.6 | 2.0 |
| 6 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 5beac305d1a402002c7b31ea | 51.513599 | -0.087253 | Restaurant | 14.0 | 8.0 | 2.0 |
| 8 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 4c1905d2f551ef3bd11d4768 | 51.518561 | -0.093132 | Pub | 9.0 | 5.9 | 1.0 |
| 9 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 594d318ba4b51b7d05bb26ed | 51.525200 | -0.082651 | Japanese Restaurant | 90.0 | 8.2 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 42 | Canto Corvino | Restaurant/Cafe/Canteen | 4 | 5606cd78498e9b2a65a4a9e7 | 51.518541 | -0.078407 | Italian Restaurant | 34.0 | 7.9 | 4.0 |
| 43 | Caravaggio | Restaurant/Cafe/Canteen | 4 | 4ac518d5f964a52009a820e3 | 51.513546 | -0.080872 | Italian Restaurant | 14.0 | 7.3 | 2.0 |
| 44 | Carluccio's | Restaurant/Cafe/Canteen | 5 | 4b1e7c10f964a5206e1a24e3 | 51.518430 | -0.101594 | Italian Restaurant | 46.0 | 6.7 | 2.0 |
| 45 | Casella | Restaurant/Cafe/Canteen | 5 | 4bffb597f61ea593a49bea13 | 51.513802 | -0.106277 | Italian Restaurant | 5.0 | 7.4 | 2.0 |


### Deleting duplicates below, since the same venue was sometimes returned for several nearby restaurants

In [381]:
london_venues = london_venues.drop_duplicates() # drop_duplicates returns a new DataFrame, so assign the result
london_venues

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 Lombard Street, Restaurant | Restaurant/Cafe/Canteen | 5 | 4ac518bcf964a5207da220e3 | 51.513179 | -0.088872 | Restaurant | 68.0 | 7.5 | 3.0 |
| 1 | 14 Hills | Restaurant/Cafe/Canteen | 5 | 5d0f5a7044627d0023f4e721 | 51.512021 | -0.081007 | Restaurant | 1.0 | 7.6 | 2.0 |
| 6 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 5beac305d1a402002c7b31ea | 51.513599 | -0.087253 | Restaurant | 14.0 | 8.0 | 2.0 |
| 8 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 4c1905d2f551ef3bd11d4768 | 51.518561 | -0.093132 | Pub | 9.0 | 5.9 | 1.0 |
| 9 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 594d318ba4b51b7d05bb26ed | 51.525200 | -0.082651 | Japanese Restaurant | 90.0 | 8.2 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 221 | Notes Coffee Barrows | Restaurant/Cafe/Canteen | 5 | 530dcb9a498ec337c000fcea | 51.519407 | -0.089254 | Coffee Shop | 7.0 | 6.7 | 1.0 |
| 222 | Nusa Kitchen | Restaurant/Cafe/Canteen | 5 | 593fe3f10d8a0f7b00a3c00f | 51.515768 | -0.090636 | Soup Place | 5.0 | 5.9 | 1.0 |
| 228 | Original Bagel Bakery | Restaurant/Cafe/Canteen | 4 | 4bcef39bcc8cd13a480ec5cf | 51.522379 | -0.097641 | Bagel Shop | 10.0 | 8.0 | 1.0 |
| 229 | Osteria | Restaurant/Cafe/Canteen | 4 | 56b25967498e51046e48118d | 51.519273 | -0.093718 | Italian Restaurant | 7.0 | 6.4 | 2.0 |


In [397]:
london_venues.shape

(237, 10)
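In the first batch, `Name` holds the FSA query; in the second batch, it holds the Foursquare venue name. In addition, one venue can be returned for several nearby FSA records. Such rows describe the same venue but are not identical in every column, so a plain `drop_duplicates()` keeps them. Here is a sketch on made-up rows comparing full-row deduplication with deduplication on `Venue ID` (using the `subset` and `keep` parameters of `DataFrame.drop_duplicates`). Note that `drop_duplicates` returns a new DataFrame, so the result has to be assigned.

```python
import pandas as pd

# Made-up rows: one Foursquare venue returned for two different FSA queries
venues = pd.DataFrame({
    'Name': ['Caffe X', 'Caffe X', 'Caffe X (Bank)'],
    'Venue ID': ['id1', 'id1', 'id1'],
    'User Rating': [6.5, 6.5, 6.5],
})

# Full-row comparison: only the two identical rows collapse
deduped_full = venues.drop_duplicates()
# Comparing on the venue ID alone treats all three as one venue
deduped_by_id = venues.drop_duplicates(subset='Venue ID', keep='first')
print(len(venues), len(deduped_full), len(deduped_by_id))  # 3 2 1
```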

In [429]:
london_venues.describe()

|   | Food Safety Rating | Venue Latitude | Venue Longitude | Total Likes | User Rating | Price |
|---|---|---|---|---|---|---|
| count | 237.0 | 237.0 | 237.0 | 237.0 | 237.0 | 237.0 |
| mean | 4.607595 | 51.514987 | -0.088648 | 35.890295 | 6.947257 | 1.797468 |
| std | 0.809128 | 0.002915 | 0.012959 | 56.743892 | 0.936797 | 0.818945 |
| min | 0.0 | 51.504177 | -0.111537 | 0.0 | 4.8 | 1.0 |
| 25% | 4.0 | 51.512927 | -0.096381 | 7.0 | 6.1 | 1.0 |
| 50% | 5.0 | 51.514472 | -0.087253 | 18.0 | 7.1 | 2.0 |
| 75% | 5.0 | 51.517248 | -0.082179 | 45.0 | 7.7 | 2.0 |
| max | 5.0 | 51.5252 | 0.049773 | 539.0 | 8.8 | 4.0 |


In [383]:
london_venues['Food Safety Rating'].value_counts()

5    177
4     38
3     15
2      4
1      2
0      1
Name: Food Safety Rating, dtype: int64

In [None]:
london_venues['Food Safety Rating'] = london_venues['Food Safety Rating'].astype(int) # astype returns a new Series, so assign it back

In [453]:
# Keep the venues that appear in neither pricyandbest nor reallybad (an anti-join on all columns).
# NOTE: pricyandbest and reallybad are not defined anywhere in this notebook,
# so this leftover cell raises a NameError if run as-is.
cheapandbest = london_venues
for other in (pricyandbest, reallybad):
    keys = list(other.columns.values)
    i1 = cheapandbest.set_index(keys).index
    i2 = other.set_index(keys).index
    cheapandbest = cheapandbest[~i1.isin(i2)]
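The cell above is an anti-join: it keeps the rows of `cheapandbest` whose full set of column values does not appear in `pricyandbest` or `reallybad`. The same result can be expressed with `DataFrame.merge(..., indicator=True)`, which tags each row as `left_only`, `right_only` or `both`. Here is a minimal sketch with made-up frames; `anti_join` is a hypothetical helper. Unlike the `set_index`/`isin` version, `merge` returns a fresh 0-based index, so the original row labels are not kept.

```python
import pandas as pd

def anti_join(left, right):
    """Rows of `left` whose values in all shared columns do not appear in `right`."""
    merged = left.merge(right.drop_duplicates(), how='left', indicator=True)
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Made-up frames standing in for london_venues, pricyandbest and reallybad
everything = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Price': [1.0, 3.0, 2.0, 2.0]})
pricey = pd.DataFrame({'Name': ['B'], 'Price': [3.0]})
bad = pd.DataFrame({'Name': ['D'], 'Price': [2.0]})

rest = anti_join(anti_join(everything, pricey), bad)
print(rest['Name'].tolist())  # ['A', 'C']
```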

## Methodology <a name="methodology"></a>

The basic idea of this project is to rate the restaurants in the city of London based on:
* Food safety rating
* User Rating
* Price

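We have collected data on various restaurants that includes the above fields. We will run the food safety rating, user rating and price through **one hot encoding** (`pd.get_dummies`); since all three fields are numeric, they pass through unchanged, so the clustering works on their raw values.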

We will then run the data through the **k-means clustering** algorithm to split it into **4 clusters**. 

The generated **labels** will then be assigned to the dataset.
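To make the clustering step concrete, here is a minimal NumPy sketch of Lloyd's k-means algorithm: assign every row to its nearest centroid, move each centroid to the mean of its rows, and repeat until nothing moves. It also computes the inertia (within-cluster sum of squares), which is often plotted against k in the "elbow" heuristic for choosing the number of clusters. scikit-learn's `KMeans`, used in this notebook, defaults to a randomised k-means++ initialisation. The deterministic farthest-point start here is a simplification chosen only to keep the example reproducible, and `assign` and `kmeans` are hypothetical names.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start; returns (labels, centroids, inertia)."""
    X = np.asarray(X, dtype=float)
    # Start from the first row, then repeatedly add the row farthest from the centroids chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(nearest)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        labels = assign(X, centroids)  # assignment step
        # Update step: move each centroid to the mean of its rows (keep it if the cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    labels = assign(X, centroids)
    inertia = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centroids, inertia

# Toy 2-D data with two obvious groups
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids, inertia = kmeans(X, 2)
print(labels, inertia)  # [0 0 1 1] 1.0
# Inertia drops sharply from k=1 to k=2, then flattens: the "elbow" is at k=2
print([kmeans(X, k)[2] for k in (1, 2, 3)])  # [201.0, 1.0, 0.5]
```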

We will then **examine the data based on the labels** generated. If we find a pattern in the data for each label, we will **rename the label** to give **meaning to the data** as our own recommendation for the restaurant.

Finally, we will **project the various restaurants** as clusters on a map of **London** with their **name and our recommendation**.

### Let's look at all the restaurants in our dataset on the map of London below

In [393]:
address = 'Coleman Street, London'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

51.5157224 -0.0899299


In [456]:
map_london=folium.Map(location=[latitude, longitude], zoom_start=14)

for lat, lon, poi, rating in zip(london_venues['Venue Latitude'], london_venues['Venue Longitude'], london_venues['Name'], london_venues['Food Safety Rating']):
    label = folium.Popup(str(poi) + " Food safety rating:" + str(rating), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='#BD2C0D',
        fill=True,
        fill_color='#F3F049',
        fill_opacity=0.7).add_to(map_london)
map_london

In [396]:
import random # library for random number generation
import numpy as np # library for vectorized computation
from sklearn.datasets import make_blobs # sklearn.datasets.samples_generator was removed in scikit-learn 0.24
from matplotlib.ticker import NullFormatter
import matplotlib.ticker as ticker
from sklearn import preprocessing

### Here we prepare the three fields 'Food Safety Rating', 'User Rating' and 'Price' for clustering (one hot encoding with `pd.get_dummies` leaves these numeric columns unchanged)

In [458]:
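# one hot encoding: get_dummies only encodes non-numeric columns, so these three
# numeric columns are passed through unchanged (see the output below)
to_onehot = pd.get_dummies(london_venues[['Food Safety Rating', 'User Rating', 'Price']], prefix="", prefix_sep="")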


to_onehot['Name'] = london_venues['Name'] 

# move name column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

to_onehot.head()

|   | Name | Food Safety Rating | User Rating | Price |
|---|---|---|---|---|
| 0 | 1 Lombard Street, Restaurant | 5 | 7.5 | 3.0 |
| 1 | 14 Hills | 5 | 7.6 | 2.0 |
| 6 | 1901 Restaurant and Bar | 5 | 8.0 | 2.0 |
| 8 | 1901 Restaurant and Bar | 5 | 5.9 | 1.0 |
| 9 | 1901 Restaurant and Bar | 5 | 8.2 | 2.0 |


### Now we set the number of clusters to 4 and run the k-means algorithm on the data

In [459]:
cluster_df = to_onehot.drop('Name', axis=1)

k_clusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=k_clusters, random_state=0).fit(cluster_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 3, 0, 3, 0, 0, 0, 0, 0])
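`cluster_df` holds the raw Food Safety Rating, User Rating and Price values. Because k-means measures Euclidean distance, a feature with a wider spread pulls harder on the clusters. In this dataset the three spreads are similar (standard deviations of about 0.81, 0.94 and 0.82 in the `describe()` output), so scaling may change little here. Still, standardizing is a common precaution. Below is a pandas sketch of z-scoring, which serves the same purpose as scikit-learn's `StandardScaler` (`preprocessing` is imported above but unused). `standardize` is a hypothetical helper and the rows are made up. It could be applied as `KMeans(n_clusters=k_clusters, random_state=0).fit(standardize(cluster_df))`. A column with zero variance would produce NaN and would need to be dropped first.

```python
import pandas as pd

def standardize(df):
    """Z-score each column: subtract its mean, divide by its population std (ddof=0, as StandardScaler does)."""
    return (df - df.mean()) / df.std(ddof=0)

# Made-up rows with the same three features as cluster_df
features = pd.DataFrame({
    'Food Safety Rating': [5, 5, 3, 4],
    'User Rating': [7.5, 8.0, 6.1, 7.3],
    'Price': [3.0, 2.0, 2.0, 1.0],
})
scaled = standardize(features)
print(scaled.round(3))
```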

### The labels generated by the k-means algorithm are assigned to the dataset

In [460]:
london_venues['label'] = kmeans.labels_
london_venues.head()

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 Lombard Street, Restaurant | Restaurant/Cafe/Canteen | 5 | 4ac518bcf964a5207da220e3 | 51.513179 | -0.088872 | Restaurant | 68.0 | 7.5 | 3.0 | 1 |
| 1 | 14 Hills | Restaurant/Cafe/Canteen | 5 | 5d0f5a7044627d0023f4e721 | 51.512021 | -0.081007 | Restaurant | 1.0 | 7.6 | 2.0 | 3 |
| 6 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 5beac305d1a402002c7b31ea | 51.513599 | -0.087253 | Restaurant | 14.0 | 8.0 | 2.0 | 3 |
| 8 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 4c1905d2f551ef3bd11d4768 | 51.518561 | -0.093132 | Pub | 9.0 | 5.9 | 1.0 | 0 |
| 9 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 594d318ba4b51b7d05bb26ed | 51.5252 | -0.082651 | Japanese Restaurant | 90.0 | 8.2 | 2.0 | 3 |


In [462]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# one color per cluster, evenly spaced along the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(london_venues['Venue Latitude'], london_venues['Venue Longitude'], london_venues['Name'], london_venues['label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

### The clusters are now on the map, but to make sense of them we need to look at the data in each cluster and see how the restaurants have been classified

In [478]:
london_venues[london_venues['label']==0].head(5)#very cheap, bad user rating, good food safety rating-affordableandok

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 4c1905d2f551ef3bd11d4768 | 51.518561 | -0.093132 | Pub | 9.0 | 5.9 | 1.0 | 0 |
| 37 | Abokado | Takeaway/sandwich shop | 5 | 4b13de37f964a520e79923e3 | 51.520128 | -0.104246 | Sushi Restaurant | 18.0 | 6.7 | 2.0 | 0 |
| 43 | Abokado | Takeaway/sandwich shop | 5 | 4acdd443f964a52029cd20e3 | 51.523959 | -0.087432 | Sushi Restaurant | 30.0 | 6.0 | 2.0 | 0 |
| 45 | Abokado | Takeaway/sandwich shop | 5 | 530b0630498e233d6936f4e0 | 51.513516 | -0.073368 | Sushi Restaurant | 12.0 | 6.1 | 2.0 | 0 |
| 0 | Bad Egg | Restaurant/Cafe/Canteen | 4 | 5490afd9498ec53ea63bbf2c | 51.519212 | -0.089948 | Diner | 90.0 | 6.3 | 2.0 | 0 |


### The data above indicates that restaurants with label=0 are very cheap, with a poor user rating and a good food safety rating, so we can call them 'cheap & ok but not liked'

In [474]:
london_venues[london_venues['label']==1].head()#pricey, good user rating, good food safety rating - priceyandbest

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 Lombard Street, Restaurant | Restaurant/Cafe/Canteen | 5 | 4ac518bcf964a5207da220e3 | 51.513179 | -0.088872 | Restaurant | 68.0 | 7.5 | 3.0 | 1 |
| 13 | Baraka Restaurant | Restaurant/Cafe/Canteen | 5 | 50237f538055249578a0f772 | 51.51911 | -0.086729 | Seafood Restaurant | 61.0 | 7.3 | 4.0 | 1 |
| 18 | Be At One | Restaurant/Cafe/Canteen | 5 | 4b6dfa98f964a520c5a02ce3 | 51.518909 | -0.078269 | Cocktail Bar | 73.0 | 6.6 | 3.0 | 1 |
| 0 | Cabotte | Restaurant/Cafe/Canteen | 4 | 57dfb4a1498e923fc6548fc8 | 51.514984 | -0.091293 | French Restaurant | 8.0 | 6.8 | 3.0 | 1 |
| 5 | Camino | Restaurant/Cafe/Canteen | 5 | 4fd110d4e4b0b63780abf524 | 51.510791 | -0.08154 | Tapas Restaurant | 39.0 | 7.7 | 3.0 | 1 |


### The data above indicates that restaurants with label=1 are pricey, with a good user rating and a good food safety rating, so we can call them 'pricey and best'

In [475]:
london_venues[london_venues['label']==2].head()#pricey, avg user rating, bad food safety rating -verybad

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | Barcelona Tapas Bar & Restaurant | Restaurant/Cafe/Canteen | 3 | 4df9e831aeb785aedbeffc6d | 51.511777 | -0.083728 | Spanish Restaurant | 6.0 | 6.1 | 2.0 | 2 |
| 16 | Barcelona Tapas Bar & Restaurant | Restaurant/Cafe/Canteen | 3 | 4bb4f53329269c748be0ca92 | 51.512828 | -0.083401 | Tapas Restaurant | 6.0 | 5.9 | 3.0 | 2 |
| 19 | Bea's of Bloomsbury | Restaurant/Cafe/Canteen | 3 | 4ccc16e897d0224bed705db8 | 51.513382 | -0.095616 | Tea Room | 94.0 | 7.1 | 2.0 | 2 |
| 21 | Bengal Tiger | Restaurant/Cafe/Canteen | 3 | 4b9ff194f964a520764c37e3 | 51.513349 | -0.10206 | Indian Restaurant | 18.0 | 5.6 | 2.0 | 2 |
| 27 | Beppes Cafe | Restaurant/Cafe/Canteen | 3 | 4c404717d691c9b67be38a0a | 51.51775 | -0.10141 | Italian Restaurant | 19.0 | 7.3 | 2.0 | 2 |


### The data above indicates that restaurants with label=2 are pricey, with an average user rating and a poor food safety rating, so we can call them 'very bad'

In [476]:
london_venues[london_venues['label']==3].head()#cheap and the best

|   | Name | Business Type | Food Safety Rating | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Total Likes | User Rating | Price | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 14 Hills | Restaurant/Cafe/Canteen | 5 | 5d0f5a7044627d0023f4e721 | 51.512021 | -0.081007 | Restaurant | 1.0 | 7.6 | 2.0 | 3 |
| 6 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 5beac305d1a402002c7b31ea | 51.513599 | -0.087253 | Restaurant | 14.0 | 8.0 | 2.0 | 3 |
| 9 | 1901 Restaurant and Bar | Restaurant/Cafe/Canteen | 5 | 594d318ba4b51b7d05bb26ed | 51.5252 | -0.082651 | Japanese Restaurant | 90.0 | 8.2 | 2.0 | 3 |
| 6 | Bangalore Express | Restaurant/Cafe/Canteen | 4 | 4b0da488f964a520f54c23e3 | 51.512928 | -0.084752 | Indian Restaurant | 21.0 | 7.3 | 2.0 | 3 |
| 8 | Banh Mi Bay | Restaurant/Cafe/Canteen | 4 | 54cb7bae498e99dacf3c3a28 | 51.512621 | -0.094847 | Vietnamese Restaurant | 40.0 | 8.1 | 2.0 | 3 |


### The data above indicates that restaurants with label=3 are cheap, with a very good user rating and a good food safety rating, so we can call them 'cheap and best'
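The four cluster descriptions above are read off the first five rows of each cluster. A summary over all members of each cluster gives a sturdier basis for naming them. With scikit-learn, the fitted centroids are also available as `kmeans.cluster_centers_`. Here is a sketch using `groupby` on made-up rows standing in for `london_venues`:

```python
import pandas as pd

# Made-up rows standing in for london_venues after the numeric labels are assigned
venues = pd.DataFrame({
    'label': [0, 0, 1, 1, 2],
    'Food Safety Rating': [5, 5, 5, 4, 3],
    'User Rating': [5.9, 6.7, 7.5, 7.3, 6.1],
    'Price': [1.0, 2.0, 3.0, 4.0, 2.0],
})

# Mean of each feature per cluster, plus how many venues each cluster holds
profile = venues.groupby('label')[['Food Safety Rating', 'User Rating', 'Price']].mean()
profile['count'] = venues.groupby('label').size()
print(profile.round(2))
```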

In [482]:
london_venues.loc[london_venues['label'] == 3, 'label'] = 'cheap and best'
london_venues.loc[london_venues['label'] == 2, 'label'] = 'very bad'
london_venues.loc[london_venues['label'] == 1, 'label'] = 'pricey and best'
london_venues.loc[london_venues['label'] == 0, 'label'] = 'cheap & ok but not liked'

### The labels are renamed above; below, the cluster number is kept in a 'color' column for the map

In [497]:
london_venues.loc[london_venues['label'] == 'cheap and best', 'color'] = 3
london_venues.loc[london_venues['label'] == 'very bad', 'color'] = 2
london_venues.loc[london_venues['label'] == 'pricey and best', 'color'] = 1
london_venues.loc[london_venues['label'] == 'cheap & ok but not liked', 'color'] = 0
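The two cells above rename the labels one value at a time and then rebuild a numeric `color` column from the names. Below is a sketch of the same mapping using a single dictionary and `Series.map`, keeping the numeric cluster in its own column (`cluster` is a hypothetical column name, and the rows are made up). Keeping an integer column also avoids the float dtype that the `.loc` assignments give `color`, which is why the results map needs `int(color)`.

```python
import pandas as pd

# Recommendation names chosen in the notebook for each numeric k-means label
LABEL_NAMES = {
    0: 'cheap & ok but not liked',
    1: 'pricey and best',
    2: 'very bad',
    3: 'cheap and best',
}

# Made-up rows; 'cluster' stands in for the numeric kmeans.labels_
venues = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'cluster': [3, 0, 2, 1]})
venues['label'] = venues['cluster'].map(LABEL_NAMES)
print(venues)
```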

## Results <a name="results"></a>

In [504]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# one color per cluster, evenly spaced along the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, with the recommendation in each popup
for lat, lon, poi, label, color in zip(london_venues['Venue Latitude'], london_venues['Venue Longitude'], london_venues['Name'], london_venues['label'], london_venues['color']):
    popup = folium.Popup('Restaurant: ' + str(poi) + ', Recommendation: ' + str(label), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=popup,
        color=rainbow[int(color)],
        fill=True,
        fill_color=rainbow[int(color)],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

### Finally, we project the relabelled data on the map of London again; you can now click on any restaurant to see my recommendation for it!

Our analysis shows that we can grade the various places to eat in the city of London into 4 different categories:

* Cheap and best 
* Pricey and best
* Cheap & ok but not liked
* Very bad

## Discussion <a name="discussion"></a>

The restaurant categories can be explained further as below, along with some actions and lessons for the various stakeholders:
* Cheap and best - restaurants that are value for money and safe to eat at. They are also popular with people visiting London
* Pricey and best - restaurants that are popular and safe to eat at, but may be heavier on your pocket
* Cheap & ok but not liked - restaurants with a good food safety rating that are also cheap, but for some reason are not well liked by people who have visited them. This is a curious case that should be inspected by both the restaurant owners and the food safety agency
* Very bad - certainly a case where nothing is good. These have a poor food safety rating and an average user rating, and are costly as well. The restaurant owners should certainly look into these and try to improve



## Conclusion <a name="conclusion"></a>

We cannot be very judgemental about which neighborhood is better in terms of restaurant quality, because restaurants of all categories are fairly spread out, but my recommendations can certainly help when you are in a neighborhood and want to visit a particular restaurant.

This report should prove useful for restaurant owners, who can improve the food quality, taste, service and affordability of their restaurants so that they get better recommendations.

Health officials can look at the few curious cases where a good food safety rating has been assigned to restaurants that are not much liked by their customers.