<h1 style='text-align: center'>Battle of Neighborhoods in London</h1>

<h3>1. Introduction</h3>

In this project, we’ll try to solve a problem that people often face when visiting a new city:
which restaurants in a given neighborhood/town are well reviewed, affordable, and serve the
cuisine they like? The scope of this project is therefore to provide a list of restaurants that
serve a visitor's chosen cuisine, are affordable, and have good reviews.

Since London is one of the world's many multicultural cities, we will compare the
neighborhoods of London and compile a list of restaurants based on the cuisine they serve
and their ratings. To do this, we’ll rely on data collected from several sources and on various
visualizations, including plotting the areas on a map.

<h3>2. Data</h3>

For this project we’ll be gathering public data from the following sources:
1. <b>Wikipedia</b>: Here we can get the data on London's boroughs and the locations within them. The data is listed by location/neighborhood, so we can use grouping to group these areas by the borough they belong to. Click <a href='https://en.wikipedia.org/wiki/List_of_areas_of_London'>here</a> to visit the Wikipedia page.


2. <b>Foursquare</b>: Using the Foursquare API we can obtain the popular venues near each location,
together with their latitudes and longitudes, and plot them on a map of London. We can also
use this API to get the details of the restaurants that serve different types of cuisine from around
the world.
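
As a rough illustration of the grouping mentioned under the Wikipedia source, the sketch below splits multi-borough entries and groups areas by borough. The three sample rows are copied from the Wikipedia table; `areas`, `exploded` and `by_borough` are illustrative names only.

```python
import pandas as pd

# Sample rows copied from the Wikipedia table (illustrative subset)
areas = pd.DataFrame({
    'Location': ['Abbey Wood', 'Addington', 'Woolwich'],
    'Boroughs': ['Bexley, Greenwich', 'Croydon', 'Greenwich'],
})

# Split "Bexley, Greenwich" into one row per borough, then group by borough
exploded = areas.assign(Boroughs=areas['Boroughs'].str.split(', ')).explode('Boroughs')
by_borough = exploded.groupby('Boroughs')['Location'].apply(list).to_dict()
print(by_borough)
```

Without the split, an area that straddles two boroughs is counted under a combined key such as "Bexley, Greenwich" rather than under each borough.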

#### Importing necessary libraries

In [1]:
# Data imports
import pandas as pd
import numpy as np

# Visualization imports
import matplotlib.pyplot as plt
import seaborn as sns

# Webpage scraping imports
import urllib.request
from bs4 import BeautifulSoup
import requests

# Import to transform JSON data into a Pandas DataFrame
# (pandas.io.json.json_normalize has been deprecated since pandas 1.0)
from pandas import json_normalize

# Import for reading a JSON file 
import json

# Import for getting location coordinates
import geocoder

# Import for creating and plotting the data on a map
import folium

# Import for getting the coordinates of the given location
from geopy.geocoders import Nominatim

#### Getting Foursquare API credentials

In [2]:
# Getting Foursquare credentials; the context manager closes the file automatically
with open('credentials.json') as f:
    cred = json.load(f)
VERSION = '20180605' # Foursquare API version
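
The notebook does not show `credentials.json` itself. Judging from the keys used in the functions below (`CLIENT_ID` and `CLIENT_SECRET`), a minimal sketch of its shape, with a check for missing keys, might look like this. The placeholder values and the `load_credentials` helper are hypothetical.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ('CLIENT_ID', 'CLIENT_SECRET')

def load_credentials(path):
    """Load the credentials file and fail early if a key is missing."""
    with open(path) as f:
        cred = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in cred]
    if missing:
        raise KeyError('Missing credential keys: {}'.format(', '.join(missing)))
    return cred

# Demonstration with a temporary file and placeholder values
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'credentials.json')
    with open(sample_path, 'w') as f:
        json.dump({'CLIENT_ID': 'your-client-id',
                   'CLIENT_SECRET': 'your-client-secret'}, f)
    sample_cred = load_credentials(sample_path)
```

Keeping the keys in a separate file, as the notebook does, keeps them out of the shared code.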

#### Predefining important functions

In [3]:
# function that returns coordinates for given postal code
def get_coordinates(postalCode):
    coords = None
    while(coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postalCode))
        coords = g.latlng
    return coords

# function that gets nearby venues for each of the given locations
def getVenues(name,latitude,longitude,radius=500):
    venues_list=[]
    limit=100
    for name,lat,lng in zip(name,latitude,longitude):
        #creating API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            cred['CLIENT_ID'], 
            cred['CLIENT_SECRET'], 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
        
        #creating a GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        #appending the relevant values in venue list
        venues_list.append([(
            name, 
            v['venue']['name'], 
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 'Venue','Venue ID', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return(nearby_venues)


# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# function that returns the map plotted with areas marked based on the given address, dataframe of locations and marker color
def create_map(address,df,color):
    # Getting the coordinates of the given address
    geolocator = Nominatim(user_agent="map")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude

    # Creating a map centered on the geocoded address
    Map = folium.Map(location=[latitude, longitude], zoom_start=10)

    # Plotting all the locations on the map
    for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Boroughs'], df['Location']):
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(Map)  

    return Map

# function that returns ratings for given venue id
def get_venue_data(venue_id):
        
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            venue_id,
            cred['CLIENT_ID'], 
            cred['CLIENT_SECRET'], 
            VERSION)
    #print(url)
    # get all the data with a single request (each call counts against the API quota)
    venue = requests.get(url).json()["response"]["venue"]
    rating = venue["rating"]
    likes = venue["likes"]['count']
    tipCount = venue["stats"]['tipCount']
    return rating,likes,tipCount
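
The request URLs above are assembled with `str.format`. One alternative, sketched here with the standard library, is to build the query string from a dictionary so that each parameter is named and escaped. The endpoint and parameter names are the ones used in `getVenues`; `build_explore_url`, the placeholder credentials and the example coordinates are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the venues/explore URL for one coordinate pair."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials and example coordinates in central London
example_url = build_explore_url('your-client-id', 'your-client-secret',
                                '20180605', 51.51651, -0.11968)

# Parse the URL back to confirm the parameters survive the round trip
query = parse_qs(urlparse(example_url).query)
print(query['ll'])
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which avoids building the string by hand.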

#### Scraping web data from wikipedia

In [4]:
#URL for wikipedia data
url='https://en.wikipedia.org/wiki/List_of_areas_of_London'

# opening the URL using urllib.request.urlopen() method into the page variable
page = urllib.request.urlopen(url)

# parsing the HTML from our URL into the BeautifulSoup parse tree format
soup = BeautifulSoup(page,'lxml')
soup.prettify()

# using the 'find' function, we get the first 'table' with the class 'wikitable sortable'
# and store its tbody tag in a variable
ldn_table = soup.find('table',class_ = 'wikitable sortable').tbody

#creating an area list for our data from table in wikipedia page
area_list = [[],[],[],[],[],[]]

# appending the data from wikipedia into our area list
for row in ldn_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==6:
        for i in range(6):
            area_list[i].append(cells[i].find(text=True).strip())

# creating a list for column names
column_names = ['Location','Boroughs','Post Town','Postcode','Dial Code','OS grid ref']

# creating a DataFrame from our list
london_df = pd.DataFrame(columns=column_names)
for i in range(len(column_names)):
    london_df[column_names[i]] = area_list[i]
london_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode,Dial Code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon,CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
528,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
529,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
530,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
531,Yeading,Hillingdon,HAYES,UB4,020,TQ115825
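
The loop that fills `london_df` column by column can also be written as a single constructor call. A minimal sketch, using the first two rows of the output above as stand-in data; `sample_df` is an illustrative name.

```python
import pandas as pd

column_names = ['Location', 'Boroughs', 'Post Town', 'Postcode', 'Dial Code', 'OS grid ref']

# One inner list per column, as built by the scraping loop (two sample rows)
area_list = [
    ['Abbey Wood', 'Acton'],
    ['Bexley, Greenwich', 'Ealing, Hammersmith and Fulham'],
    ['LONDON', 'LONDON'],
    ['SE2', 'W3, W4'],
    ['020', '020'],
    ['TQ465785', 'TQ205805'],
]

# Pair each column name with its list of values
sample_df = pd.DataFrame(dict(zip(column_names, area_list)))
print(sample_df.shape)
```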


Since we won't be using the Dial Code and OS grid ref columns in this project, we'll drop them.

In [5]:
london_df.drop(['Dial Code','OS grid ref'],axis=1,inplace=True)
london_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...,...
528,Woolwich,Greenwich,LONDON,SE18
529,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
530,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12
531,Yeading,Hillingdon,HAYES,UB4


Removing all the locations whose Post Town is not London from the DataFrame.

In [6]:
london_df = london_df[london_df['Post Town']=='LONDON']
london_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
6,Aldgate,City,LONDON,EC3
7,Aldwych,Westminster,LONDON,WC2
9,Anerley,Bromley,LONDON,SE20
...,...,...,...,...
522,Wood Green,Haringey,LONDON,N22
523,Woodford,Redbridge,LONDON,"IG8, E18"
527,Woodside Park,Barnet,LONDON,N12
528,Woolwich,Greenwich,LONDON,SE18


Getting a count of the number of locations under each borough.

In [7]:
london_df['Boroughs'].value_counts()

Barnet                                      25
Tower Hamlets                               21
Westminster                                 19
Hackney                                     18
Lewisham                                    17
Camden                                      17
Haringey                                    15
Islington                                   14
Newham                                      13
Southwark                                   13
Greenwich                                   13
Brent                                       11
Wandsworth                                  10
Lambeth                                     10
Hammersmith and Fulham                       9
Kensington and Chelsea                       9
Waltham Forest                               8
Enfield                                      6
Merton                                       5
Richmond upon Thames                         4
Bromley                                      4
Ealing       

The result above shows that boroughs such as Barnet, Tower Hamlets, Westminster and Hackney have more locations than the others.

For our project we'll focus on the locations in Westminster.

In [8]:
#creating a new dataframe for the locations of Westminster.
westminster_df = london_df[london_df['Boroughs']=='Westminster']
westminster_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
7,Aldwych,Westminster,LONDON,WC2
28,Bayswater,Westminster,LONDON,W2
35,Belgravia,Westminster,LONDON,SW1
87,Charing Cross,Westminster,LONDON,WC2
95,Chinatown,Westminster,LONDON,W1
114,Covent Garden,Westminster,LONDON,WC2
273,Knightsbridge,Westminster,LONDON,SW1
287,Lisson Grove,Westminster,LONDON,NW8
289,Little Venice,Westminster,LONDON,"W9, W2"
296,Maida Vale,Westminster,LONDON,W9


Using the get_coordinates function defined earlier to get the coordinates for each postcode.

In [9]:
#creating a list for getting all the coordinates
postalCodes,coordinates = westminster_df['Postcode'].tolist(),[[],[]]

for postcode in postalCodes:
    coor = get_coordinates(postcode)
    coordinates[0].append(coor[0])
    coordinates[1].append(coor[1])
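
Several Westminster areas share a postcode district (for example WC2), so the loop above geocodes the same string more than once, and `get_coordinates` keeps retrying indefinitely if the service never answers. A possible variant, sketched below, caches results per postcode and gives up after a fixed number of attempts. `lookup` stands in for the geocoder call; `geocode_all`, `fake_lookup` and `max_attempts` are hypothetical names, and the coordinates are the ones shown in the output below.

```python
def geocode_all(postcodes, lookup, max_attempts=3):
    """Return {postcode: [lat, lng]}, calling lookup once per unique postcode."""
    cache = {}
    for postcode in postcodes:
        if postcode in cache:
            continue
        coords = None
        for _ in range(max_attempts):
            coords = lookup(postcode)
            if coords is not None:
                break
        if coords is None:
            raise RuntimeError('No coordinates found for {}'.format(postcode))
        cache[postcode] = coords
    return cache

# Stand-in for geocoder.arcgis(...).latlng so the sketch runs offline
calls = []
def fake_lookup(postcode):
    calls.append(postcode)
    return {'WC2': [51.51651, -0.11968], 'W2': [51.51494, -0.18048]}.get(postcode)

postcodes = ['WC2', 'W2', 'WC2']
coords_by_postcode = geocode_all(postcodes, fake_lookup)
latitudes = [coords_by_postcode[p][0] for p in postcodes]
```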

Adding the coordinate data to our dataframe

In [10]:
# assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
westminster_df = westminster_df.assign(Latitude=coordinates[0], Longitude=coordinates[1])
westminster_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode,Latitude,Longitude
7,Aldwych,Westminster,LONDON,WC2,51.51651,-0.11968
28,Bayswater,Westminster,LONDON,W2,51.51494,-0.18048
35,Belgravia,Westminster,LONDON,SW1,51.49714,-0.13829
87,Charing Cross,Westminster,LONDON,WC2,51.51651,-0.11968
95,Chinatown,Westminster,LONDON,W1,51.51656,-0.1477
114,Covent Garden,Westminster,LONDON,WC2,51.51651,-0.11968
273,Knightsbridge,Westminster,LONDON,SW1,51.49714,-0.13829
287,Lisson Grove,Westminster,LONDON,NW8,51.53398,-0.17378
289,Little Venice,Westminster,LONDON,"W9, W2",51.52587,-0.19526
296,Maida Vale,Westminster,LONDON,W9,51.52587,-0.19526


<h3>3. Exploring the data</h3>

Let's first plot the areas on a map

In [11]:
# plotting the areas of Westminster on a map
westminster_map = create_map('City of Westminster, London, UK',westminster_df,'blue')
westminster_map
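
`create_map` asks Nominatim for the map centre. If that lookup is unavailable, one option is to centre the map on the mean of the points being plotted. A small sketch with three coordinate pairs from the table above; `map_center` is a hypothetical helper whose result could be passed as `location=` to `folium.Map`.

```python
from statistics import mean

def map_center(latitudes, longitudes):
    """Return [lat, lng] at the arithmetic mean of the given points."""
    return [mean(latitudes), mean(longitudes)]

# Three distinct coordinate pairs from the Westminster table
lats = [51.51651, 51.51494, 51.49714]
lngs = [-0.11968, -0.18048, -0.13829]
center = map_center(lats, lngs)
print(center)
```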

#### Getting the top venues in Westminster

In [12]:
westminster_venues = getVenues(name=westminster_df['Location'],
                                   latitude=westminster_df['Latitude'],
                                   longitude=westminster_df['Longitude']
                                  )
westminster_venues['Restaurants?'] = westminster_venues['Venue Category'].str.contains('Restaurant',case=False)
westminster_venues

Unnamed: 0,Location,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category,Restaurants?
0,Aldwych,Scarfes Bar,5261511311d2d7cfe4189803,51.517813,-0.118184,Hotel Bar,False
1,Aldwych,Rosewood London,52628efb11d2aab0a9e71f3f,51.517468,-0.117810,Hotel,False
2,Aldwych,The Hoxton Holborn,54240085498e62eee21eb8da,51.517229,-0.122002,Hotel,False
3,Aldwych,Sir John Soane's Museum,4ac518d3f964a5204fa720e3,51.516833,-0.117540,History Museum,False
4,Aldwych,Lincoln's Inn Fields,4ad862b4f964a520291121e3,51.516114,-0.116558,Park,False
...,...,...,...,...,...,...,...
1576,Westminster,Laos Cafe,568ac80c498e7291c6072233,51.493768,-0.141835,Restaurant,True
1577,Westminster,Subway,4c7e3968d65437047c86c2a2,51.493315,-0.139483,Sandwich Place,False
1578,Westminster,Pret A Manger,4cfe186dfeec6dcb9ccb5536,51.492836,-0.138057,Sandwich Place,False
1579,Westminster,Loco Mexicano,4c9a1bcad4b1b1f78d3acd35,51.493041,-0.140975,Mexican Restaurant,True


Let's create a list of restaurants from the venue list of Westminster

In [13]:
# filtering and dropping in one chained step returns a new DataFrame and avoids SettingWithCopyWarning
westminster_restaurants = westminster_venues[westminster_venues['Restaurants?']].drop(columns='Restaurants?')
westminster_restaurants

Unnamed: 0,Location,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
6,Aldwych,Mirror Room,528a9b15498e8f7643a57ebe,51.517444,-0.117824,Restaurant
8,Aldwych,Holborn Dining Room,52fd08fe498e21503ed433f4,51.517493,-0.117541,English Restaurant
12,Aldwych,Barrafina,55a93450498ee308cbd91ed0,51.514417,-0.121768,Tapas Restaurant
19,Aldwych,The Delaunay,4ede68fdbe7be2833c6a17a2,51.513181,-0.117988,Restaurant
22,Aldwych,Abeno,4ad0ceaef964a520fed920e3,51.517447,-0.125168,Okonomiyaki Restaurant
...,...,...,...,...,...,...
1568,Westminster,Hai Cenato,589cb7ba5a58692759902ef3,51.497138,-0.143829,Italian Restaurant
1569,Westminster,Giraffe,4af3e1c4f964a52089ef21e3,51.493882,-0.141923,Restaurant
1576,Westminster,Laos Cafe,568ac80c498e7291c6072233,51.493768,-0.141835,Restaurant
1579,Westminster,Loco Mexicano,4c9a1bcad4b1b1f78d3acd35,51.493041,-0.140975,Mexican Restaurant


Let's see the number of restaurants in Westminster

In [14]:
print('Number of restaurants in Westminster: {}'.format(westminster_restaurants.shape[0]))

Number of restaurants in Westminster: 334


#### Getting the ratings for our Westminster restaurants using the Foursquare API and adding them to the dataframe

In [15]:
# I saved the westminster_restaurants data in a csv file since there is a daily limit on Foursquare API calls.
# loading the westminster restaurant data from the csv file
# westminster_restaurants = pd.read_csv('westminster_restaurants.csv')
westminster_restaurants['Venue Category'].value_counts()

Italian Restaurant               38
Restaurant                       35
Sushi Restaurant                 31
French Restaurant                26
Indian Restaurant                24
Chinese Restaurant               20
Fast Food Restaurant             15
Thai Restaurant                  14
Japanese Restaurant              13
Korean Restaurant                12
English Restaurant                9
Tapas Restaurant                  9
Modern European Restaurant        8
Argentinian Restaurant            7
Portuguese Restaurant             6
Greek Restaurant                  6
Vegetarian / Vegan Restaurant     6
Lebanese Restaurant               6
Australian Restaurant             6
Seafood Restaurant                6
Mexican Restaurant                6
Asian Restaurant                  6
Scandinavian Restaurant           6
Turkish Restaurant                4
Okonomiyaki Restaurant            3
Falafel Restaurant                2
Malay Restaurant                  2
Middle Eastern Restaurant   

As we can see, a person has a wide range of food choices, since there are restaurants serving cuisines such as
Italian, Indian, French and Lebanese.

For our purpose we'll go with some of the common choices people make: Indian, Mexican, English, Vegetarian / Vegan and Turkish.

In [16]:
westminster_restaurants=westminster_restaurants[westminster_restaurants['Venue Category'].isin(['Indian Restaurant','Mexican Restaurant','English Restaurant','Vegetarian / Vegan Restaurant','Turkish Restaurant'])]
westminster_restaurants

Unnamed: 0,Location,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
8,Aldwych,Holborn Dining Room,52fd08fe498e21503ed433f4,51.517493,-0.117541,English Restaurant
89,Bayswater,Flavours of India,4c10000cce640f47ac393952,51.514022,-0.178533,Indian Restaurant
160,Bayswater,Lolita,4d874f81f9f3a1cd590bf064,51.515098,-0.176143,Turkish Restaurant
190,Belgravia,Quilon,4ac518ddf964a5208aa920e3,51.498772,-0.137522,Indian Restaurant
223,Belgravia,The English Rose Cafe And Tea Shop,5991647ab5461874ace24313,51.498144,-0.144437,English Restaurant
265,Belgravia,The Curry Room,59e502c31f74406800228913,51.49829,-0.143543,Indian Restaurant
280,Belgravia,Loco Mexicano,4c9a1bcad4b1b1f78d3acd35,51.493041,-0.140975,Mexican Restaurant
290,Charing Cross,Holborn Dining Room,52fd08fe498e21503ed433f4,51.517493,-0.117541,English Restaurant
456,Chinatown,Trishna,4ace5b33f964a5200fd020e3,51.51852,-0.153063,Indian Restaurant
458,Chinatown,Jikoni,57e44165498e7d9d2d6846c7,51.518477,-0.153213,Indian Restaurant


In [19]:
# res_ids = westminster_restaurants['Venue ID'].tolist()
# venue_data=[[],[],[]]
# for ID in res_ids:
#     rating,likes,tipCount = get_venue_data(ID)
#     venue_data[0].append(rating)
#     venue_data[1].append(likes)
#     venue_data[2].append(tipCount)
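
The comment in cell 15 mentions saving results to a CSV file because of the daily API limit. One way to arrange this, sketched under the assumption that a cached file should be reused whenever it exists, is shown below. `load_or_fetch`, `fake_fetch` and the file name are hypothetical; `fake_fetch` stands in for the loop over `get_venue_data` and returns placeholder numbers, not real ratings.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(path, fetch):
    """Read the cached CSV if present; otherwise call fetch() and save the result."""
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    df.to_csv(path, index=False)
    return df

fetch_calls = []
def fake_fetch():
    # Placeholder values only, not real Foursquare ratings
    fetch_calls.append(1)
    return pd.DataFrame({'Venue ID': ['id-1', 'id-2'], 'Rating': [8.0, 7.5]})

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, 'venue_ratings.csv')
    first = load_or_fetch(cache_path, fake_fetch)   # calls fake_fetch and writes the file
    second = load_or_fetch(cache_path, fake_fetch)  # reads the file, no second call
```

Writing with `index=False` avoids an extra `Unnamed: 0` column when the file is read back.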