<h1 style='text-align: center'>Battle of Neighborhoods in London</h1>

<h3>1. Introduction</h3>

In this project, we’ll try to solve a problem that people often face when visiting a new city:
which restaurants in a given neighborhood or town are good, affordable, and serve a cuisine
of one's liking? The scope of this project is therefore to produce a list of restaurants that serve
the visitor's preferred cuisine, are affordable, and have good reviews.

Since London is one of the world's many multicultural cities, we will compare the
neighborhoods of London and compile a list of restaurants based on the cuisine they serve,
their affordability, and their ratings.
To do this, we’ll rely on data collected from different sources and on various types of
visualization, including plotting the areas on a map.

<h3>2. Gathering required data</h3>

For this project we’ll gather public data from the following sources:
1. Wikipedia (https://en.wikipedia.org/wiki/List_of_areas_of_London): This page lists the areas
of London together with the borough(s) each area belongs to. The data is organized by area
rather than by borough, but we can group the areas by the boroughs they belong to.


2. Foursquare: Using the Foursquare API, we can obtain the popular venues around a location,
including their latitudes and longitudes, and plot them on a map of London. Using the same API,
we can also get the details of the restaurants that serve different types of cuisines from around
the world.
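
The grouping of areas by borough mentioned in item 1 can be sketched as follows. This is a minimal illustration, assuming that an area belonging to several boroughs lists them in one comma-separated string (as in "Bexley, Greenwich"); the four sample rows and the names `sample`, `per_borough` and `areas_by_borough` are illustrative and not part of the notebook.

```python
import pandas as pd

# Illustrative rows in the shape of the Wikipedia table (area plus borough list).
sample = pd.DataFrame({
    "Location": ["Abbey Wood", "Acton", "Addington", "Addiscombe"],
    "Boroughs": ["Bexley, Greenwich", "Ealing, Hammersmith and Fulham",
                 "Croydon", "Croydon"],
})

# Split the comma-separated borough string into a list, then give each
# (area, borough) pair its own row.
per_borough = (
    sample.assign(Borough=sample["Boroughs"].str.split(", "))
          .explode("Borough")
)

# Collect the areas that belong to each borough.
areas_by_borough = per_borough.groupby("Borough")["Location"].apply(list).to_dict()
print(areas_by_borough)
```

Splitting on ", " keeps names such as "Hammersmith and Fulham" intact, since they contain no comma.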

#### Importing necessary libraries

In [1]:
# Data imports
import pandas as pd
import numpy as np

# Visualization imports
import matplotlib.pyplot as plt
import seaborn as sns

# Webpage scraping imports
import urllib.request
from bs4 import BeautifulSoup
import requests

# Import to transform JSON data into a pandas DataFrame
# (pandas.io.json.json_normalize is deprecated since pandas 1.0)
from pandas import json_normalize

# Import for reading a JSON file 
import json

# Import for getting location coordinates
import geocoder

# Import for creating and plotting the data on a map
import folium

# Import for getting the coordinates of the given location
from geopy.geocoders import Nominatim

#### Scraping web data from wikipedia

In [2]:
#URL for wikipedia data
url='https://en.wikipedia.org/wiki/List_of_areas_of_London'

# opening the URL using the urllib.request.urlopen() method and storing the response in the page variable
page = urllib.request.urlopen(url)

# parsing the HTML from our URL into the BeautifulSoup parse tree format
soup = BeautifulSoup(page,'lxml')
soup.prettify()

# using the 'find' function, we get the first 'table' with the class 'wikitable sortable'
# and store its tbody tag in a variable
ldn_table = soup.find('table',class_ = 'wikitable sortable').tbody

#creating an area list for our data from table in wikipedia page
area_list = [[],[],[],[],[],[]]

# appending the data from wikipedia into our area list
for row in ldn_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==6:
        for i in range(6):
            area_list[i].append(cells[i].find(text=True).strip())

# creating a list for column names
column_names = ['Location','Boroughs','Post Town','Postcode','Dial Code','OS grid ref']

# creating a DataFrame from our list
london_df = pd.DataFrame(columns=column_names)
for i in range(len(column_names)):
    london_df[column_names[i]] = area_list[i]
london_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode,Dial Code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon,CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
528,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
529,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
530,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
531,Yeading,Hillingdon,HAYES,UB4,020,TQ115825
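
The row-extraction step can be tried offline, without fetching the page. The sketch below swaps BeautifulSoup for Python's standard-library `html.parser` and runs on a small hand-written HTML snippet; the class `TableRowParser`, the snippet and its header names are assumptions made for illustration. It keeps only rows with exactly six `<td>` cells, mirroring the `len(cells)==6` check in the notebook. Unlike the notebook code, it joins all the text inside a cell rather than taking only the first text node.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hand-written snippet in the shape of the Wikipedia table (illustrative only).
SAMPLE_HTML = """
<table class="wikitable sortable"><tbody>
<tr><th>Location</th><th>Borough</th><th>Post town</th>
    <th>Postcode</th><th>Dial code</th><th>OS grid ref</th></tr>
<tr><td><a href="#">Abbey Wood</a></td><td>Bexley, Greenwich</td><td>LONDON</td>
    <td>SE2</td><td>020</td><td>TQ465785</td></tr>
<tr><td>Addington</td><td>Croydon</td><td>CROYDON</td>
    <td>CR0</td><td>020</td><td>TQ375645</td></tr>
<tr><td>incomplete</td><td>row</td></tr>
</tbody></table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)

# Header rows contain only <th> cells, so they yield zero <td> cells and are dropped.
records = [row for row in parser.rows if len(row) == 6]
print(records)
```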


Since we won't be using the Dial Code and OS grid ref columns in this project, we'll drop them.

In [3]:
london_df.drop(['Dial Code','OS grid ref'],axis=1,inplace=True)
london_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...,...
528,Woolwich,Greenwich,LONDON,SE18
529,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
530,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12
531,Yeading,Hillingdon,HAYES,UB4


Removing all the locations from the DataFrame whose Post Town is not London

In [4]:
london_df = london_df[london_df['Post Town']=='LONDON']
london_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
6,Aldgate,City,LONDON,EC3
7,Aldwych,Westminster,LONDON,WC2
9,Anerley,Bromley,LONDON,SE20
...,...,...,...,...
522,Wood Green,Haringey,LONDON,N22
523,Woodford,Redbridge,LONDON,"IG8, E18"
527,Woodside Park,Barnet,LONDON,N12
528,Woolwich,Greenwich,LONDON,SE18


Getting a count of the number of locations under each borough

In [5]:
london_df['Boroughs'].value_counts()

Barnet                                      25
Tower Hamlets                               21
Westminster                                 19
Hackney                                     18
Lewisham                                    17
Camden                                      17
Haringey                                    15
Islington                                   14
Newham                                      13
Southwark                                   13
Greenwich                                   13
Brent                                       11
Lambeth                                     10
Wandsworth                                  10
Kensington and Chelsea                       9
Hammersmith and Fulham                       9
Waltham Forest                               8
Enfield                                      6
Merton                                       5
Bromley                                      4
Richmond upon Thames                         4
Croydon      

The result above shows that boroughs such as Barnet, Tower Hamlets, Westminster and Hackney have a higher number of locations than the others.

For our project, we'll choose the locations in Westminster and Camden, which are boroughs of London.

In [6]:
#creating a new dataframe for the locations of Westminster.
westminster_df = london_df[london_df['Boroughs']=='Westminster']
westminster_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
7,Aldwych,Westminster,LONDON,WC2
28,Bayswater,Westminster,LONDON,W2
35,Belgravia,Westminster,LONDON,SW1
87,Charing Cross,Westminster,LONDON,WC2
95,Chinatown,Westminster,LONDON,W1
114,Covent Garden,Westminster,LONDON,WC2
273,Knightsbridge,Westminster,LONDON,SW1
287,Lisson Grove,Westminster,LONDON,NW8
289,Little Venice,Westminster,LONDON,"W9, W2"
296,Maida Vale,Westminster,LONDON,W9


Creating a function that returns the coordinates for a given postcode.

In [7]:
def get_coordinates(postalCode):
    coords = None
    while(coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postalCode))
        coords = g.latlng
    return coords

#creating a list for getting all the coordinates
postalCodes,coordinates = westminster_df['Postcode'].tolist(),[[],[]]

for postcode in postalCodes:
    coor = get_coordinates(postcode)
    coordinates[0].append(coor[0])
    coordinates[1].append(coor[1])
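
The loop in `get_coordinates` retries until the geocoder answers, so it would never finish for a postcode that cannot be resolved, and it repeats the lookup for postcodes that occur more than once. A minimal sketch of a bounded retry with a per-postcode cache is shown below. The names `geocode_with_retry`, `geocode_postcodes`, `fake_lookup` and `KNOWN` are hypothetical, and the stand-in lookup replaces the real geocoder so the example runs offline.

```python
import time


def geocode_with_retry(query, lookup, max_attempts=3, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # pause between attempts only
    return None


def geocode_postcodes(postcodes, lookup, **retry_options):
    """Geocode each distinct postcode once and return results in input order."""
    cache = {}
    for postcode in postcodes:
        if postcode not in cache:
            cache[postcode] = geocode_with_retry(postcode, lookup, **retry_options)
    return [cache[postcode] for postcode in postcodes]


# Stand-in for the real geocoder (hypothetical; values copied from the notebook output).
KNOWN = {"WC2": [51.51651, -0.11968], "W2": [51.51494, -0.18048]}
calls = []


def fake_lookup(postcode):
    calls.append(postcode)
    return KNOWN.get(postcode)


coords = geocode_postcodes(["WC2", "W2", "WC2", "ZZ9"], fake_lookup,
                           max_attempts=2, delay_seconds=0)
print(coords)
print(len(calls))
```

In the notebook, the lookup could be a small function wrapping `geocoder.arcgis('{}, London, United Kingdom'.format(postcode)).latlng`; unresolved postcodes then come back as `None` and need to be handled before plotting.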

Adding the coordinate data to our dataframe

In [8]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
westminster_df = westminster_df.assign(Latitude=coordinates[0], Longitude=coordinates[1])
westminster_df


Unnamed: 0,Location,Boroughs,Post Town,Postcode,Latitude,Longitude
7,Aldwych,Westminster,LONDON,WC2,51.51651,-0.11968
28,Bayswater,Westminster,LONDON,W2,51.51494,-0.18048
35,Belgravia,Westminster,LONDON,SW1,51.49714,-0.13829
87,Charing Cross,Westminster,LONDON,WC2,51.51651,-0.11968
95,Chinatown,Westminster,LONDON,W1,51.51656,-0.1477
114,Covent Garden,Westminster,LONDON,WC2,51.51651,-0.11968
273,Knightsbridge,Westminster,LONDON,SW1,51.49714,-0.13829
287,Lisson Grove,Westminster,LONDON,NW8,51.53398,-0.17378
289,Little Venice,Westminster,LONDON,"W9, W2",51.52587,-0.19526
296,Maida Vale,Westminster,LONDON,W9,51.52587,-0.19526
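
Adding columns to a DataFrame that was obtained by filtering another one can trigger pandas' SettingWithCopyWarning, because pandas cannot tell whether the filtered frame is a view or a copy. One common way around this, sketched here under the assumption that the parent frame should stay unchanged, is to take an explicit copy before assigning. The three-row `london` frame is illustrative.

```python
import pandas as pd

# Illustrative parent frame (a tiny stand-in for london_df).
london = pd.DataFrame({
    "Location": ["Aldwych", "Bloomsbury", "Bayswater"],
    "Boroughs": ["Westminster", "Camden", "Westminster"],
    "Postcode": ["WC2", "WC1", "W2"],
})

# .copy() makes the filtered frame independent of its parent,
# so the column assignments below are unambiguous.
west = london[london["Boroughs"] == "Westminster"].copy()
west["Latitude"] = [51.51651, 51.51494]
west["Longitude"] = [-0.11968, -0.18048]

print(west)
print(list(london.columns))  # the parent frame is unchanged
```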


In [9]:
#creating a new dataframe for the locations of Camden.
camden_df = london_df[london_df['Boroughs']=='Camden']
camden_df

Unnamed: 0,Location,Boroughs,Post Town,Postcode
39,Belsize Park,Camden,LONDON,NW3
54,Bloomsbury,Camden,LONDON,WC1
76,Camden Town,Camden,LONDON,NW1
86,Chalk Farm,Camden,LONDON,NW1
174,Fitzrovia,Camden,LONDON,W1
182,Frognal,Camden,LONDON,NW3
192,Gospel Oak,Camden,LONDON,"NW5, NW3"
210,Hampstead,Camden,LONDON,NW3
239,Highgate,Camden,LONDON,N6
242,Holborn,Camden,LONDON,"WC1, WC2"


Getting the coordinates for the locations in Camden using the get_coordinates function

In [10]:
postalCodes,coordinates = camden_df['Postcode'].tolist(),[[],[]]

for postcode in postalCodes:
    coor = get_coordinates(postcode)
    coordinates[0].append(coor[0])
    coordinates[1].append(coor[1])
    
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
camden_df = camden_df.assign(Latitude=coordinates[0], Longitude=coordinates[1])
camden_df


Unnamed: 0,Location,Boroughs,Post Town,Postcode,Latitude,Longitude
39,Belsize Park,Camden,LONDON,NW3,51.55506,-0.17348
54,Bloomsbury,Camden,LONDON,WC1,51.5245,-0.12273
76,Camden Town,Camden,LONDON,NW1,51.53354,-0.14606
86,Chalk Farm,Camden,LONDON,NW1,51.53354,-0.14606
174,Fitzrovia,Camden,LONDON,W1,51.51656,-0.1477
182,Frognal,Camden,LONDON,NW3,51.55506,-0.17348
192,Gospel Oak,Camden,LONDON,"NW5, NW3",51.55506,-0.17348
210,Hampstead,Camden,LONDON,NW3,51.55506,-0.17348
239,Highgate,Camden,LONDON,N6,51.57145,-0.14983
242,Holborn,Camden,LONDON,"WC1, WC2",51.5245,-0.12273


<h3>3. Exploring the data</h3>

Let's first plot the areas on a map

In [11]:
# Assigning the address to a variable
address = 'City of Westminster, London, UK'

# Getting the coordinates of Westminster using the address
geolocator = Nominatim(user_agent="westminster_map")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Creating a map of Westminster
westmister_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# Plotting all the locations on the map
for lat, lng, borough, neighborhood in zip(westminster_df['Latitude'], westminster_df['Longitude'], westminster_df['Boroughs'], westminster_df['Location']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(westmister_map)  
    
westmister_map
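
Instead of picking a fixed zoom level, the map view can be fitted to the plotted points. The sketch below only computes the bounding box of the distinct Westminster coordinates listed in the table above; the function name `bounding_box` is hypothetical.

```python
# Distinct Westminster coordinates from the table above, as (latitude, longitude).
points = [
    (51.51651, -0.11968), (51.51494, -0.18048), (51.49714, -0.13829),
    (51.51656, -0.1477), (51.53398, -0.17378), (51.52587, -0.19526),
]


def bounding_box(points):
    """Return [[south, west], [north, east]] for a list of (lat, lng) pairs."""
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return [[min(lats), min(lngs)], [max(lats), max(lngs)]]


bounds = bounding_box(points)
print(bounds)
```

A pair of corners in this form is what folium's `fit_bounds` method on a map expects, so calling `westmister_map.fit_bounds(bounds)` before displaying the map should frame all the markers.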

In [12]:
# Getting Foursquare credentials from a JSON file
with open('credentials.json') as f:
    cred = json.load(f)
VERSION = '20180605' # Foursquare API version
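
Later cells read `cred['CLIENT_ID']` and `cred['CLIENT_SECRET']`, so a missing or empty key would only surface when the first request is built. A minimal sketch of loading the file with an early check follows. The function name `load_credentials`, the constant `REQUIRED_KEYS` and the demo values are assumptions; the demo writes a throwaway file so the example runs anywhere.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET")


def load_credentials(path):
    """Load the credentials file and fail early if a required key is missing or empty."""
    with open(path) as f:
        cred = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not cred.get(key)]
    if missing:
        raise KeyError("missing credential(s): " + ", ".join(missing))
    return cred


# Demo with a temporary file holding placeholder values (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "credentials.json")
    with open(path, "w") as f:
        json.dump({"CLIENT_ID": "demo-id", "CLIENT_SECRET": "demo-secret"}, f)
    cred = load_credentials(path)

print(sorted(cred))
```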

#### Getting the recommended venues in Westminster

In [13]:
# creating the URL for the request first
limit = 500 # limit for the number of venues returned by Foursquare API
radius=500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    cred['CLIENT_ID'], 
    cred['CLIENT_SECRET'], 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    limit)

#getting the results from the API
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb6d135b9a389001c926459'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'City of Westminster',
  'headerFullLocation': 'City of Westminster, London',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 75,
  'suggestedBounds': {'ne': {'lat': 51.501820604500004,
    'lng': -0.1299341681427764},
   'sw': {'lat': 51.4928205955, 'lng': -0.1443638318572236}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '53645bf211d2576d6ff59feb',
       'name': 'Curzon Victoria',
       'location': {'address': '58 Victoria St',
        'lat': 51.497472657293244,
        'lng': -0.13674352372372592,
        'labeledLatLngs': [{'label': 'display',
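
The request URL above is assembled with string formatting. An alternative sketch, using only the standard library, builds the same query string with `urllib.parse.urlencode`, which also takes care of escaping. The function name `build_explore_url` and the placeholder credentials and coordinates are assumptions; the endpoint and parameter names are the ones used in the notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version,
                      latitude, longitude, radius, limit):
    """Return the explore URL with all query parameters properly encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(latitude, longitude),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials and coordinates, for illustration only.
url = build_explore_url("demo-id", "demo-secret", "20180605",
                        51.4973, -0.1372, 500, 500)

# Decode the query string again to check what was sent.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, the same dictionary can instead be passed as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.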
  

Creating a function that fetches the details of a venue from the Foursquare API, followed by a function that extracts the category of a venue

In [14]:
def get_details(venue_id):
        
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            venue_id,
            cred['CLIENT_ID'], 
            cred['CLIENT_SECRET'], 
            VERSION)
    #print(url)
    # get all the data
    results = requests.get(url).json()
    return results
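
The JSON returned by the API carries a status code under `meta.code` (200 in the output shown earlier). A small guard such as the one below can stop processing early when a request fails. This is a sketch: the function name `unwrap_response` is hypothetical, and the `errorDetail` field read in the error message is an assumption about the shape of failed responses.

```python
def unwrap_response(payload):
    """Return payload['response'] if the API reported success, else raise."""
    meta = payload.get("meta", {})
    code = meta.get("code")
    if code != 200:
        # 'errorDetail' is assumed to be present on failures; .get() tolerates its absence.
        raise RuntimeError(
            "Foursquare request failed: code={}, detail={}".format(
                code, meta.get("errorDetail")))
    return payload["response"]


# Trimmed-down payloads for illustration.
ok_payload = {"meta": {"code": 200, "requestId": "5eb6d135b9a389001c926459"},
              "response": {"totalResults": 75}}
bad_payload = {"meta": {"code": 400}}

print(unwrap_response(ok_payload)["totalResults"])

try:
    unwrap_response(bad_payload)
except RuntimeError as error:
    print(error)
```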

In [15]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.id', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues


Unnamed: 0,name,id,categories,lat,lng
0,Curzon Victoria,53645bf211d2576d6ff59feb,Movie Theater,51.497473,-0.136744
1,Run & Become,4b648050f964a52095b82ae3,Sporting Goods Shop,51.498128,-0.135426
2,Iris & June,534295ec498e271e54f5f569,Coffee Shop,51.496791,-0.136011
3,Taj 51 Buckingham Gate Suites & Residences,4f64ea20e4b0c7552d443a6c,Hotel,51.498598,-0.137404
4,Chez Antoinette,5c67f97335d3fc002c79193b,French Restaurant,51.497964,-0.135455
...,...,...,...,...,...
70,The Curry Room,59e502c31f74406800228913,Indian Restaurant,51.498290,-0.143543
71,Hai Cenato,589cb7ba5a58692759902ef3,Italian Restaurant,51.497138,-0.143829
72,The Jugged Hare,4b61dbc2f964a52082272ae3,Pub,51.493043,-0.138240
73,Queen Mother Sports Centre,4ac518f6f964a52082af20e3,Gym / Fitness Center,51.493856,-0.140324
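
To make the flattening step easier to follow, here is the same extraction written with plain dictionaries instead of `json_normalize`. It is a sketch on a hand-trimmed `items` list: the first entry reuses values shown in the notebook output, the second is a made-up venue without categories, and the function names `first_category` and `flatten_items` are hypothetical.

```python
def first_category(venue):
    """Return the name of the venue's first category, or None if it has none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None


def flatten_items(items):
    """Turn the nested 'items' list of an explore response into flat rows."""
    rows = []
    for item in items:
        venue = item["venue"]
        rows.append({
            "name": venue["name"],
            "id": venue["id"],
            "categories": first_category(venue),
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows


items = [
    {"venue": {"id": "53645bf211d2576d6ff59feb", "name": "Curzon Victoria",
               "location": {"lat": 51.497472657293244, "lng": -0.13674352372372592},
               "categories": [{"name": "Movie Theater"}]}},
    # Hypothetical venue with no category, to exercise the None branch.
    {"venue": {"id": "hypothetical-id", "name": "Uncategorised venue",
               "location": {"lat": 51.5, "lng": -0.14},
               "categories": []}},
]

rows = flatten_items(items)
print(rows[0])
```

A list of rows in this form can be passed directly to `pd.DataFrame(rows)` to obtain a table with the same columns as `nearby_venues`.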


Let's see which venue categories appear in our dataframe

In [16]:
nearby_venues['categories'].value_counts()

Coffee Shop                   9
Hotel                         9
Sandwich Place                7
Pub                           4
Theater                       4
Sushi Restaurant              3
Gym / Fitness Center          3
Italian Restaurant            3
Indian Restaurant             2
Hotel Bar                     2
Juice Bar                     2
Sporting Goods Shop           2
Breakfast Spot                1
Tea Room                      1
Pedestrian Plaza              1
Australian Restaurant         1
Modern European Restaurant    1
Chocolate Shop                1
Art Gallery                   1
Café                          1
French Restaurant             1
Tapas Restaurant              1
Camera Store                  1
Supermarket                   1
Beer Bar                      1
Fast Food Restaurant          1
Playground                    1
Cocktail Bar                  1
Scandinavian Restaurant       1
Historic Site                 1
Street Food Gathering         1
Bistro  
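
Since the project is about restaurants and cuisines, a natural next step is to keep only the restaurant categories and count them by cuisine. A minimal sketch, assuming the cuisine is the part of the category name before the word "Restaurant"; the short `categories` list is an illustrative sample, not the full output above.

```python
from collections import Counter

# Illustrative sample of category names in the style of the output above.
categories = [
    "Coffee Shop", "Hotel", "Sushi Restaurant", "Italian Restaurant",
    "Italian Restaurant", "Indian Restaurant", "Pub",
]

SUFFIX = " Restaurant"

# Keep only names ending in " Restaurant" and strip that suffix to get the cuisine.
cuisine_counts = Counter(
    name[:-len(SUFFIX)] for name in categories if name.endswith(SUFFIX)
)

print(cuisine_counts.most_common())
```

Categories such as "Café" or "Bistro" do not end in "Restaurant" and would need to be added to the filter explicitly if they should count as places to eat.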

In [None]:
# for id in nearby_venues['id'].tolist():
#     print(get_details(id))