# Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods

This report includes six sections as follows:


* Introduction 
* Data 
* Methodology 
* Results 
* Discussion 
* Conclusion 

## Introduction


Manchester and Liverpool are two major cities in North West England. Both cities are noted for their culture, architecture, musical exports, sports clubs and transport links. 

According to Wikipedia (www.wikipedia.org):
    
    * Manchester is the third-most visited city in the UK, with a population of 545,500 as of 2017 and a GDP of $102.3bn (2015). Its economy grew relatively strongly between 2002 and 2012. 
    According to 2019 property investment research, Manchester ranks second among "The Best Places to Invest in Property In The UK". 
    
    * Liverpool is the fifth-largest city in the UK, with a population of 2.24 million in 2011 (a figure that refers to the wider metropolitan area) and a GDP of $65.8bn (2014). The economy of Liverpool is one of the largest within the UK.
    Tourism and leisure are important components of Liverpool's economy. Car manufacturing also takes place in the city. 
    
In this project, the two cities above are compared in detail using machine learning segmentation and clustering together with Foursquare data. 
The project sets out to answer two questions: 

    * How similar are these two cities? 
    * Which city is the better place to live for a university student?
    

## Data

To carry out this study, basic geographic data for the two cities need to be collected:
    
    * Postcode for Liverpool (https://en.wikipedia.org/wiki/L_postcode_area)
    * Postcode for Manchester (https://en.wikipedia.org/wiki/M_postcode_area)
            
    The latitude and longitude data are also required:
    
    * (https://www.freemaptools.com/download/outcode-postcodes/postcode-outcodes.csv)

In [2]:
#pip install wikipedia, lxml
!conda install -c conda-forge wikipedia --yes 
!conda install -c conda-forge lxml --yes
import pandas as pd 
 
import wikipedia as wp
from bs4 import BeautifulSoup



In [3]:
# get postcode data for Manchester and Liverpool
html_m = wp.page("M postcode area").html().encode("UTF-8")
html_l = wp.page("L postcode area").html().encode("UTF-8")
df_m = pd.read_html(html_m, header = 0)[1]
df_l = pd.read_html(html_l, header = 0)[1]
df_m = df_m.rename(columns={'Postcode district': 'postcode'}) 
df_l = df_l.rename(columns={'Postcode district': 'postcode'})


In [4]:
df_m.shape

(52, 4)

In [5]:
df_m

Unnamed: 0,postcode,Post town,Coverage,Local authority area
0,M1,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester
1,M2,MANCHESTER,"Deansgate, City Centre",Manchester
2,"M3(Sectors 1, 2, 3, 4 and 9)",MANCHESTER,"City Centre, Deansgate, Castlefield",Manchester
3,"M3(Sectors 5, 6 and 7)",SALFORD,"Blackfriars, Greengate, Trinity",Salford
4,M4,MANCHESTER,"Ancoats, Northern Quarter, Strangeways",Manchester
5,M5,SALFORD,"Ordsall, Seedley, Weaste, University",Salford
6,M6,SALFORD,"Pendleton, Irlams o' th' Height, Langworthy, S...",Salford
7,M7,SALFORD,"Higher Broughton, Cheetwood, Lower Broughton, ...",Salford
8,M8,MANCHESTER,"Crumpsall, Cheetham Hill",Manchester
9,M9,MANCHESTER,"Harpurhey, Blackley",Manchester
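
A few districts in this table carry a sector note (for example `M3(Sectors 1, 2, 3, 4 and 9)`), so they would not match a plain outcode such as `M3` in a later join on `postcode`. The following is a minimal sketch of one way to normalise them, assuming the note always follows the outcode in parentheses. `clean_outcode` is a hypothetical helper and is not part of the original notebook.

```python
def clean_outcode(district: str) -> str:
    """Return the outcode without any trailing '(Sectors ...)' note."""
    return district.split('(', 1)[0].strip()


# District labels copied from the table above.
districts = ['M1', 'M2', 'M3(Sectors 1, 2, 3, 4 and 9)', 'M3(Sectors 5, 6 and 7)', 'M4']

# Normalise, then drop repeats while preserving the original order.
outcodes = list(dict.fromkeys(clean_outcode(d) for d in districts))
print(outcodes)
```

With pandas, the same helper could be applied through `df_m['postcode'].map(clean_outcode)` before merging. Collapsing both `M3` rows into one would lose the Manchester/Salford split, so whether to deduplicate is a judgment call.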


In [6]:
df_l

Unnamed: 0,postcode,Post town,Coverage,Local authority area
0,L1,LIVERPOOL,City Centre,Liverpool
1,L2,LIVERPOOL,City Centre,Liverpool
2,L3,LIVERPOOL,"City Centre, Everton, Vauxhall",Liverpool
3,L4,LIVERPOOL,"Anfield, Kirkdale, Walton",Liverpool
4,L5,LIVERPOOL,"Anfield, Everton, Kirkdale, Vauxhall",Liverpool
5,L6,LIVERPOOL,"Anfield, City Centre, Everton, Fairfield, Kens...",Liverpool
6,L7,LIVERPOOL,"City Centre, Edge Hill, Fairfield, Kensington",Liverpool
7,L8,LIVERPOOL,"City Centre, Dingle, Toxteth",Liverpool
8,L9,LIVERPOOL,"Aintree, Fazakerley, Orrell Park, Walton","Liverpool, Sefton"
9,L10,LIVERPOOL,"Aintree Village, Fazakerley","Sefton, Liverpool, Knowsley"


In [7]:
# get the Latitude and Longitude of each postcode

postcode = pd.read_csv("https://www.freemaptools.com/download/outcode-postcodes/postcode-outcodes.csv")


In [8]:
postcode


        id postcode  latitude  longitude
0        2     AB10  57.13514   -2.11731
1        3     AB11  57.13875   -2.09089
2        4     AB12  57.10100   -2.11060
3        5     AB13  57.10801   -2.23776
4        6     AB14  57.10076   -2.27073
...    ...      ...       ...        ...
2998  3001     WV98   0.00000    0.00000
2999  3002      S95   0.00000    0.00000
3000  3003     PA80   0.00000    0.00000
3001  3004      L80   0.00000    0.00000
3002  3005      BS0   0.00000    0.00000

[3003 rows x 4 columns]

In [9]:
# tidy the table: drop the 'id' column
postcode = postcode.drop(columns='id')
postcode

     postcode  latitude  longitude
0        AB10  57.13514   -2.11731
1        AB11  57.13875   -2.09089
2        AB12  57.10100   -2.11060
3        AB13  57.10801   -2.23776
4        AB14  57.10076   -2.27073
...       ...       ...        ...
2998     WV98   0.00000    0.00000
2999      S95   0.00000    0.00000
3000     PA80   0.00000    0.00000
3001      L80   0.00000    0.00000
3002      BS0   0.00000    0.00000

[3003 rows x 3 columns]

In [10]:
postcode.shape

(3003, 3)
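
The tail of the outcode table contains rows such as `L80` and `BS0` whose latitude and longitude are both 0.00000. These look like placeholders rather than real positions, so it may be worth excluding them before merging. Here is a small sketch under that assumption, using plain tuples copied from the output above. `has_real_coordinates` is a hypothetical helper.

```python
# (postcode, latitude, longitude) rows taken from the table output above.
rows = [
    ('AB10', 57.13514, -2.11731),
    ('AB11', 57.13875, -2.09089),
    ('L80', 0.0, 0.0),
    ('BS0', 0.0, 0.0),
]


def has_real_coordinates(row):
    """Treat (0, 0) as a missing position rather than a real location."""
    _, latitude, longitude = row
    return not (latitude == 0.0 and longitude == 0.0)


usable = [row for row in rows if has_real_coordinates(row)]
print([code for code, _, _ in usable])
```

In pandas, an equivalent filter would be `postcode[(postcode.latitude != 0) | (postcode.longitude != 0)]`.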

In [11]:
# Merge the geocodes into the data frames of the two cities

df_m = pd.merge(postcode, df_m, on='postcode')
df_m

Unnamed: 0,postcode,latitude,longitude,Post town,Coverage,Local authority area
0,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester
1,M11,53.47834,-2.17933,MANCHESTER,"Clayton, Openshaw, Beswick",Manchester
2,M12,53.46482,-2.20187,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
3,M13,53.4603,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
4,M14,53.4477,-2.22437,MANCHESTER,"Fallowfield, Moss Side, Ladybarn, Rusholme, Lo...",Manchester
5,M15,53.46563,-2.25008,MANCHESTER,"Hulme, Manchester Science Park, Old Trafford","Manchester, Trafford"
6,M16,53.45481,-2.26357,MANCHESTER,"Firswood, Old Trafford, Whalley Range, Moss Side","Manchester, Trafford"
7,M17,53.46906,-2.31789,MANCHESTER,"Trafford Park, The Trafford Centre",Trafford
8,M18,53.46127,-2.16871,MANCHESTER,"Abbey Hey, Gorton",Manchester
9,M19,53.43696,-2.19421,MANCHESTER,"Levenshulme, Burnage","Manchester, Stockport"


In [12]:
df_l = pd.merge(postcode, df_l, on='postcode')

In [13]:
df_l.shape

(51, 6)

In [14]:
df_l

Unnamed: 0,postcode,latitude,longitude,Post town,Coverage,Local authority area
0,L1,53.40254,-2.97928,LIVERPOOL,City Centre,Liverpool
1,L10,53.47398,-2.92668,LIVERPOOL,"Aintree Village, Fazakerley","Sefton, Liverpool, Knowsley"
2,L11,53.44801,-2.91407,LIVERPOOL,"Clubmoor, Croxteth, Gillmoss, Norris Green",Liverpool
3,L12,53.43467,-2.89421,LIVERPOOL,"Croxteth Park, West Derby",Liverpool
4,L13,53.4174,-2.91943,LIVERPOOL,"Clubmoor, Old Swan, Stoneycroft, Tuebrook",Liverpool
5,L14,53.41861,-2.87883,LIVERPOOL,"Broadgreen, Dovecot, Knotty Ash, Page Moss","Liverpool, Knowsley"
6,L15,53.39763,-2.91901,LIVERPOOL,Wavertree,Liverpool
7,L16,53.39876,-2.88744,LIVERPOOL,"Broadgreen, Bowring Park, Childwall","Liverpool, Knowsley"
8,L17,53.37769,-2.93962,LIVERPOOL,"Aigburth, St Michael's Hamlet, Sefton Park",Liverpool
9,L18,53.38064,-2.90661,LIVERPOOL,"Allerton, Mossley Hill",Liverpool


## Methodology

In this section, the Foursquare API is used to explore neighborhoods in both cities, Manchester and Liverpool. The most common venue categories in each neighborhood are then examined.
K-means clustering algorithm will be used to group the neighborhoods into clusters. 
Clustered data will be presented by using the Folium library.
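
The clustering step can be illustrated without any library. The sketch below is a plain-Python version of Lloyd's algorithm (the usual k-means iteration), not the library call the analysis itself would use. The venue-share vectors, the function names, and the choice of the first `k` points as starting centroids are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups; returns (labels, centroids)."""
    # Deterministic start (an assumption): the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical neighborhoods described by the share of venues that are
# [bars, cafes, grocery stores]; these numbers are made up.
neighborhoods = [
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
    [0.2, 0.1, 0.7],
]

labels, centroids = kmeans(neighborhoods, k=2)
print(labels)
```

In this toy input the two bar-heavy neighborhoods end up in one cluster and the two grocery-heavy ones in the other. Real runs normally use several random initialisations rather than a fixed start.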


### Import necessary Libraries

In [15]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
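# library for transforming a JSON file into a pandas dataframe
try:
    from pandas import json_normalize  # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize  # older pandas releases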

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


### Define Foursquare Credentials and Version

In [16]:
CLIENT_ID = '40TA5YCVA3CXTBIWMNPDMBKLDL4O4BVC5E4BFN5KXKMMKHL4' #  Foursquare ID
CLIENT_SECRET = 'OFICYUUXDQKDMPHZOTTNFIWVMK5EFL23DO1HPXN2SKZMBJZM' #  Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: 40TA5YCVA3CXTBIWMNPDMBKLDL4O4BVC5E4BFN5KXKMMKHL4
CLIENT_SECRET:OFICYUUXDQKDMPHZOTTNFIWVMK5EFL23DO1HPXN2SKZMBJZM
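
Hard-coding and printing API keys makes them visible to anyone who reads the notebook. One common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`. Both names and the helper `load_foursquare_credentials` are hypothetical choices, not something Foursquare prescribes.

```python
import os
from typing import Mapping, Tuple


def load_foursquare_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (client_id, client_secret) from the given mapping of variables."""
    try:
        # Variable names are an assumption made for this example.
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(
            'Missing environment variable {}'.format(missing)
        ) from None


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('Credentials loaded for client id of length', len(client_id))
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read the real environment, and the secret itself would never need to be printed.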


### Search for a specific venue category

For example, let's say we love Chinese food. Let's look at the number of Chinese restaurants around The University of Liverpool and The University of Manchester :)

##### Chinese restaurants around The University of Liverpool

In [17]:
address_l = 'Chatham St, Liverpool L69 7ZN'

geolocator_l = Nominatim(user_agent="foursquare_agent")
location_l = geolocator_l.geocode(address_l)
latitude_l = location_l.latitude
longitude_l = location_l.longitude
print(latitude_l, longitude_l)

53.4001881 -2.9644371
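
The geocoding call returns `None` when the address cannot be resolved, in which case `location_l.latitude` would fail with an `AttributeError`. A hedged sketch of a small guard follows. `lookup_coordinates` is a hypothetical helper, and `fake_geocode` is a stand-in so the example runs without network access.

```python
from types import SimpleNamespace


def lookup_coordinates(geocode, address):
    """Return (latitude, longitude), or raise a clear error when nothing is found."""
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for: {}'.format(address))
    return location.latitude, location.longitude


# Stand-in geocoder for illustration; in the notebook this role would be
# played by geolocator_l.geocode.
def fake_geocode(address):
    known = {
        'Chatham St, Liverpool L69 7ZN': SimpleNamespace(
            latitude=53.4001881, longitude=-2.9644371
        ),
    }
    return known.get(address)


coordinates = lookup_coordinates(fake_geocode, 'Chatham St, Liverpool L69 7ZN')
print(coordinates)
```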


In [18]:
search_query= 'Chinese'
radius_l = 1000
print(search_query + ' .... OK!')
url_l = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_l, longitude_l, VERSION, search_query, radius_l, LIMIT)
url_l
results_l = requests.get(url_l).json()
results_l

Chinese .... OK!


{'meta': {'code': 200, 'requestId': '5d52af455bc9e30024d2d7c9'},
 'response': {'venues': [{'id': '544e9938498e020c02a65b06',
    'name': 'Campus real Chinese restaurant',
    'location': {'address': '12 Myrtle Street',
     'crossStreet': 'Liverpool',
     'lat': 53.40097,
     'lng': -2.966229,
     'labeledLatLngs': [{'label': 'display',
       'lat': 53.40097,
       'lng': -2.966229}],
     'distance': 147,
     'postalCode': 'L7 7DP',
     'cc': 'GB',
     'country': 'United Kingdom',
     'formattedAddress': ['12 Myrtle Street (Liverpool)',
      'L7 7DP',
      'United Kingdom']},
    'categories': [{'id': '4bf58dd8d48988d145941735',
      'name': 'Chinese Restaurant',
      'pluralName': 'Chinese Restaurants',
      'shortName': 'Chinese',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1565699909',
    'hasPerk': False},
   {'id': '54198a46498efed5afdc5d80',
    'name': 'Lida
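
Building the request URL with `str.format` works for a one-word query such as `Chinese`, but a query containing spaces or other special characters would need escaping. The sketch below shows one way to let the standard library handle that. `build_search_url` is a hypothetical helper, and the placeholder credentials are not real.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, latitude, longitude, query,
                     radius=1000, limit=30, version='20180604'):
    """Assemble the venue-search URL with properly escaped parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; the coordinates are those printed for the Liverpool address.
url = build_search_url('MY_ID', 'MY_SECRET', 53.4001881, -2.9644371, 'Chinese restaurant')
print(url)
```

Alternatively, `requests.get(SEARCH_ENDPOINT, params=params)` accepts the same dictionary and performs the encoding itself.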

In [19]:
# assign relevant part of JSON to venues
venues_l = results_l['response']['venues']

# transform venues into a dataframe
dataframe_l = json_normalize(venues_l)
dataframe_l.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.country,location.formattedAddress,location.city,location.state
0,544e9938498e020c02a65b06,Campus real Chinese restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1565699909,False,12 Myrtle Street,Liverpool,53.40097,-2.966229,"[{'label': 'display', 'lat': 53.40097, 'lng': ...",147,L7 7DP,GB,United Kingdom,"[12 Myrtle Street (Liverpool), L7 7DP, United ...",,
1,54198a46498efed5afdc5d80,Lida Chinese supermarket,"[{'id': '4d954b0ea243a5684a65b473', 'name': 'C...",v-1565699909,False,,,53.400806,-2.961559,"[{'label': 'display', 'lat': 53.4008062956977,...",203,,GB,United Kingdom,[United Kingdom],,
2,4d9092e3d4ec8cfa1fb0a589,Arch Chinese Restaurant And Takeaway,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1565699909,False,9-13 Berry St.,,53.400157,-2.976411,"[{'label': 'display', 'lat': 53.400157, 'lng':...",794,L1 9DF,GB,United Kingdom,"[9-13 Berry St., Liverpool, L1 9DF, United Kin...",Liverpool,Liverpool
3,4ba22649f964a52054df37e3,Hondo Chinese Supermarket,"[{'id': '4bf58dd8d48988d118951735', 'name': 'G...",v-1565699909,False,,,53.399247,-2.974334,"[{'label': 'display', 'lat': 53.39924686153291...",665,,GB,United Kingdom,[United Kingdom],,
4,50acff2f498e9668ba74b303,Mabo Chinese restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1565699909,False,,,53.399361,-2.976776,"[{'label': 'display', 'lat': 53.39936105955277...",824,,GB,United Kingdom,[United Kingdom],,


In [20]:
# keep only the venue name, category, id, and the location-related columns
filtered_columns = ['name', 'categories'] + [col for col in dataframe_l.columns if col.startswith('location.')] + ['id']
# copy so that the column assignment below does not write into a view of dataframe_l
dataframe_filtered = dataframe_l.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,country,formattedAddress,city,state,id
0,Campus real Chinese restaurant,Chinese Restaurant,12 Myrtle Street,Liverpool,53.40097,-2.966229,"[{'label': 'display', 'lat': 53.40097, 'lng': ...",147,L7 7DP,GB,United Kingdom,"[12 Myrtle Street (Liverpool), L7 7DP, United ...",,,544e9938498e020c02a65b06
1,Lida Chinese supermarket,Convenience Store,,,53.400806,-2.961559,"[{'label': 'display', 'lat': 53.4008062956977,...",203,,GB,United Kingdom,[United Kingdom],,,54198a46498efed5afdc5d80
2,Arch Chinese Restaurant And Takeaway,Chinese Restaurant,9-13 Berry St.,,53.400157,-2.976411,"[{'label': 'display', 'lat': 53.400157, 'lng':...",794,L1 9DF,GB,United Kingdom,"[9-13 Berry St., Liverpool, L1 9DF, United Kin...",Liverpool,Liverpool,4d9092e3d4ec8cfa1fb0a589
3,Hondo Chinese Supermarket,Grocery Store,,,53.399247,-2.974334,"[{'label': 'display', 'lat': 53.39924686153291...",665,,GB,United Kingdom,[United Kingdom],,,4ba22649f964a52054df37e3
4,Mabo Chinese restaurant,Chinese Restaurant,,,53.399361,-2.976776,"[{'label': 'display', 'lat': 53.39936105955277...",824,,GB,United Kingdom,[United Kingdom],,,50acff2f498e9668ba74b303
5,SIDA Chinese Supermarket,Food & Drink Shop,Bold St,,53.40207,-2.976003,"[{'label': 'display', 'lat': 53.40207027311445...",795,,GB,United Kingdom,"[Bold St, Liverpool, United Kingdom]",Liverpool,Liverpool,53285222498e5a5ccd1df306
6,favorite Chinese take away,Chinese Restaurant,,,53.399204,-2.977068,"[{'label': 'display', 'lat': 53.399204, 'lng':...",845,,GB,United Kingdom,[United Kingdom],,,56a13df3498e05a14e96b16f
7,Mr Chilli Chinese Restaurant,Chinese Restaurant,46-48 Mt Pleasant,,53.404192,-2.975556,"[{'label': 'display', 'lat': 53.404192, 'lng':...",862,L3,GB,United Kingdom,"[46-48 Mt Pleasant, Liverpool, L3, United King...",Liverpool,Liverpool,4cf033de7e93f04d4a9e4569
8,Liverpool Chinese Gospel Church,Church,,,53.397592,-2.978805,"[{'label': 'display', 'lat': 53.397592, 'lng':...",996,,GB,United Kingdom,[United Kingdom],,,4ff05936e4b0a36f6f21ced1
9,SIDA Chinese Supermarket,Grocery Store,London Road,,53.409794,-2.967026,"[{'label': 'display', 'lat': 53.40979370945597...",1083,,GB,United Kingdom,"[London Road, Liverpool, United Kingdom]",Liverpool,Liverpool,516beda3e4b0510f20faca98


In [21]:
dataframe_filtered.name

0           Campus real Chinese restaurant
1                 Lida Chinese supermarket
2     Arch Chinese Restaurant And Takeaway
3                Hondo Chinese Supermarket
4                  Mabo Chinese restaurant
5                 SIDA Chinese Supermarket
6               favorite Chinese take away
7             Mr Chilli Chinese Restaurant
8          Liverpool Chinese Gospel Church
9                 SIDA Chinese Supermarket
10                   Chung Wah Supermarket
11                                 Mei Mei
Name: name, dtype: object
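
Because the search matches the word "Chinese" in venue names, the results include supermarkets and a church as well as restaurants, and one row lies 1083 m away despite the 1000 m radius. To count restaurants only, the rows can be filtered by category and distance. The sketch below uses the first ten rows of the table above as plain tuples; the variable names are illustrative.

```python
# (name, category, distance in metres) copied from the first ten rows above.
venues = [
    ('Campus real Chinese restaurant', 'Chinese Restaurant', 147),
    ('Lida Chinese supermarket', 'Convenience Store', 203),
    ('Arch Chinese Restaurant And Takeaway', 'Chinese Restaurant', 794),
    ('Hondo Chinese Supermarket', 'Grocery Store', 665),
    ('Mabo Chinese restaurant', 'Chinese Restaurant', 824),
    ('SIDA Chinese Supermarket', 'Food & Drink Shop', 795),
    ('favorite Chinese take away', 'Chinese Restaurant', 845),
    ('Mr Chilli Chinese Restaurant', 'Chinese Restaurant', 862),
    ('Liverpool Chinese Gospel Church', 'Church', 996),
    ('SIDA Chinese Supermarket', 'Grocery Store', 1083),
]

# Keep only venues categorised as Chinese restaurants within 1000 m.
restaurants = [
    name
    for name, category, distance in venues
    if category == 'Chinese Restaurant' and distance <= 1000
]
print(len(restaurants), 'Chinese restaurants among the first ten results')
```

On the dataframe itself, the same filter could be written as `dataframe_filtered[dataframe_filtered.categories == 'Chinese Restaurant']`.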

In [22]:
venues_map = folium.Map(location=[latitude_l, longitude_l], zoom_start=13) # generate map centred around the University of Liverpool

# add a red circle marker to represent the U of L
folium.features.CircleMarker(
    [latitude_l, longitude_l],
    radius=10,
    color='red',
    # popup='Mr Chilli',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the search results as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

##### Chinese restaurants around The University of Manchester

In [24]:
address_m = 'Oxford Rd, Manchester M13 9PL'

geolocator_m = Nominatim(user_agent="foursquare_agent")
location_m = geolocator_m.geocode(address_m)
latitude_m = location_m.latitude
longitude_m = location_m.longitude
print(latitude_m, longitude_m)


53.4669225 -2.234321


In [25]:
search_query= 'Chinese'
radius = 1000
print(search_query + ' .... OK!')
url_m = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_m, longitude_m, VERSION, search_query, radius, LIMIT)
url_m
results_m = requests.get(url_m).json()
results_m


Chinese .... OK!


{'meta': {'code': 200, 'requestId': '5d52af46ba9211002531c42e'},
 'response': {'venues': [{'id': '4ade0e19f964a520cb6e21e3',
    'name': 'Tai Pan Chinese Restaurant | 喜臨門大酒楼',
    'location': {'address': 'Brunswick House 81-97 Upper Brook St.',
     'lat': 53.46750336549708,
     'lng': -2.229680915813638,
     'labeledLatLngs': [{'label': 'display',
       'lat': 53.46750336549708,
       'lng': -2.229680915813638}],
     'distance': 314,
     'postalCode': 'M13 9TX',
     'cc': 'GB',
     'city': 'Manchester',
     'country': 'United Kingdom',
     'formattedAddress': ['Brunswick House 81-97 Upper Brook St.',
      'Manchester',
      'M13 9TX',
      'United Kingdom']},
    'categories': [{'id': '4bf58dd8d48988d145941735',
      'name': 'Chinese Restaurant',
      'pluralName': 'Chinese Restaurants',
      'shortName': 'Chinese',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1565

In [26]:

# assign relevant part of JSON to venues
venues_m = results_m['response']['venues']

# transform venues into a dataframe
dataframe_m = json_normalize(venues_m)
dataframe_m.head()



Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.country,location.formattedAddress,location.state
0,4ade0e19f964a520cb6e21e3,Tai Pan Chinese Restaurant | 喜臨門大酒楼,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1565699910,False,Brunswick House 81-97 Upper Brook St.,53.467503,-2.229681,"[{'label': 'display', 'lat': 53.46750336549708...",314,M13 9TX,GB,Manchester,United Kingdom,"[Brunswick House 81-97 Upper Brook St., Manche...",
1,4aef4a07f964a52043d721e3,Red Chilli Chinese Restaurant | 红椒京川菜馆,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1565699910,False,403-419 Oxford Rd.,53.462975,-2.229924,"[{'label': 'display', 'lat': 53.46297481668661...",527,M13 9WL,GB,Manchester,United Kingdom,"[403-419 Oxford Rd., Manchester, M13 9WL, Unit...",
2,4ec02bfc61af06192b6e6825,Azuma Chinese Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1565699910,False,73 Cavendish St.,53.468743,-2.241827,"[{'label': 'display', 'lat': 53.46874339500669...",537,M15 6BN,GB,Manchester,United Kingdom,"[73 Cavendish St., Manchester, M15 6BN, United...",
3,4b6ac719f964a520e3df2be3,Lameizi Chinese Supermarket & Restaurant,"[{'id': '4bf58dd8d48988d1f9941735', 'name': 'F...",v-1565699910,False,Unit 2 The Quadrangle Chester St,53.47224,-2.24031,"[{'label': 'display', 'lat': 53.47224025486125...",712,M1 5QS,GB,Manchester,United Kingdom,"[Unit 2 The Quadrangle Chester St, Manchester,...",
4,4ce92a29d27560fc61aa983a,Manchester Chinese Centre,"[{'id': '4bf58dd8d48988d1a8941735', 'name': 'G...",v-1565699910,False,67 Ardwick Green North,53.471406,-2.225007,"[{'label': 'display', 'lat': 53.47140574894196...",793,M12 6FX,GB,Manchester,United Kingdom,"[67 Ardwick Green North, Manchester, Greater M...",Greater Manchester


In [27]:
# keep only the venue name, category, id, and the location-related columns
filtered_columns = ['name', 'categories'] + [col for col in dataframe_m.columns if col.startswith('location.')] + ['id']
# copy so that the column assignment below does not write into a view of dataframe_m
dataframe_filtered = dataframe_m.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.name


0         Tai Pan Chinese Restaurant | 喜臨門大酒楼
1      Red Chilli Chinese Restaurant | 红椒京川菜馆
2                    Azuma Chinese Restaurant
3    Lameizi Chinese Supermarket & Restaurant
4                   Manchester Chinese Centre
Name: name, dtype: object

In [28]:
venues_map = folium.Map(location=[latitude_m, longitude_m], zoom_start=13) # generate map centred around the University of Manchester

# add a red circle marker to represent the U of M
folium.features.CircleMarker(
    [latitude_m, longitude_m],
    radius=10,
    color='red',
    # popup='Mr Chilli',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the search results as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map


### Let's check all venues around each area

In [29]:
# search_query= 'Chinese'
radius_m = 1000
LIMIT=100

address_m = 'Oxford Rd, Manchester M13 9PL'
geolocator_m = Nominatim(user_agent="foursquare_agent")
location_m = geolocator_m.geocode(address_m)
latitude_m = location_m.latitude
longitude_m = location_m.longitude
print(latitude_m, longitude_m)

#print(search_query + ' .... OK!')
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude_m, 
    longitude_m, 
    radius_m, 
    LIMIT)

results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name','venue.location.postalCode', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
# derive the outcode (e.g. 'M13') from the full postal code; missing codes become ''
nearby_venues['postcode'] = nearby_venues['postalCode'].fillna("").str.split(' ').str[0]
print('{} venues were returned by Foursquare for The U of M.'.format(nearby_venues.shape[0]))
nearby_venues_m=nearby_venues
nearby_venues_m.head()


53.4669225 -2.234321
74 venues were returned by Foursquare for The U of M.


Unnamed: 0,name,postalCode,categories,lat,lng,postcode
0,Royal Northern College of Music (RNCM),M13 9RD,College Arts Building,53.468365,-2.236709,M13
1,Christie’s Bistro,M13 9PL,Café,53.46531,-2.233827,M13
2,The Manchester Museum,M13 9PL,Museum,53.466526,-2.234001,M13
3,Eighth Day Café (Eighth Day Co-op),M1 7DU,Vegetarian / Vegan Restaurant,53.471027,-2.237811,M1
4,Sandbar,M1 7HL,Bar,53.470669,-2.235885,M1
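
The `postcode` column above holds the outward part of each postal code (`M13` for `M13 9RD`), which is what the district tables are keyed on. The same extraction is sketched below as a standalone function that also tolerates missing values. `to_outcode` is a hypothetical helper name.

```python
from typing import Optional


def to_outcode(postal_code: Optional[str]) -> str:
    """Return the part before the first space, or '' for missing values."""
    if not isinstance(postal_code, str):
        # Covers None and the float NaN that pandas uses for missing data.
        return ''
    return postal_code.strip().split(' ', 1)[0]


for value in ['M13 9RD', 'M1 7DU', 'M13', None]:
    print(repr(value), '->', repr(to_outcode(value)))
```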


In [30]:
radius_l = 1000
LIMIT=100

address_l = 'Chatham St, Liverpool L69 7ZN'
geolocator_l = Nominatim(user_agent="foursquare_agent")
location_l = geolocator_l.geocode(address_l)
latitude_l = location_l.latitude
longitude_l = location_l.longitude
print(latitude_l, longitude_l)

#print(search_query + ' .... OK!')
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude_l, 
    longitude_l, 
    radius_l, 
    LIMIT)

results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name','venue.location.postalCode', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
# derive the outcode (e.g. 'L7') from the full postal code; missing codes become ''
nearby_venues['postcode'] = nearby_venues['postalCode'].fillna("").str.split(' ').str[0]
print('{} venues were returned by Foursquare for The U of L.'.format(nearby_venues.shape[0]))
nearby_venues_l=nearby_venues
nearby_venues_l.head()

53.4001881 -2.9644371
87 venues were returned by Foursquare for The U of L.


Unnamed: 0,name,postalCode,categories,lat,lng,postcode
0,The Belvedere,L7 7EB,Bar,53.400209,-2.969568,L7
1,Free State Kitchen,L1 9DE,American Restaurant,53.402436,-2.970429,L1
2,Moose and Moonshine,L1 9BW,Café,53.400972,-2.970662,L1
3,Philharmonic Dining Rooms,L1 9BX,Pub,53.401645,-2.970363,L1
4,Hope Street Hotel,L1 9DA,Hotel,53.400943,-2.970766,L1
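
The Manchester and Liverpool cells differ only in their coordinates, so the parsing step could be shared. The sketch below assumes the response layout used above (`response` → `groups[0]` → `items`, each with a `venue` entry). `parse_explore_items` is a hypothetical helper, and the sample payload is hand-written from the table rather than captured from the API.

```python
def parse_explore_items(payload):
    """Flatten an explore-style response into a list of row dictionaries."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        location = venue.get('location', {})
        rows.append({
            'name': venue['name'],
            'postalCode': location.get('postalCode'),
            # Use the first listed category, or None when the list is empty.
            'categories': categories[0]['name'] if categories else None,
            'lat': location.get('lat'),
            'lng': location.get('lng'),
        })
    return rows


# Hand-written sample: the first venue mirrors a row from the table above,
# the second is a made-up venue with no postcode and no category.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'The Belvedere',
               'location': {'postalCode': 'L7 7EB', 'lat': 53.400209, 'lng': -2.969568},
               'categories': [{'name': 'Bar'}]}},
    {'venue': {'name': 'Example venue (hypothetical)',
               'location': {'lat': 53.4, 'lng': -2.97},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample_payload)
print(rows[0])
```

The resulting list can be passed to `pd.DataFrame(rows)` to obtain the same columns as `nearby_venues`.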


In [31]:
nearby_venues_m['postcode']=nearby_venues_m['postcode'].dropna()
venue_m=pd.merge(nearby_venues_m,df_m, on='postcode',how='left')
venue_m

Unnamed: 0,name,postalCode,categories,lat,lng,postcode,latitude,longitude,Post town,Coverage,Local authority area
0,Royal Northern College of Music (RNCM),M13 9RD,College Arts Building,53.468365,-2.236709,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
1,Christie’s Bistro,M13 9PL,Café,53.465310,-2.233827,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
2,The Manchester Museum,M13 9PL,Museum,53.466526,-2.234001,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
3,Eighth Day Café (Eighth Day Co-op),M1 7DU,Vegetarian / Vegan Restaurant,53.471027,-2.237811,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester
4,Sandbar,M1 7HL,Bar,53.470669,-2.235885,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester
5,Manchester Academy 3,M13 9PR,Music Venue,53.464280,-2.231852,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
6,Five Guys,M13,Burger Joint,53.467491,-2.235672,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
7,Tai Pan Chinese Restaurant | 喜臨門大酒楼,M13 9TX,Chinese Restaurant,53.467503,-2.229681,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
8,Archie's,M13 9NS,Fast Food Restaurant,53.470875,-2.237712,M13,53.46030,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
9,Öl Brewery Bar,M1 7ED,Brewery,53.471896,-2.238473,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester


In [32]:
# count the unique categories and the number of venues in each
print('There are {} unique categories around UofM.'.format(len(venue_m['categories'].unique())))
venue_m.groupby('categories').count()

There are 38 unique categories around UofM.


Unnamed: 0_level_0,name,postalCode,lat,lng,postcode,latitude,longitude,Post town,Coverage,Local authority area
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Art Gallery,1,1,1,1,1,1,1,1,1,1
Bakery,1,1,1,1,1,1,1,1,1,1
Bar,8,8,8,8,8,8,8,8,8,8
Brewery,1,1,1,1,1,1,1,1,1,1
Burger Joint,2,2,2,2,2,2,2,2,2,2
Burrito Place,1,1,1,1,1,1,1,1,1,1
Café,4,4,4,4,4,4,4,4,4,4
Camera Store,1,1,1,1,1,1,1,1,1,1
Chinese Restaurant,5,5,5,5,5,5,5,5,5,5
Cocktail Bar,1,1,1,1,1,1,1,1,1,1
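
The `groupby(...).count()` call repeats the same count in every column. As a sketch on toy data (the `venues` frame here is hypothetical), `nunique` and `value_counts` give the same information in a more compact form:

```python
import pandas as pd

# Hypothetical categories standing in for the 'categories' column.
venues = pd.DataFrame({"categories": ["Bar", "Café", "Bar", "Museum", "Bar"]})

# Number of distinct categories.
n_unique = venues["categories"].nunique()

# Venues per category, most frequent first.
counts = venues["categories"].value_counts()

print("There are {} unique categories.".format(n_unique))
print(counts)
```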


In [33]:
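# NOTE: this assignment does not drop any rows; venues without a postcode remain in the frame
nearby_venues_l['postcode'] = nearby_venues_l['postcode'].dropna()
venue_l = pd.merge(nearby_venues_l, df_l, on='postcode', how='left')
# NOTE: `head` is not called here, so print() shows the bound method together with the whole frame
print(venue_l.head)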

<bound method NDFrame.head of                          name postalCode              categories        lat  \
0               The Belvedere     L7 7EB                     Bar  53.400209   
1          Free State Kitchen     L1 9DE     American Restaurant  53.402436   
2         Moose and Moonshine     L1 9BW                    Café  53.400972   
3   Philharmonic Dining Rooms     L1 9BX                     Pub  53.401645   
4           Hope Street Hotel     L1 9DA                   Hotel  53.400943   
..                        ...        ...                     ...        ...   
82              Tesco Express     L3 5UB           Grocery Store  53.405850   
83  Catherine street bus stop        NaN                Bus Stop  53.396741   
84          Nightingale House     L8 1TG  Thrift / Vintage Store  53.395763   
85             The Crypt Hall        NaN   General Entertainment  53.405809   
86                   Fire Fit        NaN    Gym / Fitness Center  53.391606   

         lng postcode

In [34]:
# count the unique categories and the number of venues in each
print('There are {} unique categories around the Liverpool campus.'.format(len(venue_l['categories'].unique())))
venue_l.groupby('categories').count()

There are 52 unique categories around the Liverpool campus.


Unnamed: 0_level_0,name,postalCode,lat,lng,postcode,latitude,longitude,Post town,Coverage,Local authority area
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
American Restaurant,2,2,2,2,2,2,2,2,2,2
Art Museum,1,1,1,1,1,1,1,1,1,1
Bagel Shop,1,1,1,1,1,1,1,1,1,1
Bakery,1,1,1,1,1,1,1,1,1,1
Bar,7,6,7,7,7,6,6,6,6,6
Beer Bar,1,1,1,1,1,1,1,1,1,1
Bistro,1,1,1,1,1,1,1,1,1,1
Bookstore,1,1,1,1,1,1,1,1,1,1
Bus Stop,1,0,1,1,1,0,0,0,0,0
Café,4,4,4,4,4,4,4,4,4,4


## Comparison of the venues of the two cities

In [35]:
# keep only outcodes with a non-zero latitude
df_m = df_m[df_m.latitude != 0]
df_m

Unnamed: 0,postcode,latitude,longitude,Post town,Coverage,Local authority area
0,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester
1,M11,53.47834,-2.17933,MANCHESTER,"Clayton, Openshaw, Beswick",Manchester
2,M12,53.46482,-2.20187,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
3,M13,53.4603,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester
4,M14,53.4477,-2.22437,MANCHESTER,"Fallowfield, Moss Side, Ladybarn, Rusholme, Lo...",Manchester
5,M15,53.46563,-2.25008,MANCHESTER,"Hulme, Manchester Science Park, Old Trafford","Manchester, Trafford"
6,M16,53.45481,-2.26357,MANCHESTER,"Firswood, Old Trafford, Whalley Range, Moss Side","Manchester, Trafford"
7,M17,53.46906,-2.31789,MANCHESTER,"Trafford Park, The Trafford Centre",Trafford
8,M18,53.46127,-2.16871,MANCHESTER,"Abbey Hey, Gorton",Manchester
9,M19,53.43696,-2.19421,MANCHESTER,"Levenshulme, Burnage","Manchester, Stockport"


In [36]:
# keep only outcodes with a non-zero latitude
df_l = df_l[df_l.latitude != 0]
df_l

Unnamed: 0,postcode,latitude,longitude,Post town,Coverage,Local authority area
0,L1,53.40254,-2.97928,LIVERPOOL,City Centre,Liverpool
1,L10,53.47398,-2.92668,LIVERPOOL,"Aintree Village, Fazakerley","Sefton, Liverpool, Knowsley"
2,L11,53.44801,-2.91407,LIVERPOOL,"Clubmoor, Croxteth, Gillmoss, Norris Green",Liverpool
3,L12,53.43467,-2.89421,LIVERPOOL,"Croxteth Park, West Derby",Liverpool
4,L13,53.4174,-2.91943,LIVERPOOL,"Clubmoor, Old Swan, Stoneycroft, Tuebrook",Liverpool
5,L14,53.41861,-2.87883,LIVERPOOL,"Broadgreen, Dovecot, Knotty Ash, Page Moss","Liverpool, Knowsley"
6,L15,53.39763,-2.91901,LIVERPOOL,Wavertree,Liverpool
7,L16,53.39876,-2.88744,LIVERPOOL,"Broadgreen, Bowring Park, Childwall","Liverpool, Knowsley"
8,L17,53.37769,-2.93962,LIVERPOOL,"Aigburth, St Michael's Hamlet, Sefton Park",Liverpool
9,L18,53.38064,-2.90661,LIVERPOOL,"Allerton, Mossley Hill",Liverpool


In [37]:
LIMIT=100


def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

#run the above function on each neighborhood and create a new dataframe
m_venues = getNearbyVenues(names=df_m['Coverage'],
                                   latitudes=df_m['latitude'],
                                   longitudes=df_m['longitude']
                                  )

#check the size of the resulting dataframe
print(m_venues.shape)
m_venues.head()

Piccadilly, City Centre, Market Street
Clayton, Openshaw, Beswick
Ardwick, Longsight, Chorlton-on-Medlock
Ardwick, Longsight, Chorlton-on-Medlock
Fallowfield, Moss Side, Ladybarn, Rusholme, Longsight
Hulme, Manchester Science Park, Old Trafford
Firswood, Old Trafford, Whalley Range, Moss Side
Trafford Park, The Trafford Centre
Abbey Hey, Gorton
Levenshulme, Burnage
Deansgate, City Centre
Didsbury, Withington
Chorlton-cum-Hardy, Barlow Moor
Wythenshawe, Northenden, Sharston Industrial Area
Baguley, Brooklands (Manchester and Trafford), Roundthorn Industrial Estate
Middleton, Alkrington, Chadderton
Prestwich, Sedgley Park, Simister
Radcliffe, Stoneclough
Swinton, Clifton, Pendlebury, Wardley, Agecroft
Worsley, Walkden, Boothstown, Mosley Common, Wardley Industrial Estate
Tyldesley, Astley
Eccles, Monton, Peel Green, Winton, Patricroft, Barton-upon-Irwell, Ellesmere Park
Carrington, Partington
Stretford, Trafford Park
Sale, Brooklands (Manchester and Trafford)
Denton, Audenshaw
Failsworth

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Piccadilly, City Centre, Market Street",53.47734,-2.23508,The Molly House,53.477325,-2.237201,Pub
1,"Piccadilly, City Centre, Market Street",53.47734,-2.23508,Richmond Tea Rooms,53.477652,-2.23681,Tea Room
2,"Piccadilly, City Centre, Market Street",53.47734,-2.23508,Holiday Inn Manchester - City Centre,53.479058,-2.2341,Hotel
3,"Piccadilly, City Centre, Market Street",53.47734,-2.23508,Alan Turing Memorial Statue,53.47669,-2.236049,Monument / Landmark
4,"Piccadilly, City Centre, Market Street",53.47734,-2.23508,Hotel Motel One Manchester-Piccadilly,53.477407,-2.232318,Hotel
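
`getNearbyVenues` indexes straight into the JSON response, so an empty `groups` list or a venue without categories would raise an error. The following is a hedged sketch, not the notebook's method: a hypothetical helper `flatten_explore_items` that parses a payload of the same shape defensively. That such payloads can actually occur is an assumption, and the sample payload is made up.

```python
def flatten_explore_items(area, area_lat, area_lng, payload):
    """Turn one explore-style response dict into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Assumption: a venue may lack categories; record None instead of failing.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            area, area_lat, area_lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload shaped like the response parsed in getNearbyVenues.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 53.47, "lng": -2.23},
               "categories": [{"name": "Pub"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 53.48, "lng": -2.24},
               "categories": []}},
]}]}}

rows = flatten_explore_items("City Centre", 53.47734, -2.23508, sample)
empty = flatten_explore_items("Nowhere", 0.0, 0.0, {"response": {"groups": []}})
print(rows)
```

The returned tuples have the same seven fields, in the same order, as the columns assigned in `getNearbyVenues`.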


In [38]:
# count the unique categories and the number of venues in each
print('There are {} unique categories in Manchester.'.format(len(m_venues['Venue Category'].unique())))
m_venues.groupby('Venue Category').count()

There are 143 unique categories in Manchester.


Unnamed: 0_level_0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adult Boutique,1,1,1,1,1,1
Airport,1,1,1,1,1,1
Airport Lounge,2,2,2,2,2,2
Airport Service,1,1,1,1,1,1
American Restaurant,6,6,6,6,6,6
Antique Shop,1,1,1,1,1,1
Art Gallery,4,4,4,4,4,4
Arts & Crafts Store,4,4,4,4,4,4
Asian Restaurant,7,7,7,7,7,7
Australian Restaurant,1,1,1,1,1,1


In [39]:
# check how many venues were returned for each area
print('There are {} unique categories in Manchester.'.format(len(m_venues['Venue Category'].unique())))
m_venues.groupby('Area').count()

There are 143 unique categories in Manchester.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Abbey Hey, Gorton",7,7,7,7,7,7
"Ancoats, Northern Quarter, Strangeways",86,86,86,86,86,86
"Ardwick, Longsight, Chorlton-on-Medlock",7,7,7,7,7,7
Atherton,8,8,8,8,8,8
"Baguley, Brooklands (Manchester and Trafford), Roundthorn Industrial Estate",4,4,4,4,4,4
"Carrington, Partington",3,3,3,3,3,3
"Chorlton-cum-Hardy, Barlow Moor",16,16,16,16,16,16
"Clayton, Openshaw, Beswick",1,1,1,1,1,1
"Collyhurst, Miles Platting, Moston, New Moston, Newton Heath",4,4,4,4,4,4
"Crumpsall, Cheetham Hill",8,8,8,8,8,8


In [40]:
#run the above function on each neighborhood and create a new dataframe
l_venues = getNearbyVenues(names=df_l['Coverage'],
                                   latitudes=df_l['latitude'],
                                   longitudes=df_l['longitude']
                                  )

#check the size of the resulting dataframe
print(l_venues.shape)
l_venues.head()

City Centre
Aintree Village, Fazakerley
Clubmoor, Croxteth, Gillmoss, Norris Green
Croxteth Park, West Derby
Clubmoor, Old Swan, Stoneycroft, Tuebrook
Broadgreen, Dovecot, Knotty Ash, Page Moss
Wavertree
Broadgreen, Bowring Park, Childwall
Aigburth, St Michael's Hamlet, Sefton Park
Allerton, Mossley Hill
Garston, Grassendale, Aigburth,
City Centre
Bootle, Orrell
Kirkdale
Ford, Litherland, Seaforth
Waterloo
Blundellsands, Brighton-le-Sands, Crosby, Little Crosby, Thornton
Hale, Speke
Belle Vale, Gateacre, Hunts Cross, Woolton, Halewood
Halewood
Netherley
Stockbridge Village
Lunt, Sefton Village
City Centre, Everton, Vauxhall
Bootle, Netherton
Maghull, Lydiate, Melling, Waddicar
Kirkby
Kirkby
Prescot, Knowsley Village
Prescot, Whiston, Rainhill
Huyton, Roby, Tarbock
Formby, Little Altcar, Great Altcar
Ince Blundell, Hightown
Ormskirk, Aughton
Anfield, Kirkdale, Walton
Burscough, Mawdesley, Scarisbrick, Rufford, Holmeswood
Anfield, Everton, Kirkdale, Vauxhall
Anfield, City Centre, Everton

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,City Centre,53.40254,-2.97928,Kazimier Garden,53.402805,-2.981879,Beer Garden
1,City Centre,53.40254,-2.97928,Leaf,53.402869,-2.977708,Café
2,City Centre,53.40254,-2.97928,Elif,53.403523,-2.979076,Turkish Restaurant
3,City Centre,53.40254,-2.97928,Mowgli Street Food,53.402828,-2.9776,Indian Restaurant
4,City Centre,53.40254,-2.97928,Bold Street Coffee,53.402394,-2.976993,Coffee Shop


In [41]:
# count the unique categories and the number of venues in each
print('There are {} unique categories in Liverpool.'.format(len(l_venues['Venue Category'].unique())))
l_venues.groupby('Venue Category').count()

There are 143 unique categories in Liverpool.


Unnamed: 0_level_0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
American Restaurant,8,8,8,8,8,8
Argentinian Restaurant,1,1,1,1,1,1
Art Gallery,5,5,5,5,5,5
Art Museum,2,2,2,2,2,2
Arts & Crafts Store,1,1,1,1,1,1
Asian Restaurant,2,2,2,2,2,2
Athletics & Sports,2,2,2,2,2,2
BBQ Joint,1,1,1,1,1,1
Bagel Shop,1,1,1,1,1,1
Bakery,5,5,5,5,5,5
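
One simple way to put a number on how similar the two cities' venue mixes are is the Jaccard index of their category sets (shared categories divided by all categories seen in either city). This is an illustrative sketch on made-up categories, not a result from the notebook; `category_jaccard` is a hypothetical helper.

```python
import pandas as pd


def category_jaccard(a, b):
    """Jaccard index of the distinct, non-missing values in two Series."""
    set_a, set_b = set(a.dropna()), set(b.dropna())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical category columns standing in for the 'Venue Category' columns.
m_categories = pd.Series(["Bar", "Pub", "Park", "Bar"])
l_categories = pd.Series(["Bar", "Pub", "Museum"])

similarity = category_jaccard(m_categories, l_categories)
print(similarity)
```

The index ignores how many venues fall into each category, so it complements rather than replaces the per-area frequency analysis.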


In [42]:
# check how many venues were returned for each area
print('There are {} unique categories in Liverpool.'.format(len(l_venues['Venue Category'].unique())))
l_venues.groupby('Area').count()

There are 143 unique categories in Liverpool.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Aigburth, St Michael's Hamlet, Sefton Park",4,4,4,4,4,4
"Aintree Village, Fazakerley",1,1,1,1,1,1
"Aintree, Fazakerley, Orrell Park, Walton",5,5,5,5,5,5
"Allerton, Mossley Hill",7,7,7,7,7,7
American Express[4],15,15,15,15,15,15
"Anfield, City Centre, Everton, Fairfield, Kensington, Tuebrook",3,3,3,3,3,3
"Anfield, Everton, Kirkdale, Vauxhall",5,5,5,5,5,5
"Anfield, Kirkdale, Walton",6,6,6,6,6,6
"BT Group, large Selectapost users[4]",65,65,65,65,65,65
"Belle Vale, Gateacre, Hunts Cross, Woolton, Halewood",5,5,5,5,5,5


# Analyze Manchester

In [43]:

# one hot encoding
m_onehot = pd.get_dummies(m_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
m_onehot['Area'] = m_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [m_onehot.columns[-1]] + list(m_onehot.columns[:-1])
m_onehot = m_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(m_onehot.shape[0]))

# group rows by area, taking the mean frequency of occurrence of each category
m_grouped = m_onehot.groupby('Area').mean().reset_index()

# examine the dataframe size after grouping
print('{} rows were returned after grouping.'.format(m_grouped.shape[0]))

630 rows were returned after one hot encoding.
42 rows were returned after grouping.
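
To make the encoding step concrete, here is a small sketch on toy data (the `venues` frame is hypothetical). Each venue becomes a 0/1 row, and the per-area mean of those rows is the share of the area's venues in each category, so every row of the grouped frame sums to 1.

```python
import pandas as pd

# Hypothetical venues: four in area A, two in area B.
venues = pd.DataFrame({
    "Area": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Pub", "Park", "Park", "Park"],
})

# One indicator column per category, stored as floats.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Area", venues["Area"])

# Mean of the indicators per area = relative frequency of each category.
grouped = onehot.groupby("Area").mean().reset_index()
print(grouped)
```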


In [44]:

#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in m_grouped['Area']:
    print("----"+hood+"----")
    temp = m_grouped[m_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abbey Hey, Gorton----
                  venue  freq
0  Gym / Fitness Center  0.29
1                 Hotel  0.14
2                   Gym  0.14
3           Supermarket  0.14
4        Sandwich Place  0.14


----Ancoats, Northern Quarter, Strangeways----
          venue  freq
0           Bar  0.14
1   Coffee Shop  0.12
2           Pub  0.06
3  Cocktail Bar  0.05
4      Tea Room  0.05


----Ardwick, Longsight, Chorlton-on-Medlock----
               venue  freq
0  College Cafeteria  0.14
1  Convenience Store  0.14
2           Pharmacy  0.14
3       Liquor Store  0.14
4   Asian Restaurant  0.14


----Atherton----
              venue  freq
0       Supermarket  0.25
1               Pub  0.12
2       Roller Rink  0.12
3    Sandwich Place  0.12
4  Department Store  0.12


----Baguley, Brooklands (Manchester and Trafford), Roundthorn Industrial Estate----
                  venue  freq
0                  Park  0.50
1  Fast Food Restaurant  0.25
2          Tram Station  0.25
3              Pie S

In [45]:
#put into a pandas dataframe

#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# create the new dataframe and display the top 8 venues for each area
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Area'] = m_grouped['Area']

for ind in np.arange(m_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(m_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,"Abbey Hey, Gorton",Gym / Fitness Center,Hotel,Gym,Supermarket,Sandwich Place,Market,Deli / Bodega,Donut Shop
1,"Ancoats, Northern Quarter, Strangeways",Bar,Coffee Shop,Pub,Cocktail Bar,Tea Room,Bookstore,Record Shop,Beer Bar
2,"Ardwick, Longsight, Chorlton-on-Medlock",College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop
3,Atherton,Supermarket,Sandwich Place,Pub,Department Store,Bar,Roller Rink,Soccer Field,Dance Studio
4,"Baguley, Brooklands (Manchester and Trafford),...",Park,Fast Food Restaurant,Tram Station,Yoga Studio,Deli / Bodega,Duty-free Shop,Donut Shop,Dive Bar
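
The column labels and the ranking can also be produced without relying on an exception for the suffix. A minimal sketch with two hypothetical helpers, `ordinal` and `top_categories`, applied to a made-up frequency row:

```python
import pandas as pd


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


def top_categories(row, n):
    """Names of the n categories with the highest frequency in a row."""
    return row.sort_values(ascending=False).index[:n].tolist()


# Hypothetical frequencies for a single area.
freq = pd.Series({"Park": 0.2, "Bar": 0.5, "Pub": 0.3})

columns = ["{} Most Common Venue".format(ordinal(i + 1)) for i in range(3)]
top_two = top_categories(freq, 2)
print(columns)
print(top_two)
```

When several categories share the same frequency (often 0.0 in sparse areas), their relative order is arbitrary, so the lower-ranked columns should be read with care.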


In [52]:
# work on a renamed copy so that df_m keeps its original 'Coverage' column
m_merged = df_m.rename(columns={'Coverage':'Area'})

m_merged=m_merged.join(areas_venues_sorted.set_index('Area'),on='Area')
m_merged.head()



Unnamed: 0,postcode,latitude,longitude,Post town,Area,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester,Hotel,Gay Bar,Bar,Pub,Coffee Shop,Cocktail Bar,Sushi Restaurant,Indian Restaurant
1,M11,53.47834,-2.17933,MANCHESTER,"Clayton, Openshaw, Beswick",Manchester,Indian Restaurant,Yoga Studio,Deli / Bodega,Electronics Store,Duty-free Shop,Donut Shop,Dive Bar,Discount Store
2,M12,53.46482,-2.20187,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester,College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop
3,M13,53.4603,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester,College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop
4,M14,53.4477,-2.22437,MANCHESTER,"Fallowfield, Moss Side, Ladybarn, Rusholme, Lo...",Manchester,Gym / Fitness Center,Bed & Breakfast,Fried Chicken Joint,Park,Racetrack,Bus Station,Coffee Shop,Art Gallery


In [47]:
m_grouped_clustering = m_grouped.drop(columns='Area')
m_grouped_clustering


Unnamed: 0,Adult Boutique,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,...,Thai Restaurant,Theater,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.011628,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,...,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0


In [58]:
m_merged


Unnamed: 0,postcode,latitude,longitude,Post town,Area,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,M1,53.47734,-2.23508,MANCHESTER,"Piccadilly, City Centre, Market Street",Manchester,Hotel,Gay Bar,Bar,Pub,Coffee Shop,Cocktail Bar,Sushi Restaurant,Indian Restaurant
1,M11,53.47834,-2.17933,MANCHESTER,"Clayton, Openshaw, Beswick",Manchester,Indian Restaurant,Yoga Studio,Deli / Bodega,Electronics Store,Duty-free Shop,Donut Shop,Dive Bar,Discount Store
2,M12,53.46482,-2.20187,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester,College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop
3,M13,53.4603,-2.21389,MANCHESTER,"Ardwick, Longsight, Chorlton-on-Medlock",Manchester,College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop
4,M14,53.4477,-2.22437,MANCHESTER,"Fallowfield, Moss Side, Ladybarn, Rusholme, Lo...",Manchester,Gym / Fitness Center,Bed & Breakfast,Fried Chicken Joint,Park,Racetrack,Bus Station,Coffee Shop,Art Gallery
5,M15,53.46563,-2.25008,MANCHESTER,"Hulme, Manchester Science Park, Old Trafford","Manchester, Trafford",Café,Grocery Store,Deli / Bodega,Discount Store,Vegetarian / Vegan Restaurant,Park,Convenience Store,Fast Food Restaurant
6,M16,53.45481,-2.26357,MANCHESTER,"Firswood, Old Trafford, Whalley Range, Moss Side","Manchester, Trafford",Grocery Store,Bus Stop,Sandwich Place,Yoga Studio,Duty-free Shop,Donut Shop,Dive Bar,Discount Store
7,M17,53.46906,-2.31789,MANCHESTER,"Trafford Park, The Trafford Centre",Trafford,Sandwich Place,Yoga Studio,Deli / Bodega,Electronics Store,Duty-free Shop,Donut Shop,Dive Bar,Discount Store
8,M18,53.46127,-2.16871,MANCHESTER,"Abbey Hey, Gorton",Manchester,Gym / Fitness Center,Hotel,Gym,Supermarket,Sandwich Place,Market,Deli / Bodega,Donut Shop
9,M19,53.43696,-2.19421,MANCHESTER,"Levenshulme, Burnage","Manchester, Stockport",Gym / Fitness Center,Restaurant,Indoor Play Area,Pakistani Restaurant,Park,Deli / Bodega,Donut Shop,Dive Bar


## K-means Clustering of Manchester

In [61]:

from sklearn.cluster import KMeans

# set number of clusters
mclusters = 3

m_grouped_clustering = m_grouped.drop(columns='Area')

# run k-means clustering
kmeans = KMeans(n_clusters=mclusters, random_state=0).fit(m_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
kmeans.labels_
len(kmeans.labels_)

# add clustering labels
# NOTE: kmeans.labels_ follows the row order of m_grouped (areas sorted by name);
# this assignment aligns on the integer index of m_merged rather than on 'Area'
m_merged['Cluster Labels'] = pd.Series(kmeans.labels_)
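
k-means returns one label per row of the grouped frame, in that frame's row order. A sketch of one way to attach the labels to a postcode-level frame by area name, so that each postcode receives the label of its own area; the frames and the `labels` array are made up, and this is an alternative to the positional assignment above rather than the notebook's method:

```python
import numpy as np
import pandas as pd

# Hypothetical grouped frame (one row per area) and its cluster labels.
grouped = pd.DataFrame({"Area": ["Alpha", "Beta", "Gamma"]})
labels = np.array([2, 0, 1])  # stand-in for kmeans.labels_

# Hypothetical postcode-level frame in a different order, with one unknown area.
merged = pd.DataFrame({
    "postcode": ["X1", "X2", "X3", "X4"],
    "Area": ["Gamma", "Alpha", "Gamma", "Delta"],
})

# Index the labels by area name, then look each postcode's area up.
label_by_area = pd.Series(labels, index=grouped["Area"])
merged["Cluster Labels"] = merged["Area"].map(label_by_area).astype("Int64")
print(merged)
```

Areas with no venues stay missing (`<NA>`) instead of being folded into a real cluster.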

In [65]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Finally, let's visualize the resulting clusters
# create map 
m_clusters = folium.Map(location=[53.47734, -2.23508], zoom_start=13)

mclusters = 3

# set color scheme for the clusters
x = np.arange(mclusters)
ys = [i+x+(i*x)**2 for i in range(mclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

m_merged= m_merged.fillna(0)

m_merged['Cluster Labels']=m_merged['Cluster Labels'].astype(int)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(m_merged['latitude'], m_merged['longitude'], m_merged['Area'], m_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(m_clusters)
       
m_clusters
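
The colour list for the map can be built directly from the number of clusters. A hedged sketch, assuming matplotlib is available as in the cell above; `cluster_palette`, `colour_for` and the grey fallback colour are hypothetical names and choices:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(k):
    """Return k hex colours spread evenly over the rainbow colormap."""
    return [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, k))]


palette = cluster_palette(3)
UNLABELLED = "#808080"  # assumption: grey marks areas without a cluster label


def colour_for(cluster):
    """Colour for a cluster label, or grey when the label is missing."""
    return UNLABELLED if cluster is None else palette[cluster]


print(palette)
```

Indexing the palette with the label itself keeps cluster 0 on the first colour, and a separate fallback colour keeps unlabelled areas visually distinct from real clusters.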

# Analyze Liverpool

In [66]:

# one hot encoding
l_onehot = pd.get_dummies(l_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
l_onehot['Area'] = l_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [l_onehot.columns[-1]] + list(l_onehot.columns[:-1])
l_onehot = l_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(l_onehot.shape[0]))

# group rows by area, taking the mean frequency of occurrence of each category
l_grouped = l_onehot.groupby('Area').mean().reset_index()

# examine the dataframe size after grouping
print('{} rows were returned after grouping.'.format(l_grouped.shape[0]))

698 rows were returned after one hot encoding.
46 rows were returned after grouping.


In [67]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in l_grouped['Area']:
    print("----"+hood+"----")
    temp = l_grouped[l_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aigburth, St Michael's Hamlet, Sefton Park----
               venue  freq
0                Pub  0.25
1   Botanical Garden  0.25
2  Outdoor Sculpture  0.25
3               Park  0.25
4             Museum  0.00


----Aintree Village, Fazakerley----
                     venue  freq
0             Home Service   1.0
1      American Restaurant   0.0
2    Performing Arts Venue   0.0
3                Nightclub   0.0
4  North Indian Restaurant   0.0


----Aintree, Fazakerley, Orrell Park, Walton----
              venue  freq
0              Café   0.2
1     Grocery Store   0.2
2              Park   0.2
3          Pharmacy   0.2
4  Business Service   0.2


----Allerton, Mossley Hill----
                venue  freq
0  Chinese Restaurant  0.14
1     Thai Restaurant  0.14
2          Restaurant  0.14
3         Coffee Shop  0.14
4         Supermarket  0.14


----American Express[4]----
                 venue  freq
0                  Pub  0.20
1       Discount Store  0.13
2  Sporting Goods Shop  0.

In [68]:
#put into a pandas dataframe

#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# create the new dataframe and display the top 8 venues for each area
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Area'] = l_grouped['Area']

for ind in np.arange(l_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(l_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,"Aigburth, St Michael's Hamlet, Sefton Park",Park,Outdoor Sculpture,Botanical Garden,Pub,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
1,"Aintree Village, Fazakerley",Home Service,Zoo Exhibit,Donut Shop,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
2,"Aintree, Fazakerley, Orrell Park, Walton",Café,Grocery Store,Pharmacy,Park,Business Service,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
3,"Allerton, Mossley Hill",Coffee Shop,Chinese Restaurant,Restaurant,Supermarket,Grocery Store,Thai Restaurant,Sandwich Place,Discount Store
4,American Express[4],Pub,Sporting Goods Shop,Discount Store,Outdoor Sculpture,Soccer Stadium,Supermarket,Sandwich Place,Bus Stop


In [70]:
# work on a renamed copy so that df_l keeps its original 'Coverage' column
l_merged = df_l.rename(columns={'Coverage':'Area'})

l_grouped_clustering = l_grouped.drop(columns='Area')

l_merged=l_merged.join(areas_venues_sorted.set_index('Area'),on='Area')
l_merged.head()




Unnamed: 0,postcode,latitude,longitude,Post town,Area,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,L1,53.40254,-2.97928,LIVERPOOL,City Centre,Liverpool,Hotel,Pub,Coffee Shop,Bar,Café,Italian Restaurant,Clothing Store,Burger Joint
1,L10,53.47398,-2.92668,LIVERPOOL,"Aintree Village, Fazakerley","Sefton, Liverpool, Knowsley",Home Service,Zoo Exhibit,Donut Shop,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
2,L11,53.44801,-2.91407,LIVERPOOL,"Clubmoor, Croxteth, Gillmoss, Norris Green",Liverpool,Pool,Hotel,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Electronics Store
3,L12,53.43467,-2.89421,LIVERPOOL,"Croxteth Park, West Derby",Liverpool,Soccer Field,Fast Food Restaurant,Supermarket,Zoo Exhibit,Donut Shop,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
4,L13,53.4174,-2.91943,LIVERPOOL,"Clubmoor, Old Swan, Stoneycroft, Tuebrook",Liverpool,Furniture / Home Store,Indian Restaurant,Supermarket,Tea Room,Electronics Store,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant


## K-means Clustering of Liverpool

In [72]:


from sklearn.cluster import KMeans

# set number of clusters
lclusters = 3

l_grouped_clustering = l_grouped.drop(columns='Area')

# run k-means clustering
kmeans = KMeans(n_clusters=lclusters, random_state=0).fit(l_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
kmeans.labels_
len(kmeans.labels_)

# add clustering labels
# NOTE: kmeans.labels_ follows the row order of l_grouped (areas sorted by name);
# this assignment aligns on the integer index of l_merged rather than on 'Area'
l_merged['Cluster Labels'] = pd.Series(kmeans.labels_)



# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Finally, let's visualize the resulting clusters
# create map 
l_clusters = folium.Map(location=[53.40254, -2.97928], zoom_start=13)

lclusters = 3

# set color scheme for the clusters
x = np.arange(lclusters)
ys = [i+x+(i*x)**2 for i in range(lclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

l_merged= l_merged.fillna(0)

l_merged['Cluster Labels']=l_merged['Cluster Labels'].astype(int)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(l_merged['latitude'], l_merged['longitude'], l_merged['Area'], l_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(l_clusters)
       
l_clusters
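
The number of clusters is fixed at 3 above without a stated justification. One common way to sanity-check such a choice is the silhouette score. The sketch below is an illustration only: the `best_k_by_silhouette` helper and the synthetic `demo` table are hypothetical and not part of this notebook. In the notebook, the helper would be called on `l_grouped_clustering` instead.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(features, k_values, random_state=0):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for l_grouped_clustering (assumption, not real data):
# 30 areas and 4 venue-frequency columns drawn around three distinct profiles.
rng = np.random.default_rng(0)
centres = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.1, 0.7, 0.1, 0.1],
                    [0.1, 0.1, 0.7, 0.1]])
demo = pd.DataFrame(
    np.vstack([c + rng.normal(0, 0.02, size=(10, 4)) for c in centres]),
    columns=['Bar', 'Grocery Store', 'Hotel', 'Park'],
)

best_k, scores = best_k_by_silhouette(demo, range(2, 6))
print(best_k)
print(scores)
```

A higher silhouette score means the areas sit closer to their own cluster than to neighbouring ones. On real venue data the scores are often much closer together than in this toy example, so the result is a guide rather than a rule.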

# Results

### The three clusters of Manchester

In [73]:
m_merged.loc[m_merged['Cluster Labels'] == 0, m_merged.columns[[2] + list(range(5, m_merged.shape[1]))]]

Unnamed: 0,longitude,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Cluster Labels
0,-2.23508,Manchester,Hotel,Gay Bar,Bar,Pub,Coffee Shop,Cocktail Bar,Sushi Restaurant,Indian Restaurant,0
3,-2.21389,Manchester,College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop,0
15,-2.19624,"Rochdale, Oldham",Stadium,Pub,Grocery Store,Supermarket,Deli / Bodega,Duty-free Shop,Donut Shop,Dive Bar,0
19,-2.39666,"Salford, Wigan",Sports Club,Park,Yoga Studio,Coffee Shop,Duty-free Shop,Donut Shop,Dive Bar,Discount Store,0
21,-2.35418,Salford,Gym,Grocery Store,Train Station,Pizza Place,Fast Food Restaurant,Sandwich Place,Auto Garage,Yoga Studio,0
28,-2.22909,Manchester,Bar,Coffee Shop,Pub,Cocktail Bar,Tea Room,Bookstore,Record Shop,Beer Bar,0
35,-2.28482,Salford,Brewery,Food Truck,Indian Restaurant,Building,Pizza Place,Duty-free Shop,Donut Shop,Dive Bar,0
36,-2.29696,Salford,Supermarket,Yoga Studio,Ethiopian Restaurant,Electronics Store,Duty-free Shop,Donut Shop,Dive Bar,Discount Store,0
42,-2.234866,Manchester,Bar,Pub,Coffee Shop,Café,Tea Room,Record Shop,Cocktail Bar,Bookstore,0


In [74]:
m_merged.loc[m_merged['Cluster Labels'] == 1, m_merged.columns[[2] + list(range(5, m_merged.shape[1]))]]

Unnamed: 0,longitude,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Cluster Labels
1,-2.17933,Manchester,Indian Restaurant,Yoga Studio,Deli / Bodega,Electronics Store,Duty-free Shop,Donut Shop,Dive Bar,Discount Store,1
2,-2.20187,Manchester,College Cafeteria,Asian Restaurant,Park,Liquor Store,Bus Stop,Pharmacy,Convenience Store,Cosmetics Shop,1
4,-2.22437,Manchester,Gym / Fitness Center,Bed & Breakfast,Fried Chicken Joint,Park,Racetrack,Bus Station,Coffee Shop,Art Gallery,1
5,-2.25008,"Manchester, Trafford",Café,Grocery Store,Deli / Bodega,Discount Store,Vegetarian / Vegan Restaurant,Park,Convenience Store,Fast Food Restaurant,1
6,-2.26357,"Manchester, Trafford",Grocery Store,Bus Stop,Sandwich Place,Yoga Studio,Duty-free Shop,Donut Shop,Dive Bar,Discount Store,1
8,-2.16871,Manchester,Gym / Fitness Center,Hotel,Gym,Supermarket,Sandwich Place,Market,Deli / Bodega,Donut Shop,1
9,-2.19421,"Manchester, Stockport",Gym / Fitness Center,Restaurant,Indoor Play Area,Pakistani Restaurant,Park,Deli / Bodega,Donut Shop,Dive Bar,1
10,-2.24263,Manchester,Coffee Shop,Italian Restaurant,Steakhouse,Hotel,Asian Restaurant,Pub,Plaza,Thai Restaurant,1
11,-2.23027,Manchester,Pizza Place,Bus Station,Wine Bar,Gym / Fitness Center,Tram Station,Thai Restaurant,Tennis Court,Deli / Bodega,1
12,-2.27099,Manchester,Grocery Store,Bar,Indian Restaurant,Gas Station,Japanese Restaurant,Falafel Restaurant,Fabric Shop,Music Venue,1


In [75]:
m_merged.loc[m_merged['Cluster Labels'] == 2, m_merged.columns[[2] + list(range(5, m_merged.shape[1]))]]

Unnamed: 0,longitude,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Cluster Labels
7,-2.31789,Trafford,Sandwich Place,Yoga Studio,Deli / Bodega,Electronics Store,Duty-free Shop,Donut Shop,Dive Bar,Discount Store,2
39,-2.21269,Manchester,Hotel,Bar,Park,Fish & Chips Shop,Pub,Gift Shop,Gay Bar,Comedy Club,2


### The three clusters of Liverpool

In [76]:
l_merged.loc[l_merged['Cluster Labels'] == 0, l_merged.columns[[2] + list(range(5, l_merged.shape[1]))]]

Unnamed: 0,longitude,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Cluster Labels
20,-2.83758,Liverpool,Construction & Landscaping,Pizza Place,Indie Theater,Ethiopian Restaurant,Performing Arts Venue,Donut Shop,Food & Drink Shop,Fast Food Restaurant,0
46,-2.988368,non-geographic,American Restaurant,Park,Hotel,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,0
47,-2.975554,non-geographic,Bar,Hotel,Pub,Coffee Shop,Fast Food Restaurant,Theater,Comic Shop,Middle Eastern Restaurant,0
48,-2.988061,non-geographic,Hotel,Harbor / Marina,Hostel,Pub,Music Venue,Seafood Restaurant,Bed & Breakfast,Rental Car Location,0
49,-2.985222,non-geographic,Construction & Landscaping,Train Station,Brewery,IT Services,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,0


In [77]:
l_merged.loc[l_merged['Cluster Labels'] == 1, l_merged.columns[[2] + list(range(5, l_merged.shape[1]))]]

Unnamed: 0,longitude,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Cluster Labels
0,-2.97928,Liverpool,Hotel,Pub,Coffee Shop,Bar,Café,Italian Restaurant,Clothing Store,Burger Joint,1
1,-2.92668,"Sefton, Liverpool, Knowsley",Home Service,Zoo Exhibit,Donut Shop,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,1
4,-2.91943,Liverpool,Furniture / Home Store,Indian Restaurant,Supermarket,Tea Room,Electronics Store,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,1
5,-2.87883,"Liverpool, Knowsley",Fast Food Restaurant,American Restaurant,Sports Club,Cosmetics Shop,Department Store,Dessert Shop,Diner,Discount Store,1
6,-2.91901,Liverpool,Grocery Store,Pool,Track,Park,Bar,Gym,Indian Restaurant,Department Store,1
7,-2.88744,"Liverpool, Knowsley",Grocery Store,Park,Zoo Exhibit,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,1
8,-2.93962,Liverpool,Park,Outdoor Sculpture,Botanical Garden,Pub,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,1
9,-2.90661,Liverpool,Coffee Shop,Chinese Restaurant,Restaurant,Supermarket,Grocery Store,Thai Restaurant,Sandwich Place,Discount Store,1
11,-2.98841,Liverpool,Hotel,Pub,Coffee Shop,Bar,Café,Italian Restaurant,Clothing Store,Burger Joint,1
12,-2.98797,"Sefton, Liverpool",Pharmacy,Discount Store,Stationery Store,Supermarket,Pub,Grocery Store,Thrift / Vintage Store,Coffee Shop,1


In [78]:
l_merged.loc[l_merged['Cluster Labels'] == 2, l_merged.columns[[2] + list(range(5, l_merged.shape[1]))]]

Unnamed: 0,longitude,Local authority area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Cluster Labels
2,-2.91407,Liverpool,Pool,Hotel,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Electronics Store,2
3,-2.89421,Liverpool,Soccer Field,Fast Food Restaurant,Supermarket,Zoo Exhibit,Donut Shop,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,2
10,-2.90165,Liverpool,Pub,Pool,Supermarket,Café,Playground,Convenience Store,Cosmetics Shop,Department Store,2
13,-2.98797,Liverpool,Pharmacy,Discount Store,Stationery Store,Supermarket,Pub,Grocery Store,Thrift / Vintage Store,Coffee Shop,2
18,-2.86296,"Liverpool, Knowsley",Movie Theater,Garden Center,Fast Food Restaurant,Café,Pub,Cosmetics Shop,Department Store,Dessert Shop,2
21,-2.86411,"Liverpool, Knowsley",Movie Theater,Convenience Store,Construction & Landscaping,Home Service,French Restaurant,Hostel,Cosmetics Shop,Department Store,2
24,-2.97062,Sefton,Discount Store,Hotel,Furniture / Home Store,Supermarket,Soccer Field,Bus Stop,Clothing Store,Multiplex,2
28,-2.81445,Knowsley,Zoo Exhibit,Bridal Shop,Playground,Zoo,Art Museum,Electronics Store,French Restaurant,Food & Drink Shop,2
29,-2.78244,"Knowsley, St Helens",0,0,0,0,0,0,0,0,2
44,-2.96069,non-geographic,Construction & Landscaping,Soccer Stadium,Soccer Field,Train Station,English Restaurant,Gym / Fitness Center,Zoo Exhibit,Donut Shop,2


# Discussion

As mentioned in the Introduction, both cities are important and popular in the UK. Using the clusters calculated and mapped in the previous section, we can broadly characterise each cluster of the two cities by its most common venues.
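
The characterisation here was done by reading the result tables. A minimal sketch of how a comparable summary could be computed is given below: it counts how often each venue category appears among the "Most Common Venue" columns of each cluster. The `summarise_clusters` helper is hypothetical, and the small `demo` frame reuses a few rows from the Manchester tables with only the first three venue columns. In the notebook, the helper would be called on `m_merged` or `l_merged`.

```python
import pandas as pd


def summarise_clusters(merged, label_col='Cluster Labels', top_n=3):
    """For each cluster, list the top_n most frequent venue categories
    across all '... Most Common Venue' columns."""
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    summary = {}
    for label, group in merged.groupby(label_col):
        # flatten every venue cell of this cluster into one series and count categories
        counts = pd.Series(group[venue_cols].to_numpy().ravel()).value_counts()
        summary[label] = counts.head(top_n).index.tolist()
    return summary


# Illustrative frame only: four rows taken from the Manchester tables above
demo = pd.DataFrame({
    '1st Most Common Venue': ['Hotel', 'Bar', 'Bar', 'Grocery Store'],
    '2nd Most Common Venue': ['Gay Bar', 'Coffee Shop', 'Pub', 'Bus Stop'],
    '3rd Most Common Venue': ['Bar', 'Pub', 'Coffee Shop', 'Sandwich Place'],
    'Cluster Labels': [0, 0, 0, 1],
})

summary = summarise_clusters(demo, top_n=1)
print(summary)
```

Note that any placeholder values in the venue columns (such as the 0 written by `fillna(0)`) would be counted like a category, so they may need to be filtered out first.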

Numbering the cluster labels 0, 1 and 2 as clusters 1, 2 and 3: in Manchester, cluster 1 is mainly leisure and tourism (bars and pubs), cluster 2 is mainly residential and cluster 3 is mainly shopping. In Liverpool, cluster 1 is mainly tourism, like cluster 1 of Manchester, cluster 2 is residential and cluster 3 is a mix of leisure and residential. Based on these clusters, we can conclude that the two cities are similar in terms of their most common venues.
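
The similarity between the cities is judged here by comparing the tables by eye. One way it could be quantified, sketched under stated assumptions, is to build a venue-category frequency profile for each city and compute the cosine similarity between the two profiles. The `city_profile` and `cosine_similarity` helpers and the toy frames below are hypothetical and were not used to produce the results in this report.

```python
import numpy as np
import pandas as pd


def city_profile(merged):
    """Relative frequency of each venue category across all 'Most Common Venue' columns."""
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    values = pd.Series(merged[venue_cols].to_numpy().ravel())
    return values.value_counts(normalize=True)


def cosine_similarity(p, q):
    """Cosine similarity between two frequency profiles indexed by venue category."""
    # align on the union of categories; a category missing from one city counts as 0
    p, q = p.align(q, fill_value=0.0)
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))


# Toy frames for illustration only (not real Foursquare data)
city_a = pd.DataFrame({'1st Most Common Venue': ['Bar', 'Pub'],
                       '2nd Most Common Venue': ['Pub', 'Hotel']})
city_b = pd.DataFrame({'1st Most Common Venue': ['Pub', 'Hotel'],
                       '2nd Most Common Venue': ['Bar', 'Pub']})
city_c = pd.DataFrame({'1st Most Common Venue': ['Zoo', 'Park'],
                       '2nd Most Common Venue': ['Park', 'Pool']})

sim_ab = cosine_similarity(city_profile(city_a), city_profile(city_b))
sim_ac = cosine_similarity(city_profile(city_a), city_profile(city_c))
print(sim_ab, sim_ac)
```

Cities with the same mix of venue categories score close to 1, and cities with no categories in common score 0. Applied to `m_merged` and `l_merged`, a single number like this would make the similarity claim easier to check.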

There are several possible explanations for this result. Both cities have highly regarded football clubs and museums, so both attract visitors. Both also have highly ranked universities, so this preliminary finding suggests that the two cities are also suitable places to live.

In addition, if you are a university student living near the university and you like Chinese food, Liverpool offers more choices than Manchester :)
 

# Conclusion 

This project compared two big cities in the North of England and classified the areas within each city as residential, social or other. The classification was derived from Foursquare venue data alone, so it is not a comprehensive comparison, and we cannot conclude that one city is better than the other.

In future work, other quantitative data such as living costs, house prices and the job market should be considered to explore the comparison further.
