### The Battle of Neighbourhoods: London vs. Birmingham

##### - Jimmy Chuong

In [2]:
import pandas as pd
import numpy as np

### Section: London

#### Web Scraping with Pandas

In [46]:
url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
londonDF = pd.read_html(url, header=0, flavor='bs4')
print("There are " + str(len(londonDF)) + " tables.")

There are 5 tables.


In [47]:
londonDF

[  Map all coordinates in "Category:Areas of London" using: OpenStreetMap
 0                 Download coordinates as: KML · GPX                    ,
             Location                     London borough       Post town  \
 0         Abbey Wood              Bexley, Greenwich [7]          LONDON   
 1              Acton  Ealing, Hammersmith and Fulham[8]          LONDON   
 2          Addington                         Croydon[8]         CROYDON   
 3         Addiscombe                         Croydon[8]         CROYDON   
 4        Albany Park                             Bexley  BEXLEY, SIDCUP   
 ..               ...                                ...             ...   
 526         Woolwich                          Greenwich          LONDON   
 527   Worcester Park       Sutton, Kingston upon Thames  WORCESTER PARK   
 528  Wormwood Scrubs             Hammersmith and Fulham          LONDON   
 529          Yeading                         Hillingdon           HAYES   
 530         Yi

`pd.read_html` returns a list of the 5 tables found in the HTML page. We are interested in the second one (index 1), which lists each location with its London borough, post town, and postcode district.
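Selecting the table by position (`[1]`) breaks if the Wikipedia page gains or loses a table above it. A minimal sketch of an alternative, assuming the right table can be recognised by a column it must contain; `pick_table` is a hypothetical helper, and the toy frames stand in for the list that `pd.read_html` returns. (`pd.read_html` also accepts a `match` argument that keeps only tables containing matching text.)

```python
import pandas as pd


def pick_table(tables, required_column):
    """Return the first DataFrame whose columns include required_column (hypothetical helper)."""
    for table in tables:
        # Strip surrounding whitespace so headers with stray spaces still match
        names = [str(col).strip() for col in table.columns]
        if required_column in names:
            return table
    raise ValueError(f"No table has a column named {required_column!r}")


# Toy stand-ins for the list of tables scraped from the page
tables = [
    pd.DataFrame({'Map': ['Download coordinates as: KML']}),
    pd.DataFrame({'Location': ['Abbey Wood', 'Acton'],
                  'Post town': ['LONDON', 'LONDON']}),
]

areas = pick_table(tables, 'Location')
print(areas.shape)  # (2, 2)
```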

In [48]:
londonDF = londonDF[1]
londonDF

|  | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 020 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 020 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 020 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 020 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 020 | TQ478728 |
| ... | ... | ... | ... | ... | ... | ... |
| 526 | Woolwich | Greenwich | LONDON | SE18 | 020 | TQ435795 |
| 527 | Worcester Park | Sutton, Kingston upon Thames | WORCESTER PARK | KT4 | 020 | TQ225655 |
| 528 | Wormwood Scrubs | Hammersmith and Fulham | LONDON | W12 | 020 | TQ225815 |
| 529 | Yeading | Hillingdon | HAYES | UB4 | 020 | TQ115825 |


The scraped column headers may contain hidden characters, which would make selections such as `londonDF['Post town']` fail, so the columns are renamed explicitly. This allows the code that follows to refer to them by name.
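Renaming by hand works, but it depends on the number and order of the columns. A minimal sketch of a more general clean-up, assuming the hidden characters are non-breaking spaces (`\xa0`) or stray whitespace around the header text; the toy headers are invented for illustration:

```python
import pandas as pd

# Toy frame whose headers carry trailing whitespace and a non-breaking space
df = pd.DataFrame(columns=['Location ', 'London\xa0borough', 'Post town'])

# Replace non-breaking spaces with ordinary spaces, then trim each header
df.columns = df.columns.str.replace('\xa0', ' ', regex=False).str.strip()
print(list(df.columns))  # ['Location', 'London borough', 'Post town']
```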

In [49]:
londonDF.columns = ['Location', 'London borough', 'Post town', 'Postcode district', 'Dial code', 'OS grid ref']
londonDF

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
526,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
527,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
528,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
529,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


We keep only the neighbourhoods whose post town is LONDON, and only the Location, London borough, and Postcode district columns.
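The row filter and the column selection can also be written as a single `.loc` call. The `.copy()` at the end is an addition not present in the notebook: it returns an independent frame, so later in-place column edits (such as the citation clean-up that follows) do not trigger pandas' `SettingWithCopyWarning`. A minimal sketch on toy rows shaped like the scraped table:

```python
import pandas as pd

# Toy rows in the shape of the scraped table
areas = pd.DataFrame({
    'Location': ['Abbey Wood', 'Addington', 'Acton'],
    'London borough': ['Bexley, Greenwich [7]', 'Croydon[8]', 'Ealing, Hammersmith and Fulham[8]'],
    'Post town': ['LONDON', 'CROYDON', 'LONDON'],
    'Postcode district': ['SE2', 'CR0', 'W3, W4'],
})

# Boolean mask for the rows and a list for the columns, in one selection
keep = ['Location', 'London borough', 'Postcode district']
london = areas.loc[areas['Post town'] == 'LONDON', keep].copy()
print(london['Location'].tolist())  # ['Abbey Wood', 'Acton']
```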

In [50]:
londonDF = londonDF[londonDF['Post town'] == 'LONDON']
londonDF = londonDF[['Location', 'London borough', 'Postcode district']]
londonDF

|  | Location | London borough | Postcode district |
|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | W3, W4 |
| 6 | Aldgate | City[10] | EC3 |
| 7 | Aldwych | Westminster[10] | WC2 |
| 9 | Anerley | Bromley[11] | SE20 |
| ... | ... | ... | ... |
| 520 | Wood Green | Haringey | N22 |
| 521 | Woodford | Redbridge | IG8, E18 |
| 525 | Woodside Park | Barnet | N12 |
| 526 | Woolwich | Greenwich | SE18 |


In [65]:
# Removing citation tags such as "[7]" or "[10]" (and any space before them) from the London borough column
londonDF['London borough'] = (
    londonDF['London borough']
    .str.replace(r'\s*\[\d+\]', '', regex=True)
    .str.strip()
)

# Resetting index values
londonDF = londonDF.reset_index(drop=True)
londonDF

|  | Location | London borough | Postcode district | Address |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | Abbey Wood, Bexley, Greenwich , SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3, W4 | Acton, Ealing, Hammersmith and Fulham, W3, W4 |
| 2 | Aldgate | City | EC3 | Aldgate, City, EC3 |
| 3 | Aldwych | Westminster | WC2 | Aldwych, Westminster, WC2 |
| 4 | Anerley | Bromley | SE20 | Anerley, Bromley, SE20 |
| ... | ... | ... | ... | ... |
| 292 | Wood Green | Haringey | N22 | Wood Green, Haringey, N22 |
| 293 | Woodford | Redbridge | IG8, E18 | Woodford, Redbridge, IG8, E18 |
| 294 | Woodside Park | Barnet | N12 | Woodside Park, Barnet, N12 |
| 295 | Woolwich | Greenwich | SE18 | Woolwich, Greenwich, SE18 |
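The output above includes an `Address` column that the cell itself does not create. A minimal sketch of how such a column could be assembled, assuming it is simply the location, borough, and postcode district joined with `', '` (which matches the values shown); the toy rows are copied from the table:

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['Aldgate', 'Woodford'],
    'London borough': ['City', 'Redbridge'],
    'Postcode district': ['EC3', 'IG8, E18'],
})

# Concatenate the three text columns row by row, separated by ', '
df['Address'] = df['Location'] + ', ' + df['London borough'] + ', ' + df['Postcode district']
print(df['Address'].tolist())  # ['Aldgate, City, EC3', 'Woodford, Redbridge, IG8, E18']
```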


#### Obtaining Latitude and Longitude Coordinates with Geopy

In [52]:
# !pip install geopy folium   (or use the conda commands below)
# !conda install -c conda-forge geopy --yes
# !conda install folium -c conda-forge
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import requests

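In [77]:
from geopy.extra.rate_limiter import RateLimiter

# Nominatim allows at most one request per second; a longer timeout makes the ReadTimeoutError below less likely
geolocator = Nominatim(user_agent='london_explorer', timeout=10)
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1, max_retries=2)
londonDF['Coordinates'] = (londonDF['Postcode district'] + ', London').apply(geocode)
# geocode returns None when no match is found (or all retries fail), so guard the attribute access
londonDF['Latitude'] = londonDF['Coordinates'].apply(lambda x: x.latitude if x is not None else None)
londonDF['Longitude'] = londonDF['Coordinates'].apply(lambda x: x.longitude if x is not None else None)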

GeocoderUnavailable: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /search?q=E18%2C+London&format=json&limit=1 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Read timed out. (read timeout=1)"))
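The request above timed out: geopy waits only one second for a reply by default, and the notebook sends one request per row, while Nominatim's public service asks for no more than one request per second. It can help to look up each distinct query once, pause between requests, and record failures as `None` instead of stopping. A minimal sketch of that pattern, with the geocoding function passed in so it can run offline; `geocode_unique`, `FakePlace`, and `fake_geocode` are hypothetical names, and in the notebook the function passed in would be something like `geolocator.geocode`:

```python
import time


def geocode_unique(queries, geocode, delay_seconds=1.0):
    """Geocode each distinct query once, pausing between calls (hypothetical helper).

    Returns a dict mapping each query to (latitude, longitude), or to None when
    the lookup found nothing or raised an exception.
    """
    results = {}
    for i, query in enumerate(dict.fromkeys(queries)):  # de-duplicate, keep first-seen order
        if i > 0:
            time.sleep(delay_seconds)  # space out requests to respect the rate limit
        try:
            place = geocode(query)
        except Exception:  # e.g. a timeout: record the failure and carry on
            place = None
        results[query] = None if place is None else (place.latitude, place.longitude)
    return results


# Offline stand-in for geolocator.geocode, returning objects with .latitude/.longitude
class FakePlace:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


calls = []  # records which queries actually reach the fake geocoder


def fake_geocode(query):
    calls.append(query)
    if query == 'E18, London':
        raise TimeoutError('read timed out')
    return FakePlace(51.5, 0.1) if query.startswith('SE') else None


coords = geocode_unique(['SE2, London', 'SE2, London', 'E18, London', 'ZZ9, London'],
                        fake_geocode, delay_seconds=0)
print(coords)  # {'SE2, London': (51.5, 0.1), 'E18, London': None, 'ZZ9, London': None}
print(calls)   # ['SE2, London', 'E18, London', 'ZZ9, London']
```

The returned dict can then be mapped back onto the rows, for example with `Series.map`.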

In [54]:
londonDF

|  | Location | London borough | Postcode district | Address |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | Abbey Wood, Bexley, Greenwich , SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3, W4 | Acton, Ealing, Hammersmith and Fulham, W3, W4 |
| 2 | Aldgate | City | EC3 | Aldgate, City, EC3 |
| 3 | Aldwych | Westminster | WC2 | Aldwych, Westminster, WC2 |
| 4 | Anerley | Bromley | SE20 | Anerley, Bromley, SE20 |
| ... | ... | ... | ... | ... |
| 292 | Wood Green | Haringey | N22 | Wood Green, Haringey, N22 |
| 293 | Woodford | Redbridge | IG8, E18 | Woodford, Redbridge, IG8, E18 |
| 294 | Woodside Park | Barnet | N12 | Woodside Park, Barnet, N12 |
| 295 | Woolwich | Greenwich | SE18 | Woolwich, Greenwich, SE18 |
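Some rows list more than one postcode district (for example `W3, W4` for Acton), so the geocoding query becomes `W3, W4, London`, which may not resolve to a single point. One option, sketched here as an assumption rather than the notebook's approach, is to split such rows into one row per district with `str.split` and `explode`; whether to keep every district or only the first is a modelling choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['Acton', 'Aldgate', 'Woodford'],
    'Postcode district': ['W3, W4', 'EC3', 'IG8, E18'],
})

# Turn "W3, W4" into a list, then give each list element its own row
df['Postcode district'] = df['Postcode district'].str.split(',')
df = df.explode('Postcode district', ignore_index=True)
# Remove the space left after each comma
df['Postcode district'] = df['Postcode district'].str.strip()
print(df.to_string(index=False))
```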


In [76]:
# Finding coordinates of London
address = 'London, UK'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of London are {}, {}.'.format(latitude, longitude))

The coordinates of London are 51.5073219, -0.1276474.


Next, we visualise a map of London with a marker for each neighbourhood, labelled with its location and borough.

In [None]:
# Creating the map of London, centred on the coordinates found above
map_London = folium.Map(location=[latitude, longitude], zoom_start=11)

# Adding a marker for each neighbourhood; rows without coordinates are skipped
mapped = londonDF.dropna(subset=['Latitude', 'Longitude'])
for lat, lng, borough, location in zip(mapped['Latitude'], mapped['Longitude'], mapped['London borough'], mapped['Location']):
    label = folium.Popup('{}, {}'.format(location, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
    ).add_to(map_London)

map_London
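With `zoom_start=11` the initial view is fixed around the centre of London. An alternative, sketched here as an assumption rather than the author's approach, is to compute the bounding box of the plotted markers and pass it to folium's `Map.fit_bounds`, which expects `[[south, west], [north, east]]`. The hypothetical helper `marker_bounds` below only computes the box, so it runs without folium; the coordinates are toy values.

```python
import pandas as pd


def marker_bounds(df, lat_col='Latitude', lon_col='Longitude'):
    """Return [[min_lat, min_lon], [max_lat, max_lon]], ignoring rows without coordinates."""
    coords = df[[lat_col, lon_col]].dropna()
    return [
        [float(coords[lat_col].min()), float(coords[lon_col].min())],
        [float(coords[lat_col].max()), float(coords[lon_col].max())],
    ]


# Toy coordinates; the last row has no latitude and is ignored
points = pd.DataFrame({
    'Latitude': [51.49, 51.51, None],
    'Longitude': [0.12, -0.27, -0.10],
})
bounds = marker_bounds(points)
print(bounds)  # [[51.49, -0.27], [51.51, 0.12]]
```

In the notebook this could be followed by `map_London.fit_bounds(marker_bounds(londonDF))`.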

Now that we have analysed London, we repeat the analysis for Birmingham.

### Section: Birmingham

#### Importing CSV file of Birmingham Neighbourhoods

In [75]:
bhamDF = pd.read_csv('Birmingham Neighbourhoods.csv', encoding='latin1')
bhamDF

|  | Neighbourhood |
|---|---|
| 0 | Acocks Green |
| 1 | Alum Rock |
| 2 | Ashted |
| 3 | Aston |
| 4 | Aston Cross |
| ... | ... |
| 180 | Woodcock Hill |
| 181 | Woodgate |
| 182 | Wylde Green |
| 183 | Yardley |
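The Birmingham CSV holds bare neighbourhood names, so each name needs the city added before geocoding; otherwise a name such as `Aston` could match a place of the same name elsewhere in England. A minimal sketch, assuming the query format `'<name>, Birmingham, UK'` (mirroring the `'London, UK'` query used earlier) and that stray whitespace and repeated names should be removed first; the toy rows are illustrative:

```python
import pandas as pd

# Toy rows in the shape of the CSV: one Neighbourhood column
bham = pd.DataFrame({'Neighbourhood': ['Acocks Green', ' Aston ', 'Aston', 'Yardley']})

# Trim whitespace, drop repeated names, and renumber the rows
bham['Neighbourhood'] = bham['Neighbourhood'].str.strip()
bham = bham.drop_duplicates(subset='Neighbourhood').reset_index(drop=True)

# Add city context so the geocoder searches in the right place
bham['Query'] = bham['Neighbourhood'] + ', Birmingham, UK'
print(bham['Query'].tolist())
# ['Acocks Green, Birmingham, UK', 'Aston, Birmingham, UK', 'Yardley, Birmingham, UK']
```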


#### Obtaining Latitude and Longitude coordinates with Geocoder