This version of the notebook seeks to access the boundaries of the districts from the HERE API

<h1><center>Discovering Manchester</center></h1>

<h4><center> IBM Capstone Project - Exploring the best areas for a young professional to move to in Greater Manchester </center></h4>

## [Table of Contents:](#Table-of-Contents:)

* [Project Goals](#Project-Goals)
* [Libraries](#Libraries)
* [Functions](#Functions)
* [Data](#Data)
    * [Areas](#Areas)
        * [Metropolitan Districts](#Metropolitan-Districts:)
        * [Wards](#Wards:)
        * [Postcodes](#Postcodes:)
        * [Nearby 'districts'](#Nearby-'districts')
    * [Desirability](#Desirability)
* [References](#References)

## Project Goals 

This project analyses areas of Greater Manchester and seeks to answer the question: where are the best locations for a young professional to move to?

This notebook shows my thought process and approach to this project, and the avenues I have explored in trying to reach my ultimate goal of comparing areas of Manchester. As such, some of these sections could be removed to make the notebook more readable, concise and clean.

## Libraries

The following libraries have been used in this Notebook:

In [223]:
import re
from urllib.request import urlopen

import pandas as pd
import numpy as np
import requests
import folium
from geopy.geocoders import Here
from bs4 import BeautifulSoup
from shapely.wkt import loads
from shapely.geometry import mapping
import geojson
import ast

## Functions

The following functions are used in this Notebook:

In [2]:
def cleanhtml(soup) -> list:
    """Convert every <table> in a BeautifulSoup object into nested lists.

    Returns one list per table, one list per row (the text before the first
    <tr> is included as a leading row), and one string per non-empty line of
    cell text, with all HTML tags removed.
    """
    tables_html = soup.find_all('table')
    tables_html = [str(table_html).split('<tr>') for table_html in tables_html]
    cleaner = re.compile('<.*?>')
    tables = [[list(filter(None, re.sub(cleaner, '', line).splitlines()))
               for line in table] for table in tables_html]

    return tables
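To see what `cleanhtml` does to a single table, its per-table steps can be run on a small hand-written HTML string. This is a sketch: `strip_table_rows` is a hypothetical helper that mirrors the function's inner logic without BeautifulSoup, and the snippet is illustrative rather than copied from Wikipedia.

```python
import re


def strip_table_rows(table_html: str) -> list:
    """Mirror of cleanhtml's per-table logic: split on <tr>, strip tags,
    and keep the non-empty lines of each row as that row's cells."""
    cleaner = re.compile('<.*?>')
    rows = table_html.split('<tr>')
    return [list(filter(None, re.sub(cleaner, '', row).splitlines())) for row in rows]


# Hypothetical snippet; each cell sits on its own line
snippet = """<table>
<tr>
<th>Postcode district</th>
<th>Post town</th>
</tr>
<tr>
<td>M1</td>
<td>MANCHESTER</td>
</tr>
</table>"""

rows = strip_table_rows(snippet)
print(rows)
```

Two consequences are worth noting: the text before the first `<tr>` becomes an (often empty) leading row, and cells are only separated if they sit on their own lines in the HTML; `<td>A</td><td>B</td>` on a single line collapses to the one string `'AB'`.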

## Data

### Areas

The first task in this project is to determine how the areas of Greater Manchester will be divided.

Manchester is somewhat difficult to divide into clearly defined areas for historical reasons. Areas {EXPAND ON THIS} <sup>[1](#1.)</sup>

The following options were considered:

#### <b>Metropolitan Districts:</b><br>

Greater Manchester comprises ten metropolitan districts:
1. City of Manchester
2. Stockport
3. Tameside
4. Oldham
5. Rochdale
6. Bury
7. Bolton
8. Wigan
9. City of Salford
10. Trafford

<figure>
<img src="images/Greater_Manchester_numbered_districts.svg.png" style="width:500px;height:300px;">
<figcaption>Image 1: Greater Manchester metropolitan districts. Source: Wikipedia.</figcaption>
</figure>
<br>

PLACE SOME OBSERVATIONS HERE<br>
<br>

#### <b>Wards:</b><br>

The City of Manchester contains 32 wards<sup>[2](#2.)</sup>:

<figure>
<img src="Images/Ward-Map-Manchester_District_(B).jpg" style="width:500px;height:500px;">
<figcaption>Image 2: Wards of the City of Manchester. Source: Geopunk.</figcaption>
</figure>
<br>

PLACE SOME OBSERVATIONS HERE<br>
<br>


#### <b>Postcodes</b>:


Postcodes in the United Kingdom consist of an outward code and an inward code. <br>
The outward code consists of a postcode area (a one- or two-letter code) and a postcode district (one or two digits, or a digit followed by a letter). For example, Manchester city centre has the outward code M1.<br>
The inward code consists of a single-digit postcode sector followed by a two-letter postcode unit. A full postcode may cover a single street, or even a single building or organisation.<sup>[4](#4.)</sup>
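The structure described above can be made concrete with a small parser. This is a simplified sketch (the regular expression and `split_postcode` are illustrative, not an official validator): it does not enforce every real-world rule, such as which letters may appear in each position or special cases like `GIR 0AA`.

```python
import re

# Area (1-2 letters), district (a digit plus an optional digit or letter),
# optional space, sector (one digit), unit (two letters)
POSTCODE_RE = re.compile(r'^([A-Z]{1,2})(\d[A-Z\d]?) ?(\d)([A-Z]{2})$')


def split_postcode(postcode: str) -> dict:
    """Split a full UK postcode into its outward and inward parts."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f'Unrecognised postcode: {postcode!r}')
    area, district, sector, unit = match.groups()
    return {
        'outward': area + district,  # e.g. 'M1'
        'inward': sector + unit,     # e.g. '3BE'
        'area': area,
        'district': district,
        'sector': sector,
        'unit': unit,
    }


# Illustrative postcodes only
print(split_postcode('M1 3BE'))
print(split_postcode('m14 5aa')['outward'])
```

Only the outward part (`'M1'`, `'M14'`, ...) is used for the area analysis in this notebook.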


<figure>
<img src="images/map-postcode-area-M-Manchester.jpg" style="width:500px;height:300px;">
<figcaption>Image 3: M-Postcode districts. Source: Geopunk</figcaption>
</figure>

While inward codes seem far too precise for this project's aims, outward codes with the M prefix appear to give good coverage of Greater Manchester while remaining precise enough to give locally defined areas. This is explored below; details of how it is achieved are given in the code comments.

Unfortunately, Geopy was unable to find locations using only the outward code of a postcode. Instead, a CSV of postcode outward-code coordinates, in the format id, postcode, latitude, longitude, was obtained from [free map tools](https://www.freemaptools.com/download-uk-postcode-lat-lng.htm), based on data from the Office for National Statistics (ONS) <sup>[5](#5.)</sup>, and used for geographical coordinates. Initially, Geopunk was scraped for postcode information; however, when it was compared with this CSV, many postcodes were found to be missing from the Geopunk source. The CSV file alone will therefore be used for all postcode district geographical information, as it is considerably easier to work with, which makes any extra work to scrape websites or search for coordinates with Geopy superfluous.

The original code for scraping Geopunk's website and geocoding with Geopy is left below for reference, as the first method attempted for geographical location. It has been changed to a Markdown cell to save computation time and can be viewed in the dropdown below.

<details>
<summary>Old code</summary>

```python

# Scrape geopunk for M-Postcode list

# Open site as a bs4 object
with urlopen("https://geopunk.co.uk/postcode-areas/M") as fp:
    soup = BeautifulSoup(fp)

# Extract the links as a result set.
# Note: the postcodes are listed as links on the website, so all links are extracted first.
res_set = soup.find_all('a')

# Next the text from the links is converted to a list; pattern matching in the
# list comprehension keeps only 'Mx' postcodes.

# Search for postcodes beginning with M then a digit
pattern1 = re.compile(r"M\d")

# Search for a space, to distinguish full postcodes from postcode districts
pattern2 = re.compile(" ")

# Create DataFrame (list name is old syntax)
list_PC = pd.DataFrame([link.text for link in res_set if (pattern1.match(link.text) is not None) and (pattern2.search(link.text) is None)],
                      columns = ['Postcode_District'])

# This approach geocoded with Nominatim (OpenStreetMap); the user_agent string is a placeholder
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='discovering-manchester')

# Obtain geographical information using geopy (method1: structured query)
MCR_geo1 = [geolocator.geocode({'postalcode':PC, 'city':'Manchester','country':'United Kingdom'}) for PC in list_PC.Postcode_District]

# Obtain geographical information using geopy (method2: free-text query)
MCR_geo2 = [geolocator.geocode('{}, Manchester, United Kingdom'.format(PC)) for PC in list_PC.Postcode_District]

```

Neither of these methods gave sufficient results for the geographical locations of the postcodes. Further inspection of Nominatim (OpenStreetMap) showed very little mention of postcodes in the Manchester area, so this method of obtaining geographical coordinates was abandoned.
    
</details>

Geographical coordinates for the M-Postcode districts are extracted from the ONS postcode outcodes CSV file and loaded into a DataFrame.

In [3]:
# Get DataFrame of all UK postcode districts. Note: some keyword arguments are
# passed to select and rename the columns for ease of use.
URL = 'https://www.freemaptools.com/download/outcode-postcodes/postcode-outcodes.csv'
UK_PC = pd.read_csv(URL, usecols=[1, 2, 3], header=0,
                    names=['Postcode_District', 'Latitude', 'Longitude'])

# Extract M-only postcodes ('M' followed by a digit)
MP_ONS = UK_PC.loc[UK_PC.Postcode_District.str.match(r'M\d')]\
            .reset_index(drop=True)

# Drop Non geographic postcode M61 - See link 5
MP_ONS = MP_ONS[MP_ONS.Postcode_District != 'M61']
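The filter above relies on the pattern `M\d`, which keeps Manchester districts while excluding other postcode areas that also start with M, such as MK and ME. The sketch below shows this on a small hypothetical stand-in for `UK_PC` (the coordinates are placeholders, not real values) and applies the non-geographic drops from this and later cells in a single boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for UK_PC; coordinates are placeholders
UK_PC = pd.DataFrame({
    'Postcode_District': ['M1', 'MK9', 'M14', 'ME1', 'M61', 'OL1', 'M99'],
    'Latitude': [0.0] * 7,
    'Longitude': [0.0] * 7,
})

# Districts this notebook treats as non-geographic or non-residential
NON_GEOGRAPHIC = ['M60', 'M61', 'M99']

is_m_district = UK_PC['Postcode_District'].str.match(r'M\d')  # 'M' followed by a digit
is_geographic = ~UK_PC['Postcode_District'].isin(NON_GEOGRAPHIC)

MP_ONS = UK_PC.loc[is_m_district & is_geographic].reset_index(drop=True)
print(MP_ONS['Postcode_District'].tolist())
```

Combining the masks before `reset_index` also leaves a contiguous 0..n-1 index, which the sequential filters in the notebook do not.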

This list of postcodes is compared to Wikipedia's [M postcode area](https://en.wikipedia.org/wiki/M_postcode_area)<sup>[6](#6.)</sup> page to ensure no postcodes are missing.

Extract postcodes from Wikipedia:

In [4]:
with urlopen("https://en.wikipedia.org/wiki/M_postcode_area") as fp:
    soup = BeautifulSoup(fp)

In [5]:
# Convert to a list of strings and copy the relevant table to a pandas DataFrame
table = cleanhtml(soup)[1]

# table = np.array(table).reshape(len(table)//4,4)
MP_w_f = pd.DataFrame(table, columns = table[0])
MP_w_f = MP_w_f.drop(MP_w_f.index[0])

# Drop Non geographic postcode M61 - See link 5
MP_w_f = MP_w_f[MP_w_f['Postcode district'] != 'M61']

# The DataFrame also shows that M60 and M99 are non-residential, so they are
# dropped as well
MP_w_f = MP_w_f[~MP_w_f['Postcode district'].str.match('M60')]
MP_w_f = MP_w_f[MP_w_f['Postcode district'] != 'M99']

# Merge M3 postcode
# Remove postcode sector information (text in brackets) and surrounding spaces
cleaner2 = re.compile(r'\((.*?)\)')
MP_w_f['Postcode district'] = [re.sub(cleaner2, '', i).strip()
                              for i in MP_w_f['Postcode district']]

# Merge rows that share a postcode district, joining their other columns
MP_W = MP_w_f.groupby(by='Postcode district')\
       .aggregate(lambda x: ', '.join(x)).reset_index()
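The bracket-stripping and merge steps can be checked on a toy table. The rows and the `Locality` column below are hypothetical; they only mimic a district (here M3) that is listed twice with different sector notes.

```python
import re

import pandas as pd

# Hypothetical rows: M3 appears twice with sector notes in brackets
MP_w_f = pd.DataFrame({
    'Postcode district': ['M1', 'M3 (sectors 1-4)', 'M3 (sector 5)'],
    'Locality': ['Place A', 'Place B', 'Place C'],
})

# Remove the bracketed sector note and any leftover whitespace
sector_note = re.compile(r'\((.*?)\)')
MP_w_f['Postcode district'] = [re.sub(sector_note, '', s).strip()
                               for s in MP_w_f['Postcode district']]

# Rows sharing a district are merged; other columns are joined with ', '
MP_W = (MP_w_f.groupby('Postcode district')
              .aggregate(lambda x: ', '.join(x))
              .reset_index())
print(MP_W)
```

Stripping whitespace after removing the brackets matters: `'M3 (sector 5)'` would otherwise become `'M3 '`, which does not equal the ONS value `'M3'`.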

As can be seen in the comments above, some cleaning of the Wikipedia table was required to produce a final DataFrame of M-Postcodes (MP_W), including the removal of non-residential postcodes. Checking the ONS CSV DataFrame, we see that the M60 and M99 postcodes also exist there, so they are removed from it as well.

In [6]:
MP_ONS = MP_ONS[MP_ONS.Postcode_District != 'M60']
MP_ONS = MP_ONS[MP_ONS.Postcode_District != 'M99']

Finally, we check that the Wikipedia and ONS CSV lists match: first by ensuring they have the same length, and then by checking that the sorted postcode values match at every index:

In [7]:
len(MP_W) == len(MP_ONS)

True

In [8]:
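# Sort both postcode columns once, then compare them position by position
w_sorted = MP_W["Postcode district"].sort_values().tolist()
ons_sorted = MP_ONS["Postcode_District"].sort_values().tolist()
MP_match = [(i, w_sorted[i] == ons_sorted[i]) for i in range(len(MP_W))]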

Inspecting the values in MP_match shows that all 43 postcodes match.
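A position-by-position check only says whether the two lists agree; if they did not, a set comparison would show which postcodes are missing from each source. A minimal sketch, with a hypothetical helper and hypothetical lists standing in for the two DataFrame columns:

```python
def compare_postcodes(first, second) -> dict:
    """Report postcodes that appear in only one of two collections."""
    first_set, second_set = set(first), set(second)
    return {
        'only_in_first': sorted(first_set - second_set),
        'only_in_second': sorted(second_set - first_set),
        'all_match': first_set == second_set,
    }


# Hypothetical stand-ins for MP_W['Postcode district'] and MP_ONS['Postcode_District']
wiki_codes = ['M1', 'M2', 'M3', 'M4']
ons_codes = ['M1', 'M2', 'M4', 'M5']

result = compare_postcodes(wiki_codes, ons_codes)
print(result)
```

Sets ignore duplicates, so the length check above is still worth keeping alongside this.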

Mapping Manchester, using M1 as the centre:

In [9]:
# Centre the map on the first ONS postcode district (M1)
MPC_map = folium.Map(location=[MP_ONS['Latitude'].iloc[0], MP_ONS['Longitude'].iloc[0]],
                     zoom_start=11)

for lat, lng, district in zip(MP_ONS['Latitude'], MP_ONS['Longitude'], MP_ONS['Postcode_District']):
    popup = folium.Popup(district, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(MPC_map)

MPC_map

#### <b>Nearby 'districts'</b>

The final option for determining the areas used in the analysis is searching for areas with the HERE API. All areas of type 'district' within 10 km of the centre of Manchester were found (88 districts). Note that HERE's definition of a district may be loose.

In [10]:
# Find the geographical coordinates of Manchester using the HERE geocoder from
# geopy (imported at the beginning of the Notebook)

# Note: the API key generated expires after one hour. It is read from an
# environment variable (HERE_API_KEY is an assumed name) rather than being
# hard-coded, so the key is not published with the notebook.
import os

api_key = os.environ['HERE_API_KEY']
geolocator = Here(apikey=api_key)

m_loc = geolocator.geocode('Manchester')
m_lat = m_loc.latitude
m_lng = m_loc.longitude

In [313]:
# There is no available function for HERE browse entrypoint so a set of search 
# parameters are passed to the entrypoint to find nearby districts

URL = 'https://browse.search.hereapi.com/v1/browse'
PARAMS = {'at':str(m_lat)+','+str(m_lng),'apikey':api_key,
          'in':'circle:53.4794892,-2.2451148;r=10000','types':'district',
          'limit':'100','additionaldata':'IncludeShapeLevel,district'} 

r = requests.get(url = URL, params = PARAMS) 
data = r.json()
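In the cell above, the `at` value is built from `m_lat`/`m_lng` but the `in` circle repeats the same centre as hard-coded numbers. A hypothetical helper can derive both from one pair of coordinates. The parameter keys are copied from the notebook's `PARAMS` and have not been re-checked against HERE's documentation here; in particular, whether the browse endpoint honours `types` and `additionaldata` is an assumption.

```python
from urllib.parse import urlencode


def build_browse_params(lat: float, lng: float, radius_m: int, api_key: str) -> dict:
    """Hypothetical helper: same keys as the notebook's PARAMS, with the
    search circle derived from the geocoded centre instead of hard-coded."""
    centre = f'{lat},{lng}'
    return {
        'at': centre,
        'apikey': api_key,
        'in': f'circle:{centre};r={radius_m}',
        'types': 'district',
        'limit': '100',
        'additionaldata': 'IncludeShapeLevel,district',
    }


params = build_browse_params(53.4794892, -2.2451148, 10000, 'YOUR_API_KEY')
print(params['in'])
# requests.get(url=URL, params=params) performs this encoding itself;
# urlencode only shows the resulting query string
print(urlencode(params))
```

In the notebook this would be called as `build_browse_params(m_lat, m_lng, 10000, api_key)`.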

In [314]:
data

{'items': [{'title': 'Spinningfields, Salford, England, United Kingdom',
   'id': 'here:cm:namedplace:23796492',
   'resultType': 'locality',
   'localityType': 'district',
   'address': {'label': 'Spinningfields, Salford, England, United Kingdom',
    'countryCode': 'GBR',
    'countryName': 'United Kingdom',
    'state': 'England',
    'countyCode': 'LAN',
    'county': 'Lancashire',
    'city': 'Salford',
    'district': 'Spinningfields',
    'postalCode': 'M3 3'},
   'position': {'lat': 53.48068, 'lng': -2.25242},
   'distance': 272,
   'mapView': {'west': -2.25785,
    'south': 53.47833,
    'east': -2.24796,
    'north': 53.48352}},
  {'title': 'Shopping District, Salford, England, United Kingdom',
   'id': 'here:cm:namedplace:23796491',
   'resultType': 'locality',
   'localityType': 'district',
   'address': {'label': 'Shopping District, Salford, England, United Kingdom',
    'countryCode': 'GBR',
    'countryName': 'United Kingdom',
    'state': 'England',
    'countyCode': 'L

The data object (a dict) contains a lot of useful information. For now, we are only interested in the name of each district, its geographical location and its distance from the centre of Manchester, so we extract these.

In [12]:
M_dist = pd.DataFrame([[item['title'].split(',')[0],item['position']['lat'],
                        item['position']['lng'],item['distance']] 
                        for item in data['items']],
                        columns = ['District','Latitude','Longitude','Distance'])
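The list comprehension above raises a `KeyError` if any item lacks `position` or `distance`. A more defensive variant, sketched below with a hypothetical helper `items_to_frame`, uses `dict.get` so that a missing field becomes a missing value instead. The sample item is trimmed from the response printed earlier.

```python
import pandas as pd

# One item trimmed from the browse response shown above
data = {'items': [
    {'title': 'Spinningfields, Salford, England, United Kingdom',
     'position': {'lat': 53.48068, 'lng': -2.25242},
     'distance': 272},
]}


def items_to_frame(items: list) -> pd.DataFrame:
    """Build the District/Latitude/Longitude/Distance table, tolerating missing keys."""
    rows = []
    for item in items:
        position = item.get('position', {})
        rows.append({
            'District': item.get('title', '').split(',')[0],  # first part of the title
            'Latitude': position.get('lat'),
            'Longitude': position.get('lng'),
            'Distance': item.get('distance'),
        })
    return pd.DataFrame(rows, columns=['District', 'Latitude', 'Longitude', 'Distance'])


M_dist = items_to_frame(data['items'])
print(M_dist)
```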

Finally, we save this DataFrame to CSV to avoid unnecessary re-use of the API.

In [13]:
# Remove filepath to push

M_dist.to_csv(r'~\Documents\Data_Science\Python\IBM\BotN\Project Files\Manchester\M_dist.csv')

M_dist can then be loaded directly from the CSV file with the following code:

```python
M_dist = pd.read_csv(r'filepath\M_dist.csv', index_col=0)
```
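The save-then-reload pattern can be wrapped so the API is only called when no cached CSV exists yet. This is a minimal sketch; `load_or_fetch` and `fake_fetch` are hypothetical names, and in the notebook `fetch` would be whatever performs the HERE request and builds `M_dist`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_fetch(csv_path, fetch):
    """Read a cached CSV if it exists; otherwise call fetch() and cache its result."""
    path = Path(csv_path).expanduser()
    if path.exists():
        return pd.read_csv(path, index_col=0)
    df = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path)
    return df


calls = []


def fake_fetch():
    # Stand-in for the API call; records how often it runs
    calls.append(1)
    return pd.DataFrame({'District': ['Hulme'], 'Distance': [1376]})


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / 'M_dist.csv'
    first = load_or_fetch(cache, fake_fetch)   # calls fake_fetch and writes the CSV
    second = load_or_fetch(cache, fake_fetch)  # reads the CSV; no second call

print(len(calls))
```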

In [510]:
# Remove cell to push

M_dist = pd.read_csv(
    r'~\Documents\Data_Science\Python\IBM\BotN\Project Files\Manchester\M_dist.csv', index_col = 0)

In [508]:
M_dist.iloc[11]

District        Hulme
Latitude     53.46724
Longitude    -2.24973
Distance         1376
Name: 11, dtype: object

Getting boundary information

In [511]:
len(M_dist)

85

In [512]:
URL = 'https://reverse.geocoder.ls.hereapi.com/6.2/reversegeocode.json'

wkt_df = pd.DataFrame([], columns=['lonlat'])

for i in range(len(M_dist)):
    # Search within 2000 m of each district's position and request area shapes
    prox = '{},{},{}'.format(M_dist['Latitude'].iloc[i], M_dist['Longitude'].iloc[i], 2000)
    PARAMS = {'apikey': api_key, 'prox': prox,
              'mode': 'retrieveAreas', 'additionaldata': 'IncludeShapeLevel,district'}

    r = requests.get(url=URL, params=PARAMS)
    data = r.json()

    try:
        # Boundary polygon as a WKT string
        shape_wkt = data['Response']['View'][0]['Result'][0]['Location']['Shape']['Value']
        wkt_df.loc[i] = shape_wkt
    except (KeyError, IndexError):
        # No shape returned for this district, so it is skipped
        pass
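Pulling the nested lookup into a small function makes it testable without calling the API. The helper name is hypothetical and the key path is the one used in the loop above; the sample responses are minimal hand-built dicts, not real HERE output.

```python
def extract_shape_wkt(response: dict):
    """Return the WKT boundary from a reverse-geocode response, or None if absent."""
    try:
        return response['Response']['View'][0]['Result'][0]['Location']['Shape']['Value']
    except (KeyError, IndexError, TypeError):
        return None


# Hypothetical minimal responses shaped like the key path used above
found = {'Response': {'View': [{'Result': [
    {'Location': {'Shape': {'Value': 'POLYGON ((0 0, 1 0, 1 1, 0 0))'}}}]}]}}
empty = {'Response': {'View': []}}

print(extract_shape_wkt(found))
print(extract_shape_wkt(empty))
```

Inside the loop, `shape_wkt = extract_shape_wkt(r.json())` followed by `if shape_wkt is not None: wkt_df.loc[i] = shape_wkt` would replace the `try`/`except`.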

HERE's API returns polygon boundaries as WKT strings; here we convert them to GeoJSON.

In [503]:
M_dist.loc[1, 'District']

'Shopping District'

In [513]:
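import json

# Parse each WKT string with shapely, convert it to a GeoJSON mapping, and
# round-trip through JSON so each result is a plain dict with lists
geojson_string = [geojson.dumps(mapping(loads(i))) for i in wkt_df.lonlat]

geojson_dict = [json.loads(i) for i in geojson_string]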
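To show what the conversion produces, the toy converter below handles only the simplest case, a single-ring `POLYGON`. It is an illustration, not a replacement: `polygon_wkt_to_geojson` is a hypothetical helper, and real boundaries (holes, `MULTIPOLYGON`) should go through `shapely.wkt.loads` and `mapping` as above. The coordinates in the example are made up.

```python
import json
import re


def polygon_wkt_to_geojson(wkt: str) -> dict:
    """Toy converter for 'POLYGON ((x y, x y, ...))' with a single ring."""
    match = re.fullmatch(r'\s*POLYGON\s*\(\(\s*(.+?)\s*\)\)\s*', wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError('Only single-ring POLYGON WKT is supported by this sketch')
    ring = []
    for pair in match.group(1).split(','):
        lon, lat = (float(v) for v in pair.split())
        ring.append([lon, lat])  # WKT and GeoJSON both store (x, y) = (lon, lat)
    return {'type': 'Polygon', 'coordinates': [ring]}


# Made-up square near central Manchester
shape = polygon_wkt_to_geojson(
    'POLYGON ((-2.25 53.48, -2.24 53.48, -2.24 53.47, -2.25 53.48))')
print(json.dumps(shape))
```

Note the axis order: WKT and GeoJSON store longitude first, whereas `folium.Map(location=...)` expects `(lat, lon)`.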

In [515]:
map_test = folium.Map(location=(53.48096, -2.24881), zoom_start=16)
for shape in geojson_dict:
    folium.GeoJson(shape).add_to(map_test)
map_test

In [23]:
test_latlon = (M_dist.iloc[0,1],M_dist.iloc[0,2])

geolocator.reverse(test_latlon)

Location(Gartside Street, Manchester, M60 9, United Kingdom, Manchester, England M60 9, GBR, (53.4806815, -2.2524236, 0.0))

Mapping HERE API districts

### Desirability

The next task will be to decide which features will be used to assess the desirability of the given areas. Some examples may be:
* Public Transport
* Pubs and Restaurants
* Green Spaces
* Average age of community (i.e. do similar people live in the area)
* Landmarks and points of interest (e.g. Old Trafford, City of Manchester Stadium)
* Museums and Libraries
* Gyms and Leisure
* Access to Healthcare

## References


ACTUAL LIFESAVER
https://gist.github.com/drmalex07/5a54fc4f1db06a66679e


##### 1. 
https://ilovemanchester.com/manchester-and-salford-so-whats-the-difference
##### 2. 
https://geopunk.co.uk/council/Manchester-District-(B)
##### 3. 
https://www.geopunk.co.uk/postcode-areas/M (no longer in use)
##### 4. 
https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Outward_code
##### 5. 
postcode-outcodes.csv, Office for National Statistics licensed under the Open Government Licence v.3.0, https://www.freemaptools.com/download-uk-postcode-lat-lng.htm
##### 6. 
https://en.wikipedia.org/wiki/M_postcode_area

developer.here.com




https://en.wikipedia.org/wiki/Greater_Manchester <br>

https://en.wikipedia.org/wiki/Transport_in_Manchester <br>

https://www.cityoftrees.org.uk/explore <br>

[Top](#Discovering-Manchester)