# Battle of Neighborhoods
## Capstone project assignment - Samuel Mensah

In [3]:
# import necessary packages
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from IPython.display import HTML
import scipy.spatial as spatial
import json
from geopy.geocoders import Nominatim     # convert an address into latitude and longitude values
from pandas import json_normalize         # transform JSON into a pandas dataframe (pandas >= 1.0)
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium # map rendering library
print('Libraries imported.')

Libraries imported.


### Business Problem

KPMG Ghana is an independent member firm of the KPMG global network. Each year, KPMG Ghana allows colleagues to transfer to partner firms in either Leeds or Manchester in the UK. Because they will be staying in one of these cities for a long time, colleagues need information about the neighborhoods of Leeds and Manchester to make an informed decision.

When moving to a distant country, it is important to have relevant information about the city you will live in. Colleagues at KPMG Ghana who have never been to these cities, and who are used to their own environment, need to know beforehand whether the city has neighborhoods that provide the services or activities they need. Otherwise, a hesitant colleague may decide not to go for fear of the unknown. For example, they might be interested in a specific type of cuisine, such as African cuisine. Do any of the neighborhoods in Leeds or Manchester have African restaurants?

Can we cluster the neighborhoods of Manchester and Leeds according to their venues to determine which city best suits a KPMG Ghana colleague with specific preferences?

### Data Description

This problem requires location data for Manchester and Leeds, along with the venues within their neighborhoods, so that colleagues at KPMG Ghana can be informed about the services, recreational centres, malls, shops, gyms, parks and other venues that might interest them.

1. First of all, we need location data for Leeds and Manchester. This data is available online at the links below:
    - Leeds Data: https://en.wikipedia.org/wiki/List_of_places_in_Leeds
    - Manchester: https://en.wikipedia.org/wiki/M_postcode_area

#### Let's view the data for Leeds from the link

In [6]:
leeds_html_url = 'https://en.wikipedia.org/wiki/List_of_places_in_Leeds'
leeds_source   = requests.get(leeds_html_url).text
leeds_soup     = BeautifulSoup(leeds_source, 'html.parser')

# find table using soup object
leeds_table = leeds_soup.find("table", class_="wikitable sortable") 
# get all rows from table
leeds_all_rows    = leeds_table.find_all('tr')
# loop over rows and get placenames for Leeds
for count, row in enumerate(leeds_all_rows):
    if count == 0: 
        column_names  = ['','Neighborhood','Leeds City Council Ward','Parliamentary Constituency',\
                         'Management Area','PostCode','PostTown','Pre-1974 authority','Note','']
        leeds_hoods = pd.DataFrame(columns=column_names) # instantiate dataframe
    else:
        leeds_hoods.loc[count] = row.text.split('\n')
leeds_hoods.head()

Unnamed: 0,Unnamed: 1,Neighborhood,Leeds City Council Ward,Parliamentary Constituency,Management Area,PostCode,PostTown,Pre-1974 authority,Note,Unnamed: 10
1,,Aberford,Harewood,Elmet & Roth.,NE Outer,LS25,LEEDS,Tadcaster RD,,
2,,Adel,Adel & Wharfedale,Leeds NW,NW Outer,LS16,LEEDS,Leeds CB,,
3,,Adwalton,Morley North,Morley & Out.,S Outer,BD11,BRADFORD,Morley MB,,
4,,Ainsty,Wetherby,Elmet & Roth.,NE Outer,LS22,WETHERBY,Wetherby RD,,
5,,Aireborough,,,,,LEEDS,Aireborough UD,,
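
The scraping cell above splits `row.text` on newlines, which is why empty, unnamed columns appear in the output and why the column list has to be padded with `''`. As a hedged alternative, the sketch below reads each row cell by cell, so that every `<td>` or `<th>` becomes exactly one value. It uses only the standard library's `html.parser` and runs on a small inline sample. The sample markup and its three column names are invented for illustration and are much simpler than the real Wikipedia table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read (None outside <tr>)
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (for example links) still belongs to the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample markup (assumption): only loosely shaped like the real table.
sample_html = """
<table class="wikitable sortable">
<tr><th>Place</th><th>Ward</th><th>Postcode</th></tr>
<tr><td><a href="#">Aberford</a></td><td>Harewood</td><td>LS25</td></tr>
<tr><td>Aireborough</td><td></td><td></td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
header, *body = parser.rows
records = [dict(zip(header, cells)) for cells in body]
print(records)
```

Empty cells come through as empty strings rather than shifting the remaining values, so a list of records like this could be passed to `pd.DataFrame(records)` without padding the column names.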


#### Let's view the data for Manchester from the link

In [8]:
man_html_url = 'https://en.wikipedia.org/wiki/M_postcode_area'
man_source   = requests.get(man_html_url).text
man_soup     = BeautifulSoup(man_source, 'html.parser')

# find table using soup object
man_table     = man_soup.find("table", class_="wikitable sortable") 
# get all rows from table
man_all_rows  = man_table.find_all('tr')
# loop over rows and get placenames for Manchester
for count, row in enumerate(man_all_rows):
    if count == 0: 
        column_names  = ['','PostCode', '', 'PostTown', '', 'Neighborhood', '', 'Area', '']
        man_hoods = pd.DataFrame(columns=column_names) # instantiate dataframe
    else:
        man_hoods.loc[count] = row.text.split('\n')
man_hoods.head()

Unnamed: 0,Unnamed: 1,PostCode,Unnamed: 3,PostTown,Unnamed: 5,Neighborhood,Unnamed: 7,Area,Unnamed: 9
1,,M1,,MANCHESTER,,"Piccadilly, City Centre, Market Street",,Manchester,
2,,M2,,MANCHESTER,,"Deansgate, City Centre",,Manchester,
3,,"M3(Sectors 1, 2, 3, 4 and 9)",,MANCHESTER,,"City Centre, Deansgate, Castlefield",,Manchester,
4,,"M3(Sectors 5, 6 and 7)",,SALFORD,,"City Centre, Blackfriars, Greengate, Trinity",,Salford,
5,,M4,,MANCHESTER,,"Ancoats, Northern Quarter, Strangeways",,Manchester,


#### First View of the data

As a data scientist, it is crucial to know whether the data can solve the problem.

Since we are interested in the venues of neighborhoods, we note that the Leeds and Manchester data contain the neighborhood name, post town, postcode and other attributes. At first glance, what we need are:
- neighborhood name
- venues
- postcode
- post town
- geographical coordinates for Manchester and Leeds
    
However, venues and geographical coordinates are missing from the data.
- The Python library **geopy** can convert an address into latitude and longitude. We use it to obtain coordinates.
- **Foursquare** provides venue data for places and neighborhoods through its developer portal. We use it to get the venues of neighborhoods.
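
To turn a whole list of neighborhood names into coordinates, the geocoding call can be wrapped in a small helper. The sketch below is one possible shape, not the notebook's actual method: `geocode_neighborhoods` is a hypothetical name, and it accepts any callable that behaves like geopy's `Nominatim.geocode` (address string in, object with `.latitude` and `.longitude` out, or `None` when nothing is found). A stand-in geocoder with placeholder coordinates is used here so the example runs without network access.

```python
from collections import namedtuple


def geocode_neighborhoods(names, city, geocode):
    """Return {name: (lat, lon) or None} using the supplied geocode callable."""
    coords = {}
    for name in names:
        if name in coords:  # repeated names are only looked up once
            continue
        location = geocode('{}, {}, UK'.format(name, city))
        coords[name] = (location.latitude, location.longitude) if location else None
    return coords


# Stand-in for geolocator.geocode (assumption). The coordinates are
# placeholders for illustration, not real positions.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
_fake_index = {'Adel, Leeds, UK': FakeLocation(1.0, 2.0)}


def fake_geocode(address):
    return _fake_index.get(address)


result = geocode_neighborhoods(['Adel', 'Adel', 'Nowhere'], 'Leeds', fake_geocode)
print(result)
```

With the real service, `geolocator.geocode` would be passed in place of `fake_geocode`. The public Nominatim service asks clients to stay at or below roughly one request per second, and geopy includes a `RateLimiter` helper in `geopy.extra.rate_limiter` that can wrap the geocode call for that purpose.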

Moreover, the Manchester and Leeds data will need some preprocessing, as there are many duplicates even in the first five rows of the Neighborhood column (for example, "City Centre" appears under several Manchester postcodes).
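
A minimal sketch of what that preprocessing could look like for the Manchester rows shown earlier: strip the parenthesised sector note from the postcode, split the comma-separated neighborhood string, and keep each (postcode, neighborhood) pair only once. The helper names are hypothetical, and deduplicating on the pair rather than on the neighborhood name alone is an assumption that may need revisiting.

```python
import re

# Three rows copied from the Manchester output above: (PostCode, Neighborhood).
rows = [
    ('M1', 'Piccadilly, City Centre, Market Street'),
    ('M3(Sectors 1, 2, 3, 4 and 9)', 'City Centre, Deansgate, Castlefield'),
    ('M3(Sectors 5, 6 and 7)', 'City Centre, Blackfriars, Greengate, Trinity'),
]


def clean_postcode(raw):
    """Drop a trailing parenthesised note such as '(Sectors 5, 6 and 7)'."""
    return re.sub(r'\s*\(.*\)\s*$', '', raw).strip()


def split_neighborhoods(raw):
    """Split 'A, B, C' into ['A', 'B', 'C'], ignoring empty pieces."""
    return [part.strip() for part in raw.split(',') if part.strip()]


pairs = []    # unique (postcode, neighborhood) pairs in first-seen order
seen = set()
for postcode, hoods in rows:
    code = clean_postcode(postcode)
    for hood in split_neighborhoods(hoods):
        key = (code, hood)
        if key not in seen:
            seen.add(key)
            pairs.append(key)

for pair in pairs:
    print(pair)
```

After cleaning, both M3 rows share the postcode `M3`, so their repeated "City Centre" entry collapses into a single pair, while "City Centre" under `M1` is kept as a separate pair.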

#### Let's view venues in the centre of Leeds using Foursquare

In [10]:
CLIENT_ID = ''      # your Foursquare client ID
CLIENT_SECRET = ''  # your Foursquare client secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET: 

##### Get the coordinates of Leeds

In [12]:
address = 'Leeds, UK'

geolocator = Nominatim(user_agent="foursquare_agent")
location   = geolocator.geocode(address)
latitude   = location.latitude
longitude  = location.longitude
print(latitude, longitude)

53.7974185 -1.5437941


Create the URL for the Foursquare request

In [13]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=&client_secret=&v=20180604&ll=53.7974185,-1.5437941&radius=500&limit=100'
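
The same URL is built again later for Manchester, so the construction could be wrapped in a function. The sketch below is one way to do that with the standard library's `urllib.parse.urlencode`, which takes care of joining and escaping the parameters. `build_explore_url` is a hypothetical helper name, and the endpoint and parameter names are simply copied from the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the venues/explore URL for a point and a search radius in metres."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of escaping it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# Leeds coordinates from the geocoding cell above; credentials left blank.
url = build_explore_url('', '', '20180604', 53.7974185, -1.5437941)
print(url)
```

An alternative with the same effect would be to pass the dictionary directly to `requests.get(EXPLORE_ENDPOINT, params=params)` and let `requests` assemble the query string.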

In [14]:
results = requests.get(url).json()

In [15]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Preprocess and list up to 100 venues around Leeds

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Victoria Quarter,Shopping Mall,53.79817,-1.540943
1,Trinity Leeds,Shopping Mall,53.796525,-1.543937
2,Headrow House,Bar,53.798837,-1.541118
3,Whitelocks Ale House,Pub,53.79727,-1.542922
4,Patisserie Valerie,Café,53.797691,-1.545136
5,The LEGO Store,Toy / Game Store,53.796535,-1.544166
6,Apple Trinity Leeds,Electronics Store,53.796711,-1.543848
7,Mrs Athas,Coffee Shop,53.796542,-1.541268
8,Laynes Espresso,Coffee Shop,53.795323,-1.544939
9,Waterstones,Bookstore,53.798872,-1.545217
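
To make the shape of the response easier to follow, the sketch below performs the same flattening in plain Python, without `json_normalize`. It assumes only the nesting that the cells above already rely on (`response` → `groups` → `items` → `venue`). `flatten_items` is a hypothetical helper, the sample payload is hand-written, and its second venue is invented to show the case where the category list is empty.

```python
def flatten_items(results):
    """Return one dict per venue with name, first category name, lat and lng."""
    items = results['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append({
            'name': venue['name'],
            # keep only the first category name, as get_category_type does
            'categories': categories[0]['name'] if categories else None,
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows


# Hand-written sample in the assumed response shape. The first venue reuses
# values from the table above; the second is invented for illustration.
sample_results = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Victoria Quarter',
                   'categories': [{'name': 'Shopping Mall'}],
                   'location': {'lat': 53.79817, 'lng': -1.540943}}},
        {'venue': {'name': 'Uncategorised example venue',
                   'categories': [],
                   'location': {'lat': 0.0, 'lng': 0.0}}},
    ]}]}
}

flat = flatten_items(sample_results)
for row in flat:
    print(row)
```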


#### Let's view venues in the centre of Manchester using Foursquare

In [19]:
address = 'Manchester, UK'

location   = geolocator.geocode(address)
manlatitude   = location.latitude
manlongitude  = location.longitude
print(manlatitude, manlongitude)

53.4794892 -2.2451148


In [20]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    manlatitude, 
    manlongitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=&client_secret=&v=20180604&ll=53.4794892,-2.2451148&radius=500&limit=100'

In [21]:
man_results = requests.get(url).json()

In [22]:
venues = man_results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Albert Square,Plaza,53.479465,-2.245202
1,Piccolino,Italian Restaurant,53.479971,-2.244511
2,El Gato Negro Tapas,Tapas Restaurant,53.481092,-2.245695
3,My Thai,Thai Restaurant,53.480385,-2.245574
4,BrewDog Manchester,Beer Bar,53.478083,-2.247319
5,King Street Townhouse,Hotel,53.480226,-2.243271
6,Manchester Art Gallery,Art Gallery,53.478882,-2.241817
7,Rapha Cycle Club,Bike Shop,53.481246,-2.245734
8,Albert's Schloss,Bar,53.478178,-2.247747
9,The Octagon at The Midland,Lounge,53.477624,-2.244868


We find interesting venues in both Manchester city centre and Leeds city centre.
However, we are also interested in the venues of the surrounding neighborhoods. In the coming days
we will analyse the data further, visualising it on maps and applying clustering
and other techniques to extract knowledge from the data.
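
As a small first step towards that comparison, the sketch below counts the venue categories in the two ten-row samples shown above and converts them to relative frequencies. This is only an illustration on twenty venues, not the clustering itself. `category_frequencies` is a hypothetical helper, and using per-category frequencies as the features for a later clustering step is an assumption.

```python
from collections import Counter

# Categories copied from the two head(10) outputs above.
leeds_categories = [
    'Shopping Mall', 'Shopping Mall', 'Bar', 'Pub', 'Café',
    'Toy / Game Store', 'Electronics Store', 'Coffee Shop',
    'Coffee Shop', 'Bookstore',
]
manchester_categories = [
    'Plaza', 'Italian Restaurant', 'Tapas Restaurant', 'Thai Restaurant',
    'Beer Bar', 'Hotel', 'Art Gallery', 'Bike Shop', 'Bar', 'Lounge',
]


def category_frequencies(categories):
    """Map each category to its share of the venues in the list."""
    counts = Counter(categories)
    total = len(categories)
    return {name: count / total for name, count in counts.items()}


leeds_freq = category_frequencies(leeds_categories)
man_freq = category_frequencies(manchester_categories)

# Categories that occur in both samples
shared = sorted(set(leeds_freq) & set(man_freq))

print(leeds_freq)
print(man_freq)
print(shared)
```

Computed per neighborhood rather than per city centre, frequency vectors of this kind are one common input for grouping areas with similar venue profiles.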