# Capstone Project - The Battle of Neighborhoods

## Introduction

Nowadays, migration becomes much more easier and popular for each person worldwide. Some people choose to  move to another country to seek new career opportunities, however, some people might be forced to migrate due to war. The reasons can be categorised into
- Security
- Environment
- Stability
- Economics
- Services

According to UK parliament Migration Statistics, there are around 715,000 people migrated into the UK and the UK’s migrant population is concentrated in London. So is the best place to live in London? In which neighborhood is the best place to stay?

## Background

In this project we will discuss Sercurity(crime rate), Economics(employment and salary) and Services(infrastrucetures) comparing in the naighborhoods in London. Provide a suggestion for people who want to move to London after lockdown!
<br>This project is going to answer these questions:
- Is there safe neighborhoods?
- Is employment rate and annual average salary stable?
- Is there enough various infrastructures to live with?

## Datasets

The datasets will include the following data:
1. <b> London boroughs </b>
    <br> Link: https://en.wikipedia.org/wiki/List_of_London_boroughs
    - the data will be scraped from web url.
    - Selected columns: borough, population, coordinates
<br><br>
2. <b> London Crime Summary </b>
    <br> Link: https://data.london.gov.uk/dataset/recorded_crime_summary
    - the Year we will use is 2020.
    - Selected columns: MajorText, LookUp_BoroughName, 202001~202012
<br><br>
3. <b> London boroughs Income </b>
    <br> Link: https://www.mylondon.news/news/zone-1-news/londons-richest-boroughs-average-income-18114728
    - the data will be scraped from web url.
    - Selected columns: Borough, Employees, Self-employed
<br><br>
4. <b> Foursquare location data </b>

In [57]:
!pip install bs4
import pandas as pd
import requests
from bs4 import BeautifulSoup

!pip install geopy
from geopy.geocoders import Nominatim 



## Get London Boroughs Data

In [3]:
url_Boroughs = 'https://en.wikipedia.org/wiki/List_of_London_boroughs'
html_Boroughs = requests.get(url_Boroughs).text
soup_Boroughs = BeautifulSoup(html_Boroughs, 'html.parser')
table_contents_Boroughs=[]
table_Boroughs=soup_Boroughs.find('table')

In [4]:
Borough_data = pd.DataFrame(columns=["Borough", "Population", "Coordinates"])

#Get all rows from the table
for row in table_Boroughs.find_all('tr')[2:]: # in html table row is represented by the tag <tr>
    # Get all columns in each row.
    cols = row.find_all('td') # in html a column is represented by the tag <td>
    Borough = cols[0].text.replace("[note 2]", "").replace("[note 4]", "").strip()
    Population = cols[7].text .strip()
    Coordinates = cols[8].text.strip()
    Borough_data = Borough_data.append({"Borough":Borough, "Population":Population, "Coordinates":Coordinates}, ignore_index=True)
    
    
Borough_data

Unnamed: 0,Borough,Population,Coordinates
0,Barnet,395896,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...
1,Bexley,248287,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...
2,Brent,329771,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...
3,Bromley,332336,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...
4,Camden,270029,51°31′44″N 0°07′32″W﻿ / ﻿51.5290°N 0.1255°W﻿ /...
5,Croydon,386710,51°22′17″N 0°05′52″W﻿ / ﻿51.3714°N 0.0977°W﻿ /...
6,Ealing,341806,51°30′47″N 0°18′32″W﻿ / ﻿51.5130°N 0.3089°W﻿ /...
7,Enfield,333794,51°39′14″N 0°04′48″W﻿ / ﻿51.6538°N 0.0799°W﻿ /...
8,Greenwich,287942,51°29′21″N 0°03′53″E﻿ / ﻿51.4892°N 0.0648°E﻿ /...
9,Hackney,281120,51°32′42″N 0°03′19″W﻿ / ﻿51.5450°N 0.0553°W﻿ /...


We need to transform the coordinates into Latitude and Longitude.

In [5]:
geolocator = Nominatim(user_agent="London_explorer")
Borough_data['Coordinate']= Borough_data['Borough'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
Borough_data[['Latitude', 'Longitude']] = Borough_data['Coordinate'].apply(pd.Series)
Borough_data = Borough_data.drop(['Coordinate','Coordinates'], axis=1)
Borough_data.head()

Unnamed: 0,Borough,Population,Latitude,Longitude
0,Barnet,395896,51.65309,-0.200226
1,Bexley,248287,39.969238,-82.936864
2,Brent,329771,32.937346,-87.164718
3,Bromley,332336,51.402805,0.014814
4,Camden,270029,39.94484,-75.119891


In [6]:
Borough_data.shape

(31, 4)

## Get London Crime Summary Data

In [8]:
crime_data = pd.read_csv('MPS Borough Level Crime (most recent 24 months).csv')
crime_data.rename(columns = {'LookUp_BoroughName':'Borough', 'MajorText':'Crime'}, inplace = True)

We will only need the data of the total crimes in 2020, so drop the columns we don't need and get a summary.

In [9]:
crime_data.drop(crime_data.loc[:, '201906':'201912'], inplace = True, axis = 1)
crime_data.drop(crime_data.loc[:, '202101':'202105'], inplace = True, axis = 1)
crime_data.drop(crime_data.columns[[1]], axis = 1, inplace = True)
crime_data['2020_Total_Crime'] = crime_data.sum(axis=1)
crime_data.drop(crime_data.loc[:, '202001':'202012'], inplace = True, axis = 1)
crime_data = pd.pivot_table(crime_data, values = '2020_Total_Crime', index='Borough', columns = 'Crime').reset_index()
crime_data['2020_Total_Crime'] = crime_data.sum(axis=1)
crime_data.head()

Crime,Borough,Arson and Criminal Damage,Burglary,Drug Offences,Miscellaneous Crimes Against Society,Possession of Weapons,Public Order Offences,Robbery,Sexual Offences,Theft,Vehicle Offences,Violence Against the Person,2020_Total_Crime
0,Barking and Dagenham,661.0,609.5,831.5,19.388889,40.0,300.5,368.5,295.0,789.5,585.5,2147.666667,6648.055556
1,Barnet,960.0,1407.5,599.5,23.133333,34.2,447.75,460.5,273.0,1270.25,1287.5,2484.0,9247.333333
2,Bexley,715.0,514.0,422.0,15.0625,24.2,321.75,154.5,186.0,563.25,598.25,1768.666667,5282.679167
3,Brent,1031.5,1064.0,1124.0,22.722222,51.2,525.25,491.5,298.0,1177.75,1092.0,2967.0,9844.922222
4,Bromley,864.0,873.5,635.5,25.214286,36.6,411.75,248.0,250.5,999.0,919.5,2138.666667,7402.230952


## Get London Boroughs Income Data

In [10]:
url_Income = 'https://www.mylondon.news/news/zone-1-news/londons-richest-boroughs-average-income-18114728'
html_Income = requests.get(url_Income).text
soup_Income = BeautifulSoup(html_Income, 'html.parser')
table_Income = soup_Income.find('table')

In [11]:
Income_data = pd.DataFrame(columns=["Borough", "Employee_Income", "Self_employed_Income"])

#Get all rows from the table
for row in table_Income.find_all('tr')[2:]: # in html table row is represented by the tag <tr>
    # Get all columns in each row.
    cols = row.find_all('td') # in html a column is represented by the tag <td>
    Borough = cols[1].text .strip()
    Employees = cols[2].text.replace("£", "").replace(",", "").strip()
    Self = cols[3].text.replace("£", "").replace(",", "").strip()
    Income_data = Income_data.append({"Borough":Borough, "Employee_Income":Employees, "Self_employed_Income":Self}, ignore_index=True)
    #print("{},{},{}".format(Borough,Population,Coordinates))
    
Income_data.head()

Unnamed: 0,Borough,Employee_Income,Self_employed_Income
0,Barking and Dagenham,23500,18100
1,Newham,24000,16500
2,Harrow,24500,17300
3,Hounslow,24800,16500
4,Enfield,25100,15700


## Merge Data

In [12]:
date_merged = Borough_data.merge(Income_data.merge(crime_data, on='Borough'), on='Borough')
date_merged.head()

Unnamed: 0,Borough,Population,Latitude,Longitude,Employee_Income,Self_employed_Income,Arson and Criminal Damage,Burglary,Drug Offences,Miscellaneous Crimes Against Society,Possession of Weapons,Public Order Offences,Robbery,Sexual Offences,Theft,Vehicle Offences,Violence Against the Person,2020_Total_Crime
0,Barnet,395896,51.65309,-0.200226,25600,15700,960.0,1407.5,599.5,23.133333,34.2,447.75,460.5,273.0,1270.25,1287.5,2484.0,9247.333333
1,Bexley,248287,39.969238,-82.936864,26800,16200,715.0,514.0,422.0,15.0625,24.2,321.75,154.5,186.0,563.25,598.25,1768.666667,5282.679167
2,Bromley,332336,51.402805,0.014814,30300,16800,864.0,873.5,635.5,25.214286,36.6,411.75,248.0,250.5,999.0,919.5,2138.666667,7402.230952
3,Camden,270029,39.94484,-75.119891,32600,17300,742.5,1137.5,1005.0,15.588235,33.4,489.5,604.0,284.5,2564.75,758.0,2131.0,9765.738235
4,Croydon,386710,51.371305,-0.101957,25900,16600,1220.5,1171.0,1389.0,28.222222,61.5,579.75,482.5,502.5,1166.75,1264.0,3620.666667,11486.388889


In [13]:
date_merged.shape

(28, 18)

## Let's see the map !

In [14]:
! pip install folium==0.5.0
import folium # plotting library

Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
[K     |████████████████████████████████| 79 kB 2.8 MB/s eta 0:00:011
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... [?25ldone
[?25h  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=ad7091221a0dbbabd9b2b26934401fcc2658054e9315131d0aff56dad7233277
  Stored in directory: /home/jovyan/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: folium
Successfully installed folium-0.5.0


In [15]:
address = 'London'
geolocator = Nominatim(user_agent="London_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London are 51.5073219, -0.1276474.


In [16]:
# create map of Toronto using latitude and longitude values
map_London = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, population in zip(Borough_data['Latitude'], Borough_data['Longitude'], Borough_data['Borough'],Borough_data['Population']):
    label = 'Borough: {}, Population: {}'.format(borough, population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 10,
        popup=label,
        color='#2F97C1',
        fill=True,
        fill_color='#2F97C1',
        fill_opacity=0.5,
        parse_html=False).add_to(map_London)  
    
map_London

## Get the Foursquare location data

Retreive Foursquare information.

In [17]:
CLIENT_ID = 'IEWFAFLUUSE0LTN5LUQOYARSU13XUZXNGLHL3LPZJTUUOJXG' # your Foursquare ID
CLIENT_SECRET = 'AWFX1LKFB2P030S134BWSAXOUHQSYCZAEBRJ2C1RZV15NPT0' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Get London venues

In [24]:
# type your answer here
London_venues = getNearbyVenues(
                                    names=Borough_data['Borough'],
                                    latitudes=Borough_data['Latitude'],
                                    longitudes=Borough_data['Longitude']
                                  )


Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [25]:
print(London_venues.shape)
London_venues.head()

(1092, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barnet,51.65309,-0.200226,Ye Old Mitre Inne,51.65294,-0.199507,Pub
1,Barnet,51.65309,-0.200226,Caffè Nero,51.654861,-0.201743,Coffee Shop
2,Barnet,51.65309,-0.200226,The Black Horse,51.653075,-0.206719,Pub
3,Barnet,51.65309,-0.200226,Waterstones,51.655368,-0.202607,Bookstore
4,Barnet,51.65309,-0.200226,Domino's Pizza,51.652675,-0.198837,Pizza Place


We can explore in which borough having the most of the venues.
<br> Let's see the top 5 borough.

In [56]:
London_venues_count = London_venues[['Borough','Venue']].groupby('Borough', as_index=False).count()
#London_venues_count.sort_values('Venue', ascending=False).head()
London_venues_count = London_venues_count.sort_values('Venue', ascending=False).head()
#",".join(London_venues_count['Borough'].unique())
print('Top 5 boroughs with the most venues are {}.'.format(", ".join(London_venues_count['Borough'].unique())))

Top 5 boroughs with the most venues are Southwark, Kingston upon Thames, Ealing, Hammersmith and Fulham, Islington.
