# Relocating to London
## Finding a good school, easy commute and affordable rent
### IBM Data Science Capstone Project
### Table of contents

Introduction

Data

Methodology

Results

Discussion

Conclusion

## Introduction 

This project looks at the problem of relocating to London with a family: finding a good school for the children and a place to live that is within budget and has a good commute into Central London. We consider proximity to top state schools, commuting time to Central London by tube, and average rent near tube stations. For the final shortlist, we cluster London boroughs by their venues to group similar ones together and get a better feel for what they are like.

## Data
Raw data has been obtained from the following websites:

1) Top (free) state schools: https://www.homesandproperty.co.uk/property-news/where-to-buy-a-new-home-near-a-good-london-state-school-a126836.html

2) Tube stations, commute time into Central London and average weekly rent: https://www.totallymoney.com/rent-vs-tube-journey-time/

3) Tube station locations: https://wiki.openstreetmap.org/wiki/List_of_London_Underground_stations

4) London boroughs with locations: https://en.wikipedia.org/wiki/List_of_London_boroughs


For our clustering analysis of London boroughs, we will use the Foursquare API to obtain data on venues.

The data has been scraped and stored in Excel files for convenience.

## Methodology 
Let's start by loading all the libraries we are going to need.

In [1]:
import numpy as np
import pandas as pd

import requests

#!conda install -c conda-forge geopy --yes

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# Calculate the geodesic distance between two pairs of latitude and longitude coordinates
from geopy.distance import geodesic

# Import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
# Map rendering library
import folium

# Let's create a map of London

In [2]:
# Let's first get the geographical coordinates of London.
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="ldn_explorer")
location = geolocator.geocode(address)
ldn_latitude = location.latitude
ldn_longitude = location.longitude
#print('The geographical coordinates of London are {}, {}.'.format(ldn_latitude, ldn_longitude))

In [3]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[ldn_latitude, ldn_longitude], zoom_start=10)

map_london


# Top London state schools
We'll load the data on top state schools. Schools are ranked by the percentage of pupils who achieve grade 5 or above in English and maths. We'll use the geopy library to look up latitude and longitude coordinates from each school's post code and display the schools on our map.

In [7]:
df = pd.read_excel("df_Schools.xlsx")
df

| Unnamed: 0 | School | Borough | Post code | % of pupils with grade 5 in English and maths | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | The henrieta barnett | Barnet | NW11 7BN | 100 | 51.58 | -0.19 |
| 1 | Queen elizabths | Barnet | EN5 4DQ | 100 | 51.66 | -0.21 |
| 2 | Wilson's | Sutton | SM6 9JW | 100 | 51.36 | -0.13 |
| 3 | St michael's catholic | Barnet | N12 7NJ | 99 | 51.61 | -0.18 |
| 4 | Newstead wood | Bromely | BR6 9SA | 99 | 51.37 | 0.08 |
| 5 | The Latymer | Enfield | N9 9TN | 99 | 51.63 | -0.08 |
| 6 | The Tiffins girls | Kingston upon Thame | KT2 5PL | 99 | 51.43 | -0.3 |
| 7 | Tiffin | Kingston upon Thames | KT2 6RL | 99 | 51.41 | -0.3 |
| 8 | Wood ford count | Redbridge | IG8 9LA | 99 | 51.61 | -0.02 |
| 9 | Nonsuch high school | Sutton | SM3 8AB | 99 | 51.35 | -0.22 |


In [10]:
latitude = []
longitude = []
# Look up each post code; geocode() returns None when nothing is found
geolocator = Nominatim(user_agent="school_explorer")
for pc in df['Post code']:
    location = geolocator.geocode(pc)
    latitude.append(location.latitude if location else None)
    longitude.append(location.longitude if location else None)

df['latitude'] = latitude
df['longitude'] = longitude
df

| Unnamed: 0 | School | Borough | Post code | % of pupils with grade 5 in English and maths | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | The henrieta barnett | Barnet | NW11 7BN | 100 | 51.581261 | -0.188661 |
| 1 | Queen elizabths | Barnet | EN5 4DQ | 100 | 51.655857 | -0.213358 |
| 2 | Wilson's | Sutton | SM6 9JW | 100 | 51.358845 | -0.128086 |
| 3 | St michael's catholic | Barnet | N12 7NJ | 99 | 51.614407 | -0.180811 |
| 4 | Newstead wood | Bromely | BR6 9SA | 99 | 51.366428 | 0.076602 |
| 5 | The Latymer | Enfield | N9 9TN | 99 | 51.625648 | -0.075625 |
| 6 | The Tiffins girls | Kingston upon Thame | KT2 5PL | 99 | 51.425145 | -0.302895 |
| 7 | Tiffin | Kingston upon Thames | KT2 6RL | 99 | 51.41177 | -0.295623 |
| 8 | Wood ford count | Redbridge | IG8 9LA | 99 | 51.607303 | 0.018493 |
| 9 | Nonsuch high school | Sutton | SM3 8AB | 99 | 51.35462 | -0.223683 |
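The post code lookup can also be wrapped in a small reusable helper. The sketch below is one possible shape, not the notebook's own code: the function name `geocode_postcodes` and its `pause_seconds` parameter are made up for illustration. It takes any geocoding callable, pauses between requests (the public Nominatim service asks for at most one request per second), and records `(None, None)` when a post code cannot be resolved.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

Coordinates = Tuple[Optional[float], Optional[float]]


def geocode_postcodes(
    postcodes: Iterable[str],
    geocode: Callable[[str], object],
    pause_seconds: float = 1.0,
) -> List[Coordinates]:
    """Return one (latitude, longitude) pair per post code.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when the lookup fails (hypothetical
    helper; geopy's `Nominatim.geocode` behaves this way).
    """
    results: List[Coordinates] = []
    for i, postcode in enumerate(postcodes):
        if i > 0:
            # Pause between consecutive requests to respect the rate limit
            time.sleep(pause_seconds)
        location = geocode(postcode)
        if location is None:
            # Keep the row aligned with the input instead of raising
            results.append((None, None))
        else:
            results.append((location.latitude, location.longitude))
    return results
```

With the notebook's objects this would be called as `geocode_postcodes(df['Post code'], geolocator.geocode)`. geopy also ships `geopy.extra.rate_limiter.RateLimiter`, which can wrap `geolocator.geocode` to enforce the delay instead of the manual pause.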


In [12]:
# Add markers to map
for lat, lng, label in zip(df['latitude'], df['longitude'], df['School']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)

map_london

# London tube stations
We'll load the data on London tube stations, which includes the commute time to Central London and the average weekly rent, and narrow down the options based on commute time and rent budget. We'll then add location data and find the nearest borough for each station.
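The narrowing step can be sketched on a small stand-in table. The column names, the placeholder station rows and the two thresholds below are assumptions for illustration only; the real spreadsheet may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-in rows; names and values are placeholders, not real data
stations = pd.DataFrame({
    "Station": ["Station A", "Station B", "Station C", "Station D"],
    "Commute (min)": [20, 35, 50, 25],
    "Weekly rent (GBP)": [450, 320, 250, 380],
})

# Assumed limits: adjust to the family's own commute tolerance and budget
MAX_COMMUTE_MIN = 40
MAX_WEEKLY_RENT = 400

# Keep stations that satisfy both constraints, shortest commute first
within_budget = (
    stations[
        (stations["Commute (min)"] <= MAX_COMMUTE_MIN)
        & (stations["Weekly rent (GBP)"] <= MAX_WEEKLY_RENT)
    ]
    .sort_values("Commute (min)")
    .reset_index(drop=True)
)

print(within_budget)
```

Combining the two conditions with `&` keeps only the rows that meet both limits, so a cheap station with a long commute, or a close station above budget, drops out of the shortlist.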

In [13]:
# Create Pandas dataframe
file = 'tube_stations_travel_time_rent.xlsx'
df_stations = pd.read_excel(file)
df_stations.head()

FileNotFoundError: [Errno 2] No such file or directory: 'tube_stations_travel_time_rent.xlsx'
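
The nearest-borough lookup described at the start of this section does not depend on the spreadsheet's layout, so it can be sketched on its own. This is a minimal sketch under stated assumptions: the helper names are made up, the borough coordinates are rounded approximations used only for illustration, and the haversine formula stands in for the `geodesic` function from geopy that the notebook imports.

```python
import math
from typing import Dict, Tuple

LatLng = Tuple[float, float]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_borough(point: LatLng, boroughs: Dict[str, LatLng]) -> str:
    """Name of the borough whose reference point is closest to `point`."""
    return min(boroughs, key=lambda name: haversine_km(point, boroughs[name]))


# Rounded, approximate reference points (illustrative assumption)
boroughs = {
    "Barnet": (51.63, -0.15),
    "Sutton": (51.36, -0.19),
}

# Coordinates of Wilson's school, taken from the geocoded table above
print(nearest_borough((51.358845, -0.128086), boroughs))
```

For higher accuracy, `geodesic(a, b).km` from `geopy.distance` could replace `haversine_km`; at city scale the two differ by well under one percent, which is unlikely to change which borough is nearest.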