# Relocating to London
### *Finding a good school, easy commute and affordable rent*
### IBM Data Science Capstone Project

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

In this project we will look at the problem of relocating to London with a family: finding a good school for the kids, and a place to live that is within budget and has a good commute into Central London. We will consider proximity to top state schools, commuting distance to Central London by tube and average rent near tube stations. For our final shortlist, we will perform a clustering analysis for London boroughs based on venues in order to group similar ones together and get a better feel for what they are like.

## Data <a name="data"></a>

Raw data has been obtained from the following websites:

1. Top (free) state schools: https://www.homesandproperty.co.uk/property-news/where-to-buy-a-new-home-near-a-good-london-state-school-a126836.html

2. Tube stations, commute time into Central London and average weekly rent: https://www.totallymoney.com/rent-vs-tube-journey-time/

3. Tube stations, locations: https://wiki.openstreetmap.org/wiki/List_of_London_Underground_stations

4. List of London boroughs with locations: https://en.wikipedia.org/wiki/List_of_London_boroughs

For our clustering analysis of London boroughs, we will use the Foursquare API to obtain data on venues.

The data has been scraped and stored in Excel files for convenience.

## Methodology <a name="methodology"></a>

Let's start by loading all the libraries we are going to need.

In [20]:
import numpy as np
import pandas as pd

import requests

#!conda install -c conda-forge geopy --yes

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# Calculate the geodesic distance between two pairs of latitude and longitude coordinates
from geopy.distance import geodesic

# Import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
# Map rendering library
import folium 

#### Let's create a map of London

In [21]:
# Let's first get the geographical coordinates of London.
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="ldn_explorer")
location = geolocator.geocode(address)
ldn_latitude = location.latitude
ldn_longitude = location.longitude
#print('The geographical coordinates of London are {}, {}.'.format(ldn_latitude, ldn_longitude))

In [22]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[ldn_latitude, ldn_longitude], zoom_start=10)

map_london

#### Top London state schools
We'll load the data on top state schools. Schools are ranked by the percentage of pupils who achieve grade 5 or above in English and maths. We'll use the geopy library to get latitude and longitude coordinates and display the schools on our map.

In [23]:
# Create Pandas dataframe
file = 'top_London_state_schools.xlsx'
df_schools = pd.read_excel(file)

# Get location data
geolocator = Nominatim(user_agent="school_explorer")

latitude = []
longitude = []

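for pc in df_schools['Post code']:
    location = geolocator.geocode(pc)
    # geocode() returns None when a post code cannot be resolved
    latitude.append(location.latitude if location else np.nan)
    longitude.append(location.longitude if location else np.nan)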

df_schools['latitude'] = latitude
df_schools['longitude'] = longitude
df_schools

|  | School | Borough | Post code | % of pupils with grade 5 in English and maths | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | The Henrietta Barnett School | Barnet | NW11 7BN | 100 | 51.58 | -0.19 |
| 1 | Queen Elizabeth's School | Barnet | EN5 4DQ | 100 | 51.66 | -0.21 |
| 2 | Wilson's School | Sutton | SM6 9JW | 100 | 51.36 | -0.13 |
| 3 | St Michael's Catholic Grammar School | Barnet | N12 7NJ | 99 | 51.61 | -0.18 |
| 4 | Newstead Wood School | Bromley | BR6 9SA | 99 | 51.37 | 0.08 |
| 5 | The Latymer School | Enfield | N9 9TN | 99 | 51.63 | -0.08 |
| 6 | The Tiffin Girls' School | Kingston upon Thames | KT2 5PL | 99 | 51.43 | -0.3 |
| 7 | Tiffin School | Kingston upon Thames | KT2 6RL | 99 | 51.41 | -0.3 |
| 8 | Woodford County High School | Redbridge | IG8 9LA | 99 | 51.61 | 0.02 |
| 9 | Nonsuch High School for Girls | Sutton | SM3 8AB | 99 | 51.35 | -0.22 |
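The geocoding loop assumes that every post code resolves to a location. As a minimal sketch of a more defensive variant, the helper below records NaN when a lookup fails. The name `geocode_postcodes` and the stand-in lookup in the usage lines are hypothetical; in the notebook the `geocode` argument would be `geolocator.geocode`, which returns `None` when nothing is found.

```python
import math
from types import SimpleNamespace


def geocode_postcodes(postcodes, geocode):
    """Return (latitudes, longitudes), using NaN where a lookup fails.

    `geocode` is any callable that takes a post code and returns an object
    with `.latitude` and `.longitude` attributes, or None if nothing is found.
    """
    latitudes, longitudes = [], []
    for pc in postcodes:
        location = geocode(pc)
        if location is None:
            latitudes.append(math.nan)
            longitudes.append(math.nan)
        else:
            latitudes.append(location.latitude)
            longitudes.append(location.longitude)
    return latitudes, longitudes


# Usage with a stand-in lookup (hypothetical; no network access needed).
# The coordinates are the rounded values shown in the table above.
_fake_lookup = {'NW11 7BN': SimpleNamespace(latitude=51.58, longitude=-0.19)}
lats, lons = geocode_postcodes(['NW11 7BN', 'ZZ99 9ZZ'], _fake_lookup.get)
```

Rows with NaN coordinates can then be inspected or dropped before any distance is calculated.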


Let's add the schools to our London map

In [24]:
# Add markers to map
for lat, lng, label in zip(df_schools['latitude'], df_schools['longitude'], df_schools['School']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)
    
map_london

#### London tube stations
We'll load the data on London tube stations with commute time to central London and average weekly rent and narrow down the options based on commute time and rent budget. We'll then add location data and find the nearest borough for each station.

In [25]:
# Create Pandas dataframe
file = 'tube_stations_travel_time_rent.xlsx'
df_stations = pd.read_excel(file)
df_stations.head()

|  | Line | Tube station | Time (mins) | Weekly Rent (£) |
|---|---|---|---|---|
| 0 | Piccadilly | Acton Town | 18 | 308 |
| 1 | District | Acton Town | 5 | 308 |
| 2 | Metropolitan | Aldgate | 15 | 442 |
| 3 | District | Aldgate East | 61 | 428 |
| 4 | Hammersmith&City | Aldgate East | 22 | 428 |


We are going to ignore all the slower connections to Central London from a tube station. We will consider tube stations with a commute time into Central London of 45 minutes or less and an average weekly rent of £500 or less.

In [26]:
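# Ignore slower connections from a given tube station.
# Note: min() is applied to each column independently, so the 'Line' shown is the
# alphabetically first line serving the station, not necessarily the fastest one.
df_stations2=df_stations.groupby(['Tube station'], as_index=False).min()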

# Consider only tube stations with an average weekly rent of £500 or less and a max commute time of 45 minutes
df_stations2=df_stations2[(df_stations2['Time (mins)'] <= 45) & (df_stations2['Weekly Rent (£)'] <= 500)] 
df_stations2.reset_index(drop=True, inplace=True)

print("The number of tube stations to consider is " + str(df_stations2.shape[0]))

The number of tube stations to consider is 218
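Because `groupby(...).min()` takes the minimum of each column separately, the line name and the commute time in a result row can come from different connections. If the line of the fastest connection matters, one possible alternative is to select whole rows with `idxmin`. The sketch below uses the five sample rows printed earlier; `df_sample` and `df_fastest` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Sample rows copied from the head of df_stations shown earlier
df_sample = pd.DataFrame({
    'Line': ['Piccadilly', 'District', 'Metropolitan', 'District', 'Hammersmith&City'],
    'Tube station': ['Acton Town', 'Acton Town', 'Aldgate', 'Aldgate East', 'Aldgate East'],
    'Time (mins)': [18, 5, 15, 61, 22],
    'Weekly Rent (£)': [308, 308, 442, 428, 428],
})

# Index label of the fastest connection for each station
fastest_idx = df_sample.groupby('Tube station')['Time (mins)'].idxmin()

# Keep the whole row, so 'Line' and 'Time (mins)' stay paired
df_fastest = df_sample.loc[fastest_idx].reset_index(drop=True)
print(df_fastest)
```

With this selection, Aldgate East is paired with the Hammersmith&City line (22 minutes) rather than the District line.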


Let's add location data to our tube stations.

In [27]:
# Load file with location data for tube stations
file = 'tube_stations.xlsx'
df_stations_locations = pd.read_excel(file)
df_stations_locations = df_stations_locations.drop(['Zone', 'Postcode', 'Easting', 'Northing'], axis=1)
df_stations3=df_stations2.join(df_stations_locations.set_index('Tube station'), on='Tube station')
column = ['Tube_station', 'Line', 'Time','Weekly_rent', 'Latitude', 'Longitude']
df_stations3.columns=column

#missing=df_stations3[df_stations3.isnull().any(axis=1)]
#missing

df_stations3.head(5)

|  | Tube_station | Line | Time | Weekly_rent | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | Acton Town | District | 5 | 308 | 51.5 | -0.28 |
| 1 | Aldgate | Metropolitan | 15 | 442 | 51.51 | -0.08 |
| 2 | Aldgate East | District | 22 | 428 | 51.52 | -0.07 |
| 3 | Alperton | Piccadilly | 28 | 271 | 51.54 | -0.3 |
| 4 | Angel | Northern | 13 | 424 | 51.53 | -0.11 |


Now, let's add the nearest borough for each tube station.

In [28]:
file = 'borough_coordinates.xlsx'
df_boroughs = pd.read_excel(file)

boroughs=[]

for station in df_stations3.itertuples():
    station_loc = (station.Latitude, station.Longitude)
    min_dist = 100
    borough_name = ""
    for borough in df_boroughs.itertuples():
        borough_loc = (borough.Latitude, borough.Longitude)
        distance = geodesic(borough_loc, station_loc).miles
        if (distance < min_dist):
            min_dist = distance
            borough_name = borough.Borough
    boroughs.append(borough_name)

df_stations3['Borough'] = boroughs
df_stations3.head(5)

|  | Tube_station | Line | Time | Weekly_rent | Latitude | Longitude | Borough |
|---|---|---|---|---|---|---|---|
| 0 | Acton Town | District | 5 | 308 | 51.5 | -0.28 | Ealing |
| 1 | Aldgate | Metropolitan | 15 | 442 | 51.51 | -0.08 | Southwark |
| 2 | Aldgate East | District | 22 | 428 | 51.52 | -0.07 | Southwark |
| 3 | Alperton | Piccadilly | 28 | 271 | 51.54 | -0.3 | Brent |
| 4 | Angel | Northern | 13 | 424 | 51.53 | -0.11 | Islington |
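The nearest-borough search compares each station with every borough and keeps the smallest distance. The sketch below shows the same nearest-neighbour idea with the haversine great-circle formula in place of geopy's `geodesic`; this is a swapped-in approximation that treats the Earth as a sphere, so distances will differ slightly from the notebook's. The function names are hypothetical and the coordinates are the rounded values printed in this notebook.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def nearest(point, candidates):
    """Return (name, distance) of the candidate closest to `point`.

    `candidates` maps a name to its (lat, lon) pair.
    """
    name = min(candidates, key=lambda n: haversine_miles(point, candidates[n]))
    return name, haversine_miles(point, candidates[name])


# Rounded coordinates taken from the tables in this notebook
boroughs = {'Ealing': (51.51, -0.31), 'Brent': (51.56, -0.28), 'Camden': (51.53, -0.13)}
acton_town = (51.50, -0.28)
borough_name, miles = nearest(acton_town, boroughs)
print(borough_name, round(miles, 2))
```

Note that each borough is represented by a single coordinate pair, so "nearest borough" means nearest borough reference point, not the borough whose boundary contains the station.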


#### Finding schools with qualifying tube connections that are within range
Next, we'll find the nearest tube station for each school before deciding on the maximum walking/driving distance we'll allow to a qualifying tube connection. 

In [29]:
# Let's find the minimum distance for each school to a suitable tube station
schools=[]
stations=[]
distances=[]

for school in df_schools.itertuples():
    school_loc = (school.latitude, school.longitude)
    min_dist = 100
    station_name = ""
    for station in df_stations3.itertuples():
        tube_loc = (station.Latitude, station.Longitude)
        distance = geodesic(school_loc, tube_loc).miles
        if (distance < min_dist):
            min_dist = distance
            station_name = station.Tube_station
    schools.append(school.School)
    stations.append(station_name)
    distances.append(min_dist)

list_of_tuples = list(zip(schools, stations, distances))
column=['School', 'Nearest tube station', 'Distance (mi)']
results = pd.DataFrame(list_of_tuples, columns = column)
pd.options.display.float_format = '{:,.2f}'.format
results

|  | School | Nearest tube station | Distance (mi) |
|---|---|---|---|
| 0 | The Henrietta Barnett School | Golders Green | 0.68 |
| 1 | Queen Elizabeth's School | High Barnet | 0.87 |
| 2 | Wilson's School | Morden | 4.16 |
| 3 | St Michael's Catholic Grammar School | Woodside Park | 0.31 |
| 4 | Newstead Wood School | North Greenwich | 9.77 |
| 5 | The Latymer School | Southgate | 2.29 |
| 6 | The Tiffin Girls' School | Richmond | 2.63 |
| 7 | Tiffin School | Richmond | 3.57 |
| 8 | Woodford County High School | Woodford | 0.67 |
| 9 | Nonsuch High School for Girls | Morden | 3.52 |


We can now see that the distance to the nearest qualifying tube station ranges from 0.31 to 9.77 miles. We'll allow a maximum distance of two miles, which narrows the schools to consider down to four: The Henrietta Barnett School, Queen Elizabeth's School, St Michael's Catholic Grammar School, and Woodford County High School.
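The effect of the distance cut-off can be checked directly against the table of nearest stations. The short sketch below counts how many schools remain under a two-mile and a three-mile limit; the dictionary simply restates the distances from that table, and `schools_within` is a hypothetical helper name.

```python
# Distances (miles) to the nearest qualifying tube station, from the table above
nearest_station_miles = {
    'The Henrietta Barnett School': 0.68,
    "Queen Elizabeth's School": 0.87,
    "Wilson's School": 4.16,
    "St Michael's Catholic Grammar School": 0.31,
    'Newstead Wood School': 9.77,
    'The Latymer School': 2.29,
    "The Tiffin Girls' School": 2.63,
    'Tiffin School': 3.57,
    'Woodford County High School': 0.67,
    'Nonsuch High School for Girls': 3.52,
}


def schools_within(max_miles):
    """Names of schools whose nearest qualifying station is within max_miles."""
    return [name for name, d in nearest_station_miles.items() if d <= max_miles]


for cutoff in (2, 3):
    print(cutoff, len(schools_within(cutoff)))
```

A two-mile limit leaves four schools, while a three-mile limit would also admit The Latymer School and The Tiffin Girls' School.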

Next, we'll find every qualifying tube station that lies within 2 miles of each school.

In [30]:
# Maximum distance from school to tube station
max_dist=2

schools=[]
boroughs=[]
lats=[]
longs=[]
stations=[]
borough_stations=[]
distances=[]
latitudes=[]
longitudes=[]
time = []
rent = []

for school in df_schools.itertuples():
    school_loc = (school.latitude, school.longitude)
    for station in df_stations3.itertuples():
        tube_loc = (station.Latitude, station.Longitude)
        distance = geodesic(school_loc, tube_loc).miles
        if (distance <= max_dist):
            schools.append(school.School)
            boroughs.append(school.Borough)
            lats.append(school.latitude)
            longs.append(school.longitude)
            stations.append(station.Tube_station)
            borough_stations.append(station.Borough)
            distances.append(distance)
            latitudes.append(station.Latitude)
            longitudes.append(station.Longitude)
            time.append(station.Time)
            rent.append(station.Weekly_rent)

list_of_tuples = list(zip(schools, boroughs, lats, longs, stations, borough_stations, distances, latitudes, longitudes, time, rent))
column=['School', 'Borough', 'Lat', 'Lon', 'Tube station', 'Borough station', \
        'Distance (mi)', 'Latitude', 'Longitude', 'Time to central London (mins)', 'Weekly rent (£)']
df_results = pd.DataFrame(list_of_tuples, columns = column)
df_results

|  | School | Borough | Lat | Lon | Tube station | Borough station | Distance (mi) | Latitude | Longitude | Time to central London (mins) | Weekly rent (£) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | Brent Cross | Brent | 1.11 | 51.58 | -0.21 | 21 | 257 |
| 1 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | East Finchley | Haringey | 1.11 | 51.59 | -0.16 | 19 | 288 |
| 2 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | Finchley Central | Barnet | 1.37 | 51.6 | -0.19 | 23 | 273 |
| 3 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | Golders Green | Brent | 0.68 | 51.57 | -0.19 | 18 | 307 |
| 4 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | Hampstead | Camden | 1.77 | 51.56 | -0.18 | 14 | 365 |
| 5 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | Hendon Central | Brent | 1.64 | 51.58 | -0.23 | 23 | 255 |
| 6 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | Highgate | Haringey | 1.81 | 51.58 | -0.15 | 17 | 307 |
| 7 | The Henrietta Barnett School | Barnet | 51.58 | -0.19 | West Finchley | Barnet | 1.94 | 51.61 | -0.19 | 25 | 280 |
| 8 | Queen Elizabeth's School | Barnet | 51.66 | -0.21 | High Barnet | Barnet | 0.87 | 51.65 | -0.19 | 32 | 252 |
| 9 | St Michael's Catholic Grammar School | Barnet | 51.61 | -0.18 | East Finchley | Haringey | 2.0 | 51.59 | -0.16 | 19 | 288 |


Let's put our selected schools with suitable tube connections on a new map.

In [47]:
map_london_final = folium.Map(location=[ldn_latitude, ldn_longitude], zoom_start=10)

df_results_stations = df_results[['Tube station', 'Latitude', 'Longitude']].groupby(['Tube station'], as_index=False).min()

# Add station markers to map
for lat, lng, label in zip(df_results_stations['Latitude'], df_results_stations['Longitude'], df_results_stations['Tube station']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london_final)

df_results_schools = df_results[['School', 'Lat', 'Lon']].groupby(['School'], as_index=False).min()

# Add school markers to map
for lat, lng, label in zip(df_results_schools['Lat'], df_results_schools['Lon'], df_results_schools['School']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london_final)

map_london_final

We now have a pretty good idea of our options. The four schools that we selected are all located in the North London boroughs of Barnet and Redbridge. However, the tube stations that are within range add Brent, Camden, Haringey and Waltham Forest to our list of boroughs to consider.

In order to make a final decision, we would like to know what London boroughs are like. We will perform a clustering analysis based on the top venues in each borough and use the Foursquare API.

#### Exploring London boroughs (clustering analysis)
We'll use the location data on London boroughs to query the Foursquare API for up to 100 venues within a 500-metre radius of each borough's coordinates. We'll then take the ten most common venue categories per borough and group the boroughs into five clusters to get an idea of what they are like.

Let's set up our Foursquare connection and create a function to retrieve the borough venues.

In [41]:
# Foursquare credentials (redacted: substitute your own and keep them out of shared notebooks)
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20190902'

# limit of number of venues returned by Foursquare API
LIMIT = 100
# Define radius
radius = 500 

In [42]:
# Function to retrieve venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
           
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # Make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
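The function above indexes straight into the decoded response, so an error reply or a venue without a category would raise an exception. As a hedged sketch, the two helpers below separate building the query parameters from parsing the reply. Both function names are hypothetical, the payload shape (`response`, then `groups[0]`, then `items`) is the one the original code assumes, and the sample payload is made up for illustration.

```python
def build_explore_params(client_id, client_secret, version, lat, lng, radius, limit):
    """Query parameters matching the ones embedded in the URL above."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }


def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from a decoded JSON payload.

    Returns an empty list if the expected keys are missing, and skips
    venues that carry no category.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        if not categories:
            continue
        venues.append((venue['name'],
                       venue['location']['lat'],
                       venue['location']['lng'],
                       categories[0]['name']))
    return venues


# Made-up payload for illustration only
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 51.6, 'lng': -0.2},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorised place',
               'location': {'lat': 51.6, 'lng': -0.2},
               'categories': []}},
]}]}}
venues = parse_explore_items(sample)
params = build_explore_params('ID', 'SECRET', '20190902', 51.6, -0.2, 500, 100)
```

In the notebook, the dictionary could be passed to `requests.get` through its `params` argument together with a `timeout`, and the decoded JSON handed to `parse_explore_items`.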

In [43]:
# Create dataframe with nearby venues in London boroughs 
london_venues = getNearbyVenues(names=df_boroughs['Borough'],
                                   latitudes=df_boroughs['Latitude'],
                                   longitudes=df_boroughs['Longitude']
                                  )
# Analyse each borough

# One-hot encode the venue categories
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# Add the borough column back to the dataframe
london_onehot['Borough'] = london_venues['Borough'] 

# Move the borough column to the first position
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

# Group rows by borough and take the mean of each category indicator,
# i.e. the share of the borough's venues that fall in that category
london_grouped = london_onehot.groupby('Borough').mean().reset_index()
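To make the encoding step concrete, here is a small worked example on invented data: each venue becomes a row of 0/1 indicators, and the per-borough mean of those indicators is the share of the borough's venues in each category. The boroughs "A" and "B" and their venues are made up, and `dtype=float` is passed only so that the indicators are numeric regardless of the pandas version.

```python
import pandas as pd

# Toy data (invented for illustration): one row per venue
toy_venues = pd.DataFrame({
    'Borough': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Café', 'Park', 'Café', 'Café'],
})

# One column per category, 1.0 where the venue belongs to it
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Borough', toy_venues['Borough'])

# Mean of the indicators = share of a borough's venues in each category
toy_grouped = toy_onehot.groupby('Borough').mean().reset_index()
print(toy_grouped)
```

Borough "A" ends up with half of its venues as pubs and a quarter each as cafés and parks, while borough "B" is all cafés. These frequency vectors are what the clustering step compares.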

In [44]:
# Function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Create a new dataframe and display the top 10 venues for each borough
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
boroughs_venues_sorted = pd.DataFrame(columns=columns)
boroughs_venues_sorted['Borough'] = london_grouped['Borough']

for ind in np.arange(london_grouped.shape[0]):
    boroughs_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

boroughs_venues_sorted

|  | Borough | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | Gym / Fitness Center | Park | Pool | Supermarket | Golf Course | Martial Arts Dojo | Bus Station | Dog Run | Film Studio | Dessert Shop |
| 1 | Barnet | Café | Bus Stop | Yoga Studio | Fast Food Restaurant | Furniture / Home Store | Frozen Yogurt Shop | Fried Chicken Joint | French Restaurant | Food Court | Flea Market |
| 2 | Bexley | Pub | Coffee Shop | Italian Restaurant | Clothing Store | Fast Food Restaurant | Supermarket | Plaza | Video Game Store | Chinese Restaurant | Sandwich Place |
| 3 | Brent | Coffee Shop | Hotel | Clothing Store | Bar | Sporting Goods Shop | American Restaurant | Grocery Store | Indian Restaurant | Italian Restaurant | Sandwich Place |
| 4 | Bromley | Coffee Shop | Clothing Store | Pizza Place | Burger Joint | Gym / Fitness Center | Bar | Ice Cream Shop | Sandwich Place | Stationery Store | Mediterranean Restaurant |
| 5 | Camden | Coffee Shop | Hotel | Pub | Café | Burger Joint | Italian Restaurant | Pizza Place | Sushi Restaurant | Modern European Restaurant | Restaurant |
| 6 | Croydon | Pub | Coffee Shop | Asian Restaurant | Gym / Fitness Center | Gaming Cafe | Breakfast Spot | Brewery | Malay Restaurant | Spanish Restaurant | Burger Joint |
| 7 | Ealing | Coffee Shop | Clothing Store | Italian Restaurant | Park | Bakery | Pub | Hotel | Burger Joint | Bus Stop | Café |
| 8 | Enfield | Coffee Shop | Clothing Store | Pharmacy | Mobile Phone Shop | Shopping Mall | Pub | Bookstore | Supermarket | Gift Shop | Department Store |
| 9 | Greenwich | Pub | Supermarket | Coffee Shop | Fast Food Restaurant | Hotel | Sandwich Place | Clothing Store | Plaza | Grocery Store | Pizza Place |
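The column labels are built by indexing a three-item suffix list and falling back to "th" when the index runs out. That works for ten venues, but it would produce "21th" if the number of top venues were ever raised past twenty. A small helper such as the hypothetical `ordinal` below is one way to build the same labels without relying on an exception.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Borough'] + ['{} Most Common Venue'.format(ordinal(i))
                         for i in range(1, num_top_venues + 1)]
print(columns)
```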


In [45]:
# Cluster boroughs

# set number of clusters
kclusters = 5

# Drop the label column so that only the numeric category frequencies remain
london_grouped_clustering = london_grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)
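The number of clusters is fixed at five here. One common way to sanity-check such a choice, not used in the original analysis, is to compare silhouette scores for several values of k. The sketch below does this on synthetic points, not on the borough data; with the real data, `X` would be `london_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data for illustration: two tight, well-separated groups of points
offsets = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1]])
X = np.vstack([offsets, offsets + 5.0])

scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the synthetic points the score peaks at two clusters, as expected. On the borough data a flat or low score across all k would suggest that the venue profiles do not separate cleanly.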

In [46]:
# add clustering labels
boroughs_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = df_boroughs

# join the cluster labels and top venues onto the borough coordinates
london_merged = london_merged.join(boroughs_venues_sorted.set_index('Borough'), on='Borough')

london_merged

|  | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 51.56 | 0.16 | 1 | Gym / Fitness Center | Park | Pool | Supermarket | Golf Course | Martial Arts Dojo | Bus Station | Dog Run | Film Studio | Dessert Shop |
| 1 | Barnet | 51.63 | -0.15 | 2 | Café | Bus Stop | Yoga Studio | Fast Food Restaurant | Furniture / Home Store | Frozen Yogurt Shop | Fried Chicken Joint | French Restaurant | Food Court | Flea Market |
| 2 | Bexley | 51.45 | 0.15 | 1 | Pub | Coffee Shop | Italian Restaurant | Clothing Store | Fast Food Restaurant | Supermarket | Plaza | Video Game Store | Chinese Restaurant | Sandwich Place |
| 3 | Brent | 51.56 | -0.28 | 1 | Coffee Shop | Hotel | Clothing Store | Bar | Sporting Goods Shop | American Restaurant | Grocery Store | Indian Restaurant | Italian Restaurant | Sandwich Place |
| 4 | Bromley | 51.4 | 0.02 | 1 | Coffee Shop | Clothing Store | Pizza Place | Burger Joint | Gym / Fitness Center | Bar | Ice Cream Shop | Sandwich Place | Stationery Store | Mediterranean Restaurant |
| 5 | Camden | 51.53 | -0.13 | 1 | Coffee Shop | Hotel | Pub | Café | Burger Joint | Italian Restaurant | Pizza Place | Sushi Restaurant | Modern European Restaurant | Restaurant |
| 6 | Croydon | 51.37 | -0.1 | 1 | Pub | Coffee Shop | Asian Restaurant | Gym / Fitness Center | Gaming Cafe | Breakfast Spot | Brewery | Malay Restaurant | Spanish Restaurant | Burger Joint |
| 7 | Ealing | 51.51 | -0.31 | 1 | Coffee Shop | Clothing Store | Italian Restaurant | Park | Bakery | Pub | Hotel | Burger Joint | Bus Stop | Café |
| 8 | Enfield | 51.65 | -0.08 | 1 | Coffee Shop | Clothing Store | Pharmacy | Mobile Phone Shop | Shopping Mall | Pub | Bookstore | Supermarket | Gift Shop | Department Store |
| 9 | Greenwich | 51.49 | 0.06 | 1 | Pub | Supermarket | Coffee Shop | Fast Food Restaurant | Hotel | Sandwich Place | Clothing Store | Plaza | Grocery Store | Pizza Place |


The clustering labels show which boroughs are grouped together. Perhaps unsurprisingly, most boroughs in Greater London have very similar amenities and most boroughs are grouped together in the same cluster (1). Barnet, Harrow, Hounslow, and Newham are a bit more distinct.  

## Results <a name="results"></a>

Using data collected from several websites, including geolocation data, we have been able to create a shortlist of schools that have a good tube connection to Central London and are in areas that fall within our rent budget.

The four schools that we selected are all in North London: The Henrietta Barnett School, Queen Elizabeth's School, St Michael's Catholic Grammar School, and Woodford County High School. All four are located in Barnet or Redbridge. However, the tube stations that are within range add Brent, Camden, Haringey and Waltham Forest to our list of boroughs to consider.

To help make a final decision on where we would like to live, we retrieved venues for each London borough using the Foursquare API and performed a clustering analysis based on venue categories. We found that most boroughs in London are hard to distinguish, as they offer very similar amenities.

## Discussion <a name="discussion"></a>

Of the top ten state schools, only four meet our criteria. The remaining six, five of which are in South London, are excluded because they are more than two miles from a tube station that meets our commute and rent limits. We could consider a longer list of schools to increase the probability of admission. We have also not considered school-specific admission criteria, which are likely to differ from school to school.

Amenities in London are generally very good and quite similar for most boroughs, so we could consider other data sources and explore new features such as house prices, average income levels, average school ratings and private schools.

## Conclusion <a name="conclusion"></a>

Based on website and geolocation data, we have been able to find the 'sweet spot' of free state schools in areas that fall within our rent budget and provide a good connection to Central London.

This provides a useful starting point for possible extensions that could include:
- a wider set of schools,
- selection of multiple schools,
- admission criteria,
- new features that may provide a more distinct grouping of boroughs.