# Capstone Project - The Battle of Neighbourhoods (Week 1 & 2)
<h2><center>Property Prices & Venue Data Analysis of London</center></h2>


## 1. Introduction

### 1.1. Background
The coronavirus (COVID-19) pandemic has had, and will continue to have, a significant impact on businesses and economies worldwide. This is evident in the stock market and oil price crashes, the record-breaking number of people filing for unemployment and major airlines on the brink of administration.

The real estate and property market is no exception. The London property market came to a halt in March, when a full lockdown was announced to prevent the spread of the virus. Physical viewings were postponed, construction was suspended, and estate agents and mortgage lenders were no longer able to value properties in person.

As a result, Zoopla has predicted that completed sales in the UK will be 50% lower in 2020 than in 2019, and Knight Frank has predicted that the number of sales in Greater London will fall by 35%. However, despite the bleak outlook for property and housing prices this year, a large number of firms and their analysts believe that the housing market could make a strong recovery by 2021, with growth estimates ranging from 3% to 6%.

### 1.2. Business Problem
The best decisions are often backed by insight and data. By utilising machine learning, we can generate those insights effectively and efficiently, giving potential home-buyers and investors the best possible decision-making support. This brings us to our business problem: how can we generate insight so that home-buyers and investors can make well-informed choices when purchasing properties in London, especially in this uncertain economic situation?

To solve this business problem, we will cluster London areas based on average sale price and on local venues and amenities, e.g. schools, supermarkets and coffee shops. We will then compare these clusters with the average property and rental prices for each borough, and calculate the rental yield for each cluster for buy-to-let investors. This will provide valuable information on whether a property is a viable choice for home-buyers and investors.
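The rental yield mentioned above can be illustrated with a minimal sketch. It assumes the gross yield definition (annual rent divided by purchase price, as a percentage) and ignores costs such as fees, taxes and void periods. The function name and the figures are hypothetical and are not taken from the datasets used in this notebook.

```python
def gross_rental_yield(purchase_price: float, annual_rent: float) -> float:
    """Return the gross rental yield as a percentage of the purchase price."""
    if purchase_price <= 0:
        raise ValueError("purchase_price must be positive")
    return (annual_rent * 100) / purchase_price


# Hypothetical figures for illustration only
example_yield = gross_rental_yield(purchase_price=500_000, annual_rent=20_000)
print(f"Gross rental yield: {example_yield:.1f}%")
```

If the rental data is quoted per week or per month, it would need to be converted to an annual figure before applying the function.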

## 2. Data Acquisition

### 2.1. Data Sources

The Price Paid Data (property sales data) for London will be sourced from HM Land Registry, whose datasets are based on the raw data released each month. The dataset will include the following columns: Transaction unique identifier, Price, Date of Transfer, Postcode, Property Type, Old/New, Duration, PAON (Primary Addressable Object Name), SAON (Secondary Addressable Object Name), Street, Locality, Town/City, District, County and PPD Category Type.

The Foursquare API will be used to access and explore venues and amenities based on the latitude and longitude collected using the GeoCoder library. The results will then be read into a dataframe for data wrangling and cleaning. This dataframe will be merged with the Price Paid Data from HM Land Registry and processed so that it is suitable for fitting the machine learning model.
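As a rough sketch of the venue lookup described above, the snippet below builds a request URL for a single coordinate pair. It assumes Foursquare's legacy v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters; the helper name, the credentials, the version date and the coordinates are placeholders, and the endpoint actually used in this project may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint: Foursquare's legacy v2 "explore" API
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20200601", radius=500, limit=100):
    """Build the query URL for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (YYYYMMDD)
        "ll": f"{lat},{lng}",            # latitude,longitude
        "radius": radius,
        "limit": limit,
    }
    return f"{FOURSQUARE_EXPLORE_URL}?{urlencode(params)}"


# Illustrative coordinates for central London
url = build_explore_url(51.5074, -0.1278, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()`, and the nested response flattened into a dataframe with `json_normalize`.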

The list of boroughs in London will be scraped from the Wikipedia page, and the average property and rental prices per borough will be scraped from Foxtons (a UK estate agency). The data will be visualised using Folium in order to gauge the recommendations generated by our model against average prices for each cluster.

Please see the References section at the end of the notebook for links and descriptions for data sources.

### 2.2. Data Collecting & Cleaning

In [4]:
import pandas as pd
import numpy as np
import json
import csv
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # pandas >= 1.0; replaces the deprecated pandas.io.json import

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium
print('Libraries imported.')

Libraries imported.


In [8]:
ppd_2019 = pd.read_csv('../data/external/pp-2019.csv')
ppd_2019.head()

Unnamed: 0,{8F1B26BD-60CA-53DB-E053-6C04A8C03649},221950,2019-04-26 00:00,TS17 5FF,D,Y,F,3,Unnamed: 8,CARRAWBURGH ROAD,INGLEBY BARWICK,STOCKTON-ON-TEES,STOCKTON-ON-TEES.1,STOCKTON-ON-TEES.2,A,A.1
0,{8F1B26BD-60CB-53DB-E053-6C04A8C03649},246995,2019-03-29 00:00,TS15 9ZH,D,Y,F,16,,GRESLEY CLOSE,,YARM,STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A
1,{8F1B26BD-60CC-53DB-E053-6C04A8C03649},244950,2019-05-17 00:00,TS18 2FN,T,Y,F,13,,INFINITY VIEW,,STOCKTON-ON-TEES,STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A
2,{8F1B26BD-60CD-53DB-E053-6C04A8C03649},139950,2019-05-31 00:00,TS18 2FN,S,Y,F,40,,INFINITY VIEW,,STOCKTON-ON-TEES,STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A
3,{8F1B26BD-60CE-53DB-E053-6C04A8C03649},271995,2019-05-31 00:00,TS15 9FD,D,Y,F,27,,MALLARD DRIVE,,YARM,STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A
4,{8F1B26BD-60CF-53DB-E053-6C04A8C03649},84450,2019-04-26 00:00,TS18 2FD,T,Y,F,117,,DEEPDALE AVENUE,,STOCKTON-ON-TEES,STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A


As noted on the 'How to access HM Land Registry Price Paid Data' page, column headers are not supplied in the file, so pandas has treated the first record as the header row. The headers will need to be added manually.
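One way to add the headers is to pass `header=None` and an explicit `names` list to `pd.read_csv`. The sketch below does this on two records copied from the preview above rather than on the full file. The snake_case column names are hypothetical labels for the fields listed in section 2.1. The final `record_status` name is an assumption for the 16th field visible in the data, which is not in that list.

```python
import io

import pandas as pd

# Hypothetical column labels, in the order the fields appear in the file
PPD_COLUMNS = [
    "transaction_id", "price", "date_of_transfer", "postcode",
    "property_type", "old_new", "duration", "paon", "saon", "street",
    "locality", "town_city", "district", "county",
    "ppd_category_type", "record_status",
]

# Two records from the preview above, standing in for pp-2019.csv
sample_csv = io.StringIO(
    "{8F1B26BD-60CA-53DB-E053-6C04A8C03649},221950,2019-04-26 00:00,TS17 5FF,"
    "D,Y,F,3,,CARRAWBURGH ROAD,INGLEBY BARWICK,STOCKTON-ON-TEES,"
    "STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A\n"
    "{8F1B26BD-60CB-53DB-E053-6C04A8C03649},246995,2019-03-29 00:00,TS15 9ZH,"
    "D,Y,F,16,,GRESLEY CLOSE,,YARM,STOCKTON-ON-TEES,STOCKTON-ON-TEES,A,A\n"
)

# header=None stops pandas from consuming the first record as column names
ppd_sample = pd.read_csv(
    sample_csv,
    header=None,
    names=PPD_COLUMNS,
    parse_dates=["date_of_transfer"],
)
print(ppd_sample[["price", "date_of_transfer", "postcode", "town_city"]])
```

The same `header=None, names=PPD_COLUMNS` arguments could be applied to the `'../data/external/pp-2019.csv'` path used earlier.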

### 2.3. Feature Selection 

## 3. Exploratory Data Analysis (EDA)

### 3.1. Scraping and collecting the data

## References

How to access HM Land Registry Price Paid Data: https://www.gov.uk/guidance/about-the-price-paid-data

Price Paid Data - HM Land Registry: https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads

Average private rental prices per borough: https://data.london.gov.uk/dataset/average-private-rents-borough

Borough property and rental prices - Foxtons: https://www.foxtons.co.uk/living-in/bermondsey

List of London boroughs: https://en.wikipedia.org/wiki/List_of_London_boroughs