# Extracting the average house price at the end of each year

In [1]:
import numpy as np
import pandas as pd

In [2]:
## Load the raw Land Registry data hosted on my GitHub
lhouseprices = pd.read_csv('https://raw.githubusercontent.com/sachahenson/sachahenson.github.io/main/project/Raw%20data/land-registry-house-prices-borough.csv')

In [3]:
## Isolate the date: 'Year ending Dec 1995' splits into 'Year ' and 'Dec 1995'
lhouseprices[['Year ending', 'Year']] = lhouseprices['Year'].str.split('ending ', expand=True)

In [4]:
## Keep the full 'Mon YYYY' label, then split it into month and year
lhouseprices['Month Year'] = lhouseprices['Year']
lhouseprices[['Month', 'Year']] = lhouseprices['Year'].str.split(' ', expand=True)

In [5]:
## Parse 'Mon YYYY' into a datetime (the day defaults to the 1st)
lhouseprices['Month Year format'] = pd.to_datetime(lhouseprices['Month Year'])
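The three cells above turn a label such as `Year ending Dec 1995` into a month, a year and a datetime. A minimal sketch of the same steps on two hypothetical labels, assuming every label follows the `Year ending <Mon> <YYYY>` pattern; the explicit `format='%b %Y'` is an addition not in the notebook, used here to make the expected layout visible:

```python
import pandas as pd

# Hypothetical sample labels in the Land Registry 'Year' format
labels = pd.Series(['Year ending Dec 1995', 'Year ending Mar 1996'])

# Split on 'ending ': part 0 is the 'Year ' prefix, part 1 is '<Mon> <YYYY>'
parts = labels.str.split('ending ', expand=True)
month_year = parts[1]

# Split '<Mon> <YYYY>' into its month abbreviation and its year
month_year_parts = month_year.str.split(' ', expand=True)
month = month_year_parts[0]
year = month_year_parts[1]

# '%b' matches an abbreviated month name and '%Y' a four-digit year;
# with no day in the string, the day defaults to the 1st of the month
dates = pd.to_datetime(month_year, format='%b %Y')

print(month.tolist())   # ['Dec', 'Mar']
print(year.tolist())    # ['1995', '1996']
print(dates.tolist())
```

The year comes out as a string, which is why a later cell converts `Year` with `pd.to_numeric`.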

In [6]:
## Keep only London boroughs by dropping the regional and national aggregate rows
for area in ['North East','North West','Yorkshire And The Humber','East Midlands','West Midlands','East of England','London','South East','South West','England','Wales','England And Wales']:
    lhouseprices = lhouseprices[lhouseprices.Area != area]

## Use the year ending December as a proxy for each calendar year, and the median as the usual measure of 'average' house price.
lhouseprices = lhouseprices[lhouseprices.Month == 'Dec']
lhouseprices = lhouseprices[lhouseprices.Measure=='Median']
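The loop above rebuilds the frame once per excluded area. A single boolean mask built with `Series.isin` does the same in one pass, and the December and median filters can join the same expression. A sketch on a small hypothetical frame (made-up rows and a shortened exclusion list) that checks both approaches keep the same rows:

```python
import pandas as pd

# Hypothetical miniature of the Land Registry table
df = pd.DataFrame({
    'Area':    ['Barnet', 'London', 'Barnet', 'England', 'Brent', 'Brent'],
    'Month':   ['Dec',    'Dec',    'Mar',    'Dec',     'Dec',   'Dec'],
    'Measure': ['Median', 'Median', 'Median', 'Median',  'Median', 'Mean'],
    'Value':   [85125,    74000,    86000,    50000,     68000,   70000],
})

# Shortened list of non-borough aggregates (the notebook uses the full list)
non_boroughs = ['London', 'England']

# Notebook approach: drop one area per loop pass, then filter month and measure
looped = df
for area in non_boroughs:
    looped = looped[looped.Area != area]
looped = looped[(looped.Month == 'Dec') & (looped.Measure == 'Median')]

# One mask: not an aggregate AND year ending December AND median
mask = (~df['Area'].isin(non_boroughs)
        & (df['Month'] == 'Dec')
        & (df['Measure'] == 'Median'))
vectorised = df[mask]

print(vectorised)
```

Only the Barnet December median and the Brent December median rows survive either way.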

In [7]:
## Drop the 'Year ending' column, which only holds the leftover 'Year ' prefix
lhouseprices.drop(columns=['Year ending'], inplace=True)
lhouseprices

,Code,Area,Year,Measure,Value,Month Year,Month,Month Year format
0,E09000001,City of London,1995,Median,105000,Dec 1995,Dec,1995-12-01
1,E09000002,Barking and Dagenham,1995,Median,49000,Dec 1995,Dec,1995-12-01
2,E09000003,Barnet,1995,Median,85125,Dec 1995,Dec,1995-12-01
3,E09000004,Bexley,1995,Median,62000,Dec 1995,Dec,1995-12-01
4,E09000005,Brent,1995,Median,68000,Dec 1995,Dec,1995-12-01
...,...,...,...,...,...,...,...,...
3988,E09000029,Sutton,2017,Median,367000,Dec 2017,Dec,2017-12-01
3989,E09000030,Tower Hamlets,2017,Median,490000,Dec 2017,Dec,2017-12-01
3990,E09000031,Waltham Forest,2017,Median,445000,Dec 2017,Dec,2017-12-01
3991,E09000032,Wandsworth,2017,Median,654000,Dec 2017,Dec,2017-12-01


In [8]:
lhouseprices.to_csv('lhouseprices.csv')

In [9]:
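## Keep only the columns needed for analysis. The log of house value, used to dampen the influence of very large prices, is added later.
## .copy() makes this an independent DataFrame, so later assignments to it do not trigger SettingWithCopyWarning.
lhouseprices_final = lhouseprices[['Area', 'Year', 'Value']].copy()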
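Assigning new values into a column subset of another frame, as the later cells do with `lhouseprices_final`, can trigger pandas' `SettingWithCopyWarning`, because pandas cannot always tell whether the selection is a view of the parent or an independent copy. A minimal sketch with a hypothetical parent frame showing that an explicit `.copy()` can be modified without warnings and leaves the parent untouched:

```python
import warnings

import pandas as pd

# Hypothetical parent frame with string-typed columns, like the raw CSV
parent = pd.DataFrame({'Area': ['Barnet', 'Brent'],
                       'Year': ['2016', '2017'],
                       'Value': ['500,000', '520,000'],
                       'Measure': ['Median', 'Median']})

# An explicit copy is a new, independent DataFrame
subset = parent[['Area', 'Year', 'Value']].copy()

with warnings.catch_warnings():
    # Turn any warning raised inside this block into an error
    warnings.simplefilter('error')
    subset['Year'] = pd.to_numeric(subset['Year'])

print(subset.dtypes)
print(parent['Year'].tolist())  # ['2016', '2017']: the parent is unchanged
```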

In [10]:
lhouseprices_final.dtypes

Area     object
Year     object
Value    object
dtype: object

In [11]:
## Convert years to integer
lhouseprices_final['Year'] = pd.to_numeric(lhouseprices_final['Year'])
lhouseprices_final.dtypes


Area     object
Year      int64
Value    object
dtype: object

In [12]:
## Keep years 2007 onwards. The crime data covers 2008 onwards; keeping 2007 also gives each borough a base year for the 2008 percentage change.
lhouseprices_final = lhouseprices_final[lhouseprices_final.Year >= 2007]
lhouseprices_final

,Area,Year,Value
2160,City of London,2007,416250
2161,Barking and Dagenham,2007,185000
2162,Barnet,2007,295000
2163,Bexley,2007,212000
2164,Brent,2007,280000
...,...,...,...
3988,Sutton,2017,367000
3989,Tower Hamlets,2017,490000
3990,Waltham Forest,2017,445000
3991,Wandsworth,2017,654000


In [13]:
## Sort each borough's rows by year; the per-borough pct_change below depends on row order
lhouseprices_final = lhouseprices_final.sort_values(['Area', 'Year'])


In [14]:
## Strip thousands separators and convert the median price to a number before taking logs
lhouseprices_final['Value']=lhouseprices_final['Value'].str.replace(',','')
lhouseprices_final['Value'] = pd.to_numeric(lhouseprices_final['Value'], downcast='integer')

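Before logging, the price strings need their thousands separators removed so that `pd.to_numeric` can parse them; `downcast='integer'` then stores the result in the smallest integer dtype that holds every value. A sketch with hypothetical comma-formatted prices, assuming the raw column looks like this (the `regex=False` argument is an addition that makes the literal match explicit):

```python
import pandas as pd

# Hypothetical comma-formatted price strings
raw = pd.Series(['416,250', '185,000', '654,000'])

# Remove thousands separators; regex=False treats ',' as a literal character
cleaned = raw.str.replace(',', '', regex=False)

# Parse to numbers; downcast picks the smallest integer dtype that fits
values = pd.to_numeric(cleaned, downcast='integer')

print(values.tolist())  # [416250, 185000, 654000]
print(values.dtype)
```

If the raw CSV stores prices with separators, `pd.read_csv(..., thousands=',')` is another option that parses them as numbers at load time.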


In [15]:
## Year-on-year percentage change in the median price within each borough
lhouseprices_final['pct_ch'] = lhouseprices_final.groupby('Area')['Value'].pct_change() * 100
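`pct_change` compares each row with the row before it, so the result depends on row order; sorting by `Area` and `Year` puts each borough's years in sequence. The grouped method `SeriesGroupBy.pct_change` computes the per-borough change directly and keeps the frame's original index, so it can be assigned as a new column. In recent pandas versions, `groupby(...).apply(pd.Series.pct_change)` may instead return a result that also carries the group key in its index, which does not align with the frame. A sketch on hypothetical prices for two boroughs:

```python
import numpy as np
import pandas as pd

# Hypothetical median prices for two boroughs, deliberately out of order
df = pd.DataFrame({
    'Area':  ['Brent', 'Barnet', 'Brent', 'Barnet', 'Barnet'],
    'Year':  [2008,    2008,     2007,    2007,     2009],
    'Value': [220000,  330000,   200000,  300000,   297000],
})

# Sort so each borough's rows run chronologically
df = df.sort_values(['Area', 'Year'])

# Year-on-year % change within each borough; each borough's first year is NaN
df['pct_ch'] = df.groupby('Area')['Value'].pct_change() * 100

print(df)
```

Barnet rises 10% and then falls 10%, Brent rises 10%, and the 2007 rows are `NaN` because they have no earlier year to compare against.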

In [16]:
## Log average house value
lhouseprices_final['log_value'] = np.log(lhouseprices_final['Value'])

In [17]:
## Percentage change of the log value within each borough
lhouseprices_final['logpct_ch'] = lhouseprices_final.groupby('Area')['log_value'].pct_change() * 100
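Applying `pct_change` to `log_value` measures the relative change in the logarithm itself, `(ln P_t - ln P_{t-1}) / ln P_{t-1}`, which is not a growth rate of prices. If a log-based growth measure is the goal, the usual quantity is the log difference `ln P_t - ln P_{t-1}`, which for small changes is close to the ordinary percentage change. A sketch contrasting the two on a hypothetical 10% rise, assuming the log difference is what is wanted; it uses the grouped `diff` method:

```python
import numpy as np
import pandas as pd

# Hypothetical prices: one borough rising 10% in a year
df = pd.DataFrame({'Area': ['Barnet', 'Barnet'],
                   'Year': [2007, 2008],
                   'Value': [300000, 330000]})
df['log_value'] = np.log(df['Value'])

# What the notebook computes: the % change of the log itself
df['logpct_ch'] = df.groupby('Area')['log_value'].pct_change() * 100

# Log difference x 100: approximately the % change in price
df['logdiff_ch'] = df.groupby('Area')['log_value'].diff() * 100

print(df[['Year', 'logpct_ch', 'logdiff_ch']])
```

With these numbers the log difference is about 9.53, close to the 10% price rise, while the percentage change of the log is below 1.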

In [18]:
## Save
lhouseprices_final.to_csv('loghouseprices08-17.csv')
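By default `to_csv` writes the DataFrame index as an unnamed first column, which is why re-reading a saved file with plain `read_csv` produces an `Unnamed: 0` column. Passing `index_col=0` to `read_csv` restores it as the index; alternatively, `index=False` in `to_csv` omits it. A round-trip sketch using an in-memory buffer in place of a file, with a hypothetical frame:

```python
import io

import pandas as pd

# Hypothetical frame with a non-default index, as left by earlier filtering
df = pd.DataFrame({'Area': ['Barnet', 'Brent'],
                   'Year': [2016, 2017],
                   'Value': [500000, 520000]},
                  index=[3010, 3011])

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # the index comes back as an ordinary column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # the index comes back as the index

print(naive.columns.tolist())     # ['Unnamed: 0', 'Area', 'Year', 'Value']
print(restored.columns.tolist())  # ['Area', 'Year', 'Value']
```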