# London rent prices on Rightmove


In this project, I have created my own dataset by scraping the site Rightmove.
https://www.rightmove.co.uk/

I plan to live in London at some point, sooner or later, so here I aim to gain some insight into what money can buy in the city.

The dataset I've created consists of only 5 columns, and they are:

Price, Location, Type, Link, Bedrooms.

- Price - The price of the property per month.
- Location - The address of the property.
- Type - What type of property it is, e.g. Flat, House, Shared apartment.
- Link - A link that takes you to the property page.
- Bedrooms - Number of bedrooms at the property.

#### Importing the data

In [1]:
import pandas as pd
import numpy as np
import chardet
import re

In [2]:
file_path = r"C:\Users\georg\Desktop\Data Centre\DataPython\Projects\London Rent Prices\rightmove_data_uncleaned.csv"

# Read the raw bytes so chardet can guess the file's encoding
with open(file_path, 'rb') as f:
    content = f.read()
result = chardet.detect(content)
encoding = result['encoding']

print("Detected Encoding:", encoding)
print("Confidence:", result['confidence'])

data = pd.read_csv(file_path, encoding = encoding, header = 0)

print(data.head())


Detected Encoding: Windows-1252
Confidence: 0.73
   £5,250 pcm                Newark House, Loughborough Junction         Flat  \
0  £1,600 pcm                          Bulwer Court Road, London  Ground Flat   
1  £2,900 pcm           Vandome Close, Custom House, London, E16        House   
2  £1,900 pcm  Royal Tower Lodge, Cartwright Street, St Kathe...         Flat   
3  £2,390 pcm                       Arlingford Road, London, SW2     Terraced   
4  £3,500 pcm                    Beechcroft Avenue, London, NW11         Flat   

    /properties/86194638#/?channel=RES_LET    4  
0  /properties/136242707#/?channel=RES_LET  2.0  
1  /properties/135790637#/?channel=RES_LET  4.0  
2  /properties/123122660#/?channel=RES_LET  1.0  
3  /properties/136242695#/?channel=RES_LET  2.0  
4  /properties/136242698#/?channel=RES_LET  4.0  
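In the output above, the first listing (£5,250 pcm, Newark House) appears as the column header, which suggests the CSV has no header row and that `header = 0` uses up one property as labels. A minimal sketch of reading such a file with `header=None` and explicit column names, using a small in-memory CSV as a stand-in for the real file (the two rows are copied from the output above; the no-header layout is an assumption):

```python
import io
import pandas as pd

# Stand-in for the scraped file: two rows, no header line (assumption).
sample_csv = (
    '"£5,250 pcm","Newark House, Loughborough Junction",Flat,'
    '/properties/86194638#/?channel=RES_LET,4\n'
    '"£1,600 pcm","Bulwer Court Road, London",Ground Flat,'
    '/properties/136242707#/?channel=RES_LET,2\n'
)

columns = ["Price", "Location", "Type", "Link", "Bedrooms"]

# header=None tells pandas the first line is data; names supplies the labels.
sample = pd.read_csv(io.StringIO(sample_csv), header=None, names=columns)

print(sample.shape)
print(sample.loc[0, "Price"])
```

Read this way, the first listing stays in the data and the columns are named in one step.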


In [3]:
properties = pd.DataFrame(data)
properties.head()

|   | £5,250 pcm | Newark House, Loughborough Junction | Flat | /properties/86194638#/?channel=RES_LET | 4 |
|---|---|---|---|---|---|
| 0 | £1,600 pcm | Bulwer Court Road, London | Ground Flat | /properties/136242707#/?channel=RES_LET | 2.0 |
| 1 | £2,900 pcm | Vandome Close, Custom House, London, E16 | House | /properties/135790637#/?channel=RES_LET | 4.0 |
| 2 | £1,900 pcm | Royal Tower Lodge, Cartwright Street, St Kathe... | Flat | /properties/123122660#/?channel=RES_LET | 1.0 |
| 3 | £2,390 pcm | Arlingford Road, London, SW2 | Terraced | /properties/136242695#/?channel=RES_LET | 2.0 |
| 4 | £3,500 pcm | Beechcroft Avenue, London, NW11 | Flat | /properties/136242698#/?channel=RES_LET | 4.0 |


In [4]:
columns = ["Price", "Location", "Type", "Link", "Bedrooms"]
properties.columns = columns
properties.head()

|   | Price | Location | Type | Link | Bedrooms |
|---|---|---|---|---|---|
| 0 | £1,600 pcm | Bulwer Court Road, London | Ground Flat | /properties/136242707#/?channel=RES_LET | 2.0 |
| 1 | £2,900 pcm | Vandome Close, Custom House, London, E16 | House | /properties/135790637#/?channel=RES_LET | 4.0 |
| 2 | £1,900 pcm | Royal Tower Lodge, Cartwright Street, St Kathe... | Flat | /properties/123122660#/?channel=RES_LET | 1.0 |
| 3 | £2,390 pcm | Arlingford Road, London, SW2 | Terraced | /properties/136242695#/?channel=RES_LET | 2.0 |
| 4 | £3,500 pcm | Beechcroft Avenue, London, NW11 | Flat | /properties/136242698#/?channel=RES_LET | 4.0 |


#### Data cleaning

To do:

1. Check for null values and decide what to do with them if found.
2. Price - strip the "£", "pcm" and comma characters and convert to integer values.
3. Location - Extract postcode into a different column called "Postcode".
4. Link - Add "rightmove.co.uk" to the beginning of each value.
5. Bedrooms - convert to integer value.

1. Null values

In [5]:
def null_values(properties):
    null_count = properties.isnull().sum()
    null_data = pd.DataFrame({'Column': null_count.index, 'Null values': null_count.values})
    return null_data

In [6]:
null = null_values(properties)
print(null)

     Column  Null values
0     Price            0
1  Location            0
2      Type            0
3      Link            0
4  Bedrooms           20


In [7]:
properties.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1024 entries, 0 to 1023
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Price     1024 non-null   object 
 1   Location  1024 non-null   object 
 2   Type      1024 non-null   object 
 3   Link      1024 non-null   object 
 4   Bedrooms  1004 non-null   float64
dtypes: float64(1), object(4)
memory usage: 40.1+ KB


In [8]:
properties.dropna(inplace = True, axis = 0)

In [9]:
null_values(properties)

|   | Column | Null values |
|---|---|---|
| 0 | Price | 0 |
| 1 | Location | 0 |
| 2 | Type | 0 |
| 3 | Link | 0 |
| 4 | Bedrooms | 0 |
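Before dropping rows, it can help to see which listings lack a bedroom count. A small sketch on made-up rows (the `demo` frame and its values are hypothetical, not taken from the dataset), which also restricts `dropna` to the `Bedrooms` column via `subset`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only.
demo = pd.DataFrame({
    "Type": ["Flat", "Studio", "House", "Studio"],
    "Bedrooms": [2.0, np.nan, 4.0, np.nan],
})

# Which property types are missing a bedroom count?
missing_by_type = demo.loc[demo["Bedrooms"].isna(), "Type"].value_counts()
print(missing_by_type)

# Drop only the rows where Bedrooms is null, returning a new frame.
demo_clean = demo.dropna(subset=["Bedrooms"])
print(len(demo_clean))
```

If the missing rows cluster in one property type, that may be a reason to fill them in rather than drop them.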


2. Price

In [10]:
properties["Price"] = properties["Price"].str.replace("£", "").str.replace("pcm", "").str.replace(",","")
properties["Price"] = properties["Price"].astype(int)

In [11]:
properties["Price"]

0       1600
1       2900
2       1900
3       2390
4       3500
        ... 
1019    2708
1020    1550
1021    4914
1022    1250
1023    3450
Name: Price, Length: 1004, dtype: int32
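The chained `str.replace` calls work as long as every price looks like `£1,600 pcm`. A slightly more defensive sketch, assuming some rows might hold other text: remove every non-digit character with a regular expression and let `pd.to_numeric` turn anything unparseable into a null instead of raising. The sample values, including "POA", are illustrative, and the approach assumes prices are whole pounds.

```python
import pandas as pd

# Illustrative values; "POA" stands in for any non-numeric entry.
prices = pd.Series(["£1,600 pcm", "£2,900 pcm", "POA"])

# Remove everything that is not a digit; "POA" becomes an empty string.
digits = prices.str.replace(r"\D", "", regex=True)

# errors="coerce" maps unparseable values to NaN instead of raising.
# The result is a float column because NaN cannot live in a plain int column.
parsed = pd.to_numeric(digits, errors="coerce")

print(int(parsed.isna().sum()))
```

Rows that fail to parse can then be inspected or dropped before the final cast to `int`.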

3. Location

In [12]:
def extract_postcode(text):
    pattern = r"London,\s*([A-Za-z0-9]+)"
    match = re.search(pattern, text)
    if match:
        return match.group(1)
    else:
        return pd.NA

properties["Postcode"] = properties["Location"].apply(extract_postcode)


In [13]:
properties.head()

|   | Price | Location | Type | Link | Bedrooms | Postcode |
|---|---|---|---|---|---|---|
| 0 | 1600 | Bulwer Court Road, London | Ground Flat | /properties/136242707#/?channel=RES_LET | 2.0 | `<NA>` |
| 1 | 2900 | Vandome Close, Custom House, London, E16 | House | /properties/135790637#/?channel=RES_LET | 4.0 | E16 |
| 2 | 1900 | Royal Tower Lodge, Cartwright Street, St Kathe... | Flat | /properties/123122660#/?channel=RES_LET | 1.0 | `<NA>` |
| 3 | 2390 | Arlingford Road, London, SW2 | Terraced | /properties/136242695#/?channel=RES_LET | 2.0 | SW2 |
| 4 | 3500 | Beechcroft Avenue, London, NW11 | Flat | /properties/136242698#/?channel=RES_LET | 4.0 | NW11 |


In [14]:
null_values(properties)

|   | Column | Null values |
|---|---|---|
| 0 | Price | 0 |
| 1 | Location | 0 |
| 2 | Type | 0 |
| 3 | Link | 0 |
| 4 | Bedrooms | 0 |
| 5 | Postcode | 446 |


In [15]:
properties.drop(columns = "Postcode", inplace = True) # Too many null values to be deemed viable
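The pattern in `extract_postcode` only matches when the postcode district directly follows the literal `London,`. Addresses that carry a district without that prefix would be missed; the output above does not show how many of the 446 nulls that accounts for. A sketch of a looser pattern that looks for a district-shaped token (one or two letters, a digit, then an optional digit or letter) at the end of the address. The helper name and the assumption about how addresses end are mine, and addresses that simply omit the postcode will still come back null:

```python
import re
import pandas as pd

# Hypothetical helper: district-shaped token at the end of the address,
# optionally followed by the inward part of a full postcode (e.g. "9XY").
OUTWARD_AT_END = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)(?:\s*\d[A-Z]{2})?\s*$")

def extract_outward_code(text):
    match = OUTWARD_AT_END.search(text)
    return match.group(1) if match else pd.NA

# Addresses copied from the output above.
addresses = pd.Series([
    "Vandome Close, Custom House, London, E16",
    "Arlingford Road, London, SW2",
    "Bulwer Court Road, London",
])
codes = addresses.apply(extract_outward_code)
print(codes.tolist())
```

Re-running the null count with a pattern like this would show whether the column becomes complete enough to keep.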

In [16]:
properties.head()

|   | Price | Location | Type | Link | Bedrooms |
|---|---|---|---|---|---|
| 0 | 1600 | Bulwer Court Road, London | Ground Flat | /properties/136242707#/?channel=RES_LET | 2.0 |
| 1 | 2900 | Vandome Close, Custom House, London, E16 | House | /properties/135790637#/?channel=RES_LET | 4.0 |
| 2 | 1900 | Royal Tower Lodge, Cartwright Street, St Kathe... | Flat | /properties/123122660#/?channel=RES_LET | 1.0 |
| 3 | 2390 | Arlingford Road, London, SW2 | Terraced | /properties/136242695#/?channel=RES_LET | 2.0 |
| 4 | 3500 | Beechcroft Avenue, London, NW11 | Flat | /properties/136242698#/?channel=RES_LET | 4.0 |


4. Link

In [17]:
domain = "rightmove.co.uk"
properties["Link"] = domain + properties["Link"]

In [18]:
properties.head()

|   | Price | Location | Type | Link | Bedrooms |
|---|---|---|---|---|---|
| 0 | 1600 | Bulwer Court Road, London | Ground Flat | rightmove.co.uk/properties/136242707#/?channel... | 2.0 |
| 1 | 2900 | Vandome Close, Custom House, London, E16 | House | rightmove.co.uk/properties/135790637#/?channel... | 4.0 |
| 2 | 1900 | Royal Tower Lodge, Cartwright Street, St Kathe... | Flat | rightmove.co.uk/properties/123122660#/?channel... | 1.0 |
| 3 | 2390 | Arlingford Road, London, SW2 | Terraced | rightmove.co.uk/properties/136242695#/?channel... | 2.0 |
| 4 | 3500 | Beechcroft Avenue, London, NW11 | Flat | rightmove.co.uk/properties/136242698#/?channel... | 4.0 |
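Without a scheme, `rightmove.co.uk/properties/...` is not treated as a clickable URL by most tools. A sketch using the standard library's `urljoin` with the site address given at the top of this project; the `relative_links` values are copied from the earlier output:

```python
from urllib.parse import urljoin
import pandas as pd

BASE_URL = "https://www.rightmove.co.uk/"

relative_links = pd.Series([
    "/properties/136242707#/?channel=RES_LET",
    "/properties/135790637#/?channel=RES_LET",
])

# urljoin resolves each site-relative path against the base address.
full_links = relative_links.apply(lambda path: urljoin(BASE_URL, path))
print(full_links[0])
```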


5. Bedrooms

In [19]:
properties["Bedrooms"] = properties["Bedrooms"].astype(int)

In [20]:
properties["Bedrooms"]

0       2
1       4
2       1
3       2
4       4
       ..
1019    2
1020    1
1021    1
1022    1
1023    2
Name: Bedrooms, Length: 1004, dtype: int32
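`astype(int)` works here because the 20 null rows were dropped earlier; a float column that still contains `NaN` cannot be cast to a plain integer. If those rows were to be kept instead, pandas' nullable `Int64` dtype is one option. A sketch on illustrative values:

```python
import numpy as np
import pandas as pd

# Illustrative values, not taken from the dataset.
bedrooms = pd.Series([2.0, 4.0, np.nan, 1.0])

# Capital-I "Int64" is the nullable integer dtype; NaN becomes <NA>.
bedrooms_int = bedrooms.astype("Int64")

print(bedrooms_int.dtype)
print(int(bedrooms_int.isna().sum()))
```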

In [21]:
properties.head()

|   | Price | Location | Type | Link | Bedrooms |
|---|---|---|---|---|---|
| 0 | 1600 | Bulwer Court Road, London | Ground Flat | rightmove.co.uk/properties/136242707#/?channel... | 2 |
| 1 | 2900 | Vandome Close, Custom House, London, E16 | House | rightmove.co.uk/properties/135790637#/?channel... | 4 |
| 2 | 1900 | Royal Tower Lodge, Cartwright Street, St Kathe... | Flat | rightmove.co.uk/properties/123122660#/?channel... | 1 |
| 3 | 2390 | Arlingford Road, London, SW2 | Terraced | rightmove.co.uk/properties/136242695#/?channel... | 2 |
| 4 | 3500 | Beechcroft Avenue, London, NW11 | Flat | rightmove.co.uk/properties/136242698#/?channel... | 4 |


In [22]:
properties.to_csv(r"C:\Users\georg\Desktop\Data Centre\DataPython\Projects\London Rent Prices\rightmove_data.csv", 
                  encoding = "UTF-8")
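As written, `to_csv` also writes the DataFrame index as an unnamed first column, and after `dropna` that index has gaps. If the index is not wanted in the file, `index = False` leaves it out. A sketch that writes to an in-memory buffer rather than the real path, with a made-up `cleaned` frame:

```python
import io
import pandas as pd

# Hypothetical frame with a gappy index, like one left behind by dropna.
cleaned = pd.DataFrame(
    {"Price": [1600, 2900], "Bedrooms": [2, 4]},
    index=[0, 5],
)

buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)  # omit the index column

# Read it back to confirm only the real columns were written.
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))
```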