# Conducting Research on the Lagos State Housing Property Market

![Lagos housing estate](../image/Lagos_housing_estate.jpg "Lagos housing estate")

## Problem Statement
The real estate sector in Lagos, Nigeria, is a rapidly growing and dynamic market shaped by a complex mix of factors, including population growth, urbanization, economic development, and policy changes. However, this growth brings significant challenges, including poor housing affordability, limited access to reliable data on property prices, and a lack of market transparency.

This project aims to close this data gap by systematically scraping rental listings from the [Nigeria Property Centre](https://nigeriapropertycentre.com) website and then cleaning and analyzing the data. It focuses on identifying trends, patterns, and anomalies in the housing market, particularly in pricing, availability, and the distribution of property types across different areas of Lagos.

By providing a detailed, data-driven overview of the Lagos housing market, this project aims to help stakeholders make informed decisions and to support the development of more equitable and efficient housing policies. Ultimately, this research seeks to improve transparency and accessibility in the Lagos real estate market, supporting sustainable urban development and better housing affordability for residents of Lagos State.

## Tools and Libraries
- Python programming language
- Pandas
- BeautifulSoup
- Requests
- Time
- re (regular expressions)

## Import Libraries

In [1]:
# import libraries
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
import re

In [6]:

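from typing import Optional


def make_requests(base_url: str, page_num: int) -> Optional[bytes]:
    """Fetch one listing page and return its raw HTML, or None if the request fails."""
    page_url = f"{base_url}?page={page_num}"
    # timeout so a stalled page cannot hang the whole loop
    res = requests.get(page_url, timeout=30)
    if res.status_code == 200:
        return res.content
    # non-200 response: report it and let the caller skip this page
    print(f"Page {page_num} could not be fetched (HTTP {res.status_code})")
    return None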
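The scraping loop below requests over a thousand pages in a row with its `time.sleep` call commented out, so a single transient failure silently costs a page of listings. A minimal retry-with-backoff sketch follows; `fetch_with_retries` and its parameters are hypothetical, and it assumes the request function signals failure by returning `None`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], Optional[T]],
    retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[T]:
    """Hypothetical helper: call `fetch` until it returns something other than None.

    `fetch` is any zero-argument callable that returns None on failure.
    """
    for attempt in range(retries):
        result = fetch()
        if result is not None:
            return result
        if attempt < retries - 1:
            # exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
    return None
```

For example, `fetch_with_retries(lambda: make_requests(BASE_URL, page_num))` would try a page up to three times, waiting 1 s and then 2 s between attempts.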

## Scraping Activities

In [20]:
def get_text(parent, tag, class_=None):
    """Return the text of the first matching child element, or "" if it is missing."""
    element = parent.find(tag, class_=class_) if class_ else parent.find(tag)
    return element.text if element is not None else ""


if "__main__" == __name__:
    BASE_URL = "https://nigeriapropertycentre.com/for-rent/lagos"
    data = []  # one dict per scraped listing

    # 1158 pages; range() excludes its upper bound, so stop at 1159
    for page_num in range(1, 1159):
        response = make_requests(BASE_URL, page_num)
        print(f"Collecting data from page {page_num} ...")
        if response is None:
            continue  # skip pages that failed to load
        # time.sleep(3)  # uncomment to throttle requests between pages

        # parse the page and grab every listing card
        page_soup = BeautifulSoup(response, "html.parser")
        property_container = page_soup.find_all("div", class_="row property-list")

        for listing in property_container:
            # get property details; missing elements become empty strings
            details = {
                "title": get_text(listing, "h3"),
                "type": get_text(listing, "h4", "content-title"),
                "address": get_text(listing, "address", "voffset-bottom-10"),
                "description": get_text(listing, "div", "description hidden-xs"),
                "price": get_text(listing, "span", "pull-sm-left"),
                "extra": get_text(listing, "ul", "aux-info"),
            }
            data.append(details)

    print(f"Done collecting {len(data)} property listings.")

    # convert data to DataFrame
    property_df = pd.DataFrame(data)

Collecting data from page 1 ...
Collecting data from page 2 ...
Collecting data from page 3 ...
...

In [22]:
# save to csv
property_df.to_csv("../raw_data/house_unclean_data.csv", index=False)

## Data Cleaning

## Load Data

In [2]:
property_df = pd.read_csv("../raw_data/house_unclean_data.csv")
property_df

,title,type,address,description,price,extra
0,3 Bedroom Apartment In Ologolo,3 bedroom flat / apartment for rent,"Ologolo, Lekki, Lagos",\n\nSelf serviced\n3bedroom fr rent in ologolo...,"\n₦3,500,000 per annum",3 Bedrooms3 Bathrooms3 Toilets Save
1,Luxury 2 Bedroom Apartment,2 bedroom flat / apartment for rent,"Newtown Estate, Ogombo, Ajah, Lagos",\n\nBrand new\ngood water\nluxury home\nfitted...,"\n₦1,300,000 per annum",2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...
2,Well Maintained 5 Bedroom Detached Duplex With Bq,5 bedroom detached duplex for rent,"Lekki Phase 1, Lekki, Lagos",\n\nWell maintained 5 bedroom detached duplex ...,"\n₦20,000,000 per annum",5 Bedrooms5 Bathrooms6 Toilets3 Parking Spaces...
3,Refurbished 3 Bedroom Luxury Flat With Bq,3 bedroom flat / apartment for rent,"Lekki Phase 1, Lekki, Lagos",\n\nRefurbished 3 bedroom luxury flats with bq...,"\n₦8,500,000 per annum",3 Bedrooms3 Bathrooms3 Toilets3 Parking Spaces...
4,Slick 2 Bedroom Apartment In Ologolo,2 bedroom flat / apartment for rent,"Ologolo, Lekki, Lagos",\n\n2 bedroom apartmemt for rent\nrent 2.8m\ns...,"\n₦2,800,000 per annum",2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...
...,...,...,...,...,...,...
22419,A Fully Furnished 4 Bedroom Semi-detached Duplex,4 bedroom semi-detached duplex for rent,"Spg Road, Igbo-efon, Lekki, Lagos",\n\nA fully furnished 4 bedroom semi-detached ...,"\n₦4,000,000 per annum",4 Bedrooms4 Bathrooms5 Toilets3 Parking Spaces...
22420,Furnished 3 Bedroom Flat,3 bedroom flat / apartment for rent,"Coral Beach Estate, Lekki Free Trade Zone, L...",\n\nFurnished 3 bedroom apartment at coral bea...,"\n₦6,000,000 per annum",3 Bedrooms4 Bathrooms4 Toilets Save
22421,Executive All Rooms En-suite 3 Bedroom,3 bedroom terraced duplex for rent,"Lekki Gardens Phase 2, Ajah, Lagos",\n\nExecutive all rooms 3 bedroom corner piece...,"\n₦2,200,000 per annum",3 Bedrooms3 Bathrooms4 Toilets4 Parking Spaces...
22422,"Shop Space, Upstairs",Shop for rent,"Adebayo Doherty Road, Lekki Phase 1, Lekki, ...",\n\nShop space upstairs on adebayo doherty roa...,"\n₦2,500,000 per annum",1 Toilet Save


## Data Inspection

In [3]:
property_df.head()

,title,type,address,description,price,extra
0,3 Bedroom Apartment In Ologolo,3 bedroom flat / apartment for rent,"Ologolo, Lekki, Lagos",\n\nSelf serviced\n3bedroom fr rent in ologolo...,"\n₦3,500,000 per annum",3 Bedrooms3 Bathrooms3 Toilets Save
1,Luxury 2 Bedroom Apartment,2 bedroom flat / apartment for rent,"Newtown Estate, Ogombo, Ajah, Lagos",\n\nBrand new\ngood water\nluxury home\nfitted...,"\n₦1,300,000 per annum",2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...
2,Well Maintained 5 Bedroom Detached Duplex With Bq,5 bedroom detached duplex for rent,"Lekki Phase 1, Lekki, Lagos",\n\nWell maintained 5 bedroom detached duplex ...,"\n₦20,000,000 per annum",5 Bedrooms5 Bathrooms6 Toilets3 Parking Spaces...
3,Refurbished 3 Bedroom Luxury Flat With Bq,3 bedroom flat / apartment for rent,"Lekki Phase 1, Lekki, Lagos",\n\nRefurbished 3 bedroom luxury flats with bq...,"\n₦8,500,000 per annum",3 Bedrooms3 Bathrooms3 Toilets3 Parking Spaces...
4,Slick 2 Bedroom Apartment In Ologolo,2 bedroom flat / apartment for rent,"Ologolo, Lekki, Lagos",\n\n2 bedroom apartmemt for rent\nrent 2.8m\ns...,"\n₦2,800,000 per annum",2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...


In [4]:
property_df.tail()

,title,type,address,description,price,extra
22419,A Fully Furnished 4 Bedroom Semi-detached Duplex,4 bedroom semi-detached duplex for rent,"Spg Road, Igbo-efon, Lekki, Lagos",\n\nA fully furnished 4 bedroom semi-detached ...,"\n₦4,000,000 per annum",4 Bedrooms4 Bathrooms5 Toilets3 Parking Spaces...
22420,Furnished 3 Bedroom Flat,3 bedroom flat / apartment for rent,"Coral Beach Estate, Lekki Free Trade Zone, L...",\n\nFurnished 3 bedroom apartment at coral bea...,"\n₦6,000,000 per annum",3 Bedrooms4 Bathrooms4 Toilets Save
22421,Executive All Rooms En-suite 3 Bedroom,3 bedroom terraced duplex for rent,"Lekki Gardens Phase 2, Ajah, Lagos",\n\nExecutive all rooms 3 bedroom corner piece...,"\n₦2,200,000 per annum",3 Bedrooms3 Bathrooms4 Toilets4 Parking Spaces...
22422,"Shop Space, Upstairs",Shop for rent,"Adebayo Doherty Road, Lekki Phase 1, Lekki, ...",\n\nShop space upstairs on adebayo doherty roa...,"\n₦2,500,000 per annum",1 Toilet Save
22423,A Very Spacious Commercial Space,Plaza / complex / mall for rent,"Admiralty Way, Lekki Phase 1, Lekki, Lagos",\n\nVery spacious commercial shop on admiralty...,"\n₦3,000,000 per annum",1 Bathroom1 Toilet10 Parking Spaces45 sqm Tota...


In [5]:
property_df.sample(3)

,title,type,address,description,price,extra
4896,3 Bedroom Luxury Apartments,3 bedroom flat / apartment for rent,"Off Kingsway Road, Ikoyi, Lagos",\n\nLuxury 3-bedroom apartments for rent in ik...,"\n$44,000 per annum \napprox. ₦69,381,394\n",3 Bedrooms3 Bathrooms4 Toilets Save
11096,3 Bedroom Flat,3 bedroom flat / apartment for rent,"Ikota Villa Estate, Ikota, Lekki, Lagos","\n\n3 bedroom flat for rent\nprice: ₦2,500,000...","\n₦2,500,000 per annum",3 Bedrooms3 Bathrooms4 Toilets Save
16717,"Open Plan Office Space (460sqm Per Floor, Tota...",Office space for rent,"Victoria Island (VI), Lagos",\n\nFor lease:\ndescription: open plan office ...,"\n₦150,000 per square meter / per annum",Save


In [6]:
property_df.columns

Index(['title', 'type', 'address', 'description', 'price', 'extra'], dtype='object')

In [7]:
property_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22424 entries, 0 to 22423
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   title        22424 non-null  object
 1   type         22424 non-null  object
 2   address      22424 non-null  object
 3   description  22424 non-null  object
 4   price        22424 non-null  object
 5   extra        22424 non-null  object
dtypes: object(6)
memory usage: 1.0+ MB


In [8]:
property_df.shape

(22424, 6)

In [9]:
# check duplicates
property_df.duplicated().sum()

177

In [10]:
# drop duplicates
property_df.drop_duplicates(inplace=True)

In [11]:
# confirm duplicates have been removed
property_df.duplicated().sum()

0

## Cleaning Text and Price Columns

In [12]:
# strip leading and trailing whitespace
property_df["title"] = property_df["title"].str.strip()

In [15]:
property_df["title"].nunique()

14858

In [17]:
# normalize type labels: drop "for rent", the word "bedroom" and the leading bedroom count
property_df["type"] = property_df["type"].str.replace("for rent", "").str.replace("bedroom", "")
property_df["type"] = property_df["type"].str.replace(r"^\d+\s*", "", regex=True)
property_df["type"] = property_df["type"].str.strip()
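To see what the chained string operations in the cell above do to a single value, here is the same sequence applied to plain Python strings. `clean_type` is a hypothetical helper for illustration only; the notebook itself works on the whole column.

```python
import re


def clean_type(raw_type: str) -> str:
    """Hypothetical helper mirroring the pandas chain above on one string."""
    cleaned = raw_type.replace("for rent", "").replace("bedroom", "")
    cleaned = re.sub(r"^\d+\s*", "", cleaned)  # drop the leading bedroom count
    return cleaned.strip()


print(clean_type("3 bedroom flat / apartment for rent"))  # flat / apartment
print(clean_type("Shop for rent"))                        # Shop
```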

In [18]:
# strip leading and trailing whitespace
property_df["address"] = property_df["address"].str.strip()

In [19]:
# strip surrounding whitespace and turn paragraph breaks into single spaces
property_df["description"] = property_df["description"].str.strip()
property_df["description"] = property_df["description"].str.replace("\n\n", " ")

In [20]:
# strip commas, slashes, the naira sign and unit text so the price can be parsed as a number
property_df["price"] = property_df["price"].str.replace(",", "")
property_df["price"] = property_df["price"].str.replace("/", "")
property_df["price"] = property_df["price"].str.replace("₦", "")
property_df["price"] = property_df["price"].str.replace("per annum", "")
# caution: once "per square meter" is removed, per-square-meter prices look like whole-property prices
property_df["price"] = property_df["price"].str.replace("per square meter", "")
property_df["price"] = property_df["price"].str.strip()
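Stripping the unit text discards information: a listing priced `₦150,000 per square meter / per annum` ends up as `150000`, indistinguishable from a whole-property annual rent. One option, sketched here under the assumption that units always appear as `per <unit>` text in the raw price, is to record the unit in a separate column before the cleaning cell runs. `price_unit`, `PERIOD_PATTERN` and the list of units are hypothetical.

```python
import re

# Hypothetical unit list; extend it if other "per ..." suffixes appear in the data.
PERIOD_PATTERN = re.compile(
    r"per\s+(square\s+(?:meter|foot)|annum|month|day|hour)", re.IGNORECASE
)


def price_unit(raw_price: str) -> str:
    """Return the first 'per ...' unit found in a raw price string, or 'unknown'."""
    match = PERIOD_PATTERN.search(raw_price)
    return f"per {match.group(1).lower()}" if match else "unknown"
```

Applied to the raw column, e.g. `property_df["price_unit"] = property_df["price"].apply(price_unit)`, this would let later steps keep only `"per annum"` rows when comparing rents.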

In [21]:
property_df.head()

,title,type,address,description,price,extra
0,3 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",Self serviced\n3bedroom fr rent in ologolo\nre...,3500000,3 Bedrooms3 Bathrooms3 Toilets Save
1,Luxury 2 Bedroom Apartment,flat / apartment,"Newtown Estate, Ogombo, Ajah, Lagos",Brand new\ngood water\nluxury home\nfitted kit...,1300000,2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...
2,Well Maintained 5 Bedroom Detached Duplex With Bq,detached duplex,"Lekki Phase 1, Lekki, Lagos",Well maintained 5 bedroom detached duplex with...,20000000,5 Bedrooms5 Bathrooms6 Toilets3 Parking Spaces...
3,Refurbished 3 Bedroom Luxury Flat With Bq,flat / apartment,"Lekki Phase 1, Lekki, Lagos",Refurbished 3 bedroom luxury flats with bq ava...,8500000,3 Bedrooms3 Bathrooms3 Toilets3 Parking Spaces...
4,Slick 2 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",2 bedroom apartmemt for rent\nrent 2.8m\nservi...,2800000,2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...


## Feature Engineering

In [22]:
# Feature engineering
# parse the "extra" column into bedroom, bathroom, toilet and parking-space counts
def extract_extra_features(text):
    features = {
        "bedrooms" : 0,
        "bathrooms" : 0,
        "toilets" : 0,
        "parking_spaces" : 0
    }

    patterns = {
        "bedrooms" : r"(\d+)\s*Bedrooms?",
        "bathrooms" : r"(\d+)\s*Bathrooms?",
        "toilets" : r"(\d+)\s*Toilets?",
        "parking_spaces" : r"(\d+)\s*Parking\s*Spaces?"
    }

    for feature, pattern in patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            features[feature] = int(match.group(1))
    return features
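`extract_extra_features` runs once per row through `apply`. A vectorized alternative, sketched below, runs one `Series.str.extract` per pattern over the whole column; `extract_facilities` and `PATTERNS` are hypothetical names, and the patterns are copied from the function above. They still work on glued-together text such as `2 Bedrooms2 Bathrooms...` because they are not anchored to word boundaries.

```python
import re

import pandas as pd

PATTERNS = {
    "bedrooms": r"(\d+)\s*Bedrooms?",
    "bathrooms": r"(\d+)\s*Bathrooms?",
    "toilets": r"(\d+)\s*Toilets?",
    "parking_spaces": r"(\d+)\s*Parking\s*Spaces?",
}


def extract_facilities(extra: pd.Series) -> pd.DataFrame:
    """Hypothetical vectorized version of extract_extra_features."""
    columns = {}
    for name, pattern in PATTERNS.items():
        counts = extra.str.extract(pattern, flags=re.IGNORECASE, expand=False)
        # rows without a match come back as NaN; treat them as 0 like the original helper
        columns[name] = pd.to_numeric(counts, errors="coerce").fillna(0).astype(int)
    return pd.DataFrame(columns, index=extra.index)


sample = pd.Series(["2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces Save", "1 Toilet Save"])
print(extract_facilities(sample))
```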

In [23]:
property_df

,title,type,address,description,price,extra
0,3 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",Self serviced\n3bedroom fr rent in ologolo\nre...,3500000,3 Bedrooms3 Bathrooms3 Toilets Save
1,Luxury 2 Bedroom Apartment,flat / apartment,"Newtown Estate, Ogombo, Ajah, Lagos",Brand new\ngood water\nluxury home\nfitted kit...,1300000,2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...
2,Well Maintained 5 Bedroom Detached Duplex With Bq,detached duplex,"Lekki Phase 1, Lekki, Lagos",Well maintained 5 bedroom detached duplex with...,20000000,5 Bedrooms5 Bathrooms6 Toilets3 Parking Spaces...
3,Refurbished 3 Bedroom Luxury Flat With Bq,flat / apartment,"Lekki Phase 1, Lekki, Lagos",Refurbished 3 bedroom luxury flats with bq ava...,8500000,3 Bedrooms3 Bathrooms3 Toilets3 Parking Spaces...
4,Slick 2 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",2 bedroom apartmemt for rent\nrent 2.8m\nservi...,2800000,2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...
...,...,...,...,...,...,...
22419,A Fully Furnished 4 Bedroom Semi-detached Duplex,semi-detached duplex,"Spg Road, Igbo-efon, Lekki, Lagos",A fully furnished 4 bedroom semi-detached dupl...,4000000,4 Bedrooms4 Bathrooms5 Toilets3 Parking Spaces...
22420,Furnished 3 Bedroom Flat,flat / apartment,"Coral Beach Estate, Lekki Free Trade Zone, Lek...",Furnished 3 bedroom apartment at coral beach e...,6000000,3 Bedrooms4 Bathrooms4 Toilets Save
22421,Executive All Rooms En-suite 3 Bedroom,terraced duplex,"Lekki Gardens Phase 2, Ajah, Lagos",Executive all rooms 3 bedroom corner piece ter...,2200000,3 Bedrooms3 Bathrooms4 Toilets4 Parking Spaces...
22422,"Shop Space, Upstairs",Shop,"Adebayo Doherty Road, Lekki Phase 1, Lekki, Lagos",Shop space upstairs on adebayo doherty road le...,2500000,1 Toilet Save


In [24]:
facilities = property_df["extra"].apply(extract_extra_features).apply(pd.Series)
facilities

,bedrooms,bathrooms,toilets,parking_spaces
0,3,3,3,0
1,2,2,3,2
2,5,5,6,3
3,3,3,3,3
4,2,2,3,2
...,...,...,...,...
22419,4,4,5,3
22420,3,4,4,0
22421,3,3,4,4
22422,0,0,1,0


In [25]:
# concatenate the dataframe
property_df = pd.concat([property_df, facilities], axis=1)
property_df.head()

,title,type,address,description,price,extra,bedrooms,bathrooms,toilets,parking_spaces
0,3 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",Self serviced\n3bedroom fr rent in ologolo\nre...,3500000,3 Bedrooms3 Bathrooms3 Toilets Save,3,3,3,0
1,Luxury 2 Bedroom Apartment,flat / apartment,"Newtown Estate, Ogombo, Ajah, Lagos",Brand new\ngood water\nluxury home\nfitted kit...,1300000,2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...,2,2,3,2
2,Well Maintained 5 Bedroom Detached Duplex With Bq,detached duplex,"Lekki Phase 1, Lekki, Lagos",Well maintained 5 bedroom detached duplex with...,20000000,5 Bedrooms5 Bathrooms6 Toilets3 Parking Spaces...,5,5,6,3
3,Refurbished 3 Bedroom Luxury Flat With Bq,flat / apartment,"Lekki Phase 1, Lekki, Lagos",Refurbished 3 bedroom luxury flats with bq ava...,8500000,3 Bedrooms3 Bathrooms3 Toilets3 Parking Spaces...,3,3,3,3
4,Slick 2 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",2 bedroom apartmemt for rent\nrent 2.8m\nservi...,2800000,2 Bedrooms2 Bathrooms3 Toilets2 Parking Spaces...,2,2,3,2
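`pd.concat(..., axis=1)` lines rows up by index label, not by position. That is why the concatenation above is safe: `facilities` was built from `property_df["extra"]`, so it carries the same index, including the gaps left by `drop_duplicates`. The toy frames below (hypothetical data) show the aligned case and what happens if one side's index is reset.

```python
import pandas as pd

# Toy frames sharing an index with a gap, as drop_duplicates() leaves behind
listings = pd.DataFrame({"price": [3500000, 1300000, 2800000]}, index=[0, 1, 4])
facilities = pd.DataFrame({"bedrooms": [3, 2, 2]}, index=[0, 1, 4])

combined = pd.concat([listings, facilities], axis=1)  # aligned on index labels
print(combined.shape)  # (3, 2)

# Resetting one side's index breaks the alignment and introduces NaN rows
misaligned = pd.concat([listings, facilities.reset_index(drop=True)], axis=1)
print(misaligned.shape)  # (4, 2): labels 2 and 4 each exist on only one side
```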


In [26]:
# delete extra column
property_df.drop(columns=["extra"], inplace=True)

In [27]:
property_df.head()

,title,type,address,description,price,bedrooms,bathrooms,toilets,parking_spaces
0,3 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",Self serviced\n3bedroom fr rent in ologolo\nre...,3500000,3,3,3,0
1,Luxury 2 Bedroom Apartment,flat / apartment,"Newtown Estate, Ogombo, Ajah, Lagos",Brand new\ngood water\nluxury home\nfitted kit...,1300000,2,2,3,2
2,Well Maintained 5 Bedroom Detached Duplex With Bq,detached duplex,"Lekki Phase 1, Lekki, Lagos",Well maintained 5 bedroom detached duplex with...,20000000,5,5,6,3
3,Refurbished 3 Bedroom Luxury Flat With Bq,flat / apartment,"Lekki Phase 1, Lekki, Lagos",Refurbished 3 bedroom luxury flats with bq ava...,8500000,3,3,3,3
4,Slick 2 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",2 bedroom apartmemt for rent\nrent 2.8m\nservi...,2800000,2,2,3,2


In [28]:
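# extract location (the area just before "Lagos") from the address
def extract_location(text):
    parts = [part.strip() for part in text.split(",")]
    # addresses end in "..., <area>, Lagos"; fall back to the only part if there is no comma
    return parts[-2] if len(parts) >= 2 else parts[0]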

In [29]:
property_df["location"] = property_df["address"].apply(extract_location)

In [31]:
property_df["location"] = property_df["location"].str.strip()

In [33]:
property_df["location"].unique()

array(['Lekki', 'Ajah', 'Victoria Island (VI)', 'Ojodu', 'Ikeja', 'Ikoyi',
       'Maryland', 'Magodo', 'Surulere', 'Gbagada', 'Ibeju Lekki',
       'Isheri North', 'Yaba', 'Ketu', 'Ilupeju', 'Agbara-Igbesa',
       'Amuwo Odofin', 'Ikorodu', 'Ikotun', 'Agege', 'Ogudu', 'Shomolu',
       'Epe', 'Eko Atlantic City', 'Isolo', 'Ipaja', 'Ojo',
       'Lagos Island', 'Kosofe', 'Ayobo', 'Oshodi', 'Ibeju', 'Mushin',
       'Alimosho', 'Ifako-Ijaiye', 'Apapa', 'Ojota', 'Orile', 'Isheri',
       'Egbe', 'Ilashe', 'Iganmu', 'Ijesha', 'Idimu', 'Ejigbo', 'Ijede'],
      dtype=object)

In [34]:
property_df["location"].nunique()

46

In [49]:
property_df["type"].unique()

array(['flat / apartment', 'detached duplex', 'semi-detached duplex',
       'terraced duplex', 'mini flat (room and parlour)',
       'Self contain (single rooms)', 'commercial property', 'house',
       'Commercial land', 'Office space', 'terraced bungalow',
       'Mixed-use land', 'block of flats', 'office space', 'Shop', 'Land',
       'detached bungalow', 'Plaza / complex / mall', 'Terraced duplex',
       'semi-detached bungalow', 'Warehouse', 'Restaurant / bar',
       'Commercial property', 'restaurant / bar', 'hotel / guest house',
       'Filling station', 'School', 'House', 'Church', 'Flat / apartment',
       'Residential land', 'Hall', 'Hotel / guest house',
       'Industrial land', 'Tank farm', 'shop',
       'Conference / meeting / training room', 'Block of flats',
       'Factory', 'Event centre / venue'], dtype=object)
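The list contains case variants of the same category (`'Office space'` and `'office space'`, `'Shop'` and `'shop'`, `'Terraced duplex'` and `'terraced duplex'`, and so on), which any count or group-by would treat as separate types. A minimal sketch of normalizing them, on a hypothetical toy series:

```python
import pandas as pd

types = pd.Series([
    "Office space", "office space", "Shop", "shop",
    "Terraced duplex", "terraced duplex", "Flat / apartment", "flat / apartment",
])
normalized = types.str.strip().str.lower()
print(types.nunique(), "->", normalized.nunique())  # 8 -> 4
```

Applied to the notebook, this would be `property_df["type"] = property_df["type"].str.strip().str.lower()`.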

In [35]:
# check data types
property_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 22247 entries, 0 to 22423
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   title           22247 non-null  object
 1   type            22247 non-null  object
 2   address         22247 non-null  object
 3   description     22247 non-null  object
 4   price           22247 non-null  object
 5   bedrooms        22247 non-null  int64 
 6   bathrooms       22247 non-null  int64 
 7   toilets         22247 non-null  int64 
 8   parking_spaces  22247 non-null  int64 
 9   location        22247 non-null  object
dtypes: int64(4), object(6)
memory usage: 1.9+ MB


In [36]:
# strip whitespace
property_df["price"] = property_df["price"].str.strip()

In [37]:
property_df["price"].unique()

array(['3500000', '1300000', '20000000', '8500000', '2800000', '6500000',
       '25000000', '5000000', '1000000', '2500000', '12000000', '1200000',
       '2000000', '1700000', '950000', '4000000', '8000000', '7000000',
       '6000000', '3000000', '5500000', '3600000', '16000000', '15000000',
       '750000', '10000000', '2300000', '900000', '18000000', '650000',
       '1500000', '3200000', '100000000', '3800000', '11000000',
       '$15000  \napprox. 23652748', '2700000', '4500000', '3250000',
       '50000000', '13000000', '60000000', '9000000', '250000000',
       '7500000', '700000', '40000000', '1400000', '30000000', '800000',
       '450000', '190000 per month', '1800000', '2200000', '35000000',
       '900000000', '500000', '1100000', '12500000', '210000 per month',
       '180000 per month', '200000 per month', '220000 per month',
       '600000', '1600000', '400000', '280000000',
       '$22000  \napprox. 34690697', '80000000', '550000', '300000',
       '560000', '59999999

In [39]:
def is_not_numeric(value):
    try:
        float(value)
        return False
    except ValueError:
        return True

In [40]:
filtered_df = property_df[property_df["price"].apply(is_not_numeric)]

In [41]:
filtered_df.shape

(653, 10)

In [42]:
filtered_df["price"].unique()

array(['$15000  \napprox. 23652748', '190000 per month',
       '210000 per month', '180000 per month', '200000 per month',
       '220000 per month', '$22000  \napprox. 34690697',
       '1000000 per month', '950000 per month', '420000 per month',
       '400000 per month', '650000 per month',
       '$700    \napprox. 1103795', '$400    \napprox. 630740',
       '$39999  \napprox. 63072418', '300000 per month',
       '900000 per month', '$40000  \napprox. 63073995',
       '$80000  \napprox. 126147989', '$24000  \napprox. 37844397',
       '350000 per month', '750000 per month', '$650  \napprox. 1024952',
       '425000 per day', '430000 per day', '500000 per day',
       '$95000  \napprox. 149800737', '1500000 per month',
       '450000 per month', '1200000 per month',
       '$75000  \napprox. 118263740', '$85000  \napprox. 134032239',
       '260000 per month', '150000 per month',
       '$60000  \napprox. 94610992', '3000 per square foot',
       '66000 per hour', '$38000  \napp
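The next cell drops all 653 of these rows. Some could be recovered instead. A minimal sketch, assuming (a) the `approx.` figure after a dollar price is the site's naira conversion for the same period, and (b) a monthly rent can be annualized by multiplying by 12; prices quoted per day, per hour or per square foot are left unresolved. `parse_price` is a hypothetical helper.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Hypothetical helper: turn a cleaned price string into an annual naira amount."""
    text = text.strip()
    # assumption (a): dollar prices carry an "approx. <naira>" conversion
    approx = re.search(r"approx\.\s*(\d+(?:\.\d+)?)", text)
    if approx:
        return float(approx.group(1))
    # assumption (b): monthly rents are annualized
    monthly = re.fullmatch(r"(\d+(?:\.\d+)?)\s*per month", text)
    if monthly:
        return float(monthly.group(1)) * 12
    try:
        return float(text)
    except ValueError:
        return None  # per day, per hour, per square foot, ...
```

It could take the place of the `pd.to_numeric` call, e.g. `property_df["price"] = property_df["price"].apply(parse_price).astype(float)`, followed by the same `dropna`.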

In [43]:
# Let's remove properties whose prices are not plain numbers
# (per month, per day, dollar-denominated, etc.: the 653 rows found above)

# convert price to numeric; errors="coerce" turns non-numeric values into NaN
property_df["price"] = pd.to_numeric(property_df["price"], errors="coerce")

property_df = property_df.dropna(subset=["price"])

In [44]:
property_df["price"].unique()

array([3.50000000e+06, 1.30000000e+06, 2.00000000e+07, 8.50000000e+06,
       2.80000000e+06, 6.50000000e+06, 2.50000000e+07, 5.00000000e+06,
       1.00000000e+06, 2.50000000e+06, 1.20000000e+07, 1.20000000e+06,
       2.00000000e+06, 1.70000000e+06, 9.50000000e+05, 4.00000000e+06,
       8.00000000e+06, 7.00000000e+06, 6.00000000e+06, 3.00000000e+06,
       5.50000000e+06, 3.60000000e+06, 1.60000000e+07, 1.50000000e+07,
       7.50000000e+05, 1.00000000e+07, 2.30000000e+06, 9.00000000e+05,
       1.80000000e+07, 6.50000000e+05, 1.50000000e+06, 3.20000000e+06,
       1.00000000e+08, 3.80000000e+06, 1.10000000e+07, 2.70000000e+06,
       4.50000000e+06, 3.25000000e+06, 5.00000000e+07, 1.30000000e+07,
       6.00000000e+07, 9.00000000e+06, 2.50000000e+08, 7.50000000e+06,
       7.00000000e+05, 4.00000000e+07, 1.40000000e+06, 3.00000000e+07,
       8.00000000e+05, 4.50000000e+05, 1.80000000e+06, 2.20000000e+06,
       3.50000000e+07, 9.00000000e+08, 5.00000000e+05, 1.10000000e+06,
      

In [45]:
property_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 21594 entries, 0 to 22423
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           21594 non-null  object 
 1   type            21594 non-null  object 
 2   address         21594 non-null  object 
 3   description     21594 non-null  object 
 4   price           21594 non-null  float64
 5   bedrooms        21594 non-null  int64  
 6   bathrooms       21594 non-null  int64  
 7   toilets         21594 non-null  int64  
 8   parking_spaces  21594 non-null  int64  
 9   location        21594 non-null  object 
dtypes: float64(1), int64(4), object(5)
memory usage: 1.8+ MB


In [46]:
property_df.shape

(21594, 10)

In [47]:
property_df.head()

,title,type,address,description,price,bedrooms,bathrooms,toilets,parking_spaces,location
0,3 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",Self serviced\n3bedroom fr rent in ologolo\nre...,3500000.0,3,3,3,0,Lekki
1,Luxury 2 Bedroom Apartment,flat / apartment,"Newtown Estate, Ogombo, Ajah, Lagos",Brand new\ngood water\nluxury home\nfitted kit...,1300000.0,2,2,3,2,Ajah
2,Well Maintained 5 Bedroom Detached Duplex With Bq,detached duplex,"Lekki Phase 1, Lekki, Lagos",Well maintained 5 bedroom detached duplex with...,20000000.0,5,5,6,3,Lekki
3,Refurbished 3 Bedroom Luxury Flat With Bq,flat / apartment,"Lekki Phase 1, Lekki, Lagos",Refurbished 3 bedroom luxury flats with bq ava...,8500000.0,3,3,3,3,Lekki
4,Slick 2 Bedroom Apartment In Ologolo,flat / apartment,"Ologolo, Lekki, Lagos",2 bedroom apartmemt for rent\nrent 2.8m\nservi...,2800000.0,2,2,3,2,Lekki
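With `price` numeric and `location` extracted, the regional price comparison described in the problem statement becomes a group-by. A minimal sketch; `price_summary` is a hypothetical helper, and the toy rows are copied from the five listings shown above. Because the cleaned data still mixes property types, and a few listings (such as the office space priced per square meter) are not whole-property rents, comparisons are more meaningful within one type.

```python
import pandas as pd


def price_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: listing count and median price per location, highest median first."""
    return (
        df.groupby("location")["price"]
        .agg(listings="count", median_price="median")
        .sort_values("median_price", ascending=False)
    )


# Toy rows copied from the first five listings above
toy = pd.DataFrame({
    "location": ["Lekki", "Ajah", "Lekki", "Lekki", "Lekki"],
    "price": [3500000.0, 1300000.0, 20000000.0, 8500000.0, 2800000.0],
})
print(price_summary(toy))
```

On the real data, a like-for-like view might be `price_summary(property_df[property_df["type"] == "flat / apartment"])`.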


In [48]:
# save to csv
property_df.to_csv("../raw_data/house_clean_data.csv", index=False)

## Thank you