## Real estate data cleaning with Pandas for efficient analysis
This is a real dataset that was downloaded using web scraping techniques. The data contains records from Fotocasa, one of the most popular real estate websites in Spain. Please do not perform web scraping unless it is for academic purposes.

The dataset was downloaded a few years ago by Henry Navarro, and no economic benefit was obtained from it.

It contains 15,335 real house listings published on www.fotocasa.es. Your goal is to extract as much information as possible using the data science knowledge you have acquired so far.

Let's get started!

- First, let's read and explore the dataset.

In [4]:
# Step 0. Load libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
# Step 1. Load data (fields in this file are separated by semicolons)
df_raw = pd.read_csv("../real_estate.csv", sep=";")
df_raw.sample(10)

| | Unnamed: 0 | id_realEstates | isNew | realEstate_name | phone_realEstate | url_inmueble | rooms | bathrooms | surface | price | ... | level4Id | level5Id | level6Id | level7Id | level8Id | accuracy | latitude | longitude | zipCode | customZone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4346 | 4347 | 148349442 | False | aedas homes | 910921386.0 | https://www.fotocasa.es/vivienda/rivas-vaciama... | 3.0 | 2.0 | 83.0 | 210000 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 4050321622 | -338500343 | | |
| 10487 | 10488 | 151566673 | False | inmosierra | 914874297.0 | https://www.fotocasa.es/es/comprar/vivienda/sa... | 4.0 | 2.0 | 177.0 | 400000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 4070975 | -392322 | | |
| 4130 | 4131 | 153886692 | False | vivienda2 | 912188604.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 2.0 | 1.0 | 55.0 | 214000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 404749736 | -37145991 | | |
| 315 | 316 | 153131260 | False | pacego | 914878034.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 3.0 | 2.0 | | 575000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 4048677 | -360793 | | |
| 6166 | 6167 | 153915024 | False | housell | 914873928.0 | https://www.fotocasa.es/es/comprar/vivienda/mi... | 2.0 | 1.0 | 64.0 | 368000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 404531146135624 | -370137598643763 | | |
| 13086 | 13087 | 153176279 | False | nomada | 912665562.0 | https://www.fotocasa.es/es/comprar/vivienda/vi... | 2.0 | 1.0 | 63.0 | 115000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 407303823 | -358116083 | | |
| 7395 | 7396 | 153859175 | False | vivantial | 911368467.0 | https://www.fotocasa.es/vivienda/alcala-de-hen... | 2.0 | 1.0 | 75.0 | 135000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 403529433 | -36830563 | | |
| 7715 | 7716 | 146812116 | False | aproperties | 914890879.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 4.0 | 4.0 | 211.0 | 2500000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 4042175 | -369618 | | |
| 14336 | 14337 | 153920399 | False | mejocasa | 912787481.0 | https://www.fotocasa.es/es/comprar/vivienda/fu... | 2.0 | 1.0 | 53.0 | 136000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 404296157864376 | -361895421356237 | | |
| 2708 | 2709 | 152751098 | False | redpiso centro sol las letras | 910758879.0 | https://www.fotocasa.es/es/comprar/vivienda/pa... | 3.0 | 3.0 | 185.0 | 745000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 40414016 | -3700503 | | |
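The `sep=";"` argument is needed because fields in this file are separated by semicolons. The `latitude` and `longitude` values in the sample above also look like they have lost their decimal point (for example `4050321622`, where a latitude around 40 would be expected for these Madrid-area listings). That suggests, although the output alone cannot confirm it, that the file writes decimals with a comma. The sketch below uses a tiny hypothetical CSV string, not the real file, to show how `pd.read_csv` treats comma decimals with and without the `decimal=","` argument:

```python
import io

import pandas as pd

# Hypothetical two-row sample (not from the real file): ';' separates fields
# and ',' is assumed to be the decimal mark in the coordinates.
raw_text = (
    "id_realEstates;price;latitude;longitude\n"
    "1;195000;40,2948;-3,4440\n"
    "2;89000;40,2867;-3,7935\n"
)

# With sep=';' alone, the comma-decimal coordinates are read as text.
as_text = pd.read_csv(io.StringIO(raw_text), sep=";")
print(as_text.dtypes)

# Adding decimal=',' makes pandas parse them as floats.
as_float = pd.read_csv(io.StringIO(raw_text), sep=";", decimal=",")
print(as_float.dtypes)
print(as_float[["latitude", "longitude"]])
```

If the real file does turn out to use comma decimals, the same `decimal=","` argument could be added to the Step 1 call; alternatively, `.str.replace(",", ".").astype(float)` on the two text columns gives the same result after loading.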


In [None]:
# Step 2. Inspect column dtypes and non-null counts
df_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15335 entries, 0 to 15334
Data columns (total 37 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        15335 non-null  int64  
 1   id_realEstates    15335 non-null  int64  
 2   isNew             15335 non-null  bool   
 3   realEstate_name   15325 non-null  object 
 4   phone_realEstate  14541 non-null  float64
 5   url_inmueble      15335 non-null  object 
 6   rooms             14982 non-null  float64
 7   bathrooms         14990 non-null  float64
 8   surface           14085 non-null  float64
 9   price             15335 non-null  int64  
 10  date              15335 non-null  object 
 11  description       15193 non-null  object 
 12  address           15335 non-null  object 
 13  country           15335 non-null  object 
 14  level1            15335 non-null  object 
 15  level2            15335 non-null  object 
 16  level3            15335 non-null  object
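`df_raw.info()` reports non-null counts, so the amount of missing data has to be worked out by subtracting from 15,335 by hand. A small helper can report missing counts and percentages directly; `missing_report` below is a hypothetical name, not part of pandas, and it is demonstrated on a toy frame with invented values:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most gaps first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, sorted by missing count.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy frame with gaps in the same kinds of columns as the real data.
toy = pd.DataFrame({
    "rooms": [3.0, np.nan, 2.0, 4.0],
    "bathrooms": [2.0, 1.0, np.nan, 2.0],
    "surface": [103.0, np.nan, np.nan, 175.0],
    "price": [195000, 89000, 390000, 495000],
})
print(missing_report(toy))
```

Applied as `missing_report(df_raw)`, it would show that, among the columns listed above, `surface` has the most gaps: 15,335 - 14,085 = 1,250 missing values, about 8.2%.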

In [9]:
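# Step 3. Clean and transform data
# Drop the leftover index column ("Unnamed: 0") and the last two columns
# ("zipCode", "customZone"), which are empty in the ten sampled rows above.
# .copy() gives an independent frame, avoiding a SettingWithCopyWarning below.
df_baking = df_raw.iloc[:, 1:-2].copy()
# Parse dates; values that cannot be parsed become NaT instead of raising
df_baking["date"] = pd.to_datetime(df_baking["date"], errors="coerce")
# Lowercase all column names for consistent access
df_baking.columns = df_baking.columns.str.lower()
df_baking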

| | id_realestates | isnew | realestate_name | phone_realestate | url_inmueble | rooms | bathrooms | surface | price | date | ... | level2id | level3id | level4id | level5id | level6id | level7id | level8id | accuracy | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 153771986 | False | ferrari 57 inmobiliaria | 912177526.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 3.0 | 2.0 | 103.0 | 195000 | 2019-12-28 18:27:15.997502700+00:00 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 402948276786438 | -344402412135624 |
| 1 | 153867863 | False | tecnocasa fuenlabrada ferrocarril | 916358736.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 3.0 | 1.0 | | 89000 | 2019-12-28 18:27:15.997502700+00:00 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4028674 | -379351 |
| 2 | 153430440 | False | look find boadilla | 916350408.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 2.0 | 2.0 | 99.0 | 390000 | 2019-12-28 18:27:15.997502700+00:00 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 404115646786438 | -390662252135624 |
| 3 | 152776331 | False | tecnocasa fuenlabrada ferrocarril | 916358736.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 3.0 | 1.0 | 86.0 | 89000 | 2019-12-28 18:27:15.997502700+00:00 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 402853785786438 | -379508142135624 |
| 4 | 153180188 | False | ferrari 57 inmobiliaria | 912177526.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 2.0 | 2.0 | 106.0 | 172000 | 2019-12-28 18:27:15.997502700+00:00 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 402998774864376 | -345226301356237 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15330 | 153901377 | False | infocasa consulting | 911360461.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 2.0 | 1.0 | 96.0 | 259470 | NaT | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4045416 | -370286 |
| 15331 | 150394373 | False | inmobiliaria pulpon | 912788039.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 3.0 | 1.0 | 150.0 | 165000 | NaT | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4036652 | -348951 |
| 15332 | 153901397 | False | tecnocasa torrelodones | 912780348.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 4.0 | 2.0 | 175.0 | 495000 | NaT | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4057444 | -392124 |
| 15333 | 152607440 | False | inmobiliaria pulpon | 912788039.0 | https://www.fotocasa.es/es/comprar/vivienda/ma... | 3.0 | 2.0 | 101.0 | 195000 | NaT | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4036967 | -348105 |
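The last rows of the output show `NaT` in `date`. Since `df_raw.info()` reports 15,335 non-null `date` values, these come from strings that `pd.to_datetime` could not parse rather than from missing data: with `errors="coerce"`, unparseable values silently become `NaT`, so it is worth counting them. The sketch below, on a hypothetical toy frame, does that and also shows a name-based alternative to `.iloc[:, 1:-2]`, assuming (as the sample output suggests) that the dropped columns are `Unnamed: 0`, `zipCode` and `customZone`. Dropping by name keeps working if the column order ever changes.

```python
import pandas as pd

# Hypothetical toy frame: a leftover index column, two trailing columns to drop,
# and a date column where one value cannot be parsed.
toy = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3],
    "id_realEstates": [153771986, 153867863, 153430440],
    "date": ["2019-12-28 18:27:15", "2019-12-28 18:27:15", "not a date"],
    "zipCode": [None, None, None],
    "customZone": [None, None, None],
})

# Name-based equivalent of .iloc[:, 1:-2] for this column layout.
clean = toy.drop(columns=["Unnamed: 0", "zipCode", "customZone"])

# errors="coerce" turns unparseable strings into NaT instead of raising.
clean["date"] = pd.to_datetime(clean["date"], errors="coerce")
clean.columns = clean.columns.str.lower()

# Count how many dates were lost to coercion.
n_bad = clean["date"].isna().sum()
print(f"{n_bad} of {len(clean)} dates could not be parsed")
print(clean)
```

One cause worth checking on the real data, though the output above does not confirm it: since pandas 2.0, `to_datetime` infers a single format from the first non-null value, so dates written in a different format are also coerced to `NaT`. Passing `format="mixed"` parses each value individually.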
