# Real Estate Data - Database Cleaning

This project demonstrates my approach to database cleaning by working with real estate listing data from Bangladesh.

Upon initial inspection, the dataset contains various inconsistencies and non-ideal formatting choices. My goal is to enhance its readability, consistency, and usability, making it more suitable for analysis.

Some of the goals include:
* inspect and manage missing data,
* ensure correct data types of each column,
* convert the data to enable easy filtering and sorting.

## Importing libraries

In [4]:
import pandas as pd
import numpy as np

## Inspection of the data

In [6]:
reData = pd.read_csv('raw_data_rental_listings.csv')
reData.head()

Unnamed: 0,url,title,property type,property size,parking,lift,floor,price,service_charge,year built,building registration type,preferred tennant,interior,garage size,front road size,common area,bedrooms,bathrooms,location,country
0,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 2000 Sq Ft Is Up...,Residential Apartment,2000 Sq Ft,1,1,3rd Floor Available,"BDT 80,000/-","BDT 8,500/-",2010.0,Residential,Foreigner,Un-Furnished,120 Sq. Ft.,16 Ft.,180 Sq Ft,03 Bedrooms,03 Bathrooms,"Banani, Dhaka 1213",Bangladesh
1,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 4500 Sq Ft Is Up...,Residential Apartment,4500 Sq Ft,2,1,6th-7th floor (Duplex),"BDT 220,000/-","BDT 17,000/-",2012.0,Residential,Foreigner,Semi-Furnished,240 Sq. Ft.,12 Ft.,250 Sq Ft,04 Bedrooms,04 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
2,https://rents.com.bd/property/brand-new-and-ta...,Brand New And Tastefully Designed This 20000 S...,Commercial space rent in Dhaka | 225+ Spaces f...,20000 Sq. Ft. (Per floor 4000 Sq. Ft.),05 car parking,3,11th-15th floor available,"BDT 29,00,000/- (BDT 145/- per Sq Ft)","BDT 300,000/- (BDT 15/- per Sq Ft)",2023.0,Commercial,Corporate Office or MNC Office,Un-Furnished,600 Sq Ft.,50 Ft.,1250 Sq Ft,,,"Gulshan, Dhaka 1212",Bangladesh
3,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 2250 Sq Ft Is Up...,Residential Apartment,2250 Sq Ft,1,2,7th Floor Available,"BDT 100,000/-","BDT 15,000/-",2017.0,Residential,Foreigner,Un-Furnished,120 Sq. Ft.,24 Ft.,180 Sq Ft,03 Bedrooms,03 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
4,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 4300 Sq Ft Is Up...,Residential Apartment,4300 Sq Ft,2,2,5th floor,"BDT 220,000/-","BDT 25,000/-",2010.0,Residential,Foreigner,Un-Furnished,240 Sq. Ft.,16 Ft.,250 Sq Ft,04 Bedrooms,04 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh


In [7]:
# Replacing spaces in column names with underscores, so columns can be accessed as attributes (e.g. reData.year_built) and autocompleted.
reData.columns = reData.columns.str.replace(' ', '_')
print(reData.columns)

Index(['url', 'title', 'property_type', 'property_size', 'parking', 'lift',
       'floor', 'price', 'service_charge', 'year_built',
       'building_registration_type', 'preferred_tennant', 'interior',
       'garage_size', 'front_road_size', 'common_area', 'bedrooms',
       'bathrooms', 'location', 'country'],
      dtype='object')


In [8]:
print('Missing values and data types of the columns: \n')
reData.info()  # info() prints its report itself and returns None, so it doesn't need print().

Missing values and data types of the columns: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 818 entries, 0 to 817
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   url                         818 non-null    object 
 1   title                       818 non-null    object 
 2   property_type               818 non-null    object 
 3   property_size               743 non-null    object 
 4   parking                     728 non-null    object 
 5   lift                        727 non-null    object 
 6   floor                       646 non-null    object 
 7   price                       745 non-null    object 
 8   service_charge              667 non-null    object 
 9   year_built                  743 non-null    float64
 10  building_registration_type  735 non-null    object 
 11  preferred_tennant           679 non-null    object 
 12  interior                    743 non-null    

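As an alternative to reading the `info()` report, missing data can be summarized by counting `isna()` per column. Below is a minimal sketch on a small made-up DataFrame; `missing_summary` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most incomplete first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have gaps.
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Made-up sample with the same kind of gaps as the listings data.
sample = pd.DataFrame({
    'url': ['a', 'b', 'c', 'd'],
    'floor': ['3rd Floor', np.nan, np.nan, '5th floor'],
    'price': ['BDT 80,000/-', np.nan, 'BDT 100,000/-', 'BDT 220,000/-'],
})
summary = missing_summary(sample)
print(summary)
```

On the real data this would be called as `missing_summary(reData)`.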
## Checking for duplicates

In [10]:
duplicates = reData.duplicated()
# Number of duplicated (True) and unique (False) rows in the database
print(duplicates.value_counts())
# Total number of rows before dropping duplicates
len(reData)

False    811
True       7
Name: count, dtype: int64


818

In [11]:
reData = reData.drop_duplicates()
print(len(reData))
# reset_index returns a new DataFrame, so the result has to be assigned back.
reData = reData.reset_index(drop=True)
reData

811


Unnamed: 0,url,title,property_type,property_size,parking,lift,floor,price,service_charge,year_built,building_registration_type,preferred_tennant,interior,garage_size,front_road_size,common_area,bedrooms,bathrooms,location,country
0,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 2000 Sq Ft Is Up...,Residential Apartment,2000 Sq Ft,1,1,3rd Floor Available,"BDT 80,000/-","BDT 8,500/-",2010.0,Residential,Foreigner,Un-Furnished,120 Sq. Ft.,16 Ft.,180 Sq Ft,03 Bedrooms,03 Bathrooms,"Banani, Dhaka 1213",Bangladesh
1,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 4500 Sq Ft Is Up...,Residential Apartment,4500 Sq Ft,2,1,6th-7th floor (Duplex),"BDT 220,000/-","BDT 17,000/-",2012.0,Residential,Foreigner,Semi-Furnished,240 Sq. Ft.,12 Ft.,250 Sq Ft,04 Bedrooms,04 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
2,https://rents.com.bd/property/brand-new-and-ta...,Brand New And Tastefully Designed This 20000 S...,Commercial space rent in Dhaka | 225+ Spaces f...,20000 Sq. Ft. (Per floor 4000 Sq. Ft.),05 car parking,3,11th-15th floor available,"BDT 29,00,000/- (BDT 145/- per Sq Ft)","BDT 300,000/- (BDT 15/- per Sq Ft)",2023.0,Commercial,Corporate Office or MNC Office,Un-Furnished,600 Sq Ft.,50 Ft.,1250 Sq Ft,,,"Gulshan, Dhaka 1212",Bangladesh
3,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 2250 Sq Ft Is Up...,Residential Apartment,2250 Sq Ft,1,2,7th Floor Available,"BDT 100,000/-","BDT 15,000/-",2017.0,Residential,Foreigner,Un-Furnished,120 Sq. Ft.,24 Ft.,180 Sq Ft,03 Bedrooms,03 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
4,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 4300 Sq Ft Is Up...,Residential Apartment,4300 Sq Ft,2,2,5th floor,"BDT 220,000/-","BDT 25,000/-",2010.0,Residential,Foreigner,Un-Furnished,240 Sq. Ft.,16 Ft.,250 Sq Ft,04 Bedrooms,04 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
806,https://rents.com.bd/property/3500-sq-ft-furni...,3500 Sq Ft Furnished apartment rent in Park ro...,"Furnished Apartment For Rent In Dhaka, Residen...",,,,,,,,,,,,,,,,"Gulshan, Dhaka, Chanpara Bazar, Dhaka District...",Bangladesh
807,https://rents.com.bd/property/2500-sq-ft-retai...,2500 Sq Ft Retail space for rent in Banani,Commercial space rent in Dhaka | 225+ Spaces f...,,,,,,,,,,,,,,,,"Banani, Gulshan, Dhaka, Chanpara Bazar, Dhaka ...",Bangladesh
808,https://rents.com.bd/property/3500-sft-furnish...,3500 sft Furnished apartment rent in Gulshan,"Furnished Apartment For Rent In Dhaka, Residen...",,,,,,,,,,,,,,,,"Gulshan, Dhaka, Chanpara Bazar, Dhaka District...",Bangladesh
809,https://rents.com.bd/property/3300-sft-furnish...,3300 sft Furnished apartment rent in Baridhara...,"Furnished Apartment For Rent In Dhaka, Residen...",,,,,,,,,,,,,,,,"Gulshan, Dhaka, Chanpara Bazar, Dhaka District...",Bangladesh

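Before dropping duplicates it can help to look at every copy, and to check whether some listings repeat with slightly different fields. A minimal sketch on made-up rows; treating `url` as a listing identifier is an assumption, not something the dataset guarantees.

```python
import pandas as pd

# Made-up listings: rows 0 and 2 are exact copies; row 3 repeats a URL with a different price.
df = pd.DataFrame({
    'url': ['u1', 'u2', 'u1', 'u2'],
    'price': [80000, 100000, 80000, 95000],
})

# keep=False flags every copy, not only the later ones, so all versions can be inspected.
exact_copies = df[df.duplicated(keep=False)]

# Hypothetical stricter check: same listing URL, even if other fields differ.
same_url = df[df.duplicated(subset=['url'], keep=False)]

# drop_duplicates and reset_index both return new objects, so the result is assigned.
deduped = df.drop_duplicates().reset_index(drop=True)
print(exact_copies, same_url, deduped, sep='\n\n')
```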

## Fixing column data types

Almost all of the columns in the database have the `'object'` dtype, which is clearly wrong for numerical variables such as property size, parking, lift and price. I'm going to clean their values so they can be converted to the correct data types.

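One quick way to see which values block a numeric conversion is to let `pd.to_numeric` coerce failures to NaN and keep the non-null values that failed. A minimal sketch; `blocking_values` is a hypothetical helper and the sample strings are made up in the style of the raw data.

```python
import numpy as np
import pandas as pd

def blocking_values(series: pd.Series) -> pd.Series:
    """Return the non-null values that pd.to_numeric cannot parse."""
    converted = pd.to_numeric(series, errors='coerce')  # unparsable values -> NaN
    return series[converted.isna() & series.notna()]

sizes = pd.Series(['2000', '20,000', '3285/306 Sq. Meter', np.nan, '4500'])
blocking = blocking_values(sizes)
print(blocking)
```

Unlike a regex check, this reports exactly the values `pd.to_numeric` would reject, so it can be rerun after each cleaning step until it returns an empty Series.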
### property_size

In [15]:
reData.property_size = reData.property_size.replace(r'\([^)]*\)', '', regex=True)
# Removes all the additional data in '(...)' brackets (e.g. '(Per floor 4000 Sq. Ft.)').
reData.property_size = reData.property_size.replace(r'[\sSsqFft\.]', '', regex=True)
# Removes whitespace and every 'S', 's', 'q', 'F', 'f', 't' and '.' character, i.e. the various forms of 'Sq. Ft.'
# (which is also why 'Square Meter' shows up as 'uareMeer' below).

# reData.property_size = pd.to_numeric(reData.property_size)
# Conversion to a numeric type still fails at this point, as some values contain other characters.
print(reData.loc[reData.property_size.str.contains(r'\D', regex=True, na=False)].property_size)

86         3285/306Meer
238              20,000
250              14,000
277              13,500
278              20,000
             ...       
745    3000/240uareMeer
746    2600/208uareMeer
747    3700/296uareMeer
748    3500/280uareMeer
750        2600/208Meer
Name: property_size, Length: 198, dtype: object




In [16]:
reData.property_size = reData.property_size.replace('[,]', '', regex=True)
# Fixes values that use ',' as a thousands separator.
reData.property_size = reData.property_size.str.extract(r'(^\d+)', expand=False)
# Extracts only the first number in each cell, dropping the value in square meters.
reData.property_size = pd.to_numeric(reData.property_size)
# The column stays float64: it still contains NaNs, which integer dtypes can't hold, so downcast="signed" would have no effect.
print(reData.property_size.dtype)

float64

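The cleaning above deletes every '.' and the letters of 'Sq. Ft.', which would also turn a hypothetical decimal size such as '1250.5' into '12505'. An alternative sketch extracts the leading number directly, keeping an optional decimal part. The sample strings are made up in the style of the raw column, and the exact raw form of the square-meter values is an assumption.

```python
import numpy as np
import pandas as pd

sizes = pd.Series(['2000 Sq Ft', '20000 Sq. Ft. (Per floor 4000 Sq. Ft.)',
                   '20,000 Sq Ft', '3285/306 Sq. Meter', np.nan])

size_sqft = pd.to_numeric(
    sizes.str.replace(',', '', regex=False)                   # drop thousands separators
         .str.extract(r'^\s*(\d+(?:\.\d+)?)', expand=False)   # first number, optional decimals
)
print(size_sqft)
```

The result is float64 because of the missing value, the same as in the cell above.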

### parking

In [18]:
reData.parking.unique()
# Checking all the values, as I expect there to be a manageable number of them.

array(['1', '2', '05 car parking', '3', '01 car parking',
       '06 car parking', '02 car parking', '02 Car Parking',
       '38 Car Parking', '03 Car Parking', '06 Car Parking', nan,
       '01 Car Parking', '4 Car Parking', '01 car parking (Per floor)',
       '04 Car Parking', '10 Car Parking', '08 Car Parking',
       '45 Car Parking', '09 Car Parking', '05 Car Parking', '10', '9',
       '6', '11', '07 Car Parking', '2 Car Parking', '021 Car Parking',
       '\xa002 Car Parking', '\xa001 Car Parking'], dtype=object)

In [19]:
reData.parking = reData.parking.str.replace(r'\D', '', regex=True)
# Removes every non-digit character, e.g. '05 car parking' -> '05'.

# There are no rows with the value '0' in the parking count, so I'm going to assume that NaN values mean 0 parking spaces.
reData.loc[reData.parking.isnull(), 'parking'] = 0

reData.parking = pd.to_numeric(reData.parking, downcast="signed")
reData.parking.unique()



array([ 1,  2,  5,  3,  6, 38,  0,  4, 10,  8, 45,  9, 11,  7, 21],
      dtype=int8)

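Deleting every non-digit character works for the values listed above, but it would glue separate numbers together if a cell ever contained two of them. The sketch below contrasts that with extracting only the first run of digits; the value '2 cars + 1 bike' is hypothetical and does not appear in the data.

```python
import numpy as np
import pandas as pd

parking = pd.Series(['1', '05 car parking', '\xa002 Car Parking',
                     '01 car parking (Per floor)', np.nan, '2 cars + 1 bike'])

# Deleting every non-digit glues separate numbers together ('2 cars + 1 bike' -> '21').
digits_only = parking.str.replace(r'\D', '', regex=True)

# Taking the first run of digits keeps only the leading count.
first_number = parking.str.extract(r'(\d+)', expand=False)

# Missing values are treated as 0 spaces, the same assumption as in the notebook.
spaces = pd.to_numeric(first_number).fillna(0).astype('int8')
print(spaces.tolist())
```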
### lift

In [21]:
reData.lift.value_counts()

lift
2                   205
02 available        154
1                   143
01 available        107
3                    29
02 Available         20
4                    13
01 Available         12
03 available         10
03 Available          8
6                     5
5                     3
04 Available          3
No                    2
04 available          2
 N/A                  1
Individual house      1
0                     1
Not Available         1
Name: count, dtype: int64

In [22]:
# Judging by the unique values, I assume that every value without a number (e.g. 'No', ' N/A', 'Not Available') means there are no lifts.
reData.loc[reData.lift.str.match(r'^[/A-Za-z\s]+$', na=False), 'lift'] = '0'
reData.lift = reData.lift.fillna('0')

In [23]:
reData.lift = reData.lift.str.extract(r'(\d+)', expand=False)
# Keeps only the number, e.g. '02 available' -> '02'.
reData.lift = pd.to_numeric(reData.lift, downcast="signed")

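The two lift cells can also be collapsed into one step: extract the first number, let everything without a digit become NaN, and count those as 0 lifts. A minimal sketch on made-up values taken from the `value_counts()` output above.

```python
import numpy as np
import pandas as pd

lift = pd.Series(['2', '02 available', '01 Available', 'No', ' N/A',
                  'Individual house', 'Not Available', np.nan])

# Values without any digit (and NaN) become NaN here...
number = pd.to_numeric(lift.str.extract(r'(\d+)', expand=False))
# ...and are then counted as 0 lifts, the same assumption as above.
lifts = number.fillna(0).astype('int8')
print(lifts.tolist())
```

Like the regex in the notebook, this would also map any other text-only value (for example a hypothetical 'Yes') to 0, so it is worth rechecking `value_counts()` before trusting the result.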

In [24]:
reData.dtypes

url                            object
title                          object
property_type                  object
property_size                 float64
parking                          int8
lift                             int8
floor                          object
price                          object
service_charge                 object
year_built                    float64
building_registration_type     object
preferred_tennant              object
interior                       object
garage_size                    object
front_road_size                object
common_area                    object
bedrooms                       object
bathrooms                      object
location                       object
country                        object
dtype: object

### floor

In [26]:
reData.floor.unique()

array(['3rd Floor Available', '6th-7th floor (Duplex)',
       '11th-15th floor available', '7th Floor Available', '5th floor',
       'G+4th floor', '11th floor', '4th Floor Available',
       '9th Floor Available', '5th Floor', '8th Floor', 'Full Building',
       '3rd & 5th Floor Available', 'Ground floor', '5th Floor Available',
       '11th Floor', '3rd\xa0 floor',
       '9th, 10th, 11th, 12th, 13th, 14th & 15th floor (per floor 4000 Sq Ft).',
       'Individual House', '4th & 6th floor', nan, '2nd floor',
       '1st floor', '9th floor', '8th top + 4th Floor', '8th floor',
       '13th Floor', '3rd Floor', 'Ground\xa0 floor', '6th floor',
       '9th Floor', '3rd floor', '4th Floor', '3rd & 4th Floor (duplex) ',
       '1st Floor', '6th Floor', '1st\xa0 floor', '12th Floor',
       '13th floor', '20th floor', '11th – 13th Floor',
       '3rd to 6th Floor Available', '4th Floor ',
       'Ground floor (Duplex)', '2nd Floor Available',
       '5th, 8th & 9th Floor Available', 'G+1

In [27]:
# Assigning the results back: calling replace(..., inplace=True) on a column is chained assignment and stops working in pandas 3.0.
reData.floor = reData.floor.replace(r'\([^)]*\)', '', regex=True)
reData.floor = reData.floor.replace(r'([fF]loor|[Aa]vailable)', '', regex=True)
# Removes all the additional data in (...) brackets and the 'Floor Available' text.
reData.floor = reData.floor.replace(r'&|and|\+', ',', regex=True)
# Standardization of the format for listing floor numbers.
reData.floor = reData.floor.replace(' to ', '-', regex=True)
# Standardization of the format for listing floor ranges.
reData.floor = reData.floor.replace(r'[Gg]round|G', '0', regex=True)
# Converting 'ground' floor listings to 0s.
reData.loc[reData.floor.str.match(r'^[A-Za-z\s]*$', na=False), 'floor'] = np.nan
# Sets listings without any numbers (e.g. 'Full Building', 'Individual House') to NaN.
reData.floor = reData.floor.replace(r'st|nd|rd|th| ', '', regex=True)
reData.floor = reData.floor.str.strip()
# Removes ordinal suffixes and spaces.

# I want to have only numbers, commas ',' and hyphens '-'.
# Checking for values that don't meet that condition
reData.loc[~reData.floor.str.match(r'^[\d\-,]+$', na=True), 'floor']


18     9,10,11,12,13,14,15.
27                   8top,4
62                    11–13
251                   12–13
314                     1–5
Name: floor, dtype: object

In [28]:
reData.floor = reData.floor.replace('–', '-', regex=True)
# Replacing en dashes with hyphens.
reData.floor = reData.floor.replace(r'top|\.', '', regex=True)
# Deleting the remaining unwanted characters ('top' and trailing periods).

reData.floor.unique()



array(['3', '6-7', '11-15', '7', '5', '0,4', '11', '4', '9', '8', nan,
       '3,5', '0', '9,10,11,12,13,14,15', '4,6', '2', '1', '8,4', '13',
       '6', '3,4', '12', '20', '11-13', '3-6', '5,8,9', '0,1', '4-7',
       '5,6', '0-5', '10', '0-6', '1-7', '2-9', '16', '0-3', '6,8,10',
       '23', '8,9', '15', '2,3', '12-13', '1-5', '9,10', '17', '1,2',
       '21', '19', '14', '17,18', '18', '4-23', '2,3,4,5,6', '2,6', '5,8',
       '08', '6,7,10', '0,1,2', '6,7,8', '4,5', '04', '09', '06', '1,5,6',
       '10,11', '12,13', '4,5,6,7,8', '3,4,5', '5,6,7,8', '5,11', '07',
       '05', '1,5,7', '4,5,6,7', '1-6', '4,5,6', '5-8', '2,3,5'],
      dtype=object)

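The regex steps in cells [27] and [28] can be gathered into a single function that works on one raw string, which makes it easy to test against individual values from the `unique()` output. A minimal sketch; `normalize_floor` is a hypothetical name and it returns `None` (rather than NaN) for values without numbers.

```python
import re

def normalize_floor(text):
    """Reduce a raw floor description to digits, ',' and '-', or None."""
    if not isinstance(text, str):
        return None
    s = re.sub(r'\([^)]*\)', '', text)              # drop '(Duplex)' etc.
    s = re.sub(r'[fF]loor|[Aa]vailable', '', s)     # drop filler words
    s = re.sub(r'&|and|\+', ',', s)                 # list separators -> ','
    s = s.replace(' to ', '-').replace('–', '-')    # range separators -> '-'
    s = re.sub(r'[Gg]round|G', '0', s)              # ground floor -> 0
    if not re.search(r'\d', s):
        return None                                 # e.g. 'Full Building'
    # Ordinal suffixes, 'top', whitespace (including '\xa0') and periods.
    return re.sub(r'st|nd|rd|th|top|\s|\.', '', s)

print(normalize_floor('G+4th floor'))  # expected '0,4'
```

On the real column it could be applied with `reData.floor.apply(normalize_floor)`.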
In [29]:
# Converting strings describing ranges to lists of ints.
def expand_range(value):
    if isinstance(value, str) and '-' in value:  # Checks if it's a string and a range.
        start, end = map(int, value.split('-'))  # Splits and converts to integers.
        return list(range(start, end + 1))  # Generates range as list.
    return value  # Does nothing if conditions are not met.

# Converting remaining strings listing floor numbers to lists of ints.
def expand_list(value):
    if isinstance(value, str):  # Checks if it's a string
        return list(map(int, value.split(',')))  # Splits and converts to integers.
    return value  # Does nothing if conditions are not met.

reData.floor = reData.floor.apply(expand_range)
reData.floor = reData.floor.apply(expand_list)

# Sorting lists.
def sort_list(value):
    if isinstance(value, list):  # Checks if it's a list
        return sorted(value)  # Sorts lists
    return value  # Does nothing if conditions are not met.
    
reData.floor = reData.floor.apply(sort_list)
reData.floor.iloc[1:25]

1                          [6, 7]
2            [11, 12, 13, 14, 15]
3                             [7]
4                             [5]
5                          [0, 4]
6                            [11]
7                             [4]
8                             [9]
9                             [5]
10                            [8]
11                            NaN
12                         [3, 5]
13                            [0]
14                            [5]
15                            [7]
16                           [11]
17                            [3]
18    [9, 10, 11, 12, 13, 14, 15]
19                            NaN
20                         [4, 6]
21                            NaN
22                            [2]
23                            [1]
24                            [2]
Name: floor, dtype: object

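`expand_range` assumes a value is either a single range or a comma-separated list. A hypothetical mixed value such as '3,5-7' (not present in the `unique()` output above) would make it raise a `ValueError`, because `int('3,5')` fails. The sketch below parses each comma-separated token separately, removes duplicate floors, sorts the result and maps missing values straight to an empty list; `parse_floors` is a hypothetical name.

```python
def parse_floors(value):
    """Turn '3', '6-7', '0,4' or '3,5-7' into a sorted list of ints; anything else -> []."""
    if not isinstance(value, str) or not value:
        return []
    floors = set()
    for token in value.split(','):
        if '-' in token:
            start, end = map(int, token.split('-'))
            floors.update(range(start, end + 1))
        else:
            floors.add(int(token))
    return sorted(floors)

print(parse_floors('3,5-7'))  # expected [3, 5, 6, 7]
```

Applied as `reData.floor.apply(parse_floors)`, it would stand in for `expand_range`, `expand_list`, `sort_list` and the NaN-to-empty-list step together.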
In [30]:
# Right now filtering by floor number fails, because the test '11 in x' raises a TypeError on float NaN values.
# Converting NaNs to empty lists, so every value is a list.
reData.floor = reData.floor.apply(lambda x: x if isinstance(x, list) else [])
reData[reData.floor.apply(lambda x: 11 in x)]

Unnamed: 0,url,title,property_type,property_size,parking,lift,floor,price,service_charge,year_built,building_registration_type,preferred_tennant,interior,garage_size,front_road_size,common_area,bedrooms,bathrooms,location,country
2,https://rents.com.bd/property/brand-new-and-ta...,Brand New And Tastefully Designed This 20000 S...,Commercial space rent in Dhaka | 225+ Spaces f...,20000.0,5,3,"[11, 12, 13, 14, 15]","BDT 29,00,000/- (BDT 145/- per Sq Ft)","BDT 300,000/- (BDT 15/- per Sq Ft)",2023.0,Commercial,Corporate Office or MNC Office,Un-Furnished,600 Sq Ft.,50 Ft.,1250 Sq Ft,,,"Gulshan, Dhaka 1212",Bangladesh
6,https://rents.com.bd/property/tastefully-desig...,Tastefully Designed This 3200 SQ FT Commercial...,Commercial space rent in Dhaka | 225+ Spaces f...,3200.0,1,2,[11],"BDT 448,000/- (BDT 140/- per Sq Ft)","BDT 32,000/- (BDT 10/- per Sq Ft)",2010.0,Commercial,Corporate Office or MNC Office,Un-Furnished,120 Sq Ft.,50 Ft.,250 Sq Ft,,,"Banani, Dhaka 1213",Bangladesh
16,https://rents.com.bd/property/2900-sq-ft-brand...,2900 SQ FT Brand New Apartment Is Vacant For R...,Residential Apartment,2900.0,2,2,[11],"BDT 180,000/-","BDT 15,000/-",2023.0,Residential,Foreigner,Un-Furnished,240 Sq. Ft.,24 Ft.,250 Sq Ft,03 Bedrooms,04 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
18,https://rents.com.bd/property/excellent-brand-...,Excellent Brand New Commercial Space Of 28000 ...,Commercial space rent in Dhaka | 225+ Spaces f...,28000.0,38,3,"[9, 10, 11, 12, 13, 14, 15]","BDT 44,80,000/- (BDT 160/- per Sq Ft.)","BDT 280,000/- (BDT 10/- per Sq Ft.)",2022.0,Commercial,Corporate Office or MNC Offices,Un-Furnished,4560 Sq Ft.,50 Ft,2700 Sq Ft,,,"Gulshan Avenue, Dhaka 1212",Bangladesh
45,https://rents.com.bd/property/spaciously-desig...,Spaciously Designed And Strongly Structured Th...,Commercial space rent in Dhaka | 225+ Spaces f...,2367.0,1,2,[11],"BDT 230,000/-/-",Including,2008.0,Commercial,Corporate Office or MNC Office,Un-Furnished,120 Sq Ft.,60 Ft.,180 Sq Ft,,,"Banani, Dhaka 1213",Bangladesh
62,https://rents.com.bd/property/brand-new-luxury...,Brand New Luxury Home Of 5400 Sq Ft Is Up For ...,"Luxury Collection, Residential Apartment",5400.0,2,2,"[11, 12, 13]","BDT 400,000/-",Not fixed yet.,2022.0,Residential,Foreigner,Un-Furnished,240 Sq. Ft.,36 Ft.,250 Sq Ft.,05 Bedrooms,05 Bathrooms,"Gulshan, Dhaka 1212",Bangladesh
156,https://rents.com.bd/property/well-built-and-p...,Well Built And Properly Designed Commercial Sp...,Commercial space rent in Dhaka | 225+ Spaces f...,6120.0,1,2,[11],"BDT 612,000/- (BDT 100/- Per Sq Ft)","BDT 61,200/- (BDT 10/- Per Sq Ft)",2000.0,Commercial,Corporate Office or MNC Office,Un-Furnished,120 Sq Ft,40 Ft.,450 Sq Ft,,,"Gulshan, Dhaka 1212",Bangladesh
306,https://rents.com.bd/property/a-modern-well-pl...,A Modern Well-planned Flat Of 2100 Sq Ft Is Up...,"Furnished Apartment For Rent In Dhaka, Residen...",2100.0,1,2,[11],"BDT 100,000/-",BDT 7000/-,2010.0,Residential,Foreigner,Furnished,120 Sq Ft,16 Ft,180 Sq Ft,03 Bedrooms,03 Bathrooms,"Banani, Dhaka 1213",Bangladesh
333,https://rents.com.bd/property/stunning-luxury-...,Stunning Luxury Home Of 2200 Sq Ft In The Pres...,"Furnished Apartment For Rent In Dhaka, Luxury ...",2200.0,1,2,[11],"BDT 190,000/-","BDT 15,000/-",2018.0,Residential,Foreigner,Furnished,120 Sq Ft,16 Ft,180 Sq Ft,03 Bedrooms,03 Bathrooms,"Gulshan, Dhaka, 1212",Bangladesh
361,https://rents.com.bd/property/97760-sq-ft-bran...,"79,900 Sq Ft Brand New Commercial Space (Full ...",Commercial space rent in Dhaka | 225+ Spaces f...,79900.0,38,3,"[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...",BDT 1.40 cr. (BDT 175/- per Sq Ft.),"BDT 799,000/- (BDT 10/- per Sq Ft.)",2022.0,Commercial,Corporate Office or MNC Offices,Un-Furnished,4560 Sq Ft.,50 Ft,14664 Sq Ft,,,"Gulshan Avenue, Dhaka 1212",Bangladesh

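With every value a list, a single floor can be matched with `in`, as above. A range of floors can be matched by exploding the lists into one row per floor and mapping the matches back to the listings. A minimal sketch on made-up data.

```python
import pandas as pd

# Made-up listings with floor lists in the cleaned format.
df = pd.DataFrame({'floor': [[6, 7], [11, 12, 13], [], [3]]})

# Single floor: membership test per row.
on_11 = df[df['floor'].apply(lambda floors: 11 in floors)]

# Floor range: one row per (listing, floor) pair; empty lists become NaN.
exploded = pd.to_numeric(df['floor'].explode())
in_range = exploded[exploded.between(3, 6)]  # NaN compares as False
low_floors = df.loc[in_range.index.unique()]
print(low_floors)
```

`explode` keeps the original index, which is what allows mapping the matching floors back to their listings.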

### price

In [32]:
# Removing unnecessary parts: notes in brackets, 'BDT', '/-', thousands separators and trailing periods.
reData.price = reData.price.str.replace(r'\([^\)]*\)', '', regex=True)
reData.price = reData.price.str.replace(r'BDT', '', regex=True)
reData.price = reData.price.str.replace(r'/-', '', regex=True)
reData.price = reData.price.str.replace(r',', '', regex=True)
reData.price = reData.price.str.replace(r'\.\s*$', '', regex=True)
# Stripping now, while every value is still a string: .str methods turn non-string values (like the floats created below) into NaN.
reData.price = reData.price.str.strip()

# 'Lac' (lakh) and 'Cr' (crore) are units of the South Asian numbering system: 1 lakh = 100,000 and 1 crore = 10,000,000.
# These prices contain decimal numbers, so I convert them to floats and multiply them accordingly.
mask = reData.price.str.contains(r'[Ll]ac', na=False)
reData.loc[mask, 'price'] = reData.loc[mask, 'price'].str.replace(r'[Ll]ac', '', regex=True).str.strip().astype(float) * 100_000

mask = reData.price.str.contains(r'[Cc]r', na=False)
reData.loc[mask, 'price'] = reData.loc[mask, 'price'].str.replace(r'[Cc]r', '', regex=True).str.strip().astype(float) * 10_000_000

# Some records have 'USD' prices added after the BDT price. I'll take only the first value.
mask = reData.price.str.contains('USD', regex=False, na=False)
reData.loc[mask, 'price'] = reData.loc[mask, 'price'].str.extract(r'^(\d+)', expand=False)

reData.price = pd.to_numeric(reData.price)
reData.price = reData.price.round()


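Instead of a chain of masks over a column that gradually mixes strings and floats, each price string can be parsed in one function. This trades vectorization for readability, which is cheap at around 800 rows. A minimal sketch; `parse_bdt` is a hypothetical name, and the 'Lac' and 'USD' sample strings are made up, since their exact raw form is not shown above.

```python
import math
import re

# Multipliers for the South Asian numbering units seen in the price column.
MULTIPLIERS = {'lac': 100_000, 'lakh': 100_000, 'cr': 10_000_000, 'crore': 10_000_000}

def parse_bdt(text):
    """Parse a raw price string into whole BDT, or NaN if no number is found."""
    if not isinstance(text, str):
        return math.nan
    s = re.sub(r'\([^)]*\)', '', text)  # drop '(BDT 145/- per Sq Ft)' notes
    s = s.split('USD')[0]               # keep only the part before any USD price
    s = s.replace(',', '')              # '29,00,000' -> '2900000'
    m = re.search(r'(\d+(?:\.\d+)?)\s*(lac|lakh|crore|cr)?', s, flags=re.IGNORECASE)
    if m is None:
        return math.nan
    value = float(m.group(1))
    if m.group(2):
        value *= MULTIPLIERS[m.group(2).lower()]
    return round(value)

print(parse_bdt('BDT 1.40 cr. (BDT 175/- per Sq Ft.)'))  # expected 14000000
```

On the real column this would be `reData.price.apply(parse_bdt)`, which yields a float64 Series because of the NaNs.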

### service_charge