# **Dubai Estates Price Prediction** 
<p>In this project we predict the price of <strong>Dubai</strong> real estate by studying the <strong>2022</strong> transactions.</p>

## Feature Creation
### Data Preparation

In this stage we transform the dataset and engineer features to prepare it for the next stage.

First we import the dataset produced by the previous stage, which is already clean.

Libraries:

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('data/WranglingDone.csv', index_col=[0])
df.head()

Unnamed: 0,Transaction Number,Transaction Date,Property ID,Transaction Type,Transaction sub type,Registration type,Is Free Hold?,Usage,Area,Property Type,...,Room(s),Parking,Nearest Metro,Nearest Mall,Nearest Landmark,No. of Buyer,No. of Seller,Master Project,Project,MeterPrice
0,102-100-2022,2022-01-03 12:30:23,1147007457,Sales,Sell - Pre registration,Off-Plan,Free Hold,Residential,ARJAN,Unit,...,1 B/R,1,Sharaf Dg Metro Station,Mall of the Emirates,Motor City,1,1,,SKYZ By Danube,12732.615083
1,102-1000-2022,2022-01-10 13:21:58,1133339823,Sales,Sell - Pre registration,Off-Plan,Free Hold,Residential,BUSINESS BAY,Unit,...,Studio,1,Business Bay Metro Station,Dubai Mall,Downtown Dubai,2,1,,Peninsula One,20719.536424
2,102-10004-2022,2022-04-13 11:37:06,879755622,Sales,Sell - Pre registration,Off-Plan,Free Hold,Residential,DOWN TOWN JABAL ALI,Unit,...,Studio,1,UAE Exchange Metro Station,Ibn-e-Battuta Mall,Expo 2020 Site,1,1,,Alexis Tower,12825.112108
3,102-10005-2022,2022-04-13 11:37:23,715889874,Sales,Sell - Pre registration,Off-Plan,Free Hold,Residential,DUBAI CREEK HARBOUR,Unit,...,2 B/R,1,Creek Metro Station,City Centre Mirdif,Dubai International Airport,1,1,,Palace Residences - Dubai Creek Harbour,16312.785263
4,102-10007-2022,2022-04-13 11:37:52,1186821507,Sales,Sell - Pre registration,Off-Plan,Free Hold,Residential,JUMEIRAH VILLAGE CIRCLE,Unit,...,1 B/R,1,Dubai Internet City,Mall of the Emirates,Sports City Swimming Academy,2,1,,Catch Residences By IGO,9317.785349


Based on the previous stage, we identified which features affect **MeterPrice** and which features need to be removed or transformed:
<dl>
    <dt>Features we will work with:</dt>
    <dd>1- Transaction Type</dd>
    <dd>2- Area</dd>
    <dd>3- Room(s)</dd>
    <dd>4- Parking</dd>
    <dd>5- Nearest Metro</dd>
    <dd>6- Nearest Mall</dd>
    <dd>7- Nearest Landmark</dd>
    <dd>8- MeterPrice</dd>
</dl> 

First we select those features.

In [3]:
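# .copy() makes the subset an independent DataFrame, so the column assignments below do not trigger SettingWithCopyWarning
df = df[['Transaction Type', 'Area', 'Room(s)', 'Parking', 'Nearest Metro', 'Nearest Mall', 'Nearest Landmark', 'MeterPrice']].copy()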
df.head()

Unnamed: 0,Transaction Type,Area,Room(s),Parking,Nearest Metro,Nearest Mall,Nearest Landmark,MeterPrice
0,Sales,ARJAN,1 B/R,1,Sharaf Dg Metro Station,Mall of the Emirates,Motor City,12732.615083
1,Sales,BUSINESS BAY,Studio,1,Business Bay Metro Station,Dubai Mall,Downtown Dubai,20719.536424
2,Sales,DOWN TOWN JABAL ALI,Studio,1,UAE Exchange Metro Station,Ibn-e-Battuta Mall,Expo 2020 Site,12825.112108
3,Sales,DUBAI CREEK HARBOUR,2 B/R,1,Creek Metro Station,City Centre Mirdif,Dubai International Airport,16312.785263
4,Sales,JUMEIRAH VILLAGE CIRCLE,1 B/R,1,Dubai Internet City,Mall of the Emirates,Sports City Swimming Academy,9317.785349


First we transform **MeterPrice** with log1p, that is, log(1 + x).

In [4]:
df['MeterPrice'] = np.log1p(df['MeterPrice'])
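
Any value predicted on the log1p scale has to be mapped back with the inverse transform, `np.expm1`, before it can be read as a price. A minimal sketch of the round trip, using three MeterPrice values from the table above; the names `prices`, `log_prices` and `recovered` are illustrative and not part of the notebook:

```python
import numpy as np
import pandas as pd

# Three MeterPrice values copied from the table above.
prices = pd.Series([12732.615083, 20719.536424, 9317.785349])

# log1p(x) = log(1 + x) compresses the wide price range.
log_prices = np.log1p(prices)

# expm1(y) = exp(y) - 1 is the inverse, mapping log values back to prices.
recovered = np.expm1(log_prices)
```

The first transformed value agrees with the 9.452001 shown for the same row in the later tables.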

Now, based on the previous stage, we preprocess those features, starting with **Transaction Type**:

In [5]:
TType = pd.get_dummies(df['Transaction Type'])
df = df.join(TType)
df['OtherTransactionType'] = 0
df.drop(columns=['Transaction Type'], inplace=True)
df.head()

Unnamed: 0,Area,Room(s),Parking,Nearest Metro,Nearest Mall,Nearest Landmark,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType
0,ARJAN,1 B/R,1,Sharaf Dg Metro Station,Mall of the Emirates,Motor City,9.452001,0,0,1,0
1,BUSINESS BAY,Studio,1,Business Bay Metro Station,Dubai Mall,Downtown Dubai,9.938881,0,0,1,0
2,DOWN TOWN JABAL ALI,Studio,1,UAE Exchange Metro Station,Ibn-e-Battuta Mall,Expo 2020 Site,9.459238,0,0,1,0
3,DUBAI CREEK HARBOUR,2 B/R,1,Creek Metro Station,City Centre Mirdif,Dubai International Airport,9.699766,0,0,1,0
4,JUMEIRAH VILLAGE CIRCLE,1 B/R,1,Dubai Internet City,Mall of the Emirates,Sports City Swimming Academy,9.139788,0,0,1,0
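The one-hot step above can be reproduced on a toy column as follows. This is a sketch under two assumptions: the four sample rows are made up, and `dtype=int` is passed explicitly because newer pandas versions return boolean indicator columns by default, while the table above shows 0/1 integers.

```python
import pandas as pd

# Made-up sample rows with the three transaction types seen above.
toy = pd.DataFrame({"Transaction Type": ["Sales", "Mortgage", "Gifts", "Sales"]})

# One indicator column per category; dtype=int yields 0/1 instead of booleans.
dummies = pd.get_dummies(toy["Transaction Type"], dtype=int)
toy = toy.join(dummies).drop(columns=["Transaction Type"])

# All-zero placeholder column, mirroring the cell above.
toy["OtherTransactionType"] = 0
```
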


The next feature is **Area**. We group the areas into 10 groups according to their mean **MeterPrice**.

In [6]:
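# Mean log price per area, highest first, cut into 10 quantile bins
group = df.groupby(['Area'])['MeterPrice'].mean().sort_values(ascending=False)
qcu = pd.qcut(group, q = 10)

# Replace each area name with that area's mean log price
for areaIndex in qcu.index.values:
    areaValue = group[areaIndex]
    df.loc[df['Area'] == areaIndex, 'Area'] = areaValue

# Label the bins area1..area10 in order of appearance (area1 = highest prices)
qcuList = qcu.unique().tolist()
qcuAreaPriceDic = {}
for index, value in enumerate(qcuList):
    qcuAreaPriceDic[value] = 'area'+ str(index+1)

# Map each mean price to the label of its bin, then one-hot encode
df['Area'] = df['Area'].map(qcuAreaPriceDic)
AreaOH = pd.get_dummies(df['Area'])
df = df.join(AreaOH)
df.drop(columns='Area', inplace=True)
df.head()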

Unnamed: 0,Room(s),Parking,Nearest Metro,Nearest Mall,Nearest Landmark,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType,area1,area10,area2,area3,area4,area5,area6,area7,area8,area9
0,1 B/R,1,Sharaf Dg Metro Station,Mall of the Emirates,Motor City,9.452001,0,0,1,0,0,0,0,0,0,1,0,0,0,0
1,Studio,1,Business Bay Metro Station,Dubai Mall,Downtown Dubai,9.938881,0,0,1,0,0,0,0,1,0,0,0,0,0,0
2,Studio,1,UAE Exchange Metro Station,Ibn-e-Battuta Mall,Expo 2020 Site,9.459238,0,0,1,0,0,0,0,0,1,0,0,0,0,0
3,2 B/R,1,Creek Metro Station,City Centre Mirdif,Dubai International Airport,9.699766,0,0,1,0,0,0,1,0,0,0,0,0,0,0
4,1 B/R,1,Dubai Internet City,Mall of the Emirates,Sports City Swimming Academy,9.139788,0,0,1,0,0,0,0,0,0,1,0,0,0,0
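The same decile grouping can also be written with the `labels=` argument of `pd.qcut`, which avoids the intermediate dictionary. The sketch below is an alternative on made-up data, not the notebook's own method; `toy`, `area_mean`, `area_bin` and the area names `A0` to `A9` are illustrative.

```python
import pandas as pd

# Made-up data: ten areas, two transactions each, with distinct log prices.
toy = pd.DataFrame({
    "Area": [f"A{i}" for i in range(10)] * 2,
    "MeterPrice": [9.0 + 0.1 * i for i in range(10)] * 2,
})

# Mean price per area, then decile bins over those means.
area_mean = toy.groupby("Area")["MeterPrice"].mean()

# qcut assigns labels from the lowest bin upward, so the list runs
# area10..area1 to make area1 the most expensive decile.
labels = [f"area{i}" for i in range(10, 0, -1)]
area_bin = pd.qcut(area_mean, q=10, labels=labels).astype(str)

# Map each row's area name to its bin label, then one-hot encode.
toy["AreaGroup"] = toy["Area"].map(area_bin)
area_oh = pd.get_dummies(toy["AreaGroup"], dtype=int)
```

Because the labels are plain strings, `get_dummies` sorts the columns alphabetically (`area1`, `area10`, `area2`, ...), the same order as in the table above.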


Next is **Room(s)**. Recall that the Room(s) box plot showed overlap between some categories, so we merge those into a single `OtherRooms` group.

In [7]:
otherRoomList = ['Hotel', 'GYM', 'PENTHOUSE', 'Shop', 'Single Room','Office', '4 B/R', '5 B/R', '6 B/R', '7 B/R', '9 B/R']
df['Room(s)'] = df['Room(s)'].apply(lambda x : 'OtherRooms' if x in otherRoomList else x)
roomsOH = pd.get_dummies(df['Room(s)'])
df = df.join(roomsOH)
df.drop(columns=['Room(s)'], inplace=True)
df.head()

Unnamed: 0,Parking,Nearest Metro,Nearest Mall,Nearest Landmark,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType,area1,...,area5,area6,area7,area8,area9,1 B/R,2 B/R,3 B/R,OtherRooms,Studio
0,1,Sharaf Dg Metro Station,Mall of the Emirates,Motor City,9.452001,0,0,1,0,0,...,1,0,0,0,0,1,0,0,0,0
1,1,Business Bay Metro Station,Dubai Mall,Downtown Dubai,9.938881,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,1
2,1,UAE Exchange Metro Station,Ibn-e-Battuta Mall,Expo 2020 Site,9.459238,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,1
3,1,Creek Metro Station,City Centre Mirdif,Dubai International Airport,9.699766,0,0,1,0,0,...,0,0,0,0,0,0,1,0,0,0
4,1,Dubai Internet City,Mall of the Emirates,Sports City Swimming Academy,9.139788,0,0,1,0,0,...,1,0,0,0,0,1,0,0,0,0
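The merge into `OtherRooms` can also be expressed without `apply`, using `Series.isin` and `Series.mask`. A minimal sketch on made-up values; `rooms` and `rooms_grouped` are illustrative names, and the list is the one from the cell above:

```python
import pandas as pd

# Made-up sample of Room(s) values.
rooms = pd.Series(["1 B/R", "Studio", "Office", "5 B/R", "2 B/R"], name="Room(s)")

other_rooms = ["Hotel", "GYM", "PENTHOUSE", "Shop", "Single Room", "Office",
               "4 B/R", "5 B/R", "6 B/R", "7 B/R", "9 B/R"]

# mask() replaces the entries where the condition is True.
rooms_grouped = rooms.mask(rooms.isin(other_rooms), "OtherRooms")
```
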


Next is **Parking**, where the categories also overlapped.

In [8]:
parkingList = ['B', 'P', '1']
df['ParkingFiltered'] = df['Parking'].apply(lambda x : str(x)[0])
df['ParkingFiltered'] = df['ParkingFiltered'].apply(lambda x : 'OtherParking' if str(x) not in parkingList else x)
parkingOH = pd.get_dummies(df['ParkingFiltered'])
df = df.join(parkingOH)
df.drop(columns=['ParkingFiltered', 'Parking'], inplace=True)
df.head()

Unnamed: 0,Nearest Metro,Nearest Mall,Nearest Landmark,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType,area1,area10,...,area9,1 B/R,2 B/R,3 B/R,OtherRooms,Studio,1,B,OtherParking,P
0,Sharaf Dg Metro Station,Mall of the Emirates,Motor City,9.452001,0,0,1,0,0,0,...,0,1,0,0,0,0,1,0,0,0
1,Business Bay Metro Station,Dubai Mall,Downtown Dubai,9.938881,0,0,1,0,0,0,...,0,0,0,0,0,1,1,0,0,0
2,UAE Exchange Metro Station,Ibn-e-Battuta Mall,Expo 2020 Site,9.459238,0,0,1,0,0,0,...,0,0,0,0,0,1,1,0,0,0
3,Creek Metro Station,City Centre Mirdif,Dubai International Airport,9.699766,0,0,1,0,0,0,...,0,0,1,0,0,0,1,0,0,0
4,Dubai Internet City,Mall of the Emirates,Sports City Swimming Academy,9.139788,0,0,1,0,0,0,...,0,1,0,0,0,0,1,0,0,0
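The parking step keeps only the first character of each value and then groups everything outside `B`, `P` and `1` together. A minimal sketch with made-up raw values (the actual contents of the Parking column are not shown in full above, so these are assumptions for illustration):

```python
import pandas as pd

# Made-up raw values mixing numbers and lettered codes.
parking = pd.Series([1, "B1-23", "P2", "G-14", 105], name="Parking")

# First character of the string form of each value.
first_char = parking.astype(str).str[0]

# Keep 'B', 'P' and '1'; everything else becomes 'OtherParking'.
keep = ["B", "P", "1"]
parking_grouped = first_char.where(first_char.isin(keep), "OtherParking")
```

Note that any value beginning with `1`, such as `105` here, ends up in the `1` group, since only the first character is compared.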


Moving on to **Nearest Metro**:

In [9]:
MetroStationsList = df['Nearest Metro'].value_counts().nlargest(10).index.values
df['Nearest Metro'] = df['Nearest Metro'].apply(lambda x : 'OtherMetro' if x not in MetroStationsList else x)
NearestMOH = pd.get_dummies(df['Nearest Metro'])
df = df.join(NearestMOH)
df.drop(columns=['Nearest Metro'], inplace=True)
df.head()

Unnamed: 0,Nearest Mall,Nearest Landmark,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType,area1,area10,area2,...,Business Bay Metro Station,Creek Metro Station,Damac Properties,Dubai Internet City,First Abu Dhabi Bank Metro Station,Jumeirah Lakes Towers,Nakheel Metro Station,OtherMetro,Rashidiya Metro Station,Sharaf Dg Metro Station
0,Mall of the Emirates,Motor City,9.452001,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,Dubai Mall,Downtown Dubai,9.938881,0,0,1,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,Ibn-e-Battuta Mall,Expo 2020 Site,9.459238,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,City Centre Mirdif,Dubai International Airport,9.699766,0,0,1,0,0,0,1,...,0,1,0,0,0,0,0,0,0,0
4,Mall of the Emirates,Sports City Swimming Academy,9.139788,0,0,1,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
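Keeping the most frequent categories and collapsing the rest can be sketched on a toy column like this. The station names `A` to `D` and the cutoff `top_n = 2` are made up; the notebook keeps the 10 most frequent stations.

```python
import pandas as pd

# Made-up station labels with different frequencies.
metro = pd.Series(["A", "A", "A", "B", "B", "C", "D"], name="Nearest Metro")

top_n = 2  # illustrative; the cell above uses 10
top = metro.value_counts().nlargest(top_n).index

# Keep the top stations, replace all others with 'OtherMetro'.
metro_grouped = metro.where(metro.isin(top), "OtherMetro")
```
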


**Nearest Mall**

In [10]:
NearestMallOH = pd.get_dummies(df['Nearest Mall'])
df = df.join(NearestMallOH)
df.drop(columns=['Nearest Mall'], inplace=True)
df.head()

Unnamed: 0,Nearest Landmark,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType,area1,area10,area2,area3,...,Jumeirah Lakes Towers,Nakheel Metro Station,OtherMetro,Rashidiya Metro Station,Sharaf Dg Metro Station,City Centre Mirdif,Dubai Mall,Ibn-e-Battuta Mall,Mall of the Emirates,Marina Mall
0,Motor City,9.452001,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,0,0,1,0
1,Downtown Dubai,9.938881,0,0,1,0,0,0,0,1,...,0,0,0,0,0,0,1,0,0,0
2,Expo 2020 Site,9.459238,0,0,1,0,0,0,0,0,...,0,0,1,0,0,0,0,1,0,0
3,Dubai International Airport,9.699766,0,0,1,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
4,Sports City Swimming Academy,9.139788,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0


Now for **Nearest Landmark**

In [11]:
nlvc = df['Nearest Landmark'].value_counts()
df['Nearest Landmark'] = df['Nearest Landmark'].apply(lambda x : 'OtherLandmark' if nlvc[x] < 6000 else x)
nlOH = pd.get_dummies(df['Nearest Landmark'])
df = df.join(nlOH)
df.drop(columns=['Nearest Landmark'], inplace=True)
df.head()

Unnamed: 0,MeterPrice,Gifts,Mortgage,Sales,OtherTransactionType,area1,area10,area2,area3,area4,...,Ibn-e-Battuta Mall,Mall of the Emirates,Marina Mall,Burj Al Arab,Burj Khalifa,Downtown Dubai,Dubai International Airport,Motor City,OtherLandmark,Sports City Swimming Academy
0,9.452001,0,0,1,0,0,0,0,0,0,...,0,1,0,0,0,0,0,1,0,0
1,9.938881,0,0,1,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
2,9.459238,0,0,1,0,0,0,0,0,1,...,1,0,0,0,0,0,0,0,1,0
3,9.699766,0,0,1,0,0,0,1,0,0,...,0,0,0,0,0,0,1,0,0,0
4,9.139788,0,0,1,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,1
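Unlike the metro step, the landmark step uses an absolute frequency threshold rather than a top-N cutoff. A minimal sketch on made-up labels, with `min_count = 3` standing in for the 6000 used above:

```python
import pandas as pd

# Made-up landmark labels: X appears 3 times, Y twice, Z once.
landmark = pd.Series(["X", "X", "X", "Y", "Y", "Z"], name="Nearest Landmark")

min_count = 3  # illustrative; the cell above uses 6000

# Frequency of each row's own category, aligned row by row.
counts = landmark.map(landmark.value_counts())

# Keep categories that reach the threshold, collapse the rest.
landmark_grouped = landmark.where(counts >= min_count, "OtherLandmark")
```
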


In [12]:
df['OtherMall'] = 0

In [13]:
df.to_csv('data/FeatureEngineeringDone.csv')
