# Analysis of apartment rents in Dubai, UAE
For the **DIFC** and **Downtown** areas

All data was scraped from the Bayut website on August 9 and saved as a .CSV file.  
First, download the .CSV file from Google Drive with the `! gdown` command.

In [19]:
! gdown 1EMlXkq1MI4HqYwbcvTl93bKZ2BvRqDCy

Downloading...
From: https://drive.google.com/uc?id=1EMlXkq1MI4HqYwbcvTl93bKZ2BvRqDCy
To: /content/bayut-dubai-difc-downtown.csv
  0% 0.00/8.85M [00:00<?, ?B/s]100% 8.85M/8.85M [00:00<00:00, 158MB/s]


Importing all necessary libraries

In [20]:
import pandas as pd

Load the .CSV file with pandas using the Python engine and check the first 5 rows of data.

In [21]:
data = pd.read_csv('bayut-dubai-difc-downtown.csv', engine='python')

In [22]:
data.head(5)

Unnamed: 0,web-scraper-order,web-scraper-start-url,pagination,apartmet-link,apartmet-link-href,building,price,rent-frequency,address,beds,...,reference-no,date-added,balcony-size-sqft,parking,building-info-name,building-info-floors,building-info-year,building-info-area-sqft,furnishing,features-amenities
0,1660037380-2176,https://www.bayut.com/to-rent/apartments/dubai...,https://www.bayut.com/to-rent/apartments/dubai...,,https://www.bayut.com/property/details-5975062...,,190000,Yearly,"The Address The Blvd, Downtown Dubai, Dubai",1 Bed,...,Bayut - BHM-R-556121,23 May 2022,,,,,,,Furnished,Centrally Air-Conditioned
1,1660040273-2805,https://www.bayut.com/to-rent/apartments/dubai...,https://www.bayut.com/to-rent/apartments/dubai...,,https://www.bayut.com/property/details-5937929...,,165000,Yearly,"Downtown Views, Downtown Dubai, Dubai",2 Beds,...,Bayut - MK-R-5701-07-P,10 May 2022,,,,,,,Furnished,Parking Spaces: 1
2,1660036840-2061,https://www.bayut.com/to-rent/apartments/dubai...,https://www.bayut.com/to-rent/apartments/dubai...,,https://www.bayut.com/property/details-5577993...,,290000,Yearly,"BLVD Heights Tower 2, BLVD Heights, Downtown D...",2 Beds,...,Bayut - 879-Ap-R-1951,4 July 2022,,,,,,,Unfurnished,Swimming Pool
3,1660042132-3219,https://www.bayut.com/to-rent/apartments/dubai...,https://www.bayut.com/to-rent/apartments/dubai...,,https://www.bayut.com/property/details-5942993...,,185000,Yearly,"The Address The Blvd, Downtown Dubai, Dubai",1 Bed,...,Bayut - BHM-R-555721,12 May 2022,,,,,,,Furnished,Centrally Air-Conditioned
4,1660045614-3922,https://www.bayut.com/to-rent/apartments/dubai...,https://www.bayut.com/to-rent/apartments/dubai...,,https://www.bayut.com/property/details-6196723...,,130000,Yearly,"Downtown Views, Downtown Dubai, Dubai",1 Bed,...,Bayut - MCC-R-5685,3 August 2022,,,,,,,Furnished,


A few columns are not valuable for further analysis, such as links, IDs, and rent frequency; the rent frequency column contains only annual rent options.  
Let's drop those columns.

In [23]:
data.drop(['apartmet-link-href','rent-frequency','web-scraper-order','web-scraper-start-url','pagination','apartmet-link', 'building'], axis=1, inplace=True)
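
A claim such as "the column contains only one option" can be checked before dropping. The following is a minimal sketch on a toy frame; the `toy` data and the `constant_cols` name are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for the scraped data (values are illustrative only)
toy = pd.DataFrame({
    'price': ['190000', '165000', '290000'],
    'rent-frequency': ['Yearly', 'Yearly', 'Yearly'],
})

# A column with a single distinct value carries no information for analysis
constant_cols = [col for col in toy.columns
                 if toy[col].nunique(dropna=False) == 1]
print(constant_cols)

# Drop the constant columns, returning a new frame instead of mutating in place
toy_reduced = toy.drop(columns=constant_cols)
print(toy_reduced.columns.tolist())
```

On the real data the same check would be run before the `drop` call, while `rent-frequency` is still present.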

In [29]:
# for col in data.columns:
#     print(col)
data.columns

Index(['price', 'address', 'beds', 'baths', 'area-sqft', 'description-title',
       'description', 'reference-no', 'date-added', 'balcony-size-sqft',
       'parking', 'building-info-name', 'building-info-floors',
       'building-info-year', 'building-info-area-sqft', 'furnishing',
       'features-amenities'],
      dtype='object')

A first look at the data info, to see whether there is anything odd or any interesting insight.

In [26]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4824 entries, 0 to 4823
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   price                    4824 non-null   object 
 1   address                  4824 non-null   object 
 2   beds                     4824 non-null   object 
 3   baths                    4802 non-null   object 
 4   area-sqft                4824 non-null   object 
 5   description-title        4824 non-null   object 
 6   description              4824 non-null   object 
 7   reference-no             4824 non-null   object 
 8   date-added               4824 non-null   object 
 9   balcony-size-sqft        1069 non-null   object 
 10  parking                  1188 non-null   object 
 11  building-info-name       1184 non-null   object 
 12  building-info-floors     1184 non-null   float64
 13  building-info-year       1184 non-null   float64
 14  building-info-area-sqft 

Price is stored as an object and should be converted to a number, as should the number of beds, baths and some other columns.  
Some data is missing; we will deal with that later.
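
To see how much is missing per column at a glance, the share of null values can be computed directly. This is a small sketch on made-up values; the `toy` frame and `missing_share` name are assumptions for illustration.

```python
import pandas as pd

# Toy frame with gaps, mimicking sparsely filled columns such as 'parking'
toy = pd.DataFrame({
    'baths': ['1 Bath', None, '2 Baths', '4 Baths'],
    'parking': ['Yes', None, None, None],
})

# isna() marks missing cells; the column mean of booleans is the missing share
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a very high missing share are candidates for dropping rather than imputing, though that decision depends on the analysis.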

In [27]:
data.sample()

Unnamed: 0,price,address,beds,baths,area-sqft,description-title,description,reference-no,date-added,balcony-size-sqft,parking,building-info-name,building-info-floors,building-info-year,building-info-area-sqft,furnishing,features-amenities
349,285000,"BLVD Heights Tower 2, BLVD Heights, Downtown D...",3 Beds,4 Baths,"2,354 sqft",High Floor | Move In Ready | Keys On Hand,D&B properties is extremely proud to present t...,Bayut - BST225496_L,23 June 2022,183 sqft,Yes,Blvd Heights T2,41.0,2020.0,"2,354 sqft",,


For the price column, convert it to the int data type, but first remove the commas.

In [28]:
data['price'] = data['price'].str.replace(',','').astype(float).astype(int)
data['price'].sample(5)

4267    160000
4288    380000
3889    290000
273     150000
3272    160000
Name: price, dtype: int64
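
The other text columns seen in the sample row (values like `4 Baths` or `2,354 sqft`) could be converted in a similar way. The sketch below is one possible approach, not the notebook's own method; `leading_number` is a hypothetical helper and the toy values only mimic the formats shown above.

```python
import pandas as pd

# Toy values in the same shape as the 'baths' and 'area-sqft' columns
toy = pd.DataFrame({
    'baths': ['4 Baths', '1 Bath', None],
    'area-sqft': ['2,354 sqft', '183 sqft', '750 sqft'],
})

def leading_number(series):
    """Remove thousands separators, take the first run of digits, return numbers (NaN if none)."""
    digits = (series.str.replace(',', '', regex=False)
                    .str.extract(r'(\d+)', expand=False))
    return pd.to_numeric(digits, errors='coerce')

toy['baths_num'] = leading_number(toy['baths'])
toy['area_sqft_num'] = leading_number(toy['area-sqft'])
print(toy[['baths_num', 'area_sqft_num']])
```

Missing entries stay as NaN here, which leaves the choice of how to handle them open.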

For the column with the number of bedrooms, let's see how many unique values there are and then replace each one with a number; a studio is replaced with 0.

In [11]:
data['beds'].unique()

array(['1 Bed', '2 Beds', 'Studio', '3 Beds', '4 Beds', '5 Beds'],
      dtype=object)

In [12]:
data['beds'] = data['beds'].replace({
    'Studio': 0,
    '1 Bed': 1,
    '2 Beds': 2,
    '3 Beds': 3,
    '4 Beds': 4,
    '5 Beds': 5,
})
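
An alternative worth considering is `Series.map`, which turns any label missing from the dictionary into NaN, so unexpected values become easy to spot. A minimal sketch on a toy series; the names `bed_map`, `beds_num` and `unmapped` are illustrative assumptions.

```python
import pandas as pd

# Toy series with the same kind of labels as the 'beds' column
beds = pd.Series(['1 Bed', '2 Beds', 'Studio', '3 Beds'])

bed_map = {'Studio': 0, '1 Bed': 1, '2 Beds': 2,
           '3 Beds': 3, '4 Beds': 4, '5 Beds': 5}

# map() returns NaN for labels that are not keys of bed_map
beds_num = beds.map(bed_map)

# Any label that failed to map shows up here for inspection
unmapped = beds[beds_num.isna()].unique()
print(beds_num.tolist(), unmapped)
```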


The address column contains the name of the city, the area, the building name, and sometimes the name of a complex of buildings.  
We need to separate these into different columns for future analysis. We can see that the building name always comes first, the city is always last, and the area name comes just before the city.  
So, let's split the address and save each element to a new column.

In [32]:
data['address_city'] = data['address'].str.split(', ').str[-1]
data['address_area'] = data['address'].str.split(', ').str[-2]
data['address_building'] = data['address'].str.split(', ').str[0]

In [33]:
data[['address', 'address_city', 'address_area', 'address_building']].sample(5)

Unnamed: 0,address,address_city,address_area,address_building
4158,"Burj Al Nujoom, Downtown Dubai, Dubai",Dubai,Downtown Dubai,Burj Al Nujoom
2330,"29 Boulevard 1, 29 Boulevard, Downtown Dubai, ...",Dubai,Downtown Dubai,29 Boulevard 1
931,"BLVD Heights Tower 2, BLVD Heights, Downtown D...",Dubai,Downtown Dubai,BLVD Heights Tower 2
228,"The Address Dubai Mall, Downtown Dubai, Dubai",Dubai,Downtown Dubai,The Address Dubai Mall
2788,"The Gate, DIFC, Dubai",Dubai,DIFC,The Gate
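
The split can also be done once and reused, and counting the parts gives a quick check that every address has at least building, area and city. This is a sketch under the assumption that `', '` is the only separator; the toy addresses (including the hypothetical `Tower A` / `Complex A` entry) and the `n_parts` column are illustrative.

```python
import pandas as pd

# Toy addresses in the same shape as the 'address' column
addresses = pd.Series([
    'Burj Al Nujoom, Downtown Dubai, Dubai',
    'Tower A, Complex A, Downtown Dubai, Dubai',  # hypothetical 4-part address
    'The Gate, DIFC, Dubai',
])

# Split once into lists, then index the lists from either end
parts = addresses.str.split(', ')
result = pd.DataFrame({
    'address_city': parts.str[-1],
    'address_area': parts.str[-2],
    'address_building': parts.str[0],
    'n_parts': parts.str.len(),  # fewer than 3 parts would signal a malformed address
})
print(result)
```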


We no longer need the address column, so let's drop it.

In [36]:
data.drop(['address'], axis=1, inplace=True)