# House Rent Prediction

The rent of a house depends on a lot of factors. With appropriate data and Machine Learning techniques, many real estate platforms find the housing options according to the customer’s budget. So, if you want to learn how to use Machine Learning to predict the rent of a house, this article is for you. In this article, I will take you through the task of House Rent Prediction with Machine Learning using Python.

The rent of a housing property depends on many factors, such as:

1) number of bedrooms, halls, and kitchens (BHK)

2) size of the property

3) the floor of the house

4) area type

5) area locality

6) city

7) furnishing status of the house

To build a house rent prediction system, we need data based on the factors affecting the rent of a housing property. I found a dataset from Kaggle which includes all the features we need.

Dataset: https://www.kaggle.com/datasets/iamsouravbanerjee/house-rent-prediction-dataset

In [41]:
# Importing libraries
import numpy as np  # numerical operations on arrays
import pandas as pd  # data loading and analysis

# Plotting
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

from sklearn.model_selection import train_test_split  # splits the data into training and test sets
from keras.models import Sequential  # builds a model as a linear stack of layers
from keras.layers import Dense  # fully connected layer: every unit receives input from all units of the previous layer
from keras.layers import LSTM  # Long Short-Term Memory recurrent layer, commonly used for sequence data such as time series

In [42]:
# Load the dataset into a pandas DataFrame
raw_data = pd.read_csv('House_Rent_Dataset.csv', encoding = 'latin-1')

# Replace any null values with empty strings so they do not cause errors later
# The result is stored in the variable "data"
data = raw_data.where((pd.notnull(raw_data)), '')

# Check the shape of the dataset
data.shape

(4746, 12)

In [43]:
# printing the first 10 rows of the dataset
data.head(10)

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
1,2022-05-13,2,20000,800,1 out of 3,Super Area,"Phool Bagan, Kankurgachi",Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
2,2022-05-16,2,17000,1000,1 out of 3,Super Area,Salt Lake City Sector 2,Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
3,2022-07-04,2,10000,800,1 out of 2,Super Area,Dumdum Park,Kolkata,Unfurnished,Bachelors/Family,1,Contact Owner
4,2022-05-09,2,7500,850,1 out of 2,Carpet Area,South Dum Dum,Kolkata,Unfurnished,Bachelors,1,Contact Owner
5,2022-04-29,2,7000,600,Ground out of 1,Super Area,Thakurpukur,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
6,2022-06-21,2,10000,700,Ground out of 4,Super Area,Malancha,Kolkata,Unfurnished,Bachelors,2,Contact Agent
7,2022-06-21,1,5000,250,1 out of 2,Super Area,Malancha,Kolkata,Unfurnished,Bachelors,1,Contact Agent
8,2022-06-07,2,26000,800,1 out of 2,Carpet Area,"Palm Avenue Kolkata, Ballygunge",Kolkata,Unfurnished,Bachelors,2,Contact Agent
9,2022-06-20,2,10000,1000,1 out of 3,Carpet Area,Natunhat,Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Owner


In [44]:
# printing the last 10 rows of the dataset
data.tail(10)

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
4736,2022-06-28,3,15000,1500,Lower Basement out of 2,Super Area,Almasguda,Hyderabad,Semi-Furnished,Family,3,Contact Owner
4737,2022-07-07,3,15000,1500,Lower Basement out of 2,Super Area,Almasguda,Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Owner
4738,2022-07-06,2,17000,855,4 out of 5,Carpet Area,"Godavari Homes, Quthbullapur",Hyderabad,Unfurnished,Bachelors,2,Contact Agent
4739,2022-07-06,2,25000,1040,2 out of 4,Carpet Area,Gachibowli,Hyderabad,Unfurnished,Bachelors,2,Contact Owner
4740,2022-06-02,2,12000,1350,2 out of 2,Super Area,Old Alwal,Hyderabad,Unfurnished,Bachelors/Family,2,Contact Owner
4741,2022-05-18,2,15000,1000,3 out of 5,Carpet Area,Bandam Kommu,Hyderabad,Semi-Furnished,Bachelors/Family,2,Contact Owner
4742,2022-05-15,3,29000,2000,1 out of 4,Super Area,"Manikonda, Hyderabad",Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Owner
4743,2022-07-10,3,35000,1750,3 out of 5,Carpet Area,"Himayath Nagar, NH 7",Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Agent
4744,2022-07-06,3,45000,1500,23 out of 34,Carpet Area,Gachibowli,Hyderabad,Semi-Furnished,Family,2,Contact Agent
4745,2022-05-04,2,15000,1000,4 out of 5,Carpet Area,Suchitra Circle,Hyderabad,Unfurnished,Bachelors,2,Contact Owner


In [45]:
# Dataset information
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4746 entries, 0 to 4745
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Posted On          4746 non-null   object
 1   BHK                4746 non-null   int64 
 2   Rent               4746 non-null   int64 
 3   Size               4746 non-null   int64 
 4   Floor              4746 non-null   object
 5   Area Type          4746 non-null   object
 6   Area Locality      4746 non-null   object
 7   City               4746 non-null   object
 8   Furnishing Status  4746 non-null   object
 9   Tenant Preferred   4746 non-null   object
 10  Bathroom           4746 non-null   int64 
 11  Point of Contact   4746 non-null   object
dtypes: int64(4), object(8)
memory usage: 445.1+ KB


In [46]:
# Data preprocessing: check whether there are any empty values
# by counting the missing values in each column
data.isnull().sum()

Posted On            0
BHK                  0
Rent                 0
Size                 0
Floor                0
Area Type            0
Area Locality        0
City                 0
Furnishing Status    0
Tenant Preferred     0
Bathroom             0
Point of Contact     0
dtype: int64

In [47]:
#statistical measures about the rent
print(f"Mean Rent: {data.Rent.mean()}")
print(f"Median Rent: {data.Rent.median()}")
print(f"Highest Rent: {data.Rent.max()}")
print(f"Lowest Rent: {data.Rent.min()}")

Mean Rent: 34993.45132743363
Median Rent: 16000.0
Highest Rent: 3500000
Lowest Rent: 1200
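
The mean rent (about 34,993) is more than twice the median (16,000), and the highest rent is 3,500,000, which suggests that a few very expensive listings pull the mean upward. Below is a minimal sketch of that effect. The rents in it are made-up illustrative values, not rows from the dataset, and the log transform is shown only as one common way to compress such a tail.

```python
import numpy as np

# Illustrative rents (assumed values): mostly moderate, one extreme listing
rents = np.array([8000, 10000, 12000, 16000, 20000, 25000, 3500000])

mean_rent = rents.mean()
median_rent = np.median(rents)

# The single extreme value drags the mean far above the median
print(f"Mean: {mean_rent:.0f}, Median: {median_rent:.0f}")

# A log transform compresses the long right tail; np.log1p computes log(1 + x)
log_rents = np.log1p(rents)
print(f"Max/min ratio before: {rents.max() / rents.min():.1f}, after log: {log_rents.max() / log_rents.min():.2f}")
```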


## Visualization

In [48]:
# Plotting the rent in different cities according to BHK
figure = px.bar(data, x=data["City"], 
                y = data["Rent"], 
                color = data["BHK"],
            title="Rent in Different Cities According to BHK")
figure.show()

In [49]:
# Plotting the rent in different cities according to area type
figure = px.bar(data, x=data["City"], 
                y = data["Rent"], 
                color = data["Area Type"],
            title="Rent in Different Cities According to Area Type")
figure.show()

In [50]:
# Plotting the rent in different cities according to furnishing status
figure = px.bar(data, x=data["City"], 
                y = data["Rent"], 
                color = data["Furnishing Status"],
            title="Rent in Different Cities According to Furnishing Status")
figure.show()

In [51]:
# Plotting the rent in different cities according to size
figure = px.bar(data, x=data["City"], 
                y = data["Rent"], 
                color = data["Size"],
            title="Rent in Different Cities According to Size")
figure.show()
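
With `px.bar`, rows that share the same city are, as far as I can tell, stacked on top of each other, so the bar heights in these charts reflect the summed rent of all listings in a city rather than a typical rent. Below is a minimal sketch of a per-city median, which may be easier to compare. The six rows are copied from the first and last rows printed earlier.

```python
import pandas as pd

# Six rows taken from the head/tail output shown earlier
sample = pd.DataFrame({
    "City": ["Kolkata", "Kolkata", "Kolkata", "Hyderabad", "Hyderabad", "Hyderabad"],
    "Rent": [10000, 20000, 17000, 15000, 15000, 17000],
})

# Median rent per city; reset_index turns the result back into a DataFrame
median_by_city = sample.groupby("City")["Rent"].median().reset_index()
print(median_by_city)
```

The resulting frame could then be plotted with `px.bar(median_by_city, x="City", y="Rent")`.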

In [52]:
# Plotting the number of houses available for rent in each city
cities = data["City"].value_counts()
label = cities.index
counts = cities.values
colors = ['gold','lightgreen']

fig = go.Figure(data=[go.Pie(labels=label, values=counts, hole=0.5)])
fig.update_layout(title_text='Number of Houses Available for Rent')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()

In [53]:
# Plotting the tenant preference
tenant = data["Tenant Preferred"].value_counts()
label = tenant.index
counts = tenant.values
colors = ['gold','lightgreen']

fig = go.Figure(data=[go.Pie(labels=label, values=counts, hole=0.5)])
fig.update_layout(title_text='Preference of Tenant in India')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()

## Preprocessing

In [54]:
#Now let’s prepare the data for the task of training a deep learning model. 
#Here I will convert all the categorical features into numerical values:
data["Area Type"] = data["Area Type"].map({"Super Area": 1, 
                                           "Built Area": 2,
                                           "Carpet Area": 3})

data["City"] = data["City"].map({"Mumbai": 400000, "Chennai": 600000, 
                                 "Bangalore": 560000, "Hyderabad": 500000, 
                                 "Delhi": 110000, "Kolkata": 700000})

data["Furnishing Status"] = data["Furnishing Status"].map({"Unfurnished": 0, 
                                                           "Semi-Furnished": 1, 
                                                           "Furnished": 2})

data["Tenant Preferred"] = data["Tenant Preferred"].map({"Bachelors": 1,
                                                         "Bachelors/Family": 2,
                                                         "Family": 3})
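
`Series.map` with a dictionary returns `NaN` for any value missing from the dictionary, so a misspelled or unexpected category would silently become a missing value. Below is a minimal sketch of a check for this on a small made-up frame. The "Penthouse Area" label is a hypothetical value used only to trigger the case.

```python
import pandas as pd

toy = pd.DataFrame({"Area Type": ["Super Area", "Carpet Area", "Penthouse Area"]})

area_codes = {"Super Area": 1, "Built Area": 2, "Carpet Area": 3}
encoded = toy["Area Type"].map(area_codes)

# Any category absent from the dictionary shows up as NaN after map()
unmapped = toy.loc[encoded.isna(), "Area Type"].unique()
print("Unmapped categories:", list(unmapped))
```

Running a check like this after each `map` call makes it easier to catch a missing dictionary entry before the data reaches the model.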

In [55]:
# printing the first 10 rows of the dataset
data.head(10)

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,1,Bandel,700000,0,2,2,Contact Owner
1,2022-05-13,2,20000,800,1 out of 3,1,"Phool Bagan, Kankurgachi",700000,1,2,1,Contact Owner
2,2022-05-16,2,17000,1000,1 out of 3,1,Salt Lake City Sector 2,700000,1,2,1,Contact Owner
3,2022-07-04,2,10000,800,1 out of 2,1,Dumdum Park,700000,0,2,1,Contact Owner
4,2022-05-09,2,7500,850,1 out of 2,3,South Dum Dum,700000,0,1,1,Contact Owner
5,2022-04-29,2,7000,600,Ground out of 1,1,Thakurpukur,700000,0,2,2,Contact Owner
6,2022-06-21,2,10000,700,Ground out of 4,1,Malancha,700000,0,1,2,Contact Agent
7,2022-06-21,1,5000,250,1 out of 2,1,Malancha,700000,0,1,1,Contact Agent
8,2022-06-07,2,26000,800,1 out of 2,3,"Palm Avenue Kolkata, Ballygunge",700000,0,1,2,Contact Agent
9,2022-06-20,2,10000,1000,1 out of 3,3,Natunhat,700000,1,2,2,Contact Owner


In [56]:
# printing the last 10 rows of the dataset
data.tail(10)

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
4736,2022-06-28,3,15000,1500,Lower Basement out of 2,1,Almasguda,500000,1,3,3,Contact Owner
4737,2022-07-07,3,15000,1500,Lower Basement out of 2,1,Almasguda,500000,1,2,3,Contact Owner
4738,2022-07-06,2,17000,855,4 out of 5,3,"Godavari Homes, Quthbullapur",500000,0,1,2,Contact Agent
4739,2022-07-06,2,25000,1040,2 out of 4,3,Gachibowli,500000,0,1,2,Contact Owner
4740,2022-06-02,2,12000,1350,2 out of 2,1,Old Alwal,500000,0,2,2,Contact Owner
4741,2022-05-18,2,15000,1000,3 out of 5,3,Bandam Kommu,500000,1,2,2,Contact Owner
4742,2022-05-15,3,29000,2000,1 out of 4,1,"Manikonda, Hyderabad",500000,1,2,3,Contact Owner
4743,2022-07-10,3,35000,1750,3 out of 5,3,"Himayath Nagar, NH 7",500000,1,2,3,Contact Agent
4744,2022-07-06,3,45000,1500,23 out of 34,3,Gachibowli,500000,1,3,2,Contact Agent
4745,2022-05-04,2,15000,1000,4 out of 5,3,Suchitra Circle,500000,0,1,2,Contact Owner


In [57]:
#statistical measures about the data
data.describe(include = 'all')

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
count,4746,4746.0,4746.0,4746.0,4746,4746.0,4746,4746.0,4746.0,4746.0,4746.0,4746
unique,81,,,,480,,2235,,,,,3
top,2022-07-06,,,,1 out of 2,,Bandra West,,,,,Contact Owner
freq,311,,,,379,,37,,,,,3216
mean,,2.08386,34993.45,967.490729,,1.968816,,481860.514117,0.760851,1.924568,1.965866,
std,,0.832256,78106.41,634.202328,,0.999408,,167570.170987,0.684553,0.518366,0.884532,
min,,1.0,1200.0,10.0,,1.0,,110000.0,0.0,1.0,1.0,
25%,,2.0,10000.0,550.0,,1.0,,400000.0,0.0,2.0,1.0,
50%,,2.0,16000.0,850.0,,1.0,,500000.0,1.0,2.0,2.0,
75%,,3.0,33000.0,1200.0,,3.0,,600000.0,1.0,2.0,2.0,


## Splitting the data

### Splitting the data into Features & Targets



In [58]:
#assigning features as X
x = np.array(data[["BHK", "Size", "Area Type", "City", "Furnishing Status", "Tenant Preferred", "Bathroom"]])

#assigning targets as Y
y = np.array(data[["Rent"]])

In [59]:
print(x) #printing the features
print("---------------------------------------------------------------------------------------------------------------------------")
print(y) #printing the targets

[[   2 1100    1 ...    0    2    2]
 [   2  800    1 ...    1    2    1]
 [   2 1000    1 ...    1    2    1]
 ...
 [   3 1750    3 ...    1    2    3]
 [   3 1500    3 ...    1    3    2]
 [   2 1000    3 ...    0    1    2]]
---------------------------------------------------------------------------------------------------------------------------
[[10000]
 [20000]
 [17000]
 ...
 [35000]
 [45000]
 [15000]]


### Splitting the data into Training and Testing

In [60]:
# Splitting the dataset into training and testing sets

# test_size --> fraction of the data held out for testing (0.2 ==> 20%)

# random_state --> seed for the shuffle; each value of random_state splits the data differently
# Pass the same random_state every time to reproduce the same split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state = 2)
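
To illustrate the effect of `random_state`, here is a minimal sketch on a small made-up array. The names `x_toy` and `y_toy` are introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and one target per sample
x_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10).reshape(10, 1)

# Same random_state twice: the two splits are identical
a_train, a_test, _, _ = train_test_split(x_toy, y_toy, test_size=0.20, random_state=2)
b_train, b_test, _, _ = train_test_split(x_toy, y_toy, test_size=0.20, random_state=2)
same_split = np.array_equal(a_test, b_test)

# test_size=0.20 holds out 2 of the 10 samples
print(a_train.shape, a_test.shape, same_split)
```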

In [61]:
#checking dimensions of Features
print(x.shape, x_train.shape, x_test.shape)

(4746, 7) (3796, 7) (950, 7)


In [62]:
#checking dimensions of Targets
print(y.shape, y_train.shape, y_test.shape)

(4746, 1) (3796, 1) (950, 1)


## LSTM Deep Learning Model

In [68]:
# Now let's train a house rent prediction model using an LSTM neural network
# The network learns to predict y_train (the rent) from x_train (the features)
# It has two LSTM hidden layers followed by two Dense layers
model = Sequential()

# Add two LSTM layers with 128 and 64 units respectively
# x_train has shape (n_samples, n_features), where n_samples is the number of training samples
# The LSTM treats each sample as a sequence of n_features steps with 1 value per step,
# which is why input_shape is (n_features, 1)
model.add(LSTM(128, return_sequences=True, input_shape = (x_train.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))

# Add two Dense layers with 25 and 1 units respectively; the last one outputs the predicted rent
# A Dense layer is a fully connected layer in which each neuron receives input from all neurons of the previous layer
model.add(Dense(25))
model.add(Dense(1))

# Print a summary of the model
model.summary()

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_14 (LSTM)              (None, 7, 128)            66560     
                                                                 
 lstm_15 (LSTM)              (None, 64)                49408     
                                                                 
 dense_14 (Dense)            (None, 25)                1625      
                                                                 
 dense_15 (Dense)            (None, 1)                 26        
                                                                 
Total params: 117,619
Trainable params: 117,619
Non-trainable params: 0
_________________________________________________________________
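
The parameter counts in the summary can be reproduced by hand. An LSTM layer has four gates, each with an input weight matrix, a recurrent weight matrix, and a bias vector. A Dense layer has one weight matrix and a bias vector. Below is a minimal sketch of that arithmetic. The helper names `lstm_params` and `dense_params` are my own and are not part of Keras.

```python
def lstm_params(units: int, input_dim: int) -> int:
    # 4 gates, each with (input_dim + units) weights per unit plus one bias per unit
    return 4 * (units * (input_dim + units) + units)


def dense_params(units: int, input_dim: int) -> int:
    # One weight per (input, unit) pair plus one bias per unit
    return input_dim * units + units


counts = [
    lstm_params(128, 1),    # first LSTM: 1 value per step
    lstm_params(64, 128),   # second LSTM: receives the 128 outputs of the first
    dense_params(25, 64),
    dense_params(1, 25),
]
print(counts, sum(counts))  # [66560, 49408, 1625, 26] 117619
```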


In [69]:
# Scaling the features usually helps a neural network train better (no scaling is applied in this notebook)
# Adam is used as the optimizer because it is a common default choice
model.compile(optimizer='adam', loss='mse')

# Fit the model to the training data; this is where the training actually happens
# epochs is the number of full passes the network makes over the training data
# 36 epochs follows a rule of thumb of 3 times the data column count
# The column count is 12 => 12 * 3 = 36; this is a heuristic, not a requirement
model.fit(x_train, y_train, batch_size=52, epochs=36)

Epoch 1/36
...
Epoch 36/36


<keras.callbacks.History at 0x7f99edf02550>
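
The training cell mentions scaling, but no scaling step appears in the notebook, and the raw features span very different ranges (for example, `BHK` values near 2 next to city codes in the hundreds of thousands). Below is a minimal sketch of one common option, standardization, using statistics computed on the training rows only. The arrays are small made-up stand-ins for `x_train` and `x_test`. This is an assumption about what scaling could look like here, not the author's method.

```python
import numpy as np

# Stand-ins for x_train / x_test with columns [BHK, Size, City code]
x_train_toy = np.array([[2, 1100, 700000],
                        [2,  800, 700000],
                        [3, 1500, 500000],
                        [1,  250, 400000]], dtype=float)
x_test_toy = np.array([[2, 1000, 500000]], dtype=float)

# Fit the statistics on the training rows only, then reuse them for the test rows
mean = x_train_toy.mean(axis=0)
std = x_train_toy.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns

x_train_scaled = (x_train_toy - mean) / std
x_test_scaled = (x_test_toy - mean) / std

# An LSTM input_shape of (n_features, 1) corresponds to arrays shaped (n_samples, n_features, 1)
x_train_seq = x_train_scaled[..., np.newaxis]
print(x_train_seq.shape)  # (4, 3, 1)
```

Reusing the training mean and standard deviation for the test rows keeps information about the test set out of the training step.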

In [70]:
# Evaluate the loss (mean squared error) on the test dataset
model.evaluate(x_test, y_test)



15240175616.0
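
Because the loss is mean squared error, its unit is rent squared, which is hard to read. Taking the square root gives the root mean squared error (RMSE) in the same unit as the rent. Below is a minimal sketch using the loss value reported above and the median rent printed earlier.

```python
import math

mse = 15240175616.0  # test loss reported by model.evaluate above
rmse = math.sqrt(mse)

# RMSE is in the same unit as the rent itself
print(f"RMSE: {rmse:,.0f}")

median_rent = 16000.0  # median rent printed earlier
print(f"RMSE / median rent: {rmse / median_rent:.1f}")
```

An RMSE several times larger than the median rent may indicate that the error is dominated by a few very expensive listings, so it is worth reading alongside the rent statistics above.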

## Predictive Model

In [74]:
# Prediction model
print("Enter House Details to Predict Rent")
print('----------------------------------------')

a = int(input("Number of BHK: "))
b = int(input("Size of the House in Sqft: "))
c = int(input("Area Type (Super Area = 1, Built Area = 2, Carpet Area = 3): "))
d = int(input("Pin Code of the City: "))
e = int(input("Furnishing Status of the House (Unfurnished = 0, Semi-Furnished = 1, Furnished = 2): "))
f = int(input("Tenant Type (Bachelors = 1, Bachelors/Family = 2, Only Family = 3): "))
g = int(input("Number of bathrooms: "))

features = np.array([[a, b, c, d, e, f, g]])

print("Predicted House Rent = ", model.predict(features))

Enter House Details to Predict Rent
----------------------------------------
Number of BHK: 2
Size of the House in Sqft: 1000
Area Type (Super Area = 1, Built Area = 2, Carpet Area = 3): 2
Pin Code of the City: 520002
Furnishing Status of the House (Unfurnished = 0, Semi-Furnished = 1, Furnished = 2): 1
Tenant Type (Bachelors = 1, Bachelors/Family = 2, Only Family = 3): 3
Number of bathrooms: 2
Predicted House Rent =  [[18974.893]]
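
The city feature was encoded with six fixed codes during preprocessing, while the run above used 520002, a value the model never saw in training. Below is a minimal sketch of a helper that checks the inputs against the training codes and shapes them as `(1, 7, 1)` to match the LSTM's declared input. `build_features` and `KNOWN_CITY_CODES` are hypothetical names introduced here.

```python
import numpy as np

# City codes used in the preprocessing step above
KNOWN_CITY_CODES = {400000, 600000, 560000, 500000, 110000, 700000}


def build_features(bhk, size, area_type, city_code, furnishing, tenant, bathrooms):
    """Hypothetical helper: validate inputs and return an array shaped (1, 7, 1)."""
    if city_code not in KNOWN_CITY_CODES:
        raise ValueError(f"Unknown city code {city_code}; expected one of {sorted(KNOWN_CITY_CODES)}")
    if area_type not in (1, 2, 3) or furnishing not in (0, 1, 2) or tenant not in (1, 2, 3):
        raise ValueError("Categorical code outside the range used in training")
    row = np.array([[bhk, size, area_type, city_code, furnishing, tenant, bathrooms]], dtype=float)
    # Add a trailing axis so the shape is (samples, steps, values per step)
    return row[..., np.newaxis]


features = build_features(2, 1000, 2, 500000, 1, 3, 2)
print(features.shape)  # (1, 7, 1)
```

The returned array could be passed to `model.predict` in place of the hand-built `features` array.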


## Summary
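So this is how you can use Machine Learning to predict the rent of a housing property: load and explore the data, convert the categorical features into numbers, split the data into training and test sets, and train a neural network on it. With appropriate data and Machine Learning techniques, many real estate platforms find housing options according to the customer’s budget.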