# Predicting Home Rental Prices in Brazil (99% R² Score)

This dataset contains rental prices and several features of homes in Brazil. In this kernel I predict the price of renting a home in Brazil (the `total (R$)` column) using regression algorithms, which tend to work well for price prediction.

### About Dataset

* There are 13 features and 10,692 data points in this dataset.
* 4 features have dtype object.
* 9 features have dtype int.
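
The counts above can be reproduced with a quick dtype check. The sketch below uses a hypothetical three-row sample rather than the real CSV, so the numbers it prints are for the sample only. On the real data, the cleaning code later in this kernel suggests the four object columns are `city`, `floor`, `animal` and `furniture`.

```python
import pandas as pd

# Hypothetical three-row sample that mimics a few of the dataset's columns.
sample = pd.DataFrame({
    "city": ["São Paulo", "Campinas", "Porto Alegre"],
    "area": [70, 320, 80],
    "floor": ["7", "20", "-"],  # stored as text because "-" marks a missing value
    "animal": ["acept", "acept", "not acept"],
    "rent amount (R$)": [3300, 4960, 2800],
})

n_rows, n_cols = sample.shape

# Split the columns into numeric and non-numeric (text) groups.
numeric_cols = sample.select_dtypes(include="number").columns
text_cols = sample.columns.difference(numeric_cols)

print(f"{n_cols} features, {n_rows} data points")
print("numeric:", list(numeric_cols))
print("text:", list(text_cols))
```

Running the same three lines on `data` after loading the CSV gives the real counts.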

The features in the data:

1. city - city where the home is located
1. area - area of the home
1. rooms - number of rooms
1. bathroom - number of bathrooms
1. parking spaces - number of parking spaces
1. floor - floor the home is on
1. animal - whether animals are accepted
1. furniture - whether the home is furnished
1. hoa - homeowners association tax
1. property tax - property tax
1. rent amount - rental price
1. fire insurance - fire insurance
1. total - total value
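
One thing worth checking before modeling is how `total` relates to the other money columns. The sketch below tests, on hypothetical rows, whether `total (R$)` is simply the sum of the four fee columns. Only the name `total (R$)` appears in this kernel's code; the other `(R$)` column names are assumptions and should be compared against `data.columns`.

```python
import pandas as pd

# Hypothetical rows; every column name except "total (R$)" is an assumption.
sample = pd.DataFrame({
    "hoa (R$)": [2065, 1200, 0],
    "rent amount (R$)": [3300, 4960, 2800],
    "property tax (R$)": [211, 1750, 0],
    "fire insurance (R$)": [42, 63, 41],
    "total (R$)": [5618, 7973, 2841],
})

fee_columns = [
    "hoa (R$)",
    "rent amount (R$)",
    "property tax (R$)",
    "fire insurance (R$)",
]

# Row-wise sum of the fee columns, compared against the reported total.
component_sum = sample[fee_columns].sum(axis=1)
share_matching = (component_sum == sample["total (R$)"]).mean()

print(f"Rows where total equals the sum of the fees: {share_matching:.0%}")
```

If most rows match on the real data, a linear model that sees those fee columns can reproduce the target almost exactly, which is useful context when reading the scores later on.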

## CONTENT

[1. Libraries](#1) <br/>
[2. Exploratory Data Analysis](#2) <br/>
[3. Cleaning Data](#3) <br/>
[4. Train Test Split Data](#4) <br/>
[5. Create Model](#5) <br/>
[6. Conclusions](#6) <br/>

<a id="1"></a>
## Libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Data visualization
import seaborn as sns 
import matplotlib.pyplot as plt

# model selection and metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Model libraries
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<a id="2"></a>
## Exploratory Data Analysis

In [None]:
data = pd.read_csv("/kaggle/input/brasilian-houses-to-rent/houses_to_rent_v2.csv")
data.head()

In [None]:
data.info()

In [None]:
sns.countplot(x="city", data=data)
plt.show()

In [None]:
sns.countplot(x="animal", data=data)
plt.show()

In [None]:
sns.countplot(x="furniture", data=data)
plt.show()

In [None]:
sns.countplot(x="rooms", data=data)
plt.show()

In [None]:
sns.countplot(x="bathroom", data=data)
plt.show()

In [None]:
sns.countplot(x="parking spaces", data=data)
plt.show()

In [None]:
f, ax = plt.subplots(figsize=(8, 8))
sns.countplot(x="floor", data=data, ax=ax)
plt.show()

<a id="3"></a>
## Cleaning Data

The floor feature has missing values, marked with "-". We also need to convert the object columns to numeric ones so the models can use them.

* I filled the missing values in the floor feature with the mean.

* I replaced the object values with numeric codes:

**animal**: accept-1, not accept-0 <br/>
**furniture**: furnished-1, not furnished-0 <br/>
**city**: São Paulo-1, Porto Alegre-2, Rio de Janeiro-3, Campinas-4, Belo Horizonte-5 <br/>
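
The cleaning steps described above can be sketched on a tiny hypothetical frame. This version uses `Series.map` with explicit dictionaries instead of chained `replace` calls; it is an alternative way to express the same mapping, not the code used in this kernel.

```python
import pandas as pd

# Hypothetical rows using the same category spellings as the dataset.
sample = pd.DataFrame({
    "floor": ["7", "-", "3"],
    "animal": ["acept", "not acept", "acept"],
    "furniture": ["furnished", "not furnished", "not furnished"],
    "city": ["São Paulo", "Campinas", "Belo Horizonte"],
})

# "-" cannot be parsed as a number, so errors="coerce" turns it into NaN.
floor = pd.to_numeric(sample["floor"], errors="coerce")

# Fill the gaps with the truncated mean of the known floors.
sample["floor"] = floor.fillna(int(floor.mean()))

# Map each text category to its numeric code.
sample["animal"] = sample["animal"].map({"acept": 1, "not acept": 0})
sample["furniture"] = sample["furniture"].map({"furnished": 1, "not furnished": 0})
cities = {"São Paulo": 1, "Porto Alegre": 2, "Rio de Janeiro": 3,
          "Campinas": 4, "Belo Horizonte": 5}
sample["city"] = sample["city"].map(cities)

print(sample)
```

Note that integer codes for `city` imply an ordering between cities that a linear model will treat as meaningful. One-hot encoding with `pd.get_dummies` is a common alternative if that ordering is not wanted.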

In [None]:
# "-" marks a missing floor value; turn it into NaN and make the column numeric
data["floor"] = data["floor"].replace("-", np.nan)
data["floor"] = pd.to_numeric(data["floor"])

floor_mean = data["floor"].mean()
print(floor_mean) # mean

# Fill the missing floors with the truncated mean
data["floor"] = data["floor"].fillna(int(floor_mean))

# Encode the text columns as numbers ("acept" is the spelling used in the data)
data["animal"] = data["animal"].map({"acept": 1, "not acept": 0})
data["furniture"] = data["furniture"].map({"furnished": 1, "not furnished": 0})

cities = {"São Paulo": 1, "Porto Alegre": 2, "Rio de Janeiro": 3, "Campinas": 4, "Belo Horizonte": 5}
data["city"] = data["city"].map(cities)

In [None]:
f,ax = plt.subplots(figsize=(12, 12))
sns.heatmap(data.corr(),annot=True, linewidths=.5, fmt= '.1f',ax=ax)
plt.show()

<a id="4"></a>
## Train Test Split Data

I held out 20 percent of the dataset as test data.
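
A minimal sketch of an 80/20 split on synthetic data is shown here. The `random_state=42` argument is an addition for illustration: without a fixed seed, each run produces a different split, so the scores change from run to run. The toy columns and the formula for the target are made up.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "area": rng.integers(20, 300, size=100),
    "rooms": rng.integers(1, 6, size=100),
})
toy["total (R$)"] = 30 * toy["area"] + 200 * toy["rooms"]

X = toy.drop("total (R$)", axis=1)
y = toy["total (R$)"]

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)
```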

In [None]:
X = data.drop("total (R$)",axis=1)
Y = data.loc[:,"total (R$)"]
x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size=0.2)
print(x_train.shape[0])
print(y_train.shape[0])
print(x_test.shape[0])
print(y_test.shape[0])

<a id="5"></a>
## Create Model

### Multiple Linear Regression

In [None]:
linear_model = LinearRegression()
linear_model.fit(x_train,y_train)
linear_model_predict = linear_model.predict(x_test)
print("Score: ",r2_score(y_test,linear_model_predict))  # r2_score expects (y_true, y_pred)

### Ridge Regression

In [None]:
ridge_model = Ridge()
ridge_model.fit(x_train,y_train)
ridge_model_predict = ridge_model.predict(x_test)
print("Score: ",r2_score(y_test,ridge_model_predict))  # r2_score expects (y_true, y_pred)

### Lasso Regression

In [None]:
lasso_model = Lasso()
lasso_model.fit(x_train,y_train)
lasso_model_predict = lasso_model.predict(x_test)
print("Score: ",r2_score(y_test,lasso_model_predict))  # r2_score expects (y_true, y_pred)

### Elastic Net Regression

In [None]:
elasticnet_model = ElasticNet()
elasticnet_model.fit(x_train,y_train)
elasticnet_model_predict = elasticnet_model.predict(x_test)
print("Score: ",r2_score(y_test,elasticnet_model_predict))  # r2_score expects (y_true, y_pred)

<a id="6"></a>
## Conclusions

The Multiple Linear, Ridge, Lasso and Elastic Net regression algorithms all achieve good results on this dataset, so the final cell saves the predictions of the Multiple Linear model. You can also try the other algorithms (Ridge, Lasso, Elastic Net, etc.) on this data. These algorithms are well suited to data with multiple features.
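
To compare the four models side by side, they can be fitted in one loop. The sketch below runs on synthetic data with made-up coefficients, so its scores say nothing about the Brazilian housing data. It only shows the pattern, including the `r2_score(y_true, y_pred)` argument order.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([3.0, -2.0, 1.5, 0.5]) + rng.normal(scale=0.1, size=500)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# All four models are used with scikit-learn's default hyperparameters.
models = {
    "Multiple Linear": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "Elastic Net": ElasticNet(),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    # True values go first, predictions second.
    scores[name] = r2_score(y_test, model.predict(x_test))
    print(f"{name}: {scores[name]:.4f}")
```

The regularized models use a default penalty strength of `alpha=1.0`, whose effect depends on the scale of the features, so their ranking on real data can differ from what this toy example shows.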

Thanks for reading my kernel. I'm new to machine learning, so your comments are important for helping me improve. I would be pleased if you left your questions and comments in the comment section. :) An upvote would also motivate me. :)

In [None]:
predict_price = pd.DataFrame({'Actual Price':y_test,'Prediction Price':linear_model_predict})
predict_price.to_csv('submission.csv', index=False)