<font size = 40 color=darkgreen>Dutch Rental Prices</font><br>
*My friends and I have often asked, 'How much rent could I charge for my apartment?' The question gets complicated once location, size, amenities, furnishing, and many other attributes have to be weighed when setting a rental price. My goal for this project is to build a model that weighs all the available attributes, using data from Kamernet, a popular rental website in the Netherlands.*<br>
*Creating a regression model to predict the rental prices in the Netherlands using data from kamernet.nl*<br>
**Steps:**<br>
1. Importing libraries and reading data
2. EDA and re-shaping data for ML pre-processing 
3. PyCaret setup
4. Adjust base data, repeat compare and create steps

# <font color=teal>Import data</font>

In [2]:
# Data manipulation
import pandas as pd
import numpy as np

# ML libraries
import pycaret.regression as py
from pycaret.regression import *

# Options for pandas
pd.options.display.max_columns = None  # show all columns
pd.options.display.max_rows = 30

# Visualizations
from matplotlib import pyplot as plt
import seaborn as sns
import missingno as msno
import chart_studio.plotly as pl
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

In [3]:
# Reading JSON-lines data from Kamernet (hosted on GitHub)
source = 'https://github.com/michael-william/Netherlands-Rental-Prices/raw/master/properties-trim.json'
df = pd.read_json(source, lines=True)


# <font color=teal>EDA and reshaping data</font>

> **<font color=brown>Overview</font>**
> 1. Dropping non-needed columns
> 2. Missing values
> 3. Cardinality of categorical values
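
The three checks in the overview can be sketched with plain pandas. This is a minimal illustration on invented rows, not the notebook's actual EDA: the `toy` frame and its values are hypothetical, and only the column names come from the Kamernet data.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration; column names follow the notebook.
toy = pd.DataFrame({
    "areaSqm": [14, 30, 11, 16],
    "furnish": ["Unfurnished", "Furnished", "Furnished", "Unfurnished"],
    "propertyType": ["Room", "Studio", "Room", "Apartment"],
    "roommates": ["5", "None", np.nan, "More than 8"],
    "rent": [500, 950, 1000, 290],
})

# 1. Keep only the columns needed for modelling (a copy, so later edits are safe).
kept = toy[["areaSqm", "propertyType", "roommates", "rent"]].copy()

# 2. Missing values: number of NaN entries per column.
missing_per_column = kept.isna().sum()

# 3. Cardinality: distinct non-missing values per categorical column.
categorical_columns = ["propertyType", "roommates"]
cardinality = kept[categorical_columns].nunique()

print(missing_per_column)
print(cardinality)
```

Note that the string `'None'` counts as a regular category here; only true `NaN` entries register as missing.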

In [59]:
# Creating an explicit copy of the needed columns so later assignments do not hit a view of df
data = df[['areaSqm', 'latitude', 'longitude', 'propertyType', 'roommates', 'rent']].copy()

In [60]:
# converting 'roommates' feature to 'int' and creating a new binary feature called 'shared'
data['roommates'] = (
    data['roommates']
    .replace({'None': 0, 'More than 8': 9, 'Unknown': 0, 'nan': 0})
    .fillna(0)
    .astype('int')
)
data['shared'] = (data['roommates'] > 0).astype('int')
data = data.drop(columns='roommates')
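
To see what the conversion does to each label, here is a minimal sketch on an invented Series. The sample values are hypothetical; the label-to-number mapping is the one used in the cell above.

```python
import numpy as np
import pandas as pd

# Invented sample covering every label handled above, plus two numeric strings.
roommates = pd.Series(
    ["None", "More than 8", "Unknown", "nan", np.nan, "3", "1"],
    dtype=object,
)

# Map the text labels to numbers, fill real NaN values with 0, then cast to int.
as_int = (
    roommates
    .replace({"None": 0, "More than 8": 9, "Unknown": 0, "nan": 0})
    .fillna(0)
    .astype(int)
)

# A listing counts as shared when it has at least one roommate.
shared = (as_int > 0).astype(int)

print(as_int.tolist())
print(shared.tolist())
```

Treating 'Unknown' and missing values as zero roommates means those listings are labelled as not shared, which is a modelling choice rather than a known fact about them.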

In [61]:
data.to_csv("ml_data.csv")

In [31]:
py.setup(data, target='rent', silent=True)

 
Setup Succesfully Completed!


| | Description | Value |
|---|---|---|
| 0 | session_id | 3658 |
| 1 | Transform Target | False |
| 2 | Transform Target Method | |
| 3 | Original Data | (12830, 8) |
| 4 | Missing Values | False |
| 5 | Numeric Features | 3 |
| 6 | Categorical Features | 4 |
| 7 | Ordinal Features | False |
| 8 | High Cardinality Features | False |
| 9 | High Cardinality Method | |


(       areaSqm   latitude  longitude  furnish_Furnished  \
 0         14.0  51.896601   4.514993                0.0   
 1         30.0  52.370200   4.920721                1.0   
 2         11.0  52.350880   4.854786                1.0   
 3         16.0  53.013494   6.561012                0.0   
 4         22.0  51.932871   4.479732                0.0   
 ...        ...        ...        ...                ...   
 12825     35.0  52.368030   5.205308                0.0   
 12826     10.0  51.690365   5.311952                1.0   
 12827     12.0  52.010965   4.336536                0.0   
 12828     16.0  52.008520   4.390628                0.0   
 12829     21.0  51.906095   4.445226                0.0   
 
        propertyType_Anti-squat  propertyType_Apartment  propertyType_Room  \
 0                          0.0                     0.0                1.0   
 1                          0.0                     0.0                0.0   
 2                          0.0             

In [51]:
rf_new = py.create_model('rf', fold=5, verbose=False)

In [50]:
rf = finalize_model(rf_new)

In [39]:
label_df = predict_model('dutch_pycaret_rf', data=data)

In [53]:
test = df.iloc[1]

In [54]:
test

areaSqm                30
furnish         Furnished
latitude          52.3702
longitude         4.92072
propertyType       Studio
roommates               0
rent                  950
shared                  0
Name: 1, dtype: object

In [55]:
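# predict_model expects the trained model as its first argument and the rows to
# score as a DataFrame via `data`. Calling py.predict_model(test) with only the
# Series raised: AttributeError: 'Series' object has no attribute 'predict'
py.predict_model(rf, data=df.iloc[[1]])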
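
`predict_model` treats its first argument as the model and takes the rows to score as a DataFrame through `data`. A single listing therefore has to be selected as a one-row DataFrame rather than as a Series. A minimal pandas sketch of the difference, on an invented two-row frame (the `toy` values are hypothetical):

```python
import pandas as pd

# Invented rows for illustration; column names follow the notebook.
toy = pd.DataFrame({
    "areaSqm": [14, 30],
    "propertyType": ["Room", "Studio"],
    "rent": [500, 950],
})

row_as_series = toy.iloc[1]    # 1-D Series: the column names become the index
row_as_frame = toy.iloc[[1]]   # one-row DataFrame: columns and dtypes are kept

print(type(row_as_series).__name__, row_as_series.shape)
print(type(row_as_frame).__name__, row_as_frame.shape)
```

The double brackets keep each column's dtype, whereas the Series form collapses mixed columns into a single `object` dtype.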

In [41]:
prediction = predict_model('dutch_pycaret_rf', data=data)['Label']

In [43]:
prediction

0         474.5200
1        1052.8667
2         905.5000
3         305.4433
4        1300.7800
           ...    
12825     705.1400
12826     498.0000
12827     416.8600
12828     447.3083
12829     768.0700
Name: Label, Length: 12830, dtype: float64
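
The `Label` column holds the predicted rent. One way to summarise how far predictions fall from the listed rents is the mean absolute error. A minimal sketch on invented numbers: the `scored` frame and its values are hypothetical, and only the column names `rent` and `Label` come from the notebook.

```python
import pandas as pd

# Invented listed rents and predictions, in euros.
scored = pd.DataFrame({
    "rent": [500.0, 950.0, 1000.0, 290.0],
    "Label": [480.0, 1000.0, 900.0, 310.0],
})

# Absolute gap between prediction and listed rent, per listing.
abs_error = (scored["Label"] - scored["rent"]).abs()

# Mean absolute error: the average gap across all listings.
mae = abs_error.mean()

print(mae)
```

If the scored rows were also used to train the model, an error computed this way will likely look better than it would on unseen listings.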

In [48]:
df = pd.read_csv(r'https://github.com/michael-william/Netherlands-Rental-Prices/raw/master/ml_data.csv', index_col=0)

In [65]:
data.head()

| | areaSqm | latitude | longitude | propertyType | rent | shared |
|---|---|---|---|---|---|---|
| 0 | 14 | 51.896601 | 4.514993 | Room | 500 | 1 |
| 1 | 30 | 52.3702 | 4.920721 | Studio | 950 | 0 |
| 2 | 11 | 52.35088 | 4.854786 | Room | 1000 | 1 |
| 3 | 16 | 53.013494 | 6.561012 | Room | 290 | 1 |
| 4 | 22 | 51.932871 | 4.479732 | Room | 475 | 1 |
