<font size = 40 color=darkgreen>Dutch Rental Prices</font><br>
*My friends and I have often asked, 'How much rent could I charge for my apartment?' The question quickly becomes complex once location, size, amenities, furnishing, and many other attributes have to be considered when setting a rental price. My goal for this project is to build a model that weighs all the available attributes, using data from Kamernet, a popular rental website in the Netherlands.*<BR>  
*Creating a regression model to predict the rental prices in the Netherlands using data from kamernet.nl*<br>
**Steps:**<br>
1. Importing libraries and reading data
2. EDA and re-shaping data for ML pre-processing 
3. PyCaret setup
4. Adjust base data, repeat compare and create steps

# <font color=teal>Import data</font>

In [1]:
# Data manipulation
import pandas as pd
import numpy as np

# ML libraries
import pycaret.regression as py
from pycaret.regression import *

# Options for pandas
pd.options.display.max_columns = None  # show every column when displaying a DataFrame
pd.options.display.max_rows = 30

# Visualizations
from matplotlib import pyplot as plt
import seaborn as sns
import missingno as msno
import chart_studio.plotly as pl
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

In [2]:
# Reading json data from Kamernet

#Git
source = 'https://github.com/michael-william/Netherlands-Rental-Prices/raw/master/properties-trim.json'
df=pd.read_json(source, lines=True)
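
The `lines=True` argument above tells pandas that the file is in JSON Lines format, with one JSON object per line rather than a single JSON array. A minimal sketch of that behaviour, using a made-up two-record sample held in memory (the values are hypothetical and the three fields are only a subset of the real columns):

```python
import io
import pandas as pd

# Hypothetical two-record sample in JSON Lines format; the field names mirror
# columns used later in this notebook, but the values are invented.
sample = io.StringIO(
    '{"areaSqm": 14, "furnish": "Unfurnished", "rent": 500}\n'
    '{"areaSqm": 30, "furnish": "Furnished", "rent": 950}\n'
)

# lines=True parses each line as a separate JSON object (one row per line)
sample_df = pd.read_json(sample, lines=True)

print(sample_df.shape)
print(sample_df.dtypes)
```

Without `lines=True`, pandas would try to parse the whole input as one JSON document and fail on a file structured like this.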


# <font color=teal>EDA and reshaping data</font>

> **<font color=brown>Overview</font>**
> 1. Dropping unneeded columns
> 2. Missing values
> 3. Cardinality of categorical values
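
The missing-value and cardinality checks in the overview can be sketched as follows. This is only an illustration on a small made-up frame (`toy` and its values are hypothetical), not the exact checks run on the Kamernet data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps, standing in for the real data
toy = pd.DataFrame({
    "areaSqm": [14, 30, np.nan, 16],
    "furnish": ["Unfurnished", "Furnished", "Furnished", None],
    "propertyType": ["Room", "Studio", "Room", "Room"],
})

# Missing values: count of nulls per column
missing_counts = toy.isna().sum()

# Cardinality: number of distinct non-null values per non-numeric column
categorical_cols = [
    col for col in toy.columns
    if not pd.api.types.is_numeric_dtype(toy[col])
]
cardinality = toy[categorical_cols].nunique()

print(missing_counts)
print(cardinality)
```

Columns with very high cardinality are usually the ones worth grouping or dropping before one-hot encoding.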

In [5]:
# Creating an explicit copy of the needed columns so later assignments modify `data`, not a view of `df`
data = df[['areaSqm', 'furnish', 'latitude', 'longitude', 'propertyType', 'roommates','rent']].copy()

In [6]:
# Converting the 'roommates' feature to 'int' and creating a new binary feature called 'shared'
# Text labels are mapped to numbers: 'None'/'Unknown'/'nan' -> 0, 'More than 8' -> 9
data['roommates'] = data.roommates.replace(to_replace='None', value=0)
data['roommates'] = data.roommates.replace(to_replace='More than 8', value=9)
data['roommates'] = data.roommates.replace(to_replace='Unknown', value=0)
data['roommates'] = data.roommates.replace(to_replace='nan', value=0)
data['roommates'] = data.roommates.fillna(0)
data['roommates'] = data.roommates.astype('int')
# 'shared' is 1 when there is at least one roommate, otherwise 0
data['shared'] = [1 if x>0 else 0 for x in data.roommates]
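
The roommates conversion can also be written as a single mapping step. A minimal sketch on a made-up Series (`raw` and its values are hypothetical; the label-to-number choices follow the cell above, with 'More than 8' encoded as 9):

```python
import numpy as np
import pandas as pd

# Hypothetical raw values mixing text labels, numeric strings, and a missing entry
raw = pd.Series(
    ["None", "5", "More than 8", "Unknown", np.nan, "1"],
    name="roommates",
    dtype=object,
)

# Same label choices as in the notebook cell
label_map = {"None": 0, "More than 8": 9, "Unknown": 0}

# Swap known labels for numbers, coerce everything else to numeric,
# and treat anything unparseable or missing as 0 roommates
cleaned = (
    pd.to_numeric(raw.map(lambda v: label_map.get(v, v)), errors="coerce")
    .fillna(0)
    .astype(int)
)

# Binary flag: 1 when at least one roommate is listed
shared = (cleaned > 0).astype(int)

print(pd.DataFrame({"roommates": cleaned, "shared": shared}))
```

Note that mapping 'Unknown' to 0 treats an unknown roommate count the same as no roommates, which is a modelling choice rather than a fact about the listing.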

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12830 entries, 0 to 12829
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   areaSqm       12830 non-null  int64  
 1   furnish       12830 non-null  object 
 2   latitude      12830 non-null  float64
 3   longitude     12830 non-null  float64
 4   propertyType  12830 non-null  object 
 5   roommates     12830 non-null  int64  
 6   rent          12830 non-null  int64  
 7   shared        12830 non-null  int64  
dtypes: float64(2), int64(4), object(2)
memory usage: 802.0+ KB


In [16]:
py.load_model('/Users/michaelcondon/Documents/GitHub/Amsterdam rentals/Netherlands-Rental-Prices/dutch_pycaret_rf')

Transformation Pipeline and Model Sucessfully Loaded


[Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True,
                                       features_todrop=['toilet', 'shower',
                                                        'living', 'kitchen'],
                                       ml_usecase='regression',
                                       numerical_features=[], target='rent',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 numeric_strategy='mean',
                                 target_variable=None)),
                 ('ne...
                 ('group', Empty()), ('nonliner', Empty()), ('scaling', Empty()),
                 ('P_transform', Empty()), ('pt_target', Empty()),
                 ('binn', Empty()), ('rem_outliers', Empty()),
                 ('c

In [20]:
label_df = predict_model('dutch_pycaret_rf', data=data)

In [21]:
label_df

| | areaSqm | furnish | latitude | longitude | propertyType | roommates | rent | shared | Label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | Unfurnished | 51.896601 | 4.514993 | Room | 5 | 500 | 1 | 494.2500 |
| 1 | 30 | Furnished | 52.370200 | 4.920721 | Studio | 0 | 950 | 0 | 1021.1833 |
| 2 | 11 | Furnished | 52.350880 | 4.854786 | Room | 1 | 1000 | 1 | 940.1000 |
| 3 | 16 | Unfurnished | 53.013494 | 6.561012 | Room | 4 | 290 | 1 | 307.3700 |
| 4 | 22 | Unfurnished | 51.932871 | 4.479732 | Room | 1 | 475 | 1 | 459.7200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12825 | 35 | Unfurnished | 52.368030 | 5.205308 | Room | 1 | 750 | 1 | 726.0500 |
| 12826 | 10 | Furnished | 51.690365 | 5.311952 | Room | 1 | 600 | 1 | 500.3500 |
| 12827 | 12 | Unfurnished | 52.010965 | 4.336536 | Student residence | 4 | 395 | 1 | 414.0500 |
| 12828 | 16 | Uncarpeted | 52.008520 | 4.390628 | Room | 2 | 425 | 1 | 458.5500 |
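
To get a rough feel for how close the `Label` predictions are to the listed `rent`, the error can be summarised with MAE and RMSE. The sketch below uses only the nine rows visible in the output above, so it is an illustration of the calculation, not an evaluation of the model. If these rows were also part of the training data, the error will understate what to expect on new listings.

```python
import numpy as np

# The nine (rent, Label) pairs visible in the displayed output
rent = np.array([500, 950, 1000, 290, 475, 750, 600, 395, 425], dtype=float)
label = np.array([494.25, 1021.1833, 940.10, 307.37, 459.72,
                  726.05, 500.35, 414.05, 458.55])

# Prediction error per listing (positive means the model over-predicts)
errors = label - rent

# Mean absolute error: average size of the miss, in the same units as rent
mae = np.mean(np.abs(errors))

# Root mean squared error: penalises large misses more heavily than MAE
rmse = np.sqrt(np.mean(errors ** 2))

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

On the full frame the same calculation would be `label_df['Label'] - label_df['rent']`, assuming the column names shown in the output.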


In [24]:
columns = data.columns