<a href="https://colab.research.google.com/github/gordrick/rent-prediction-model/blob/master/StreetEasy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Imports
Import all the modules needed for the analysis.

In [None]:
!pip install -q seaborn

In [None]:
!pip install git+https://github.com/tensorflow/docs

In [None]:
%tensorflow_version 2.x

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

In [None]:
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling

# Load The Data
Load the StreetEasy data from the CSV file using the pandas read_csv function, then keep only the columns used in the analysis.

In [None]:
streeteasy = pd.read_csv('https://raw.githubusercontent.com/Codecademy/datasets/master/streeteasy/streeteasy.csv')
df = streeteasy[['bedrooms', 'bathrooms', 'rent', 'size_sqft', 'min_to_subway', 'floor', 'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer', 'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio', 'has_gym']]

# Split the data into training and testing sets


In [None]:
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)
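
A quick way to sanity-check this kind of split is to confirm that the two sets are disjoint and together cover every row. The sketch below applies the same sample/drop pattern to a small made-up frame (`toy` is a hypothetical stand-in, not the StreetEasy data). It assumes the index is unique, which is the default for a frame loaded with read_csv.

```python
import pandas as pd

# Hypothetical stand-in for the real data: 10 rows with made-up values.
toy = pd.DataFrame({'rent': range(1000, 2000, 100), 'bedrooms': [1, 2] * 5})

toy_train = toy.sample(frac=0.8, random_state=0)  # random 80% of the rows
toy_test = toy.drop(toy_train.index)              # whatever was not sampled

# No row should appear in both sets, and no row should be lost.
overlap = toy_train.index.intersection(toy_test.index)
print(len(toy_train), len(toy_test), len(overlap))
```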

# Inspect the data

In [None]:
train_stats = train.describe()
train_stats.pop('rent')
train_stats = train_stats.transpose()
train_stats

# Split features from labels
Separate the label (rent) from the features. We are training the model to predict the rent.

In [None]:
train_labels = train.pop('rent')
test_labels = test.pop('rent')

# Normalize the data
Normalize the features, which use different scales and ranges, using the mean and standard deviation of the training set.

In [None]:
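def norm(x):
  # Standardize each feature with the training-set mean and standard deviation.
  return (x - train_stats['mean']) / train_stats['std']

# Apply the same training statistics to both sets.
normed_train_data = norm(train)
normed_test_data = norm(test)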
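
As a small worked example of this normalization, the sketch below uses made-up numbers (`toy_train`, `toy_test`, and `toy_norm` are hypothetical names, not part of the notebook). The training columns end up with mean 0 and standard deviation 1, while the test rows are scaled with the training statistics rather than their own.

```python
import pandas as pd

# Made-up numbers for illustration only.
toy_train = pd.DataFrame({'size_sqft': [500.0, 700.0, 900.0],
                          'bedrooms': [1.0, 2.0, 3.0]})
toy_test = pd.DataFrame({'size_sqft': [700.0, 1100.0],
                         'bedrooms': [2.0, 4.0]})

# Per-feature statistics come from the training rows only.
stats = toy_train.describe().transpose()

def toy_norm(x):
  # The columns of x are aligned with the index of stats (the feature names).
  return (x - stats['mean']) / stats['std']

normed_toy_train = toy_norm(toy_train)
normed_toy_test = toy_norm(toy_test)

print(normed_toy_train)
print(normed_toy_test)
```

Here size_sqft has a training mean of 700 and a standard deviation of 200, so a 1100 sq ft test listing maps to (1100 - 700) / 200 = 2.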

# Building the Model

In [None]:
def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(train.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model

In [None]:
model = build_model()

# Training the Model
Train the model with an early-stopping callback that halts training automatically when the validation loss stops improving.

In [None]:
EPOCHS = 1000
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)

early_history = model.fit(normed_train_data, train_labels,
                          epochs=EPOCHS, validation_split=0.2, verbose=0,
                          callbacks=[early_stop, tfdocs.modeling.EpochDots()])

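Early stopping helps prevent overfitting: with patience=20, training halts once the validation loss has gone 20 consecutive epochs without improving.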
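
The patience rule behind early stopping can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras implementation; `stopping_epoch` is a hypothetical helper and the losses are made up.

```python
def stopping_epoch(val_losses, patience):
  """Return the 0-based epoch at which training would stop, or None."""
  best = float('inf')
  wait = 0  # epochs since the last improvement
  for epoch, loss in enumerate(val_losses):
    if loss < best:
      best = loss
      wait = 0
    else:
      wait += 1
      if wait >= patience:
        return epoch
  return None

# Made-up validation losses: the best value (7.0) is reached at epoch 2.
losses = [10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 7.3]
print(stopping_epoch(losses, patience=3))  # prints 5
```

Epochs 3, 4, and 5 all fail to beat 7.0, so with a patience of 3 the run stops at epoch 5 even though the loss is still moving.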

In [None]:
plotter = tfdocs.plots.HistoryPlotter(smoothing_std=2)

In [None]:
plotter.plot({'Early Stopping': early_history}, metric = "mae")
plt.ylim(500, 2000)
plt.ylabel('MAE [RENT]')

In [None]:
plotter.plot({'Early Stopping': early_history}, metric = "mse")
plt.ylim([1000000, 10000000])
plt.ylabel('MSE [RENT^2]')

# Make predictions
Use the trained model to make predictions on some data.

In [None]:
test_predictions = model.predict(normed_test_data).flatten()

a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [RENT]')
plt.ylabel('Predictions [RENT]')
lims = [500, 20000]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)


Check the distribution of the prediction errors.

In [None]:
error = test_predictions - test_labels
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [RENT]")
_ = plt.ylabel("Count")
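
Alongside the histogram, a few summary numbers can describe the same errors. The sketch below uses made-up labels and predictions (`toy_labels` and `toy_predictions` are hypothetical, not model output) and the same sign convention as above, prediction minus true value.

```python
import numpy as np

# Made-up values for illustration, not model output.
toy_labels = np.array([2000.0, 3000.0, 4000.0])
toy_predictions = np.array([2100.0, 2900.0, 4300.0])

toy_error = toy_predictions - toy_labels

mean_error = toy_error.mean()            # bias: positive means over-prediction
mae = np.abs(toy_error).mean()           # mean absolute error
rmse = np.sqrt((toy_error ** 2).mean())  # root mean squared error

print(mean_error, mae, rmse)
```

A mean error far from zero would suggest the model systematically over- or under-predicts, which a roughly symmetric histogram centered on zero would rule out.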

Make new predictions for a set of listings using the model.
The feature values must be provided in the same column order used for training; they are then normalized and passed to the model, which returns the predictions.
Example:

In [None]:
def predict(x):
  # Build a DataFrame whose columns match the training features, in the same order.
  df_object = pd.DataFrame(x, columns=['bedrooms', 'bathrooms',  'size_sqft', 'min_to_subway', 'floor', 'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer', 'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio', 'has_gym'])
  # norm() reads the global train_stats, so new listings are scaled with the
  # training-set statistics; recomputing statistics here would have no effect.
  normed_data = norm(df_object)
  return model.predict(normed_data).flatten()

The predict function takes a list of rows, one per listing, each holding the feature values in the column order shown above, and returns an array with the predicted rent for each listing.
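
Because the column order matters, one option is to describe each listing as a dict and reorder the values before calling predict. The sketch below is one possible way to do that; `listings_to_rows` and `FEATURES` are hypothetical names, and the listing values are made up.

```python
import pandas as pd

# Feature order expected by the model, copied from the predict function.
FEATURES = ['bedrooms', 'bathrooms', 'size_sqft', 'min_to_subway', 'floor',
            'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer',
            'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio',
            'has_gym']

def listings_to_rows(listings):
  """Turn a list of feature dicts into rows ordered like FEATURES."""
  frame = pd.DataFrame(listings)
  missing = [name for name in FEATURES if name not in frame.columns]
  if missing:
    raise ValueError('missing features: {}'.format(missing))
  return frame[FEATURES].values.tolist()

# A made-up listing; the keys can be given in any order.
listing = {'size_sqft': 750, 'bedrooms': 1, 'bathrooms': 1, 'min_to_subway': 5,
           'floor': 3, 'building_age_yrs': 20, 'no_fee': 1, 'has_roofdeck': 0,
           'has_washer_dryer': 0, 'has_doorman': 1, 'has_elevator': 1,
           'has_dishwasher': 1, 'has_patio': 0, 'has_gym': 0}

rows = listings_to_rows([listing])
print(rows)
```

In the notebook, the resulting rows could then be passed to the model with predict(rows).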