# Predicting house prices with neural networks

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Data

### Loading the data

In [None]:
raw_data = pd.read_csv("data/train.csv")
raw_data.head()

In [None]:
raw_data.shape

### Extracting the numeric columns

In [None]:
raw_data.dtypes

In [None]:
numeric_columns = list(raw_data.columns[(raw_data.dtypes==np.int64) |
                 (raw_data.dtypes==np.float64)])
print(numeric_columns, "\n", len(numeric_columns))

Move `SalePrice` to the end of the column list, since it is the value we want to predict.

In [None]:
numeric_columns.remove('SalePrice')
numeric_columns.append('SalePrice')

We do not need the `Id` column.

In [None]:
numeric_columns.remove('Id')

Now we extract the numeric data.

In [None]:
# take a copy so that later assignments do not write into a view of raw_data
numeric_data = raw_data[numeric_columns].copy()
numeric_data.head()

Now let's deal with the missing values in the data.

In [None]:
# boolean Series: True for every column that contains at least one NaN
nan_columns = numeric_data.isna().any(axis=0)
nan_columns = list(nan_columns[nan_columns].index)
nan_columns

We simply replace them with zero.

In [None]:
numeric_data['LotFrontage'] = numeric_data['LotFrontage'].fillna(0)
numeric_data['MasVnrArea'] = numeric_data['MasVnrArea'].fillna(0)
numeric_data['GarageYrBlt'] = numeric_data['GarageYrBlt'].fillna(0)
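
The three assignments above can also be written as a single call over every column that contains a missing value. The sketch below uses a small made-up DataFrame (`toy`, `toy_nan_columns`, and the values are assumptions for illustration, not the housing data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for numeric_data
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],
    "MasVnrArea": [np.nan, 0.0, 162.0],
    "SalePrice": [208500, 181500, 223500],
})

# Names of the columns that contain at least one NaN
toy_nan_columns = toy.columns[toy.isna().any()].tolist()

# Replace every NaN in those columns with zero
toy[toy_nan_columns] = toy[toy_nan_columns].fillna(0)
```

Zero is a simple choice, but it is not a neutral one for a column such as `GarageYrBlt`, where 0 lies far from any real year.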

Let's split the data into training and test sets!

In [None]:
from sklearn.model_selection import train_test_split
numeric_data_train, numeric_data_test = train_test_split(numeric_data, test_size=0.1)

### Normalizing the data
Before training our model, we have to normalize the data. We do this by subtracting each column's minimum from the column and then dividing by the difference between its maximum and minimum, so that every training value falls between 0 and 1.

In [None]:
# saving max, min for each column
maxs, mins = dict(), dict()

In [None]:
for col in numeric_data:
    maxs[col] = numeric_data_train[col].max()
    mins[col] = numeric_data_train[col].min()

In [None]:
numeric_data_train = (numeric_data_train - numeric_data_train.min()) / (numeric_data_train.max() - numeric_data_train.min())
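
As a small worked example of this min-max scaling and of its inverse, here is a sketch on a made-up column (the values and the names `toy`, `col_min`, `col_max` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy column; the values are made up for illustration
toy = pd.DataFrame({"SalePrice": [100000.0, 150000.0, 300000.0]})

col_min = toy["SalePrice"].min()
col_max = toy["SalePrice"].max()

# Min-max scaling: x' = (x - min) / (max - min), so values land in [0, 1]
scaled = (toy["SalePrice"] - col_min) / (col_max - col_min)

# Inverse transform: x = x' * (max - min) + min
restored = scaled * (col_max - col_min) + col_min
```

Keeping `maxs` and `mins` from the training split is what makes the inverse transform possible later, for example to turn a normalized prediction back into a price.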

## Building a neural network regression model

In [None]:
import torch
import torch.nn as nn

In [None]:
numeric_x_columns = list(numeric_data_train.columns)
numeric_x_columns.remove("SalePrice")
X_train_df = numeric_data_train[numeric_x_columns]
y_train_df = pd.DataFrame(numeric_data_train["SalePrice"])

Now we have to convert the data into torch tensors. A `torch.Tensor` is a multi-dimensional matrix containing elements of a single data type. It's very similar to arrays in `NumPy`.

In [None]:
X_train = torch.tensor(X_train_df.values, dtype=torch.float)
y_train = torch.tensor(y_train_df.values, dtype=torch.float)

In [None]:
print(X_train.size(), y_train.size())
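
To see the conversion in isolation, here is a minimal sketch on a made-up 2x2 array (the names `array`, `tensor`, and `back` are only for illustration):

```python
import numpy as np
import torch

# Made-up values for illustration
array = np.array([[1.0, 2.0], [3.0, 4.0]])

# torch.tensor copies the data; dtype=torch.float gives 32-bit floats
tensor = torch.tensor(array, dtype=torch.float)

print(tensor.shape)   # torch.Size([2, 2])
print(tensor.dtype)   # torch.float32

# Convert back to a NumPy array
back = tensor.numpy()
```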

### Defining a model with PyTorch
A model in PyTorch is usually defined as a class that inherits from `nn.Module`. It should have an `__init__` method in which you define the layers of your network, and a `forward` method that defines the forward pass through the network.

To begin, let's start with a network that has a single hidden layer.

In [None]:
class Net(nn.Module):
    def __init__(self, D_in, H1, D_out):
        super(Net, self).__init__()
        
        self.linear1 = nn.Linear(D_in, H1)
        self.linear2 = nn.Linear(H1, D_out)
        self.activation = nn.ReLU()
        
    def forward(self, x):
        y_pred = self.activation(self.linear1(x))
        y_pred = self.linear2(y_pred)
        return y_pred

In [None]:
D_in, D_out = X_train.shape[1], y_train.shape[1]

In [None]:
# defining the first model: an instance of the class "Net"
model1 = Net(D_in, 500, D_out)
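
For a plain stack of layers like this one, the same architecture can also be sketched with `nn.Sequential`, which is handy for a quick shape check. The sizes below and the names `sequential_model` and `dummy_batch` are assumptions for illustration; in the notebook, `D_in` comes from `X_train`.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration; in the notebook D_in comes from X_train
D_in, H1, D_out = 36, 500, 1

# Same layer stack as Net: Linear -> ReLU -> Linear
sequential_model = nn.Sequential(
    nn.Linear(D_in, H1),
    nn.ReLU(),
    nn.Linear(H1, D_out),
)

# A batch of 8 random rows, just to check the output shape
dummy_batch = torch.randn(8, D_in)
dummy_out = sequential_model(dummy_batch)

# Trainable parameters: (D_in*H1 + H1) + (H1*D_out + D_out)
n_params = sum(p.numel() for p in sequential_model.parameters())
```

Writing the model as a class, as above, becomes more useful once the forward pass is no longer a simple chain of layers.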

The next step is to define the __loss criterion__ and the __optimizer__ for the network. That is, we have to choose the loss function we want to minimize during training and the optimization method we are going to use, e.g., SGD.

In [None]:
# squared-error loss; reduction='sum' adds the errors up instead of averaging them
criterion = nn.MSELoss(reduction='sum')
# SGD optimizer for finding the weights of the network
optimizer = torch.optim.SGD(model1.parameters(), lr=1e-4)
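
The `reduction` argument decides whether the squared errors are summed or averaged. A minimal sketch on made-up numbers (the tensors `pred` and `target` are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Made-up predictions and targets for illustration
pred = torch.tensor([[0.5], [0.0], [1.0], [0.0]])
target = torch.tensor([[0.0], [0.0], [0.0], [0.0]])

# Sum of squared errors: 0.25 + 0 + 1 + 0 = 1.25
sum_loss = nn.MSELoss(reduction='sum')(pred, target)

# Mean of squared errors: 1.25 / 4 = 0.3125
mean_loss = nn.MSELoss(reduction='mean')(pred, target)
```

With `reduction='sum'` the loss grows with the number of rows, so values computed on sets of different sizes are not directly comparable.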

Now we are ready to train. We can do this with a simple for loop over the number of iterations. Each iteration has four main steps:
- A forward pass to compute the prediction for the current batch (here, the whole training set at once).
- Computing the loss for the current prediction.
- A backward pass to compute the gradient of the loss with respect to the weights of the network.
- Finally, updating the weights of the network (`optimizer.step()`).

Note that PyTorch accumulates gradients: each backward pass adds the new gradient to the value already stored for every parameter. It is therefore important to reset the stored gradients in each iteration before calling `loss.backward()`; otherwise the gradients from earlier iterations keep adding up.

In [None]:
losses1 = []

for t in range(500):
    y_pred = model1(X_train)
    
    loss = criterion(y_pred, y_train)
    print(t, loss.item())
    losses1.append(loss.item())
    
    if torch.isnan(loss):
        break
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
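
The effect of skipping the reset can be checked on a single toy parameter. This is a minimal sketch; the parameter `w` and the loss `w**2` are assumptions for illustration.

```python
import torch

# A single hypothetical parameter, w = 2
w = torch.tensor(2.0, requires_grad=True)

# loss = w**2, so d(loss)/dw = 2*w = 4
(w ** 2).backward()
first_grad = w.grad.item()

# A second backward pass without a reset adds to the stored gradient
(w ** 2).backward()
accumulated_grad = w.grad.item()

# After resetting, the gradient is correct again; optimizer.zero_grad()
# clears the gradient of every parameter the optimizer manages
w.grad.zero_()
(w ** 2).backward()
fresh_grad = w.grad.item()
```

Here `first_grad` and `fresh_grad` are both 4, while `accumulated_grad` is 8.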

Now let's try a new model with more neurons in the hidden layer.

In [None]:
model2 = Net(D_in, 1000, D_out)

In [None]:
# squared-error loss; reduction='sum' adds the errors up instead of averaging them
criterion = nn.MSELoss(reduction='sum')
# SGD optimizer for finding the weights of the network
optimizer = torch.optim.SGD(model2.parameters(), lr=1e-4)

In [None]:
losses2 = []

for t in range(500):
    y_pred = model2(X_train)
    
    loss = criterion(y_pred, y_train)
    print(t, loss.item())
    losses2.append(loss.item())
    
    if torch.isnan(loss):
        break
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
plt.plot(losses1, label="model1")
plt.plot(losses2, label="model2")
plt.ylim([0, 70])
plt.legend()

Let's compare the loss of the two models on the test data:

In [None]:
# we need to normalize the test data with the min and max value
# from the training data
for col in numeric_data_test.columns:
    numeric_data_test[col] = (numeric_data_test[col] - mins[col]) / (maxs[col] - mins[col])

In [None]:
y_test_df = pd.DataFrame(numeric_data_test["SalePrice"])
y_test = torch.tensor(y_test_df.values, dtype=torch.float)
x_test_df = numeric_data_test[numeric_x_columns]
x_test = torch.tensor(x_test_df.values, dtype=torch.float)
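# predictions for both models; gradients are not needed for evaluation
with torch.no_grad():
    model1_pred = model1(x_test)
    model2_pred = model2(x_test)
# criterion uses reduction='sum', so these are summed squared errors;
# .item() turns the one-element tensor into a plain Python float
print("MSE loss for model1: ", criterion(model1_pred, y_test).item())
print("MSE loss for model2: ", criterion(model2_pred, y_test).item())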
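
These losses are measured on normalized prices. One way to read them in the original price units is to rescale the error by `max - min` of `SalePrice`. The sketch below uses made-up numbers; `price_max` and `price_min` are hypothetical stand-ins for `maxs['SalePrice']` and `mins['SalePrice']`.

```python
import torch

# Hypothetical stand-ins for maxs['SalePrice'] and mins['SalePrice']
price_max, price_min = 500000.0, 100000.0

# Made-up normalized predictions and targets
pred_norm = torch.tensor([[0.20], [0.50]])
target_norm = torch.tensor([[0.25], [0.40]])

# Mean squared error on the normalized scale
mse_norm = torch.mean((pred_norm - target_norm) ** 2)

# Errors scale by (max - min); the min offset cancels in the difference
rmse_price = torch.sqrt(mse_norm) * (price_max - price_min)

# The same number computed directly on de-normalized prices
pred_price = pred_norm * (price_max - price_min) + price_min
target_price = target_norm * (price_max - price_min) + price_min
rmse_direct = torch.sqrt(torch.mean((pred_price - target_price) ** 2))
```

Both routes give the same root-mean-squared error, because min-max scaling only shifts and stretches the values.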


## Now it is your turn!
### Exercises

1- Let's get back to model1. This time, train it with a different optimizer: Adam, which adapts the step size per parameter and often converges faster than plain SGD in practice. Plot the training loss curves of the model trained with SGD and with Adam, and compare them.

Note1: Use `torch.optim.Adam(model1.parameters(), lr=...)`

Note2: If you are interested, check [this nice post](https://ruder.io/optimizing-gradient-descent/index.html) on different gradient descent optimization algorithms.
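
As a starting point, here is a minimal sketch of such a comparison on synthetic data rather than on the housing data. The data, the small network, the helper `train`, and the learning rates are all assumptions for illustration and are not tuned.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (made up): y = sum of the features plus small noise
X = torch.rand(200, 5)
y = X.sum(dim=1, keepdim=True) + 0.01 * torch.randn(200, 1)

def train(optimizer_name, steps=200):
    # Hypothetical helper: trains a fresh small network, returns its loss history
    torch.manual_seed(1)  # same initial weights for both optimizers
    model = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))
    criterion = nn.MSELoss()
    if optimizer_name == "adam":
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    else:
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    losses = []
    for _ in range(steps):
        loss = criterion(model(X), y)
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return losses

sgd_losses = train("sgd")
adam_losses = train("adam")
```

The two lists can then be plotted with `plt.plot(sgd_losses, label="SGD")` and `plt.plot(adam_losses, label="Adam")`, in the same way as `losses1` and `losses2` above.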

2- This time we want to build a new model with a new architecture. Specifically, we want to train a network with 3 hidden layers on the data. You can use the following code to build the architecture. Use the values 500, 1000, 200 for H1, H2, and H3 respectively. Train this new network on the same training data and compare it with the model1 we built above.

```
class Net_new(nn.Module):
    def __init__(self, D_in, H1, H2, H3, D_out):
        super(Net_new, self).__init__()

        self.linear1 = nn.Linear(D_in, H1)
        self.linear2 = nn.Linear(H1, H2)
        self.linear3 = nn.Linear(H2, H3)
        self.linear4 = nn.Linear(H3, D_out)

    def forward(self, x):
        y_pred = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(y_pred).clamp(min=0)
        y_pred = self.linear3(y_pred).clamp(min=0)
        y_pred = self.linear4(y_pred)
        return y_pred
```