# Homework 7: Designing and Training Neural Networks

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from tqdm.notebook import tqdm
import wandb

For this homework, we will use the Boston Housing Dataset, that we split in three: training, validation, and test set.

In [2]:
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X,y)
X_train, X_val, y_train, y_val = train_test_split(X_train,y_train)

As a reference, let's see what MSE we get using a dummy regressor. To get your point for this homework, you should do better than this reference.

In [3]:
dummy = DummyRegressor().fit(X_train,y_train)
mean_squared_error(dummy.predict(X_test),y_test)

102.06726556313036

It will make our life much simpler to wrap the dataset as a torch one because it was allow us to create a dataloader which will take care of batching. On top of this, it is super easy to do. Make sure to read this carefully.

In [4]:
class Dataset(Dataset):
    def __init__(self,X,y):
        self.X = torch.tensor(X,dtype=torch.float)
        self.y = torch.tensor(y,dtype=torch.float)
        
    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

In [5]:
train_dataset = Dataset(X_train,y_train)

In [6]:
batch_size=32

Here we go, now that we have a proper torch dataset, we can create a data loader for the training set. If the dataset was truly large, we should also create dataloaders for the validation and the test set.

In [9]:
train_dataloader = DataLoader(train_dataset, batch_size=batch_size)

**Problem 1**: design a neural network model. Your model should have at two hidden layers. You are free to choose the non-linearities. You might consider using dropout layers, and the Sequential container to simplify your model specification. If you want to try other types of layers, go for it! 

In [13]:
class Model(nn.Module):
    def __init__(self,size):
        super(Model, self).__init__()
        self.layers = torch.nn.Sequential(
            nn.Linear(13,size),
            nn.Dropout(p=0.5),
            nn.ELU(),
            nn.Linear(size,size),
            nn.Dropout(p=0.5),
            nn.ELU(),
            nn.Linear(size,1),
          )
        
    def forward(self,x):
        return self.layers(x)

In [14]:
model = Model(100)

**Problem 2**: Implement an algorithm to train your model. A few hints:<br>
<ul>
<li> This function will likely be over 30 lines of code, so plan ahead.
<li> First, you should create an optimizer. You can pick whichever you want. One of the most popular one is the Adam optimizer, but there are many others: https://pytorch.org/docs/stable/optim.html
<li> If you decide to use wandb, you should remember to initialize it before the iterations starts.
<li> Think ahead about how you are going to keep track of the train and validation loss!
<li> The rest of the code is one big loop that will iterate epochs times 
<li> The body of the loop has two main tasks:
<ul>
    <li> First, the actual training. This is where you should use your dataloader to do the training by batches. That means you will have a loop that does the usual training procedure. 
    <li> Second, you should compute the validation loss. For simplicity, you can do it using sklearn.
    <li> Finally, make sure to keep track of the current model if it turns out to work well on the validation set!  
</ul>
<li> It is very likely that training the model will take a while. Not only because you will need to figure out what are the right parameters (e.g. learning rate of your optimizer, rate of dropout, size of the hidden layers), but also because errors in your code may go unnoticed. 
</ul>

In [37]:
def train(epochs):
    val_dataset = Dataset(X_val,y_val)
    val_dataloader = DataLoader(val_dataset, batch_size=32)
    
    optimizer = torch.optim.Adam(model.parameters(),lr=0.0001)
    
    best_score = None
    
    for epoch in tqdm(range(epochs)):
        # TRAIN
        model.train()
        running_loss = 0.
        
        for batch_x_train, batch_y_train in train_dataloader:
            optimizer.zero_grad()
            y_pred = model(batch_x_train)
            #print(y_pred.reshape(-1).shape)
            #print(batch_y_train.reshape(-1,1).shape)
            mse = ((y_pred.reshape(-1) - batch_y_train)**2).sum()
            running_loss += mse.item()
            mse.backward()
            optimizer.step()
            
        running_loss /= len(train_dataset)
        
        # EVAL
        model.eval()
        val_score = 0.
        
        for batch_x_val, batch_y_val in val_dataloader:
            optimizer.zero_grad()
            y_pred = model(batch_x_val)
            mse = ((y_pred.reshape(-1) - batch_y_val)**2).sum()
            val_score += mse.item()
            mse.backward()
            optimizer.step()
        
        val_score /= len(val_dataset)
        
        if not best_score:
            best_score = val_score
            torch.save(model, 'best-model.pt') 
        if val_score < best_score:
            best_score = val_score
            torch.save(model, 'best-model.pt')
        
    print("Train loss: ", running_loss, "Validation loss: ", val_score, "Best Validation loss: ", best_score)

In [38]:
train(100)

  0%|          | 0/100 [00:00<?, ?it/s]

Train loss:  74.46068863129952 Validation loss:  83.43913060238486 Best Validation loss:  82.78660310444079


**Problem 3**: Write a function that evaluates your model on the test set.

In [39]:
def test():
    # TEST
    test_dataset = Dataset(X_test,y_test)
    test_dataloader = DataLoader(test_dataset, batch_size=32)
    
    optimizer = torch.optim.Adam(model.parameters(),lr=0.0001)
    
    model.eval()
    test_score = 0.

    for batch_x_test, batch_y_test in test_dataloader:
        optimizer.zero_grad()
        y_pred = model(batch_x_test)
        mse = ((y_pred.reshape(-1) - batch_y_test)**2).sum()
        test_score += mse.item()
        mse.backward()
        optimizer.step()

    test_score /= len(test_dataset)
    return test_score        

In [40]:
test()

61.61863888718012