<img width="800px" src="../fidle/img/header.svg"></img>


# <!-- TITLE --> [PBHPD1] - Regression with a Dense Network (DNN)
<!-- DESC --> A simple regression with a Dense Neural Network (DNN) using PyTorch - BHPD dataset
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP), Laurent Risser (CNRS/IMT) -->

## Objectives :
 - Predict **housing prices** from a set of house features. 
 - Understand the **principle** and the **architecture** of a regression with a **dense neural network**  


The **[Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)** contains the prices of houses in various places in Boston.  
Along with the price, the dataset also provides the following information: 

 - CRIM: This is the per capita crime rate by town
 - ZN: This is the proportion of residential land zoned for lots larger than 25,000 sq.ft
 - INDUS: This is the proportion of non-retail business acres per town
 - CHAS: This is the Charles River dummy variable (this is equal to 1 if tract bounds river; 0 otherwise)
 - NOX: This is the nitric oxides concentration (parts per 10 million)
 - RM: This is the average number of rooms per dwelling
 - AGE: This is the proportion of owner-occupied units built prior to 1940
 - DIS: This is the weighted distances to five Boston employment centers
 - RAD: This is the index of accessibility to radial highways
 - TAX: This is the full-value property-tax rate per 10,000 dollars
 - PTRATIO: This is the pupil-teacher ratio by town
 - B: This is calculated as 1000(Bk - 0.63)^2, where Bk is the proportion of people of African American descent by town
 - LSTAT: This is the percentage of lower-status population
 - MEDV: This is the median value of owner-occupied homes in thousands of dollars

## What we're going to do :

 - Retrieve data
 - Prepare the data
 - Build a model
 - Train the model
 - Evaluate the result


## Step 1 - Import and init

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable   # deprecated wrapper since PyTorch 0.4 (it simply returns a tensor); still used in the cells below


import numpy as np
import matplotlib.pyplot as plt
import sys,os

import pandas as pd

from modules.fidle_pwk_additional import convergence_history_MSELoss

import fidle

# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('PBHPD1')


## Step 2 - Retrieve data


Boston housing is a famous historic dataset, which can be downloaded here: [Boston housing datasets](https://www.kaggle.com/puxama/bostoncsv)  

In [None]:
data = pd.read_csv('./BostonHousing.csv', header=0)

display(data.head(5).style.format("{0:.2f}").set_caption("Few lines of the dataset :"))
print('Missing Data : ',data.isna().sum().sum(), '  Shape is : ', data.shape)

## Step 3 - Preparing the data
### 3.1 - Split data
We will use 70% of the data for training and 30% for testing (used here as validation data).  
The dataset is **shuffled** and split between **training** and **testing**.  
x will be the input data and y the expected output.

In [None]:
# ---- Shuffle and Split => train, test
#
data_train = data.sample(frac=0.7, axis=0)
data_test  = data.drop(data_train.index)

# ---- Split => x,y (medv is price)
#
x_train = data_train.drop('medv',  axis=1)
y_train = data_train['medv']
x_test  = data_test.drop('medv',   axis=1)
y_test  = data_test['medv']

print('Original data shape was : ',data.shape)
print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)
print('x_test  : ',x_test.shape,  'y_test  : ',y_test.shape)
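
The split above draws a different random sample on every run. As a minimal sketch, assuming a reproducible split is wanted, `random_state` can be passed to `DataFrame.sample`. The toy frame and the seed value 42 are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for the housing data (values are made up).
toy = pd.DataFrame({
    "rm":   [5.1, 6.2, 6.8, 7.0, 5.9, 6.1, 6.5, 7.3, 5.5, 6.0],
    "medv": [18.0, 22.5, 30.1, 34.0, 20.2, 21.7, 25.0, 38.9, 17.4, 21.0],
})

# random_state fixes the shuffle, so the same rows are drawn on every run.
toy_train = toy.sample(frac=0.7, random_state=42)
# Every row that was not sampled goes to the test set.
toy_test = toy.drop(toy_train.index)

print("train rows :", len(toy_train), " test rows :", len(toy_test))
```

Dropping the sampled index guarantees that no row appears in both subsets.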

### 3.2 - Data normalization
**Note :** 
 - All input data must be normalized, train and test.  
 - To do this we will **subtract the mean** and **divide by the standard deviation**.  
 - But test data should not be used in any way, even for normalization.  
 - The mean and the standard deviation will therefore only be calculated with the train data.
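
The rule in this note can be illustrated on a toy example. This is only a sketch with made-up NumPy arrays: the statistics come from the training rows alone and are then reused unchanged on the test rows. Note that NumPy defaults to `ddof=0`, whereas the pandas `std()` used in this notebook defaults to `ddof=1`, hence the explicit argument.

```python
import numpy as np

# Made-up data: 4 training rows and 1 test row, 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[5.0, 50.0]])

# Statistics are computed on the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)   # ddof=1 matches the pandas default

# The same mean and std are applied to both subsets.
train_norm = (train - mean) / std
test_norm = (test - mean) / std

print("train mean after normalization :", train_norm.mean(axis=0).round(6))
print("test row after normalization   :", test_norm.round(3))
```

The normalized test row does not have zero mean, which is expected: it is expressed in the units of the training distribution.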

In [None]:
display(x_train.describe().style.format("{0:.2f}").set_caption("Before normalization :"))

mean = x_train.mean()
std  = x_train.std()
x_train = (x_train - mean) / std
x_test  = (x_test  - mean) / std

display(x_train.describe().style.format("{0:.2f}").set_caption("After normalization :"))
display(x_train.head(5).style.format("{0:.2f}").set_caption("Few lines of the dataset :"))

x_train, y_train = np.array(x_train), np.array(y_train)
x_test,  y_test  = np.array(x_test),  np.array(y_test)


## Step 4 - Build a model
More information about: 
 - [Optimizer](https://pytorch.org/docs/stable/optim.html)
 - [Basic neural-network blocks](https://pytorch.org/docs/stable/nn.html)
 - [Loss](https://pytorch.org/docs/stable/nn.html#loss-functions)

In [None]:
class model_v1(nn.Module):
    """
    Basic fully connected neural-network for tabular data
    """
    def __init__(self,num_vars):
        super(model_v1, self).__init__()
        self.num_vars=num_vars
        self.hidden1 = nn.Linear(self.num_vars, 64)
        self.hidden2 = nn.Linear(64, 64)
        self.hidden3 = nn.Linear(64, 1)

    def forward(self, x):
        x = x.view(-1,self.num_vars)   #flatten the observation before using fully-connected layers
        x = self.hidden1(x)
        x = F.relu(x)
        x = self.hidden2(x)
        x = F.relu(x)
        x = self.hidden3(x)
        return x
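
As a quick sanity check of this architecture, the sketch below rebuilds the same layer sizes with `nn.Sequential` (an equivalent formulation chosen here for brevity, not the notebook's class) and inspects the number of trainable parameters and the output shape. The input width of 13 assumes the 13 features listed at the top of this notebook.

```python
import torch
import torch.nn as nn

# Same layer sizes as model_v1: 13 -> 64 -> 64 -> 1, with ReLU in between.
sketch = nn.Sequential(
    nn.Linear(13, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Each Linear layer holds (inputs * outputs) weights plus one bias per output.
n_params = sum(p.numel() for p in sketch.parameters())

# A dummy batch of 4 observations yields one prediction per observation.
dummy_batch = torch.randn(4, 13)
with torch.no_grad():
    out = sketch(dummy_batch)

print("trainable parameters :", n_params)
print("output shape         :", tuple(out.shape))
```

The count decomposes as (13*64 + 64) + (64*64 + 64) + (64 + 1) = 5121.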



## Step 5 - Train the model

### 5.1 - Stochastic gradient descent strategy to fit the model

In [None]:
def fit(model,X_train,Y_train,X_test,Y_test, EPOCHS = 5, BATCH_SIZE = 32):
    
    loss = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(),lr=1e-3) #lr is the learning rate
    model.train()
    
    history=convergence_history_MSELoss()
    
    history.update(model,X_train,Y_train,X_test,Y_test)
    
    n=X_train.shape[0] #number of observations in the training data
    
    #stochastic gradient descent
    for epoch in range(EPOCHS):
        
        batch_start=0
        epoch_shuffler=np.arange(n) 
        np.random.shuffle(epoch_shuffler) #note that 'torch.utils.data.DataLoader' could be used instead
        
        while batch_start+BATCH_SIZE <= n:  #full mini-batches only: leftover observations are skipped for this epoch
            #get mini-batch observation
            mini_batch_observations = epoch_shuffler[batch_start:batch_start+BATCH_SIZE]
            var_X_batch = Variable(X_train[mini_batch_observations,:]).float()
            var_Y_batch = Variable(Y_train[mini_batch_observations]).float()
            
            #gradient descent step
            optimizer.zero_grad()               #set the parameters gradients to 0
            Y_pred_batch = model(var_X_batch)   #predict y with the current NN parameters
            
            curr_loss = loss(Y_pred_batch.view(-1), var_Y_batch.view(-1))  #compute the current loss
            curr_loss.backward()                         #compute the loss gradient w.r.t. all NN parameters
            optimizer.step()                             #update the NN parameters
            
            #prepare the next mini-batch of the epoch
            batch_start+=BATCH_SIZE
            
        history.update(model,X_train,Y_train,X_test,Y_test)
    
    return history
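
The comment in `fit` mentions that a `DataLoader` could replace the manual shuffling. A minimal sketch of that alternative follows, using random tensors as stand-ins for the training data. The sizes (25 observations, batches of 10) are arbitrary assumptions chosen to make the last, smaller batch visible.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for the training tensors: 25 observations, 13 features.
X = torch.randn(25, 13)
Y = torch.randn(25)

# shuffle=True reshuffles the observations at the start of every epoch.
loader = DataLoader(TensorDataset(X, Y), batch_size=10, shuffle=True)
# drop_last=True discards the final incomplete mini-batch.
loader_full_only = DataLoader(TensorDataset(X, Y), batch_size=10,
                              shuffle=True, drop_last=True)

batch_sizes = [x_batch.shape[0] for x_batch, y_batch in loader]
batch_sizes_full_only = [x_batch.shape[0] for x_batch, y_batch in loader_full_only]

print("default        :", batch_sizes)
print("drop_last=True :", batch_sizes_full_only)
```

Inside a training loop, each `(x_batch, y_batch)` pair would take the place of the manually indexed mini-batch.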


### 5.2 - Get the model

In [None]:

   
model=model_v1( x_train[0,:].shape[0] )

print(model)



### 5.3 - Train the model

In [None]:


torch_x_train=torch.from_numpy(x_train)
torch_y_train=torch.from_numpy(y_train)
torch_x_test=torch.from_numpy(x_test)
torch_y_test=torch.from_numpy(y_test)

batch_size  = 10
epochs      = 100


history=fit(model,torch_x_train,torch_y_train,torch_x_test,torch_y_test,EPOCHS=epochs,BATCH_SIZE = batch_size)


## Step 6 - Evaluate
### 6.1 - Model evaluation
MAE = Mean Absolute Error (between the labels and the predictions).  
An MAE of 3 represents an average prediction error of about 3,000 dollars (MEDV is expressed in thousands of dollars).
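
To make the two metrics concrete, here is a small sketch on three made-up prices (in thousands of dollars). With their default settings, `nn.L1Loss` averages the absolute errors and `nn.MSELoss` averages the squared errors.

```python
import torch
import torch.nn as nn

# Made-up labels and predictions, each prediction being off by exactly 3.
y_true = torch.tensor([22.0, 21.0, 9.0])
y_hat = torch.tensor([25.0, 18.0, 12.0])

mae = nn.L1Loss()(y_hat, y_true).item()    # mean of |y_hat - y_true|
mse = nn.MSELoss()(y_hat, y_true).item()   # mean of (y_hat - y_true)^2

# The same MAE computed by hand, for comparison.
manual_mae = (y_hat - y_true).abs().mean().item()

print("mae : {:.2f}   mse : {:.2f}".format(mae, mse))
```

Because the MSE squares each error, it penalizes large errors more heavily than the MAE does.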

In [None]:
var_x_test = Variable(torch_x_test).float()
var_y_test = Variable(torch_y_test).float()
with torch.no_grad():              # gradients are not needed to evaluate the model
    y_pred = model(var_x_test)

nn_loss = nn.MSELoss()
nn_MAE_loss = nn.L1Loss()

print('x_test / loss      : {:5.4f}'.format(nn_loss(y_pred.view(-1), var_y_test.view(-1)).item()))
print('x_test / mae       : {:5.4f}'.format(nn_MAE_loss(y_pred.view(-1), var_y_test.view(-1)).item()))


### 6.2 - Training history
What was the best result during training?

In [None]:

df=pd.DataFrame(data=history.history)
df.describe()

In [None]:
print("min( val_mae ) : {:.4f}".format( min(history.history["val_mae"]) ) )

In [None]:
fidle.scrawler.history(history, plot={'MAE' :['mae', 'val_mae'],
                                'LOSS':['loss','val_loss']})

## Step 7 - Make a prediction
The data must be normalized with the parameters (mean, std) previously used.

In [None]:
my_data = [ 1.26425925, -0.48522739,  1.0436489 , -0.23112788,  1.37120745,
       -2.14308942,  1.13489104, -1.06802005,  1.71189006,  1.57042287,
        0.77859951,  0.14769795,  2.7585581 ]
real_price = 10.4

my_data=np.array(my_data).reshape(1,13)

In [None]:
torch_my_data=torch.from_numpy(my_data)
var_my_data = Variable(torch_my_data).float()

with torch.no_grad():              # inference only: no gradient tracking needed
    predictions = model( var_my_data )
print("Prediction : {:.2f} K$".format(predictions[0][0].item()))
print("Reality    : {:.2f} K$".format(real_price))

---
<img width="80px" src="../fidle/img/logo-paysage.svg"></img>