# NEURAL NETWORKS AND DEEP LEARNING
> M.Sc. ICT FOR LIFE AND HEALTH
> 
> Department of Information Engineering

> M.Sc. COMPUTER ENGINEERING
>
> Department of Information Engineering

> M.Sc. AUTOMATION ENGINEERING
>
> Department of Information Engineering
 
> M.Sc. PHYSICS OF DATA
>
> Department of Physics and Astronomy
 
> M.Sc. COGNITIVE NEUROSCIENCE AND CLINICAL NEUROPSYCHOLOGY
>
> Department of General Psychology

---
Academic Year 2020/21 (6 CFU) - Dr. Alberto Testolin, Dr. Matteo Gadaleta
---


# Homework 1 - Supervised Deep Learning

## General overview
In this homework you will learn how to implement and test simple neural network models for solving supervised problems. It is divided into two tasks.

* **Regression task**: 
the regression task consists of a simple function approximation problem, similar to the one discussed during the Lab practices. 

* **Classification task**: 
the classification task consists of a simple image recognition problem, where the goal is to correctly classify images of handwritten digits (MNIST). 

In both cases, but especially for the classification problem, you should explore the use of advanced optimizers and regularization methods (e.g., initialization scheme, momentum, ADAM, early stopping, L2, L1 / sparsity, dropout…) to improve convergence of stochastic gradient descent and promote generalization. Learning hyperparameters should be tuned using appropriate search procedures, and final accuracy should be evaluated using a cross-validation setup. For the image classification task, you can also implement more advanced convolutional architectures and explore feature visualization techniques to better understand how the deep network is encoding information at different processing layers.
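
As a rough illustration of how some of these options map onto PyTorch, the sketch below builds a small network with dropout, applies Xavier initialization, and creates an SGD optimizer with momentum and an Adam optimizer with L2 weight decay. The layer sizes, dropout probability, learning rates, penalty coefficients and the helper `l1_penalty` are placeholder assumptions, not recommended settings.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical small network; sizes and dropout probability are placeholders
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.ReLU(),
    nn.Dropout(p=0.2),  # dropout is active only in train() mode
    nn.Linear(16, 1),
)

# Xavier (Glorot) initialization of the linear layers
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.xavier_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)

# SGD with momentum
sgd = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# Adam with L2 regularization expressed as weight decay
adam = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def l1_penalty(module):
    """Sum of absolute parameter values (L1 / sparsity term added by hand)."""
    return sum(p.abs().sum() for p in module.parameters())

# One illustrative update step on random data
x = torch.randn(8, 1)
y = torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y) + 1e-4 * l1_penalty(model)
adam.zero_grad()
loss.backward()
adam.step()
```

Only one optimizer is stepped here; in an actual comparison each optimizer would train its own freshly initialized copy of the model.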



## Technical notes
The homework should be implemented in Python using the PyTorch framework. You can explore additional libraries and tools to implement the models; however, please make sure you understand the code you are writing, because during the exam you might receive specific questions related to your implementation. The entire source code required to run the homework must be uploaded as a compressed archive in the Moodle section dedicated to the homework. If your code is entirely contained in a single Python notebook, just upload the notebook file.




## Final report
Along with the source code, you must separately upload a PDF file containing a brief report of your homework. The report should include a brief Introduction in which you explain the homework goals and the main implementation strategies you chose, a brief Method section where you describe your model architectures and hyperparameters, and a Result section where you present the simulation results. Total length should not exceed 6 pages, though you can include additional tables and figures in a final Appendix (optional).




## Grade
The maximum grade for this homework will be **8 points**. Points will be assigned based on the correct implementation of the following items:
*	2 pt: implement basic regression and classification tasks
*	2 pt: explore advanced optimizers and regularization methods 
*	1 pt: optimize hyperparameters using grid/random search and cross-validation
*	2 pt: implement CNN for classification task
*	1 pt: visualize weight histograms, activation profiles and receptive fields



## Deadline
The complete homework (notebook + report) must be submitted through Moodle at least 10 days before the chosen exam date.


# Regression task

## Guidelines

* The goal is to train a neural network to approximate an unknown function:
$$
f:\mathbb{R} \rightarrow \mathbb{R} \\
x \mapsto y = f(x) \\
\text{network}(x) \approx f(x)
$$
* As training points, you only have noisy measurements of the target function:
$$
\hat{y} = f(x) + \text{noise}
$$
* Consider creating a validation set from your training data, or use a k-fold cross-validation strategy. You may find these functions from the `scikit-learn` library useful:
    - [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
    - [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn.model_selection.KFold) 
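
A minimal sketch of both strategies follows. The array `data` is a synthetic stand-in for the (input, label) rows of the training file, and the split sizes and seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the (input, label) rows of the training CSV
rng = np.random.default_rng(0)
data = np.column_stack([rng.uniform(-5, 5, 100), rng.normal(size=100)])

# Single hold-out split: 80% training, 20% validation, shuffled
train_part, val_part = train_test_split(data, test_size=0.2, random_state=0)

# 5-fold cross-validation: every row is used for validation exactly once
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(data)):
    fold_train, fold_val = data[train_idx], data[val_idx]
    fold_sizes.append((len(fold_train), len(fold_val)))
    # A fresh model would be trained on fold_train and evaluated on fold_val here
```

Averaging the validation loss over the folds gives a less noisy estimate than a single hold-out split, at the cost of training one model per fold.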

## 1) Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

The following cell of code will download the dataset and make it available in the local folder `regression_dataset`. There are two files:

* `regression_dataset/train_data.csv`
* `regression_dataset/test_data.csv`

Use them to train and test your model. Each row contains two values: the input and the target (label), respectively.

In [None]:
#!wget -P regression_dataset https://gitlab.dei.unipd.it/gadaleta/nnld-2020-21-lab-resources/-/raw/master/homework_1_regression_dataset/train_data.csv
#!wget -P regression_dataset https://gitlab.dei.unipd.it/gadaleta/nnld-2020-21-lab-resources/-/raw/master/homework_1_regression_dataset/test_data.csv 

How to load the data:

In [None]:
train_df = pd.read_csv('regression_dataset/train_data.csv')

How to get a specific sample:

In [None]:
sample_index = 0
input = train_df.iloc[sample_index]['input']
label = train_df.iloc[sample_index]['label']

print(f"SAMPLE AT INDEX {sample_index}")
print(f"INPUT: {input}")
print(f"LABEL: {label}")

All training points:

In [None]:
fig = plt.figure(figsize=(12,8))
plt.scatter(train_df.input, train_df.label, label='Training points')
plt.xlabel('input')
plt.ylabel('label')
plt.legend()
plt.show()

---

**Dataset**
- Download the data in local directory
- Load the data
- Create test set
- Explore the data
- Prepare the dataset
  - Define the dataset
  - Transformations
    - Data Cleaning (not necessary)
    - Feature selection (not necessary)
    - Feature Scaling ?????
  - Initialize the dataset
  - Define the dataloaders
 
**Models**
 - Model 1: basic
 - Model 2: L2, dropout

**Training**
 - Train method
 - Test method
 
**Hyperparameters tuning:**
 - Basic with K-fold
 - grid-search with K-fold
 - Plot loss
 
**Test the model**
 
**Metrics summary:**
  - Loss, Bias & variance
  
**Save the model:**
 - ...

**Network Analysis:**
  - Weights histogram
  - Analyze activations
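
For the network analysis items, one possible approach is sketched below: weight histograms are computed per layer with `numpy.histogram`, and activations are captured with a forward hook. The network here is a hypothetical stand-in with a 1-32-32-1 layout, and the dictionary keys and the hook name `save_activation` are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in network with a 1-32-32-1 layout
net = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Weight histograms: flatten each weight matrix and bin its values
weight_hists = {}
for name, param in net.named_parameters():
    if name.endswith("weight"):
        values = param.detach().cpu().numpy().ravel()
        counts, bin_edges = np.histogram(values, bins=20)
        weight_hists[name] = (counts, bin_edges)

# Activations: a forward hook stores the output of a chosen layer
activations = {}

def save_activation(module, inputs, output):
    activations["hidden1"] = output.detach()

handle = net[1].register_forward_hook(save_activation)  # output of the first ReLU
net.eval()
with torch.no_grad():
    net(torch.linspace(-5, 5, 50).unsqueeze(1))
handle.remove()  # remove the hook once the activations are collected
```

The counts and bin edges can be passed to `plt.bar`, or the flattened values directly to `plt.hist`, to draw the histograms.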

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

## 1) Dataset

### 1.1) Dataset object

In [2]:
class Dataset(Dataset):

    def __init__(self, data, transform=None):
        """
        Args:
            data (numpy array): numpy array
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.transform = transform

        self.data = data
            

        # Now self.data contains all our dataset.
        # Each element of the list self.data is a tuple: (input, output)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
    
        sample = self.data[idx]
        if self.transform:
            sample = self.transform(sample)
        return sample

### 1.2) Transformations

In [3]:
class ToTensor(object):
    """Convert sample to Tensors."""
    def __call__(self, sample):
        x, y = sample
        return (torch.Tensor([x]).float(),
            torch.Tensor([y]).float())

### 1.3) Load and initialize Dataset

In [4]:
#Load from CSV
train_df = pd.read_csv('regression_dataset/train_data.csv').values
test_df = pd.read_csv('regression_dataset/test_data.csv').values

#Split Full dataset in Train and Validation
train_dataset = train_df[:int(len(train_df)*0.8)]
val_dataset = train_df[int(len(train_df)*0.8):]


composed_transform = transforms.Compose([ToTensor()])

train_dataset = Dataset(train_dataset, transform=composed_transform)
val_dataset   = Dataset(val_dataset, transform=composed_transform)
test_dataset  = Dataset(test_df, transform=composed_transform)


# Alternative: shuffled hold-out split with scikit-learn
# from sklearn.model_selection import train_test_split
# train_data, val_data = train_test_split(train_df, test_size=0.2)

In [None]:
# Dataloaders (validation and test sets are evaluated in a single batch, so no shuffling is needed)
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=0)
val_dataloader   = DataLoader(val_dataset, batch_size=len(val_dataset), shuffle=False, num_workers=0)
test_dataloader  = DataLoader(test_dataset, batch_size=len(test_dataset), shuffle=False, num_workers=0)

## 2) Models

In [5]:
class BasicRegNet(nn.Module):
    
    def __init__(self, Ni, Nh1, Nh2, No):
        """
        Ni - Input size
        Nh1 - Neurons in the 1st hidden layer
        Nh2 - Neurons in the 2nd hidden layer
        No - Output size
        """
        super().__init__()
        
        self.fc1 = nn.Linear(in_features=Ni, out_features=Nh1)
        self.fc2 = nn.Linear(in_features=Nh1, out_features=Nh2)
        self.out = nn.Linear(in_features=Nh2, out_features=No)
        self.sig = nn.Sigmoid()
        self.relu= nn.ReLU()
        
        self.name="BasicRegNet"

        print('Network initialized')
        
    def forward(self, x, additional_out=False):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.out(x)
        return x


## 3) Training

In [6]:
# Check if the GPU is available
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(f"Training device: {device}")

Training device: cuda


In [7]:
# Initialize the network and move the parameters to the proper device

torch.manual_seed(0)

Ni = 1
Nh1 = 32
Nh2 = 32
No = 1

net=BasicRegNet(Ni,Nh1,Nh2,No)

net.to(device)
print(net)

Network initialized
BasicRegNet(
  (fc1): Linear(in_features=1, out_features=32, bias=True)
  (fc2): Linear(in_features=32, out_features=32, bias=True)
  (out): Linear(in_features=32, out_features=1, bias=True)
  (sig): Sigmoid()
  (relu): ReLU()
)


In [8]:
# Define the loss function
loss_function = nn.MSELoss()

In [None]:
# Define the optimizer
optimizer = optim.Adam(net.parameters(), lr=0.001)

In [None]:
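def train(net, num_epochs=100, verbose=False,early_stopping=None):
    # Relies on the global optimizer, loss_function, train_dataloader and val_dataloader defined above.
    # early_stopping: number of epochs without validation improvement tolerated before stopping.
    # Print information on the training
    if verbose: print(f"Training for {num_epochs} epochs, Early stopping={early_stopping}")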
    
    train_loss_log = []
    val_loss_log = []
    best_val,index_best=float("inf"),-1
    for epoch_num in range(num_epochs):

        ### TRAIN
        train_loss= []
        net.train() # Training mode (e.g. enable dropout)
        for sample_batched in train_dataloader:

            # Move data to device
            x_batch=sample_batched[0].to(device)
            label_batch=sample_batched[1].to(device)

            # Forward pass
            out=net(x_batch)

            # Compute loss
            loss=loss_function(out,label_batch)

            # Backpropagation
            net.zero_grad()
            loss.backward()

            # Update the weights
            optimizer.step()

            # Save train loss for this batch
            loss_batch = loss.detach().cpu().numpy()
            train_loss.append(loss_batch)

        # Save average train loss
        train_loss = np.mean(train_loss)
        train_loss_log.append(train_loss)


        ### VALIDATION
        val_loss= []
        net.eval() # Evaluation mode (e.g. disable dropout)
        with torch.no_grad(): # Disable gradient tracking
            for sample_batched in val_dataloader:
                
                # Move data to device
                x_batch=sample_batched[0].to(device)
                label_batch=sample_batched[1].to(device)

                # Forward pass
                out=net(x_batch)

                # Compute loss
                loss=loss_function(out,label_batch)

                # Save val loss for this batch
                loss_batch = loss.detach().cpu().numpy()
                val_loss.append(loss_batch)

            # Save average validation loss
            val_loss = np.mean(val_loss)
            val_loss_log.append(val_loss)

        #Print Epoch informations
        if verbose: print(f"Epoch: {epoch_num+1}/{num_epochs}\tTrain Loss: {str(round(train_loss,3))}\tVal Loss:{str(round(val_loss,3))}")
        
        if val_loss<=best_val:
            best_val=val_loss
            index_best=epoch_num
        
        #Early Stopping
        if early_stopping:
            if index_best<epoch_num-early_stopping:
                if verbose: print("Early Stopped")
                break
    return (train_loss_log,val_loss_log)
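
The `train` function above tracks the best validation epoch but leaves the network with the weights of the last epoch. A possible variant, sketched here on synthetic data, also keeps a copy of the best `state_dict` and restores it at the end. The data, the small model and the `patience` value are assumptions made for illustration only.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data standing in for the training and validation sets
x_train = torch.linspace(-1, 1, 64).unsqueeze(1)
y_train = x_train ** 2 + 0.05 * torch.randn_like(x_train)
x_val = torch.linspace(-1, 1, 16).unsqueeze(1)
y_val = x_val ** 2

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

patience = 20  # epochs without improvement tolerated before stopping
best_val, best_epoch = float("inf"), -1
best_state = copy.deepcopy(model.state_dict())

for epoch in range(500):
    # One full-batch training step
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()

    # Validation loss for this epoch
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()

    if val_loss < best_val:
        # New best: remember the epoch and snapshot the weights
        best_val, best_epoch = val_loss, epoch
        best_state = copy.deepcopy(model.state_dict())
    elif epoch - best_epoch > patience:
        break  # early stop

# Restore the weights that gave the best validation loss
model.load_state_dict(best_state)
```

The `deepcopy` matters: `state_dict()` returns references to the live parameters, so without a copy the snapshot would keep changing as training continues.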

In [None]:
#Train
train_loss_log,val_loss_log=train(net,num_epochs=2000, verbose=True)

In [None]:
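import datetime
import os

os.makedirs("models", exist_ok=True)  # plots and saved parameters are written to this folder
save_name=datetime.datetime.now().strftime("%Y-%m-%d_%H-%M")+"_"+net.name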

# Plot losses
plt.figure(figsize=(12,8))
plt.semilogy(train_loss_log, label='Train loss')
plt.semilogy(val_loss_log, label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid()
plt.legend()
plt.savefig("models/"+save_name+"_Losses", dpi=400)
plt.show()
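
The cells above train a single configuration. A grid search combined with k-fold cross-validation could look roughly like the following sketch. It uses synthetic data, a one-hidden-layer network, a tiny grid and a short training budget, all of which are assumptions chosen to keep the example fast; `fit_and_score` is a hypothetical helper, not part of the notebook.

```python
import itertools
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

torch.manual_seed(0)

# Synthetic stand-in for the training data (not the homework dataset)
x = torch.linspace(-3, 3, 60).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

def fit_and_score(hidden, lr, train_idx, val_idx, epochs=50):
    """Train a fresh one-hidden-layer net on one fold and return its validation MSE."""
    train_idx = torch.as_tensor(train_idx)
    val_idx = torch.as_tensor(val_idx)
    model = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(x[train_idx]), y[train_idx]).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        return loss_fn(model(x[val_idx]), y[val_idx]).item()

# Hypothetical grid: every combination is evaluated on every fold
grid = {"hidden": [8, 32], "lr": [1e-3, 1e-2]}
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

results = {}
for hidden, lr in itertools.product(grid["hidden"], grid["lr"]):
    fold_scores = [
        fit_and_score(hidden, lr, train_idx, val_idx)
        for train_idx, val_idx in kfold.split(np.arange(len(x)))
    ]
    results[(hidden, lr)] = float(np.mean(fold_scores))

# Configuration with the lowest mean validation loss
best_config = min(results, key=results.get)
```

The selected configuration would then be retrained on the full training set and evaluated once on the test set.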

## 4) Test the model

In [None]:
all_inputs = []
all_outputs = []
all_labels = []
net.eval() # Evaluation mode (e.g. disable dropout)
with torch.no_grad(): # Disable gradient tracking
    for sample_batched in test_dataloader:
        # Move data to device
        x_batch = sample_batched[0].to(device)
        label_batch = sample_batched[1].to(device)
        # Forward pass
        out = net(x_batch)
        # Save outputs and labels
        all_inputs.append(x_batch)
        all_outputs.append(out)
        all_labels.append(label_batch)
# Concatenate all the outputs and labels in a single tensor
all_inputs  = torch.cat(all_inputs)
all_outputs = torch.cat(all_outputs)
all_labels  = torch.cat(all_labels)

test_loss = loss_function(all_outputs, all_labels)
print(f"AVERAGE TEST LOSS: {test_loss}")

## 5) Metrics Summary

In [None]:
#Train Loss
train_loss=train_loss_log[-1]

#Val loss
val_loss=val_loss_log[-1]

print("Train Loss:\t",round(train_loss,3))
print("Val Loss:\t",round(val_loss,3))
print("Test Loss:\t",round(float(test_loss),3))

## 6) Save model

In [None]:
### Save network parameters
### Save the network state
# The state dictionary includes all the parameters of the network
net_state_dict = net.state_dict()
# Save the state dict to a file
torch.save(net_state_dict, "models/"+save_name+".torch")

# Save metrics to file
with open("models/"+save_name+"_Metrics.txt", "a") as f:
    f.write('Train loss:\t'+ str(round(train_loss,3))+ "\n")
    f.write('Val loss:\t'+ str(round(val_loss,3))+ "\n")
    f.write('Test loss:\t'+ str(round(float(test_loss),3))+ "\n")
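
To reuse saved parameters later, the same architecture has to be instantiated first and the state dictionary loaded into it. The sketch below shows the round trip with a hypothetical small network and a temporary file; `build_net` and the file name are assumptions for illustration.

```python
import os
import tempfile
import torch
import torch.nn as nn

def build_net():
    """Hypothetical architecture; it must match the one that was saved."""
    return nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

torch.manual_seed(0)
net = build_net()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "net_params.torch")
    # Save only the parameters, not the whole module object
    torch.save(net.state_dict(), path)

    # Rebuild the architecture and load the parameters into it
    restored = build_net()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Both networks should now produce the same outputs
restored.eval()
net.eval()
x = torch.linspace(-1, 1, 5).unsqueeze(1)
with torch.no_grad():
    same_output = torch.allclose(net(x), restored(x))
```

Passing `map_location="cpu"` lets parameters saved from a GPU run be loaded on a machine without CUDA.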

In [35]:




train_dataset = train_df[:int(len(train_df)*0.8)]
val_dataset = train_df[int(len(train_df)*0.8):]

test_dataset = pd.read_csv('regression_dataset/test_data.csv').values

X_train=train_dataset[:,0]
y_train=train_dataset[:,1]
X_val=val_dataset[:,0]
y_val=val_dataset[:,1]
print(test_dataset.shape)
X_test=test_dataset[:,0]
y_test=test_dataset[:,1]

X_train=np.expand_dims(X_train, axis=1)
y_train=np.expand_dims(y_train, axis=1)
X_train=torch.from_numpy(X_train).float()
y_train=torch.from_numpy(y_train).float()

X_test=np.expand_dims(X_test, axis=1)
y_test=np.expand_dims(y_test, axis=1)
X_test=torch.from_numpy(X_test).float()
y_test=torch.from_numpy(y_test).float()


print(X_train.shape,y_train.shape)
print(type(y_train[0]))

from skorch import NeuralNetRegressor

net = NeuralNetRegressor(
    module=BasicRegNet,
    module__Ni= 1,
    module__Nh1 = 32,
    module__Nh2 = 32,
    module__No = 1,
    max_epochs=2000,
    
    device=device,  # uses CUDA when available, otherwise the CPU (see `device` above)
    optimizer = torch.optim.Adam,
    optimizer__lr=0.001,
    #criterion=nn.MSELoss() #used by default
)

net.fit(X_train, y_train)

(100, 2)
torch.Size([80, 1]) torch.Size([80, 1])
<class 'torch.Tensor'>
Network initialized
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1       10.7058        4.3981  0.0050
      2       10.5289        4.3044  0.0050
      3       10.3545        4.2147  0.0050
      4       10.1819        4.1287  0.0040
      5       10.0141        4.0458  0.0050
    ...
    550        0.8509        0.5124  0.0040
    ...
   1005        0.2891        0.4267  0.0050
    ...
   1471        0.2187        0.3883  0.0050
    ...
   1976        0.2025        0.4116  0.0060
   1977        0.2024        0.4329  0.0060
   1978        0.2025        0.4297  0.0050
   1979        0.2024        0.4108  0.0050
   1980        0.2024        0.4032  0.0060
    ...

<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=BasicRegNet(
    (fc1): Linear(in_features=1, out_features=32, bias=True)
    (fc2): Linear(in_features=32, out_features=32, bias=True)
    (out): Linear(in_features=32, out_features=1, bias=True)
    (sig): Sigmoid()
    (relu): ReLU()
  ),
)

In [31]:
y_pred = net.predict(X_test)  # skorch returns a NumPy array

# Convert the predictions to a tensor and use a fresh MSE loss instance
test_loss = nn.MSELoss()(torch.from_numpy(y_pred), y_test)
print(f"AVERAGE TEST LOSS: {test_loss.item()}")

---

# Classification task

## Guidelines

* The goal is to train a neural network that maps an input image (hand-written digit) to one of ten classes (multi-class classification problem with mutually exclusive classes).
* Define a proper loss (e.g. [torch.nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss))
* Here too, consider creating a validation set from your training data, or use a k-fold cross-validation strategy.
* Pay attention to the shape, data type and output values range. If needed, modify them accordingly to your implementation (read carefully the documentation of the layers that you use, e.g. [torch.nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)).
* Explore different optimizers, activation functions and network architectures. Analyze the effect of different regularization methods, such as dropout layers, random transformations (image rotation, scaling, added noise...) or L2 regularization (weight decay).
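
To make the points about shapes, data types and the loss concrete, here is a minimal sketch of a convolutional classifier for 1x28x28 inputs. `SmallCNN`, its channel counts and its dropout probability are hypothetical choices rather than a prescribed architecture, and the input batch is random data instead of MNIST.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical minimal CNN for 1x28x28 inputs and 10 classes."""

    def __init__(self, dropout_p=0.25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # -> 8 x 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 8 x 14 x 14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # -> 16 x 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 16 x 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout_p),
            nn.Linear(16 * 7 * 7, 10),  # raw logits, no softmax here
        )

    def forward(self, x):
        return self.classifier(self.features(x))

torch.manual_seed(0)
model = SmallCNN()

# Conv2d expects float input of shape (N, C, H, W)
images = torch.rand(4, 1, 28, 28)
# CrossEntropyLoss expects int64 class indices of shape (N,)
labels = torch.tensor([3, 1, 4, 1])

logits = model(images)
loss = nn.CrossEntropyLoss()(logits, labels)  # applies log-softmax internally
predictions = logits.argmax(dim=1)
```

Because `CrossEntropyLoss` already includes the log-softmax, the network outputs raw logits; adding a softmax layer before this loss would apply the normalization twice.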

## Dataset

In [None]:
import torch
import torchvision
import matplotlib.pyplot as plt
import numpy as np

Download the dataset:

In [None]:
train_dataset = torchvision.datasets.MNIST('classifier_data', train=True, download=True)
test_dataset  = torchvision.datasets.MNIST('classifier_data', train=False, download=True)

How to get an image and the corresponding label:

In [None]:
sample_index = 0
image = train_dataset[sample_index][0]
label = train_dataset[sample_index][1]

fig = plt.figure(figsize=(8,8))
plt.imshow(image, cmap='Greys')
print(f"SAMPLE AT INDEX {sample_index}")
print(f"LABEL: {label}")

Each sample returned by the dataset contains a PIL Image, a Python object specifically developed to manage and process images. PyTorch supports this format, and there are useful transforms available natively in the framework: https://pytorch.org/docs/stable/torchvision/transforms.html

If you want, you can easily convert a PIL image to a numpy array and entirely ignore the PIL object:

In [None]:
image_numpy = np.array(image)

print(f'Numpy array shape: {image_numpy.shape}')
print(f'Numpy array type: {image_numpy.dtype}')

To transform a PIL Image directly to a PyTorch tensor, instead:

In [None]:
to_tensor = torchvision.transforms.ToTensor()
image_tensor = to_tensor(image)

print(f'PyTorch tensor shape: {image_tensor.shape}')
print(f'PyTorch tensor type: {image_tensor.dtype}')

---

**Dataset**
- Download the data in local directory
- Load the data
- Create test set
- Explore the data
- Prepare the dataset
  - Define the dataset
  - Transformations
    - Data Cleaning (not necessary)
    - Feature selection (not necessary)
    - Feature Scaling ?????
  - Initialize the dataset
  - Define the dataloaders
 
**Models**
 - Model 1: basic
 - Model 2: L2, dropout

**Training**
 - Train method
 - Test method
 
**Hyperparameters tuning:**
 - Basic with K-fold
 - grid-search with K-fold
 - Plot loss & Accuracy
 
**Test the model**
 
**Metrics summary:**
  - Loss
  - Accuracy
  - Bias & variance
  - Precision and Recall
  - F1
  - Confusion Matrix

**Network Analysis:**
  - Weights histogram
  - Analyze activations
  - Most mispredicted digits
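
The metrics listed in the summary can be computed with `scikit-learn` once predicted and true labels are available as arrays. The sketch below uses a small made-up set of three-class labels purely for illustration; in the homework the arrays would hold the ten digit classes.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Made-up labels for a 3-class problem (illustration only)
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Per-class precision, recall and F1 (one value per label)
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)

# Off-diagonal entries show which classes are confused with each other
errors = cm.copy()
np.fill_diagonal(errors, 0)
most_confused_true, most_confused_pred = np.unravel_index(errors.argmax(), errors.shape)
```

Looking at the largest off-diagonal entries of the confusion matrix is one simple way to find the most frequently mispredicted classes.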