Knowledge Distillation 
===============================

**Author**: [Clara Martinez](https://github.com/moonblume/LIVIA.git)


Knowledge distillation is a technique that enables knowledge transfer
from large, computationally expensive models to smaller ones without
losing validity. This allows for deployment on less powerful hardware,
making evaluation faster and more efficient.

Librairies
================


In [24]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
import pandas as pd
import numpy as np
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, TensorDataset, Dataset
from torch.optim.lr_scheduler import ReduceLROnPlateau

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from scipy.signal import savgol_filter
from sklearn.metrics import mean_absolute_error, mean_squared_error

from typing import List, Union, Tuple, Any
import statistics
from statsmodels.tsa.seasonal import seasonal_decompose

# Check if GPU is available, and if not, use the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Loading dataset as Pandas Dataframe
===================================

Fisrt, I focus on the physiological signals of the Biovid dataset. In one sample, we have access to 6 classes associated with 0 to 4 pain levels :  

Time: This could be the timestamp or time index when the signal was recorded.

GSR (Galvanic Skin Response): A measure of the electrical conductance of the skin, which varies with the moisture level of the skin. It's often associated with emotional arousal.

ECG (Electrocardiogram): A recording of the electrical activity of the heart over time. It typically consists of waves representing the depolarization and repolarization of the heart muscle during each heartbeat.

EMG (Electromyography) - Trapezius: Measures the electrical activity produced by skeletal muscles. The trapezius muscle is a large superficial muscle that extends longitudinally from the occipital bone to the lower thoracic vertebrae and laterally to the spine of the scapula.

EMG - Corrugator: Electromyography signal from the corrugator supercilii muscle, which is a small facial muscle involved in frowning and expressing negative emotions.

EMG - Zygomaticus: Electromyography signal from the zygomaticus major muscle, which is involved in smiling and expressing positive emotions.  
    
    
Our objective is to predict the pain level of input signals. One signal corresponds to one csv file.



In [11]:
# Define the directory containing the CSV files
biosignals_path = '/home/ens/AU59350/LIVIA/physio/physio_organised/'

# Initialize an empty list to store data for DataFrame
data = []

# Iterate over each pain level directory
for pain_level in os.listdir(biosignals_path):
    pain_level_dir = os.path.join(biosignals_path, pain_level)
    
    # Check if it's a directory
    if os.path.isdir(pain_level_dir):
        # Iterate over each CSV file in the pain level directory
        for csv_file in os.listdir(pain_level_dir):
            # Check if it's a CSV file
            if csv_file.endswith('.csv'):
                csv_path = os.path.join(pain_level_dir, csv_file)
                # Read the CSV file
                df = pd.read_csv(csv_path, sep='\t')
                # Extract GSR values
                gsr_signal = df['gsr'].values
                # Extract ECG values
                ecg_signal = df['ecg'].values
                # Extract EMG trapezius values
                emg_signal = df['emg_trapezius'].values
                # Extract time values
                time = df['time'].values
                # Append the CSV name, GSR signals, and Pain level to the data list
                data.append({'CSV name': csv_file,'Time': time, 'GSR signals': gsr_signal, 'ECG signals': ecg_signal, 'EMG signals': emg_signal,'Pain level': int(pain_level)})

# Create a DataFrame from the collected data
df = pd.DataFrame(data)

# Display the DataFrame
df.head()


Unnamed: 0,CSV name,Time,GSR signals,ECG signals,EMG signals,Pain level
0,072414_m_23-PA2-034_bio.csv,"[1641, 3594, 5547, 7500, 9453, 11406, 13359, 1...","[6.966839, 6.966161, 6.966, 6.966839, 6.966161...","[-246.3745, -248.5128, -247.0629, -248.6413, -...","[-0.001924584, -0.02534641, -0.3388469, -0.513...",2
1,081609_w_40-PA2-028_bio.csv,"[0, 1953, 3906, 5859, 7813, 9766, 11719, 13672...","[0.872, 0.872, 0.872, 0.872, 0.872, 0.872, 0.8...","[99.9569, 111.0614, 114.0062, 123.7483, 117.03...","[2.315527, -4.576343, -7.510249, -2.14524, 1.3...",2
2,081714_m_36-PA2-065_bio.csv,"[859, 2813, 4766, 6719, 8672, 10625, 12578, 14...","[6.089862, 6.091, 6.091432, 6.092, 6.092432, 6...","[186.0979, 187.6918, 190.7044, 193.3572, 195.2...","[3.878096e-29, -1.448683e-28, 5.406606e-28, -2...",2
3,102514_w_40-PA2-046_bio.csv,"[0, 1953, 3906, 5859, 7813, 9766, 11719, 13672...","[1.462, 1.462, 1.462, 1.462, 1.462, 1.462, 1.4...","[-107.6247, -95.28533, -108.069, -102.2526, -1...","[-1.5371, -0.8260319, 0.09676914, 0.09593942, ...",2
4,120514_w_56-PA2-019_bio.csv,"[234, 2188, 4141, 6094, 8047, 10000, 11953, 13...","[2.226, 2.226, 2.226, 2.226, 2.226, 2.226, 2.2...","[-262.4505, -253.6655, -228.2035, -202.7421, -...","[-0.005932733, 0.0208005, -0.07729835, 0.27021...",2


Variable analysis on GSR
================================

### Selection of number of pain level included in the classification task

In [12]:
# Filter the DataFrame to keep rows where pain level is not equal to 1, 2, or 3
df = df[~df['Pain level'].isin([1, 2, 3])]

# Print the filtered DataFrame
print(df)

                         CSV name  \
5220  080314_w_25-PA4-067_bio.csv   
5221  092009_m_54-PA4-042_bio.csv   
5222  071709_w_23-PA4-071_bio.csv   
5223  082809_m_26-PA4-005_bio.csv   
5224  112909_w_20-PA4-080_bio.csv   
...                           ...   
8695  082909_m_47-BL1-085_bio.csv   
8696  081609_w_40-BL1-090_bio.csv   
8697  091809_w_43-BL1-097_bio.csv   
8698  112016_m_25-BL1-091_bio.csv   
8699  083013_w_47-BL1-086_bio.csv   

                                                   Time  \
5220  [234, 2188, 4141, 6094, 8047, 10000, 11953, 13...   
5221  [156, 2109, 4063, 6016, 7969, 9922, 11875, 138...   
5222  [1406, 3359, 5313, 7266, 9219, 11172, 13125, 1...   
5223  [391, 2344, 4297, 6250, 8203, 10156, 12109, 14...   
5224  [1406, 3359, 5313, 7266, 9219, 11172, 13125, 1...   
...                                                 ...   
8695  [781, 2734, 4688, 6641, 8594, 10547, 12500, 14...   
8696  [547, 2500, 4453, 6406, 8359, 10313, 12266, 14...   
8697  [313, 2266, 4219, 

In [13]:
# Remove the 'Time' column from the DataFrame
df.drop(columns=['Time'], inplace=True)

# Remove the 'ECG' column from the DataFrame
df.drop(columns=['ECG signals'], inplace=True)

# Remove the 'EMG' column from the DataFrame
df.drop(columns=['EMG signals'], inplace=True)

Preprocessing of dataframe
================

Preprocessing steps for GSR DataFrame include tasks such as handling missing values, smoothing the signal to reduce noise in the GSR signal 9(Savitzky-Golay filtering), removing outliers (z-score), and normalizing the data between a specified range, such as [0, 1] or [-1, 1] helping comparison across different subjects.

In [17]:
five_fold_annotations_path = '/projets2/AS84330/Datasets/Biovid/PartA/5folds_annotations2/'

In [15]:
# Function to preprocess GSR signals
def preprocess_gsr_signal(gsr_signal):
    # Handle missing values (if any)
    gsr_signal = np.array(gsr_signal)  # Convert to NumPy array
    gsr_signal = gsr_signal[~np.isnan(gsr_signal)]  # Remove NaN values
    
    # Check if the length of the signal is sufficient for smoothing
    if len(gsr_signal) < 5:
        # If the signal is too short, return the original signal
        return gsr_signal
    
    try:
        # Smoothing using Savitzky-Golay filter
        gsr_signal_smooth = savgol_filter(gsr_signal, window_length=5, polyorder=2)
    except ValueError:
        # If an error occurs during smoothing, return the original signal
        return gsr_signal
    
    # Removing outliers based on Z-scores
    z_scores = (gsr_signal_smooth - gsr_signal_smooth.mean()) / gsr_signal_smooth.std()
    gsr_signal_smooth_no_outliers = gsr_signal_smooth[(z_scores < 3)]
    
    # Normalization
    if len(gsr_signal_smooth_no_outliers) > 0:
        gsr_signal_normalized = (gsr_signal_smooth_no_outliers - gsr_signal_smooth_no_outliers.min()) / \
                                 (gsr_signal_smooth_no_outliers.max() - gsr_signal_smooth_no_outliers.min())
    else:
        # If there are no valid values after removing outliers, return the original signal
        return gsr_signal
    
    return gsr_signal_normalized

# Apply preprocessing to each row in the DataFrame
df['GSR signals'] = df['GSR signals'].apply(preprocess_gsr_signal)

# Display the updated DataFrame
df.head()

Unnamed: 0,CSV name,GSR signals,Pain level
5220,080314_w_25-PA4-067_bio.csv,"[0.0, 0.000784955924601847, 0.0012131137016623...",4
5221,092009_m_54-PA4-042_bio.csv,"[0.21654547886192313, 0.21654547886192313, 0.2...",4
5222,071709_w_23-PA4-071_bio.csv,"[1.0, 0.9983642739356123, 0.9979709555643436, ...",4
5223,082809_m_26-PA4-005_bio.csv,"[0.3629402970779174, 0.36178416384520784, 0.36...",4
5224,112909_w_20-PA4-080_bio.csv,"[0.16994520796566095, 0.1741950993228739, 0.17...",4


In [25]:
#  Prepare the data
max_length = max(len(signal) for signal in df['GSR signals'])  # Find the maximum length of GSR signals

# Pad or truncate the GSR signals to the maximum length
gsr_signals = np.array([np.pad(signal, (0, max_length - len(signal))) if len(signal) < max_length else signal[:max_length] for signal in df['GSR signals']])

pain_levels = df['Pain level'].values

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(gsr_signals, pain_levels, test_size=0.2, random_state=42)

# Convert the data into PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).unsqueeze(1)  
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).unsqueeze(1)  
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Check the shape of tensors
print("X_train_tensor shape:", X_train_tensor.shape)
print("X_test_tensor shape:", X_test_tensor.shape)
print("y_train_tensor shape:", y_train_tensor.shape)
print("y_test_tensor shape:", y_test_tensor.shape)


X_train_tensor shape: torch.Size([2784, 1, 2816])
X_test_tensor shape: torch.Size([696, 1, 2816])
y_train_tensor shape: torch.Size([2784])
y_test_tensor shape: torch.Size([696])


X_train_tensor: The training data tensor with a shape of [6960, 1, 2816], indicating that there are 6960 samples, each with 1 channel (for the GSR signal), and each signal has been padded or truncated to a length of 2816.  
X_test_tensor: The test data tensor with a shape of [1740, 1, 2816], indicating that there are 1740 samples in the test set, each with 1 channel, and the signals have the same length as the training data.  
y_train_tensor: The training labels tensor with a shape of [6960], containing the corresponding pain levels for the training samples.  
y_test_tensor: The test labels tensor with a shape of [1740], containing the corresponding pain levels for the test samples.

The first dimension (29) represents the batch size, indicating that there are 29 samples in the batch.  
The second dimension (1) represents the number of channels. In this case, there is only one channel.  
The third dimension (1) represents the length of the input data for each channel.

Input data has a shape of (batch_size, channels, sequence_length).

Defining model classes and utility functions
================

In [18]:
# Neural network class to be used as teacher:

class Conv1D_T(nn.Module):
    def __init__(self, num_classes=2):
        super(Conv1D_T, self).__init__()
        # First Convolutional Layer
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=5, stride=2)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool1d(kernel_size=2)
        
        # Second Convolutional Layer
        self.conv2 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=5)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool1d(kernel_size=2)
        
        # Fully Connected Layers
        self.fc1 = nn.Linear(22336, 512)  
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(512, num_classes)
        
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        
        x = x.view(x.size(0), -1)  # Flatten the tensor to 1D
        x = self.fc1(x)
        
        x = self.fc2(x)
        #x = self.sigmoid(x)
        return x

# Neural network class to be used as student:

class Conv1D_S(nn.Module):
    def __init__(self, num_classes=2):
        super(Conv1D_S, self).__init__()
        # First Convolutional Layer
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=5, stride=2)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool1d(kernel_size=2)
        
        # Second Convolutional Layer
        self.conv2 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=5)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool1d(kernel_size=2)
        
        # Fully Connected Layers
        self.fc1 = nn.Linear(22336, 512)  
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(512, num_classes)
        
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        
        x = x.view(x.size(0), -1)  # Flatten the tensor to 1D
        x = self.fc1(x)
        
        x = self.fc2(x)
        #x = self.sigmoid(x)
        return x


In [20]:
class GSRDataset(Dataset):
    def __init__(self, annotationfile_path, biosignals_path):
        
        self.root_path = '/projets2/AS84330/Datasets/Biovid/PartA'
        self.biosignals_path = biosignals_path
        self.annotationfile_path = annotationfile_path
        self._parse_annotationfile()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        label = self.video_list[index].label
        physio_df = self._load_biosignals(self.video_list[index].path)
        gsr_data=physio_df['gsr']
        gsr_data = torch.tensor(gsr_data.values)
        return gsr_data, label
    
    def _load_biosignals(self,vid):
        biosignals_path = self.biosignals_path
        vid_name = vid.split('/')[:]
        csv_path = os.path.join(biosignals_path, vid_name[-2],vid_name[-1] + '_bio.csv')
        physio_df = pd.read_csv(csv_path, sep='\t')
        return physio_df
    
    def _parse_annotationfile(self):
        self.video_list = [VideoRecord(x.strip().split(), self.root_path) for x in open(self.annotationfile_path)]
    
    def __len__(self):
        return len(self.video_list)

In [21]:
def validate_physio_gsr_only(physio_model, val_dataloader, criterion, device):
    # Validation phase
    physio_model.eval() 
    val_correct = 0
    val_total = 0
    val_vis_loss = 0.0
    val_physio_loss = 0.0

    with torch.no_grad():
        for val_data in tqdm(val_dataloader, total=len(val_dataloader), desc=f'Validation'):
            val_inputs, val_labels = val_data

            val_inputs = val_inputs.reshape(val_inputs.shape[0],1,val_inputs.shape[1])
            
            val_inputs = val_inputs.to(device, dtype=torch.float)
            val_labels = val_labels.to(device)
        


            val_physio_outputs = physio_model(val_inputs)
            # val_vis_outputs = vis_model(val_inputs)

            val_physio_loss += criterion(val_physio_outputs, val_labels)
            # val_vis_loss += criterion(val_vis_outputs, val_labels).item()

            # val_both_outputs = val_physio_outputs + val_vis_outputs

            _,val_predicted = torch.max(val_physio_outputs.data, 1)

            # _, val_both_predicted = torch.max(val_both_outputs.data, 1)
            
            val_total += val_labels.size(0)
            val_correct += (val_predicted == val_labels).sum().item()

    val_accuracy = 100 * val_correct / val_total
    avg_val_loss = ((val_physio_loss)) / len(val_dataloader)
    print(f'Validation accuracy: {val_accuracy}%')
    print(f'Validation loss: {avg_val_loss}')
    return val_accuracy, avg_val_loss


In [30]:
def train(train_annotation,test_annotation,weight_path):
    batch_size = 1024
    num_epochs = 200
    lr = 0.0001
    num_classes = 2
    check_every = 1
    best_val_acc = 0

    biosignals_path = '/projets2/AS84330/Datasets/Biovid/PartA/physio/physio_organised'
    five_fold_annotations_path = '/projets2/AS84330/Datasets/Biovid/PartA/5folds_annotations2/'
    train_annotation_file = os.path.join(five_fold_annotations_path, train_annotation)
    val_annotation_file = os.path.join(five_fold_annotations_path, test_annotation)


    train_dataset = GSRDataset(train_annotation_file, biosignals_path)
    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    val_dataset = GSRDataset(val_annotation_file, biosignals_path)
    val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    physio_model = Conv1D_T(num_classes).to(device)

    criterion = nn.CrossEntropyLoss()
    physio_optimizer = optim.SGD(physio_model.parameters(), lr=lr, momentum=0.9)
    #scheduler = ReduceLROnPlateau(physio_optimizer, mode='min', factor=0.1, patience=5, verbose=True)

    for epoch in tqdm(range(num_epochs), desc='Epochs'):
        physio_model.train()
        
        
        running_loss = 0
        correct = 0
        total = 0
        
        for i,(physio_batch, labels) in enumerate(train_dataloader):
            physio_optimizer.zero_grad()
            physio_batch = physio_batch.reshape(physio_batch.shape[0],1,physio_batch.shape[1])
            physio_batch = physio_batch.to(device, dtype=torch.float)
            labels = labels.to(device)
            
            physio_outputs = physio_model(physio_batch)
            
            physio_loss = criterion(physio_outputs, labels)
            
            physio_loss.backward()
            physio_optimizer.step()
            # print(physio_loss.data)
            
            running_loss += physio_loss.item()
            
            _, physio_predicted = torch.max(physio_outputs.data, 1)
            total += labels.size(0)
            # print('output: ', physio_outputs)
            # print('predicted: ', physio_predicted)
            # print('labels: ', labels)
            # print('**************************')
            correct += (physio_predicted == labels).sum().item()
            #print(physio_loss.item())

        print(f"Accuracy after epoch {epoch + 1}: {100 * correct / total}%")

        if epoch % check_every == 0:
                val_acc, val_loss = validate_physio_gsr_only(physio_model, val_dataloader, criterion, device)
                # scheduler.step(val_loss)
                # print( "Validation accuracy: ", val_acc)
                if val_acc > best_val_acc:
                    best_val_acc = val_acc
                    remove_previous_files(weight_path)
                    model_save_path = f'{weight_path}{round(best_val_acc,2)}.pth'
                    torch.save(physio_model.state_dict(), model_save_path)
                    print('Best model saved at epoch: ', epoch+1)
                    best_epoch = epoch+1

    print("Finished Training")

    train_accuracy = 100 * correct / total
    avg_train_loss = running_loss / len(train_dataloader)
    print(f'Training accuracy: {train_accuracy}%')
    print(f'Training loss: {avg_train_loss}')

    print("Best model saved at epoch: ", best_epoch)
    print("Best validation accuracy: ", best_val_acc)
    
    return best_val_acc

def test(test_annotation, test_weights):

    batch_size = 1024
    num_classes = 2
    biosignals_path = '/projets2/AS84330/Datasets/Biovid/PartA/physio/physio_organised'
   
    videos_root = '/projets2/AS84330/Datasets/Biovid/PartA/subject_images/subject_images_organised'
    val_annotation_file = os.path.join(videos_root,'../../5folds_annotations', test_annotation)


    val_dataset = GSRDataset(val_annotation_file, biosignals_path)
    val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    physio_model = Conv1D_T(num_classes).to(device)
    
    physio_model.load_state_dict(torch.load('/projets2/AS84330/Projets/MM_transformer/biovid_codes/all_weights/weights_physio_viso2/' + test_weights))

    criterion = nn.CrossEntropyLoss()

    
    val_acc, _ = validate_physio_gsr_only(physio_model, val_dataloader, criterion, device)
    
    return val_acc

##### Train + Evaluate #####
if __name__ == '__main__':
    dir_name = '/home/ens/AU59350/LIVIA/resultsCNN'
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)
    else:
        print(f"The directory '{dir_name}' already exists.")
    
    kfold_accuracy = []
    for i in range (1,6):
        train_annotation = f'train_fold{i}.txt'
        test_annotation = f'test_fold{i}.txt'
        weight_name = f'model_best_gsr_fold{i}_'
        weight_path = os.path.join(dir_name,weight_name)
        best_accuracy = train(train_annotation,test_annotation,weight_path)
        kfold_accuracy.append(round(best_accuracy,1))

    write_accuracy_to_file(dir_name, kfold_accuracy)
    

The directory '/home/ens/AU59350/LIVIA/resultsCNN' already exists.


NameError: name 'VideoRecord' is not defined

In [20]:
def train(model, train_loader, epochs, learning_rate, device):
    # Define learning parameters
    learning_rate = 0.0001
    epochs = 100
    batch_size = 1024
    num_classes = 2

    # Define loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

    # Initialize K-Fold cross-validation
    k_folds = 5
    kf = KFold(n_splits=k_folds, shuffle=True)
    
    # Perform cross-validation
    for fold, (train_index, val_index) in enumerate(kf.split(X_train_tensor)):
        print(f"Fold {fold + 1}/{k_folds}")
        
        # Split data into training and validation sets for this fold
        X_train_fold, X_val_fold = X_train_tensor[train_index], X_train_tensor[val_index]
        y_train_fold, y_val_fold = y_train_tensor[train_index], y_train_tensor[val_index]
        
        # Create DataLoader for training and validation sets
        train_dataset = TensorDataset(X_train_fold, y_train_fold)
        val_dataset = TensorDataset(X_val_fold, y_val_fold)
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
        
        for epoch in range(epochs):
            
            model.train()
            running_loss = 0.0
            correct = 0
            total = 0

            for inputs, labels in train_loader:
                optimizer.zero_grad()
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
                running_loss += loss.item() * inputs.size(0)

                # Calculate training accuracy
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

            epoch_loss = running_loss / len(train_loader.dataset)
            epoch_accuracy = correct / total
            print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}")

def test(model, test_loader, device):
    model.to(device)
    model.eval()

    correct = 0
    total = 0

    with torch.no_grad():
        outputs = model(X_test_tensor)
        predictions = torch.argmax(outputs, dim=1)
        y_test_tensor_sampled = y_test_tensor[:len(predictions)]
        accuracy = torch.mean((predictions == y_test_tensor_sampled).float()).item()
        print(f"Test Accuracy: {accuracy:.4f}")
        
        # Calculate MAE and RMSE
        mae = mean_absolute_error(y_test_tensor_sampled, predictions)
        rmse = mean_squared_error(y_test_tensor_sampled, predictions, squared=False)
        print(f"MAE: {mae:.4f}")
        print(f"RMSE: {rmse:.4f}")

Cross-entropy runs
==================

For reproducibility, we need to set the torch manual seed. I train
networks using different methods, so to compare them fairly, it makes
sense to initialize the networks with the same weights. I start by
training the teacher network using cross-entropy:

In [18]:
# Define batch size
batch_size = 1024

# Define training and testing datasets
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# Create DataLoader for training and testing sets
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Train the teacher network
torch.manual_seed(42)
nn_t = Conv1D_T(num_classes=2).to(device)
train(nn_t, train_loader, epochs=100, learning_rate=0.0001, device=device)

# Test the teacher network
test_accuracy_T = test(nn_t, test_loader, device)

# Instantiate the lightweight network
torch.manual_seed(42)
nn_s = Conv1D_S(num_classes=2).to(device)


Fold 1/5
Epoch 1/100, Loss: 0.6974, Accuracy: 0.4946
Epoch 2/100, Loss: 0.6942, Accuracy: 0.4932
Epoch 3/100, Loss: 0.6941, Accuracy: 0.5068
Epoch 4/100, Loss: 0.6943, Accuracy: 0.5041
Epoch 5/100, Loss: 0.6933, Accuracy: 0.5156
Epoch 6/100, Loss: 0.6931, Accuracy: 0.5009
Epoch 7/100, Loss: 0.6933, Accuracy: 0.5063
Epoch 8/100, Loss: 0.6931, Accuracy: 0.5022
Epoch 9/100, Loss: 0.6929, Accuracy: 0.5072
Epoch 10/100, Loss: 0.6929, Accuracy: 0.5153
Epoch 11/100, Loss: 0.6928, Accuracy: 0.5165
Epoch 12/100, Loss: 0.6926, Accuracy: 0.5138
Epoch 13/100, Loss: 0.6926, Accuracy: 0.5084
Epoch 14/100, Loss: 0.6925, Accuracy: 0.5074
Epoch 15/100, Loss: 0.6924, Accuracy: 0.5154
Epoch 16/100, Loss: 0.6923, Accuracy: 0.5174
Epoch 17/100, Loss: 0.6922, Accuracy: 0.5183
Epoch 18/100, Loss: 0.6921, Accuracy: 0.5198
Epoch 19/100, Loss: 0.6921, Accuracy: 0.5217
Epoch 20/100, Loss: 0.6920, Accuracy: 0.5207
Epoch 21/100, Loss: 0.6919, Accuracy: 0.5198
Epoch 22/100, Loss: 0.6918, Accuracy: 0.5208
Epoch 23/1



I instantiate one more lightweight network model to compare their
performances. Back propagation is sensitive to weight initialization, so
I need to make sure these two networks have the exact same
initialization.


In [None]:
torch.manual_seed(42)
new_Conv1D_S = Conv1D_S(num_classes=2).to(device)

To ensure I have created a copy of the first network, we inspect the
norm of its first layer. If it matches, then the networks are indeed the same.

In [None]:
# Print the norm of the first layer of the initial lightweight model
print("Norm of 1st layer of nn_light:", torch.norm(Conv1D_S.features[0].weight).item())
# Print the norm of the first layer of the new lightweight model
print("Norm of 1st layer of new_nn_light:", torch.norm(new_Conv1D_S.features[0].weight).item())

Print the total number of parameters in each model:

In [None]:
total_params_T = "{:,}".format(sum(p.numel() for p in Conv1D_T.parameters()))
print(f"DeepNN parameters: {total_params_T}")
total_params_S = "{:,}".format(sum(p.numel() for p in Conv1D_S.parameters()))
print(f"LightNN parameters: {total_params_S}")

Train and test the lightweight network with cross entropy loss:

In [None]:
train(Conv1D_S, train_loader, epochs=10, learning_rate=0.0001, device=device)
test_accuracy_S_ce = test(Conv1D_S, test_loader, device)

As we can see, based on test accuracy, I can now compare the deeper
network that is to be used as a teacher with the lightweight network
that is the supposed student. So far, the student has not intervened
with the teacher, therefore this performance is achieved by the student
itself. The metrics so far can be seen with the following lines:

In [None]:
print(f"Teacher accuracy: {test_accuracy_T:.2f}%")
print(f"Student accuracy: {test_accuracy_S_ce:.2f}%")

In [None]:
def train_knowledge_distillation(teacher, student, train_loader, epochs, learning_rate, T, soft_target_loss_weight, ce_loss_weight, device):
    # Define learning parameters
    learning_rate = 0.0001

    # Define loss function and optimizer
    ce_loss = nn.CrossEntropyLoss()
    optimizer = optim.SGD(student.parameters(), lr=learning_rate, momentum=0.9)

    teacher.eval()  # Teacher set to evaluation mode
    student.train() # Student to train mode

    for epoch in range(epochs):
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()

            # Forward pass with the teacher model - do not save gradients here as we do not change the teacher's weights
            with torch.no_grad():
                teacher_logits = teacher(inputs)

            # Forward pass with the student model
            student_logits = student(inputs)

            #Soften the student logits by applying softmax first and log() second
            soft_targets = nn.functional.softmax(teacher_logits / T, dim=-1)
            soft_prob = nn.functional.log_softmax(student_logits / T, dim=-1)

            # Calculate the soft targets loss. Scaled by T**2 as suggested by the authors of the paper "Distilling the knowledge in a neural network"
            soft_targets_loss = torch.sum(soft_targets * (soft_targets.log() - soft_prob)) / soft_prob.size()[0] * (T**2)

            # Calculate the true label loss
            label_loss = ce_loss(student_logits, labels)

            # Weighted sum of the two losses
            loss = soft_target_loss_weight * soft_targets_loss + ce_loss_weight * label_loss

            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        print(f"Epoch {epoch+1}/{epochs}, Loss: {running_loss / len(train_loader)}")

# Apply ``train_knowledge_distillation`` with a temperature of 2. Arbitrarily set the weights to 0.75 for CE and 0.25 for distillation loss.
train_knowledge_distillation(teacher=Conv1D_T, student=new_Conv1D_S, train_loader=train_loader, epochs=10, learning_rate=0.001, T=2, soft_target_loss_weight=0.25, ce_loss_weight=0.75, device=device)
test_accuracy_light_ce_and_kd = test(new_nn_light, test_loader, device)

# Compare the student test accuracy with and without the teacher, after distillation
print(f"Teacher accuracy: {test_accuracy_T:.2f}%")
print(f"Student accuracy without teacher: {test_accuracy_S_ce:.2f}%")
print(f"Student accuracy with CE + KD: {test_accuracy_light_ce_and_kd:.2f}%")