# **Neuroevolution Deliverable 2** 🙂

Group I:

Student Name       | Student Email
-------------------|------------------
Filipe Dias        | r20181050@novaims.unl.pt
Inês Santos        | r20191184@novaims.unl.pt
Manuel Marreiros   | r20191223@novaims.unl.pt

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import random
import copy
import torch
import torch.nn as nn
import torch.nn.functional as nnf
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets , transforms
from dataclasses import dataclass
import torchvision.transforms as T 
from mlxtend.plotting import plot_decision_regions
from tqdm import tqdm
%matplotlib inline

In [2]:
device = device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch_size = 16

train_dataset = datasets.MNIST('./data', 
                               train=True, 
                               download=True, 
                               transform=transforms.ToTensor())

validation_dataset = datasets.MNIST('./data', 
                                    train=False, 
                                    transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset, 
                            batch_size=batch_size, 
                            shuffle=True)

validation_loader = DataLoader(dataset=validation_dataset, 
                                batch_size=batch_size, 
                                shuffle=False)

input_size = 28*28
output_size =  10

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 148272335.85it/s]

Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw






Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 26118088.36it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 36082848.88it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 5840137.57it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw






### Grammar

This dictionary contains the different layers that can be used by the models and the respective parameters with values that will be drawn at random.

In [3]:
layer_params = {
    'Linear': {
        'in_features': [64, 128, 256, 512],
        'out_features': [512, 256, 128, 64, 32, 16],
        'bias': [True, False]
    },
    'BatchNorm1d': {
        'eps': [1e-5, 1e-4, 1e-3],
        'momentum': [0.1, 0.9, 0.99]
    },
    'LayerNorm': {
        'eps': [1e-5, 1e-4, 1e-3]
    },
    'Dropout': {
        'p': [0.1, 0.5, 0.7]
    },
    'AlphaDropout': {
        'p': [0.1, 0.5, 0.7]
    },
    'Activation': ['Sigmoid', 'ReLU', 'PReLU', 'ELU', 'SELU', 'GELU', 'CELU', 'SiLU']
}


This function will generate a random network with at least 2 layers and maximum 50 layers.

The only predefined layers are the first and last ones, which will be linear layers. All the other layers will be selected randomly from the dictionary above.

 The softmax layer at the end is only defined at the training stage to convert the raw output of the model into probabilities at that point.

In [4]:
def generate_random_network(input_size, output_size, min_layers, max_layers):
    # Randomly choose the number of layers (excluding the first and last layers)
    num_layers = random.randint(min_layers, max_layers - 2)

    # Randomly select layers and their parameters
    layer_list = []
    
    # Add the first linear layer
    first_layer_string = f'Linear|{input_size},512,True'
    previous_output_features = 512
    layer_list.append(first_layer_string)

    # Add the intermediate layers
    for _ in range(num_layers):
        layer_type = random.choice(list(layer_params.keys()))

        if layer_type == 'Activation':
            activation_fn = random.choice(layer_params[layer_type])
            layer_string = f'Act|{activation_fn}'
        else:
            layer_params_dict = {}
            for param, values in layer_params[layer_type].items():
                layer_params_dict[param] = random.choice(values)

            if layer_type == 'Linear':
                layer_params_dict['in_features'] = previous_output_features
                previous_output_features = layer_params_dict['out_features']            
                param_string = ','.join([str(value) for value in layer_params_dict.values()])
                layer_string = f'{layer_type}|{param_string}'

            elif layer_type == 'BatchNorm1d':
                param_string = ','.join([str(value) for value in layer_params_dict.values()])
                layer_string = f'{layer_type}|{previous_output_features},{param_string}'

            elif layer_type == 'LayerNorm':
                param_string = ','.join([str(value) for value in layer_params_dict.values()])
                layer_string = f'{layer_type}|{previous_output_features},{param_string}'
            
            else:
                param_string = ','.join([str(value) for value in layer_params_dict.values()])
                layer_string = f'{layer_type}|{param_string}'

        layer_list.append(layer_string)

    # Add the last linear layer
    last_layer_string = f'Linear|{previous_output_features},{output_size},True'
    layer_list.append(last_layer_string)

    return layer_list

# Defining the number of layers
min_layers = 2
max_layers = 50

In [5]:
# Define network class to parse instructions and build PyTorch network structure (phenotype)
class Net(nn.Module):
    def __init__(self, layer_list):
        super(Net, self).__init__()

        self.layers = nn.ModuleList()

        for layer_str in layer_list:
            layer = self.parse_layer_string(layer_str)
            self.layers.append(layer)

        # to flatten the images into 1d arays
        self.flat = nn.Flatten()

    def forward(self, x):
        x=self.flat(x)
        for layer in self.layers:
            x = layer(x)    
        return x

    def parse_layer_string(self, layer_str):
        if layer_str.startswith('Linear'):
            features = layer_str.split('|')[-1].split(',')
            in_features = int(features[0])
            out_features = int(features[1])
            bias = features[-1]
            return nn.Linear(in_features, out_features, bias)
        elif layer_str.startswith('BatchNorm1d'):
            features = layer_str.split('|')[-1].split(',')
            num_features = int(features[0])
            eps = float(features[1])
            momentum = float(features[2])
            return nn.BatchNorm1d(num_features, eps, momentum)
        elif layer_str.startswith('LayerNorm'):
            features = layer_str.split('|')[-1].split(',')
            shape = int(features[0])
            eps = float(features[1])
            return nn.LayerNorm(normalized_shape=[16, shape], eps=eps) #16 is the batch size, shape is the current input shape
        elif layer_str.startswith('Dropout'):
            p = float(layer_str.split('|')[-1])
            return nn.Dropout(p=p)
        elif layer_str.startswith('AlphaDropout'):
            p = float(layer_str.split('|')[-1])
            return nn.AlphaDropout(p=p)
        elif layer_str.startswith('Act'):
            act = layer_str.split('|')[-1]
            layer_class = getattr(nn,act)
            return layer_class()
        else:
            raise ValueError('Invalid layer string: {}'.format(layer_str))

This next dictionary contains the different optimizers and the respective parameters with values that will be drawn at random.

In [6]:
optimizer_params = {
    'Adam': {
        'lr': [0.001, 0.01, 0.1],
        'betas': [(0.9, 0.999), (0.85, 0.95), (0.8, 0.9)]
    },
    'AdamW': {
        'lr': [0.001, 0.01, 0.1],
        'betas': [(0.9, 0.999), (0.85, 0.95), (0.8, 0.9)],
        'weight_decay': [0.0, 0.001, 0.01]
    },
    'Adadelta': {
        'lr': [0.1, 1.0, 10.0],
        'rho': [0.9, 0.95, 0.99]
    },
    'NAdam': {
        'lr': [0.001, 0.01, 0.1],
        'betas': [(0.9, 0.999), (0.85, 0.95), (0.8, 0.9)],
        'momentum_decay': [0.9, 0.95, 0.99]
    },
    'SGD': {
        'lr': [0.01, 0.1, 1.0],
        'momentum': [0.9, 0.95, 0.99],
        'nesterov': [True, False]
    }
}


In [7]:
# Define function to parse optimizer instructions and build PyTorch optimizer
def build_optimizer(optimizer_str, params, model):
    # Parse the optimizer string and create a PyTorch optimizer
    if optimizer_str == 'SGD':
        lr = params['lr']
        momentum = params['momentum']
        nesterov = params['nesterov']
        return torch.optim.SGD(params=model.parameters(), lr=lr, momentum=momentum, nesterov=nesterov)
    elif optimizer_str == 'Adam':
        lr = params['lr']
        betas = params['betas']
        return torch.optim.Adam(params=model.parameters(), lr=lr, betas=betas)
    elif optimizer_str == 'AdamW':
        lr = params['lr']
        betas = params['betas']
        weight_decay = params['weight_decay']
        return torch.optim.AdamW(params=model.parameters(), lr=lr, betas=betas, weight_decay=weight_decay)
    elif optimizer_str == 'NAdam':
        lr = params['lr']
        betas = params['betas']
        momentum_decay = params['momentum_decay']
        return torch.optim.NAdam(params=model.parameters(), lr=lr, betas=betas, momentum_decay=momentum_decay)
    elif optimizer_str == 'Adadelta':
        lr = params['lr']
        rho = params['rho']
        return torch.optim.Adadelta(params=model.parameters(), lr=lr, rho=rho)

## Network 1

In [8]:
layer_list = generate_random_network(input_size, output_size, min_layers, max_layers)

# Print the randomly generated network
print("Layer List:", layer_list)

Layer List: ['Linear|784,512,True', 'AlphaDropout|0.7', 'Linear|512,64,False', 'LayerNorm|64,0.001', 'Dropout|0.1', 'Dropout|0.7', 'AlphaDropout|0.1', 'BatchNorm1d|64,0.001,0.1', 'BatchNorm1d|64,1e-05,0.1', 'Dropout|0.7', 'AlphaDropout|0.7', 'BatchNorm1d|64,0.001,0.99', 'Linear|64,512,False', 'AlphaDropout|0.7', 'Dropout|0.1', 'AlphaDropout|0.1', 'AlphaDropout|0.7', 'Act|GELU', 'BatchNorm1d|512,1e-05,0.1', 'AlphaDropout|0.1', 'Linear|512,128,True', 'Linear|128,256,False', 'Linear|256,32,False', 'Dropout|0.1', 'Linear|32,32,True', 'Linear|32,10,True']


In [9]:
model1 = Net(layer_list).to(device)

# Define the loss function, we will use the standard Binary Crossentropy
loss_fn = nn.CrossEntropyLoss()

In [10]:
model1

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): AlphaDropout(p=0.7, inplace=False)
    (2): Linear(in_features=512, out_features=64, bias=True)
    (3): LayerNorm((16, 64), eps=0.001, elementwise_affine=True)
    (4): Dropout(p=0.1, inplace=False)
    (5): Dropout(p=0.7, inplace=False)
    (6): AlphaDropout(p=0.1, inplace=False)
    (7): BatchNorm1d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (8): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): Dropout(p=0.7, inplace=False)
    (10): AlphaDropout(p=0.7, inplace=False)
    (11): BatchNorm1d(64, eps=0.001, momentum=0.99, affine=True, track_running_stats=True)
    (12): Linear(in_features=64, out_features=512, bias=True)
    (13): AlphaDropout(p=0.7, inplace=False)
    (14): Dropout(p=0.1, inplace=False)
    (15): AlphaDropout(p=0.1, inplace=False)
    (16): AlphaDropout(p=0.7, inplace=False)
    (17): GELU(approximate=

In [11]:
# Randomly select an optimizer
optimizer_str = random.choice(list(optimizer_params.keys()))

# Randomly select parameters for the chosen optimizer
params = {}
for param, values in optimizer_params[optimizer_str].items():
    params[param] = random.choice(values)

# Print the randomly generated optimizer and its parameters
print("Optimizer:", optimizer_str)
print("Parameters:", params)

Optimizer: NAdam
Parameters: {'lr': 0.001, 'betas': (0.9, 0.999), 'momentum_decay': 0.9}


In [12]:
optimizer = build_optimizer(optimizer_str, params, model1)

In [13]:
def train_model(model):
    num_epochs = 10
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    
    for epoch in range(num_epochs):
        # Training
        accuracy_hist_train = 0
        loss_hist_train = 0

        model.train()  # Set the model to training mode

        for x_batch, y_batch in train_loader:
            x_batch = x_batch.to(device)
            y_batch = y_batch.to(device)

            # Forward pass
            pred = model(x_batch)
            pred_probs = nnf.softmax(pred, dim=1)
            loss = loss_fn(pred, y_batch)

            # Backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Compute training accuracy
            is_correct = (torch.argmax(pred, dim=1) == y_batch).float()
            accuracy_hist_train += is_correct.sum().item()
            loss_hist_train += loss.item()

        accuracy_hist_train /= len(train_loader.dataset)
        loss_hist_train /= len(train_loader)

        # Validation
        accuracy_hist_val = 0
        loss_hist_val = 0

        model.eval()  # Set the model to evaluation mode

        with torch.no_grad():
            for x_val, y_val in validation_loader:
                x_val = x_val.to(device)
                y_val = y_val.to(device)

                # Forward pass
                val_pred = model(x_val)
                val_pred_probs = nnf.softmax(val_pred, dim=1)
                val_loss = loss_fn(val_pred, y_val)

                # Compute validation accuracy
                val_is_correct = (torch.argmax(val_pred, dim=1) == y_val).float()
                accuracy_hist_val += val_is_correct.sum().item()
                loss_hist_val += val_loss.item()

        accuracy_hist_val /= len(validation_loader.dataset)
        loss_hist_val /= len(validation_loader)

        # Print training and validation metrics
        print(f"Epoch {epoch + 1}/{num_epochs}")
        print(f"Training Loss: {loss_hist_train:.4f}  Training Accuracy: {accuracy_hist_train:.4f}")
        print(f"Validation Loss: {loss_hist_val:.4f}  Validation Accuracy: {accuracy_hist_val:.4f}")
        print("------------------------")


In [14]:
train_model(model1)

Epoch 1/10
Training Loss: 2.3072  Training Accuracy: 0.1039
Validation Loss: 2.3005  Validation Accuracy: 0.1135
------------------------
Epoch 2/10
Training Loss: 2.3047  Training Accuracy: 0.1087
Validation Loss: 2.3008  Validation Accuracy: 0.1019
------------------------
Epoch 3/10
Training Loss: 2.3035  Training Accuracy: 0.1081
Validation Loss: 2.3037  Validation Accuracy: 0.1135
------------------------
Epoch 4/10
Training Loss: 2.3027  Training Accuracy: 0.1105
Validation Loss: 2.2996  Validation Accuracy: 0.1135
------------------------
Epoch 5/10
Training Loss: 2.3026  Training Accuracy: 0.1106
Validation Loss: 2.3016  Validation Accuracy: 0.1028
------------------------
Epoch 6/10
Training Loss: 2.3028  Training Accuracy: 0.1102
Validation Loss: 2.3003  Validation Accuracy: 0.1135
------------------------
Epoch 7/10
Training Loss: 2.3028  Training Accuracy: 0.1105
Validation Loss: 2.3008  Validation Accuracy: 0.1541
------------------------
Epoch 8/10
Training Loss: 2.3029  

## Network 2

In [15]:
layer_list = generate_random_network(input_size, output_size, min_layers, max_layers)

# Print the randomly generated network
print("Layer List:", layer_list)

Layer List: ['Linear|784,512,True', 'Act|ELU', 'AlphaDropout|0.1', 'BatchNorm1d|512,1e-05,0.99', 'AlphaDropout|0.1', 'Act|ReLU', 'Act|CELU', 'BatchNorm1d|512,0.001,0.9', 'BatchNorm1d|512,0.0001,0.9', 'Dropout|0.5', 'AlphaDropout|0.7', 'Dropout|0.7', 'LayerNorm|512,0.0001', 'LayerNorm|512,0.0001', 'BatchNorm1d|512,0.0001,0.9', 'BatchNorm1d|512,0.001,0.1', 'Linear|512,512,True', 'LayerNorm|512,1e-05', 'Dropout|0.5', 'Dropout|0.1', 'Linear|512,128,True', 'Dropout|0.7', 'Act|ELU', 'BatchNorm1d|128,0.0001,0.1', 'AlphaDropout|0.5', 'Act|CELU', 'LayerNorm|128,0.0001', 'Act|ELU', 'BatchNorm1d|128,1e-05,0.9', 'LayerNorm|128,0.001', 'Dropout|0.5', 'Linear|128,256,False', 'Dropout|0.7', 'Dropout|0.1', 'Linear|256,10,True']


In [16]:
model2 = Net(layer_list).to(device)

# Define the loss function, we will use the standard Binary Crossentropy
loss_fn = nn.CrossEntropyLoss()

In [17]:
model2

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): AlphaDropout(p=0.1, inplace=False)
    (3): BatchNorm1d(512, eps=1e-05, momentum=0.99, affine=True, track_running_stats=True)
    (4): AlphaDropout(p=0.1, inplace=False)
    (5): ReLU()
    (6): CELU(alpha=1.0)
    (7): BatchNorm1d(512, eps=0.001, momentum=0.9, affine=True, track_running_stats=True)
    (8): BatchNorm1d(512, eps=0.0001, momentum=0.9, affine=True, track_running_stats=True)
    (9): Dropout(p=0.5, inplace=False)
    (10): AlphaDropout(p=0.7, inplace=False)
    (11): Dropout(p=0.7, inplace=False)
    (12-13): 2 x LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (14): BatchNorm1d(512, eps=0.0001, momentum=0.9, affine=True, track_running_stats=True)
    (15): BatchNorm1d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (16): Linear(in_features=512, out_features=512, bias=True)
    (17): LayerNorm((16, 512), eps=1e-0

In [18]:
# Randomly select an optimizer
optimizer_str = random.choice(list(optimizer_params.keys()))

# Randomly select parameters for the chosen optimizer
params = {}
for param, values in optimizer_params[optimizer_str].items():
    params[param] = random.choice(values)

# Print the randomly generated optimizer and its parameters
print("Optimizer:", optimizer_str)
print("Parameters:", params)

Optimizer: Adadelta
Parameters: {'lr': 10.0, 'rho': 0.9}


In [19]:
optimizer = build_optimizer(optimizer_str, params, model2)

In [20]:
train_model(model2)

Epoch 1/10
Training Loss: 4.7165  Training Accuracy: 0.1010
Validation Loss: 2.7371  Validation Accuracy: 0.0997
------------------------
Epoch 2/10
Training Loss: 3.4654  Training Accuracy: 0.1006
Validation Loss: 2.6682  Validation Accuracy: 0.1022
------------------------
Epoch 3/10
Training Loss: 3.4435  Training Accuracy: 0.1023
Validation Loss: 2.8221  Validation Accuracy: 0.0965
------------------------
Epoch 4/10
Training Loss: 3.4484  Training Accuracy: 0.0997
Validation Loss: 2.6729  Validation Accuracy: 0.1012
------------------------
Epoch 5/10
Training Loss: 3.4465  Training Accuracy: 0.1004
Validation Loss: 2.6483  Validation Accuracy: 0.1021
------------------------
Epoch 6/10
Training Loss: 3.4383  Training Accuracy: 0.1003
Validation Loss: 2.7669  Validation Accuracy: 0.1038
------------------------
Epoch 7/10
Training Loss: 3.4360  Training Accuracy: 0.1016
Validation Loss: 2.7467  Validation Accuracy: 0.1040
------------------------
Epoch 8/10
Training Loss: 3.4305  

## Network 3

In [21]:
layer_list = generate_random_network(input_size, output_size, min_layers, max_layers)

# Print the randomly generated network
print("Layer List:", layer_list)

Layer List: ['Linear|784,512,True', 'AlphaDropout|0.1', 'BatchNorm1d|512,0.001,0.9', 'BatchNorm1d|512,0.001,0.1', 'Act|Sigmoid', 'BatchNorm1d|512,0.001,0.99', 'Act|SiLU', 'LayerNorm|512,0.0001', 'Act|SELU', 'AlphaDropout|0.7', 'LayerNorm|512,0.001', 'AlphaDropout|0.1', 'LayerNorm|512,0.0001', 'Act|CELU', 'Linear|512,64,False', 'Dropout|0.7', 'AlphaDropout|0.5', 'AlphaDropout|0.7', 'Act|Sigmoid', 'AlphaDropout|0.7', 'LayerNorm|64,0.0001', 'Dropout|0.5', 'Linear|64,256,True', 'Linear|256,128,False', 'BatchNorm1d|128,1e-05,0.9', 'Act|ELU', 'BatchNorm1d|128,0.0001,0.1', 'AlphaDropout|0.7', 'LayerNorm|128,1e-05', 'Act|SiLU', 'Act|SELU', 'LayerNorm|128,0.001', 'BatchNorm1d|128,0.001,0.1', 'Linear|128,10,True']


In [22]:
model3 = Net(layer_list).to(device)

# Define the loss function, we will use the standard Binary Crossentropy
loss_fn = nn.CrossEntropyLoss()

In [23]:
model3

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): AlphaDropout(p=0.1, inplace=False)
    (2): BatchNorm1d(512, eps=0.001, momentum=0.9, affine=True, track_running_stats=True)
    (3): BatchNorm1d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (4): Sigmoid()
    (5): BatchNorm1d(512, eps=0.001, momentum=0.99, affine=True, track_running_stats=True)
    (6): SiLU()
    (7): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (8): SELU()
    (9): AlphaDropout(p=0.7, inplace=False)
    (10): LayerNorm((16, 512), eps=0.001, elementwise_affine=True)
    (11): AlphaDropout(p=0.1, inplace=False)
    (12): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (13): CELU(alpha=1.0)
    (14): Linear(in_features=512, out_features=64, bias=True)
    (15): Dropout(p=0.7, inplace=False)
    (16): AlphaDropout(p=0.5, inplace=False)
    (17): AlphaDropout(p=0.7, inplace=False)
    (18): Sigmoid()
    (19): Alph

In [24]:
# Randomly select an optimizer
optimizer_str = random.choice(list(optimizer_params.keys()))

# Randomly select parameters for the chosen optimizer
params = {}
for param, values in optimizer_params[optimizer_str].items():
    params[param] = random.choice(values)

# Print the randomly generated optimizer and its parameters
print("Optimizer:", optimizer_str)
print("Parameters:", params)

Optimizer: Adam
Parameters: {'lr': 0.001, 'betas': (0.85, 0.95)}


In [25]:
optimizer = build_optimizer(optimizer_str, params, model3)

In [26]:
train_model(model3)

Epoch 1/10
Training Loss: 2.3344  Training Accuracy: 0.1021
Validation Loss: 2.2422  Validation Accuracy: 0.1661
------------------------
Epoch 2/10
Training Loss: 2.3145  Training Accuracy: 0.1026
Validation Loss: 2.2134  Validation Accuracy: 0.2435
------------------------
Epoch 3/10
Training Loss: 2.3107  Training Accuracy: 0.1047
Validation Loss: 2.2704  Validation Accuracy: 0.1824
------------------------
Epoch 4/10
Training Loss: 2.3068  Training Accuracy: 0.1037
Validation Loss: 2.2456  Validation Accuracy: 0.2592
------------------------
Epoch 5/10
Training Loss: 2.3046  Training Accuracy: 0.1060
Validation Loss: 2.2757  Validation Accuracy: 0.2165
------------------------
Epoch 6/10
Training Loss: 2.3032  Training Accuracy: 0.1080
Validation Loss: 2.2772  Validation Accuracy: 0.2506
------------------------
Epoch 7/10
Training Loss: 2.3021  Training Accuracy: 0.1084
Validation Loss: 2.2923  Validation Accuracy: 0.1288
------------------------
Epoch 8/10
Training Loss: 2.3026  

## Network 4

In [27]:
layer_list = generate_random_network(input_size, output_size, min_layers, max_layers)

# Print the randomly generated network
print("Layer List:", layer_list)

Layer List: ['Linear|784,512,True', 'Act|ELU', 'Act|SiLU', 'LayerNorm|512,1e-05', 'LayerNorm|512,0.001', 'Linear|512,512,True', 'LayerNorm|512,0.0001', 'Linear|512,10,True']


In [28]:
model4 = Net(layer_list).to(device)

# Define the loss function, we will use the standard Binary Crossentropy
loss_fn = nn.CrossEntropyLoss()

In [29]:
model4

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): SiLU()
    (3): LayerNorm((16, 512), eps=1e-05, elementwise_affine=True)
    (4): LayerNorm((16, 512), eps=0.001, elementwise_affine=True)
    (5): Linear(in_features=512, out_features=512, bias=True)
    (6): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (7): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)

In [30]:
# Randomly select an optimizer
optimizer_str = random.choice(list(optimizer_params.keys()))

# Randomly select parameters for the chosen optimizer
params = {}
for param, values in optimizer_params[optimizer_str].items():
    params[param] = random.choice(values)

# Print the randomly generated optimizer and its parameters
print("Optimizer:", optimizer_str)
print("Parameters:", params)

Optimizer: Adam
Parameters: {'lr': 0.001, 'betas': (0.8, 0.9)}


In [31]:
optimizer = build_optimizer(optimizer_str, params, model4)

In [32]:
train_model(model4)

Epoch 1/10
Training Loss: 0.2643  Training Accuracy: 0.9220
Validation Loss: 0.1431  Validation Accuracy: 0.9604
------------------------
Epoch 2/10
Training Loss: 0.1269  Training Accuracy: 0.9633
Validation Loss: 0.1164  Validation Accuracy: 0.9673
------------------------
Epoch 3/10
Training Loss: 0.0970  Training Accuracy: 0.9719
Validation Loss: 0.1077  Validation Accuracy: 0.9713
------------------------
Epoch 4/10
Training Loss: 0.0822  Training Accuracy: 0.9766
Validation Loss: 0.0997  Validation Accuracy: 0.9716
------------------------
Epoch 5/10
Training Loss: 0.0711  Training Accuracy: 0.9798
Validation Loss: 0.0944  Validation Accuracy: 0.9731
------------------------
Epoch 6/10
Training Loss: 0.0619  Training Accuracy: 0.9825
Validation Loss: 0.1020  Validation Accuracy: 0.9725
------------------------
Epoch 7/10
Training Loss: 0.0576  Training Accuracy: 0.9840
Validation Loss: 0.0916  Validation Accuracy: 0.9765
------------------------
Epoch 8/10
Training Loss: 0.0497  

## Network 5

In [33]:
layer_list = generate_random_network(input_size, output_size, min_layers, max_layers)

# Print the randomly generated network
print("Layer List:", layer_list)

Layer List: ['Linear|784,512,True', 'Dropout|0.5', 'AlphaDropout|0.5', 'Act|GELU', 'LayerNorm|512,1e-05', 'Dropout|0.1', 'Act|CELU', 'Act|SELU', 'AlphaDropout|0.1', 'LayerNorm|512,0.001', 'AlphaDropout|0.5', 'LayerNorm|512,0.0001', 'LayerNorm|512,1e-05', 'AlphaDropout|0.7', 'BatchNorm1d|512,1e-05,0.1', 'LayerNorm|512,0.0001', 'LayerNorm|512,0.0001', 'Linear|512,10,True']


In [34]:
model5 = Net(layer_list).to(device)

# Define the loss function, we will use the standard Binary Crossentropy
loss_fn = nn.CrossEntropyLoss()

In [35]:
model5

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Dropout(p=0.5, inplace=False)
    (2): AlphaDropout(p=0.5, inplace=False)
    (3): GELU(approximate='none')
    (4): LayerNorm((16, 512), eps=1e-05, elementwise_affine=True)
    (5): Dropout(p=0.1, inplace=False)
    (6): CELU(alpha=1.0)
    (7): SELU()
    (8): AlphaDropout(p=0.1, inplace=False)
    (9): LayerNorm((16, 512), eps=0.001, elementwise_affine=True)
    (10): AlphaDropout(p=0.5, inplace=False)
    (11): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (12): LayerNorm((16, 512), eps=1e-05, elementwise_affine=True)
    (13): AlphaDropout(p=0.7, inplace=False)
    (14): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (15-16): 2 x LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (17): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)

In [36]:
# Randomly select an optimizer
optimizer_str = random.choice(list(optimizer_params.keys()))

# Randomly select parameters for the chosen optimizer
params = {}
for param, values in optimizer_params[optimizer_str].items():
    params[param] = random.choice(values)

# Print the randomly generated optimizer and its parameters
print("Optimizer:", optimizer_str)
print("Parameters:", params)

Optimizer: AdamW
Parameters: {'lr': 0.001, 'betas': (0.85, 0.95), 'weight_decay': 0.0}


In [37]:
optimizer = build_optimizer(optimizer_str, params, model5)

In [38]:
train_model(model5)

Epoch 1/10
Training Loss: 2.3583  Training Accuracy: 0.1324
Validation Loss: 0.6179  Validation Accuracy: 0.8220
------------------------
Epoch 2/10
Training Loss: 2.3542  Training Accuracy: 0.1301
Validation Loss: 0.8411  Validation Accuracy: 0.7654
------------------------
Epoch 3/10
Training Loss: 2.3572  Training Accuracy: 0.1264
Validation Loss: 1.1082  Validation Accuracy: 0.7089
------------------------
Epoch 4/10
Training Loss: 2.3528  Training Accuracy: 0.1235
Validation Loss: 1.3362  Validation Accuracy: 0.6020
------------------------
Epoch 5/10
Training Loss: 2.3480  Training Accuracy: 0.1264
Validation Loss: 1.4301  Validation Accuracy: 0.4936
------------------------
Epoch 6/10
Training Loss: 2.3391  Training Accuracy: 0.1259
Validation Loss: 1.5244  Validation Accuracy: 0.4723
------------------------
Epoch 7/10
Training Loss: 2.3327  Training Accuracy: 0.1283
Validation Loss: 1.4946  Validation Accuracy: 0.4743
------------------------
Epoch 8/10
Training Loss: 2.3174  

## Crossover between networks 1 and 2

In [62]:
def crossover_networks(network1, network2):
    # Copy the networks to avoid modifying the original networks
    network1_copy = copy.deepcopy(network1)
    network2_copy = copy.deepcopy(network2)

    # Determine the cutoff points based on the smaller network's size
    min_num_layers = min(len(network1_copy.layers), len(network2_copy.layers))

    # Generate two unique cutoff points within the valid range
    valid_range = range(1, min_num_layers - 1)
    cutoff_points = random.sample(valid_range, 2)
    cutoff_points.sort()
    print("The cutoff points for the crossover are", cutoff_points)

    # Swap the blocks of layers between the networks
    for i in range(cutoff_points[0], cutoff_points[1] + 1):
        network1_copy.layers[i], network2_copy.layers[i] = network2_copy.layers[i], network1_copy.layers[i]

        # Adjust the feature values in the swapped layers
        if isinstance(network1_copy.layers[i], nn.Linear):
            network1_copy.layers[i].in_features, network2_copy.layers[i].in_features = network2_copy.layers[i].in_features, network1_copy.layers[i].in_features
            network1_copy.layers[i].out_features, network2_copy.layers[i].out_features = network2_copy.layers[i].out_features, network1_copy.layers[i].out_features

        elif isinstance(network1_copy.layers[i], (nn.BatchNorm1d, nn.LayerNorm)):
            # Find the index of the last linear layer before the swap position in network1_copy
            last_linear_index_1 = -1
            for j in range(i - 1, -1, -1):
                if isinstance(network1_copy.layers[j], nn.Linear):
                    last_linear_index_1 = j
                    break

            # Find the index of the last linear layer before the swap position in network2_copy
            last_linear_index_2 = -1
            for j in range(i - 1, -1, -1):
                if isinstance(network2_copy.layers[j], nn.Linear):
                    last_linear_index_2 = j
                    break

            if last_linear_index_1 != -1 and last_linear_index_2 != -1:
                # Adjust the feature values in the batch normalization or layer normalization layers
                last_linear_output_1 = network1_copy.layers[last_linear_index_1].out_features
                last_linear_output_2 = network2_copy.layers[last_linear_index_2].out_features

                network1_copy.layers[i].num_features = last_linear_output_1
                network2_copy.layers[i].num_features = last_linear_output_2

                network1_copy.layers[i].normalized_shape = (16, last_linear_output_1)
                network2_copy.layers[i].normalized_shape = (16, last_linear_output_2)


    return network1_copy, network2_copy


In [65]:
network1_copy, network2_copy = crossover_networks(model1, model2)

The cutoff points for the crossover are [2, 15]


As we can see below, even after the crossover the number of input features is coherent with the number of output features of the previous layer.

In [66]:
print(network1_copy)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): AlphaDropout(p=0.7, inplace=False)
    (2): AlphaDropout(p=0.1, inplace=False)
    (3): BatchNorm1d(512, eps=1e-05, momentum=0.99, affine=True, track_running_stats=True)
    (4): AlphaDropout(p=0.1, inplace=False)
    (5): ReLU()
    (6): CELU(alpha=1.0)
    (7): BatchNorm1d(512, eps=0.001, momentum=0.9, affine=True, track_running_stats=True)
    (8): BatchNorm1d(512, eps=0.0001, momentum=0.9, affine=True, track_running_stats=True)
    (9): Dropout(p=0.5, inplace=False)
    (10): AlphaDropout(p=0.7, inplace=False)
    (11): Dropout(p=0.7, inplace=False)
    (12-13): 2 x LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (14): BatchNorm1d(512, eps=0.0001, momentum=0.9, affine=True, track_running_stats=True)
    (15): BatchNorm1d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (16): AlphaDropout(p=0.7, inplace=False)
    (17): GELU(approximate='none')
  

In [67]:
print(network2_copy)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=512, out_features=64, bias=True)
    (3): LayerNorm((16, 64), eps=0.001, elementwise_affine=True)
    (4): Dropout(p=0.1, inplace=False)
    (5): Dropout(p=0.7, inplace=False)
    (6): AlphaDropout(p=0.1, inplace=False)
    (7): BatchNorm1d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (8): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): Dropout(p=0.7, inplace=False)
    (10): AlphaDropout(p=0.7, inplace=False)
    (11): BatchNorm1d(64, eps=0.001, momentum=0.99, affine=True, track_running_stats=True)
    (12): Linear(in_features=64, out_features=512, bias=True)
    (13): AlphaDropout(p=0.7, inplace=False)
    (14): Dropout(p=0.1, inplace=False)
    (15): AlphaDropout(p=0.1, inplace=False)
    (16): Linear(in_features=512, out_features=512, bias=True)
    (17): LayerNorm((16, 512)

In [68]:
train_model(network1_copy)

Epoch 1/10
Training Loss: 2.3036  Training Accuracy: 0.1091
Validation Loss: 2.2972  Validation Accuracy: 0.1526
------------------------
Epoch 2/10
Training Loss: 2.3045  Training Accuracy: 0.1077
Validation Loss: 2.3011  Validation Accuracy: 0.1181
------------------------
Epoch 3/10
Training Loss: 2.3037  Training Accuracy: 0.1081
Validation Loss: 2.2944  Validation Accuracy: 0.1175
------------------------
Epoch 4/10
Training Loss: 2.3040  Training Accuracy: 0.1085
Validation Loss: 2.2953  Validation Accuracy: 0.1569
------------------------
Epoch 5/10
Training Loss: 2.3033  Training Accuracy: 0.1102
Validation Loss: 2.3101  Validation Accuracy: 0.1162
------------------------
Epoch 6/10
Training Loss: 2.3041  Training Accuracy: 0.1087
Validation Loss: 2.3056  Validation Accuracy: 0.1201
------------------------
Epoch 7/10
Training Loss: 2.3040  Training Accuracy: 0.1096
Validation Loss: 2.3036  Validation Accuracy: 0.1030
------------------------
Epoch 8/10
Training Loss: 2.3030  

In [69]:
train_model(network2_copy)

Epoch 1/10
Training Loss: 3.2231  Training Accuracy: 0.1002
Validation Loss: 2.6348  Validation Accuracy: 0.1033
------------------------
Epoch 2/10
Training Loss: 3.2145  Training Accuracy: 0.1023
Validation Loss: 2.6352  Validation Accuracy: 0.1033
------------------------
Epoch 3/10
Training Loss: 3.2208  Training Accuracy: 0.1019
Validation Loss: 2.6348  Validation Accuracy: 0.1033
------------------------
Epoch 4/10
Training Loss: 3.2312  Training Accuracy: 0.1013
Validation Loss: 2.6348  Validation Accuracy: 0.1033
------------------------
Epoch 5/10
Training Loss: 3.2269  Training Accuracy: 0.0993
Validation Loss: 2.6355  Validation Accuracy: 0.1033
------------------------
Epoch 6/10
Training Loss: 3.2416  Training Accuracy: 0.0985
Validation Loss: 2.6370  Validation Accuracy: 0.1033
------------------------
Epoch 7/10
Training Loss: 3.2339  Training Accuracy: 0.0988
Validation Loss: 2.6371  Validation Accuracy: 0.1033
------------------------
Epoch 8/10
Training Loss: 3.2182  

## Network 3 with add layer mutation

In [70]:
def add_layer_mutation(network):
    # Clone the original network
    new_network = copy.deepcopy(network)

    # Generate 1 layer at random (the first and third layers will be the linear default ones)
    random_network = generate_random_network(input_size=0, output_size=0, min_layers=1, max_layers=3)

    # Get the right layer string from the randomly generated network
    new_layer_str = random_network[1]  # Assuming there is only one layer in the random network

    # Parse the layer string and convert it into a layer object
    new_layer = new_network.parse_layer_string(new_layer_str)

    # Randomly choose a position for the new layer
    insert_index = random.randint(1, len(new_network.layers) - 1)  # Exclude the first and last layers
    print("The new random layer will be inserted at position ", insert_index)

    # Check if the new layer is Linear, BatchNorm1d, or LayerNorm
    if isinstance(new_layer, (nn.Linear, nn.BatchNorm1d, nn.LayerNorm)):
        # Find the index of the last linear layer before the insert position
        last_linear_index = -1
        for i in range(insert_index - 1, -1, -1):
            if isinstance(new_network.layers[i], nn.Linear):
                last_linear_index = i
                break
        # Get the output size of the last linear layer
        last_linear_output = new_network.layers[last_linear_index].out_features

        # Adjust the number of features in the new layer
        if isinstance(new_layer, nn.Linear):
            new_layer.in_features = last_linear_output
            new_layer.out_features = new_layer.in_features #it will output the same number of features to not disrupt the sequence
        elif isinstance(new_layer, nn.BatchNorm1d):
            new_layer.num_features = last_linear_output
        elif isinstance(new_layer, nn.LayerNorm):
            new_layer.normalized_shape = (16, last_linear_output)

    # Insert the new layer into the genotype at the chosen position
    new_network.layers.insert(insert_index, new_layer)

    return new_network

In [74]:
new_network = add_layer_mutation(model3)

The new random layer will be inserted at position  30


In [75]:
print(new_network)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): AlphaDropout(p=0.1, inplace=False)
    (2): BatchNorm1d(512, eps=0.001, momentum=0.9, affine=True, track_running_stats=True)
    (3): BatchNorm1d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (4): Sigmoid()
    (5): BatchNorm1d(512, eps=0.001, momentum=0.99, affine=True, track_running_stats=True)
    (6): SiLU()
    (7): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (8): SELU()
    (9): AlphaDropout(p=0.7, inplace=False)
    (10): LayerNorm((16, 512), eps=0.001, elementwise_affine=True)
    (11): AlphaDropout(p=0.1, inplace=False)
    (12): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (13): CELU(alpha=1.0)
    (14): Linear(in_features=512, out_features=64, bias=True)
    (15): Dropout(p=0.7, inplace=False)
    (16): AlphaDropout(p=0.5, inplace=False)
    (17): AlphaDropout(p=0.7, inplace=False)
    (18): Sigmoid()
    (19): Alph

In [76]:
train_model(new_network)

Epoch 1/10
Training Loss: 2.3025  Training Accuracy: 0.1094
Validation Loss: 2.2953  Validation Accuracy: 0.1156
------------------------
Epoch 2/10
Training Loss: 2.3026  Training Accuracy: 0.1109
Validation Loss: 2.2950  Validation Accuracy: 0.1155
------------------------
Epoch 3/10
Training Loss: 2.3026  Training Accuracy: 0.1107
Validation Loss: 2.2958  Validation Accuracy: 0.1179
------------------------
Epoch 4/10
Training Loss: 2.3026  Training Accuracy: 0.1101
Validation Loss: 2.2961  Validation Accuracy: 0.1117
------------------------
Epoch 5/10
Training Loss: 2.3024  Training Accuracy: 0.1110
Validation Loss: 2.2937  Validation Accuracy: 0.1276
------------------------
Epoch 6/10
Training Loss: 2.3025  Training Accuracy: 0.1110
Validation Loss: 2.2956  Validation Accuracy: 0.1192
------------------------
Epoch 7/10
Training Loss: 2.3025  Training Accuracy: 0.1114
Validation Loss: 2.2981  Validation Accuracy: 0.1028
------------------------
Epoch 8/10
Training Loss: 2.3025  

## Network 4 with remove layer mutation

In [85]:
def remove_layer_mutation(network):
    # Clone the original network
    new_network = copy.deepcopy(network)

    # Check the number of layers in the network genotype
    num_layers = len(new_network.layers)

    # Randomly choose a layer to remove (excluding the first and last layers)
    remove_index = random.randint(1, num_layers - 2)
    print("The position of the layer that will be removed is ", remove_index)
    
    # Remove the layer from the genotype
    removed_layer = new_network.layers.pop(remove_index)

    # Check if the removed layer is a Linear layer
    if isinstance(removed_layer, nn.Linear):
        next_linear_index = -1
        for i in range(remove_index, num_layers):
            if isinstance(new_network.layers[i], nn.Linear):
                next_linear_index = i
                break

        if next_linear_index != -1:
            next_linear_layer = new_network.layers[next_linear_index]
            next_linear_layer.in_features = removed_layer.in_features

            for i in range(remove_index + 1, next_linear_index):
                if isinstance(new_network.layers[i], nn.BatchNorm1d):
                    new_network.layers[i].num_features = removed_layer.in_features

                elif isinstance(new_network.layers[i], nn.LayerNorm):
                    new_network.layers[i].normalized_shape = (16, removed_layer.in_features)

    return new_network

In [86]:
new_network = remove_layer_mutation(model4)

The position of the layer that will be removed is  3


In [88]:
print(new_network)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): SiLU()
    (3): LayerNorm((16, 512), eps=0.001, elementwise_affine=True)
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): LayerNorm((16, 512), eps=0.0001, elementwise_affine=True)
    (6): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)


In [89]:
train_model(new_network)

Epoch 1/10
Training Loss: 0.0707  Training Accuracy: 0.9776
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 2/10
Training Loss: 0.0720  Training Accuracy: 0.9779
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 3/10
Training Loss: 0.0702  Training Accuracy: 0.9780
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 4/10
Training Loss: 0.0715  Training Accuracy: 0.9775
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 5/10
Training Loss: 0.0714  Training Accuracy: 0.9779
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 6/10
Training Loss: 0.0713  Training Accuracy: 0.9778
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 7/10
Training Loss: 0.0721  Training Accuracy: 0.9774
Validation Loss: 0.1593  Validation Accuracy: 0.9614
------------------------
Epoch 8/10
Training Loss: 0.0716  

## Network 5 with change optimizer mutation

In [115]:
def change_optimizer_mutation(model, optimizer_str):
    # Randomly select a new optimizer
    new_optimizer_str = random.choice(list(optimizer_params.keys()))

    if new_optimizer_str == optimizer_str:
        # The optimizer did not change, so we will select a random parameter to change its value
        params = optimizer_params[optimizer_str].copy()  # Make a copy of the parameters

        # Randomly select a parameter to change
        param_to_change = random.choice(list(params.keys()))

        # Randomly select a new value for the parameter
        new_value = random.choice(optimizer_params[optimizer_str][param_to_change])

        # Update the parameter with the new value
        params[param_to_change] = new_value
    else:
        # The optimizer changed, so we need to select new parameters for the new optimizer
        new_optimizer_params = optimizer_params[new_optimizer_str]
        params = {}
        for param, values in new_optimizer_params.items():
            params[param] = random.choice(values)

    # Print the randomly generated optimizer and its parameters
    print("Optimizer:", new_optimizer_str)
    print("Parameters:", params)

    # Rebuild the optimizer with the new optimizer string and parameters
    optimizer = build_optimizer(new_optimizer_str, params, model)
    return optimizer


In [117]:
optimizer = change_optimizer_mutation(model5, optimizer_str)

Optimizer: Adam
Parameters: {'lr': 0.001, 'betas': (0.85, 0.95)}


In [118]:
train_model(model5)

Epoch 1/10
Training Loss: 2.2337  Training Accuracy: 0.1708
Validation Loss: 1.3555  Validation Accuracy: 0.5721
------------------------
Epoch 2/10
Training Loss: 2.2196  Training Accuracy: 0.1762
Validation Loss: 1.4887  Validation Accuracy: 0.5179
------------------------
Epoch 3/10
Training Loss: 2.2138  Training Accuracy: 0.1798
Validation Loss: 1.4069  Validation Accuracy: 0.5359
------------------------
Epoch 4/10
Training Loss: 2.2142  Training Accuracy: 0.1770
Validation Loss: 1.3402  Validation Accuracy: 0.6093
------------------------
Epoch 5/10
Training Loss: 2.2225  Training Accuracy: 0.1732
Validation Loss: 1.2619  Validation Accuracy: 0.5819
------------------------
Epoch 6/10
Training Loss: 2.2271  Training Accuracy: 0.1723
Validation Loss: 1.4103  Validation Accuracy: 0.5307
------------------------
Epoch 7/10
Training Loss: 2.2364  Training Accuracy: 0.1659
Validation Loss: 1.4790  Validation Accuracy: 0.4943
------------------------
Epoch 8/10
Training Loss: 2.2501  

The end. 🔚