# Binary classification

Another type of problem that is often solved with neural networks is binary classification. Here there is no value to predict. Instead, based on the inputs, we have to assign each sample to one of two classes.

We import the breast cancer dataset

In [1]:
from sklearn import datasets

cancer = datasets.load_breast_cancer()

Let's see what this dataset contains

In [2]:
cancer.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])

In [3]:
print(cancer['target_names'])

['malignant' 'benign']


The `DESCR` key holds a description of the dataset

In [4]:
print(cancer['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

It also has the `data` and `target` keys, which hold the data described above. The `feature_names` key contains the name of each feature.

So we create a dataframe with the data

In [5]:
import pandas as pd

cancer_df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
cancer_df['type'] = cancer['target']
cancer_df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,type
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0
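As an alternative to assembling the dataframe by hand, recent scikit-learn versions can return it directly through the `as_frame=True` argument of `load_breast_cancer`. The sketch below assumes such a version is installed; the label column comes out as `target`, so it is renamed to `type` to match this notebook. The name `cancer_df_alt` is only illustrative.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns the features and the label as pandas objects
cancer_bunch = load_breast_cancer(as_frame=True)

# .frame holds the 30 feature columns followed by a 'target' column;
# rename it to 'type', the label column name used in this notebook
cancer_df_alt = cancer_bunch.frame.rename(columns={"target": "type"})

print(cancer_df_alt.shape)  # 569 rows, 30 features + the label
```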


We look at the possible classes

In [6]:
print(cancer.target_names)

['malignant' 'benign']


We count how many samples there are of each class

In [7]:
cancer_df['type'].value_counts()

1    357
0    212
Name: type, dtype: int64
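The two classes are not balanced. A quick calculation with the counts printed above (pandas can also give these fractions directly with `value_counts(normalize=True)`) shows the share of each class and the accuracy a trivial model would reach by always answering the majority class, which is a useful reference when reading accuracy figures later on.

```python
# Class counts reported by value_counts() above
n_benign, n_malignant = 357, 212
n_total = n_benign + n_malignant

# Fraction of each class in the whole dataset
p_benign = n_benign / n_total
p_malignant = n_malignant / n_total
print(f"benign: {p_benign:.3f}, malignant: {p_malignant:.3f}")

# A trivial classifier that always answers the majority class (benign, class 1)
# would already be right this often on data with this distribution
baseline_accuracy = max(p_benign, p_malignant)
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```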

Finally, we check whether there is any missing data

In [8]:
cancer_df.isnull().sum()

mean radius                0
mean texture               0
mean perimeter             0
mean area                  0
mean smoothness            0
mean compactness           0
mean concavity             0
mean concave points        0
mean symmetry              0
mean fractal dimension     0
radius error               0
texture error              0
perimeter error            0
area error                 0
smoothness error           0
compactness error          0
concavity error            0
concave points error       0
symmetry error             0
fractal dimension error    0
worst radius               0
worst texture              0
worst perimeter            0
worst area                 0
worst smoothness           0
worst compactness          0
worst concavity            0
worst concave points       0
worst symmetry             0
worst fractal dimension    0
type                       0
dtype: int64

## Splitting the data into train and validation

We have seen that, in order to train, we need to split the data into a training set and a validation set. So we split our data into these two sets.

First let's see how much data we have

In [9]:
len(cancer_df)

569

Since we don't have much data, we split the dataset into 80% for training and 20% for validation. Note that `iloc` takes the rows in their original order, without shuffling them first.

In [10]:
cancer_train = cancer_df.iloc[0:int(0.8*len(cancer_df))]
cancer_val = cancer_df.iloc[int(0.8*len(cancer_df)):]

len(cancer_train), len(cancer_val), len(cancer_df), len(cancer_train) + len(cancer_val)

(455, 114, 569, 569)
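Because the slicing above keeps the rows in file order, the validation set is simply the last 20% of the file. A common alternative, sketched here under the assumption that scikit-learn is available, is `train_test_split` with shuffling and stratification, so both subsets keep the same benign/malignant ratio. The seed `random_state=0` and the names `df_all`, `train_df`, `val_df` are arbitrary choices for this example. The resulting dataframes keep their original (now shuffled) row labels, so the index reset done later in the notebook is still needed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Rebuild the same kind of dataframe as above (30 features + 'type' label)
df_all = load_breast_cancer(as_frame=True).frame.rename(columns={"target": "type"})

# Shuffle before splitting and keep the class proportions in both subsets
train_df, val_df = train_test_split(
    df_all,
    test_size=0.2,               # 20% for validation, as in the notebook
    shuffle=True,
    stratify=df_all["type"],     # same benign/malignant ratio in train and val
    random_state=0,              # arbitrary seed, only for reproducibility
)

print(len(train_df), len(val_df))
```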

Let's look at the new dataframes

In [11]:
cancer_train.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,type
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [12]:
cancer_val.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,type
455,13.38,30.72,86.34,557.2,0.09245,0.07426,0.02819,0.03264,0.1375,0.06016,...,41.61,96.69,705.6,0.1172,0.1421,0.07003,0.07763,0.2196,0.07675,1
456,11.63,29.29,74.87,415.1,0.09357,0.08574,0.0716,0.02017,0.1799,0.06166,...,38.81,86.04,527.8,0.1406,0.2031,0.2923,0.06835,0.2884,0.0722,1
457,13.21,25.25,84.1,537.9,0.08791,0.05205,0.02772,0.02068,0.1619,0.05584,...,34.23,91.29,632.9,0.1289,0.1063,0.139,0.06005,0.2444,0.06788,1
458,13.0,25.13,82.61,520.2,0.08369,0.05073,0.01206,0.01762,0.1667,0.05449,...,31.88,91.06,628.5,0.1218,0.1093,0.04462,0.05921,0.2306,0.06291,1
459,9.755,28.2,61.68,290.9,0.07984,0.04626,0.01541,0.01043,0.1621,0.05952,...,36.92,68.03,349.9,0.111,0.1109,0.0719,0.04866,0.2321,0.07211,1


After splitting, the indices of the validation dataframe do not start at 0. This can cause problems later on, so we reset the indices

In [13]:
cancer_val = cancer_val.reset_index()   # reset index
cancer_val = cancer_val.drop(columns=['index']) # drop old index

cancer_val.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,type
0,13.38,30.72,86.34,557.2,0.09245,0.07426,0.02819,0.03264,0.1375,0.06016,...,41.61,96.69,705.6,0.1172,0.1421,0.07003,0.07763,0.2196,0.07675,1
1,11.63,29.29,74.87,415.1,0.09357,0.08574,0.0716,0.02017,0.1799,0.06166,...,38.81,86.04,527.8,0.1406,0.2031,0.2923,0.06835,0.2884,0.0722,1
2,13.21,25.25,84.1,537.9,0.08791,0.05205,0.02772,0.02068,0.1619,0.05584,...,34.23,91.29,632.9,0.1289,0.1063,0.139,0.06005,0.2444,0.06788,1
3,13.0,25.13,82.61,520.2,0.08369,0.05073,0.01206,0.01762,0.1667,0.05449,...,31.88,91.06,628.5,0.1218,0.1093,0.04462,0.05921,0.2306,0.06291,1
4,9.755,28.2,61.68,290.9,0.07984,0.04626,0.01541,0.01043,0.1621,0.05952,...,36.92,68.03,349.9,0.111,0.1109,0.0719,0.04866,0.2321,0.07211,1
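The two steps of the previous cell (`reset_index()` followed by dropping the `index` column) can also be done in one call with pandas' `reset_index(drop=True)`, which discards the old index instead of inserting it as a column. A small sketch on a toy dataframe whose index starts at 455, like `cancer_val` after slicing:

```python
import pandas as pd

# Toy frame whose index does not start at 0, like cancer_val after slicing
df = pd.DataFrame({"mean radius": [13.38, 11.63, 13.21], "type": [1, 1, 1]},
                  index=[455, 456, 457])

# Two steps, as in the cell above: the old index becomes an 'index' column, then it is dropped
two_steps = df.reset_index().drop(columns=["index"])

# One step: drop=True discards the old index instead of keeping it as a column
one_step = df.reset_index(drop=True)

print(one_step.index.tolist())  # [0, 1, 2]
```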


## Dataset and Dataloader

We create the dataset

In [26]:
import torch

class CancerDataset(torch.utils.data.Dataset):
    def __init__(self, dataframe):
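        # Features are every column except the label column 'type'.
        # (An earlier version filtered on 'target', a column that does not exist here,
        # so the label leaked in as a 31st feature; the outputs shown below, with
        # 31 values per sample, were produced with that version.)
        cols = [col for col in dataframe.columns if col != 'type']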
        self.parameters = dataframe[cols].values.astype('float32')
        self.targets = torch.tensor(dataframe['type'].values.astype('int'))
        self.targets = self.targets.reshape((len(self.targets), 1))
        self.targets = self.targets.float()

    def __len__(self):
        return len(self.parameters)

    def __getitem__(self, idx):
        parameters = self.parameters[idx]
        target = self.targets[idx]
        return parameters, target

In [27]:
train_ds = CancerDataset(cancer_train)
valid_ds = CancerDataset(cancer_val)
len(train_ds), len(valid_ds)

(455, 114)

We see that they have the same size as the training and validation dataframes
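For comparison, a roughly equivalent dataset can be built with PyTorch's `TensorDataset`, which simply indexes a set of tensors in parallel. This is a sketch, not the notebook's method: the helper `dataframe_to_tensor_dataset` is a hypothetical name, and unlike `CancerDataset` it returns the features as tensors instead of NumPy arrays (the `DataLoader` converts both to tensors anyway). The key point is that the label column `type` is excluded from the features.

```python
import pandas as pd
import torch
from torch.utils.data import TensorDataset

def dataframe_to_tensor_dataset(df, label_col="type"):
    """Hypothetical helper: features = every column except the label column."""
    feature_cols = [c for c in df.columns if c != label_col]
    X = torch.tensor(df[feature_cols].values, dtype=torch.float32)
    # Labels as an (N, 1) float tensor, the shape and dtype BCELoss expects
    y = torch.tensor(df[label_col].values, dtype=torch.float32).reshape(-1, 1)
    return TensorDataset(X, y)

# Toy frame with 2 features and the label column
toy = pd.DataFrame({"mean radius": [17.99, 13.38],
                    "mean texture": [10.38, 30.72],
                    "type": [0, 1]})
ds = dataframe_to_tensor_dataset(toy)
x0, y0 = ds[0]
print(len(ds), x0.shape, y0.shape)  # 2 torch.Size([2]) torch.Size([1])
```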

Let's look at a sample

In [28]:
cancer_train.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,type
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [29]:
sample = train_ds[0]
print(f"len(sample): {len(sample)}")

parameters, target = sample
print(f"parameters: {parameters}\nparameters.dtype: {parameters.dtype}\nparameters.shape: {parameters.shape}\n\n")
print(f"target: {target}, target.dtype: {target.dtype}, target.shape: {target.shape}")

len(sample): 2
parameters: [1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01 0.000e+00]
parameters.dtype: float32
parameters.shape: (31,)


target: tensor([0.]), target.dtype: torch.float32, target.shape: torch.Size([1])


Now we create the dataloader

In [32]:
from torch.utils.data import DataLoader

BS_train = 64
BS_val = 1024

train_dl = DataLoader(train_ds, batch_size=BS_train, shuffle=True)
val_dl = DataLoader(valid_ds, batch_size=BS_val, shuffle=False)
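With 455 training samples and a batch size of 64, the training dataloader yields ceil(455 / 64) = 8 batches, the last one with only 7 samples, while the 114 validation samples fit in a single batch of size 1024. The sketch below checks this with dummy tensors of the same sizes (the zero values are placeholders, only the shapes matter).

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

n_train, bs = 455, 64   # sizes used in this notebook

# Dummy dataset with the same number of samples (the values are irrelevant here)
dummy_ds = TensorDataset(torch.zeros(n_train, 31), torch.zeros(n_train, 1))
dl = DataLoader(dummy_ds, batch_size=bs, shuffle=True)

batch_sizes = [x.shape[0] for x, _ in dl]
print(len(dl), batch_sizes)  # 8 [64, 64, 64, 64, 64, 64, 64, 7]

# Validation: 114 samples with batch_size=1024 give a single batch
val_check = DataLoader(TensorDataset(torch.zeros(114, 31), torch.zeros(114, 1)), batch_size=1024)
print(len(val_check))  # 1
```

This also explains the training logs further down: with 8 batches and `num_prints = 2`, `train_loop` prints at batches 0 and 4, that is, after 0 and 4 × 64 = 256 samples.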

Let's look at a batch

In [33]:
batch = next(iter(train_dl))
parameters, target = batch[0], batch[1]
type(parameters), parameters.dtype, parameters.shape, type(target), target.shape

(torch.Tensor,
 torch.float32,
 torch.Size([64, 31]),
 torch.Tensor,
 torch.Size([64, 1]))

## Neural network

We create a neural network to train

In [34]:
from torch import nn

class CancerNeuralNetwork(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
        super().__init__()
        self.network = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, hidden_layers[0]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[2], num_outputs),
            torch.nn.Sigmoid()
        )

    def forward(self, x):
        probs = self.network(x)
        return probs

Let's see what size we need at the input and output of the network

The parameters of a batch have this size

In [35]:
parameters.shape

torch.Size([64, 31])

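We have a 64x31 matrix: 64 is the batch size and 31 is the number of input values per sample, so the input layer needs 31 neurons. Note that the dataset itself only has 30 features; in the outputs shown, the 31st value is the `type` label (the last value of the sample printed above, `0.000e+00`, matches its target), because the label column was not excluded from the features.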

Another way to see it: the inputs are multiplied by the first layer of the network as a matrix product, so if the input matrix is 64x31, the matrix representing the neurons of the first layer must be 31xM. In a matrix product the two matrices must have sizes AxB and BxC, that is, the inner dimension must be the same in both.
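The shape rule can be checked directly with PyTorch. A detail worth knowing: `nn.Linear` stores its weight as (out_features, in_features) and computes `x @ W.T + b`, so the 31xM matrix of the explanation is the transpose of the stored weight. In this sketch the batch is random and the layer sizes (31 to 20) mirror the first layer of the model; a layer expecting a different number of inputs fails with a `RuntimeError`.

```python
import torch
from torch import nn

batch = torch.randn(64, 31)   # a batch: 64 samples x 31 input values
layer = nn.Linear(31, 20)     # first layer of the network: 31 -> 20

# nn.Linear stores its weight as (out_features, in_features) and computes x @ W.T + b
print(layer.weight.shape)     # torch.Size([20, 31])

manual = batch @ layer.weight.T + layer.bias   # (64x31) @ (31x20) -> (64x20)
out = layer(batch)
print(out.shape)              # torch.Size([64, 20])

# A layer expecting 30 inputs cannot be applied to a 64x31 batch
mismatch_raised = False
try:
    nn.Linear(30, 20)(batch)
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```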

On the other hand, at the output the same batch has a target of this size

In [36]:
target.shape

torch.Size([64, 1])

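64 is the batch size, and since this is a binary classification problem a single output value per sample is enough: the probability that the sample belongs to class 1 (`benign`). So the output layer needs 1 neuron.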
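To turn that single probability into a class, a usual choice (assumed here, not set anywhere in the notebook) is a threshold of 0.5. A minimal sketch with made-up probabilities in the same (batch, 1) shape as the network's output:

```python
import torch

# Hypothetical outputs of the final Sigmoid for 4 samples, shape (batch, 1)
probs = torch.tensor([[0.91], [0.12], [0.50], [0.67]])

# Class 1 (benign) if the probability exceeds 0.5, otherwise class 0 (malignant)
preds = (probs > 0.5).float()
print(preds.squeeze(1).tolist())  # [1.0, 0.0, 0.0, 1.0]
```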

In [37]:
num_inputs = parameters.shape[1]
num_outputs = target.shape[1]
model = CancerNeuralNetwork(num_inputs, num_outputs)

model

CancerNeuralNetwork(
  (network): Sequential(
    (0): Linear(in_features=31, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=1, bias=True)
    (7): Sigmoid()
  )
)

First we take a batch from the dataloader and feed it to the network to check that it runs and that we have defined it correctly

In [38]:
probs = model(parameters)
probs.shape

torch.Size([64, 1])

If possible, we move the network to the GPU

In [39]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

model.to(device)

Using cuda device


CancerNeuralNetwork(
  (network): Sequential(
    (0): Linear(in_features=31, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=1, bias=True)
    (7): Sigmoid()
  )
)

Now we feed it a batch again

In [40]:
parameters_gpu = parameters.to(device)
probs = model(parameters_gpu)
probs.shape

torch.Size([64, 1])

## Loss function and optimizer

We define a loss function and an optimizer

For this type of problem we should not use MSE as the loss function, because the targets are 1s and 0s. MSE measures the distance between what the network predicts and the ground truth, but in this problem, once the prediction is turned into a class, that distance is always 1 when the prediction is wrong and 0 when it is right, so it says little about how wrong the network is. That is why BCE (Binary Cross Entropy) is used; we will explain it in more detail later.
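As a preview, BCE for a predicted probability p and a label y is -(y·log(p) + (1 - y)·log(1 - p)), averaged over the batch. The sketch below computes it by hand for a few made-up predictions and compares it with `nn.BCELoss`; it also shows that a confidently wrong prediction is penalized far more than a mildly wrong one.

```python
import torch
from torch import nn

# Hypothetical predicted probabilities (after the Sigmoid) and true labels
p = torch.tensor([[0.9], [0.2], [0.6]])
y = torch.tensor([[1.0], [0.0], [0.0]])

# BCE = -mean( y*log(p) + (1-y)*log(1-p) )
manual_bce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
torch_bce = nn.BCELoss()(p, y)
print(manual_bce.item(), torch_bce.item())  # both about 0.415

# A confident wrong prediction costs much more than a mildly wrong one
confident_wrong = nn.BCELoss()(torch.tensor([0.99]), torch.tensor([0.0])).item()  # about 4.6
mildly_wrong = nn.BCELoss()(torch.tensor([0.6]), torch.tensor([0.0])).item()      # about 0.92
print(confident_wrong, mildly_wrong)
```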

In [42]:
LR = 1e-3

loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)


## Training loop

We define the training and validation loops

In [43]:
num_prints = 2

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # X and y to device
        X, y = X.to(device), y.to(device)

        # Compute prediction and loss
        probs = model(X)
        loss = loss_fn(probs, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % int(len(dataloader)/num_prints) == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def val_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            # X and y to device
            X, y = X.to(device), y.to(device)
            
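            outputs = model(X)
            test_loss += loss_fn(outputs, y).item()
            # Models trained with BCEWithLogitsLoss return logits, so convert them to probabilities
            probs = torch.sigmoid(outputs) if isinstance(loss_fn, nn.BCEWithLogitsLoss) else outputs
            # Binary accuracy: predict class 1 when the probability exceeds 0.5.
            # (An earlier version used probs.argmax(1) == y.argmax(1), which is always True
            # for single-column tensors; the 100.0% accuracies in the logs below come from it.)
            correct += ((probs > 0.5).float() == y).sum().item()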

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
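Why comparing `argmax(1)` values does not work here: the outputs and labels have a single column, and the argmax over a dimension of size 1 is always 0, so every sample counts as correct. That is why the accuracy in the logs below reads 100.0% in every epoch. A minimal demonstration with made-up values where every prediction is wrong:

```python
import torch

# Single-column outputs and labels, as in this notebook: shape (batch, 1)
probs = torch.tensor([[0.9], [0.8], [0.3]])
y = torch.tensor([[0.0], [0.0], [1.0]])   # every prediction is actually wrong

# argmax over a dimension of size 1 is always 0, so this "accuracy" is always 100%
argmax_acc = (probs.argmax(1) == y.argmax(1)).float().mean().item()

# Binary accuracy: threshold the probability at 0.5 and compare with the label
threshold_acc = ((probs > 0.5).float() == y).float().mean().item()

print(argmax_acc, threshold_acc)  # 1.0 0.0
```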

Now we train

In [44]:
epochs = 14
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dl, model, loss_fn, optimizer)
    val_loop(val_dl, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 0.765211  [    0/  455]
loss: 0.835728  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.920110 

Epoch 2
-------------------------------
loss: 0.774712  [    0/  455]
loss: 0.744132  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.918551 

Epoch 3
-------------------------------
loss: 0.854282  [    0/  455]
loss: 0.753566  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.917090 

Epoch 4
-------------------------------
loss: 0.823215  [    0/  455]
loss: 0.752992  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.915632 

Epoch 5
-------------------------------
loss: 0.802391  [    0/  455]
loss: 0.762360  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.914267 

Epoch 6
-------------------------------
loss: 0.781848  [    0/  455]
loss: 0.801161  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.912610 

Epoch 7
-------------------------------
loss: 0.820286  [    0/  455]
loss: 0.7903

# Loss function

Although we will look at loss functions in more detail later, here we have used `BCELoss`. This is because we put a `Sigmoid` in the last layer of the neural network

```Python
class WineNeuralNetwork(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
        super().__init__()
        self.network = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, hidden_layers[0]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[2], num_outputs),
            torch.nn.Sigmoid()
        )

    def forward(self, x):
        probs = self.network(x)
        return probs
```

However, the usual practice is to leave out that `Sigmoid` in the last layer and use `BCEWithLogitsLoss` as the loss function, since it already includes the `Sigmoid`

```Python
class WineNeuralNetwork(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
        super().__init__()
        self.network = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, hidden_layers[0]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[2], num_outputs),
            # torch.nn.Sigmoid()  # removed: BCEWithLogitsLoss applies the sigmoid itself
        )

    def forward(self, x):
        # Without the final Sigmoid the network returns logits, not probabilities
        logits = self.network(x)
        return logits
```

```Python
loss_fn = nn.BCEWithLogitsLoss()
```

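We can see in the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) that this is the case. It is done because it is numerically more stable: as the documentation explains, combining the `Sigmoid` and the BCE into a single layer lets PyTorch use the log-sum-exp trick.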
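The difference can be seen numerically. In the sketch below (the logits and labels are made-up values), both formulations agree for moderate logits. For a very confident wrong prediction, `sigmoid(200)` rounds to exactly 1.0 in float32, so `log(1 - p)` would be minus infinity; `BCELoss` clamps its log outputs at -100 and returns 100, whereas `BCEWithLogitsLoss`, working on the logit directly, returns the actual loss of about 200.

```python
import torch
from torch import nn

# Hypothetical logits (raw outputs without the final Sigmoid) and labels
logits = torch.tensor([[2.0], [-1.0], [0.5]])
y = torch.tensor([[1.0], [0.0], [0.0]])

# For moderate logits both formulations give the same value
with_logits = nn.BCEWithLogitsLoss()(logits, y)
sigmoid_then_bce = nn.BCELoss()(torch.sigmoid(logits), y)
print(with_logits.item(), sigmoid_then_bce.item())

# A very confident wrong prediction: logit 200 for a sample of class 0
extreme_logit, extreme_y = torch.tensor([200.0]), torch.tensor([0.0])

# sigmoid(200) == 1.0 in float32, log(1 - 1) = -inf, which BCELoss clamps to -100
clamped = nn.BCELoss()(torch.sigmoid(extreme_logit), extreme_y).item()
# BCEWithLogitsLoss computes the loss from the logit directly
stable = nn.BCEWithLogitsLoss()(extreme_logit, extreme_y).item()
print(clamped, stable)  # 100.0 200.0
```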

In [45]:
class WineNeuralNetwork2(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
        super().__init__()
        self.network = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, hidden_layers[0]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[2], num_outputs),
            #torch.nn.Sigmoid()
        )

    def forward(self, x):
        # No final Sigmoid: the network returns logits, which BCEWithLogitsLoss expects
        logits = self.network(x)
        return logits

num_inputs = parameters.shape[1]
num_outputs = target.shape[1]
model2 = WineNeuralNetwork2(num_inputs, num_outputs)

model2

WineNeuralNetwork2(
  (network): Sequential(
    (0): Linear(in_features=31, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=1, bias=True)
  )
)

In [46]:
model2.to(device)

WineNeuralNetwork2(
  (network): Sequential(
    (0): Linear(in_features=31, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=1, bias=True)
  )
)

In [47]:
LR = 1e-3

loss_fn2 = nn.BCEWithLogitsLoss()
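# Optimize model2's parameters. (The original cell passed model.parameters(), so model2
# was never updated; that is why the validation loss in the logs below stays at 0.767435.)
optimizer = torch.optim.SGD(model2.parameters(), lr=LR)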
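Passing the parameters of the wrong model to the optimizer produces no error; training simply never updates the model being evaluated. A cheap sanity check is to verify that every parameter of the model appears in the optimizer's `param_groups`. The helper `optimizer_covers` below is a hypothetical name, shown on two small stand-in `nn.Linear` models.

```python
import torch
from torch import nn

def optimizer_covers(model, optimizer):
    """Hypothetical helper: True if every parameter of `model` is managed by `optimizer`."""
    opt_params = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in opt_params for p in model.parameters())

model_a = nn.Linear(31, 1)
model_b = nn.Linear(31, 1)

opt_wrong = torch.optim.SGD(model_a.parameters(), lr=1e-3)   # built from the other model
opt_right = torch.optim.SGD(model_b.parameters(), lr=1e-3)

print(optimizer_covers(model_b, opt_wrong), optimizer_covers(model_b, opt_right))  # False True
```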

In [48]:
epochs = 14
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dl, model2, loss_fn2, optimizer)
    val_loop(val_dl, model2, loss_fn2)
print("Done!")

Epoch 1
-------------------------------
loss: 0.719862  [    0/  455]
loss: 0.727536  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.767435 

Epoch 2
-------------------------------
loss: 0.731344  [    0/  455]
loss: 0.712192  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.767435 

Epoch 3
-------------------------------
loss: 0.746708  [    0/  455]
loss: 0.754381  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.767435 

Epoch 4
-------------------------------
loss: 0.693018  [    0/  455]
loss: 0.719857  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.767435 

Epoch 5
-------------------------------
loss: 0.723699  [    0/  455]
loss: 0.727535  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.767435 

Epoch 6
-------------------------------
loss: 0.723701  [    0/  455]
loss: 0.735205  [  256/  455]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.767435 

Epoch 7
-------------------------------
loss: 0.708360  [    0/  455]
loss: 0.7160