<a href="https://colab.research.google.com/github/matt-cornelius/Flip/blob/main/Assignment_1_MLP_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 1: Multi Layer Perceptron (MLP) Classifier

# Student name: Matt Cornelius


Based (very loosely) on an MLP demo by Thomas Summe, Zheng Ning, Adam Czajka (February 2023)

Updated Jan 20, 2025

In this assignment you will implement (starting from some Prof-provided "scaffolding") a multi-layer perceptron classifier, building on the single-neuron classifier you explored in the first Practical class.

### Logistics
**When you first open this notebook, save it in your Github repo**. Open it from Github when you're working on it. Commit it often (commit messages can be short but you have to enter something...). The notebook submitted closest to, but before, the assignment due date will be the one graded.

### AI policy for this assignment
I hope and expect that you will consult on-line resources including AIs as you attack this problem - but **cite your sources** below (I provide an example), and make sure you understand how your solution works!

### Resources used in my solution (students replace the material below):
 * https://machinelearningmastery.com/building-a-multiclass-classification-model-in-pytorch/ - I used Pytorch's one-hot encoder rather than sklearn's. I avoided using Pandas because we already have a nice dictionary of data.

## Problem Setup

In Practical 1, you used a single neuron as a _binary classifier_ -- we trained the neuron's output to be high or low based on which class the 128-D input came from.

We're working with the same data here -- but your classifier is going to choose between more than two classes.

How does this differ?
 + **A more complex hidden layer**: you'll have a *set* of linear neurons receiving the 128-D inputs and computing linear functions from them, not just one.
 + **An additional "output" layer**: you'll have an additional layer containing neurons that combine the outputs of the hidden layer, and we'll train them to produce a one-hot style class indication.
 + **One-hot class encoding**: The class label will be encoded as a one-hot vector rather than as a scalar holding the class number.
 + **Cross-Entropy Loss**: Since this is a multi-class problem (more than two classes to choose from), we can't use the binary cross-entropy loss for training; we'll use its more general form.
 + **GPU**: you'll use code that uses the GPU functionality of Pytorch in this assignment if you're running on a system with a GPU in it - so you'll need to make sure you're running on a GPU instance to take advantage of it.
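
To make the one-hot and cross-entropy items concrete, here is a minimal pure-Python sketch of the multi-class cross-entropy for a single sample, computed from raw network outputs (logits) and a one-hot target. The helper name `cross_entropy` is illustrative and not part of the assignment code; `torch.nn.CrossEntropyLoss` performs the same softmax-then-log computation on whole batches and, by default, averages the result over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a one-hot target vector."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum for z in logits]
    return -sum(t * lp for t, lp in zip(target, log_probs))

nclasses = 10
y = [0.0] * nclasses
y[3] = 1.0  # one-hot label for class 3

# A net that rates every class equally vs. one that strongly favors class 3
uniform_loss = cross_entropy([0.0] * nclasses, y)
confident_logits = [0.0] * nclasses
confident_logits[3] = 8.0
confident_loss = cross_entropy(confident_logits, y)
print(uniform_loss, confident_loss)
```

The uniform guess costs `ln(nclasses)` per sample, which is a handy baseline for judging whether training is making progress.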

First, we import the things we need. Add more imports here if you need them. You'll see code at the bottom that will assign 'cuda' or 'cpu' to the variable `device`, which we will use later.

In [None]:
import os
import sys
import random
import gdown

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
import torchsummary
import numpy as np

import matplotlib.pyplot as plt
import tqdm.notebook as tqdm

# some housekeeping; random stuff & cuda
_ = torch.random.seed()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Executing NN code on {device}')

Executing NN code on cuda


## Data set

Get the FR data from Practical 1


In [39]:
# download data from Google Drive using gdown

url = 'https://drive.google.com/uc?id=19heLsvTf6AHj9irshCS-zdhTeL-w-Zc7'
localname = 'embeddingsbysubject.npy'
if os.path.exists(localname):
    print(f'{localname} already exists. Delete if you want to re-download it.')
else:
    gdown.download(url,localname)

# load the file
# the file contains a numpy array with one element: a ginormous dict.
# the .item() method pries the dict out of the array

d = np.load(localname,allow_pickle=True).item()

embeddingsbysubject.npy already exists. Delete if you want to re-download it.


## A utility function to get the `n` subject-IDs with the most data

`d` is a dictionary. Subject-ID strings (_e.g._, `'02463'`) are the keys, and for each key, the value is a numpy array with shape `(nvecs,128)`; that is, `nvecs` rows and 128 columns. `nvecs` is the number of data points (embeddings) for the subject. Each embedding comes from an image, so `nvecs` is also the number of images processed for that subject. The 128-dimensional vector is a representation for the subject's face in that image (it's computed by a convolutional neural net).

Give this function the dictionary and, optionally, a value for `n` (the default is 10). It returns a list of the `n` keys (subject IDs) with the largest numbers of embeddings, ordered from most to fewest.
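
As a small illustration of the selection logic, the sketch below uses a hypothetical stand-in, `top_n_keys`, that works on a plain dictionary of precomputed counts instead of NumPy arrays. The counts are taken from the output this notebook prints for five of the subjects.

```python
def top_n_keys(counts, n=10):
    """Return the n keys with the largest counts, largest first."""
    return sorted(counts, key=lambda k: counts[k], reverse=True)[:n]

# subject ID -> number of embeddings (a small excerpt, in arbitrary order)
counts = {'02463': 213, '04202': 221, '04470': 200, '04514': 217, '04530': 214}
top3 = top_n_keys(counts, n=3)
print(top3)
```

The real function does the same thing but reads each count from `d[k].shape[0]`.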

In [None]:
def get_top_n(d,n=10):
    """
    given the dict d whose keys map to 2D numpy arrays, return a list of
    the n keys that have the largest number of rows, largest first.
    In this notebook, 'return the IDs of the n subjects with the largest
    number of face embeddings or, equivalently, images seen.'
    """
    skeys = sorted(list(d.keys()),key=lambda k:d[k].shape[0],reverse=True)
    return skeys[:n]

In [None]:
# print out the top ten keys and their sample counts
k_top_10 = get_top_n(d,n=10)
print('Top 10 subject IDs and their sample counts:')
print([[k,d[k].shape[0]] for k in k_top_10])

Top 10 subject IDs and their sample counts:
[['04202', 221], ['04514', 217], ['04530', 214], ['02463', 213], ['04557', 209], ['04495', 209], ['04203', 205], ['04580', 204], ['04385', 203], ['04470', 200]]


## A dataset class

This code constructs a Pytorch `Dataset` from a subset of the data in the dictionary. When instantiating it, supply the dict containing the data and a list of keys of the subjects you want in the `Dataset`.

Note: the labels stored with this class (and returned from its `__getitem__()` method) are **one-hot** encodings. If you initialize it with `k` subject IDs (keys), then the label returned by `__getitem__()` will be a `k`-element vector with exactly one nonzero element.
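
The label layout can be sketched in pure Python. The function below is a hypothetical helper (not part of the class) that mimics how the constructor repeats class index `i` once for every row belonging to key `i` and then expands each index into a one-hot row.

```python
def build_one_hot_labels(rows_per_key):
    """Class i is repeated once per row of key i, then one-hot encoded."""
    nclasses = len(rows_per_key)
    class_ids = [i for i, nrows in enumerate(rows_per_key) for _ in range(nrows)]
    return [[1 if j == i else 0 for j in range(nclasses)] for i in class_ids]

# three subjects with 2, 1 and 3 embeddings respectively
labels = build_one_hot_labels([2, 1, 3])
for row in labels:
    print(row)
```

Each row lines up with the corresponding row of the stacked data matrix, so sample `idx` and label `idx` always belong together.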

This class has an introspection method called `sample_size()` that returns the number of dimensions of each sample (embedding). This quantity could vary if data from a different feature extractor was used (some face embeddings are 512 dimensions or larger), and since we will provide these samples to the input of a neural network, the method gives us a "clean" way to get it.

Note: you don't have to call the superclass constructor when you're subclassing `torch.utils.data.Dataset` because there is no constructor for the superclass (it's abstract and doesn't need to do anything). Feel free to call it if you like to be rigorous. Down below, you'll subclass `torch.nn.Module` to define your custom neural net class, and a superclass constructor call is mandatory there.

In [None]:
# Dataset

class ManyClassEmbeddingData(torch.utils.data.Dataset):
    """A dataset obtained from a dictionary where keys (class names) map to
    2D numpy arrays with shape (nembeddings,dim)"""
    def __init__(self,d,keys):
        """init method: select the data with the given keys. Keys are unique
        so will sort stably."""
        self.keys = keys.copy()
        self.data = np.vstack([d[k] for k in keys])
        # nested list of labels for label vector
        labs = [[i]*d[k].shape[0] for (i,k) in enumerate(keys)]
        # stack it and generate one-hot encodings
        self.label = torch.nn.functional.one_hot(torch.tensor(np.hstack(labs)))

    def sample_size(self):
        return self.data[0].shape[0]

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self,idx):
        return self.data[idx],self.label[idx]

# Task 1: Get some data

Create an instance of `ManyClassEmbeddingData` for the subject-IDs (keys) with the `nclasses = 10` largest numbers of samples. Use the `get_top_n()` function to get those keys and supply the list of keys to the constructor along with the dictionary of data. Also, create a variable `input_size` and set it equal to the output of the `sample_size()` method for your `ManyClassEmbeddingData` instance.


In [40]:

# Assume d is your dictionary containing the data
def get_top_n(d, n=10):
    skeys = sorted(list(d.keys()), key=lambda k: d[k].shape[0], reverse=True)
    return skeys[:n]

# Get the top 10 keys based on the number of samples
top_keys = get_top_n(d, n=10)

# Create an instance of ManyClassEmbeddingData with the top keys
dataset = ManyClassEmbeddingData(d, top_keys)

# Set input_size to the sample size of the dataset
input_size = dataset.sample_size()


## Task 2: Set up data loaders

In the code cell below:
 + Define and initialize a variable: `b_size = 64`
 + Use what you learned from Practical 1 to divide your data set into three pieces:
   + `train_data` with 55% of the data
   + `test_data` with 20% of the data
   + `validation_data` with 25% of the data.
   
   Wrap them individually with `DataLoaders`, specifying `shuffle=True` and `batch_size = b_size`.
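
The split sizes have to be whole numbers that add up to the dataset length. One way to guarantee that, sketched below with a hypothetical helper `split_sizes`, is to truncate two of the pieces and give the remainder to the third. The total used here assumes the ten per-subject counts printed earlier in this notebook.

```python
def split_sizes(total, train_frac=0.55, test_frac=0.20):
    """Return (train, test, validation) sizes that sum exactly to total."""
    train = int(train_frac * total)
    test = int(test_frac * total)
    validation = total - train - test  # remainder, so no sample is dropped
    return train, test, validation

# sample counts of the ten largest subjects
total = 221 + 217 + 214 + 213 + 209 + 209 + 205 + 204 + 203 + 200
sizes = split_sizes(total)
print(total, sizes)
```

A list of sizes like this is the form of the `lengths` argument that `random_split` expects.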

In [42]:

# Define batch size
b_size = 64

# Define split ratios
total_size = len(dataset)
train_size = int(0.55 * total_size)
test_size = int(0.20 * total_size)
val_size = total_size - train_size - test_size  # Ensure all data is used

# Split dataset
train_data, test_data, validation_data = random_split(dataset, [train_size, test_size, val_size])

# Wrap datasets in DataLoaders
train_loader = DataLoader(train_data, batch_size=b_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=b_size, shuffle=True)
val_loader = DataLoader(validation_data, batch_size=b_size, shuffle=True)


# Task 3: Define a neural net

Define a class `MLP` that inherits from `torch.nn.Module`. The class has these methods:
 + `__init__(self,input_dim,hidden_dim,output_dim)` has two `torch.nn.Linear` layers with a `torch.nn.ReLU` activation function applied to the output of the first layer. The first linear layer has `input_dim` inputs and `hidden_dim` outputs; the second linear layer has `hidden_dim` inputs and `output_dim` outputs (for this assignment, `output_dim` is `nclasses`).
 + `forward(self,input)` sends the `input` to the first layer, runs the `ReLU` on the result, feeds the output of `ReLU` to the second linear layer and returns the output of the second linear layer.

There are lots of ways to implement this. Good designs set up the network in the constructor (`__init__()`) and use it in the `forward()` method. For what it's worth, I like to wrap the layers and activations in an instance of class `torch.nn.Sequential` in the constructor.  Make sure you invoke the superclass's constructor as your constructor's first action.

In [43]:
# STUDENT CODE GOES HERE

class MLP(torch.nn.Module):
    def __init__(self,input_dim,hidden_dim,output_dim):
        """ initialize the MLP with n-dimensional input, hidden_dim units in the
        hidden layer, and output_dim units in the output layer. output_dim
        is normally the number of classes."""
        super(MLP, self).__init__()

        # Define the network layers using nn.Sequential
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self,input):
        """return the output response of the network  to an input."""
        return self.network(input)


# Freebie: an evaluation function

The function below takes a model (an instance of `MLP`; it could be trained, doesn't have to be) and a `DataLoader` instance, and feeds all of the data through the net, keeping track of its output. From the net's output (an estimated label) and the known correct label for each sample, both expressed as a one-hot encoding of the class number, it reports the fraction of samples that were correctly classified. We want this to be 1.0, ideally.

You're not being asked to code this (this time) because of some "fiddly bits", including
 + the batching of samples delivered from the loader (all of the variables in the inner loop are vectors; `X` is a matrix).
 + the use of `.float().to(device)` to ensure the data is in single precision floating point format, and move it from CPU RAM to GPU RAM (if you're running on a GPU, of course). The model had previously been moved to the device before training, and torch requires you to explicitly move data to the model (and sometimes move results in the reverse direction).
 + the somewhat cryptic comparison of estimated class labels to true class labels, and the conversion of the result to a vector with 1s for correct classifications and 0s for incorrect ones.
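
The comparison in the last item can be illustrated without tensors. This is a minimal sketch with made-up numbers; `argmax` and `batch_correct` are illustrative helpers that play the role of `torch.argmax(..., 1)` and the elementwise `==` in the function below.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def batch_correct(y_pred, y_true):
    """1.0 where the predicted class matches the true class, else 0.0."""
    return [1.0 if argmax(p) == argmax(t) else 0.0 for p, t in zip(y_pred, y_true)]

# a batch of three samples and three classes (illustrative values)
y_pred = [[0.1, 2.3, -0.5], [1.7, 0.2, 0.9], [0.0, 0.4, 0.3]]
y_true = [[0, 1, 0], [0, 0, 1], [0, 1, 0]]
corr = batch_correct(y_pred, y_true)
accuracy = sum(corr) / len(corr)
print(corr, accuracy)
```

The net's raw outputs do not need to be one-hot themselves; only the position of the largest output matters for the comparison.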

In [36]:
# evaluates the model's current state on some data that we get from a
# DataLoader (should NOT be evaluated on training data!)

def evaluate(model, loader):
    model.eval() # no training
    corrs = [] # list containing a 1 for each correct prediction and a 0 for each mistake
    for X,y in loader: # data loader gives us a batch; data sample and one-hot encoded class label
        X = X.float().to(device) # move data and label to GPU if we're using one
        y = y.float().to(device)
        y_pred = model(X) # feed the data to the model, get OHE prediction
        # this code compares the net's output OHE to the ground truth
        # OHE and assigns a 1 for each case where they agree and a 0 when they don't
        corr = (torch.argmax(y_pred,1) == torch.argmax(y,1)).float()
        corrs += corr # append to the running list
    # return the classification accuracy: num correct/total num
    acc = sum(corrs)/len(corrs)
    return acc

# Task 4: Set up and run an experiment

This task has a lot going on, so the description is lengthy.

In the cell below, add code that performs an experiment. It's very useful to do this in a single cell during the test process.  Among other reasons, it ensures that you have a fresh instance of `MLP` each time you run it (if you want to retrain an `MLP` or any net in PyTorch, you have to explicitly reset its weights to all zeros or random values; otherwise you won't be training from scratch, you'll be "fine-tuning").

1. Define some parameters: `epoch_max = 1000`; `input_dim = input_size` as defined above, `hidden_dim=5`, and `output_dim = nclasses` as specified above.
2. Instantiate `MLP` with the proper values. Assuming you named the instance `mlp`, then execute `mlp = mlp.to(device)`. If you don't do this, it won't run on the GPU if you're using one!
3. Run `torchsummary.summary(mlp,input_size=(input_dim,))`. There's a question on its output below.
4. Define the loss function: `loss_fn = torch.nn.CrossEntropyLoss()`. Think of that statement as defining an instance of an object - the parentheses are important.
5. Define `learning_rate` to be some number between 0.005 and 0.05.
6. The final setup step is to choose an optimizer that guides the gradient descent procedure that is used during training.
 + If your family name begins with the letters A through J, use `optimizer = torch.optim.SGD(mlp.parameters(),lr=learning_rate)`
 + if your family name begins with the letters K through Z, use `optimizer = torch.optim.Adam(mlp.parameters(),lr=learning_rate)`

Now, implement the training loop.
1. Define a loop variable `epoch` and have it run from 0 up to (but not including) `epoch_max`. You might want to use `tqdm` to show a nice progress bar to watch as this runs.  Within this outer loop, do this:
 1.  initialize `running_loss` to 0. Once all the batches for this epoch are done, it will contain the overall loss value.
 1. invoke `mlp.train()` to put the net in training mode. `evaluate()` switches the net to evaluation mode with `model.eval()`, so you need to switch it back at the start of every epoch.
 1.  Iterate over the **training** dataloader; you get a batch `(X,y)` where `X` is a matrix of samples and `y` is a vector of known class labels, one for each sample - each one of the labels is an OHE. Within this inner loop, do this:
    1. `optimizer.zero_grad()` to reset the gradients stored with all the internal data to 0.
    2. use `.float().to(device)` to convert both `X` and `y` to floating point and move them to the GPU if you're using one. The idiom here is `X = X.float().to(device)`
    3. put X through `mlp` to get `ypred`, the predicted class label.
    4. calculate the loss: `loss = loss_fn(ypred,y)` (it's not a number, it's an object; the numerical value of the loss is `loss.item()`.)
    5. backpropagate the error: call `loss.backward()`
    6. update the optimizer: call `optimizer.step()`
    7. update the running loss: `running_loss += loss.item()`
 2. (still within an iteration of `epoch`) Calculate the accuracy of the MLP on the validation set at this iteration by sending it and the **validation** loader to the `evaluate()` function above. Print out the return value from `evaluate()` and print out the `running_loss` as well.
3. (Done with all of the training epochs) Run `evaluate()` on the trained(?) net and the **test** loader and print the result.
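
Because `running_loss` adds up one averaged loss value per batch, its size depends on how many batches an epoch contains. The sketch below shows how to put a logged value in context. It assumes 1152 training samples (55% of the 2095 samples from the ten largest subjects), a batch size of 64 with the final partial batch kept, and the epoch 1 loss printed in this notebook's training log.

```python
import math

n_train = 1152    # assumed: int(0.55 * 2095) training samples
b_size = 64
nclasses = 10

n_batches = math.ceil(n_train / b_size)   # batches per epoch
chance_loss = math.log(nclasses)          # per-sample loss of a uniform guess
expected_running_loss = n_batches * chance_loss

logged_running_loss = 42.5583             # epoch 1 value from the training log
mean_batch_loss = logged_running_loss / n_batches
print(n_batches, expected_running_loss, mean_batch_loss)
```

Under these assumptions, a first-epoch running loss near 42 corresponds to a per-batch loss close to `ln(10)`, which is what an untrained 10-class classifier would be expected to produce.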

**After all that implementation work**: run the cell, and report the final accuracy on the test set in the area below labeled `Answer`, where indicated.

**Additional answering**: In the area below labeled `Answer`, enter a formula for the number of parameters at each level reported by `torchsummary.summary()` (i.e., the numbers in the `Param #` column). The formula should be in terms of one or more of the parameters above and should be exact. Remember that there's a bias input to every neuron as well!  In the same text area, explain why ReLU has no parameters.
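
One way to sanity-check a candidate formula is to compare it with the `Param #` column that `torchsummary.summary()` prints. The sketch below uses two hypothetical helpers and assumes a fully connected layer holds one weight per input-output pair plus one bias per output neuron.

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

def mlp_params(input_dim, hidden_dim, output_dim):
    """Parameter counts of the two linear layers of the MLP."""
    return linear_params(input_dim, hidden_dim), linear_params(hidden_dim, output_dim)

first, second = mlp_params(input_dim=128, hidden_dim=5, output_dim=10)
print(first, second, first + second)
```

For `hidden_dim=5` these values agree with the 645, 60 and 705 reported in this notebook's summary output.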

Answer:
1. answer here
2. answer here

In [44]:
# STUDENT CODE GOES HERE

# 1. define parameters
epoch_max = 1000
input_dim = dataset.sample_size()
hidden_dim = 5
output_dim = len(top_keys)

# 2. instantiate
mlp = MLP(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim)
mlp = mlp.to(device)

# 3. summary
torchsummary.summary(mlp,input_size=(input_dim,))

# 4. loss function
loss_fn = torch.nn.CrossEntropyLoss()

# 5. Learning rate
learning_rate = 0.01

# 6. optimizer
optimizer = torch.optim.SGD(mlp.parameters(),lr=learning_rate)

# Training loop
# 1. loop variable
for epoch in tqdm.trange(epoch_max):
    # 1.1 running loss
    running_loss = 0.0
    # 1.2 training mode
    mlp.train()
    # 1.3 loop over batches
    for X,y in train_loader:
        # 1.3.1 reset gradients
        optimizer.zero_grad()
        # 1.3.2 convert data and labels to float and move them to the device
        X, y = X.float().to(device), y.float().to(device)
        # 1.3.3 provide X to mlp() to get ypred predicted labels
        y_pred = mlp(X)
        # 1.3.4 get loss from ypred and y
        loss = loss_fn(y_pred, y)
        # 1.3.5 propagate loss backward through the graph
        loss.backward()
        # 1.3.6 take an optimizer step to update the weights
        optimizer.step()
        # 1.3.7 update running loss
        running_loss += loss.item()
    # 1.4 calculate & print accuracy on validation set
    val_accuracy = evaluate(mlp, val_loader)
    print(f"Epoch {epoch+1}/{epoch_max} - Loss: {running_loss:.4f} - Val Accuracy: {val_accuracy:.4f}")

# 2. calculate & print test accuracy
test_accuracy = evaluate(mlp, test_loader)
print(f"Final Test Accuracy: {test_accuracy:.4f}")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 5]             645
              ReLU-2                    [-1, 5]               0
            Linear-3                   [-1, 10]              60
Total params: 705
Trainable params: 705
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


Epoch 1/1000 - Loss: 42.5583 - Val Accuracy: 0.0916
Epoch 2/1000 - Loss: 42.5040 - Val Accuracy: 0.1011
Epoch 3/1000 - Loss: 42.4525 - Val Accuracy: 0.0916
Epoch 4/1000 - Loss: 42.4033 - Val Accuracy: 0.0897
Epoch 5/1000 - Loss: 42.3620 - Val Accuracy: 0.0897
Epoch 6/1000 - Loss: 42.3278 - Val Accuracy: 0.0897
Epoch 7/1000 - Loss: 42.2960 - Val Accuracy: 0.0897
Epoch 8/1000 - Loss: 42.2655 - Val Accuracy: 0.0897
Epoch 9/1000 - Loss: 42.2363 - Val Accuracy: 0.0897
Epoch 10/1000 - Loss: 42.2075 - Val Accuracy: 0.0897
Epoch 11/1000 - Loss: 42.1809 - Val Accuracy: 0.0897
Epoch 12/1000 - Loss: 42.1548 - Val Accuracy: 0.0897
Epoch 13/1000 - Loss: 42.1291 - Val Accuracy: 0.0897
Epoch 14/1000 - Loss: 42.1046 - Val Accuracy: 0.0897
Epoch 15/1000 - Loss: 42.0813 - Val Accuracy: 0.0897
Epoch 16/1000 - Loss: 42.0581 - Val Accuracy: 0.0897
Epoch 17/1000 - Loss: 42.0358 - Val Accuracy: 0.0897
Epoch 18/1000 - Loss: 42.0150 - Val Accuracy: 0.0897
Epoch 19/1000 - Loss: 41.9938 - Val Accuracy: 0.0897
Ep

# Task 5

In the code cell below, make a copy of the working code from the Task 4 cell above and wrap it in a function `runexp()` with keyword arguments as follows: `runexp(hidden_dim=1,summary=False,max_epochs=1000)`. The function returns the accuracy of the MLP on the test data after `max_epochs` epochs of training are complete.

`runexp()` will run the experiment as it runs above:
 + with the MLP instance having the hidden layer size as specified in the function call
 + with the `torchsummary.summary()` call printed only if `summary` is `True`
 + with the experiment running for the specified number of epochs.

 You'll run the code in Task 6 below.

In [None]:
# STUDENT CODE GOES HERE


def runexp(hidden_dim=1,summary=False,max_epochs=1000):
    # add appropriately modified code from the code cell above
    raise NotImplementedError('runexp() has not been filled in yet')


# Task 6: the last task

In this final task, you're going to explore a _hyperparameter_, namely the size of (number of neurons in) the hidden layer. Depending on the complexity of the task, the number of hidden layer units (and, indeed, the number of hidden layers needed - you can and usually do have more than one) required for good performance is unknown and must be searched for.

Loop from `n`=2 to `n`=21, inclusive, running `runexp(hidden_dim=n)` **ten times** for each value of `n` (200 experiments total). This will probably take a couple of hours with a GPU on Colab. Save the ten accuracies obtained for each value of `n`, and when the ten experiments are done, print out `n`, the sample mean (`np.mean()`) of the accuracies, and the sample standard deviation (`np.std()`) of the accuracies.  Comment below on what you see in these numbers. You'd like to see high accuracy and low standard deviation. Does accuracy seem to saturate for some value of `n`? What value is it? Do the standard deviations reach an acceptable level (lower is more acceptable)?
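
A note on the statistics: with its default `ddof=0`, `np.std()` divides by `N` (the population formula), while the sample standard deviation divides by `N - 1`. The standard-library sketch below shows the difference, using the ten accuracies copied from the "5 hidden units" output in this notebook (rounded to four decimals, so the results differ slightly from the printed ones).

```python
import statistics

accs = [0.6086, 0.7947, 0.7375, 0.8831, 0.8473,
        0.6158, 0.6754, 0.8091, 0.8305, 0.8329]

mean_acc = statistics.mean(accs)
pop_std = statistics.pstdev(accs)     # divides by N, like np.std() with ddof=0
sample_std = statistics.stdev(accs)   # divides by N - 1, like np.std(..., ddof=1)
print(mean_acc, pop_std, sample_std)
```

With only ten runs per setting the two estimates differ by about 5%, so it is worth stating which one is reported.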

Student Comments:

In [None]:
nruns_per_hid = 10

for nhid in range(2,22):
    print(f'=== {nhid} hidden units ===')
    accs = [runexp(nhid,summary=(i==0)) for i in range(nruns_per_hid)]
    print(f'accs: {accs}')
    print(f'Mean accuracy: {np.mean(accs)}')
    print(f'Std accuracy: {np.std(accs)}')

=== 1 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 1]             129
              ReLU-2                    [-1, 1]               0
            Linear-3                   [-1, 10]              20
Total params: 149
Trainable params: 149
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.1146), tensor(0.1146), tensor(0.2888), tensor(0.2530), tensor(0.1098), tensor(0.1098), tensor(0.2100), tensor(0.2554), tensor(0.3150), tensor(0.3365)]
Mean accuracy: 0.2107398509979248
Std accuracy: 0.08686204254627228
=== 2 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 2]             258
              ReLU-2                    [-1, 2]               0
            Linear-3                   [-1, 10]              30
Total params: 288
Trainable params: 288
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.3365), tensor(0.1098), tensor(0.3007), tensor(0.5680), tensor(0.5394), tensor(0.5871), tensor(0.3198), tensor(0.5728), tensor(0.1098), tensor(0.6993)]
Mean accuracy: 0.41431981325149536
Std accuracy: 0.19718341529369354
=== 3 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 3]             387
              ReLU-2                    [-1, 3]               0
            Linear-3                   [-1, 10]              40
Total params: 427
Trainable params: 427
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.5704), tensor(0.2578), tensor(0.6158), tensor(0.5322), tensor(0.6659), tensor(0.5704), tensor(0.1527), tensor(0.5489), tensor(0.2721), tensor(0.7804)]
Mean accuracy: 0.49665871262550354
Std accuracy: 0.1907888948917389
=== 4 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 4]             516
              ReLU-2                    [-1, 4]               0
            Linear-3                   [-1, 10]              50
Total params: 566
Trainable params: 566
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.7709), tensor(0.8186), tensor(0.2506), tensor(0.5585), tensor(0.6897), tensor(0.5895), tensor(0.3198), tensor(0.6038), tensor(0.3317), tensor(0.6969)]
Mean accuracy: 0.5630072355270386
Std accuracy: 0.18845486640930176
=== 5 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 5]             645
              ReLU-2                    [-1, 5]               0
            Linear-3                   [-1, 10]              60
Total params: 705
Trainable params: 705
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.6086), tensor(0.7947), tensor(0.7375), tensor(0.8831), tensor(0.8473), tensor(0.6158), tensor(0.6754), tensor(0.8091), tensor(0.8305), tensor(0.8329)]
Mean accuracy: 0.7634844779968262
Std accuracy: 0.09377653151750565
=== 6 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 6]             774
              ReLU-2                    [-1, 6]               0
            Linear-3                   [-1, 10]              70
Total params: 844
Trainable params: 844
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.8663), tensor(0.8234), tensor(0.8138), tensor(0.8544), tensor(0.5871), tensor(0.6754), tensor(0.8282), tensor(0.8282), tensor(0.8067), tensor(0.6468)]
Mean accuracy: 0.7730310559272766
Std accuracy: 0.09315492957830429
=== 7 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 7]             903
              ReLU-2                    [-1, 7]               0
            Linear-3                   [-1, 10]              80
Total params: 983
Trainable params: 983
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.8568), tensor(0.5609), tensor(0.7804), tensor(0.8282), tensor(0.8687), tensor(0.8305), tensor(0.8711), tensor(0.6778), tensor(0.8496), tensor(0.8425)]
Mean accuracy: 0.7966587543487549
Std accuracy: 0.09560134261846542
=== 8 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 8]           1,032
              ReLU-2                    [-1, 8]               0
            Linear-3                   [-1, 10]              90
Total params: 1,122
Trainable params: 1,122
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


accs: [tensor(0.8425), tensor(0.8568), tensor(0.8449), tensor(0.7136), tensor(0.8759), tensor(0.8138), tensor(0.8353), tensor(0.8854), tensor(0.8807), tensor(0.8329)]
Mean accuracy: 0.8381861448287964
Std accuracy: 0.0469239316880703
=== 9 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 9]           1,161
              ReLU-2                    [-1, 9]               0
            Linear-3                   [-1, 10]             100
Total params: 1,261
Trainable params: 1,261
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.8950), tensor(0.7566), tensor(0.8186), tensor(0.8878), tensor(0.8902), tensor(0.8425), tensor(0.9165), tensor(0.8568), tensor(0.8854), tensor(0.8568)]
Mean accuracy: 0.8606206178665161
Std accuracy: 0.04413919895887375
=== 10 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 10]           1,290
              ReLU-2                   [-1, 10]               0
            Linear-3                   [-1, 10]             110
Total params: 1,400
Trainable params: 1,400
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.8926), tensor(0.8735), tensor(0.8998), tensor(0.8663), tensor(0.8998), tensor(0.8926), tensor(0.9069), tensor(0.9021), tensor(0.8711), tensor(0.8902)]
Mean accuracy: 0.8894988894462585
Std accuracy: 0.013502949848771095
=== 11 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 11]           1,419
              ReLU-2                   [-1, 11]               0
            Linear-3                   [-1, 10]             120
Total params: 1,539
Trainable params: 1,539
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9212), tensor(0.8663), tensor(0.8878), tensor(0.8950), tensor(0.9093), tensor(0.9093), tensor(0.7852), tensor(0.9093), tensor(0.9045), tensor(0.8759)]
Mean accuracy: 0.8863962292671204
Std accuracy: 0.03734453395009041
=== 12 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 12]           1,548
              ReLU-2                   [-1, 12]               0
            Linear-3                   [-1, 10]             130
Total params: 1,678
Trainable params: 1,678
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.8926), tensor(0.9117), tensor(0.8878), tensor(0.8496), tensor(0.8162), tensor(0.9117), tensor(0.8902), tensor(0.9045), tensor(0.8926), tensor(0.8807)]
Mean accuracy: 0.8837709426879883
Std accuracy: 0.02817947044968605
=== 13 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 13]           1,677
              ReLU-2                   [-1, 13]               0
            Linear-3                   [-1, 10]             140
Total params: 1,817
Trainable params: 1,817
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9045), tensor(0.9117), tensor(0.8663), tensor(0.9093), tensor(0.9141), tensor(0.9069), tensor(0.8950), tensor(0.9141), tensor(0.8974), tensor(0.9189)]
Mean accuracy: 0.9038187265396118
Std accuracy: 0.014361504465341568
=== 14 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 14]           1,806
              ReLU-2                   [-1, 14]               0
            Linear-3                   [-1, 10]             150
Total params: 1,956
Trainable params: 1,956
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9021), tensor(0.9069), tensor(0.9045), tensor(0.9021), tensor(0.8902), tensor(0.9021), tensor(0.8807), tensor(0.8878), tensor(0.8998), tensor(0.8592)]
Mean accuracy: 0.8935559988021851
Std accuracy: 0.01396538969129324
=== 15 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 15]           1,935
              ReLU-2                   [-1, 15]               0
            Linear-3                   [-1, 10]             160
Total params: 2,095
Trainable params: 2,095
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.8878), tensor(0.8878), tensor(0.8950), tensor(0.8974), tensor(0.8974), tensor(0.8282), tensor(0.9021), tensor(0.8329), tensor(0.8878), tensor(0.9117)]
Mean accuracy: 0.8828163146972656
Std accuracy: 0.027074376121163368
=== 16 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 16]           2,064
              ReLU-2                   [-1, 16]               0
            Linear-3                   [-1, 10]             170
Total params: 2,234
Trainable params: 2,234
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9045), tensor(0.8950), tensor(0.9021), tensor(0.8902), tensor(0.9189), tensor(0.8831), tensor(0.9189), tensor(0.9045), tensor(0.9045), tensor(0.9093)]
Mean accuracy: 0.9031025767326355
Std accuracy: 0.010842788964509964
=== 17 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 17]           2,193
              ReLU-2                   [-1, 17]               0
            Linear-3                   [-1, 10]             180
Total params: 2,373
Trainable params: 2,373
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.8878), tensor(0.9093), tensor(0.9045), tensor(0.9045), tensor(0.9117), tensor(0.9069), tensor(0.9141), tensor(0.9069), tensor(0.9141), tensor(0.9069)]
Mean accuracy: 0.9066826701164246
Std accuracy: 0.007116015534847975
=== 18 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 18]           2,322
              ReLU-2                   [-1, 18]               0
            Linear-3                   [-1, 10]             190
Total params: 2,512
Trainable params: 2,512
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9045), tensor(0.9117), tensor(0.9165), tensor(0.9165), tensor(0.9236), tensor(0.9117), tensor(0.8854), tensor(0.9045), tensor(0.9093), tensor(0.9117)]
Mean accuracy: 0.9095465540885925
Std accuracy: 0.009691620245575905
=== 19 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 19]           2,451
              ReLU-2                   [-1, 19]               0
            Linear-3                   [-1, 10]             200
Total params: 2,651
Trainable params: 2,651
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9021), tensor(0.8687), tensor(0.8974), tensor(0.9069), tensor(0.9045), tensor(0.9021), tensor(0.9069), tensor(0.9069), tensor(0.9117), tensor(0.8950)]
Mean accuracy: 0.9002386927604675
Std accuracy: 0.011485632508993149
=== 20 hidden units ===
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 20]           2,580
              ReLU-2                   [-1, 20]               0
            Linear-3                   [-1, 10]             210
Total params: 2,790
Trainable params: 2,790
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------


accs: [tensor(0.9189), tensor(0.9045), tensor(0.9069), tensor(0.8950), tensor(0.9141), tensor(0.9021), tensor(0.9117), tensor(0.8902), tensor(0.8735), tensor(0.8878)]
Mean accuracy: 0.9004772901535034
Std accuracy: 0.013161158189177513
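
The `Param #` columns in the summaries above are consistent with a network that takes the 128-D input, has one hidden `Linear` layer, and ends in a 10-way output `Linear` layer. Each `Linear` layer holds one weight per input-output pair plus one bias per output, and `ReLU` has no parameters. A minimal sketch of that arithmetic follows. The function name `mlp_param_count` and its default arguments are assumptions made for illustration and are not part of the notebook's code.

```python
def mlp_param_count(n_hidden, n_inputs=128, n_outputs=10):
    """Count parameters of a one-hidden-layer MLP (illustrative helper, not from the notebook).

    Returns a tuple: (hidden-layer params, output-layer params, total).
    """
    hidden = n_inputs * n_hidden + n_hidden    # weights + biases of the hidden Linear layer
    output = n_hidden * n_outputs + n_outputs  # weights + biases of the output Linear layer
    return hidden, output, hidden + output


# Compare against a few of the summaries printed above.
for h in (7, 10, 20):
    print(h, mlp_param_count(h))
```

For 7, 10, and 20 hidden units this gives totals of 983, 1,400, and 2,790, matching the `Total params` lines in the corresponding summaries.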
