# Lab 6: Visualizing the training process of neural networks. Hyperparameter tuning.

In this lab, you will learn how to use [wandb](https://wandb.ai/) to visualize the training process of neural networks. We are going to build and train a feed-forward neural network for recognizing handwritten digits of the MNIST dataset. The training process will be visualized in the wandb dashboard, which will allow us to monitor the loss and accuracy of the model in real-time.

---



Feel free to create an account at [wandb.ai](https://wandb.ai/) before starting this lab.

### A simple example of how to use wandb in a typical training loop is shown below:

```python
import wandb

wandb.login() # Log in to your wandb account

# Start a new run

some_config = {
    'learning_rate': 0.01,
    'layer_1_size': 128,
    'layer_2_size': 64,
    'batch_size': 32
} # This is just an example of a configuration dictionary, you can put anything you want here

wandb.init(project='mnist-classifier', config=some_config) # start a new run and log parameters

# Here you would prepare your data, and initialize the model, optimizer, etc.

# Training loop
for epoch in range(100):
    ...
    wandb.log({'loss': loss, 'accuracy': accuracy})
    # This will send the loss and accuracy to wandb and you can visualize it in the dashboard

# End of the run
wandb.finish()
```

The most important part is the `wandb.log()` function, which sends the data to the wandb dashboard. You can log any metric you want, not just loss and accuracy. The value passed to the function must be a dictionary.


## Exercise 1: Prepare data for training a mnist classifier (2 points)

Before you start training a neural network, you need to prepare the data. In this exercise, you will prepare the MNIST dataset of handwritten digits for training a classifier. You should:

1. Load the MNIST dataset using from `data/mnist_train.csv` and `data/mnist_test.csv` files.
2. Normalize the data to the range [0, 1].
3. Encode the labels using one-hot encoding.
4. Create a PyTorch `Dataset` object for the training and test sets.
5. Create a PyTorch `DataLoader` object for the training and test sets.

In [2]:
import pandas as pd

'''
The mnist_train.csv file contains the 60,000 training examples and labels. 
The mnist_test.csv contains 10,000 test examples and labels. Each row consists of 785 values: the first value is the label 
(a number from 0 to 9) and the remaining 784 values are the pixel values (a number from 0 to 255).
'''
# Load the MNIST dataset
train_data = pd.read_csv('data/mnist_train.csv') 
test_data = pd.read_csv('data/mnist_test.csv')


In [3]:
train_data.iloc[:,0]

0        0
1        4
2        1
3        9
4        2
        ..
59994    8
59995    3
59996    5
59997    6
59998    8
Name: 5, Length: 59999, dtype: int64

In [4]:
# Normalize the data to the range [0, 1]
train_data.iloc[:, 1:] = train_data.iloc[:, 1:] / 255.0
test_data.iloc[:, 1:] = test_data.iloc[:, 1:] / 255.0

In [5]:
# Encode the labels using one-hot encoding
train_labels = pd.get_dummies(train_data.iloc[:, 0])
test_labels = pd.get_dummies(test_data.iloc[:, 0])

train_labels

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,1,0,0,0,0,0,0,0,0,0
1,0,0,0,0,1,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,1
4,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...
59994,0,0,0,0,0,0,0,0,1,0
59995,0,0,0,1,0,0,0,0,0,0
59996,0,0,0,0,0,1,0,0,0,0
59997,0,0,0,0,0,0,1,0,0,0


In [6]:
import torch

# Convert data and labels to tensors
train_data_tensor = torch.tensor(train_data.iloc[:, 1:].values, dtype=torch.float32)
train_labels_tensor = torch.tensor(train_labels.values, dtype=torch.float32)
test_data_tensor = torch.tensor(test_data.iloc[:, 1:].values, dtype=torch.float32)
test_labels_tensor = torch.tensor(test_labels.values, dtype=torch.float32)

# Create the train and test datasets
train_dataset = torch.utils.data.TensorDataset(train_data_tensor, train_labels_tensor)
test_dataset = torch.utils.data.TensorDataset(test_data_tensor, test_labels_tensor)


In [7]:
from torch.utils.data import DataLoader

# Create the train and test dataloaders
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)

print(f'Train batches: {len(train_dataloader)}')
print(f'Test batches: {len(test_dataloader)}')


Train batches: 1875
Test batches: 313


In [8]:
# we can iterate over the dataloader to see how it works

X, y = next(iter(train_dataloader))

print("X is a tensor of shape:", X.shape)
print("y is a tensor of shape:", y.shape)

X is a tensor of shape: torch.Size([32, 784])
y is a tensor of shape: torch.Size([32, 10])


## Exercise 2: Prepare the architecture of the neural network (2 points)

In this exercise, you will prepare the architecture of the neural network. You should:

1. Create a neural network class that inherits from `torch.nn.Module`.
2. The neural network should have at least one hidden layer.
3. Use ReLU activation functions after each but the output layer.
4. Use a softmax activation function in the output layer to get the probabilities of each class.

**Feel free to experiment with the architecture of your network** - try adding more hidden layers, changing the number of neurons in each layer, etc. You can also add a dropout layer or some other regularization technique and see if it improves the performance of your model.

In [9]:
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden1_size, hidden2_size, hidden3_size, output_size, dropout_prob=0.0):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden1_size)
        self.fc2 = nn.Linear(hidden1_size, hidden2_size)
        self.fc3 = nn.Linear(hidden2_size, hidden3_size)
        self.fc4 = nn.Linear(hidden3_size, output_size)
        self.dropout = nn.Dropout(dropout_prob)
    
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = F.relu(self.fc3(x))
        x = self.dropout(x)
        x = F.softmax(self.fc4(x), dim=1)
        return x

# Initialize the model with input size 784 (28x28 images), hidden sizes 128, 64, 32, output size 10 (number of classes), and dropout probability 0.0
model = NeuralNetwork(input_size=784, hidden1_size=128, hidden2_size=64, hidden3_size=32, output_size=10, dropout_prob=0.0)
print(model)

# Forward pass with a sample input tensor X
output = model(X)
print("Output shape:", output.shape)

NeuralNetwork(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=32, bias=True)
  (fc4): Linear(in_features=32, out_features=10, bias=True)
  (dropout): Dropout(p=0.0, inplace=False)
)
Output shape: torch.Size([32, 10])


In [10]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## *Training PyTorch models on GPU

**GPUs are optimized for performing matrix operations in parallel.** Although we call them "graphics processing units", they are actually very powerful processors that can be used for any kind of parallel computation, including training deep neural networks. In fact, data science is one of the most common applications of GPUs today, as can be seen by the revenue of companies like NVIDIA over the past few years. NVIDIA is a monopolist in the GPU market - in 2023, the company owned 92% of the data center GPU market share. As for 31 July, the 2024 revenue of NVIDIA was 60.92 billion USD, while the total revenue of 2020 was $10.92 billion. If someone benefits from the current deep learning hype, it is certainly NVIDIA.

If you happen to have an NVIDIA GPU in your computer, you can use it to train your deep learning models, as PyTorch has excellent support for CUDA, which is NVIDIA's parallel computing API. To train a model on GPU, you need to explicitly tell PyTorch to move the model and the data to the GPU. 

Here is an example training loop that uses the GPU:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')   # Check if a GPU is available

# Initialize the model and move it to the GPU
model = SomeNeuralNetwork().to(device)   # Move the model to the GPU

for epoch in range(100):
    for batch in data_loader:
        X, y = batch
        X, y = X.to(device), y.to(device)   # Move the tensors to GPU
        
        y_pred = model(X)   # Perform a forward pass (on the GPU)
        loss = criterion(y_pred, y)   # Compute the loss (still on the GPU)
        
        ...  # The rest of the training loop
        
        y_pred = y_pred.detach().cpu()   # Move the predictions back to the CPU to do anything else with them
```

Note that **the model and all the tensors it uses for computation should be moved to the GPU**. You can do this by calling the `.to(device)` method on the model and the data tensors. If you want to move the data back to the CPU (to process it further, calculate metrics, visualize), you call the `.cpu()` method on the tensor.

**Doing calculations on the GPU, you should be wary of few things:**

* **The GPU has a limited amount of memory**, so you should be careful not to run out of memory. A typical graphics card has a few gigabytes of memory, so you should be fine with most models and datasets. However, moving very large tensors to the GPU can cause out-of-memory errors. That's one of the reasons why we use a dataloader and process the data in batches.
* While the GPU is much faster than the CPU for large matrix operations, **transferring data between the CPU and the GPU is slow**. Therefore, it is best to minimize the number of data transfers between the CPU and the GPU.

## Exercise 3: Prepare the training loop (2 points)

In this exercise, you will prepare the training loop. You should:

1. Initialize the neural network.
2. Define the loss function (CrossEntropyLoss) and the optimizer (Adam).
3. Pass a dictionary with the configuration to wandb. This dictionary should contain all the hyperparameters of our model, including the learning rate, the size of the hidden layers, batch size, etc.
4. Train the neural network. Each epoch should consist of a training and validation phase. You should log the loss and accuracy of the training and validation sets using wandb.
5. Open you project at [wandb.ai](https://wandb.ai/) and see how cool it is!

### Saving and loading the model
As training can take some time, it is a good idea to save the model's state dictionary (its learned weights) to a file after training. You can do this with the following code:

    torch.save(vae.state_dict(), 'vae.pth')
    
To load the model from the file, you can use the following code:

    vae.load_state_dict(torch.load('vae.pth'))

In [1]:
import wandb
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33migmure[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/ig/.netrc


True

In [13]:
# Define the configuration dictionary
config = {
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 10,
    'input_size': 784,
    'hidden1_size': 128,
    'hidden2_size': 64,
    'hidden3_size': 32,
    'output_size': 10,
    'dropout_prob': 0.25
}

# Initialize wandb with the configuration
wandb.init(project='mnist-classifier', config=config)

# Initialize the model, loss function, and optimizer
model = NeuralNetwork(
    input_size=config['input_size'],
    hidden1_size=config['hidden1_size'],
    hidden2_size=config['hidden2_size'],
    hidden3_size=config['hidden3_size'],
    output_size=config['output_size'],
    dropout_prob=config['dropout_prob']
).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=config['learning_rate'])


# Training loop

In [14]:
# Training loop with wandb logging
def train(model, train_loader, val_loader, epochs=10):
    model.train()
    for epoch in range(epochs):
        train_loss = 0.0
        train_correct = 0
        total_train = 0
        
        for X_batch, y_batch in train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            optimizer.zero_grad()
            y_pred = model(X_batch)
            loss = criterion(y_pred, y_batch)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            _, predicted = torch.max(y_pred.data, 1)
            _, labels = torch.max(y_batch.data, 1)
            train_correct += (predicted == labels).sum().item()
            total_train += labels.size(0)
        
        train_loss /= len(train_loader)
        train_accuracy = train_correct / total_train
        
        model.eval()
        val_loss = 0.0
        val_correct = 0
        total_val = 0
        
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                X_batch, y_batch = X_batch.to(device), y_batch.to(device)
                y_pred = model(X_batch)
                loss = criterion(y_pred, y_batch)
                
                val_loss += loss.item()
                _, predicted = torch.max(y_pred.data, 1)
                _, labels = torch.max(y_batch.data, 1)
                val_correct += (predicted == labels).sum().item()
                total_val += labels.size(0)
        
        val_loss /= len(val_loader)
        val_accuracy = val_correct / total_val
        
        wandb.log({
            'epoch': epoch + 1,
            'train_loss': train_loss,
            'train_accuracy': train_accuracy,
            'val_loss': val_loss,
            'val_accuracy': val_accuracy
        })
        
        print(f'Epoch {epoch + 1}/{epochs}, Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}')
        
    wandb.finish()

# Call the train function
train(model, train_dataloader, test_dataloader, epochs=config['epochs'])

Epoch 1/10, Train Loss: 1.6642, Train Accuracy: 0.8063, Val Loss: 1.5371, Val Accuracy: 0.9253
Epoch 2/10, Train Loss: 1.5270, Train Accuracy: 0.9352, Val Loss: 1.5179, Val Accuracy: 0.9439
Epoch 3/10, Train Loss: 1.5105, Train Accuracy: 0.9511, Val Loss: 1.5127, Val Accuracy: 0.9481
Epoch 4/10, Train Loss: 1.5021, Train Accuracy: 0.9595, Val Loss: 1.5076, Val Accuracy: 0.9546
Epoch 5/10, Train Loss: 1.4978, Train Accuracy: 0.9636, Val Loss: 1.5005, Val Accuracy: 0.9599
Epoch 6/10, Train Loss: 1.4932, Train Accuracy: 0.9681, Val Loss: 1.4951, Val Accuracy: 0.9662
Epoch 7/10, Train Loss: 1.4917, Train Accuracy: 0.9697, Val Loss: 1.4984, Val Accuracy: 0.9625
Epoch 8/10, Train Loss: 1.4899, Train Accuracy: 0.9713, Val Loss: 1.4921, Val Accuracy: 0.9691
Epoch 9/10, Train Loss: 1.4883, Train Accuracy: 0.9728, Val Loss: 1.4921, Val Accuracy: 0.9693
Epoch 10/10, Train Loss: 1.4884, Train Accuracy: 0.9728, Val Loss: 1.4908, Val Accuracy: 0.9700


0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_accuracy,▁▆▇▇██████
train_loss,█▃▂▂▁▁▁▁▁▁
val_accuracy,▁▄▅▆▆▇▇███
val_loss,█▅▄▄▂▂▂▁▁▁

0,1
epoch,10.0
train_accuracy,0.97282
train_loss,1.48845
val_accuracy,0.97
val_loss,1.49078


## Exercise 4: Easy hyperparameter tuning with wandb (2 points)

Wandb allows you to perform hyperparameter tuning by automatically creating multiple runs with different hyperparameters and logging the performance of each run. Below is a brief instruction to `wandb` hyperparameter tuning, but you are more than welcome to find more information in the [official wandb guide](https://docs.wandb.ai/guides/sweeps/).

Your task is to use wandb to perform hyperparameter tuning of the neural network, trying different values of the learning rate, batch size, and the size of the hidden layers. You can use the following hyperparameters:

First, we need to define a dictionary with the hyperparameters that we want to tune. For example:

```python
parameters = {
    'learning_rate': {'values': [0.01, 0.001, 0.0001]},
    'batch_size': {'values': [32, 64, 128]},
    'layer_1_size': {'values': [64, 128, 256]},
    'layer_2_size': {'values': [32, 64, 128]}
}
```

Then we need to create a dictionary with the configuration of the run:

```python
sweep_config = {
    'name': 'mnist-sweep',
    'method': 'grid',   # grid search, you can also try 'random' or 'bayes'
    'metric': {'goal': 'minimize', 'name': 'val_loss'},
    'parameters': parameters,   # that's the dictionary with the hyperparameters
}
```

Finally, we can use the `wandb.sweep` function to perform hyperparameter tuning:

```python
sweep_id = wandb.sweep(sweep_config, project='mnist-classifier')
```

After that, we can finally run the sweep:

```python
wandb.agent(sweep_id, function=train)
```
where `train` is a function that trains the model and logs the metrics to wandb. This function should take a `config` argument, which will contain the hyperparameters of the run. That is how wandb knows which hyperparameters to tune.

1. Rewrite the VAE training loop into a function that takes a single dictionary `parameters` as an argument, initializes the model, optimizer, and criterion, and trains the model for a fixed number of epochs. The function should log the loss and accuracy of the training and validation sets to wandb.
2. Create a dictionary with the hyperparameters that you want to tune.
3. Create a sweep configuration dictionary.
4. Run the sweep and monitor the results in the wandb dashboard.

In [31]:
def train():
    # Initialize the model, loss function, and optimizer
    wandb.init(project='mnist-classifier')
    parameters = wandb.config
    model = NeuralNetwork(
        input_size=parameters.input_size,
        hidden1_size=parameters.hidden1_size,
        hidden2_size=parameters.hidden2_size,
        hidden3_size=parameters.hidden3_size,
        output_size=parameters.output_size,
        dropout_prob=parameters.dropout_prob
    ).to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=parameters.learning_rate)


    # Training loop
    model.train()
    for epoch in range(parameters.epochs):
        train_loss = 0.0
        train_correct = 0
        total_train = 0

        for X_batch, y_batch in train_dataloader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            optimizer.zero_grad()
            y_pred = model(X_batch)
            loss = criterion(y_pred, y_batch)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            _, predicted = torch.max(y_pred.data, 1)
            _, labels = torch.max(y_batch.data, 1)
            train_correct += (predicted == labels).sum().item()
            total_train += labels.size(0)

        train_loss /= len(train_dataloader)
        train_accuracy = train_correct / total_train

        model.eval()
        val_loss = 0.0
        val_correct = 0
        total_val = 0

        with torch.no_grad():
            for X_batch, y_batch in test_dataloader:
                X_batch, y_batch = X_batch.to(device), y_batch.to(device)
                y_pred = model(X_batch)
                loss = criterion(y_pred, y_batch)

                val_loss += loss.item()
                _, predicted = torch.max(y_pred.data, 1)
                _, labels = torch.max(y_batch.data, 1)
                val_correct += (predicted == labels).sum().item()
                total_val += labels.size(0)

        val_loss /= len(test_dataloader)
        val_accuracy = val_correct / total_val

        wandb.log({
            'epoch': epoch + 1,
            'train_loss': train_loss,
            'train_accuracy': train_accuracy,
            'val_loss': val_loss,
            'val_accuracy': val_accuracy
        })

        print(f'Epoch {epoch + 1}/{parameters["epochs"]}, Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}')

    wandb.finish()

In [32]:
parameters = {
    'input_size': {'values': [784]},
    'output_size': {'values': [10]},
    'learning_rate': {'values': [0.01, 0.001, 0.0001]},
    'batch_size': {'values': [16, 32, 64, 128]},
    'hidden1_size': {'values': [64, 128, 256]},
    'hidden2_size': {'values': [32, 64, 128]},
    'hidden3_size': {'values': [16, 32, 64]},
    'dropout_prob': {'values': [0.0, 0.25, 0.5]},
    'epochs': {'values': [5, 10, 15]}
}

sweep_config = {
    'name': 'mnist-sweep',
    'method': 'bayes', # try 'grid' or 'random'
    'metric': {'goal': 'maximize', 'name': 'val_accuracy'}, # if we want to maximize the accuracy
    # remember to log the metric that you want to maximize or minimize!
    'parameters': parameters,
}

sweep_id = wandb.sweep(sweep_config, project='mnist-classifier')    # This will create a new sweep
wandb.agent(sweep_id, function=train)   # This will start the hyperparameter tuning process

Create sweep with ID: dqaqwep8
Sweep URL: https://wandb.ai/igmure/mnist-classifier/sweeps/dqaqwep8


[34m[1mwandb[0m: Agent Starting Run: crqnsdxz with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	hidden1_size: 64
[34m[1mwandb[0m: 	hidden2_size: 64
[34m[1mwandb[0m: 	hidden3_size: 64
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/5, Train Loss: 2.1100, Train Accuracy: 0.3693, Val Loss: 1.8088, Val Accuracy: 0.6664
Epoch 2/5, Train Loss: 1.7032, Train Accuracy: 0.7792, Val Loss: 1.6050, Val Accuracy: 0.8748
Epoch 3/5, Train Loss: 1.5878, Train Accuracy: 0.8873, Val Loss: 1.5688, Val Accuracy: 0.9014
Epoch 4/5, Train Loss: 1.5642, Train Accuracy: 0.9062, Val Loss: 1.5532, Val Accuracy: 0.9159
Epoch 5/5, Train Loss: 1.5511, Train Accuracy: 0.9172, Val Loss: 1.5450, Val Accuracy: 0.9226


0,1
epoch,▁▃▅▆█
train_accuracy,▁▆███
train_loss,█▃▁▁▁
val_accuracy,▁▇▇██
val_loss,█▃▂▁▁

0,1
epoch,5.0
train_accuracy,0.91725
train_loss,1.55113
val_accuracy,0.92259
val_loss,1.54501


[34m[1mwandb[0m: Agent Starting Run: ehh5qkiy with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden1_size: 128
[34m[1mwandb[0m: 	hidden2_size: 128
[34m[1mwandb[0m: 	hidden3_size: 32
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


VBox(children=(Label(value='Waiting for wandb.init()...\r'), FloatProgress(value=0.011113484444447548, max=1.0…

Epoch 1/10, Train Loss: 2.0585, Train Accuracy: 0.4260, Val Loss: 1.7575, Val Accuracy: 0.7219
Epoch 2/10, Train Loss: 1.6317, Train Accuracy: 0.8515, Val Loss: 1.5805, Val Accuracy: 0.8921
Epoch 3/10, Train Loss: 1.5680, Train Accuracy: 0.9032, Val Loss: 1.5573, Val Accuracy: 0.9101
Epoch 4/10, Train Loss: 1.5464, Train Accuracy: 0.9218, Val Loss: 1.5388, Val Accuracy: 0.9265
Epoch 5/10, Train Loss: 1.5334, Train Accuracy: 0.9335, Val Loss: 1.5300, Val Accuracy: 0.9352
Epoch 6/10, Train Loss: 1.5240, Train Accuracy: 0.9417, Val Loss: 1.5228, Val Accuracy: 0.9423
Epoch 7/10, Train Loss: 1.5172, Train Accuracy: 0.9483, Val Loss: 1.5175, Val Accuracy: 0.9473
Epoch 8/10, Train Loss: 1.5118, Train Accuracy: 0.9535, Val Loss: 1.5134, Val Accuracy: 0.9498
Epoch 9/10, Train Loss: 1.5075, Train Accuracy: 0.9571, Val Loss: 1.5114, Val Accuracy: 0.9515
Epoch 10/10, Train Loss: 1.5044, Train Accuracy: 0.9598, Val Loss: 1.5100, Val Accuracy: 0.9526


0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_accuracy,▁▇▇███████
train_loss,█▃▂▂▁▁▁▁▁▁
val_accuracy,▁▆▇▇▇█████
val_loss,█▃▂▂▂▁▁▁▁▁

0,1
epoch,10.0
train_accuracy,0.9598
train_loss,1.50441
val_accuracy,0.9526
val_loss,1.51003


[34m[1mwandb[0m: Agent Starting Run: x7wx2lz1 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	dropout_prob: 0.25
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 128
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 2.0594, Train Accuracy: 0.4250, Val Loss: 1.8072, Val Accuracy: 0.6972
Epoch 2/15, Train Loss: 1.7121, Train Accuracy: 0.7741, Val Loss: 1.6249, Val Accuracy: 0.8549
Epoch 3/15, Train Loss: 1.6087, Train Accuracy: 0.8678, Val Loss: 1.5890, Val Accuracy: 0.8854
Epoch 4/15, Train Loss: 1.5819, Train Accuracy: 0.8901, Val Loss: 1.5716, Val Accuracy: 0.8988
Epoch 5/15, Train Loss: 1.5663, Train Accuracy: 0.9041, Val Loss: 1.5574, Val Accuracy: 0.9118
Epoch 6/15, Train Loss: 1.5551, Train Accuracy: 0.9138, Val Loss: 1.5496, Val Accuracy: 0.9183
Epoch 7/15, Train Loss: 1.5467, Train Accuracy: 0.9217, Val Loss: 1.5435, Val Accuracy: 0.9238
Epoch 8/15, Train Loss: 1.5401, Train Accuracy: 0.9278, Val Loss: 1.5377, Val Accuracy: 0.9292
Epoch 9/15, Train Loss: 1.5347, Train Accuracy: 0.9321, Val Loss: 1.5335, Val Accuracy: 0.9323
Epoch 10/15, Train Loss: 1.5299, Train Accuracy: 0.9369, Val Loss: 1.5302, Val Accuracy: 0.9351
Epoch 11/15, Train Loss: 1.5259, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▆▇▇▇▇█████████
train_loss,█▄▂▂▂▂▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▆▇▇▇▇▇███████
val_loss,█▄▃▂▂▂▂▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.9518
train_loss,1.51365
val_accuracy,0.94779
val_loss,1.517


[34m[1mwandb[0m: Agent Starting Run: 3z58018w with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 128
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 2.2074, Train Accuracy: 0.2340, Val Loss: 1.9864, Val Accuracy: 0.5562
Epoch 2/15, Train Loss: 1.8061, Train Accuracy: 0.6798, Val Loss: 1.6780, Val Accuracy: 0.8126
Epoch 3/15, Train Loss: 1.6531, Train Accuracy: 0.8256, Val Loss: 1.6320, Val Accuracy: 0.8384
Epoch 4/15, Train Loss: 1.6295, Train Accuracy: 0.8398, Val Loss: 1.6189, Val Accuracy: 0.8488
Epoch 5/15, Train Loss: 1.6193, Train Accuracy: 0.8476, Val Loss: 1.6131, Val Accuracy: 0.8523
Epoch 6/15, Train Loss: 1.6126, Train Accuracy: 0.8534, Val Loss: 1.6081, Val Accuracy: 0.8560
Epoch 7/15, Train Loss: 1.5839, Train Accuracy: 0.8853, Val Loss: 1.5548, Val Accuracy: 0.9178
Epoch 8/15, Train Loss: 1.5477, Train Accuracy: 0.9236, Val Loss: 1.5415, Val Accuracy: 0.9276
Epoch 9/15, Train Loss: 1.5367, Train Accuracy: 0.9326, Val Loss: 1.5335, Val Accuracy: 0.9334
Epoch 10/15, Train Loss: 1.5290, Train Accuracy: 0.9386, Val Loss: 1.5316, Val Accuracy: 0.9342
Epoch 11/15, Train Loss: 1.5238, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▅▇▇▇▇▇████████
train_loss,█▄▂▂▂▂▂▁▁▁▁▁▁▁▁
val_accuracy,▁▆▆▆▆▆▇████████
val_loss,█▃▃▃▂▂▂▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.95598
train_loss,1.50949
val_accuracy,0.9506
val_loss,1.51334


[34m[1mwandb[0m: Agent Starting Run: wwt2kupw with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	dropout_prob: 0.25
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 2.0087, Train Accuracy: 0.4908, Val Loss: 1.7135, Val Accuracy: 0.7791
Epoch 2/15, Train Loss: 1.6727, Train Accuracy: 0.8041, Val Loss: 1.6481, Val Accuracy: 0.8208
Epoch 3/15, Train Loss: 1.6014, Train Accuracy: 0.8735, Val Loss: 1.5698, Val Accuracy: 0.9006
Epoch 4/15, Train Loss: 1.5615, Train Accuracy: 0.9088, Val Loss: 1.5520, Val Accuracy: 0.9167
Epoch 5/15, Train Loss: 1.5474, Train Accuracy: 0.9211, Val Loss: 1.5419, Val Accuracy: 0.9252
Epoch 6/15, Train Loss: 1.5382, Train Accuracy: 0.9290, Val Loss: 1.5355, Val Accuracy: 0.9314
Epoch 7/15, Train Loss: 1.5312, Train Accuracy: 0.9354, Val Loss: 1.5298, Val Accuracy: 0.9354
Epoch 8/15, Train Loss: 1.5253, Train Accuracy: 0.9418, Val Loss: 1.5244, Val Accuracy: 0.9403
Epoch 9/15, Train Loss: 1.5204, Train Accuracy: 0.9464, Val Loss: 1.5215, Val Accuracy: 0.9430
Epoch 10/15, Train Loss: 1.5163, Train Accuracy: 0.9495, Val Loss: 1.5185, Val Accuracy: 0.9461
Epoch 11/15, Train Loss: 1.5128, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▆▇▇▇██████████
train_loss,█▃▂▂▂▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▃▆▆▇▇▇▇▇██████
val_loss,█▆▃▃▂▂▂▂▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.96267
train_loss,1.50219
val_accuracy,0.9559
val_loss,1.50793


[34m[1mwandb[0m: Agent Starting Run: 2rh6ay84 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 128
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/10, Train Loss: 2.1001, Train Accuracy: 0.3789, Val Loss: 1.7852, Val Accuracy: 0.7001
Epoch 2/10, Train Loss: 1.6586, Train Accuracy: 0.8251, Val Loss: 1.5727, Val Accuracy: 0.9018
Epoch 3/10, Train Loss: 1.5575, Train Accuracy: 0.9145, Val Loss: 1.5439, Val Accuracy: 0.9245
Epoch 4/10, Train Loss: 1.5360, Train Accuracy: 0.9322, Val Loss: 1.5295, Val Accuracy: 0.9379
Epoch 5/10, Train Loss: 1.5238, Train Accuracy: 0.9430, Val Loss: 1.5215, Val Accuracy: 0.9459
Epoch 6/10, Train Loss: 1.5156, Train Accuracy: 0.9503, Val Loss: 1.5143, Val Accuracy: 0.9511
Epoch 7/10, Train Loss: 1.5096, Train Accuracy: 0.9556, Val Loss: 1.5098, Val Accuracy: 0.9541
Epoch 8/10, Train Loss: 1.5047, Train Accuracy: 0.9606, Val Loss: 1.5047, Val Accuracy: 0.9585
Epoch 9/10, Train Loss: 1.5007, Train Accuracy: 0.9639, Val Loss: 1.5023, Val Accuracy: 0.9614
Epoch 10/10, Train Loss: 1.4975, Train Accuracy: 0.9668, Val Loss: 1.4992, Val Accuracy: 0.9651


0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_accuracy,▁▆▇███████
train_loss,█▃▂▁▁▁▁▁▁▁
val_accuracy,▁▆▇▇▇█████
val_loss,█▃▂▂▂▁▁▁▁▁

0,1
epoch,10.0
train_accuracy,0.96685
train_loss,1.49746
val_accuracy,0.9651
val_loss,1.49915


[34m[1mwandb[0m: Agent Starting Run: bvgbbegd with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 128
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 2.0820, Train Accuracy: 0.3950, Val Loss: 1.7658, Val Accuracy: 0.7555
Epoch 2/15, Train Loss: 1.6169, Train Accuracy: 0.8712, Val Loss: 1.5702, Val Accuracy: 0.9019
Epoch 3/15, Train Loss: 1.5595, Train Accuracy: 0.9106, Val Loss: 1.5469, Val Accuracy: 0.9204
Epoch 4/15, Train Loss: 1.5414, Train Accuracy: 0.9267, Val Loss: 1.5353, Val Accuracy: 0.9323
Epoch 5/15, Train Loss: 1.5301, Train Accuracy: 0.9371, Val Loss: 1.5257, Val Accuracy: 0.9395
Epoch 6/15, Train Loss: 1.5214, Train Accuracy: 0.9444, Val Loss: 1.5200, Val Accuracy: 0.9447
Epoch 7/15, Train Loss: 1.5155, Train Accuracy: 0.9497, Val Loss: 1.5156, Val Accuracy: 0.9497
Epoch 8/15, Train Loss: 1.5106, Train Accuracy: 0.9541, Val Loss: 1.5121, Val Accuracy: 0.9528
Epoch 9/15, Train Loss: 1.5063, Train Accuracy: 0.9584, Val Loss: 1.5087, Val Accuracy: 0.9554
Epoch 10/15, Train Loss: 1.5025, Train Accuracy: 0.9624, Val Loss: 1.5070, Val Accuracy: 0.9558
Epoch 11/15, Train Loss: 1.4993, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▇▇▇███████████
train_loss,█▂▂▂▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▆▇▇▇▇▇████████
val_loss,█▃▂▂▂▂▁▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.97357
train_loss,1.4903
val_accuracy,0.965
val_loss,1.49782


[34m[1mwandb[0m: Agent Starting Run: 540rxu1y with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout_prob: 0.25
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 128
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/10, Train Loss: 1.9209, Train Accuracy: 0.5838, Val Loss: 1.6886, Val Accuracy: 0.7885
Epoch 2/10, Train Loss: 1.6526, Train Accuracy: 0.8187, Val Loss: 1.6307, Val Accuracy: 0.8351
Epoch 3/10, Train Loss: 1.6262, Train Accuracy: 0.8397, Val Loss: 1.6163, Val Accuracy: 0.8488
Epoch 4/10, Train Loss: 1.6150, Train Accuracy: 0.8496, Val Loss: 1.6093, Val Accuracy: 0.8541
Epoch 5/10, Train Loss: 1.6075, Train Accuracy: 0.8564, Val Loss: 1.6042, Val Accuracy: 0.8595
Epoch 6/10, Train Loss: 1.6017, Train Accuracy: 0.8620, Val Loss: 1.6007, Val Accuracy: 0.8626
Epoch 7/10, Train Loss: 1.5973, Train Accuracy: 0.8660, Val Loss: 1.5963, Val Accuracy: 0.8661
Epoch 8/10, Train Loss: 1.5936, Train Accuracy: 0.8691, Val Loss: 1.5948, Val Accuracy: 0.8676
Epoch 9/10, Train Loss: 1.5900, Train Accuracy: 0.8724, Val Loss: 1.5927, Val Accuracy: 0.8689
Epoch 10/10, Train Loss: 1.5871, Train Accuracy: 0.8752, Val Loss: 1.5916, Val Accuracy: 0.8699


0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_accuracy,▁▇▇▇██████
train_loss,█▂▂▂▁▁▁▁▁▁
val_accuracy,▁▅▆▇▇▇████
val_loss,█▄▃▂▂▂▁▁▁▁

0,1
epoch,10.0
train_accuracy,0.87521
train_loss,1.58705
val_accuracy,0.86989
val_loss,1.59155


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: kiibblon with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 2.1780, Train Accuracy: 0.2724, Val Loss: 1.9513, Val Accuracy: 0.6220
Epoch 2/15, Train Loss: 1.7276, Train Accuracy: 0.7646, Val Loss: 1.6638, Val Accuracy: 0.8100
Epoch 3/15, Train Loss: 1.6418, Train Accuracy: 0.8336, Val Loss: 1.5848, Val Accuracy: 0.8971
Epoch 4/15, Train Loss: 1.5726, Train Accuracy: 0.9039, Val Loss: 1.5593, Val Accuracy: 0.9115
Epoch 5/15, Train Loss: 1.5539, Train Accuracy: 0.9178, Val Loss: 1.5464, Val Accuracy: 0.9219
Epoch 6/15, Train Loss: 1.5426, Train Accuracy: 0.9266, Val Loss: 1.5399, Val Accuracy: 0.9280
Epoch 7/15, Train Loss: 1.5339, Train Accuracy: 0.9344, Val Loss: 1.5324, Val Accuracy: 0.9340
Epoch 8/15, Train Loss: 1.5271, Train Accuracy: 0.9408, Val Loss: 1.5268, Val Accuracy: 0.9395
Epoch 9/15, Train Loss: 1.5211, Train Accuracy: 0.9459, Val Loss: 1.5235, Val Accuracy: 0.9425
Epoch 10/15, Train Loss: 1.5166, Train Accuracy: 0.9502, Val Loss: 1.5197, Val Accuracy: 0.9459
Epoch 11/15, Train Loss: 1.5127, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▆▇▇███████████
train_loss,█▃▂▂▂▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▇▇▇▇█████████
val_loss,█▃▂▂▂▂▁▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.9638
train_loss,1.5014
val_accuracy,0.9569
val_loss,1.50742


[34m[1mwandb[0m: Agent Starting Run: wtzd8s9e with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 64
[34m[1mwandb[0m: 	hidden3_size: 16
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 2.1214, Train Accuracy: 0.3528, Val Loss: 1.8587, Val Accuracy: 0.6328
Epoch 2/15, Train Loss: 1.6755, Train Accuracy: 0.8116, Val Loss: 1.5788, Val Accuracy: 0.8966
Epoch 3/15, Train Loss: 1.5672, Train Accuracy: 0.9064, Val Loss: 1.5539, Val Accuracy: 0.9154
Epoch 4/15, Train Loss: 1.5461, Train Accuracy: 0.9229, Val Loss: 1.5390, Val Accuracy: 0.9298
Epoch 5/15, Train Loss: 1.5330, Train Accuracy: 0.9348, Val Loss: 1.5289, Val Accuracy: 0.9370
Epoch 6/15, Train Loss: 1.5237, Train Accuracy: 0.9432, Val Loss: 1.5218, Val Accuracy: 0.9445
Epoch 7/15, Train Loss: 1.5168, Train Accuracy: 0.9497, Val Loss: 1.5185, Val Accuracy: 0.9481
Epoch 8/15, Train Loss: 1.5115, Train Accuracy: 0.9546, Val Loss: 1.5123, Val Accuracy: 0.9521
Epoch 9/15, Train Loss: 1.5072, Train Accuracy: 0.9589, Val Loss: 1.5086, Val Accuracy: 0.9574
Epoch 10/15, Train Loss: 1.5035, Train Accuracy: 0.9619, Val Loss: 1.5076, Val Accuracy: 0.9565
Epoch 11/15, Train Loss: 1.5004, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▆▇▇███████████
train_loss,█▃▂▂▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▇▇▇▇██████████
val_loss,█▃▂▂▂▁▁▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.97203
train_loss,1.49189
val_accuracy,0.9634
val_loss,1.49893


[34m[1mwandb[0m: Agent Starting Run: 5esex6ey with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 128
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 32
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 1.8620, Train Accuracy: 0.6113, Val Loss: 1.5662, Val Accuracy: 0.9051
Epoch 2/15, Train Loss: 1.5399, Train Accuracy: 0.9235, Val Loss: 1.5229, Val Accuracy: 0.9397
Epoch 3/15, Train Loss: 1.5168, Train Accuracy: 0.9454, Val Loss: 1.5099, Val Accuracy: 0.9513
Epoch 4/15, Train Loss: 1.5052, Train Accuracy: 0.9565, Val Loss: 1.5064, Val Accuracy: 0.9545
Epoch 5/15, Train Loss: 1.4990, Train Accuracy: 0.9625, Val Loss: 1.4969, Val Accuracy: 0.9648
Epoch 6/15, Train Loss: 1.4943, Train Accuracy: 0.9672, Val Loss: 1.4979, Val Accuracy: 0.9640
Epoch 7/15, Train Loss: 1.4915, Train Accuracy: 0.9699, Val Loss: 1.4950, Val Accuracy: 0.9662
Epoch 8/15, Train Loss: 1.4895, Train Accuracy: 0.9715, Val Loss: 1.4958, Val Accuracy: 0.9652
Epoch 9/15, Train Loss: 1.4871, Train Accuracy: 0.9742, Val Loss: 1.4938, Val Accuracy: 0.9670
Epoch 10/15, Train Loss: 1.4864, Train Accuracy: 0.9749, Val Loss: 1.4953, Val Accuracy: 0.9655
Epoch 11/15, Train Loss: 1.4844, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▇▇████████████
train_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▆▆▇▇▇▇▇▇█▇███
val_loss,█▄▃▃▂▂▂▂▂▂▁▂▁▁▁

0,1
epoch,15.0
train_accuracy,0.9798
train_loss,1.4814
val_accuracy,0.9707
val_loss,1.49023


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
500 response executing GraphQL.
{"errors":[{"message":"Post \"http://anaconda2.default.svc.cluster.local/search\": read tcp 10.54.77.4:43546-\u003e10.55.247.53:80: read: connection reset by peer","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
[34m[1mwandb[0m: [32m[41mERROR[0m Error while calling W&B API: Post "http://anaconda2.default.svc.cluster.local/search": read tcp 10.54.77.4:43546->10.55.247.53:80: read: connection reset by peer (<Response [500]>)
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: awi0lajp with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 128
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 64
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 1.8124, Train Accuracy: 0.6624, Val Loss: 1.5581, Val Accuracy: 0.9044
Epoch 2/15, Train Loss: 1.5347, Train Accuracy: 0.9279, Val Loss: 1.5175, Val Accuracy: 0.9440
Epoch 3/15, Train Loss: 1.5126, Train Accuracy: 0.9493, Val Loss: 1.5097, Val Accuracy: 0.9518
Epoch 4/15, Train Loss: 1.5035, Train Accuracy: 0.9578, Val Loss: 1.5029, Val Accuracy: 0.9581
Epoch 5/15, Train Loss: 1.4976, Train Accuracy: 0.9639, Val Loss: 1.5003, Val Accuracy: 0.9608
Epoch 6/15, Train Loss: 1.4951, Train Accuracy: 0.9662, Val Loss: 1.4946, Val Accuracy: 0.9666
Epoch 7/15, Train Loss: 1.4919, Train Accuracy: 0.9693, Val Loss: 1.4967, Val Accuracy: 0.9642
Epoch 8/15, Train Loss: 1.4897, Train Accuracy: 0.9716, Val Loss: 1.4950, Val Accuracy: 0.9663
Epoch 9/15, Train Loss: 1.4897, Train Accuracy: 0.9713, Val Loss: 1.4979, Val Accuracy: 0.9627
Epoch 10/15, Train Loss: 1.4871, Train Accuracy: 0.9742, Val Loss: 1.4920, Val Accuracy: 0.9693
Epoch 11/15, Train Loss: 1.4875, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▇▇████████████
train_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▆▇▇█▇█▇██▇██▇
val_loss,█▄▃▂▂▁▂▁▂▁▁▂▁▁▂

0,1
epoch,15.0
train_accuracy,0.97633
train_loss,1.4848
val_accuracy,0.9634
val_loss,1.49787


[34m[1mwandb[0m: Agent Starting Run: lps2ql2t with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 32
[34m[1mwandb[0m: 	hidden3_size: 32
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 1.8280, Train Accuracy: 0.6463, Val Loss: 1.5515, Val Accuracy: 0.9142
Epoch 2/15, Train Loss: 1.5298, Train Accuracy: 0.9328, Val Loss: 1.5179, Val Accuracy: 0.9435
Epoch 3/15, Train Loss: 1.5086, Train Accuracy: 0.9530, Val Loss: 1.5000, Val Accuracy: 0.9615
Epoch 4/15, Train Loss: 1.4995, Train Accuracy: 0.9620, Val Loss: 1.4995, Val Accuracy: 0.9621
Epoch 5/15, Train Loss: 1.4929, Train Accuracy: 0.9687, Val Loss: 1.4971, Val Accuracy: 0.9633
Epoch 6/15, Train Loss: 1.4906, Train Accuracy: 0.9708, Val Loss: 1.4931, Val Accuracy: 0.9684
Epoch 7/15, Train Loss: 1.4881, Train Accuracy: 0.9731, Val Loss: 1.4909, Val Accuracy: 0.9703
Epoch 8/15, Train Loss: 1.4856, Train Accuracy: 0.9756, Val Loss: 1.4904, Val Accuracy: 0.9705
Epoch 9/15, Train Loss: 1.4839, Train Accuracy: 0.9772, Val Loss: 1.4896, Val Accuracy: 0.9712
Epoch 10/15, Train Loss: 1.4839, Train Accuracy: 0.9774, Val Loss: 1.4887, Val Accuracy: 0.9723
Epoch 11/15, Train Loss: 1.4830, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▇▇████████████
train_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▇▇▇▇█████▇███
val_loss,█▄▂▂▂▂▁▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.98062
train_loss,1.48035
val_accuracy,0.9726
val_loss,1.48845


[34m[1mwandb[0m: Agent Starting Run: 50882kej with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	dropout_prob: 0.25
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 64
[34m[1mwandb[0m: 	hidden3_size: 64
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 1.6288, Train Accuracy: 0.8413, Val Loss: 1.5331, Val Accuracy: 0.9275
Epoch 2/15, Train Loss: 1.5182, Train Accuracy: 0.9439, Val Loss: 1.5064, Val Accuracy: 0.9552
Epoch 3/15, Train Loss: 1.5048, Train Accuracy: 0.9570, Val Loss: 1.5051, Val Accuracy: 0.9562
Epoch 4/15, Train Loss: 1.4974, Train Accuracy: 0.9640, Val Loss: 1.4971, Val Accuracy: 0.9638
Epoch 5/15, Train Loss: 1.4942, Train Accuracy: 0.9670, Val Loss: 1.5031, Val Accuracy: 0.9577
Epoch 6/15, Train Loss: 1.4917, Train Accuracy: 0.9695, Val Loss: 1.4948, Val Accuracy: 0.9664
Epoch 7/15, Train Loss: 1.4906, Train Accuracy: 0.9706, Val Loss: 1.4898, Val Accuracy: 0.9711
Epoch 8/15, Train Loss: 1.4889, Train Accuracy: 0.9721, Val Loss: 1.4924, Val Accuracy: 0.9688
Epoch 9/15, Train Loss: 1.4884, Train Accuracy: 0.9728, Val Loss: 1.4906, Val Accuracy: 0.9702
Epoch 10/15, Train Loss: 1.4871, Train Accuracy: 0.9740, Val Loss: 1.4948, Val Accuracy: 0.9660
Epoch 11/15, Train Loss: 1.4877, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▆▇▇███████████
train_loss,█▃▂▂▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▅▇▆▇█▇█▇▇█▇█▇
val_loss,█▄▄▂▃▂▁▂▁▂▂▁▂▁▂

0,1
epoch,15.0
train_accuracy,0.97535
train_loss,1.48573
val_accuracy,0.967
val_loss,1.4941


[34m[1mwandb[0m: Agent Starting Run: fyu7f9iq with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 64
[34m[1mwandb[0m: 	hidden3_size: 64
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 1.7138, Train Accuracy: 0.7580, Val Loss: 1.5405, Val Accuracy: 0.9211
Epoch 2/15, Train Loss: 1.5270, Train Accuracy: 0.9345, Val Loss: 1.5131, Val Accuracy: 0.9488
Epoch 3/15, Train Loss: 1.5077, Train Accuracy: 0.9540, Val Loss: 1.5031, Val Accuracy: 0.9589
Epoch 4/15, Train Loss: 1.5008, Train Accuracy: 0.9605, Val Loss: 1.5034, Val Accuracy: 0.9575
Epoch 5/15, Train Loss: 1.4956, Train Accuracy: 0.9657, Val Loss: 1.4924, Val Accuracy: 0.9690
Epoch 6/15, Train Loss: 1.4934, Train Accuracy: 0.9678, Val Loss: 1.4959, Val Accuracy: 0.9652
Epoch 7/15, Train Loss: 1.4918, Train Accuracy: 0.9693, Val Loss: 1.4948, Val Accuracy: 0.9660
Epoch 8/15, Train Loss: 1.4894, Train Accuracy: 0.9718, Val Loss: 1.4964, Val Accuracy: 0.9646
Epoch 9/15, Train Loss: 1.4898, Train Accuracy: 0.9714, Val Loss: 1.4929, Val Accuracy: 0.9683
Epoch 10/15, Train Loss: 1.4886, Train Accuracy: 0.9725, Val Loss: 1.4917, Val Accuracy: 0.9695
Epoch 11/15, Train Loss: 1.4863, Train Accuracy: 

0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▇▇████████████
train_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▆▆█▇▇▇████▆██
val_loss,█▄▃▃▁▂▁▂▁▁▁▁▃▁▁

0,1
epoch,15.0
train_accuracy,0.97392
train_loss,1.48715
val_accuracy,0.968
val_loss,1.49266


[34m[1mwandb[0m: Agent Starting Run: e7ifi15n with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	dropout_prob: 0.5
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	hidden1_size: 256
[34m[1mwandb[0m: 	hidden2_size: 64
[34m[1mwandb[0m: 	hidden3_size: 64
[34m[1mwandb[0m: 	input_size: 784
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	output_size: 10


Epoch 1/15, Train Loss: 1.9956, Train Accuracy: 0.5081, Val Loss: 1.6904, Val Accuracy: 0.7939
Epoch 2/15, Train Loss: 1.6153, Train Accuracy: 0.8612, Val Loss: 1.5692, Val Accuracy: 0.9019
Epoch 3/15, Train Loss: 1.5605, Train Accuracy: 0.9084, Val Loss: 1.5482, Val Accuracy: 0.9199
Epoch 4/15, Train Loss: 1.5423, Train Accuracy: 0.9250, Val Loss: 1.5367, Val Accuracy: 0.9295
Epoch 5/15, Train Loss: 1.5303, Train Accuracy: 0.9353, Val Loss: 1.5292, Val Accuracy: 0.9365
Epoch 6/15, Train Loss: 1.5206, Train Accuracy: 0.9449, Val Loss: 1.5188, Val Accuracy: 0.9456
Epoch 7/15, Train Loss: 1.5137, Train Accuracy: 0.9511, Val Loss: 1.5141, Val Accuracy: 0.9495
Epoch 8/15, Train Loss: 1.5082, Train Accuracy: 0.9566, Val Loss: 1.5093, Val Accuracy: 0.9540
Epoch 9/15, Train Loss: 1.5039, Train Accuracy: 0.9602, Val Loss: 1.5057, Val Accuracy: 0.9578


[34m[1mwandb[0m: Ctrl + C detected. Stopping sweep.


Epoch 10/15, Train Loss: 1.5000, Train Accuracy: 0.9637, Val Loss: 1.5035, Val Accuracy: 0.9594
Epoch 11/15, Train Loss: 1.4967, Train Accuracy: 0.9670, Val Loss: 1.5012, Val Accuracy: 0.9613
Epoch 12/15, Train Loss: 1.4941, Train Accuracy: 0.9698, Val Loss: 1.5006, Val Accuracy: 0.9616
Epoch 13/15, Train Loss: 1.4916, Train Accuracy: 0.9720, Val Loss: 1.4983, Val Accuracy: 0.9639
Epoch 14/15, Train Loss: 1.4892, Train Accuracy: 0.9744, Val Loss: 1.4969, Val Accuracy: 0.9646
Epoch 15/15, Train Loss: 1.4873, Train Accuracy: 0.9761, Val Loss: 1.4958, Val Accuracy: 0.9666


0,1
epoch,▁▁▂▃▃▃▄▅▅▅▆▇▇▇█
train_accuracy,▁▆▇▇▇██████████
train_loss,█▃▂▂▂▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▅▆▆▇▇▇▇███████
val_loss,█▄▃▂▂▂▂▁▁▁▁▁▁▁▁

0,1
epoch,15.0
train_accuracy,0.97607
train_loss,1.48728
val_accuracy,0.9666
val_loss,1.49577
