https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/001_first.html
https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/005_visualization.html

In [1]:
import pickle
import numpy as np
import optuna

from sgd_vanilla import NeuralNetwork

rng = np.random.default_rng()

with open("../data/train_data.pkl", "rb") as train_file:
    train_data = pickle.load(train_file)

rng.shuffle(train_data)
validation_data = train_data[:5000]
train_data = train_data[5000:]

In [2]:
def objective(trial):
    learning_rate = trial.suggest_float("learning rate", 1e-4, 1e-3, log=True)
    nn = NeuralNetwork(
        [28*28, 512, 128, 64, 10], # layers size
        learning_rate,             # learning rate
        64,                        # mini batch size
        5                          # training epochs
    )
    accuracy, _ = nn.train(train_data, validation_data)
    return accuracy[-1]

In [None]:
study = optuna.create_study()
study.optimize(objective, n_trials=25)

[I 2024-06-12 13:23:46,705] A new study created in memory with name: no-name-82ffe1b3-dae1-4c85-8043-ddeac76cb5ce


Pre-train stats
==> Accuracy: 10.1%, Avg loss: 9.000728

Epoch 0
--------------------
loss: 8.866458 [mini-batch 0 / 859]
loss: 0.881842 [mini-batch 100 / 859]
loss: 0.659893 [mini-batch 200 / 859]
loss: 0.530655 [mini-batch 300 / 859]
loss: 0.700067 [mini-batch 400 / 859]
loss: 0.367362 [mini-batch 500 / 859]
loss: 0.435664 [mini-batch 600 / 859]
loss: 0.455259 [mini-batch 700 / 859]
loss: 0.923461 [mini-batch 800 / 859]
==> Accuracy: 80.4%, Avg loss: 0.541495

Epoch 1
--------------------
loss: 0.518105 [mini-batch 0 / 859]
loss: 0.611965 [mini-batch 100 / 859]
loss: 0.647817 [mini-batch 200 / 859]
loss: 0.477707 [mini-batch 300 / 859]
loss: 0.771698 [mini-batch 400 / 859]
loss: 0.575689 [mini-batch 500 / 859]
loss: 0.391102 [mini-batch 600 / 859]
loss: 0.491002 [mini-batch 700 / 859]
loss: 0.374999 [mini-batch 800 / 859]
==> Accuracy: 83.4%, Avg loss: 0.472870

Epoch 2
--------------------
loss: 0.553387 [mini-batch 0 / 859]
loss: 0.456707 [mini-batch 100 / 859]
loss: 0.636097 [mini

[I 2024-06-12 13:38:33,487] Trial 0 finished with value: 0.8512 and parameters: {'learning rate': 0.0004437163581561674}. Best is trial 0 with value: 0.8512.


==> Accuracy: 85.1%, Avg loss: 0.431644

Pre-train stats
==> Accuracy: 10.3%, Avg loss: 4.562168

Epoch 0
--------------------
loss: 4.820167 [mini-batch 0 / 859]
loss: 0.936026 [mini-batch 100 / 859]
loss: 0.715067 [mini-batch 200 / 859]
loss: 0.780578 [mini-batch 300 / 859]
loss: 0.696230 [mini-batch 400 / 859]
loss: 0.538574 [mini-batch 500 / 859]
loss: 0.688270 [mini-batch 600 / 859]
loss: 0.594144 [mini-batch 700 / 859]
loss: 0.434258 [mini-batch 800 / 859]
==> Accuracy: 80.1%, Avg loss: 0.562122

Epoch 1
--------------------
loss: 0.598184 [mini-batch 0 / 859]
loss: 0.442537 [mini-batch 100 / 859]
loss: 0.491224 [mini-batch 200 / 859]
loss: 0.559158 [mini-batch 300 / 859]
loss: 0.504514 [mini-batch 400 / 859]
loss: 0.387381 [mini-batch 500 / 859]
loss: 0.365784 [mini-batch 600 / 859]
loss: 0.391437 [mini-batch 700 / 859]
loss: 0.443364 [mini-batch 800 / 859]
==> Accuracy: 83.8%, Avg loss: 0.481733

Epoch 2
--------------------
loss: 0.287478 [mini-batch 0 / 859]
loss: 0.309082 [m

[I 2024-06-12 13:54:27,728] Trial 1 finished with value: 0.8604 and parameters: {'learning rate': 0.0004514440280308172}. Best is trial 0 with value: 0.8512.


==> Accuracy: 86.0%, Avg loss: 0.420970

Pre-train stats
==> Accuracy: 12.0%, Avg loss: 6.242573

Epoch 0
--------------------
loss: 5.966149 [mini-batch 0 / 859]
loss: 1.047390 [mini-batch 100 / 859]
loss: 0.799829 [mini-batch 200 / 859]
loss: 0.794129 [mini-batch 300 / 859]
loss: 0.811532 [mini-batch 400 / 859]
loss: 0.764422 [mini-batch 500 / 859]
loss: 0.692177 [mini-batch 600 / 859]
loss: 0.723001 [mini-batch 700 / 859]
loss: 0.701173 [mini-batch 800 / 859]
==> Accuracy: 80.1%, Avg loss: 0.588663

Epoch 1
--------------------
loss: 0.640417 [mini-batch 0 / 859]
loss: 0.672681 [mini-batch 100 / 859]
loss: 0.777295 [mini-batch 200 / 859]
loss: 0.403340 [mini-batch 300 / 859]
loss: 0.678557 [mini-batch 400 / 859]
loss: 0.617411 [mini-batch 500 / 859]
loss: 0.458200 [mini-batch 600 / 859]
loss: 0.630584 [mini-batch 700 / 859]
loss: 0.445764 [mini-batch 800 / 859]
==> Accuracy: 81.5%, Avg loss: 0.545493

Epoch 2
--------------------
loss: 0.482509 [mini-batch 0 / 859]
loss: 0.505648 [m

[I 2024-06-12 14:09:19,371] Trial 2 finished with value: 0.8408 and parameters: {'learning rate': 0.00020866599733371298}. Best is trial 2 with value: 0.8408.


==> Accuracy: 84.1%, Avg loss: 0.468643

Pre-train stats
==> Accuracy: 10.1%, Avg loss: 7.797148

Epoch 0
--------------------
loss: 6.925289 [mini-batch 0 / 859]
loss: 1.524329 [mini-batch 100 / 859]
loss: 1.068945 [mini-batch 200 / 859]
loss: 0.906683 [mini-batch 300 / 859]
loss: 0.961782 [mini-batch 400 / 859]
loss: 0.738598 [mini-batch 500 / 859]
loss: 0.616360 [mini-batch 600 / 859]
loss: 0.519327 [mini-batch 700 / 859]
loss: 0.680749 [mini-batch 800 / 859]
==> Accuracy: 79.2%, Avg loss: 0.598241

Epoch 1
--------------------
loss: 0.750898 [mini-batch 0 / 859]
loss: 0.738507 [mini-batch 100 / 859]
loss: 0.412878 [mini-batch 200 / 859]
loss: 0.857545 [mini-batch 300 / 859]
loss: 0.622967 [mini-batch 400 / 859]
loss: 0.620825 [mini-batch 500 / 859]
loss: 0.493126 [mini-batch 600 / 859]
loss: 0.672778 [mini-batch 700 / 859]
loss: 0.549215 [mini-batch 800 / 859]
==> Accuracy: 81.8%, Avg loss: 0.536513

Epoch 2
--------------------
loss: 0.520314 [mini-batch 0 / 859]
loss: 0.708093 [m

[I 2024-06-12 14:25:33,808] Trial 3 finished with value: 0.831 and parameters: {'learning rate': 0.00013627600193272257}. Best is trial 3 with value: 0.831.


==> Accuracy: 83.1%, Avg loss: 0.489092

Pre-train stats
==> Accuracy: 8.8%, Avg loss: 6.087781

Epoch 0
--------------------
loss: 6.498596 [mini-batch 0 / 859]
loss: 1.837100 [mini-batch 100 / 859]
loss: 1.176148 [mini-batch 200 / 859]
loss: 0.754956 [mini-batch 300 / 859]
loss: 0.954912 [mini-batch 400 / 859]
loss: 0.717914 [mini-batch 500 / 859]
loss: 0.589918 [mini-batch 600 / 859]
loss: 0.416977 [mini-batch 700 / 859]
loss: 0.662961 [mini-batch 800 / 859]
==> Accuracy: 80.1%, Avg loss: 0.568350

Epoch 1
--------------------
loss: 0.477265 [mini-batch 0 / 859]
loss: 0.572798 [mini-batch 100 / 859]
loss: 0.542463 [mini-batch 200 / 859]
loss: 0.619687 [mini-batch 300 / 859]
loss: 0.606729 [mini-batch 400 / 859]
loss: 0.533448 [mini-batch 500 / 859]
loss: 0.469538 [mini-batch 600 / 859]
loss: 0.555447 [mini-batch 700 / 859]
loss: 0.440747 [mini-batch 800 / 859]
==> Accuracy: 82.5%, Avg loss: 0.513392

Epoch 2
--------------------
loss: 0.541072 [mini-batch 0 / 859]
loss: 0.380361 [mi

[I 2024-06-12 14:40:42,625] Trial 4 finished with value: 0.8416 and parameters: {'learning rate': 0.000287994693976329}. Best is trial 3 with value: 0.831.


==> Accuracy: 84.2%, Avg loss: 0.467387

Pre-train stats
==> Accuracy: 13.4%, Avg loss: 8.188748

Epoch 0
--------------------
loss: 7.201646 [mini-batch 0 / 859]
loss: 0.891333 [mini-batch 100 / 859]
loss: 0.598788 [mini-batch 200 / 859]
loss: 0.790441 [mini-batch 300 / 859]
loss: 0.680422 [mini-batch 400 / 859]
loss: 0.500444 [mini-batch 500 / 859]
loss: 0.627865 [mini-batch 600 / 859]
loss: 0.567348 [mini-batch 700 / 859]
loss: 0.555104 [mini-batch 800 / 859]
==> Accuracy: 82.9%, Avg loss: 0.499266

Epoch 1
--------------------
loss: 0.454355 [mini-batch 0 / 859]
loss: 0.480871 [mini-batch 100 / 859]
loss: 0.664678 [mini-batch 200 / 859]
loss: 0.629459 [mini-batch 300 / 859]
loss: 0.544381 [mini-batch 400 / 859]
loss: 0.648892 [mini-batch 500 / 859]
loss: 0.391485 [mini-batch 600 / 859]
loss: 0.551259 [mini-batch 700 / 859]
loss: 0.498753 [mini-batch 800 / 859]
==> Accuracy: 84.2%, Avg loss: 0.458844

Epoch 2
--------------------
loss: 0.816084 [mini-batch 0 / 859]
loss: 0.588929 [m

[I 2024-06-12 14:56:23,065] Trial 5 finished with value: 0.8542 and parameters: {'learning rate': 0.0005816055034884646}. Best is trial 3 with value: 0.831.


==> Accuracy: 85.4%, Avg loss: 0.429201

Pre-train stats
==> Accuracy: 9.2%, Avg loss: 6.191231

Epoch 0
--------------------
loss: 6.965971 [mini-batch 0 / 859]
loss: 1.462080 [mini-batch 100 / 859]
loss: 0.861791 [mini-batch 200 / 859]
loss: 0.799740 [mini-batch 300 / 859]
loss: 0.755628 [mini-batch 400 / 859]
loss: 0.833375 [mini-batch 500 / 859]
loss: 1.014141 [mini-batch 600 / 859]
loss: 0.634485 [mini-batch 700 / 859]
loss: 0.485032 [mini-batch 800 / 859]
==> Accuracy: 78.7%, Avg loss: 0.645980

Epoch 1
--------------------
loss: 0.533410 [mini-batch 0 / 859]
loss: 0.792100 [mini-batch 100 / 859]
loss: 0.751857 [mini-batch 200 / 859]
loss: 0.503681 [mini-batch 300 / 859]
loss: 0.445223 [mini-batch 400 / 859]
loss: 0.305877 [mini-batch 500 / 859]
loss: 0.814349 [mini-batch 600 / 859]
loss: 0.645560 [mini-batch 700 / 859]
loss: 0.489360 [mini-batch 800 / 859]
==> Accuracy: 80.7%, Avg loss: 0.574781

Epoch 2
--------------------
loss: 0.299847 [mini-batch 0 / 859]
loss: 0.456690 [mi

[I 2024-06-12 15:12:00,664] Trial 6 finished with value: 0.8254 and parameters: {'learning rate': 0.0001364991280059422}. Best is trial 6 with value: 0.8254.


==> Accuracy: 82.5%, Avg loss: 0.507367

Pre-train stats
==> Accuracy: 10.4%, Avg loss: 11.084662

Epoch 0
--------------------
loss: 9.041826 [mini-batch 0 / 859]
loss: 0.779126 [mini-batch 100 / 859]
loss: 0.616738 [mini-batch 200 / 859]
loss: 0.595445 [mini-batch 300 / 859]
loss: 0.608927 [mini-batch 400 / 859]
loss: 0.621036 [mini-batch 500 / 859]
loss: 0.611373 [mini-batch 600 / 859]
loss: 0.568117 [mini-batch 700 / 859]
loss: 0.534933 [mini-batch 800 / 859]
==> Accuracy: 83.1%, Avg loss: 0.498324

Epoch 1
--------------------
loss: 0.671123 [mini-batch 0 / 859]
loss: 0.431759 [mini-batch 100 / 859]
loss: 0.536215 [mini-batch 200 / 859]
loss: 0.526610 [mini-batch 300 / 859]
loss: 0.417372 [mini-batch 400 / 859]
loss: 0.549309 [mini-batch 500 / 859]
loss: 0.529299 [mini-batch 600 / 859]
loss: 0.605050 [mini-batch 700 / 859]
loss: 0.472748 [mini-batch 800 / 859]
==> Accuracy: 83.7%, Avg loss: 0.466718

Epoch 2
--------------------
loss: 0.763190 [mini-batch 0 / 859]
loss: 0.549931 [

[I 2024-06-12 15:27:51,821] Trial 7 finished with value: 0.8276 and parameters: {'learning rate': 0.000903002305155762}. Best is trial 6 with value: 0.8254.


==> Accuracy: 82.8%, Avg loss: 0.486249

Pre-train stats
==> Accuracy: 3.5%, Avg loss: 9.136259

Epoch 0
--------------------
loss: 9.485009 [mini-batch 0 / 859]
loss: 0.687885 [mini-batch 100 / 859]
loss: 0.705041 [mini-batch 200 / 859]
loss: 0.794209 [mini-batch 300 / 859]
loss: 0.549363 [mini-batch 400 / 859]
loss: 0.550709 [mini-batch 500 / 859]
loss: 0.684033 [mini-batch 600 / 859]
loss: 0.426324 [mini-batch 700 / 859]
loss: 0.347384 [mini-batch 800 / 859]
==> Accuracy: 81.9%, Avg loss: 0.519501

Epoch 1
--------------------
loss: 0.359496 [mini-batch 0 / 859]
loss: 0.442184 [mini-batch 100 / 859]
loss: 0.426862 [mini-batch 200 / 859]
loss: 0.652791 [mini-batch 300 / 859]
loss: 0.565212 [mini-batch 400 / 859]
loss: 0.391590 [mini-batch 500 / 859]
loss: 0.370072 [mini-batch 600 / 859]
loss: 0.291972 [mini-batch 700 / 859]
loss: 0.397506 [mini-batch 800 / 859]
==> Accuracy: 83.0%, Avg loss: 0.481632

Epoch 2
--------------------
loss: 0.618995 [mini-batch 0 / 859]
loss: 0.452287 [mi

[I 2024-06-12 15:43:52,977] Trial 8 finished with value: 0.8414 and parameters: {'learning rate': 0.0004574410149513964}. Best is trial 6 with value: 0.8254.


==> Accuracy: 84.1%, Avg loss: 0.436923

Pre-train stats
==> Accuracy: 11.3%, Avg loss: 6.101556

Epoch 0
--------------------
loss: 5.834920 [mini-batch 0 / 859]
loss: 0.953763 [mini-batch 100 / 859]
loss: 0.901686 [mini-batch 200 / 859]
loss: 0.556840 [mini-batch 300 / 859]
loss: 0.581588 [mini-batch 400 / 859]
loss: 0.528405 [mini-batch 500 / 859]
loss: 0.546244 [mini-batch 600 / 859]
loss: 0.607057 [mini-batch 700 / 859]
loss: 0.493144 [mini-batch 800 / 859]
==> Accuracy: 81.8%, Avg loss: 0.526259

Epoch 1
--------------------
loss: 0.565667 [mini-batch 0 / 859]
loss: 0.641366 [mini-batch 100 / 859]
loss: 0.568173 [mini-batch 200 / 859]
loss: 0.567551 [mini-batch 300 / 859]
loss: 0.483608 [mini-batch 400 / 859]
loss: 0.471723 [mini-batch 500 / 859]
loss: 0.480278 [mini-batch 600 / 859]
loss: 0.500612 [mini-batch 700 / 859]
loss: 0.611594 [mini-batch 800 / 859]
==> Accuracy: 82.9%, Avg loss: 0.490841

Epoch 2
--------------------
loss: 0.501723 [mini-batch 0 / 859]
loss: 0.345873 [m

[I 2024-06-12 15:59:38,867] Trial 9 finished with value: 0.8534 and parameters: {'learning rate': 0.000724521465319238}. Best is trial 6 with value: 0.8254.


==> Accuracy: 85.3%, Avg loss: 0.430354

Pre-train stats
==> Accuracy: 10.3%, Avg loss: 8.582256

Epoch 0
--------------------
loss: 8.813116 [mini-batch 0 / 859]
loss: 1.228042 [mini-batch 100 / 859]
loss: 0.858859 [mini-batch 200 / 859]
loss: 0.863346 [mini-batch 300 / 859]
loss: 0.780993 [mini-batch 400 / 859]
loss: 0.800408 [mini-batch 500 / 859]
loss: 0.636946 [mini-batch 600 / 859]
loss: 0.752505 [mini-batch 700 / 859]
loss: 1.169689 [mini-batch 800 / 859]
==> Accuracy: 78.4%, Avg loss: 0.651013

Epoch 1
--------------------
loss: 0.538985 [mini-batch 0 / 859]
loss: 0.916497 [mini-batch 100 / 859]
loss: 0.714216 [mini-batch 200 / 859]
loss: 0.546618 [mini-batch 300 / 859]
loss: 0.559702 [mini-batch 400 / 859]
loss: 0.563504 [mini-batch 500 / 859]
loss: 0.607393 [mini-batch 600 / 859]
loss: 0.643787 [mini-batch 700 / 859]
loss: 0.942644 [mini-batch 800 / 859]
==> Accuracy: 81.0%, Avg loss: 0.571775

Epoch 2
--------------------
loss: 0.601617 [mini-batch 0 / 859]
loss: 0.860419 [m

[I 2024-06-12 16:14:40,153] Trial 10 finished with value: 0.8274 and parameters: {'learning rate': 0.00010762376532531508}. Best is trial 6 with value: 0.8254.


==> Accuracy: 82.7%, Avg loss: 0.504456

Pre-train stats
==> Accuracy: 10.1%, Avg loss: 12.304446

Epoch 0
--------------------
loss: 13.407714 [mini-batch 0 / 859]
loss: 1.608603 [mini-batch 100 / 859]
loss: 1.048988 [mini-batch 200 / 859]
loss: 0.915929 [mini-batch 300 / 859]
loss: 1.123023 [mini-batch 400 / 859]
loss: 0.745810 [mini-batch 500 / 859]
loss: 0.866037 [mini-batch 600 / 859]
loss: 0.810282 [mini-batch 700 / 859]
loss: 1.092426 [mini-batch 800 / 859]
==> Accuracy: 75.6%, Avg loss: 0.691549

Epoch 1
--------------------
loss: 0.547916 [mini-batch 0 / 859]
loss: 0.567841 [mini-batch 100 / 859]
loss: 0.769707 [mini-batch 200 / 859]
loss: 0.709317 [mini-batch 300 / 859]
loss: 0.614541 [mini-batch 400 / 859]
loss: 0.561203 [mini-batch 500 / 859]
loss: 0.619512 [mini-batch 600 / 859]
loss: 0.484621 [mini-batch 700 / 859]
loss: 0.637933 [mini-batch 800 / 859]
==> Accuracy: 80.3%, Avg loss: 0.598218

Epoch 2
--------------------
loss: 0.821456 [mini-batch 0 / 859]
loss: 0.637292 

[I 2024-06-12 16:29:31,285] Trial 11 finished with value: 0.827 and parameters: {'learning rate': 0.00010431954414663995}. Best is trial 6 with value: 0.8254.


==> Accuracy: 82.7%, Avg loss: 0.512641

Pre-train stats
==> Accuracy: 18.9%, Avg loss: 5.829361

Epoch 0
--------------------
loss: 4.815021 [mini-batch 0 / 859]
loss: 0.883402 [mini-batch 100 / 859]
loss: 0.773169 [mini-batch 200 / 859]
loss: 0.748894 [mini-batch 300 / 859]
loss: 1.097523 [mini-batch 400 / 859]
loss: 0.864094 [mini-batch 500 / 859]
loss: 0.592872 [mini-batch 600 / 859]
loss: 0.695615 [mini-batch 700 / 859]
loss: 0.739422 [mini-batch 800 / 859]
==> Accuracy: 79.3%, Avg loss: 0.634386

Epoch 1
--------------------
loss: 0.634315 [mini-batch 0 / 859]
loss: 0.949971 [mini-batch 100 / 859]
loss: 0.439669 [mini-batch 200 / 859]
loss: 0.966108 [mini-batch 300 / 859]
loss: 0.647790 [mini-batch 400 / 859]
loss: 0.729302 [mini-batch 500 / 859]
loss: 0.547869 [mini-batch 600 / 859]
loss: 0.581580 [mini-batch 700 / 859]
loss: 0.421671 [mini-batch 800 / 859]
==> Accuracy: 81.3%, Avg loss: 0.559901

Epoch 2
--------------------
loss: 0.439477 [mini-batch 0 / 859]
loss: 0.669086 [m

[I 2024-06-12 16:44:38,849] Trial 12 finished with value: 0.8402 and parameters: {'learning rate': 0.00016933381001221494}. Best is trial 6 with value: 0.8254.


==> Accuracy: 84.0%, Avg loss: 0.481168

Pre-train stats
==> Accuracy: 15.8%, Avg loss: 7.188771

Epoch 0
--------------------
loss: 7.227539 [mini-batch 0 / 859]
loss: 1.328421 [mini-batch 100 / 859]
loss: 1.005189 [mini-batch 200 / 859]
loss: 0.978880 [mini-batch 300 / 859]
loss: 0.829907 [mini-batch 400 / 859]
loss: 0.569535 [mini-batch 500 / 859]
loss: 1.066597 [mini-batch 600 / 859]
loss: 0.606941 [mini-batch 700 / 859]
loss: 0.854316 [mini-batch 800 / 859]
==> Accuracy: 77.0%, Avg loss: 0.680500

Epoch 1
--------------------
loss: 0.602432 [mini-batch 0 / 859]
loss: 0.613415 [mini-batch 100 / 859]
loss: 0.753229 [mini-batch 200 / 859]
loss: 0.775502 [mini-batch 300 / 859]
loss: 0.532699 [mini-batch 400 / 859]
loss: 0.642419 [mini-batch 500 / 859]
loss: 0.668701 [mini-batch 600 / 859]
loss: 0.561737 [mini-batch 700 / 859]
loss: 0.688981 [mini-batch 800 / 859]
==> Accuracy: 79.8%, Avg loss: 0.593077

Epoch 2
--------------------
loss: 0.624537 [mini-batch 0 / 859]
loss: 0.598567 [m

[I 2024-06-12 16:59:35,422] Trial 13 finished with value: 0.8266 and parameters: {'learning rate': 0.00010106850375012938}. Best is trial 6 with value: 0.8254.


==> Accuracy: 82.7%, Avg loss: 0.518504

Pre-train stats
==> Accuracy: 4.3%, Avg loss: 10.868140

Epoch 0
--------------------
loss: 10.638941 [mini-batch 0 / 859]
loss: 0.880592 [mini-batch 100 / 859]
loss: 1.241369 [mini-batch 200 / 859]
loss: 0.569123 [mini-batch 300 / 859]
loss: 0.571198 [mini-batch 400 / 859]
loss: 0.438380 [mini-batch 500 / 859]
loss: 0.851574 [mini-batch 600 / 859]
loss: 0.554498 [mini-batch 700 / 859]
loss: 0.517305 [mini-batch 800 / 859]
==> Accuracy: 79.6%, Avg loss: 0.600123

Epoch 1
--------------------
loss: 0.901549 [mini-batch 0 / 859]
loss: 0.557255 [mini-batch 100 / 859]
loss: 0.462633 [mini-batch 200 / 859]
loss: 0.423014 [mini-batch 300 / 859]
loss: 0.618620 [mini-batch 400 / 859]
loss: 0.745997 [mini-batch 500 / 859]
loss: 0.642329 [mini-batch 600 / 859]
loss: 0.441015 [mini-batch 700 / 859]
loss: 0.520188 [mini-batch 800 / 859]
==> Accuracy: 82.5%, Avg loss: 0.519097

Epoch 2
--------------------
loss: 0.501148 [mini-batch 0 / 859]
loss: 0.527507 [

[I 2024-06-12 17:15:15,055] Trial 14 finished with value: 0.8442 and parameters: {'learning rate': 0.00022450011623949178}. Best is trial 6 with value: 0.8254.


==> Accuracy: 84.4%, Avg loss: 0.460980

Pre-train stats
==> Accuracy: 10.3%, Avg loss: 11.050397

Epoch 0
--------------------
loss: 11.862652 [mini-batch 0 / 859]
loss: 1.337187 [mini-batch 100 / 859]
loss: 0.889774 [mini-batch 200 / 859]
loss: 0.995628 [mini-batch 300 / 859]
loss: 0.681408 [mini-batch 400 / 859]
loss: 0.806510 [mini-batch 500 / 859]
loss: 0.589728 [mini-batch 600 / 859]
loss: 0.789766 [mini-batch 700 / 859]
loss: 0.882852 [mini-batch 800 / 859]
==> Accuracy: 78.4%, Avg loss: 0.629384

Epoch 1
--------------------
loss: 0.345153 [mini-batch 0 / 859]
loss: 0.888046 [mini-batch 100 / 859]
loss: 0.543578 [mini-batch 200 / 859]
loss: 0.428291 [mini-batch 300 / 859]
loss: 0.530319 [mini-batch 400 / 859]
loss: 0.947160 [mini-batch 500 / 859]
loss: 0.381508 [mini-batch 600 / 859]
loss: 0.368938 [mini-batch 700 / 859]
loss: 0.665501 [mini-batch 800 / 859]
==> Accuracy: 81.3%, Avg loss: 0.555048

Epoch 2
--------------------
loss: 0.769186 [mini-batch 0 / 859]
loss: 0.570009 

[I 2024-06-12 17:30:50,707] Trial 15 finished with value: 0.8384 and parameters: {'learning rate': 0.00014646582977515874}. Best is trial 6 with value: 0.8254.


==> Accuracy: 83.8%, Avg loss: 0.482063

Pre-train stats
==> Accuracy: 7.0%, Avg loss: 6.870666

Epoch 0
--------------------
loss: 6.434717 [mini-batch 0 / 859]
loss: 1.001386 [mini-batch 100 / 859]
loss: 1.204050 [mini-batch 200 / 859]
loss: 0.845254 [mini-batch 300 / 859]
loss: 1.074202 [mini-batch 400 / 859]
loss: 0.749385 [mini-batch 500 / 859]
loss: 0.798826 [mini-batch 600 / 859]
loss: 0.723101 [mini-batch 700 / 859]
loss: 0.810905 [mini-batch 800 / 859]
==> Accuracy: 78.0%, Avg loss: 0.649214

Epoch 1
--------------------
loss: 0.689021 [mini-batch 0 / 859]
loss: 0.613855 [mini-batch 100 / 859]
loss: 0.592896 [mini-batch 200 / 859]
loss: 0.639997 [mini-batch 300 / 859]
loss: 0.664862 [mini-batch 400 / 859]
loss: 0.715876 [mini-batch 500 / 859]
loss: 0.544935 [mini-batch 600 / 859]
loss: 0.442775 [mini-batch 700 / 859]
loss: 0.818790 [mini-batch 800 / 859]
==> Accuracy: 80.5%, Avg loss: 0.588147

Epoch 2
--------------------
loss: 0.515232 [mini-batch 0 / 859]
loss: 0.591208 [mi

[I 2024-06-12 17:45:27,947] Trial 16 finished with value: 0.834 and parameters: {'learning rate': 0.00013049463092922162}. Best is trial 6 with value: 0.8254.


==> Accuracy: 83.4%, Avg loss: 0.504878

Pre-train stats
==> Accuracy: 13.8%, Avg loss: 5.753775

Epoch 0
--------------------
loss: 5.749799 [mini-batch 0 / 859]
loss: 0.772579 [mini-batch 100 / 859]
loss: 1.014158 [mini-batch 200 / 859]
loss: 0.747893 [mini-batch 300 / 859]
loss: 0.720569 [mini-batch 400 / 859]
