Marcin Wardyński  
Tuesday, 9:45

## Lab 7
### 7.2 DBN

Here are further helper functions.

Importing the dataset-loading utilities:

In [26]:
import warnings
warnings.filterwarnings("ignore")

import importlib
import lab7_utils as utils
importlib.reload(utils)

seed = 42

A wrapper around an RBM that disables learning. It is useful when we want to train only the newly added RBM layers while keeping the weights of those already trained.

In [6]:
from sklearn.base import TransformerMixin, BaseEstimator

class FrozenRBM(TransformerMixin, BaseEstimator):
    def __init__(self, rbm):
        self.rbm = rbm

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return self.rbm.transform(X)
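
As a quick sanity check of the wrapper, the sketch below fits a small `BernoulliRBM` on random data, wraps it, and calls `fit` again on different data. The random arrays and the tiny layer size are assumptions made only for illustration; they are not part of the lab setup.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neural_network import BernoulliRBM


class FrozenRBM(TransformerMixin, BaseEstimator):
    """Same wrapper as above: fit() is a no-op, transform() delegates."""

    def __init__(self, rbm):
        self.rbm = rbm

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return self.rbm.transform(X)


rng = np.random.RandomState(0)
X_a = rng.rand(50, 16)  # illustrative data in [0, 1]
X_b = rng.rand(50, 16)  # different data used for the second fit() call

rbm = BernoulliRBM(n_components=4, n_iter=3, random_state=0).fit(X_a)
weights_before = rbm.components_.copy()

frozen = FrozenRBM(rbm)
frozen.fit(X_b)  # must not touch the wrapped RBM

weights_unchanged = np.array_equal(weights_before, rbm.components_)
features = frozen.transform(X_b)
print(weights_unchanged, features.shape)
```

Because `fit` returns immediately, a `Pipeline` that contains the wrapper can be refitted without retraining the wrapped layer.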

The functions below train the individual layers of the DBN classifier, one at a time. The output consists of three RBM layers and four logistic regression heads, to be used with 0, 1, 2, and 3 RBM layers respectively.

In [7]:
from sklearn.base import clone
from sklearn.pipeline import Pipeline

def wrap_rbm_snapshots(rbms_snapshots):
    rbm_tuples = []
    for i in range(len(rbms_snapshots)):
        rbm_tuples.append((f"rbm_L{i+1}", rbms_snapshots[i]))

    return rbm_tuples


def create_dbn(hidden_dims, rbm_base, log_reg_base, X_train, y_train):
    """Greedily train one RBM per entry of hidden_dims, plus one head per depth."""
    rbm_snapshots = []
    log_regs = []

    # Depth 0: logistic regression trained directly on the raw inputs.
    log_reg = clone(log_reg_base)
    pipeline = Pipeline([("log_reg", log_reg)])
    pipeline.fit(X_train, y_train)
    log_regs.append(log_reg)

    for hidden_dim in hidden_dims:
        log_reg = clone(log_reg_base)
        rbm = clone(rbm_base)
        rbm.n_components = hidden_dim

        # Earlier layers are frozen, so only the new RBM and its head are fitted.
        pipeline_def = []
        pipeline_def.extend(wrap_rbm_snapshots(rbm_snapshots))
        pipeline_def.append((f"rbm_L{len(rbm_snapshots)+1}", rbm))
        pipeline_def.append(("log_reg", log_reg))

        pipeline = Pipeline(pipeline_def)
        pipeline.fit(X_train, y_train)

        rbm_snapshots.append(FrozenRBM(rbm))
        log_regs.append(log_reg)

    return rbm_snapshots, log_regs
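
The greedy, layer-by-layer scheme can be tried on a small scale. The sketch below is only an illustration under stated assumptions: it uses the `load_digits` dataset bundled with scikit-learn instead of the lab datasets, two layers instead of three, and arbitrarily chosen layer sizes and iteration counts.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline


class FrozenRBM(TransformerMixin, BaseEstimator):
    """Same wrapper as above: fit() is a no-op, transform() delegates."""

    def __init__(self, rbm):
        self.rbm = rbm

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return self.rbm.transform(X)


X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]

# Layer 1 is trained together with its own logistic regression head.
rbm_1 = BernoulliRBM(n_components=32, n_iter=5, random_state=0)
stage_1 = Pipeline([
    ("rbm_L1", rbm_1),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
stage_1.fit(X, y)
layer_1_weights = rbm_1.components_.copy()

# Layer 2: layer 1 is frozen, so only the new RBM and the new head are fitted.
rbm_2 = BernoulliRBM(n_components=16, n_iter=5, random_state=0)
stage_2 = Pipeline([
    ("rbm_L1", FrozenRBM(rbm_1)),
    ("rbm_L2", rbm_2),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
stage_2.fit(X, y)

layer_1_untouched = np.array_equal(layer_1_weights, rbm_1.components_)
hidden_1 = rbm_1.transform(X)
hidden_2 = rbm_2.transform(hidden_1)
print(hidden_1.shape, hidden_2.shape, layer_1_untouched)
```

Fitting the second pipeline leaves the first layer's weights as they were, which is the property `create_dbn` relies on when it stacks further layers.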


The code below tests the classification quality of the already trained DBNs of different depths, pairing each of them with the matching logistic regression head.

In [8]:
from sklearn.metrics import accuracy_score, classification_report

def test_dbn(rbms, log_regs, X_test, y_test, print_report):
    accuracies = []
    for i in range(len(rbms)+1):
        pipeline_def = []
        pipeline_def.extend(wrap_rbm_snapshots(rbms[:i]))
        pipeline_def.append(("log_reg", log_regs[i]))

        pipeline = Pipeline(pipeline_def)
        y_pred = pipeline.predict(X_test)

        accuracy = accuracy_score(y_test, y_pred)
        accuracies.append(accuracy)

        if print_report:
            print(f"Warstwy modelu: {pipeline.named_steps.keys()}")
            print(classification_report(y_test, y_pred))
            
    return accuracies
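
To make the pairing of depths and heads explicit, the sketch below builds the same step lists as `test_dbn`, but with placeholder strings in place of fitted estimators. The placeholder names are hypothetical and serve only to show which steps end up in each pipeline.

```python
def wrap_rbm_snapshots(rbm_snapshots):
    # Same naming scheme as above: rbm_L1, rbm_L2, ...
    return [(f"rbm_L{i + 1}", rbm) for i, rbm in enumerate(rbm_snapshots)]


# Placeholders standing in for the fitted FrozenRBM and LogisticRegression objects.
rbms = ["rbm_a", "rbm_b", "rbm_c"]
heads = ["head_0", "head_1", "head_2", "head_3"]

layouts = []
for depth in range(len(rbms) + 1):
    # The first `depth` RBMs are followed by the head trained for that depth.
    steps = wrap_rbm_snapshots(rbms[:depth]) + [("log_reg", heads[depth])]
    layouts.append([name for name, _ in steps])
    print(depth, layouts[-1], "head:", heads[depth])
```

Depth 0 contains only `log_reg`, which is why the first printed report lists `dict_keys(['log_reg'])`.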

The next two code blocks define functions that tie the whole DBN evaluation process together and use the `Optuna` optimizer to check several relevant hyperparameter configurations.

Optuna tunes only the sizes of the individual DBN layers; the remaining hyperparameters are taken from the best models found for each dataset in the previous task.

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM

def check_dbn(dataset_name, model_params, l1_h_dim, l2_h_dim, l3_h_dim, print_report):
    X_train, X_test, y_train, y_test = utils.get_dataset(dataset_name)

    rbm_base = BernoulliRBM(learning_rate=model_params['learning_rate'], batch_size=model_params['batch_size'], random_state=seed)
    log_reg_base = LogisticRegression(max_iter=model_params['max_iter'], solver=model_params['solver'], C=model_params['C'])

    hidden_dims = [l1_h_dim, l2_h_dim, l3_h_dim]

    rbms, log_regs = create_dbn(hidden_dims, rbm_base, log_reg_base, X_train, y_train)

    accuracies = test_dbn(rbms, log_regs, X_test, y_test, print_report)

    return max(accuracies)
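
Note that `check_dbn` returns only the best accuracy over all depths, so the returned value alone does not say which depth won. A minimal sketch of recovering the depth is shown below; the list holds the rounded per-depth figures quoted in the MNIST Experiment 1 section (0, 1, 2, and 3 RBM layers), and the variable names are illustrative.

```python
# Rounded accuracies for 0, 1, 2 and 3 RBM layers (MNIST, Experiment 1).
accuracies = [0.92, 0.97, 0.96, 0.95]

# Index of the highest accuracy equals the number of RBM layers used.
best_depth = max(range(len(accuracies)), key=accuracies.__getitem__)
print(f"best depth: {best_depth} RBM layer(s), accuracy={accuracies[best_depth]}")
```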

In [27]:
import optuna

def objective(trial, dataset_name, model_params, l1_h_dims=[128, 256, 512]):

    l1_h_dim = trial.suggest_categorical('l1_h_dim', l1_h_dims)
    l2_h_dim = trial.suggest_categorical('l2_h_dim', [64, 128, 192])
    l3_h_dim = trial.suggest_categorical('l3_h_dim', [32, 64, 96])
    
    return check_dbn(dataset_name, model_params, l1_h_dim, l2_h_dim, l3_h_dim, print_report=False)

def optimize_dbn(dataset_name, model_params, n_trials, l1_h_dims):

    study = optuna.create_study(direction='maximize')
    study.optimize(lambda trial: objective(trial, dataset_name, model_params, l1_h_dims), n_trials=n_trials, show_progress_bar=True)

    print(f"Best parameters: {study.best_params}")
    print_best_dbn_summary(dataset_name, model_params, study.best_params)

def print_best_dbn_summary(dataset_name, model_params, l_h_dims):
    check_dbn(dataset_name, model_params, l_h_dims['l1_h_dim'], l_h_dims['l2_h_dim'], l_h_dims['l3_h_dim'], print_report=True)

In the first experiment each dataset is evaluated with 15 Optuna trials; the second experiment uses 10.
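
For context, the size of the full search space can be counted directly. The sketch below enumerates all combinations of the categorical choices used in `objective`; it relies only on the standard library and does not call Optuna.

```python
from itertools import product

# Categorical choices taken from the objective function above.
l1_h_dims = [128, 256, 512]
l2_h_dims = [64, 128, 192]
l3_h_dims = [32, 64, 96]

grid = list(product(l1_h_dims, l2_h_dims, l3_h_dims))
n_trials = 15

# Upper bound on the share of the grid that the trials can cover.
max_coverage = n_trials / len(grid)
print(len(grid), f"{max_coverage:.0%}")
```

Fifteen trials can therefore cover at most a little over half of the 27 combinations, and fewer when the sampler repeats a configuration, as the logs below show it sometimes does.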

Additionally, I changed the logistic regression solver from `sag`, the best one found in the previous task, to `lbfgs` (set below), because `sag` ran very slowly and its extra running time translated into only a small gain in classification quality.
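
A comparison of this kind can be sketched as follows. This is not the measurement behind the remark above: it assumes the small `load_digits` dataset bundled with scikit-learn as a stand-in, borrows `C=0.5` and `max_iter=1000` from the MNIST parameters, and its timings will differ from those on the lab datasets.

```python
import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

results = {}
for solver in ("sag", "lbfgs"):
    clf = LogisticRegression(solver=solver, C=0.5, max_iter=1000)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    results[solver] = {
        "seconds": time.perf_counter() - start,
        "accuracy": clf.score(X_test, y_test),
    }

for solver, stats in results.items():
    print(f"{solver}: {stats['seconds']:.2f}s, accuracy={stats['accuracy']:.3f}")
```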

*Additional note:* The size of the largest RBM layer is capped at 512 neurons, so whenever the DBN achieves its best result with a single RBM layer, that result may be slightly worse than the one from the previous task.

**Another note:** I ran the experiment for each dataset, and each time the result indicated that a single-layer DBN works best. Indeed, with a first layer as large as 512 neurons, the two remaining, much smaller layers only lower classification quality. Because I am interested in results from a more balanced configuration, I ran the experiment again for each dataset, this time limiting the first layer to 256 neurons. Each section below presents the results of both experiments.

Despite repeating the experiment with a smaller first layer, I conclude from the first part that a sufficiently large single-layer DBN (that is, a plain RBM) can give better results than a more complex multi-layer model.

In [11]:
n_trials = 15
solver = 'lbfgs'

### MNIST


#### Experiment 1

In [29]:
model_params = {
    'C': 0.5,
    'solver': solver,
    'max_iter': 1000,
    'batch_size': 20,
    'learning_rate': 0.05,
}

optimize_dbn(utils.Dataset_Select.MNIST.value, model_params, n_trials, [128, 256, 512])

[I 2025-01-06 18:39:02,129] A new study created in memory with name: no-name-95489487-ff54-4269-a254-b5c8defe86f6
Best trial: 0. Best value: 0.9684:   7%|▋         | 1/15 [05:55<1:23:02, 355.91s/it]

[I 2025-01-06 18:44:58,034] Trial 0 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  13%|█▎        | 2/15 [08:09<48:46, 225.08s/it]  

[I 2025-01-06 18:47:11,526] Trial 1 finished with value: 0.9538 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  20%|██        | 3/15 [15:27<1:04:26, 322.24s/it]

[I 2025-01-06 18:54:29,401] Trial 2 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  27%|██▋       | 4/15 [19:35<53:41, 292.83s/it]  

[I 2025-01-06 18:58:37,133] Trial 3 finished with value: 0.9658 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  33%|███▎      | 5/15 [23:44<46:12, 277.29s/it]

[I 2025-01-06 19:02:46,878] Trial 4 finished with value: 0.9658 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  40%|████      | 6/15 [26:02<34:29, 229.99s/it]

[I 2025-01-06 19:05:05,052] Trial 5 finished with value: 0.9427 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  47%|████▋     | 7/15 [29:52<30:37, 229.74s/it]

[I 2025-01-06 19:08:54,194] Trial 6 finished with value: 0.96 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  53%|█████▎    | 8/15 [37:19<34:52, 298.86s/it]

[I 2025-01-06 19:16:21,138] Trial 7 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  60%|██████    | 9/15 [39:49<25:15, 252.63s/it]

[I 2025-01-06 19:18:52,101] Trial 8 finished with value: 0.96 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  67%|██████▋   | 10/15 [42:45<19:04, 228.92s/it]

[I 2025-01-06 19:21:47,943] Trial 9 finished with value: 0.9658 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  73%|███████▎  | 11/15 [51:13<20:56, 314.15s/it]

[I 2025-01-06 19:30:15,355] Trial 10 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  80%|████████  | 12/15 [58:50<17:53, 357.76s/it]

[I 2025-01-06 19:37:52,853] Trial 11 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  87%|████████▋ | 13/15 [1:05:42<12:28, 374.19s/it]

[I 2025-01-06 19:44:44,858] Trial 12 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684:  93%|█████████▎| 14/15 [1:13:43<06:46, 406.27s/it]

[I 2025-01-06 19:52:45,241] Trial 13 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 0 with value: 0.9684.


Best trial: 0. Best value: 0.9684: 100%|██████████| 15/15 [1:19:26<00:00, 317.78s/it]


[I 2025-01-06 19:58:28,752] Trial 14 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 0 with value: 0.9684.
Best parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 32}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.94      0.97      0.95       980
           1       0.96      0.98      0.97      1135
           2       0.92      0.89      0.90      1032
           3       0.90      0.91      0.90      1010
           4       0.92      0.92      0.92       982
           5       0.88      0.85      0.87       892
           6       0.94      0.94      0.94       958
           7       0.92      0.92      0.92      1028
           8       0.87      0.87      0.87       974
           9       0.90      0.90      0.90      1009

    accuracy                           0.92     10000
   macro avg       0.92      0.92      0.92     10000
weighted avg       0

The best result comes from a DBN with only one, albeit very large, RBM layer, which reaches `accuracy=0.97`. Adding further layers lowers `accuracy` to 0.96 and 0.95 for two and three layers respectively. MNIST is clearly a fairly simple dataset, for which the extra model capacity from additional layers does not necessarily bring the desired effect.
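
The trial log above can also be grouped by first-layer size to see how much the deeper layers affect the reported score. The sketch below parses a few of the log lines (copied from the output above, with the timestamp prefix dropped) using a regular expression; the helper names are illustrative.

```python
import re
from collections import defaultdict

# A subset of the trial lines from the log above.
log_lines = [
    "Trial 0 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 32}.",
    "Trial 1 finished with value: 0.9538 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 96}.",
    "Trial 3 finished with value: 0.9658 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 32}.",
    "Trial 6 finished with value: 0.96 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 32}.",
    "Trial 7 finished with value: 0.9684 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 96}.",
]

# Capture the trial value and the first-layer size.
pattern = re.compile(r"value: ([\d.]+) and parameters: \{'l1_h_dim': (\d+)")

values_by_l1 = defaultdict(set)
for line in log_lines:
    match = pattern.search(line)
    if match:
        values_by_l1[int(match.group(2))].add(float(match.group(1)))

for l1, values in sorted(values_by_l1.items()):
    print(l1, sorted(values))
```

Both trials with a 512-neuron first layer report the same value regardless of the other two sizes, which is consistent with the objective returning the best accuracy over all depths and the single-layer model winning in that case.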

#### Experiment 2

In [17]:
model_params = {
    'C': 0.5,
    'solver': solver,
    'max_iter': 1000,
    'batch_size': 20,
    'learning_rate': 0.05,
}

optimize_dbn(utils.Dataset_Select.MNIST.value, model_params, 10, [128, 256])

[I 2025-01-06 10:35:34,944] A new study created in memory with name: no-name-7f6b0300-cbfb-42e5-8d08-90ea63679719
Best trial: 0. Best value: 0.9538:  10%|█         | 1/10 [01:00<09:07, 60.86s/it]

[I 2025-01-06 10:36:35,802] Trial 0 finished with value: 0.9538 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 0 with value: 0.9538.


Best trial: 1. Best value: 0.96:  20%|██        | 2/10 [02:39<11:04, 83.12s/it]  

[I 2025-01-06 10:38:14,512] Trial 1 finished with value: 0.96 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 1 with value: 0.96.


Best trial: 2. Best value: 0.9613:  30%|███       | 3/10 [03:42<08:38, 74.00s/it]

[I 2025-01-06 10:39:17,665] Trial 2 finished with value: 0.9613 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 2 with value: 0.9613.


Best trial: 2. Best value: 0.9613:  40%|████      | 4/10 [04:47<07:01, 70.31s/it]

[I 2025-01-06 10:40:22,318] Trial 3 finished with value: 0.9538 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 2 with value: 0.9613.


Best trial: 2. Best value: 0.9613:  50%|█████     | 5/10 [07:00<07:45, 93.12s/it]

[I 2025-01-06 10:42:35,862] Trial 4 finished with value: 0.96 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 2 with value: 0.9613.


Best trial: 2. Best value: 0.9613:  60%|██████    | 6/10 [08:46<06:29, 97.28s/it]

[I 2025-01-06 10:44:21,239] Trial 5 finished with value: 0.9613 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 2 with value: 0.9613.


Best trial: 2. Best value: 0.9613:  70%|███████   | 7/10 [10:25<04:53, 97.75s/it]

[I 2025-01-06 10:45:59,945] Trial 6 finished with value: 0.9613 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 2 with value: 0.9613.


Best trial: 2. Best value: 0.9613:  80%|████████  | 8/10 [12:49<03:45, 112.73s/it]

[I 2025-01-06 10:48:24,744] Trial 7 finished with value: 0.96 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 2 with value: 0.9613.


Best trial: 8. Best value: 0.9658:  90%|█████████ | 9/10 [15:12<02:02, 122.03s/it]

[I 2025-01-06 10:50:47,221] Trial 8 finished with value: 0.9658 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 8 with value: 0.9658.


Best trial: 8. Best value: 0.9658: 100%|██████████| 10/10 [18:06<00:00, 108.67s/it]


[I 2025-01-06 10:53:41,621] Trial 9 finished with value: 0.9658 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 8 with value: 0.9658.
Best parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.94      0.97      0.95       980
           1       0.96      0.98      0.97      1135
           2       0.92      0.89      0.90      1032
           3       0.90      0.91      0.90      1010
           4       0.92      0.92      0.92       982
           5       0.88      0.85      0.87       892
           6       0.94      0.94      0.94       958
           7       0.92      0.92      0.92      1028
           8       0.87      0.87      0.87       974
           9       0.90      0.90      0.90      1009

    accuracy                           0.92     10000
   macro avg       0.92      0.92      0.92     10000
weighted avg       0.

Reducing the maximum size of the first layer led to an interesting result: the best `accuracy=0.97` is now reached by the two-layer DBN, while the one-layer and three-layer DBNs give an `accuracy` that is 1 percentage point lower.

### Fashion_MNIST
#### Experiment 1

In [13]:
model_params = {
    'C': 0.5,
    'solver': solver,
    'max_iter': 5000,
    'batch_size': 10,
    'learning_rate': 0.01,
}

optimize_dbn(utils.Dataset_Select.F_MNIST.value, model_params, n_trials, [128, 256, 512])

[I 2025-01-05 14:22:49,458] A new study created in memory with name: no-name-20a7044c-707a-4b3f-b5da-dab23c64e9ad
Best trial: 0. Best value: 0.7932:   7%|▋         | 1/15 [01:48<25:21, 108.66s/it]

[I 2025-01-05 14:24:38,111] Trial 0 finished with value: 0.7932 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 0 with value: 0.7932.


Best trial: 1. Best value: 0.8135:  13%|█▎        | 2/15 [05:28<37:40, 173.87s/it]

[I 2025-01-05 14:28:17,639] Trial 1 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 1 with value: 0.8135.


Best trial: 1. Best value: 0.8135:  20%|██        | 3/15 [08:54<37:42, 188.57s/it]

[I 2025-01-05 14:31:43,691] Trial 2 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 1 with value: 0.8135.


Best trial: 1. Best value: 0.8135:  27%|██▋       | 4/15 [10:57<29:51, 162.88s/it]

[I 2025-01-05 14:33:47,192] Trial 3 finished with value: 0.7932 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 1 with value: 0.8135.


Best trial: 4. Best value: 0.8299:  33%|███▎      | 5/15 [18:03<42:55, 257.53s/it]

[I 2025-01-05 14:40:52,547] Trial 4 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  40%|████      | 6/15 [20:48<33:56, 226.23s/it]

[I 2025-01-05 14:43:38,009] Trial 5 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  47%|████▋     | 7/15 [27:49<38:40, 290.01s/it]

[I 2025-01-05 14:50:39,316] Trial 6 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  53%|█████▎    | 8/15 [35:10<39:26, 338.05s/it]

[I 2025-01-05 14:58:00,236] Trial 7 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  60%|██████    | 9/15 [41:59<36:00, 360.09s/it]

[I 2025-01-05 15:04:48,810] Trial 8 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  67%|██████▋   | 10/15 [44:38<24:50, 298.03s/it]

[I 2025-01-05 15:07:27,874] Trial 9 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  73%|███████▎  | 11/15 [49:34<19:49, 297.29s/it]

[I 2025-01-05 15:12:23,477] Trial 10 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  80%|████████  | 12/15 [54:26<14:47, 295.73s/it]

[I 2025-01-05 15:17:15,626] Trial 11 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  87%|████████▋ | 13/15 [59:31<09:57, 298.62s/it]

[I 2025-01-05 15:22:20,912] Trial 12 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299:  93%|█████████▎| 14/15 [1:04:47<05:03, 303.91s/it]

[I 2025-01-05 15:27:37,041] Trial 13 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8299.


Best trial: 4. Best value: 0.8299: 100%|██████████| 15/15 [1:10:01<00:00, 280.10s/it]


[I 2025-01-05 15:32:50,987] Trial 14 finished with value: 0.8299 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 4 with value: 0.8299.
Best parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 64}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.74      0.76      0.75      1000
           1       0.93      0.95      0.94      1000
           2       0.69      0.67      0.68      1000
           3       0.79      0.78      0.78      1000
           4       0.68      0.70      0.69      1000
           5       0.86      0.88      0.87      1000
           6       0.54      0.51      0.52      1000
           7       0.86      0.88      0.87      1000
           8       0.91      0.90      0.90      1000
           9       0.91      0.92      0.91      1000

    accuracy                           0.79     10000
   macro avg       0.79      0.79      0.79     10000
weighted avg       0

Here, too, a very large first RBM layer gives the best result, `accuracy=0.83`, while further layers lower it to 0.78 and 0.75 for two and three layers respectively.

#### Experiment 2

In [23]:
model_params = {
    'C': 0.5,
    'solver': solver,
    'max_iter': 5000,
    'batch_size': 10,
    'learning_rate': 0.01,
}

optimize_dbn(utils.Dataset_Select.F_MNIST.value, model_params, 10, [128, 256])

[I 2025-01-06 14:17:59,619] A new study created in memory with name: no-name-004d7256-c8b1-4aed-87c6-f5c488074b95
Best trial: 0. Best value: 0.7932:  10%|█         | 1/10 [01:39<14:57, 99.74s/it]

[I 2025-01-06 14:19:39,355] Trial 0 finished with value: 0.7932 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 0 with value: 0.7932.


Best trial: 0. Best value: 0.7932:  20%|██        | 2/10 [03:21<13:29, 101.14s/it]

[I 2025-01-06 14:21:21,475] Trial 1 finished with value: 0.7932 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 0 with value: 0.7932.


Best trial: 2. Best value: 0.8135:  30%|███       | 3/10 [06:14<15:36, 133.84s/it]

[I 2025-01-06 14:24:14,235] Trial 2 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135:  40%|████      | 4/10 [09:02<14:42, 147.14s/it]

[I 2025-01-06 14:27:01,769] Trial 3 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135:  50%|█████     | 5/10 [12:39<14:21, 172.32s/it]

[I 2025-01-06 14:30:38,716] Trial 4 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135:  60%|██████    | 6/10 [16:52<13:19, 199.81s/it]

[I 2025-01-06 14:34:51,884] Trial 5 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135:  70%|███████   | 7/10 [19:14<09:02, 180.98s/it]

[I 2025-01-06 14:37:14,107] Trial 6 finished with value: 0.7932 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135:  80%|████████  | 8/10 [21:33<05:34, 167.49s/it]

[I 2025-01-06 14:39:32,719] Trial 7 finished with value: 0.7932 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135:  90%|█████████ | 9/10 [25:18<03:05, 185.51s/it]

[I 2025-01-06 14:43:17,833] Trial 8 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 2 with value: 0.8135.


Best trial: 2. Best value: 0.8135: 100%|██████████| 10/10 [29:16<00:00, 175.61s/it]


[I 2025-01-06 14:47:15,701] Trial 9 finished with value: 0.8135 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 2 with value: 0.8135.
Best parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 96}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.74      0.76      0.75      1000
           1       0.93      0.95      0.94      1000
           2       0.69      0.67      0.68      1000
           3       0.79      0.78      0.78      1000
           4       0.68      0.70      0.69      1000
           5       0.86      0.88      0.87      1000
           6       0.54      0.51      0.52      1000
           7       0.86      0.88      0.87      1000
           8       0.91      0.90      0.90      1000
           9       0.91      0.92      0.91      1000

    accuracy                           0.79     10000
   macro avg       0.79      0.79      0.79     10000
weighted avg       0.

In this case, reducing the size of the first layer only worsened `accuracy`, to 0.81 for the single-layer DBN, while the two- and three-layer models reached an `accuracy` of 0.78 and 0.76 respectively.

### Kuzushiji-MNIST
#### Experiment 1

In [21]:
model_params = {
    'C': 1.0,
    'solver': solver,
    'max_iter': 1000,
    'batch_size': 10,
    'learning_rate': 0.1,
}

optimize_dbn(utils.Dataset_Select.K_MNIST.value, model_params, n_trials, [128, 256, 512])

[I 2025-01-06 12:27:09,047] A new study created in memory with name: no-name-e78ea10e-ce16-41b3-ad7a-c50758f47cc7
Best trial: 0. Best value: 0.7994:   7%|▋         | 1/15 [01:23<19:25, 83.27s/it]

[I 2025-01-06 12:28:32,319] Trial 0 finished with value: 0.7994 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 0 with value: 0.7994.


Best trial: 1. Best value: 0.8191:  13%|█▎        | 2/15 [04:04<27:55, 128.86s/it]

[I 2025-01-06 12:31:13,095] Trial 1 finished with value: 0.8191 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 1 with value: 0.8191.


2025-01-06 12:31:20.689561: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Best trial: 1. Best value: 0.8191:  20%|██        | 3/15 [06:16<26:06, 130.55s/it]

[I 2025-01-06 12:33:25,648] Trial 2 finished with value: 0.777 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 1 with value: 0.8191.


Best trial: 3. Best value: 0.8584:  27%|██▋       | 4/15 [14:40<50:58, 278.01s/it]

[I 2025-01-06 12:41:49,720] Trial 3 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  33%|███▎      | 5/15 [22:41<58:30, 351.04s/it]

[I 2025-01-06 12:49:50,231] Trial 4 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  40%|████      | 6/15 [25:08<42:14, 281.65s/it]

[I 2025-01-06 12:52:17,189] Trial 5 finished with value: 0.777 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  47%|████▋     | 7/15 [29:01<35:27, 265.95s/it]

[I 2025-01-06 12:56:10,820] Trial 6 finished with value: 0.8191 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  53%|█████▎    | 8/15 [31:01<25:35, 219.31s/it]

[I 2025-01-06 12:58:10,272] Trial 7 finished with value: 0.777 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  60%|██████    | 9/15 [38:19<28:47, 287.87s/it]

[I 2025-01-06 13:05:28,899] Trial 8 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  67%|██████▋   | 10/15 [40:32<19:59, 239.98s/it]

[I 2025-01-06 13:07:41,625] Trial 9 finished with value: 0.7994 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  73%|███████▎  | 11/15 [49:27<22:01, 330.33s/it]

[I 2025-01-06 13:16:36,831] Trial 10 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  80%|████████  | 12/15 [56:48<18:11, 363.81s/it]

[I 2025-01-06 13:23:57,218] Trial 11 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  87%|████████▋ | 13/15 [1:03:49<12:42, 381.09s/it]

[I 2025-01-06 13:30:58,054] Trial 12 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584:  93%|█████████▎| 14/15 [1:12:30<07:03, 423.34s/it]

[I 2025-01-06 13:39:39,049] Trial 13 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 3 with value: 0.8584.


Best trial: 3. Best value: 0.8584: 100%|██████████| 15/15 [1:18:39<00:00, 314.62s/it]


[I 2025-01-06 13:45:48,395] Trial 14 finished with value: 0.8584 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 3 with value: 0.8584.
Best parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 32}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.85      0.75      0.80      1000
           1       0.64      0.67      0.65      1000
           2       0.51      0.63      0.56      1000
           3       0.79      0.76      0.77      1000
           4       0.64      0.64      0.64      1000
           5       0.74      0.70      0.72      1000
           6       0.67      0.71      0.69      1000
           7       0.74      0.56      0.64      1000
           8       0.62      0.74      0.68      1000
           9       0.70      0.64      0.67      1000

    accuracy                           0.68     10000
   macro avg       0.69      0.68      0.68     10000
weighted avg       0

Once again, a single layer of the largest size gives the best classification, with `accuracy=0.86`.

This dataset and the previous one show that even for a dataset more complex than MNIST, a sufficiently large single RBM layer can be the best feature extractor.

#### Experiment 2

In [22]:
model_params = {
    'C': 1.0,
    'solver': solver,
    'max_iter': 1000,
    'batch_size': 10,
    'learning_rate': 0.1,
}

optimize_dbn(utils.Dataset_Select.K_MNIST.value, model_params, 10, [128, 256])

[I 2025-01-06 13:54:05,487] A new study created in memory with name: no-name-d34e0c3c-cdcb-4758-9bff-7d1023d64bfa
Best trial: 0. Best value: 0.7994:  10%|█         | 1/10 [01:55<17:20, 115.57s/it]

[I 2025-01-06 13:56:01,057] Trial 0 finished with value: 0.7994 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 0 with value: 0.7994.


Best trial: 1. Best value: 0.8191:  20%|██        | 2/10 [05:32<23:20, 175.11s/it]

[I 2025-01-06 13:59:37,843] Trial 1 finished with value: 0.8191 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 1 with value: 0.8191.


Best trial: 1. Best value: 0.8191:  30%|███       | 3/10 [08:51<21:41, 185.93s/it]

[I 2025-01-06 14:02:56,659] Trial 2 finished with value: 0.8191 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 1 with value: 0.8191.


Best trial: 1. Best value: 0.8191:  40%|████      | 4/10 [10:50<15:56, 159.45s/it]

[I 2025-01-06 14:04:55,512] Trial 3 finished with value: 0.7994 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 1 with value: 0.8191.


Best trial: 4. Best value: 0.8294:  50%|█████     | 5/10 [13:34<13:26, 161.28s/it]

[I 2025-01-06 14:07:40,048] Trial 4 finished with value: 0.8294 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 4 with value: 0.8294.


Best trial: 4. Best value: 0.8294:  60%|██████    | 6/10 [15:40<09:56, 149.11s/it]

[I 2025-01-06 14:09:45,524] Trial 5 finished with value: 0.8191 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 4 with value: 0.8294.


Best trial: 4. Best value: 0.8294:  70%|███████   | 7/10 [16:52<06:12, 124.06s/it]

[I 2025-01-06 14:10:58,012] Trial 6 finished with value: 0.777 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 4 with value: 0.8294.


Best trial: 4. Best value: 0.8294:  80%|████████  | 8/10 [18:06<03:36, 108.22s/it]

[I 2025-01-06 14:12:12,309] Trial 7 finished with value: 0.7994 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 4 with value: 0.8294.


Best trial: 4. Best value: 0.8294:  90%|█████████ | 9/10 [20:22<01:56, 116.80s/it]

[I 2025-01-06 14:14:27,966] Trial 8 finished with value: 0.8191 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 4 with value: 0.8294.


Best trial: 4. Best value: 0.8294: 100%|██████████| 10/10 [21:34<00:00, 129.47s/it]


[I 2025-01-06 14:15:40,195] Trial 9 finished with value: 0.777 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 4 with value: 0.8294.
Best parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 96}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.85      0.75      0.80      1000
           1       0.64      0.67      0.65      1000
           2       0.51      0.63      0.56      1000
           3       0.79      0.76      0.77      1000
           4       0.64      0.64      0.64      1000
           5       0.74      0.70      0.72      1000
           6       0.67      0.71      0.69      1000
           7       0.74      0.56      0.64      1000
           8       0.62      0.74      0.68      1000
           9       0.70      0.64      0.67      1000

    accuracy                           0.68     10000
   macro avg       0.69      0.68      0.68     10000
weighted avg       0.6

Kuzushiji-MNIST gave a result similar to regular MNIST: the two-layer DBN achieves the best `accuracy=0.83`. The single-layer DBN reached `accuracy=0.82`, and the three-layer one `accuracy=0.81`.
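The trial log for this experiment can be cross-checked with a short script. The sketch below copies the ten `(l1_h_dim, l2_h_dim, l3_h_dim, value)` records from the Optuna output above and groups them by the first two layer sizes; the helper name `group_by_first_two_layers` is illustrative and not part of the notebook. If the reported value never changes with `l3_h_dim`, that is consistent with the third layer never being the best-scoring depth here.

```python
from collections import defaultdict

# (l1_h_dim, l2_h_dim, l3_h_dim, objective value), copied from the trial log above.
trials = [
    (128, 192, 32, 0.7994),
    (256, 64, 32, 0.8191),
    (256, 64, 96, 0.8191),
    (128, 192, 96, 0.7994),
    (256, 192, 96, 0.8294),
    (256, 128, 32, 0.8191),
    (128, 128, 32, 0.777),
    (128, 192, 64, 0.7994),
    (256, 128, 64, 0.8191),
    (128, 128, 32, 0.777),
]


def group_by_first_two_layers(trials):
    """Map (l1, l2) to the set of objective values seen for that pair."""
    groups = defaultdict(set)
    for l1, l2, _l3, value in trials:
        groups[(l1, l2)].add(value)
    return dict(groups)


groups = group_by_first_two_layers(trials)
for (l1, l2), values in sorted(groups.items()):
    print(f"l1={l1:3d} l2={l2:3d} -> {sorted(values)}")
```

Every `(l1, l2)` pair maps to a single value, including the pairs that were sampled with several different `l3_h_dim` settings, so the size of the third layer had no visible effect on the optimized metric in these ten trials.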

### Kuzushiji-49
#### Experiment 1

In [15]:
model_params = {
    'C': 0.5,
    'solver': solver,
    'max_iter': 1000,
    'batch_size': 20,
    'learning_rate': 0.1,
}

optimize_dbn(utils.Dataset_Select.KUZ_49.value, model_params, n_trials, [128, 256, 512])

[I 2025-01-05 19:04:22,772] A new study created in memory with name: no-name-bdf9d0c7-2a7a-469a-b851-7f97545c689a
Best trial: 0. Best value: 0.779753:   7%|▋         | 1/15 [12:13<2:51:11, 733.69s/it]

[I 2025-01-05 19:16:36,458] Trial 0 finished with value: 0.7797529843418944 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 0 with value: 0.7797529843418944.


Best trial: 0. Best value: 0.779753:  13%|█▎        | 2/15 [21:40<2:17:40, 635.45s/it]

[I 2025-01-05 19:26:03,138] Trial 1 finished with value: 0.7797529843418944 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 0 with value: 0.7797529843418944.


Best trial: 0. Best value: 0.779753:  20%|██        | 3/15 [28:06<1:44:19, 521.66s/it]

[I 2025-01-05 19:32:29,392] Trial 2 finished with value: 0.7513048421270219 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 96}. Best is trial 0 with value: 0.7797529843418944.


Best trial: 0. Best value: 0.779753:  27%|██▋       | 4/15 [33:45<1:22:24, 449.53s/it]

[I 2025-01-05 19:38:08,356] Trial 3 finished with value: 0.7032711487778409 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 0 with value: 0.7797529843418944.


Best trial: 0. Best value: 0.779753:  33%|███▎      | 5/15 [44:39<1:27:12, 523.29s/it]

[I 2025-01-05 19:49:02,398] Trial 4 finished with value: 0.7797529843418944 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 0 with value: 0.7797529843418944.


Best trial: 5. Best value: 0.84161:  40%|████      | 6/15 [1:03:07<1:48:19, 722.20s/it] 

[I 2025-01-05 20:07:30,743] Trial 5 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  47%|████▋     | 7/15 [1:20:36<1:50:30, 828.86s/it]

[I 2025-01-05 20:24:59,179] Trial 6 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  53%|█████▎    | 8/15 [1:40:33<1:50:21, 945.95s/it]

[I 2025-01-05 20:44:55,851] Trial 7 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  60%|██████    | 9/15 [1:48:16<1:19:29, 794.96s/it]

[I 2025-01-05 20:52:38,802] Trial 8 finished with value: 0.7032711487778409 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  67%|██████▋   | 10/15 [2:07:30<1:15:29, 905.80s/it]

[I 2025-01-05 21:11:52,813] Trial 9 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 128, 'l3_h_dim': 96}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  73%|███████▎  | 11/15 [2:24:43<1:02:59, 944.78s/it]

[I 2025-01-05 21:29:05,968] Trial 10 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  80%|████████  | 12/15 [2:39:14<46:07, 922.53s/it]  

[I 2025-01-05 21:43:37,627] Trial 11 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  87%|████████▋ | 13/15 [2:55:14<31:07, 933.69s/it]

[I 2025-01-05 21:59:36,973] Trial 12 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161:  93%|█████████▎| 14/15 [3:08:22<14:49, 889.69s/it]

[I 2025-01-05 22:12:45,009] Trial 13 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 5 with value: 0.8416102527001189.


Best trial: 5. Best value: 0.84161: 100%|██████████| 15/15 [3:22:47<00:00, 811.20s/it]


[I 2025-01-05 22:27:10,734] Trial 14 finished with value: 0.8416102527001189 and parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 5 with value: 0.8416102527001189.
Best parameters: {'l1_h_dim': 512, 'l2_h_dim': 64, 'l3_h_dim': 64}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.66      0.65      0.65       995
           1       0.81      0.85      0.83       962
          10       0.76      0.71      0.74      1052
          11       0.66      0.75      0.70       970
          12       0.51      0.47      0.49       960
          13       0.75      0.72      0.74       669
          14       0.68      0.69      0.69       629
          15       0.55      0.48      0.51      1012
          16       0.71      0.76      0.73       425
          17       0.67      0.73      0.70       953
          18       0.75      0.82      0.78       976
          19       0.62      0.70      0.66      1046

#### Experiment 2

In [25]:
model_params = {
    'C': 0.5,
    'solver': solver,
    'max_iter': 1000,
    'batch_size': 20,
    'learning_rate': 0.1,
}

optimize_dbn(utils.Dataset_Select.KUZ_49.value, model_params, 10, [128, 256])

[I 2025-01-06 15:08:24,045] A new study created in memory with name: no-name-1cfe4611-dbc4-4bda-8c87-004959fc7082
Best trial: 0. Best value: 0.709007:  10%|█         | 1/10 [07:18<1:05:44, 438.25s/it]

[I 2025-01-06 15:15:42,295] Trial 0 finished with value: 0.709007286445145 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 0 with value: 0.709007286445145.


Best trial: 0. Best value: 0.709007:  20%|██        | 2/10 [16:12<1:05:58, 494.76s/it]

[I 2025-01-06 15:24:36,618] Trial 1 finished with value: 0.709007286445145 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 32}. Best is trial 0 with value: 0.709007286445145.


Best trial: 0. Best value: 0.709007:  30%|███       | 3/10 [22:53<52:42, 451.85s/it]  

[I 2025-01-06 15:31:17,397] Trial 2 finished with value: 0.7032711487778409 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 96}. Best is trial 0 with value: 0.709007286445145.


Best trial: 0. Best value: 0.709007:  40%|████      | 4/10 [31:04<46:43, 467.25s/it]

[I 2025-01-06 15:39:28,253] Trial 3 finished with value: 0.7032711487778409 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 32}. Best is trial 0 with value: 0.709007286445145.


Best trial: 0. Best value: 0.709007:  50%|█████     | 5/10 [38:41<38:38, 463.67s/it]

[I 2025-01-06 15:47:05,586] Trial 4 finished with value: 0.709007286445145 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 128, 'l3_h_dim': 64}. Best is trial 0 with value: 0.709007286445145.


Best trial: 5. Best value: 0.779753:  60%|██████    | 6/10 [49:04<34:31, 517.79s/it]

[I 2025-01-06 15:57:28,411] Trial 5 finished with value: 0.7797529843418944 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 5 with value: 0.7797529843418944.


Best trial: 5. Best value: 0.779753:  70%|███████   | 7/10 [1:00:17<28:25, 568.57s/it]

[I 2025-01-06 16:08:41,530] Trial 6 finished with value: 0.7797529843418944 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 32}. Best is trial 5 with value: 0.7797529843418944.


Best trial: 5. Best value: 0.779753:  80%|████████  | 8/10 [1:10:15<19:15, 577.87s/it]

[I 2025-01-06 16:18:39,316] Trial 7 finished with value: 0.7797529843418944 and parameters: {'l1_h_dim': 256, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 5 with value: 0.7797529843418944.


Best trial: 5. Best value: 0.779753:  90%|█████████ | 9/10 [1:15:44<08:20, 500.24s/it]

[I 2025-01-06 16:24:08,880] Trial 8 finished with value: 0.7032711487778409 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 64, 'l3_h_dim': 64}. Best is trial 5 with value: 0.7797529843418944.


Best trial: 5. Best value: 0.779753: 100%|██████████| 10/10 [1:21:59<00:00, 491.94s/it]


[I 2025-01-06 16:30:23,409] Trial 9 finished with value: 0.7513048421270219 and parameters: {'l1_h_dim': 128, 'l2_h_dim': 192, 'l3_h_dim': 64}. Best is trial 5 with value: 0.7797529843418944.
Best parameters: {'l1_h_dim': 256, 'l2_h_dim': 192, 'l3_h_dim': 32}
Warstwy modelu: dict_keys(['log_reg'])
              precision    recall  f1-score   support

           0       0.66      0.65      0.65       995
           1       0.81      0.85      0.83       962
          10       0.76      0.71      0.74      1052
          11       0.66      0.75      0.70       970
          12       0.51      0.47      0.49       960
          13       0.75      0.72      0.74       669
          14       0.68      0.69      0.69       629
          15       0.55      0.48      0.51      1012
          16       0.71      0.76      0.73       425
          17       0.67      0.73      0.70       953
          18       0.75      0.82      0.78       976
          19       0.62      0.70      0.66      104

With DBN classification, Kuzushiji-49 yields results similar to Fashion-MNIST: the single-layer DBN is the best classifier, with `accuracy=0.78`. It narrowly beats the two-layer DBN, with `accuracy=0.77`. The three-layer DBN gives an interesting result: at `accuracy=0.60` it is 6 percentage points worse than the baseline.

This behavior can be linked to how we configured Optuna: the optimizer selects the set of layers that yields the highest `accuracy` among the three DBN depths examined. In our case the best result comes from the single-layer DBN, and it so happens that in this set of layers the last (third) layer has only 32 neurons, 17 fewer than there are classes. Dimensionality reduction carried out this way can degrade the results, especially with logistic regression as the classifier head, because, as I showed in exercise 7.1 and later confirmed in exercise 7.4 for autoencoders, logistic regression performs better as the number of input variables grows.
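A minimal sketch of this effect, assuming the objective simply returns the maximum accuracy over the stack depths, as described above. The helper `best_depth` is hypothetical and stands in for whatever `optimize_dbn` does internally; the accuracies are the ones reported for Kuzushiji-49.

```python
def best_depth(accuracy_by_depth):
    """Return (depth, accuracy) of the best-scoring stack depth."""
    depth = max(accuracy_by_depth, key=accuracy_by_depth.get)
    return depth, accuracy_by_depth[depth]


# Accuracies reported above for Kuzushiji-49 with 1, 2 and 3 RBM layers.
kuz49 = {1: 0.78, 2: 0.77, 3: 0.60}

depth, objective_value = best_depth(kuz49)
print(depth, objective_value)  # 1 0.78

# The value handed back to the optimizer stays the same even if the third
# layer were far worse, so nothing steers l3_h_dim away from 32.
worse_third_layer = {**kuz49, 3: 0.10}
assert best_depth(worse_third_layer) == (depth, objective_value)
```

Under this assumption, a poorly sized deeper layer is invisible to the search whenever a shallower depth already wins. Optimizing each depth separately would be one way to expose it.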

### General conclusions

As the results above show, sufficiently large single-layer networks can achieve very good, even the best, results, although this claim applies only to the datasets examined.

Hierarchical feature extraction may be useful when high-level features can be built from low-level ones. As a simple example, let the low-level features represent the simplest shapes, such as straight lines or arcs; in the next layer they could combine into more complex shapes, such as geometric figures or characters.
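The idea can be illustrated with a toy sketch. The detectors below are written by hand for 3x3 binary images and are purely illustrative; an RBM stack would have to learn comparable detectors from data, which this code does not attempt.

```python
# Toy 3x3 binary images; 1 = ink. All names here are illustrative.
PLUS = ((0, 1, 0),
        (1, 1, 1),
        (0, 1, 0))
BAR = ((0, 0, 0),
       (1, 1, 1),
       (0, 0, 0))


def low_level_features(img):
    """Layer 1: detect a full horizontal and a full vertical stroke."""
    horizontal = any(all(row) for row in img)
    vertical = any(all(img[r][c] for r in range(3)) for c in range(3))
    return {"horizontal": horizontal, "vertical": vertical}


def high_level_features(low):
    """Layer 2: combine strokes into a shape-level feature."""
    return {"cross": low["horizontal"] and low["vertical"]}


for name, img in (("plus", PLUS), ("bar", BAR)):
    print(name, high_level_features(low_level_features(img)))
```

The second layer never looks at pixels. It only combines the stroke features produced by the first layer, which is the kind of composition a deeper DBN could exploit on data where such structure exists.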