# Compound Scalling
https://arxiv.org/abs/1905.11946

![alt text](imgs/compoundScaling_scheme.png "title")

## 1 - Define a baseline model [go to cell](#valid_scale)

## 2 - Find $\alpha$, $\beta$ and $\gamma$.
 - Must follow the following constraint: $(\alpha . \beta^2 . \gamma^2) \approx 2$        [go to cell](#valid_scale)
 - Small grid search. Train models on one set of folds  [go to cell](#valid_scale)
 - Fixe the variables
 
The constraint define below is used to avoid a 
We want to scale up the model incrementaly (not too much at once). The following constraint is added so each step increase the complexity of the model (number of parameters and number of Mops) by a factor 2

$
\begin{align}
(\alpha . \beta^2 . \gamma^2) \approx 2
\end{align}
$

By performing a grid search we then fix the variables $\alpha$, $\beta$ and $\gamma$ and add a compound scaler $\phi$

This last ratio will be use for generating each model. It is used to determine the number of layer (**depth**), their size (**width**) as well as the input size (**resolution**)

$
\begin{align}
(\alpha . \beta^2 . \gamma^2)^{\phi}
\end{align}
$

# import

In [15]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [16]:
import os
os.environ["MKL_NUM_THREADS"] = "2"
os.environ["NUMEXPR_NU M_THREADS"] = "2"
os.environ["OMP_NUM_THREADS"] = "2"
import numpy as np
import tqdm
import time
import random
import gc

import librosa
import pprint
from torchsummaryX import summary

import torch
import torch.nn as nn
import torch.utils.data as data
import torch.nn.functional as F
from torch.optim.lr_scheduler import LambdaLR
from advertorch.attacks import GradientSignAttack
from torch.utils.tensorboard import SummaryWriter

In [17]:
import sys
sys.path.append("../src/")

from ubs8k.datasetManager import DatasetManager
from ubs8k.generators import Dataset
import ubs8k.signal_augmentations as signal_augmentations
from ubs8k.models import scallable2_new
from ubs8k.utils import get_datetime, reset_seed
from ubs8k.metrics import CategoricalAccuracy

from ubs8k.datasetManager import conditional_cache

# Initialisation

## set seeds

In [18]:
reset_seed(1324)

In [19]:
metadata_root = "../dataset/metadata"
audio_root="../dataset/audio"
manager = DatasetManager(metadata_root, audio_root, train_fold=[], val_fold = [], verbose=1)

0it [00:00, ?it/s]
0it [00:00, ?it/s]


## Prep the models

In [20]:
from multiprocessing import Process, Manager

def conditional_cache(func):
    def decorator(*args, **kwargs):
        if "filename" in kwargs.keys() and "cached" in kwargs.keys():
            filename = kwargs["filename"]
            cached = kwargs["cached"]

            if filename is not None and cached:
                if filename not in decorator.cache.keys():
                    decorator.cache[filename] = func(*args, **kwargs)
                    return decorator.cache[filename]

                else:
                    if decorator.cache[filename] is None:
                        decorator.cache[filename] = func(*args, **kwargs)
                        return decorator.cache[filename]
                    else:
                        return decorator.cache[filename]

        return func(*args, **kwargs)

    decorator.cache = dict()

    return decorator

In [21]:
class ConvBNReLUPool(nn.Sequential):
    def __init__(self, in_size, out_size, kernel_size, stride, padding,
                pool_kernel_size, pool_stride, dropout: float = 0.0):
        super(ConvBNReLUPool, self).__init__(
            nn.Conv2d(in_size, out_size, kernel_size=kernel_size, stride=stride, padding=padding),
            nn.BatchNorm2d(out_size),
            nn.Dropout2d(dropout),
            nn.ReLU6(inplace=True),
            nn.MaxPool2d(kernel_size=pool_kernel_size, stride=pool_stride),
        )
        
class ConvReLU(nn.Sequential):
    def __init__(self, in_size, out_size, kernel_size, stride, padding):
        super(ConvReLU, self).__init__(
            nn.Conv2d(in_size, out_size, kernel_size=kernel_size, stride=stride, padding=padding),
            nn.ReLU6(inplace=True),
        )

In [22]:
class ScalableCnn(nn.Module):
    """
    Compound Scaling based CNN
    see: https://arxiv.org/pdf/1905.11946.pdf
    """

    def __init__(self, dataset: DatasetManager,
                 compound_scales: tuple = (1, 1, 1),
                 initial_conv_inputs=[1, 32, 64, 64],
                 initial_conv_outputs=[32, 64, 64, 64],
                 initial_linear_inputs=[1344, ],
                 initial_linear_outputs=[10, ],
                 initial_resolution=[64, 173],
                 round_up: bool = False,
                 **kwargs
                 ):
        super(ScalableCnn, self).__init__()
        self.compound_scales = compound_scales
        self.dataset = dataset
        round_func = np.floor if not round_up else np.ceil

        alpha, beta, gamma = compound_scales[0], compound_scales[1], compound_scales[2]

        initial_nb_conv = len(initial_conv_inputs)
        initial_nb_dense = len(initial_linear_inputs)

        # Apply compound scaling

        # resolution ----
        # WARNING - RESOLUTION WILL CHANGE THE FEATURES EXTRACTION OF THE SAMPLE
        new_n_mels = int(round_func(initial_resolution[0] * gamma))
        new_n_time_bins = int(round_func(initial_resolution[1] * gamma))
        new_hop_length = int(round_func( (self.dataset.sr * DatasetManager.LENGTH) / new_n_time_bins))

        self.scaled_resolution = (new_n_mels, new_n_time_bins)
        print("new scaled resolution: ", self.scaled_resolution)

        self.dataset.extract_feature = self.generate_feature_extractor(new_n_mels, new_hop_length)

        # ======== CONVOLUTION PARTS ========
        # ---- depth ----
        scaled_nb_conv = round_func(initial_nb_conv * alpha)
        
        new_conv_inputs, new_conv_outputs = initial_conv_inputs.copy(), initial_conv_outputs.copy()
        if scaled_nb_conv != initial_nb_conv:  # Another conv layer must be created
            print("More conv layer must be created")
            gaps = np.array(initial_conv_outputs) - np.array(initial_conv_inputs)  # average filter gap
            avg_gap = gaps.mean()

            while len(new_conv_inputs) < scaled_nb_conv:
                new_conv_outputs.append(int(round_func(new_conv_outputs[-1] + avg_gap)))
                new_conv_inputs.append(new_conv_outputs[-2])
        
        # ---- width ----
        scaled_conv_inputs = [int(round_func(i * beta)) for i in new_conv_inputs]
        scaled_conv_outputs = [int(round_func(i * beta)) for i in new_conv_outputs]
        
        print("new conv layers:")
        print("inputs: ", scaled_conv_inputs)
        print("ouputs: ", scaled_conv_outputs)
        
        # Check how many conv with pooling layer can be used
        nb_max_pooling = int(np.floor(np.min([np.log2(self.scaled_resolution[0]), int(np.log2(self.scaled_resolution[1]))])))
        nb_model_pooling = len(scaled_conv_inputs)

        if nb_model_pooling > nb_max_pooling:
            nb_model_pooling = nb_max_pooling
            
        # fixe initial conv layers
        scaled_conv_inputs[0] = 1
        
        # ======== LINEAR PARTS ========
        # adjust the first dense input with the last convolutional layers
        initial_linear_inputs[0] = self.calc_initial_dense_input(
            self.scaled_resolution,
            nb_model_pooling,
            scaled_conv_outputs
        )
        
        # --- depth ---
        scaled_nb_linear = round_func(initial_nb_dense * alpha)
        
        if scaled_nb_linear != initial_nb_dense:  # Another dense layer must be created
            print("More dense layer must be created")
            dense_list = np.linspace(initial_linear_inputs[0], initial_linear_outputs[-1], scaled_nb_linear + 1)
            initial_linear_inputs = dense_list[:-1]
            initial_linear_outputs = dense_list[1:]
            
        # --- width ---
        scaled_dense_inputs = [int(round_func(i * beta)) for i in initial_linear_inputs]
        scaled_dense_outputs = [int(round_func(i * beta)) for i in initial_linear_outputs]
        
        # fix first and final linear layer
        scaled_dense_inputs[0] = self.calc_initial_dense_input(self.scaled_resolution,
                                                                nb_model_pooling,
                                                                scaled_conv_outputs)
        scaled_dense_outputs[-1] = 10
        
        print("new dense layers:")
        print("inputs: ", scaled_dense_inputs)
        print("ouputs: ", scaled_dense_outputs)

        # ======== BUILD THE MODEL=========
        # features part ----
        features = []

        # Create the layers
        for idx, (inp, out) in enumerate(zip(scaled_conv_inputs, scaled_conv_outputs)):
            if idx < nb_model_pooling:
                dropout = 0.3 if idx != 0 else 0.0
                features.append(ConvBNReLUPool(inp, out, 3, 1, 1, (2, 2), (2, 2), dropout))

            else:
                features.append(ConvReLU(inp, out, 3, 1, 1))

        self.features = nn.Sequential(
            *features,
        )

        # classifier part ----
        linears = []
        for inp, out in zip(scaled_dense_inputs[:-1], scaled_dense_outputs[:-1]):
            print(inp, out)
            linears.append(nn.Linear(inp, out))
            linears.append(nn.ReLU6(inplace=True))

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            *linears,
            nn.Linear(scaled_dense_inputs[-1], scaled_dense_outputs[-1])
        )

    def forward(self, x):
        x = x.view(-1, 1, *x.shape[1:])

        x = self.features(x)
        x = self.classifier(x)

        return x

    def calc_initial_dense_input(self, resolution, nb_model_pooling, conv_outputs):
        dim1 = resolution[0]
        dim2 = resolution[1]

        for i in range(int(nb_model_pooling)):
            dim1 = dim1 // 2
            dim2 = dim2 // 2

        return dim1 * dim2 * conv_outputs[-1]

    def generate_feature_extractor(self, n_mels, hop_length):
        @conditional_cache
        def extract_feature(raw_data, filename = None, cached = False):
            feat = librosa.feature.melspectrogram(
                raw_data, self.dataset.sr, n_fft=2048, hop_length=hop_length, n_mels=n_mels, fmin=0, fmax=self.dataset.sr // 2)
            feat = librosa.power_to_db(feat, ref=np.max)
            return feat

        return extract_feature

## Model preparation

### Baseline parameters

In [23]:
# create model
alpha = 1.0
beta = 1.0
gamma = 1.0

torch.cuda.empty_cache()

model_func = ScalableCnn
parameters_baseline = dict(
    #dataset=manager,
    
    compound_scales = (alpha, beta, gamma),
    
    initial_conv_inputs=[1, 32, 64, 64],
    initial_conv_outputs=[32, 64, 64, 64],
    initial_linear_inputs=[1344, ],
    initial_linear_outputs=[10, ],
    initial_resolution=[64, 173],
    round_up=True,
)

### step 1 parameters

In [24]:
# create model
phi = 1.0
alpha = 1.36**phi
beta = 1.0**phi
gamma = 1.21**phi

torch.cuda.empty_cache()

model_func = ScalableCnn
parameters_step1 = dict(
    #dataset=manager,
    
    compound_scales = (alpha, beta, gamma),
    
    initial_conv_inputs=[1, 32, 64, 64],
    initial_conv_outputs=[32, 64, 64, 64],
    initial_linear_inputs=[1344, ],
    initial_linear_outputs=[10, ],
    initial_resolution=[64, 173],
    round_up=True,
)

# Get valid scale
<a id='valid_scale'></a>

In [None]:
alpha = np.linspace(1, 2, 20)
beta = np.linspace(1, 2, 20)
gamma = np.linspace(1, 2, 20)

import itertools

valid_scaling = []
tolerance = 0.005
target = 2
low_target = target - (target * tolerance)
high_target = target + (target * tolerance)

for a, b, g in tqdm.tqdm(itertools.product(alpha, beta, gamma)):
    M = a * b**2 * g**2
    
    if low_target < M < high_target:
        valid_scaling.append((a, b, g))

In [19]:
pprint.pprint(valid_scaling)

[(1.1578947368421053, 1.0, 1.3157894736842106),
 (1.1578947368421053, 1.3157894736842106, 1.0),
 (1.368421052631579, 1.0, 1.2105263157894737),
 (1.368421052631579, 1.2105263157894737, 1.0),
 (1.4736842105263157, 1.0526315789473684, 1.1052631578947367),
 (1.4736842105263157, 1.1052631578947367, 1.0526315789473684),
 (1.631578947368421, 1.0, 1.1052631578947367),
 (1.631578947368421, 1.0526315789473684, 1.0526315789473684),
 (1.631578947368421, 1.1052631578947367, 1.0),
 (2.0, 1.0, 1.0)]


In [11]:
alpha = ["%.3f" % s[0] for s in valid_scaling]
beta  = ["%.3f" % s[1] for s in valid_scaling]
gamma = ["%.3f" % s[2] for s in valid_scaling]

# make it bash ready
print("alpha=(" + " ".join(alpha) + ")")
print("beta=(" + " ".join(beta) + ")")
print("gamma=" + " ".join(gamma) + ")")

alpha=(1.158 1.158 1.368 1.368 1.474 1.474 1.632 1.632 1.632 2.000)
beta=(1.000 1.316 1.000 1.211 1.053 1.105 1.000 1.053 1.105 1.000)
gamma=1.316 1.000 1.211 1.000 1.105 1.053 1.105 1.053 1.000 1.000)


## test every model to check if they are architecturally valid

In [43]:
def testing_model(parameters, compound_scales = (1.0, 1.0, 1.0)):
    """The new scallable 1 model feature resolution scaling.
    It is also used as a starter for the new scallable 2 model
    """
#     compound_scales = kwargs.get("compound_scales", (1.0, 1.0, 1.0))

    del parameters["compound_scales"]
    return ScalableCnn(dataset=manager, compound_scales = compound_scales, **parameters)

In [44]:
testing_model(parameters_baseline, compound_scales=(1.1578947368421053, 1.0, 1.3157894736842106))

new scaled resolution:  (85, 228)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64]
ouputs:  [32, 64, 64, 64, 80]
More dense layer must be created
new dense layers:
inputs:  [1120, 565]
ouputs:  [565, 10]
1120 565




ScalableCnn(
  (features): Sequential(
    (0): ConvBNReLUPool(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): Dropout2d(p=0.0, inplace=False)
      (3): ReLU6(inplace=True)
      (4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )
    (1): ConvBNReLUPool(
      (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): Dropout2d(p=0.3, inplace=False)
      (3): ReLU6(inplace=True)
      (4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )
    (2): ConvBNReLUPool(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): Dropout2d(p=0.3, inp

In [21]:
manager = DatasetManager("../dataset/metadata", "../dataset/audio", train_fold=[1], val_fold = [2], verbose=1)

  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


100%|██████████| 1/1 [00:01<00:00,  1.48s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


100%|██████████| 1/1 [00:01<00:00,  1.53s/it]


In [26]:
model = ScalableCnn(
    dataset=manager,
    compound_scales = (1.158, 1.316, 1.000),   
    initial_conv_inputs=[1, 32, 64, 64, 64],
    initial_conv_outputs=[32, 64, 64, 64, 79],
    initial_linear_inputs=[948, ],
    initial_linear_outputs=[10, ],
    initial_resolution=[77, 209],
    round_up=False,
)

resolution = model.scaled_resolution
input_tensor = torch.zeros(((1,) + resolution), dtype=torch.float)
summary(model, input_tensor)

new scaled resolution:  (77, 209)
new conv layers:
inputs:  [1, 42, 84, 84, 84]
ouputs:  [42, 84, 84, 84, 103]
new dense layers:
inputs:  [1236]
ouputs:  [10]
                                Kernel Shape      Output Shape   Params  \
Layer                                                                     
0_features.0.Conv2d_0          [1, 42, 3, 3]  [1, 42, 77, 209]    420.0   
1_features.0.BatchNorm2d_1              [42]  [1, 42, 77, 209]     84.0   
2_features.0.Dropout2d_2                   -  [1, 42, 77, 209]        -   
3_features.0.ReLU6_3                       -  [1, 42, 77, 209]        -   
4_features.0.MaxPool2d_4                   -  [1, 42, 38, 104]        -   
5_features.1.Conv2d_0         [42, 84, 3, 3]  [1, 84, 38, 104]  31.836k   
6_features.1.BatchNorm2d_1              [84]  [1, 84, 38, 104]    168.0   
7_features.1.Dropout2d_2                   -  [1, 84, 38, 104]        -   
8_features.1.ReLU6_3                       -  [1, 84, 38, 104]        -   
9_features.1.Max

Unnamed: 0_level_0,Kernel Shape,Output Shape,Params,Mult-Adds
Layer,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0_features.0.Conv2d_0,"[1, 42, 3, 3]","[1, 42, 77, 209]",420.0,6083154.0
1_features.0.BatchNorm2d_1,[42],"[1, 42, 77, 209]",84.0,42.0
2_features.0.Dropout2d_2,-,"[1, 42, 77, 209]",,
3_features.0.ReLU6_3,-,"[1, 42, 77, 209]",,
4_features.0.MaxPool2d_4,-,"[1, 42, 38, 104]",,
5_features.1.Conv2d_0,"[42, 84, 3, 3]","[1, 84, 38, 104]",31836.0,125483904.0
6_features.1.BatchNorm2d_1,[84],"[1, 84, 38, 104]",168.0,84.0
7_features.1.Dropout2d_2,-,"[1, 84, 38, 104]",,
8_features.1.ReLU6_3,-,"[1, 84, 38, 104]",,
9_features.1.MaxPool2d_4,-,"[1, 84, 19, 52]",,


In [16]:
model_func = scallable2_new

for valid in valid_scaling:
    print("========================================")
    print("")
    print(valid)
    print("")
    print("========================================")
    
    torch.cuda.empty_cache()
    
    m1 = testing_model(dataset = manager, compound_scales=valid)
    resolution = m1.scaled_resolution

    m1.cuda()
    
    input_tensor = torch.zeros(((1,) + resolution), dtype=torch.float)
    input_tensor = input_tensor.cuda()

    s = summary(m1, input_tensor)

NameError: name 'valid_scaling' is not defined

# Prep training

In [11]:
class Trainer:
    def __init__(self, audio_root, metadata_root,
        model_parameters, criterion,
        batch_size=64, nb_epoch=100, augmentations = []
    ):
        self.audio_root = audio_root
        self.metadata_root = metadata_root
        
        self.model_parameters = model_parameters
        self.criterion = criterion
        self.batch_size = batch_size
        self.nb_epoch = nb_epoch
        
        self.model = None
        self.manager = None
        self.tensorboard = None
        self.train_dataset = None
        self.val_dataset = None
        self.train_loader = None
        self.val_loader = None
        
        self.data_ready: bool = False
        self.training_ready: bool = False
            
    def _free(self):
        self.model = None
        self.manager = None
        self.tensorboard = None
        self.train_dataset = None
        self.val_dataset = None
        self.train_loader = None
        self.val_loader = None
        
        gc.collect()
        
    def prepare_data(self, train_fold: list = [1,2,3,4,5,6,7,8,9], val_fold: list = [10]):

        self.manager = DatasetManager(
            self.metadata_root, self.audio_root,
            train_fold=train_fold, val_fold = val_fold,
            verbose=1)
        
        # train and val loaders
        self.train_dataset = Dataset(self.manager, train=True, val=False, augments=[], cached=True)
        self.val_dataset = Dataset(self.manager, train=False, val=True, augments=[], cached=True)
        
        self.training_loader = torch.utils.data.DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)
        self.val_loader = torch.utils.data.DataLoader(self.val_dataset, batch_size=self.batch_size, shuffle=True)
        
        self.nb_batch = len(self.train_dataset) // self.batch_size
        
        self.data_ready = True
        
    def prepare_training(self, tensorboard_title: str = None):
            
        # Create model
        self.model = ScalableCnn(dataset=self.manager, **self.model_parameters)
        self.model.cuda()
        
        if tensorboard_title is not None:
            self.tensorboard = SummaryWriter(log_dir="../tensorboard/compound_scaling/%s" % tensorboard_title, comment=self.model.__class__.__name__)

        self.optimizer = torch.optim.SGD(self.model.parameters(), weight_decay=1e-3, lr=0.05)
        
        # scheduler
        lr_lambda = lambda epoch: 0.05 * (np.cos(np.pi * epoch / self.nb_epoch) + 1)
        lr_scheduler = LambdaLR(self.optimizer, lr_lambda=lr_lambda)
        self.callbacks = [lr_scheduler]
        
        # metrics
        self.acc_func = CategoricalAccuracy()
        
        self.training_ready = True
        
    def reset_metrics(self):
        self.acc_func.reset()
        
    def cross_val(self, tensorboard_title: str = None):
        train_folds = [
            [2, 3, 4, 5, 6, 7, 8, 9, 10], 
            [1, 3, 4, 5, 6, 7, 8, 9, 10], 
            [1, 2, 4, 5, 6, 7, 8, 9, 10],
            [1, 2, 3, 5, 6, 7, 8, 9, 10],
            [1, 2, 3, 4, 6, 7, 8, 9, 10],
            [1, 2, 3, 4, 5, 7, 8, 9, 10],
            [1, 2, 3, 4, 5, 6, 8, 9, 10],
            [1, 2, 3, 4, 5, 6, 7, 9, 10],
            [1, 2, 3, 4, 5, 6, 7, 8, 10],
            [1, 2, 3, 4, 5, 6, 7, 8, 9], 
        ]
        
        val_folds = [[1], [2], [3], [4], [5], [6], [7] ,[8], [9], [10]]
        
        for run_id in range(10):
            self._free()
            
            tensorboard_title_ = "%s/%s_run%d" % (tensorboard_title, get_datetime(), (run_id + 1))
            
            self.prepare_data(train_fold=train_folds[run_id], val_fold=val_folds[run_id])
            self.prepare_training(tensorboard_title=tensorboard_title_)
            
            self.train()
        
    def train(self):
        if not self.data_ready:
            raise ValueError("The data are not ready, please use `prepare_data(..)`")
        if not self.training_ready:
            raise ValueError("The training is not ready, please use `prepare_training(...)`")
            
        for epoch in range(self.nb_epoch):
            self.train_step(epoch)
            self.val_step(epoch)
            
            for callback in self.callbacks:
                callback.step()
                
        if self.tensorboard is not None:
            self.tensorboard.flush()
            self.tensorboard.close()
        
    def train_step(self, epoch):
        start_time = time.time()
        print("")

        self.reset_metrics()
        self.model.train()

        for i, (X, y) in enumerate(self.training_loader):        
            # Transfer to GPU
            X = X.cuda()
            y = y.cuda()

            # predict
            logits = self.model(X)

            weak_loss = self.criterion(logits, y)

            total_loss = weak_loss

            # calc metrics
    #         y_pred = torch.log_softmax(logits, dim=1)
            _, y_pred = torch.max(logits, 1)
            acc = self.acc_func(y_pred, y)

            # ======== back propagation ========
            self.optimizer.zero_grad()
            total_loss.backward()
            self.optimizer.step()

            # ======== history ========
            print("Epoch {}, {:d}% \t ce: {:.4f} - acc: {:.4f} - took: {:.2f}s".format(
                epoch+1,
                int(100 * (i+1) / self.nb_batch),
                total_loss.item(),
                acc,
                time.time() - start_time
            ),end="\r")

        # using tensorboard to monitor loss and acc
        if self.tensorboard is not None:
            self.tensorboard.add_scalar('train/ce', total_loss.item(), epoch)
            self.tensorboard.add_scalar("train/acc", 100. * acc, epoch )
            
    def val_step(self, epoch):
        print("")
        with torch.set_grad_enabled(False):
            # reset metrics
            self.reset_metrics()
            self.model.eval()

            for i, (X_val, y_val) in enumerate(self.val_loader):
                # Transfer to GPU
                X_val = X_val.cuda()
                y_val = y_val.cuda()

    #             y_weak_val_pred, _ = model(X_val)
                logits = self.model(X_val)

                # calc loss
                weak_loss_val = self.criterion(logits, y_val)

                # metrics
    #             y_val_pred =torch.log_softmax(logits, dim=1)
                _, y_val_pred = torch.max(logits, 1)
                acc_val = self.acc_func(y_val_pred, y_val)

                #Print statistics
                print("Epoch {}, {:d}% \t ce val: {:.4f} - acc val: {:.4f}".format(
                    epoch+1,
                    int(100 * (i+1) / self.nb_batch),
                    weak_loss_val.item(),
                    acc_val,
                ),end="\r")

            # using tensorboard to monitor loss and acc
            if self.tensorboard is not None:
                self.tensorboard.add_scalar('validation/ce', weak_loss_val.item(), epoch)
                self.tensorboard.add_scalar("validation/acc", 100. * acc_val, epoch )

In [12]:
criterion = nn.CrossEntropyLoss(reduce="mean")



# Train baseline

In [13]:
# tensorboard
title = "%s_%s_scallable0" % ( get_datetime(), model_func.__name__)

torch.cuda.empty_cache()

trainer = Trainer(
    audio_root="../dataset/audio", metadata_root="../dataset/metadata",
    model_parameters = parameters_baseline,
    criterion=criterion,
    batch_size=64,
    nb_epoch=150,
)


## Train 1 fold

In [None]:
trainer.prepare_data(train_fold = [1,2,3,4,5,6,7,8,9], val_fold = [10])
trainer.prepare_training(tensorboard_title=title)
trainer.train()

## Train cross_val

In [None]:
title = "scallable0/cross_validation"

trainer.cross_val(title)

# Train step 1

In [15]:
# tensorboard
title = "scallable1"

torch.cuda.empty_cache()

trainer = Trainer(
    audio_root="../dataset/audio", metadata_root="../dataset/metadata",
    model_parameters = parameters_step1,
    criterion=criterion,
    batch_size=64,
    nb_epoch=150,
)

## Train 1 fold

In [16]:
trainer.prepare_data(train_fold = [1,2,3,4,5,6,7,8,9], val_fold = [10])
trainer.prepare_training(tensorboard_title=title)
trainer.train()

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:12,  1.54s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:03<00:10,  1.57s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:09,  1.62s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']





KeyboardInterrupt: 

## Train cross_val

In [17]:
title = "scallable1/cross_validation/"

trainer.cross_val(title)

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 11%|█         | 1/9 [00:01<00:12,  1.52s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 22%|██▏       | 2/9 [00:03<00:10,  1.54s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 33%|███▎      | 3/9 [00:04<00:09,  1.59s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 44%|████▍     | 4/9 [00:06<00:08,  1.60s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 56%|█████▌    | 5/9 [00:07<00:06,  1.54s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 67%|██████▋   | 6/9 [00:09<00:04,  1.49s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:10<00:02,  1.42s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:11<00:01,  1.38s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:13<00:00,  1.38s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


100%|██████████| 1/1 [00:01<00:00,  1.45s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.1214 - acc: 0.1389 - took: 50.46s
Epoch 1, 11% 	 ce val: 2.1059 - acc val: 0.3185
Epoch 2, 100% 	 ce: 2.1158 - acc: 0.2027 - took: 3.91s
Epoch 2, 11% 	 ce val: 1.9582 - acc val: 0.3329
Epoch 3, 100% 	 ce: 2.0273 - acc: 0.2470 - took: 3.80s
Epoch 3, 11% 	 ce val: 1.7122 - acc val: 0.3644
Epoch 4, 100% 	 ce: 1.9564 - acc: 0.2680 - took: 3.80s
Epoch 4, 11% 	 ce val: 1.5833 - acc val: 0.3723
Epoch 5, 100% 	 ce: 1.6209 - acc: 0.3031 - took: 3.80s
Epoch 5, 11% 	 ce val: 1.6004 - acc val: 0.4111
Epoch 6, 100% 	 ce: 1.7008 - acc: 0.3235 - took: 3.81s
Epoch 6, 11% 	 ce val: 1.8365 - acc val: 0.4364
Epoch 7, 100% 	 ce: 1.5427 - acc: 0.3421 - took: 3.82s
Epoch 7, 11% 	 ce val: 1.6746 - acc val: 0.4451
Epoch 8, 100% 	 ce: 1.5615 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:09,  1.23s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 22%|██▏       | 2/9 [00:02<00:08,  1.24s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 33%|███▎      | 3/9 [00:03<00:07,  1.22s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 44%|████▍     | 4/9 [00:04<00:05,  1.20s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 56%|█████▌    | 5/9 [00:05<00:04,  1.19s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 67%|██████▋   | 6/9 [00:07<00:03,  1.19s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:08<00:02,  1.17s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:09<00:01,  1.19s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:10<00:00,  1.22s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


100%|██████████| 1/1 [00:01<00:00,  1.28s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.2182 - acc: 0.1511 - took: 49.82s
Epoch 1, 11% 	 ce val: 2.1195 - acc val: 0.3369
Epoch 2, 100% 	 ce: 2.1665 - acc: 0.2116 - took: 3.90s
Epoch 2, 11% 	 ce val: 2.0105 - acc val: 0.3335
Epoch 3, 100% 	 ce: 1.7997 - acc: 0.2513 - took: 3.82s
Epoch 3, 11% 	 ce val: 1.7562 - acc val: 0.3476
Epoch 4, 100% 	 ce: 1.8043 - acc: 0.2803 - took: 3.81s
Epoch 4, 11% 	 ce val: 1.7213 - acc val: 0.3614
Epoch 5, 100% 	 ce: 1.6321 - acc: 0.3102 - took: 3.84s
Epoch 5, 11% 	 ce val: 1.6232 - acc val: 0.3487
Epoch 6, 100% 	 ce: 1.7419 - acc: 0.3269 - took: 3.82s
Epoch 6, 11% 	 ce val: 1.6762 - acc val: 0.3595
Epoch 7, 100% 	 ce: 1.5236 - acc: 0.3362 - took: 3.82s
Epoch 7, 11% 	 ce val: 1.5117 - acc val: 0.3854
Epoch 8, 100% 	 ce: 1.9999 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:09,  1.24s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:02<00:08,  1.25s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 33%|███▎      | 3/9 [00:03<00:07,  1.32s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 44%|████▍     | 4/9 [00:05<00:06,  1.33s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 56%|█████▌    | 5/9 [00:06<00:05,  1.31s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 67%|██████▋   | 6/9 [00:07<00:03,  1.24s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:08<00:02,  1.22s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:10<00:01,  1.24s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:11<00:00,  1.31s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


100%|██████████| 1/1 [00:01<00:00,  1.38s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.2552 - acc: 0.1435 - took: 50.56s
Epoch 1, 12% 	 ce val: 2.1312 - acc val: 0.2595
Epoch 2, 100% 	 ce: 2.1527 - acc: 0.2102 - took: 3.86s
Epoch 2, 12% 	 ce val: 2.0399 - acc val: 0.2407
Epoch 3, 100% 	 ce: 2.0905 - acc: 0.2568 - took: 3.80s
Epoch 3, 12% 	 ce val: 1.9799 - acc val: 0.3103
Epoch 4, 100% 	 ce: 1.8646 - acc: 0.2772 - took: 3.80s
Epoch 4, 12% 	 ce val: 2.0129 - acc val: 0.2667
Epoch 5, 100% 	 ce: 1.5658 - acc: 0.3023 - took: 3.89s
Epoch 5, 12% 	 ce val: 1.8147 - acc val: 0.2932
Epoch 6, 100% 	 ce: 1.6884 - acc: 0.3321 - took: 3.86s
Epoch 6, 12% 	 ce val: 1.8672 - acc val: 0.3061
Epoch 7, 100% 	 ce: 1.6568 - acc: 0.3320 - took: 3.85s
Epoch 7, 12% 	 ce val: 1.8222 - acc val: 0.3128
Epoch 8, 100% 	 ce: 1.6038 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:11,  1.40s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:02<00:09,  1.42s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:08,  1.45s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 44%|████▍     | 4/9 [00:05<00:07,  1.49s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 56%|█████▌    | 5/9 [00:07<00:05,  1.45s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 67%|██████▋   | 6/9 [00:08<00:04,  1.46s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:10<00:02,  1.40s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:11<00:01,  1.36s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:12<00:00,  1.45s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


100%|██████████| 1/1 [00:01<00:00,  1.69s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.2154 - acc: 0.1390 - took: 49.22s
Epoch 1, 13% 	 ce val: 2.2662 - acc val: 0.2039
Epoch 2, 100% 	 ce: 2.0415 - acc: 0.2033 - took: 3.87s
Epoch 2, 13% 	 ce val: 2.1318 - acc val: 0.2215
Epoch 3, 100% 	 ce: 1.9990 - acc: 0.2493 - took: 3.76s
Epoch 3, 13% 	 ce val: 2.0173 - acc val: 0.2531
Epoch 4, 100% 	 ce: 1.8184 - acc: 0.2835 - took: 3.79s
Epoch 4, 13% 	 ce val: 1.8752 - acc val: 0.2778
Epoch 5, 100% 	 ce: 1.8179 - acc: 0.3131 - took: 3.77s
Epoch 5, 13% 	 ce val: 1.7362 - acc val: 0.3601
Epoch 6, 100% 	 ce: 1.5206 - acc: 0.3401 - took: 3.77s
Epoch 6, 13% 	 ce val: 1.8268 - acc val: 0.3391
Epoch 7, 100% 	 ce: 1.7289 - acc: 0.3476 - took: 3.79s
Epoch 7, 13% 	 ce val: 1.6443 - acc val: 0.2798
Epoch 8, 100% 	 ce: 1.7445 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:11,  1.40s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:02<00:09,  1.39s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:08,  1.40s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 44%|████▍     | 4/9 [00:05<00:07,  1.45s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 56%|█████▌    | 5/9 [00:06<00:05,  1.38s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 67%|██████▋   | 6/9 [00:08<00:04,  1.37s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:09<00:02,  1.33s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:10<00:01,  1.36s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:12<00:00,  1.38s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


100%|██████████| 1/1 [00:01<00:00,  1.44s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.2029 - acc: 0.1407 - took: 49.97s
Epoch 1, 12% 	 ce val: 2.2139 - acc val: 0.2498
Epoch 2, 100% 	 ce: 2.1122 - acc: 0.2058 - took: 3.90s
Epoch 2, 12% 	 ce val: 2.1316 - acc val: 0.2892
Epoch 3, 100% 	 ce: 1.9802 - acc: 0.2456 - took: 3.80s
Epoch 3, 12% 	 ce val: 1.9347 - acc val: 0.2992
Epoch 4, 100% 	 ce: 1.9464 - acc: 0.2694 - took: 3.81s
Epoch 4, 12% 	 ce val: 1.7880 - acc val: 0.2900
Epoch 5, 100% 	 ce: 1.7564 - acc: 0.2893 - took: 3.81s
Epoch 5, 12% 	 ce val: 1.8943 - acc val: 0.3554
Epoch 6, 100% 	 ce: 1.8151 - acc: 0.3031 - took: 3.82s
Epoch 6, 12% 	 ce val: 1.7588 - acc val: 0.3269
Epoch 7, 100% 	 ce: 1.6123 - acc: 0.3228 - took: 3.81s
Epoch 7, 12% 	 ce val: 1.5617 - acc val: 0.3842
Epoch 8, 100% 	 ce: 1.6275 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:10,  1.33s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:02<00:09,  1.36s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:08,  1.40s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 44%|████▍     | 4/9 [00:05<00:06,  1.39s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 56%|█████▌    | 5/9 [00:07<00:05,  1.40s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 67%|██████▋   | 6/9 [00:08<00:04,  1.36s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:09<00:02,  1.33s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:11<00:01,  1.39s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:12<00:00,  1.46s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


100%|██████████| 1/1 [00:01<00:00,  1.33s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.0663 - acc: 0.1398 - took: 52.35s
Epoch 1, 10% 	 ce val: 2.1790 - acc val: 0.2157
Epoch 2, 100% 	 ce: 1.9807 - acc: 0.2088 - took: 3.93s
Epoch 2, 10% 	 ce val: 2.0635 - acc val: 0.2167
Epoch 3, 100% 	 ce: 1.9231 - acc: 0.2441 - took: 3.87s
Epoch 3, 10% 	 ce val: 2.0031 - acc val: 0.2398
Epoch 4, 100% 	 ce: 1.8327 - acc: 0.2798 - took: 3.86s
Epoch 4, 10% 	 ce val: 1.7781 - acc val: 0.3094
Epoch 5, 100% 	 ce: 1.7116 - acc: 0.3052 - took: 3.86s
Epoch 5, 10% 	 ce val: 1.7875 - acc val: 0.3056
Epoch 6, 100% 	 ce: 1.8995 - acc: 0.3318 - took: 3.93s
Epoch 6, 10% 	 ce val: 1.8932 - acc val: 0.3007
Epoch 7, 100% 	 ce: 1.6124 - acc: 0.3369 - took: 3.97s
Epoch 7, 10% 	 ce val: 1.9303 - acc val: 0.2832
Epoch 8, 100% 	 ce: 1.6980 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:09,  1.15s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:02<00:08,  1.18s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:03<00:07,  1.25s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 44%|████▍     | 4/9 [00:05<00:06,  1.25s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 56%|█████▌    | 5/9 [00:06<00:05,  1.30s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 67%|██████▋   | 6/9 [00:07<00:03,  1.29s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 78%|███████▊  | 7/9 [00:09<00:02,  1.33s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:10<00:01,  1.39s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:12<00:00,  1.47s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


100%|██████████| 1/1 [00:01<00:00,  1.28s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.1358 - acc: 0.1464 - took: 51.58s
Epoch 1, 11% 	 ce val: 2.1671 - acc val: 0.2556
Epoch 2, 100% 	 ce: 2.2218 - acc: 0.2047 - took: 3.90s
Epoch 2, 11% 	 ce val: 2.0240 - acc val: 0.2537
Epoch 3, 100% 	 ce: 1.9159 - acc: 0.2402 - took: 3.88s
Epoch 3, 11% 	 ce val: 2.0562 - acc val: 0.3263
Epoch 4, 100% 	 ce: 1.9544 - acc: 0.2722 - took: 3.92s
Epoch 4, 11% 	 ce val: 1.3780 - acc val: 0.3683
Epoch 5, 100% 	 ce: 1.6531 - acc: 0.3008 - took: 3.89s
Epoch 5, 11% 	 ce val: 2.1179 - acc val: 0.3367
Epoch 6, 100% 	 ce: 2.0344 - acc: 0.3257 - took: 3.86s
Epoch 6, 11% 	 ce val: 1.0652 - acc val: 0.3371
Epoch 7, 100% 	 ce: 1.6552 - acc: 0.3444 - took: 3.88s
Epoch 7, 11% 	 ce val: 1.4564 - acc val: 0.3374
Epoch 8, 100% 	 ce: 1.9572 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:10,  1.28s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:02<00:09,  1.37s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:08,  1.38s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 44%|████▍     | 4/9 [00:05<00:07,  1.45s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 56%|█████▌    | 5/9 [00:07<00:05,  1.43s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 67%|██████▋   | 6/9 [00:08<00:04,  1.40s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 78%|███████▊  | 7/9 [00:10<00:02,  1.43s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


 89%|████████▉ | 8/9 [00:11<00:01,  1.49s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:13<00:00,  1.54s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


100%|██████████| 1/1 [00:01<00:00,  1.27s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.0991 - acc: 0.1435 - took: 51.57s
Epoch 1, 10% 	 ce val: 2.1953 - acc val: 0.2081
Epoch 2, 100% 	 ce: 2.2600 - acc: 0.2045 - took: 3.96s
Epoch 2, 10% 	 ce val: 2.1128 - acc val: 0.2076
Epoch 3, 100% 	 ce: 1.8512 - acc: 0.2517 - took: 3.87s
Epoch 3, 10% 	 ce val: 1.9978 - acc val: 0.2863
Epoch 4, 100% 	 ce: 1.8602 - acc: 0.2723 - took: 3.87s
Epoch 4, 10% 	 ce val: 1.6836 - acc val: 0.3107
Epoch 5, 100% 	 ce: 1.7686 - acc: 0.2937 - took: 3.88s
Epoch 5, 10% 	 ce val: 1.9484 - acc val: 0.3348
Epoch 6, 100% 	 ce: 1.6964 - acc: 0.3101 - took: 3.90s
Epoch 6, 10% 	 ce val: 1.6816 - acc val: 0.3456
Epoch 7, 100% 	 ce: 1.6084 - acc: 0.3336 - took: 3.88s
Epoch 7, 10% 	 ce val: 1.6786 - acc val: 0.3864
Epoch 8, 100% 	 ce: 1.5235 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:11,  1.41s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:03<00:10,  1.47s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:09,  1.51s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 44%|████▍     | 4/9 [00:06<00:07,  1.57s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 56%|█████▌    | 5/9 [00:07<00:06,  1.57s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 67%|██████▋   | 6/9 [00:09<00:04,  1.53s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 78%|███████▊  | 7/9 [00:10<00:03,  1.53s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 89%|████████▉ | 8/9 [00:12<00:01,  1.53s/it]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 9/9 [00:14<00:00,  1.59s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


100%|██████████| 1/1 [00:01<00:00,  1.36s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.1757 - acc: 0.1490 - took: 51.51s
Epoch 1, 10% 	 ce val: 2.1461 - acc val: 0.2167
Epoch 2, 100% 	 ce: 1.9652 - acc: 0.1996 - took: 3.93s
Epoch 2, 10% 	 ce val: 2.0034 - acc val: 0.3173
Epoch 3, 100% 	 ce: 1.8727 - acc: 0.2378 - took: 3.95s
Epoch 3, 10% 	 ce val: 1.9334 - acc val: 0.3690
Epoch 4, 100% 	 ce: 1.7735 - acc: 0.2709 - took: 3.87s
Epoch 4, 10% 	 ce val: 1.8906 - acc val: 0.3810
Epoch 5, 100% 	 ce: 1.7954 - acc: 0.2971 - took: 3.87s
Epoch 5, 10% 	 ce val: 1.7043 - acc val: 0.4095
Epoch 6, 100% 	 ce: 1.8739 - acc: 0.3176 - took: 3.87s
Epoch 6, 10% 	 ce val: 1.5443 - acc val: 0.4167
Epoch 7, 100% 	 ce: 1.7876 - acc: 0.3331 - took: 3.89s
Epoch 7, 10% 	 ce val: 1.6092 - acc val: 0.4179
Epoch 8, 100% 	 ce: 1.7662 - 

  0%|          | 0/9 [00:00<?, ?it/s]

filenames folds: 1:  ['57320-0-0-15.wav' '105415-2-0-15.wav' '139951-9-0-13.wav'
 '106905-8-0-2.wav' '102842-3-0-1.wav']


 11%|█         | 1/9 [00:01<00:11,  1.43s/it]

filenames folds: 2:  ['96920-9-0-3.wav' '27349-3-0-2.wav' '34621-4-26-0.wav'
 '152908-5-0-11.wav' '158597-2-0-3.wav']


 22%|██▏       | 2/9 [00:03<00:10,  1.48s/it]

filenames folds: 3:  ['195451-5-0-8.wav' '33696-3-4-0.wav' '62837-7-1-18.wav'
 '22601-8-0-44.wav' '65750-3-0-5.wav']


 33%|███▎      | 3/9 [00:04<00:09,  1.52s/it]

filenames folds: 4:  ['185909-2-0-86.wav' '169466-4-2-18.wav' '128160-5-0-12.wav'
 '121888-3-0-0.wav' '159751-8-0-14.wav']


 44%|████▍     | 4/9 [00:06<00:07,  1.58s/it]

filenames folds: 5:  ['17578-5-0-9.wav' '17578-5-0-22.wav' '31150-2-0-1.wav' '34872-3-0-1.wav'
 '121286-0-0-5.wav']


 56%|█████▌    | 5/9 [00:07<00:06,  1.58s/it]

filenames folds: 6:  ['132021-7-0-3.wav' '46299-2-0-36.wav' '63724-0-0-12.wav'
 '135544-6-19-0.wav' '34952-8-0-3.wav']


 67%|██████▋   | 6/9 [00:09<00:04,  1.54s/it]

filenames folds: 7:  ['83488-1-1-0.wav' '21683-9-0-30.wav' '201988-5-0-20.wav'
 '177537-7-0-20.wav' '209992-5-2-91.wav']


 78%|███████▊  | 7/9 [00:10<00:03,  1.51s/it]

filenames folds: 8:  ['113216-5-0-0.wav' '194733-9-0-14.wav' '52633-3-0-1.wav'
 '171243-9-0-11.wav' '161129-4-0-13.wav']


 89%|████████▉ | 8/9 [00:12<00:01,  1.51s/it]

filenames folds: 9:  ['105029-7-0-3.wav' '103249-5-0-13.wav' '188823-7-0-0.wav'
 '180029-4-8-0.wav' '180156-1-2-0.wav']


100%|██████████| 9/9 [00:13<00:00,  1.52s/it]
  0%|          | 0/1 [00:00<?, ?it/s]

filenames folds: 10:  ['188813-7-5-0.wav' '115241-9-0-9.wav' '93567-8-0-17.wav'
 '155280-2-0-6.wav' '203424-9-0-15.wav']


100%|██████████| 1/1 [00:01<00:00,  1.32s/it]


new scaled resolution:  (78, 210)
More conv layer must be created
new conv layers:
inputs:  [1, 32, 64, 64, 64, 80]
ouputs:  [32, 64, 64, 64, 80, 96]
More dense layer must be created
new dense layers:
inputs:  [288, 149]
ouputs:  [149, 10]
288 149

Epoch 1, 100% 	 ce: 2.2465 - acc: 0.1327 - took: 50.73s
Epoch 1, 11% 	 ce val: 2.0629 - acc val: 0.2261
Epoch 2, 100% 	 ce: 2.0742 - acc: 0.1978 - took: 3.95s
Epoch 2, 11% 	 ce val: 2.0647 - acc val: 0.2241
Epoch 3, 100% 	 ce: 2.0388 - acc: 0.2415 - took: 3.84s
Epoch 3, 11% 	 ce val: 1.9282 - acc val: 0.2766
Epoch 4, 100% 	 ce: 1.8279 - acc: 0.2721 - took: 3.85s
Epoch 4, 11% 	 ce val: 1.6178 - acc val: 0.3321
Epoch 5, 100% 	 ce: 1.8715 - acc: 0.2889 - took: 3.85s
Epoch 5, 11% 	 ce val: 2.0345 - acc val: 0.3411
Epoch 6, 100% 	 ce: 1.4025 - acc: 0.3174 - took: 3.85s
Epoch 6, 11% 	 ce val: 1.5596 - acc val: 0.3279
Epoch 7, 100% 	 ce: 1.6439 - acc: 0.3382 - took: 3.85s
Epoch 7, 11% 	 ce val: 2.3276 - acc val: 0.3569
Epoch 8, 100% 	 ce: 1.6159 - 

# ♫♪.ılılıll|̲̅̅●̲̅̅|̲̅̅=̲̅̅|̲̅̅●̲̅̅|llılılı.♫♪