# DeepTurkish

This notebook is for training, evaluating and testing our DeepSpeech2 implementation.

In [1]:
import os

import utilities.utilities as utils
from model.data_loader import pad_collate_fn
from decoders import decoders
from train import main



## Dataset Directories

Choose a dataset, and the spectrogram type for training the network.

In [2]:
dataset_name = "Mozilla"
spectrogram_type = "dB"

spectrogram_dir, df_dir0, df_dir1, alphabet = utils.dataset_pointers(dataset_name, spectrogram_type)

Alphabet with length 32:,
['0', ' ', '.', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'y', 'z', 'ç', 'ö', 'ü', 'ğ', 'ı', 'ş']
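
The printed alphabet doubles as the label set for the network: each character's position in the list is its class index, and index 0 (the `'0'` entry) is the one the notebook later passes as `'blank'`. A minimal sketch of that mapping follows, assuming a plain lookup table; the helper names `encode` and `decode` are hypothetical and not taken from the repository.

```python
# Alphabet copied from the cell output above.
# Assumption: index 0 ('0') is the CTC blank, matching 'blank': 0 used later.
alphabet = ['0', ' ', '.', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
            'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'y', 'z',
            'ç', 'ö', 'ü', 'ğ', 'ı', 'ş']

# Character -> class index lookup.
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}


def encode(text):
    """Map a transcript to a list of class indices (hypothetical helper)."""
    return [char_to_idx[ch] for ch in text]


def decode(indices):
    """Map class indices back to text (hypothetical helper)."""
    return "".join(alphabet[i] for i in indices)


labels = encode("türkiye")
print(labels)
print(decode(labels))
```

Characters outside the alphabet raise a `KeyError` in this sketch, so transcripts would need to be normalized (lowercased, unsupported punctuation removed) beforehand.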


## Parameters

In [3]:
project_name = "METUbet+Mozilla"

# Leave model_dir as an empty string to train a new model from scratch
model_dir = os.path.join("data","Models and Losses","colab","09_53__04_02_2021","09_53__04_02_2021.pt")
#model_dir = ""

hyperparameters = {
    "n_cnn_layers": 1,
    "n_rnn_layers": 1,
    "rnn_dim": 512,
    "n_class": len(alphabet),
    "N_fft": 512,
    "stride": 2,
    "dropout": 0.1,
    "learning_rate": 5e-4,
    "epochs": 20,
    "batch_size": 16,
    "SortaGrad": True,
    "model_dir": model_dir,
}


loader_parameters = {'batch_size': hyperparameters['batch_size'],
                    'shuffle': True,
                    'collate_fn': pad_collate_fn,          
                    'pin_memory': True, # batches are placed in page-locked host memory, which speeds up host-to-GPU copies
                    'num_workers': 0 # number of data-loading subprocesses; increase if you have spare CPU cores and RAM
                    }


data_parameters = {'dataframe_dir_train': df_dir0, # directory of the dataframes
                   'dataframe_dir_test': df_dir1,
                   'train_dir': spectrogram_dir,
                   'test_dir': spectrogram_dir,
                   'alphabet': alphabet, 
                   'blank': 0, # index of the blank symbol in the alphabet
                   'batch_size': hyperparameters['batch_size'],
                   'split_ratio': 0.8,
                   'loader_parameters': loader_parameters
                   }
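
The `SortaGrad` flag refers to the curriculum strategy described for DeepSpeech2: during the first epoch, utterances are visited in order of increasing length, and later epochs use a random order. The sketch below illustrates only that ordering idea in plain Python. The function names, the toy lengths, and the way the order interacts with `'shuffle': True` are assumptions; the repository may implement it differently.

```python
import random


def epoch_order(lengths, epoch, sorta_grad=True, seed=0):
    """Return utterance indices in the order they are visited in this epoch."""
    indices = list(range(len(lengths)))
    if sorta_grad and epoch == 0:
        # First epoch: shortest utterances first.
        indices.sort(key=lambda i: lengths[i])
    else:
        # Later epochs: random order, seeded per epoch for reproducibility.
        random.Random(seed + epoch).shuffle(indices)
    return indices


def make_batches(indices, batch_size):
    """Split an index list into consecutive mini-batches."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


# Toy spectrogram lengths in frames (illustrative values only).
lengths = [120, 45, 300, 80, 210, 60]

first_epoch = make_batches(epoch_order(lengths, epoch=0), batch_size=2)
print(first_epoch)  # [[1, 5], [3, 0], [4, 2]]
```

Sorting by length also keeps utterances of similar duration in the same batch, which reduces the amount of padding a collate function has to add during that epoch.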

## Decoder

Choose a decoder to turn the CTC output matrix into text. Run only one of the three decoder cells below.

In [4]:
# Argmax decoder
decoder = decoders.Argmax_decoder(alphabet, data_parameters['blank'])
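
For reference, argmax (best path) decoding of a CTC output can be sketched in a few lines: take the most likely class in every frame, merge consecutive repeats, then drop the blanks. This is a generic illustration under the assumption that the output is a list of per-frame score lists; the function name `greedy_ctc_decode` is hypothetical and is not the repository's `Argmax_decoder`.

```python
def greedy_ctc_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Most likely class index in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]

    decoded = []
    previous = None
    for idx in best_path:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != previous and idx != blank:
            decoded.append(alphabet[idx])
        previous = idx
    return "".join(decoded)


# Toy example with a 3-symbol alphabet; index 0 is the blank.
toy_alphabet = ['0', 'a', 'b']
toy_scores = [
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.8, 0.1],  # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],  # a (new 'a' because a blank separates it)
    [0.1, 0.2, 0.7],  # b
    [0.1, 0.2, 0.7],  # b (repeat, merged)
]
print(greedy_ctc_decode(toy_scores, toy_alphabet))  # aab
```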

In [None]:
# BeamSearch Decoder
LM_text_name = "NN_datasets_sentences"
beam_width = 3
prune_threshold = -7 # approximately log(0.001) = -6.91

decoder = decoders.BeamSearch_decoder(alphabet, data_parameters['blank'], beam_width, prune_threshold, LM_text_name)
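
To show what `beam_width` and `prune_threshold` control, here is a minimal CTC prefix beam search in the log domain. It is a sketch under several assumptions: the input is a list of per-frame log-probability lists, characters whose frame log-probability falls below the threshold are skipped, and language-model scoring is left out entirely. The function names are hypothetical and this is not the repository's `BeamSearch_decoder`.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_beam_search(log_probs, alphabet, blank=0, beam_width=3, prune_threshold=-7.0):
    """CTC prefix beam search without a language model (illustrative sketch)."""
    # prefix -> (log prob of paths ending in blank, log prob ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for c, lp in enumerate(frame):
                if lp < prune_threshold:
                    continue  # assumed meaning of the pruning threshold
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(n_b, p_b + lp, p_nb + lp), n_nb)
                    continue
                new_prefix = prefix + (c,)
                n_b, n_nb = next_beams[new_prefix]
                if prefix and prefix[-1] == c:
                    # A repeated character extends the prefix only after a blank...
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp))
                    # ...otherwise it collapses into the existing prefix.
                    s_b, s_nb = next_beams[prefix]
                    next_beams[prefix] = (s_b, logsumexp(s_nb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp, p_nb + lp))
        if next_beams:  # if every symbol was pruned, keep the previous beams
            ranked = sorted(next_beams.items(),
                            key=lambda kv: logsumexp(*kv[1]), reverse=True)
            beams = dict(ranked[:beam_width])
    best = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))[0]
    return "".join(alphabet[i] for i in best)


# Two frames, each with P(blank) = 0.6 and P('a') = 0.4.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(ctc_beam_search(frames, ['0', 'a']))  # a
```

In this toy input the single best path is blank-blank (probability 0.36), so argmax decoding returns an empty string, while the three paths that collapse to "a" sum to 0.64. Summing over paths is what lets beam search recover transcripts that best-path decoding misses.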

In [None]:
# LexiconSearch
tolerance = 1

# choose an approximator for the Lexicon Search algorithm
BW = 2 # beam width of the approximator
prune = -7 # approximately log(0.001) = -6.91
LM_text_name = "NN_datasets_sentences"

approximator_properties = ('BeamSearch+LM',data_parameters['blank'], BW, prune, LM_text_name)

decoder = decoders.LexiconSearch_decoder(alphabet, tolerance, LM_text_name, approximator_properties)
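
One plausible reading of `tolerance` is a maximum edit distance: the approximator produces a rough transcript, and each word is then compared against lexicon entries that lie within `tolerance` edits. The sketch below illustrates only that candidate lookup, under this assumption. The function names and the three-word lexicon are hypothetical, and the repository's `LexiconSearch_decoder` may score and select candidates differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (item_a != item_b),   # substitution or match
            ))
        previous = current
    return previous[-1]


def lexicon_candidates(word, lexicon, tolerance=1):
    """Return the lexicon words within `tolerance` edits of `word`."""
    return [entry for entry in lexicon if edit_distance(word, entry) <= tolerance]


# Tiny illustrative lexicon (hypothetical, not the project's language-model text).
lexicon = ["kaddafiye", "karşı", "türkiye"]
print(lexicon_candidates("kadafiye", lexicon, tolerance=1))  # ['kaddafiye']
```

A linear scan like this is fine for a small lexicon; larger vocabularies are usually indexed with a structure such as a BK-tree so that only nearby words are compared.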

## Train, Validate and Test

The cell below trains the network with the parameters above and decodes its outputs with the chosen decoder. After each training epoch it evaluates the model on the validation set; once training is complete, it evaluates the model on the test set.

In [5]:
main(hyperparameters, data_parameters, decoder, project_name)



Model Name: 01_03__05_02_2021

SpeechRecognitionModel(
  (cnn): Conv2d(1, 32, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (rescnn_layers): Sequential(
    (0): ResidualCNN(
      (cnn1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cnn2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (layer_norm1): CNNLayerNorm(
        (layer_norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      )
      (layer_norm2): CNNLayerNorm(
        (layer_norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (fully_connected): Linear(in_features=4096, out_features=512, bias=True)
  (birnn_layers): Sequential(
    (0): BidirectionalGRU(
      (BiGRU): GRU(512, 512, batch_first=True, bidirectional=True)
      (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=F
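
The layer sizes in the printed model can be checked with the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below assumes the input spectrogram is one-sided with `N_fft / 2 + 1` frequency bins and that the fully connected layer receives channels times features per time step. Neither assumption is confirmed by the notebook, but both reproduce the `LayerNorm((128,))` and `in_features=4096` values shown above.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


n_fft = 512
freq_bins = n_fft // 2 + 1  # 257, assuming a one-sided spectrogram

# First layer: Conv2d(1, 32, kernel_size=5, stride=2, padding=1)
after_cnn = conv_out(freq_bins, kernel=5, stride=2, padding=1)

# Residual layers: kernel_size=3, stride=1, padding=1 keeps the size unchanged
after_rescnn = conv_out(after_cnn, kernel=3, stride=1, padding=1)

# Features per time step seen by the fully connected layer
channels = 32
fc_in_features = channels * after_rescnn

print(after_cnn, after_rescnn, fc_in_features)  # 128 128 4096
```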

[Training Epoch: 6/20] Iteration: 3671/14680 (25%)	Loss: 0.618631
[Training Epoch: 6/20] Iteration: 3700/14680 (25%)	Loss: 0.811212
[Training Epoch: 6/20] Iteration: 3800/14680 (26%)	Loss: 0.700595
[Training Epoch: 6/20] Iteration: 3900/14680 (27%)	Loss: 0.806781
[Training Epoch: 6/20] Iteration: 4000/14680 (27%)	Loss: 0.785080
[Training Epoch: 6/20] Iteration: 4100/14680 (28%)	Loss: 0.827677
[Training Epoch: 6/20] Iteration: 4200/14680 (29%)	Loss: 0.870739
[Training Epoch: 6/20] Iteration: 4300/14680 (29%)	Loss: 0.721457
[Training Epoch: 6/20] Iteration: 4400/14680 (30%)	Loss: 0.735444
[Training Epoch: 6/20] Iteration: 4404/14680 (30%)	Loss: 0.672466

evaluating...
------------------------------------------------------------
Target: türkiye kaddafiye karşı görüşlerini daha güçlü bir şekilde dile getirmekte neden tereddüt ediyor.
Predicted: türki ya kardafiya karşı görüşlerine daha güçlü bişekillidile getirmekleneden teredidediyor.

Target: peki ya çocukların sadece benim değil yunan a

[Training Epoch: 12/20] Iteration: 8075/14680 (55%)	Loss: 0.210321
[Training Epoch: 12/20] Iteration: 8100/14680 (55%)	Loss: 0.136830
[Training Epoch: 12/20] Iteration: 8200/14680 (56%)	Loss: 0.144904
[Training Epoch: 12/20] Iteration: 8300/14680 (57%)	Loss: 0.099349
[Training Epoch: 12/20] Iteration: 8400/14680 (57%)	Loss: 0.148213
[Training Epoch: 12/20] Iteration: 8500/14680 (58%)	Loss: 0.143104
[Training Epoch: 12/20] Iteration: 8600/14680 (59%)	Loss: 0.205477
[Training Epoch: 12/20] Iteration: 8700/14680 (59%)	Loss: 0.130504
[Training Epoch: 12/20] Iteration: 8800/14680 (60%)	Loss: 0.109031
[Training Epoch: 12/20] Iteration: 8808/14680 (60%)	Loss: 0.224617

evaluating...
------------------------------------------------------------
Target: türkiye kaddafiye karşı görüşlerini daha güçlü bir şekilde dile getirmekte neden tereddüt ediyor.
Predicted: türki ya kadafiyakaşı görüşlerine daha güçü bür şekrlediri getirmekteneden teredi dediyor.

Target: peki ya çocukların sadece benim değil

[Training Epoch: 18/20] Iteration: 12479/14680 (85%)	Loss: 0.021990
[Training Epoch: 18/20] Iteration: 12500/14680 (85%)	Loss: 0.010517
[Training Epoch: 18/20] Iteration: 12600/14680 (86%)	Loss: 0.027731
[Training Epoch: 18/20] Iteration: 12700/14680 (87%)	Loss: 0.047660
[Training Epoch: 18/20] Iteration: 12800/14680 (87%)	Loss: 0.029682
[Training Epoch: 18/20] Iteration: 12900/14680 (88%)	Loss: 0.013577
[Training Epoch: 18/20] Iteration: 13000/14680 (89%)	Loss: 0.084761
[Training Epoch: 18/20] Iteration: 13100/14680 (89%)	Loss: 0.033156
[Training Epoch: 18/20] Iteration: 13200/14680 (90%)	Loss: 0.047475
[Training Epoch: 18/20] Iteration: 13212/14680 (90%)	Loss: 0.019578

evaluating...
------------------------------------------------------------
Target: türkiye kaddafiye karşı görüşlerini daha güçlü bir şekilde dile getirmekte neden tereddüt ediyor.
Predicted: türki ya kadafiya kaşı görüşlerine daha güçü bü şekirle dili getirmekteneden teredü d ediyor.

Target: peki ya çocukların sadec


| Metric | Final value |
| --- | --- |
| _step | 234880.0 |
| _runtime | 30791.0 |
| _timestamp | 1612506971.0 |
| Training Loss | 0.01547 |
| test_avg_loss | 1.46523 |
| test_avg_cer | 0.22112 |
| test_avg_wer | 0.75096 |
| epoch | 20.0 |
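
The `test_avg_cer` and `test_avg_wer` values are character and word error rates. A common definition divides the Levenshtein distance between prediction and target by the target length, counted in characters for CER and in whitespace-separated words for WER. The sketch below follows that definition; how the repository averages over utterances, and whether it normalizes in exactly this way, is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (r != h),  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(target, predicted):
    """Character error rate: character edits divided by target length."""
    return edit_distance(target, predicted) / max(len(target), 1)


def wer(target, predicted):
    """Word error rate: word edits divided by the number of target words."""
    target_words = target.split()
    return edit_distance(target_words, predicted.split()) / max(len(target_words), 1)


print(cer("abc", "abd"))      # one substitution over three characters
print(wer("a b c", "a x c"))  # one substitution over three words
```

Because a single wrong character makes the whole word count as an error, WER is usually much higher than CER, which is consistent with the gap between the two test metrics above.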


| Metric | History |
| --- | --- |
| _step | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████ |
| _runtime | ▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇████ |
| _timestamp | ▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇████ |
| Training Loss | █▇▇▅▅▅▄▅▄▄▃▄▃▃▃▂▂▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| test_avg_loss | ▁ |
| test_avg_cer | ▁ |
| test_avg_wer | ▁ |
| epoch | ▁ |
