# PART I: Running a SpeechBrain ASR Recipe

## You have to fill in appropriate code in 4 locations in the notebook.

* The four locations start with "####TASK". Read the task specifications mentioned there.
* The required function names are already available in this notebook somewhere. You're only required to find and use the appropriate ones.
* Check the SpeechBrain documentation to find out how to use those functions. Refer to the starting material for more resources.
* By the end of this part of the assignment, you should be comfortable running a SpeechBrain recipe.
* **P.S.** Note that none of the four tasks require you to write more than 1 line of code.

### Setting up the codebase.

In [1]:
%%capture

# Clone SpeechBrain repository
!git clone https://github.com/Darshan7575/speechbrain.git
%cd /content/speechbrain/

# Install required dependencies
!pip install -r requirements.txt

# Install SpeechBrain in editable mode
!pip install -e .

In [2]:
# @title
# Required imports
import os
import json
import shutil
import logging
import sys
import torch
from pathlib import Path
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml
from speechbrain.utils.data_utils import get_all_files, download_file
from speechbrain.dataio.dataio import read_audio
from speechbrain.utils.distributed import run_on_main, if_main_process

# Required variables and loggers
logger = logging.getLogger(__name__)
MINILIBRI_TRAIN_URL = "http://www.openslr.org/resources/31/train-clean-5.tar.gz"
MINILIBRI_VALID_URL = "http://www.openslr.org/resources/31/dev-clean-2.tar.gz"
MINILIBRI_TEST_URL = "https://www.openslr.org/resources/12/test-clean.tar.gz"
SAMPLERATE = 16000

device = "cuda"
run_opts = {"device": device}

### Tokenizer Training
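In this section, we will train a subword tokenizer with **150 tokens** using `SentencePiece`. The configuration below sets `token_type: unigram`; change it to `bpe` to train a BPE model instead.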



In [3]:
# ############################################################################
# Dataset creation helper functions
# ############################################################################

def prepare_mini_librispeech(
    data_folder, save_json_train, save_json_valid, save_json_test
):
    """
    Prepares the json files for the Mini Librispeech dataset.
    Downloads the dataset if it's not found in the `data_folder`.
    """

    # Check if this phase is already done (if so, skip it)
    if skip(save_json_train, save_json_valid, save_json_test):
        logger.info("Preparation completed in previous run, skipping.")
        return

    # If the dataset doesn't exist yet, download it
    train_folder = os.path.join(data_folder, "LibriSpeech", "train-clean-5")
    valid_folder = os.path.join(data_folder, "LibriSpeech", "dev-clean-2")
    test_folder = os.path.join(data_folder, "LibriSpeech", "test-clean")
    if not check_folders(train_folder, valid_folder, test_folder):
        download_mini_librispeech(data_folder)

    # List files and create manifest from list
    logger.info(
        f"Creating {save_json_train}, {save_json_valid}, and {save_json_test}"
    )
    extension = [".flac"]

    # List of flac audio files
    wav_list_train = get_all_files(train_folder, match_and=extension)
    wav_list_valid = get_all_files(valid_folder, match_and=extension)
    wav_list_test = get_all_files(test_folder, match_and=extension)

    # List of transcription files
    extension = [".trans.txt"]
    trans_list = get_all_files(data_folder, match_and=extension)
    trans_dict = get_transcription(trans_list)

    # Create the json files
    create_json(wav_list_train, trans_dict, save_json_train)
    create_json(wav_list_valid, trans_dict, save_json_valid)
    create_json(wav_list_test, trans_dict, save_json_test)


def get_transcription(trans_list):
    """
    Returns a dictionary with the transcription of each sentence in the dataset.
    """
    # Processing all the transcription files in the list
    trans_dict = {}
    for trans_file in trans_list:
        # Reading the text file
        with open(trans_file) as f:
            for line in f:
                uttid = line.split(" ")[0]
                text = line.rstrip().split(" ")[1:]
                text = " ".join(text)
                trans_dict[uttid] = text

    logger.info("Transcription files read!")
    return trans_dict


def create_json(wav_list, trans_dict, json_file):
    """
    Creates the json file given a list of wav files and their transcriptions.
    """
    # Processing all the wav files in the list
    json_dict = {}
    for wav_file in wav_list:

        # Reading the signal (to retrieve duration in seconds)
        signal = read_audio(wav_file)
        duration = signal.shape[0] / SAMPLERATE

        # Manipulate path to get relative path and uttid
        path_parts = wav_file.split(os.path.sep)
        uttid, _ = os.path.splitext(path_parts[-1])
        relative_path = os.path.join("{data_root}", *path_parts[-5:])

        # Create entry for this utterance
        json_dict[uttid] = {
            "wav": relative_path,
            "length": duration,
            "words": trans_dict[uttid],
        }

    # Writing the dictionary to the json file
    with open(json_file, mode="w") as json_f:
        json.dump(json_dict, json_f, indent=2)

    logger.info(f"{json_file} successfully created!")


def skip(*filenames):
    """
    Detects if the data preparation has been already done.
    If the preparation has been done, we can skip it.
    """
    for filename in filenames:
        if not os.path.isfile(filename):
            return False
    return True


def check_folders(*folders):
    """Returns False if any passed folder does not exist."""
    for folder in folders:
        if not os.path.exists(folder):
            return False
    return True


def download_mini_librispeech(destination):
    """Download dataset and unpack it.
    """
    train_archive = os.path.join(destination, "train-clean-5.tar.gz")
    valid_archive = os.path.join(destination, "dev-clean-2.tar.gz")
    test_archive = os.path.join(destination, "test-clean.tar.gz")
    download_file(MINILIBRI_TRAIN_URL, train_archive)
    download_file(MINILIBRI_VALID_URL, valid_archive)
    download_file(MINILIBRI_TEST_URL, test_archive)
    shutil.unpack_archive(train_archive, destination)
    shutil.unpack_archive(valid_archive, destination)
    shutil.unpack_archive(test_archive, destination)
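
As a rough illustration of what `create_json` writes, the sketch below builds one manifest entry and then resolves the `{data_root}` placeholder. The utterance id, sample count, and transcript are invented for the example, and the plain `str.replace` call is only a stand-in for the `replacements` argument that the dataset loader receives later in the notebook.

```python
import json
import os

SAMPLERATE = 16000  # same sampling rate the notebook assumes


def parse_transcript_line(line):
    """Split a '<uttid> <WORDS ...>' transcript line into (uttid, text)."""
    uttid, _, text = line.rstrip().partition(" ")
    return uttid, text


def make_entry(wav_path, num_samples, text):
    """Build one manifest entry keyed by the utterance id."""
    parts = wav_path.split(os.path.sep)
    uttid, _ = os.path.splitext(parts[-1])
    entry = {
        # Keep only the last five path components behind a placeholder root
        "wav": os.path.join("{data_root}", *parts[-5:]),
        "length": num_samples / SAMPLERATE,  # duration in seconds
        "words": text,
    }
    return uttid, entry


# Hypothetical utterance: path, sample count, and transcript are made up.
wav_path = os.path.join(
    "data", "LibriSpeech", "train-clean-5", "19", "198", "19-198-0001.flac"
)
_, text = parse_transcript_line("19-198-0001 HELLO WORLD\n")
key, entry = make_entry(wav_path, 32000, text)
print(json.dumps({key: entry}, indent=2))

# Stand-in for the placeholder substitution done when the manifest is loaded
resolved = entry["wav"].replace("{data_root}", "data")
print(resolved)
```

Storing paths behind a placeholder keeps the JSON files valid when the data folder is moved.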

In [4]:
tokenizer_hyperparams = """
# ############################################################################
# Tokenizer: unigram subword model with 150 tokens
# ############################################################################

output_folder: !ref results/tokenizer/

# Data files
data_folder: data
train_annotation: !ref <data_folder>/train.json
valid_annotation: !ref <data_folder>/valid.json
test_annotation: !ref <data_folder>/test.json

# Tokenizer training parameters
token_type: unigram  # ["unigram", "bpe", "char"]
token_output: 150  # index(blank/eos/bos/unk) = 0
character_coverage: 1.0
json_read: words

tokenizer: !name:speechbrain.tokenizers.SentencePiece.SentencePiece
   model_dir: !ref <output_folder>
   vocab_size: !ref <token_output>
   annotation_train: !ref <train_annotation>
   annotation_read: !ref <json_read>
   annotation_format: json
   model_type: !ref <token_type> # ["unigram", "bpe", "char"]
   character_coverage: !ref <character_coverage>
   annotation_list_to_check: [!ref <train_annotation>, !ref <valid_annotation>]

"""

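To make the `!ref <key>` entries in the YAML above easier to read, here is a toy substitution routine. It is not how HyperPyYAML is implemented; `resolve_refs` is a hypothetical helper that only handles flat string values, and it turns every resolved value into a string, whereas a bare `!ref` in the real loader keeps the referenced value's type.

```python
import re


def resolve_refs(params):
    """Toy stand-in for `!ref <key>` substitution on flat values."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, str):
            # Replace each <name> with the value resolved earlier
            value = re.sub(
                r"<(\w+)>", lambda m: str(resolved[m.group(1)]), value
            )
        resolved[key] = value
    return resolved


params = {
    "data_folder": "data",
    "train_annotation": "<data_folder>/train.json",
    "token_output": 150,
    "vocab_size": "<token_output>",
}
resolved = resolve_refs(params)
print(resolved["train_annotation"])
print(resolved["vocab_size"])
```
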
In [5]:
# load required params from the hyperpyyaml file
hparams = load_hyperpyyaml(tokenizer_hyperparams)

# 1. Dataset creation

## Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

## Create dataset
run_on_main(
    prepare_mini_librispeech,
    kwargs={
        "data_folder": hparams["data_folder"],
        "save_json_train": hparams["train_annotation"],
        "save_json_valid": hparams["valid_annotation"],
        "save_json_test": hparams["test_annotation"],
    },
)

# 2. Tokenizer training
hparams["tokenizer"]()

# 3. Saving tokenizer in .ckpt extension
output_path = hparams["output_folder"]
token_output = hparams["token_output"]
token_type = hparams["token_type"]
bpe_model = f"{output_path}/{token_output}_{token_type}.model"
tokenizer_ckpt = f"{output_path}/tokenizer.ckpt"
shutil.copyfile(bpe_model, tokenizer_ckpt)

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/tokenizer/
Downloading http://www.openslr.org/resources/31/train-clean-5.tar.gz to data/train-clean-5.tar.gz


train-clean-5.tar.gz: 333MB [00:17, 18.9MB/s]                           


Downloading http://www.openslr.org/resources/31/dev-clean-2.tar.gz to data/dev-clean-2.tar.gz


dev-clean-2.tar.gz: 126MB [00:07, 16.7MB/s]                           


Downloading https://www.openslr.org/resources/12/test-clean.tar.gz to data/test-clean.tar.gz


test-clean.tar.gz: 347MB [00:17, 19.3MB/s]                           


__main__ - Creating data/train.json, data/valid.json, and data/test.json
__main__ - Transcription files read!
__main__ - data/train.json successfully created!
__main__ - data/valid.json successfully created!
__main__ - data/test.json successfully created!
speechbrain.tokenizers.SentencePiece - Train tokenizer with type:unigram
speechbrain.tokenizers.SentencePiece - Extract words sequences from:data/train.json
speechbrain.tokenizers.SentencePiece - Text file created at: results/tokenizer/train.txt
speechbrain.tokenizers.SentencePiece - ==== Loading Tokenizer ===
speechbrain.tokenizers.SentencePiece - Tokenizer path: results/tokenizer/150_unigram.model
speechbrain.tokenizers.SentencePiece - Tokenizer vocab_size: 150
speechbrain.tokenizers.SentencePiece - Tokenizer type: unigram
speechbrain.tokenizers.SentencePiece - ==== Accuracy checking for recovering text from tokenizer ===
speechbrain.tokenizers.SentencePiece - recover words from: data/train.json
speechbrain.tokenizers.SentencePiece 

'results/tokenizer//tokenizer.ckpt'
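
The doubled slash in the path printed above comes from joining a folder that already ends in `/` with another `/` inside an f-string. The path still works on most systems, but a small sketch with `os.path.join` shows one way to avoid it. The values mirror the tokenizer configuration; nothing here is required by SpeechBrain.

```python
import os

output_folder = "results/tokenizer/"  # trailing slash, as in the YAML
token_output, token_type = 150, "unigram"

# f-string concatenation keeps both separators
naive = f"{output_folder}/tokenizer.ckpt"

# os.path.join does not add a second separator after a trailing slash
model_file = os.path.join(output_folder, f"{token_output}_{token_type}.model")
tokenizer_ckpt = os.path.join(output_folder, "tokenizer.ckpt")

print(naive)
print(model_file)
print(tokenizer_ckpt)
```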

### Model Training
In this section, we will train a **6-layer Conformer** encoder-only architecture with the `CTC objective`.
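
For reference, the CTC objective can be written as below. The notation is the usual textbook one and does not appear in the notebook: $x$ is the input feature sequence with $T$ frames, $y$ is the target token sequence, $\pi$ is a frame-level alignment over the vocabulary plus the blank symbol, and $\mathcal{B}$ is the map that merges repeated symbols and then removes blanks.

```latex
P(y \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p(\pi_t \mid x),
\qquad
\mathcal{L}_{\mathrm{CTC}} = -\log P(y \mid x)
```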

In [16]:
global_hyperparams = """
# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 2024
__set_seed: !apply:torch.manual_seed [!ref <seed>]

# Data files
data_folder: data

####TASK ADD APPROPRIATE REFERENCES TO LOAD THE FILES ##############

train_annotation: !ref <data_folder>/train.json
valid_annotation: !ref <data_folder>/valid.json
test_annotation: !ref <data_folder>/test.json
#####################################################################

# Pretrained tokenizer location
pretrained_lm_tokenizer_path: ./results/tokenizer

# Training parameters
number_of_epochs: 30
batch_size: 8
lr_adam: 0.001
max_grad_norm: 5.0
ckpt_interval_minutes: 15 # save checkpoint every N min
loss_reduction: 'batchmean'

# Dataloader options
train_dataloader_opts:
    batch_size: !ref <batch_size>

valid_dataloader_opts:
    batch_size: !ref <batch_size>

test_dataloader_opts:
    batch_size: !ref <batch_size>

# Feature parameters
sample_rate: 16000
n_fft: 400
n_mels: 80

####################### Model parameters ###########################
# Transformer
d_model: 64
nhead: 4
num_encoder_layers: 6
d_ffn: 256
transformer_dropout: 0.1
activation: !name:torch.nn.GELU
output_neurons: 150
label_smoothing: 0.0
attention_type: RelPosMHAXL

# Outputs
blank_index: 0
pad_index: 0
bos_index: 1
eos_index: 2

# Decoding parameters
min_decode_ratio: 0.0
max_decode_ratio: 1.0
test_beam_size: 1
ctc_weight_decode: 1.0

############################## models ################################

compute_features: !new:speechbrain.lobes.features.Fbank
    sample_rate: !ref <sample_rate>
    n_fft: !ref <n_fft>
    n_mels: !ref <n_mels>

CNN: !new:speechbrain.lobes.models.convolution.ConvolutionFrontEnd
    input_shape: (8, 10, 80)
    num_blocks: 2
    num_layers_per_block: 1
    out_channels: (64, 32)
    kernel_sizes: (3, 3)
    strides: (2, 2)
    residuals: (False, False)

# standard parameters for the BASE model
Transformer: !new:speechbrain.lobes.models.transformer.TransformerASR.TransformerASR
    input_size: 640
    tgt_vocab: !ref <output_neurons>
    d_model: !ref <d_model>
    nhead: !ref <nhead>
    num_encoder_layers: !ref <num_encoder_layers>
    num_decoder_layers: 0
    d_ffn: !ref <d_ffn>
    dropout: !ref <transformer_dropout>
    activation: !ref <activation>
    encoder_module: conformer
    attention_type: !ref <attention_type>
    normalize_before: True

tokenizer: !new:sentencepiece.SentencePieceProcessor

ctc_lin: !new:speechbrain.nnet.linear.Linear
    input_size: !ref <d_model>
    n_neurons: !ref <output_neurons>

normalize: !new:speechbrain.processing.features.InputNormalization
    norm_type: global
    update_until_epoch: 4

modules:
    CNN: !ref <CNN>
    Transformer: !ref <Transformer>
    ctc_lin: !ref <ctc_lin>
    normalize: !ref <normalize>

model: !new:torch.nn.ModuleList
    - [!ref <CNN>, !ref <Transformer>, !ref <ctc_lin>]

# Optimizer (the learning rate is updated by the Noam scheduler defined below)
Adam: !name:torch.optim.Adam
    lr: !ref <lr_adam>
    betas: (0.9, 0.98)
    eps: 0.000000001

log_softmax: !new:torch.nn.LogSoftmax
    dim: -1

ctc_cost: !name:speechbrain.nnet.losses.ctc_loss
    blank_index: !ref <blank_index>
    reduction: !ref <loss_reduction>

noam_annealing: !new:speechbrain.nnet.schedulers.NoamScheduler
    lr_initial: !ref <lr_adam>
    n_warmup_steps: 1500

epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats

cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats
   split_tokens: True

# The pretrainer allows a mapping between pretrained files and instances that
# are declared in the yaml. E.g here, we will download the file tokenizer.ckpt
# and it will be loaded into "tokenizer" which is pointing to the <pretrained_lm_tokenizer_path> defined
# before.
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        tokenizer: !ref <tokenizer>
    paths:
        tokenizer: !ref <pretrained_lm_tokenizer_path>/tokenizer.ckpt
"""

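The value `input_size: 640` in the YAML above can be sanity-checked with a little arithmetic. The sketch assumes that each convolutional block uses "same"-style padding, that the stride of 2 is applied along the frequency axis, and that channels and frequency bins are flattened together before the Transformer; these are assumptions about the front end rather than something stated in the notebook.

```python
import math


def strided_out(size, stride):
    """Output size of a 'same'-padded convolution with the given stride."""
    return math.ceil(size / stride)


n_mels = 80              # from the feature parameters
strides = (2, 2)         # one stride per CNN block
out_channels = (64, 32)  # channels after each CNN block

freq_bins = n_mels
for stride in strides:
    freq_bins = strided_out(freq_bins, stride)

# Flatten channels and remaining frequency bins into one feature vector
input_size = freq_bins * out_channels[-1]
print(freq_bins, input_size)
```
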
In [17]:
def dataio_prepare(hparams):
    """This function prepares the datasets to be used in the brain class.
    It also defines the data processing pipeline through user-defined functions.
    """
    # Define audio pipeline. In this case, we simply read the path contained
    # in the variable wav with the audio reader.
    @sb.utils.data_pipeline.takes("wav")
    @sb.utils.data_pipeline.provides("sig")
    def audio_pipeline(wav):
        """Load the audio signal. This is done on the CPU in the `collate_fn`."""
        sig = sb.dataio.dataio.read_audio(wav)
        return sig

    tokenizer = hparams["tokenizer"]
    # Define text processing pipeline. We start from the raw text and then
    # encode it using the tokenizer. The tokens with BOS are used for feeding
    # decoder during training, the tokens with EOS for computing the cost function.
    # The tokens without BOS or EOS are for computing CTC loss.
    @sb.utils.data_pipeline.takes("words")
    @sb.utils.data_pipeline.provides(
        "wrd", "tokens_list", "tokens_bos", "tokens_eos", "tokens"
    )
    def text_pipeline(wrd):
        """Processes the transcriptions to generate proper labels"""
        yield wrd
        tokens_list = tokenizer.encode_as_ids(wrd)
        yield tokens_list
        tokens_bos = torch.LongTensor([hparams["bos_index"]] + (tokens_list))
        yield tokens_bos
        tokens_eos = torch.LongTensor(tokens_list + [hparams["eos_index"]])
        yield tokens_eos
        tokens = torch.LongTensor(tokens_list)
        yield tokens

    # Define datasets from json data manifest file
    # Define datasets sorted by ascending lengths for efficiency
    datasets = {}
    data_folder = hparams["data_folder"]
    for dataset in ["train", "valid", "test"]:
        datasets[dataset] = sb.dataio.dataset.DynamicItemDataset.from_json(
            json_path=hparams[f"{dataset}_annotation"],
            replacements={"data_root": data_folder},
            dynamic_items=[audio_pipeline, text_pipeline],
            output_keys=[
                "id",
                "sig",
                "wrd",
                "tokens_bos",
                "tokens_eos",
                "tokens",
            ],
        )
        hparams[f"{dataset}_dataloader_opts"]["shuffle"] = False

    datasets["train"] = datasets["train"].filtered_sorted(sort_key="length")
    hparams["train_dataloader_opts"]["shuffle"] = False

    return (
        datasets["train"],
        datasets["valid"],
        datasets["test"],
        tokenizer
    )
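
The three label variants produced by `text_pipeline` can be seen on a toy example. Plain lists are used here instead of `torch.LongTensor` so the snippet runs without PyTorch, and the token ids are hypothetical. In this CTC-only recipe the loss uses the plain `tokens`, while `tokens_bos` is passed to the Transformer call.

```python
bos_index, eos_index = 1, 2  # values from the YAML configuration


def build_targets(tokens_list):
    """Return the label variants that text_pipeline yields."""
    return {
        "tokens_bos": [bos_index] + tokens_list,  # BOS prepended
        "tokens_eos": tokens_list + [eos_index],  # EOS appended
        "tokens": list(tokens_list),              # used for the CTC loss
    }


targets = build_targets([17, 42, 5])  # hypothetical token ids
for name, ids in targets.items():
    print(name, ids)
```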

In [18]:
# Define training procedure
class BaseASR(sb.Brain):
    def __init__(
        self,
        modules=None,
        opt_class=None,
        hparams=None,
        run_opts=None,
        checkpointer=None,
        profiler=None,
        tokenizer=None,
    ):
        super(BaseASR, self).__init__(
            modules=modules,
            opt_class=opt_class,
            hparams=hparams,
            run_opts=run_opts,
            checkpointer=checkpointer,
            profiler=profiler
        )
        self.tokenizer = tokenizer

    def compute_forward(self, batch, stage):
        """Performs a forward pass through the encoder"""
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig
        tokens_bos, _ = batch.tokens_bos

        # compute features
        ####TASK MAKE APPROPRIATE FUNCTION CALLS TO COMPUTE THE FEATURES BELOW
        feats = self.hparams.compute_features(wavs)
        current_epoch = self.hparams.epoch_counter.current
        feats = self.modules.normalize(feats, wav_lens, epoch=current_epoch)

        # forward modules
        src = self.modules.CNN(feats)

        enc_out, _ = self.modules.Transformer(
            src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index,
        )

        # output layer for ctc log-probabilities
        logits = self.modules.ctc_lin(enc_out)

        ####TASK CALCULATE THE PROBABILITIES OF THESE LOGITS
        #### USING SPEECHBRAIN
        p_ctc = self.hparams.log_softmax(logits)

        # Compute outputs (decoding is only needed outside of training)
        hyps = None
        if stage != sb.Stage.TRAIN:
            hyps = sb.decoders.ctc_greedy_decode(
                p_ctc, wav_lens, blank_id=self.hparams.blank_index
            )

        return p_ctc, wav_lens, hyps

    def compute_objectives(self, predictions, batch, stage):
        """Computes the CTC loss given predictions and targets."""

        (p_ctc, wav_lens, hyps,) = predictions

        ids = batch.id
        tokens_eos, tokens_eos_lens = batch.tokens_eos
        tokens, tokens_lens = batch.tokens

        # Calculate CTC loss
        ####TASK Make required function call to compute CTC LOSS
        #### You have to aggregate the loss in the end so make appropriate
        #### modifications to the value returned.
        loss = self.hparams.ctc_cost(p_ctc, tokens, wav_lens, tokens_lens)

        if stage != sb.Stage.TRAIN:
            # Decode token terms to words
            predicted_words = [
                self.tokenizer.decode_ids(utt_seq).split(" ") for utt_seq in hyps
            ]
            target_words = [wrd.split(" ") for wrd in batch.wrd]
            self.wer_metric.append(ids, predicted_words, target_words)
            self.cer_metric.append(ids, predicted_words, target_words)

        return loss

    def on_evaluate_start(self, max_key=None, min_key=None):
        """Performs checkpoint averaging if needed"""
        super().on_evaluate_start()

        ckpts = self.checkpointer.find_checkpoints(
            max_key=max_key, min_key=min_key
        )
        ckpt = sb.utils.checkpoints.average_checkpoints(
            ckpts, recoverable_name="model", device=self.device
        )

        self.hparams.model.load_state_dict(ckpt, strict=True)
        self.hparams.model.eval()
        print("Loaded the average")

    def evaluate_batch(self, batch, stage):
        """Computations needed for validation/test batches"""
        with torch.no_grad():
            predictions = self.compute_forward(batch, stage=stage)
            loss = self.compute_objectives(predictions, batch, stage=stage)
        return loss.detach()

    def on_stage_start(self, stage, epoch):
        """Gets called at the beginning of each epoch"""
        if stage != sb.Stage.TRAIN:
            self.cer_metric = self.hparams.cer_computer()
            self.wer_metric = self.hparams.error_rate_computer()

    def on_stage_end(self, stage, stage_loss, epoch):
        """Gets called at the end of an epoch."""
        # Compute/store important stats
        stage_stats = {"loss": stage_loss}
        if stage == sb.Stage.TRAIN:
            self.train_stats = stage_stats
        else:
            stage_stats["CER"] = self.cer_metric.summarize("error_rate")
            stage_stats["WER"] = self.wer_metric.summarize("error_rate")

        # log stats and save checkpoint at end-of-epoch
        if stage == sb.Stage.VALID and sb.utils.distributed.if_main_process():

            lr = self.hparams.noam_annealing.current_lr
            steps = self.optimizer_step
            optimizer = self.optimizer.__class__.__name__

            epoch_stats = {
                "epoch": epoch,
                "lr": lr,
                "steps": steps,
                "optimizer": optimizer,
            }
            self.hparams.train_logger.log_stats(
                stats_meta=epoch_stats,
                train_stats=self.train_stats,
                valid_stats=stage_stats,
            )
            # Save only last 10 checkpoints
            self.checkpointer.save_and_keep_only(
                meta={"loss": stage_loss, "epoch": epoch},
                max_keys=["epoch"],
                num_to_keep=10,
            )

        elif stage == sb.Stage.TEST:
            self.hparams.train_logger.log_stats(
                stats_meta={"Epoch loaded": self.hparams.epoch_counter.current},
                test_stats=stage_stats,
            )
            # Write the WER metric for test dataset
            if if_main_process():
                with open(self.hparams.test_wer_file, "w") as w:
                    self.wer_metric.write_stats(w)

    def fit_batch(self, batch):
        """Performs a forward + backward pass on 1 batch
        """

        should_step = self.step % self.grad_accumulation_factor == 0

        outputs = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(outputs, batch, sb.Stage.TRAIN)
        loss.backward()
        if self.check_gradients(loss):
            self.optimizer.step()
        self.zero_grad()
        self.optimizer_step += 1
        self.hparams.noam_annealing(self.optimizer)

        self.on_fit_batch_end(batch, outputs, loss, should_step)
        return loss.detach().cpu()
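
The greedy CTC decoding step used for validation and test can be illustrated on a single utterance. The sketch below is not SpeechBrain's `ctc_greedy_decode`; it assumes the per-frame argmax ids have already been computed and only shows the collapse rule: merge consecutive repeats, then drop the blank. The helper name `ctc_greedy_collapse` and the frame ids are made up for the example.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then remove blanks."""
    return [tok for tok, _ in groupby(frame_ids) if tok != blank_id]


# Hypothetical per-frame argmax ids for one utterance (0 is the blank)
frames = [0, 7, 7, 0, 7, 3, 3, 0, 0]
decoded = ctc_greedy_collapse(frames)
print(decoded)
```

The blank between the two runs of `7` is what lets the same token appear twice in a row in the output.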

In [19]:
task_hyperparameters = """
# Setup the directory to host experiment results
output_folder: !ref results/transformer/Task_1
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        noam_scheduler: !ref <noam_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>
"""

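The `noam_annealing` scheduler saved as a recoverable above controls the learning rate during training. A minimal sketch of the usual Noam form, assuming linear warm-up to `lr_initial` followed by inverse-square-root decay, is shown below. The function `noam_lr` is a hypothetical helper, not the SpeechBrain class, and the defaults are taken from the YAML (`lr_adam: 0.001`, `n_warmup_steps: 1500`).

```python
def noam_lr(step, lr_initial=1e-3, n_warmup_steps=1500):
    """Linear warm-up, then inverse-square-root decay (step must be >= 1)."""
    warmup = step / n_warmup_steps
    decay = (n_warmup_steps / step) ** 0.5
    return lr_initial * min(warmup, decay)


# Step counts chosen to match ends of epochs with 190 batches each
for step in (190, 760, 1500, 1900, 3040):
    print(step, f"{noam_lr(step):.2e}")
```

The printed values should land close to the `lr` values reported in the training log; small differences are possible because the logged value may lag by one step.
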
In [20]:
hyperparams = global_hyperparams + task_hyperparameters
hparams = load_hyperpyyaml(hyperparams)

# Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

# Here we create the datasets objects as well as tokenization and encoding
(
    train_data,
    valid_data,
    test_data,
    tokenizer
) = dataio_prepare(hparams)

# Collect the pretrained tokenizer from the path given in the YAML file and
# load it into the tokenizer object.
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# Trainer initialization
asr_brain = BaseASR(
    modules=hparams["modules"],
    opt_class=hparams["Adam"],
    hparams=hparams,
    checkpointer=hparams["checkpointer"],
    run_opts=run_opts,
    tokenizer=tokenizer,
)

# adding objects to trainer:
train_dataloader_opts = hparams["train_dataloader_opts"]
valid_dataloader_opts = hparams["valid_dataloader_opts"]

# Training
asr_brain.fit(
    asr_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=train_dataloader_opts,
    valid_loader_kwargs=valid_dataloader_opts
)

# Testing
asr_brain.hparams.test_wer_file = asr_brain.hparams.wer_file
asr_brain.evaluate(
    test_data,
    max_key="epoch",
    test_loader_kwargs=hparams["test_dataloader_opts"],
)

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/transformer/Task_1
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in model_checkpoints/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Set local path in self.paths[tokenizer] = model_checkpoints/tokenizer.ckpt
speechbrain.utils.parameter_transfer - Loading pretrained files for: tokenizer
speechbrain.utils.parameter_transfer - Redirecting (loading from local path): model_checkpoints/tokenizer.ckpt -> model_checkpoints/tokenizer.ckpt
speechbrain.core - Info: max_grad_norm arg from hparam file is used
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 698.9k trainable parameters in BaseASR
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1


100%|██████████| 190/190 [00:29<00:00,  6.42it/s, train_loss=537]
100%|██████████| 137/137 [00:09<00:00, 14.90it/s]

speechbrain.utils.train_logger - epoch: 1, lr: 1.26e-04, steps: 190, optimizer: Adam - train loss: 5.37e+02 - valid loss: 2.42e+02, valid CER: 1.00e+02, valid WER: 1.00e+02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-28-15+00
speechbrain.utils.epoch_loop - Going into epoch 2



100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=433]
100%|██████████| 137/137 [00:09<00:00, 14.31it/s]

speechbrain.utils.train_logger - epoch: 2, lr: 2.53e-04, steps: 380, optimizer: Adam - train loss: 4.33e+02 - valid loss: 2.34e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-28-51+00
speechbrain.utils.epoch_loop - Going into epoch 3


100%|██████████| 190/190 [00:25<00:00,  7.46it/s, train_loss=431]
100%|██████████| 137/137 [00:08<00:00, 15.85it/s]

speechbrain.utils.train_logger - epoch: 3, lr: 3.79e-04, steps: 570, optimizer: Adam - train loss: 4.31e+02 - valid loss: 2.34e+02, valid CER: 1.00e+02, valid WER: 1.00e+02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-29-25+00
speechbrain.utils.epoch_loop - Going into epoch 4



100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=428]
100%|██████████| 137/137 [00:08<00:00, 15.98it/s]

speechbrain.utils.train_logger - epoch: 4, lr: 5.06e-04, steps: 760, optimizer: Adam - train loss: 4.28e+02 - valid loss: 2.31e+02, valid CER: 1.00e+02, valid WER: 1.00e+02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-29-59+00
speechbrain.utils.epoch_loop - Going into epoch 5



100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=411]
100%|██████████| 137/137 [00:08<00:00, 16.03it/s]

speechbrain.utils.train_logger - epoch: 5, lr: 6.33e-04, steps: 950, optimizer: Adam - train loss: 4.11e+02 - valid loss: 2.14e+02, valid CER: 94.87, valid WER: 99.79





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-30-33+00
speechbrain.utils.epoch_loop - Going into epoch 6


100%|██████████| 190/190 [00:26<00:00,  7.29it/s, train_loss=370]
100%|██████████| 137/137 [00:11<00:00, 12.35it/s]

speechbrain.utils.train_logger - epoch: 6, lr: 7.59e-04, steps: 1140, optimizer: Adam - train loss: 3.70e+02 - valid loss: 1.91e+02, valid CER: 75.43, valid WER: 95.31
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-31-10+00





speechbrain.utils.epoch_loop - Going into epoch 7


100%|██████████| 190/190 [00:26<00:00,  7.30it/s, train_loss=335]
100%|██████████| 137/137 [00:11<00:00, 11.72it/s]

speechbrain.utils.train_logger - epoch: 7, lr: 8.86e-04, steps: 1330, optimizer: Adam - train loss: 3.35e+02 - valid loss: 1.74e+02, valid CER: 70.41, valid WER: 94.23
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-31-48+00
speechbrain.utils.epoch_loop - Going into epoch 8



100%|██████████| 190/190 [00:25<00:00,  7.49it/s, train_loss=306]
100%|██████████| 137/137 [00:12<00:00, 11.12it/s]

speechbrain.utils.train_logger - epoch: 8, lr: 9.94e-04, steps: 1520, optimizer: Adam - train loss: 3.06e+02 - valid loss: 1.61e+02, valid CER: 65.11, valid WER: 92.03
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-32-26+00
speechbrain.utils.epoch_loop - Going into epoch 9



100%|██████████| 190/190 [00:25<00:00,  7.47it/s, train_loss=283]
100%|██████████| 137/137 [00:12<00:00, 10.67it/s]

speechbrain.utils.train_logger - epoch: 9, lr: 9.37e-04, steps: 1710, optimizer: Adam - train loss: 2.83e+02 - valid loss: 1.51e+02, valid CER: 58.89, valid WER: 89.52
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-33-05+00





speechbrain.utils.epoch_loop - Going into epoch 10


100%|██████████| 190/190 [00:25<00:00,  7.44it/s, train_loss=264]
100%|██████████| 137/137 [00:13<00:00, 10.51it/s]

speechbrain.utils.train_logger - epoch: 10, lr: 8.89e-04, steps: 1900, optimizer: Adam - train loss: 2.64e+02 - valid loss: 1.44e+02, valid CER: 56.19, valid WER: 88.15
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-33-44+00





speechbrain.utils.epoch_loop - Going into epoch 11


100%|██████████| 190/190 [00:26<00:00,  7.15it/s, train_loss=249]
100%|██████████| 137/137 [00:13<00:00, 10.07it/s]

speechbrain.utils.train_logger - epoch: 11, lr: 8.47e-04, steps: 2090, optimizer: Adam - train loss: 2.49e+02 - valid loss: 1.39e+02, valid CER: 54.63, valid WER: 87.12
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-34-24+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-28-15+00
speechbrain.utils.epoch_loop - Going into epoch 12


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=237]
100%|██████████| 137/137 [00:13<00:00,  9.92it/s]

speechbrain.utils.train_logger - epoch: 12, lr: 8.11e-04, steps: 2280, optimizer: Adam - train loss: 2.37e+02 - valid loss: 1.34e+02, valid CER: 51.81, valid WER: 85.55
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-35-04+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-28-51+00
speechbrain.utils.epoch_loop - Going into epoch 13


100%|██████████| 190/190 [00:25<00:00,  7.35it/s, train_loss=226]
100%|██████████| 137/137 [00:13<00:00,  9.85it/s]

speechbrain.utils.train_logger - epoch: 13, lr: 7.79e-04, steps: 2470, optimizer: Adam - train loss: 2.26e+02 - valid loss: 1.30e+02, valid CER: 49.96, valid WER: 84.99
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-35-44+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-29-25+00
speechbrain.utils.epoch_loop - Going into epoch 14


100%|██████████| 190/190 [00:25<00:00,  7.36it/s, train_loss=216]
100%|██████████| 137/137 [00:15<00:00,  9.07it/s]

speechbrain.utils.train_logger - epoch: 14, lr: 7.51e-04, steps: 2660, optimizer: Adam - train loss: 2.16e+02 - valid loss: 1.27e+02, valid CER: 49.22, valid WER: 84.27
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-36-25+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-29-59+00
speechbrain.utils.epoch_loop - Going into epoch 15


100%|██████████| 190/190 [00:25<00:00,  7.49it/s, train_loss=209]
100%|██████████| 137/137 [00:14<00:00,  9.66it/s]

speechbrain.utils.train_logger - epoch: 15, lr: 7.26e-04, steps: 2850, optimizer: Adam - train loss: 2.09e+02 - valid loss: 1.24e+02, valid CER: 46.92, valid WER: 83.66
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-37-05+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-30-33+00
speechbrain.utils.epoch_loop - Going into epoch 16


100%|██████████| 190/190 [00:25<00:00,  7.46it/s, train_loss=201]
100%|██████████| 137/137 [00:14<00:00,  9.55it/s]

speechbrain.utils.train_logger - epoch: 16, lr: 7.03e-04, steps: 3040, optimizer: Adam - train loss: 2.01e+02 - valid loss: 1.22e+02, valid CER: 45.44, valid WER: 82.63
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-37-45+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-31-10+00
speechbrain.utils.epoch_loop - Going into epoch 17


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=194]
100%|██████████| 137/137 [00:14<00:00,  9.53it/s]

speechbrain.utils.train_logger - epoch: 17, lr: 6.82e-04, steps: 3230, optimizer: Adam - train loss: 1.94e+02 - valid loss: 1.19e+02, valid CER: 44.77, valid WER: 82.40
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-38-26+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-31-48+00
speechbrain.utils.epoch_loop - Going into epoch 18


100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=188]
100%|██████████| 137/137 [00:14<00:00,  9.52it/s]

speechbrain.utils.train_logger - epoch: 18, lr: 6.62e-04, steps: 3420, optimizer: Adam - train loss: 1.88e+02 - valid loss: 1.18e+02, valid CER: 44.20, valid WER: 82.01
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-39-06+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-32-26+00
speechbrain.utils.epoch_loop - Going into epoch 19


100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=182]
100%|██████████| 137/137 [00:15<00:00,  8.90it/s]

speechbrain.utils.train_logger - epoch: 19, lr: 6.45e-04, steps: 3610, optimizer: Adam - train loss: 1.82e+02 - valid loss: 1.16e+02, valid CER: 43.08, valid WER: 81.22
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-39-47+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-33-05+00
speechbrain.utils.epoch_loop - Going into epoch 20


100%|██████████| 190/190 [00:25<00:00,  7.44it/s, train_loss=177]
100%|██████████| 137/137 [00:14<00:00,  9.33it/s]

speechbrain.utils.train_logger - epoch: 20, lr: 6.28e-04, steps: 3800, optimizer: Adam - train loss: 1.77e+02 - valid loss: 1.15e+02, valid CER: 42.19, valid WER: 80.79
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-40-28+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-33-44+00
speechbrain.utils.epoch_loop - Going into epoch 21


100%|██████████| 190/190 [00:25<00:00,  7.49it/s, train_loss=172]
100%|██████████| 137/137 [00:14<00:00,  9.21it/s]

speechbrain.utils.train_logger - epoch: 21, lr: 6.13e-04, steps: 3990, optimizer: Adam - train loss: 1.72e+02 - valid loss: 1.14e+02, valid CER: 42.20, valid WER: 80.49
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-41-08+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-34-24+00
speechbrain.utils.epoch_loop - Going into epoch 22


100%|██████████| 190/190 [00:26<00:00,  7.22it/s, train_loss=168]
100%|██████████| 137/137 [00:14<00:00,  9.36it/s]

speechbrain.utils.train_logger - epoch: 22, lr: 5.99e-04, steps: 4180, optimizer: Adam - train loss: 1.68e+02 - valid loss: 1.13e+02, valid CER: 41.35, valid WER: 80.42
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-41-50+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-35-04+00
speechbrain.utils.epoch_loop - Going into epoch 23


100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=164]
100%|██████████| 137/137 [00:14<00:00,  9.22it/s]

speechbrain.utils.train_logger - epoch: 23, lr: 5.86e-04, steps: 4370, optimizer: Adam - train loss: 1.64e+02 - valid loss: 1.13e+02, valid CER: 40.99, valid WER: 80.15
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-42-31+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-35-44+00
speechbrain.utils.epoch_loop - Going into epoch 24


100%|██████████| 190/190 [00:25<00:00,  7.44it/s, train_loss=160]
100%|██████████| 137/137 [00:15<00:00,  8.86it/s]

speechbrain.utils.train_logger - epoch: 24, lr: 5.74e-04, steps: 4560, optimizer: Adam - train loss: 1.60e+02 - valid loss: 1.11e+02, valid CER: 40.59, valid WER: 79.61
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-43-12+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-36-25+00
speechbrain.utils.epoch_loop - Going into epoch 25


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=156]
100%|██████████| 137/137 [00:15<00:00,  9.11it/s]

speechbrain.utils.train_logger - epoch: 25, lr: 5.62e-04, steps: 4750, optimizer: Adam - train loss: 1.56e+02 - valid loss: 1.11e+02, valid CER: 40.12, valid WER: 79.57
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-43-53+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-37-05+00
speechbrain.utils.epoch_loop - Going into epoch 26


100%|██████████| 190/190 [00:25<00:00,  7.45it/s, train_loss=152]
100%|██████████| 137/137 [00:15<00:00,  8.82it/s]

speechbrain.utils.train_logger - epoch: 26, lr: 5.51e-04, steps: 4940, optimizer: Adam - train loss: 1.52e+02 - valid loss: 1.10e+02, valid CER: 39.85, valid WER: 78.91





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-44-35+00
speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-37-45+00
speechbrain.utils.epoch_loop - Going into epoch 27


100%|██████████| 190/190 [00:25<00:00,  7.50it/s, train_loss=150]
100%|██████████| 137/137 [00:14<00:00,  9.27it/s]

speechbrain.utils.train_logger - epoch: 27, lr: 5.41e-04, steps: 5130, optimizer: Adam - train loss: 1.50e+02 - valid loss: 1.10e+02, valid CER: 39.33, valid WER: 78.13
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-45-16+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-38-26+00
speechbrain.utils.epoch_loop - Going into epoch 28


100%|██████████| 190/190 [00:25<00:00,  7.50it/s, train_loss=146]
100%|██████████| 137/137 [00:14<00:00,  9.21it/s]

speechbrain.utils.train_logger - epoch: 28, lr: 5.31e-04, steps: 5320, optimizer: Adam - train loss: 1.46e+02 - valid loss: 1.10e+02, valid CER: 39.06, valid WER: 77.77
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-45-56+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-39-06+00
speechbrain.utils.epoch_loop - Going into epoch 29


100%|██████████| 190/190 [00:26<00:00,  7.18it/s, train_loss=143]
100%|██████████| 137/137 [00:14<00:00,  9.30it/s]

speechbrain.utils.train_logger - epoch: 29, lr: 5.22e-04, steps: 5510, optimizer: Adam - train loss: 1.43e+02 - valid loss: 1.09e+02, valid CER: 38.79, valid WER: 77.54
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-46-38+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-39-47+00
speechbrain.utils.epoch_loop - Going into epoch 30


100%|██████████| 190/190 [00:25<00:00,  7.47it/s, train_loss=140]
100%|██████████| 137/137 [00:14<00:00,  9.19it/s]

speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, optimizer: Adam - train loss: 1.40e+02 - valid loss: 1.08e+02, valid CER: 38.28, valid WER: 77.42
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-47-19+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-17+14-40-28+00
speechbrain.utils.checkpoints - Loading a checkpoint from results/transformer/Task_1/save/CKPT+2024-02-17+14-47-19+00
Loaded the average


100%|██████████| 328/328 [00:43<00:00,  7.52it/s]

speechbrain.utils.train_logger - Epoch loaded: 30 - test loss: 1.22e+02, test CER: 39.75, test WER: 78.75





122.01812960461884
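
The `train_logger` lines in the output above share one fixed layout, so they can be turned into numbers for plotting or for comparing runs. The sketch below is one way to do that. The regular expression and the helper name `parse_train_log_line` are illustrative assumptions written against the console lines shown in this notebook; they are not part of SpeechBrain.

```python
import re

# Pattern written against the console format shown above (an assumption,
# not an official SpeechBrain format specification).
LOG_LINE = re.compile(
    r"epoch: (?P<epoch>\d+), lr: (?P<lr>[\d.e+-]+), steps: (?P<steps>\d+), "
    r"optimizer: \w+ - train loss: (?P<train_loss>[\d.e+-]+) - "
    r"valid loss: (?P<valid_loss>[\d.e+-]+), valid CER: (?P<valid_cer>[\d.e+-]+), "
    r"valid WER: (?P<valid_wer>[\d.e+-]+)"
)


def parse_train_log_line(line):
    """Return the metrics of one train_logger line as a dict, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    stats = {key: float(value) for key, value in match.groupdict().items()}
    stats["epoch"] = int(stats["epoch"])
    stats["steps"] = int(stats["steps"])
    return stats


# Epoch-30 line copied from the Task_1 output above.
sample = (
    "speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, "
    "optimizer: Adam - train loss: 1.40e+02 - valid loss: 1.08e+02, "
    "valid CER: 38.28, valid WER: 77.42"
)
stats = parse_train_log_line(sample)
print(stats)
```

Lines that are not epoch summaries (progress bars, checkpoint messages) return `None`, so the helper can be mapped over the whole output and the `None` entries filtered out.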

# Part II(A): CTC is all you need
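In this section, you will train a **6-layer Conformer** encoder with both `CTC` and `inter-CTC` losses: the final encoder layer is trained with CTC, and selected intermediate layers (set by `intermediate_layers`) receive an auxiliary CTC loss weighted by `interctc_weight`.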
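
As a rough sketch of the objective this section builds toward: the intermediate-layer CTC losses are summed, and that sum is mixed with the final-layer CTC loss through a single weight. The helper names `parse_layers` and `combine_losses` are hypothetical, and the loss values are made-up numbers. Only the weighting rule and the 1-indexed, comma-separated layer list mirror what the notebook uses.

```python
def parse_layers(spec):
    """Turn a 1-indexed, comma-separated layer list into 0-indexed positions."""
    return [int(layer) - 1 for layer in spec.split(",")]


def combine_losses(final_ctc, inter_ctc_losses, inter_weight):
    """Weighted mix of the summed intermediate CTC losses and the final CTC loss."""
    inter_total = sum(inter_ctc_losses)
    return inter_weight * inter_total + (1 - inter_weight) * final_ctc


# Illustrative values only (not taken from any training run).
layer_positions = parse_layers("2,4")
loss = combine_losses(
    final_ctc=100.0, inter_ctc_losses=[150.0, 120.0], inter_weight=0.3
)
print(layer_positions, loss)
```

Because the intermediate losses are summed rather than averaged, adding more intermediate layers raises the reported loss even if the model is no worse, which is worth keeping in mind when comparing loss values across parts of the assignment.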




In [21]:
class ASR_2A(BaseASR):
    def __init__(
        self, *args, **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.inter_ctc_weight = self.hparams.interctc_weight
        self.intermediate_layers = [int(layer) for layer in self.hparams.intermediate_layers.split(',')]

        # Variable to hold intermediate logits for interCTC loss calculation
        self.inter_logits = []

        # TODO: Define a helper function get_intermediate_output for the forward hook
        def get_intermediate_output(module, input, output):
            # TODO: Complete this function
            self.inter_logits.append(output)


        # TODO: Register hooks for all the intermediate encoder layers of interest.
        # TODO: Refer to register_forward_hook (https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_forward_hook)
        # TODO: Save all the hooks in a list self.hooks that you can remove later from the module
        # intermediate_layers is 1-indexed, hence the i - 1 when indexing encoder.layers.
        self.hooks = [
            self.modules.Transformer.encoder.layers[i - 1].register_forward_hook(
                get_intermediate_output
            )
            for i in self.intermediate_layers
        ]



    def compute_forward(self, batch, stage):
        """Performs a forward pass through the encoder"""
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig
        tokens_bos, _ = batch.tokens_bos

        # compute features
        feats = self.hparams.compute_features(wavs)
        current_epoch = self.hparams.epoch_counter.current
        feats = self.modules.normalize(feats, wav_lens, epoch=current_epoch)

        # forward modules
        src = self.modules.CNN(feats)

        assert len(self.inter_logits) == 0, "self.inter_logits should be empty as we haven't done a forward pass yet"
        enc_out, _ = self.modules.Transformer(
            src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index,
        )

        # Compute final layer logit
        logits = self.modules.ctc_lin(enc_out)
        p_ctc = self.hparams.log_softmax(logits)

        # TODO: Append all the intermediate layer logits to the following list: inter_p_ctc
        # TODO: Go through all the layers in intermediate_layers. Note that the comma-separated list in intermediate_layers is 1-indexed.
        # TODO: Complete code below to populate inter_p_ctc
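        # Each hooked output is indexed with [0] to pick the layer's hidden states,
        # which are then projected by the shared CTC head and log-normalized.
        inter_p_ctc = [
            self.hparams.log_softmax(self.modules.ctc_lin(layer_out[0]))
            for layer_out in self.inter_logits
        ]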

        # Flush the logits saved during last forward pass.
        self.inter_logits = []

        # Compute outputs
        hyps = None
        if stage == sb.Stage.TRAIN:
            assert len(inter_p_ctc) != 0, "inter_p_ctc should NOT be empty as forward pass is already done"
        else:
            hyps = sb.decoders.ctc_greedy_decode(
                p_ctc, wav_lens, blank_id=self.hparams.blank_index
            )

        return p_ctc, inter_p_ctc, wav_lens, hyps

    def on_evaluate_start(self, max_key=None, min_key=None):
        """Performs sanity operations before inferencing on the test set."""
        if self.checkpointer is not None:
            self.checkpointer.recover_if_possible(
                max_key=max_key,
                min_key=min_key,
                device=torch.device(self.device),
            )

        # Deregister hooks here as they are not needed during evaluation
        for hook in self.hooks:
            hook.remove()

    def compute_objectives(self, predictions, batch, stage):
        """Computes the CTC + inter-CTC loss given predictions and targets."""

        (p_ctc, inter_p_ctc, wav_lens, hyps,) = predictions

        ids = batch.id
        tokens_eos, tokens_eos_lens = batch.tokens_eos
        tokens, tokens_lens = batch.tokens

        # TODO: Compute inter-CTC loss
        # TODO: Write code to appropriately accumulate the inter-CTC loss in loss_inter_ctc
        # TODO: using the softmax probabilities saved for each intermediate layer in inter_p_ctc
        loss_inter_ctc = sum(
            self.hparams.ctc_cost(layer_p_ctc, tokens, wav_lens, tokens_lens)
            for layer_p_ctc in inter_p_ctc
        )

        # Compute final layer CTC loss
        loss_ctc = self.hparams.ctc_cost(p_ctc, tokens, wav_lens, tokens_lens)

        # Compute final loss as a weighted combination of inter-CTC and CTC
        loss = self.inter_ctc_weight * loss_inter_ctc + (1 - self.inter_ctc_weight) * loss_ctc

        if stage != sb.Stage.TRAIN:
            # Decode token terms to words
            predicted_words = [
                    self.tokenizer.decode_ids(utt_seq).split(" ") for utt_seq in hyps
                ]
            target_words = [wrd.split(" ") for wrd in batch.wrd]
            self.wer_metric.append(ids, predicted_words, target_words)
            self.cer_metric.append(ids, predicted_words, target_words)

        return loss
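
The hook mechanism that `ASR_2A` relies on can be seen in isolation in the sketch below. `TinyLayer` and `TinyEncoder` are hypothetical stand-ins for the real encoder, assumed here to return a `(hidden_states, attention)` tuple per layer, which matches the `[0]` indexing in `compute_forward`. Only `register_forward_hook` and the handle's `remove()` are real PyTorch API.

```python
import torch
from torch import nn


class TinyLayer(nn.Module):
    """Hypothetical encoder layer that returns (hidden_states, attention)."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return self.linear(x), None


class TinyEncoder(nn.Module):
    """Hypothetical 6-layer encoder used only to demonstrate hooks."""

    def __init__(self, num_layers=6, dim=8):
        super().__init__()
        self.layers = nn.ModuleList([TinyLayer(dim) for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:
            x, _ = layer(x)
        return x


captured = []


def save_output(module, inputs, output):
    # Called by PyTorch right after the hooked layer's forward() returns.
    captured.append(output)


encoder = TinyEncoder()
intermediate_layers = [2, 4]  # 1-indexed, as in the notebook's hyperparameters
handles = [
    encoder.layers[i - 1].register_forward_hook(save_output)
    for i in intermediate_layers
]

x = torch.randn(3, 5, 8)  # (batch, time, features)
final = encoder(x)
num_captured_with_hooks = len(captured)
first_hidden_shape = tuple(captured[0][0].shape)

# Removing the handles stops the capturing, as on_evaluate_start does.
for handle in handles:
    handle.remove()
captured.clear()
encoder(x)
num_captured_after_removal = len(captured)
print(num_captured_with_hooks, first_hidden_shape, num_captured_after_removal)
```

One consequence of removing the hooks before testing: `inter_p_ctc` is then empty, `sum(...)` over it evaluates to `0`, and the reported test loss reduces to `(1 - interctc_weight)` times the final-layer CTC loss.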

In [22]:
task_hyperparameters = """

# Setup the directory to host experiment results
output_folder: !ref results/transformer/Part_2A
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

interctc_weight: 0.3
intermediate_layers: '2,4'

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        noam_scheduler: !ref <noam_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>
"""

In [None]:
hyperparams = global_hyperparams + task_hyperparameters
hparams = load_hyperpyyaml(hyperparams)

# Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

# Collect and load the pretrained files listed under the pretrainer in the YAML.
# In this run that is the tokenizer checkpoint, resolved from a local path.
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# Trainer initialization
asr_brain = ASR_2A(
    modules=hparams["modules"],
    opt_class=hparams["Adam"],
    hparams=hparams,
    checkpointer=hparams["checkpointer"],
    run_opts=run_opts,
    tokenizer=tokenizer,
)

# Dataloader options for training and validation
train_dataloader_opts = hparams["train_dataloader_opts"]
valid_dataloader_opts = hparams["valid_dataloader_opts"]

# Training
asr_brain.fit(
    asr_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=train_dataloader_opts,
    valid_loader_kwargs=valid_dataloader_opts
)

# Testing

asr_brain.hparams.test_wer_file = asr_brain.hparams.wer_file
asr_brain.evaluate(
    test_data,
    max_key="ACC",
    test_loader_kwargs=hparams["test_dataloader_opts"],
)
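
The `lr` column in the training output rises for the first few epochs and then decays, which is the shape a Noam schedule produces (the checkpointer above saves a `noam_scheduler`). The sketch below reproduces that shape. The peak learning rate of `1e-3` and the `1500` warmup steps are assumptions chosen because they roughly match the logged values; they are not read from `global_hyperparams`, and `noam_lr` is a hypothetical helper rather than SpeechBrain's scheduler class.

```python
def noam_lr(step, peak_lr=1e-3, warmup_steps=1500):
    """Noam schedule: linear warmup to peak_lr, then inverse-square-root decay.

    peak_lr and warmup_steps are assumed values, not the notebook's settings.
    """
    step = max(step, 1)
    return peak_lr * min(step / warmup_steps, (warmup_steps / step) ** 0.5)


for step in (190, 1500, 1900, 5700):
    print(step, f"{noam_lr(step):.2e}")
```

Under these assumptions the function gives about 8.89e-04 at step 1900, in line with the value logged at epoch 10. Small differences in the last digit at other steps are expected, since the logged value may be read one optimizer step earlier or later than the step count shown.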

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/transformer/Part_2A
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in model_checkpoints/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Set local path in self.paths[tokenizer] = model_checkpoints/tokenizer.ckpt
speechbrain.utils.parameter_transfer - Loading pretrained files for: tokenizer
speechbrain.utils.parameter_transfer - Redirecting (loading from local path): model_checkpoints/tokenizer.ckpt -> model_checkpoints/tokenizer.ckpt
speechbrain.core - Info: max_grad_norm arg from hparam file is used
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 698.9k trainable parameters in ASR_2A
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1


100%|██████████| 190/190 [00:25<00:00,  7.33it/s, train_loss=747]
100%|██████████| 137/137 [00:08<00:00, 15.25it/s]

speechbrain.utils.train_logger - epoch: 1, lr: 1.26e-04, steps: 190, optimizer: Adam - train loss: 7.47e+02 - valid loss: 3.15e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-34-19+00
speechbrain.utils.epoch_loop - Going into epoch 2


100%|██████████| 190/190 [00:27<00:00,  7.03it/s, train_loss=564]
100%|██████████| 137/137 [00:08<00:00, 15.39it/s]

speechbrain.utils.train_logger - epoch: 2, lr: 2.53e-04, steps: 380, optimizer: Adam - train loss: 5.64e+02 - valid loss: 3.05e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-34-55+00
speechbrain.utils.epoch_loop - Going into epoch 3


100%|██████████| 190/190 [00:26<00:00,  7.21it/s, train_loss=560]
100%|██████████| 137/137 [00:08<00:00, 15.24it/s]

speechbrain.utils.train_logger - epoch: 3, lr: 3.79e-04, steps: 570, optimizer: Adam - train loss: 5.60e+02 - valid loss: 3.05e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-35-31+00
speechbrain.utils.epoch_loop - Going into epoch 4


100%|██████████| 190/190 [00:26<00:00,  7.27it/s, train_loss=558]
100%|██████████| 137/137 [00:08<00:00, 15.23it/s]

speechbrain.utils.train_logger - epoch: 4, lr: 5.06e-04, steps: 760, optimizer: Adam - train loss: 5.58e+02 - valid loss: 3.02e+02, valid CER: 99.69, valid WER: 99.62





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-36-06+00
speechbrain.utils.epoch_loop - Going into epoch 5


100%|██████████| 190/190 [00:25<00:00,  7.31it/s, train_loss=546]
100%|██████████| 137/137 [00:10<00:00, 13.62it/s]

speechbrain.utils.train_logger - epoch: 5, lr: 6.33e-04, steps: 950, optimizer: Adam - train loss: 5.46e+02 - valid loss: 2.88e+02, valid CER: 94.79, valid WER: 99.72
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-36-42+00





speechbrain.utils.epoch_loop - Going into epoch 6


100%|██████████| 190/190 [00:26<00:00,  7.25it/s, train_loss=503]
100%|██████████| 137/137 [00:10<00:00, 13.04it/s]

speechbrain.utils.train_logger - epoch: 6, lr: 7.59e-04, steps: 1140, optimizer: Adam - train loss: 5.03e+02 - valid loss: 2.59e+02, valid CER: 84.83, valid WER: 97.90
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-37-19+00





speechbrain.utils.epoch_loop - Going into epoch 7


100%|██████████| 190/190 [00:26<00:00,  7.24it/s, train_loss=452]
100%|██████████| 137/137 [00:11<00:00, 11.65it/s]

speechbrain.utils.train_logger - epoch: 7, lr: 8.86e-04, steps: 1330, optimizer: Adam - train loss: 4.52e+02 - valid loss: 2.33e+02, valid CER: 72.05, valid WER: 94.74
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-37-58+00





speechbrain.utils.epoch_loop - Going into epoch 8


100%|██████████| 190/190 [00:26<00:00,  7.20it/s, train_loss=411]
100%|██████████| 137/137 [00:12<00:00, 10.86it/s]

speechbrain.utils.train_logger - epoch: 8, lr: 9.94e-04, steps: 1520, optimizer: Adam - train loss: 4.11e+02 - valid loss: 2.14e+02, valid CER: 65.64, valid WER: 92.39
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-38-38+00





speechbrain.utils.epoch_loop - Going into epoch 9


100%|██████████| 190/190 [00:27<00:00,  7.02it/s, train_loss=380]
100%|██████████| 137/137 [00:13<00:00,  9.92it/s]

speechbrain.utils.train_logger - epoch: 9, lr: 9.37e-04, steps: 1710, optimizer: Adam - train loss: 3.80e+02 - valid loss: 2.01e+02, valid CER: 57.15, valid WER: 89.45
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-39-19+00





speechbrain.utils.epoch_loop - Going into epoch 10


100%|██████████| 190/190 [00:26<00:00,  7.14it/s, train_loss=356]
100%|██████████| 137/137 [00:14<00:00,  9.53it/s]

speechbrain.utils.train_logger - epoch: 10, lr: 8.89e-04, steps: 1900, optimizer: Adam - train loss: 3.56e+02 - valid loss: 1.92e+02, valid CER: 55.99, valid WER: 88.32
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-40-00+00





speechbrain.utils.epoch_loop - Going into epoch 11


100%|██████████| 190/190 [00:26<00:00,  7.17it/s, train_loss=338]
100%|██████████| 137/137 [00:13<00:00, 10.05it/s]

speechbrain.utils.train_logger - epoch: 11, lr: 8.47e-04, steps: 2090, optimizer: Adam - train loss: 3.38e+02 - valid loss: 1.86e+02, valid CER: 54.72, valid WER: 87.24
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-40-41+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-34-19+00
speechbrain.utils.epoch_loop - Going into epoch 12


100%|██████████| 190/190 [00:26<00:00,  7.06it/s, train_loss=324]
100%|██████████| 137/137 [00:13<00:00,  9.79it/s]

speechbrain.utils.train_logger - epoch: 12, lr: 8.11e-04, steps: 2280, optimizer: Adam - train loss: 3.24e+02 - valid loss: 1.80e+02, valid CER: 51.85, valid WER: 86.14
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-41-23+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-34-55+00
speechbrain.utils.epoch_loop - Going into epoch 13


100%|██████████| 190/190 [00:26<00:00,  7.13it/s, train_loss=311]
100%|██████████| 137/137 [00:14<00:00,  9.35it/s]

speechbrain.utils.train_logger - epoch: 13, lr: 7.79e-04, steps: 2470, optimizer: Adam - train loss: 3.11e+02 - valid loss: 1.75e+02, valid CER: 50.06, valid WER: 84.68
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-42-05+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-35-31+00
speechbrain.utils.epoch_loop - Going into epoch 14


100%|██████████| 190/190 [00:26<00:00,  7.15it/s, train_loss=300]
100%|██████████| 137/137 [00:14<00:00,  9.19it/s]

speechbrain.utils.train_logger - epoch: 14, lr: 7.51e-04, steps: 2660, optimizer: Adam - train loss: 3.00e+02 - valid loss: 1.71e+02, valid CER: 48.22, valid WER: 83.91
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-42-48+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-36-06+00
speechbrain.utils.epoch_loop - Going into epoch 15


100%|██████████| 190/190 [00:26<00:00,  7.04it/s, train_loss=291]
100%|██████████| 137/137 [00:14<00:00,  9.36it/s]

speechbrain.utils.train_logger - epoch: 15, lr: 7.26e-04, steps: 2850, optimizer: Adam - train loss: 2.91e+02 - valid loss: 1.68e+02, valid CER: 46.79, valid WER: 83.00
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-43-30+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-36-42+00
speechbrain.utils.epoch_loop - Going into epoch 16


100%|██████████| 190/190 [00:26<00:00,  7.14it/s, train_loss=282]
100%|██████████| 137/137 [00:15<00:00,  8.69it/s]

speechbrain.utils.train_logger - epoch: 16, lr: 7.03e-04, steps: 3040, optimizer: Adam - train loss: 2.82e+02 - valid loss: 1.65e+02, valid CER: 45.43, valid WER: 82.59
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-44-13+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-37-19+00
speechbrain.utils.epoch_loop - Going into epoch 17


100%|██████████| 190/190 [00:26<00:00,  7.11it/s, train_loss=274]
100%|██████████| 137/137 [00:14<00:00,  9.35it/s]

speechbrain.utils.train_logger - epoch: 17, lr: 6.82e-04, steps: 3230, optimizer: Adam - train loss: 2.74e+02 - valid loss: 1.62e+02, valid CER: 44.21, valid WER: 81.57
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-44-56+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-37-58+00
speechbrain.utils.epoch_loop - Going into epoch 18


100%|██████████| 190/190 [00:27<00:00,  7.02it/s, train_loss=268]
100%|██████████| 137/137 [00:15<00:00,  8.75it/s]

speechbrain.utils.train_logger - epoch: 18, lr: 6.62e-04, steps: 3420, optimizer: Adam - train loss: 2.68e+02 - valid loss: 1.60e+02, valid CER: 43.25, valid WER: 80.93
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-45-39+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-38-38+00
speechbrain.utils.epoch_loop - Going into epoch 19


100%|██████████| 190/190 [00:26<00:00,  7.16it/s, train_loss=261]
100%|██████████| 137/137 [00:15<00:00,  9.04it/s]

speechbrain.utils.train_logger - epoch: 19, lr: 6.45e-04, steps: 3610, optimizer: Adam - train loss: 2.61e+02 - valid loss: 1.58e+02, valid CER: 42.48, valid WER: 80.66
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-46-22+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-39-19+00
speechbrain.utils.epoch_loop - Going into epoch 20


100%|██████████| 190/190 [00:27<00:00,  7.01it/s, train_loss=255]
100%|██████████| 137/137 [00:15<00:00,  8.79it/s]

speechbrain.utils.train_logger - epoch: 20, lr: 6.28e-04, steps: 3800, optimizer: Adam - train loss: 2.55e+02 - valid loss: 1.56e+02, valid CER: 41.56, valid WER: 80.72
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-47-05+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-40-00+00
speechbrain.utils.epoch_loop - Going into epoch 21


100%|██████████| 190/190 [00:27<00:00,  7.00it/s, train_loss=250]
100%|██████████| 137/137 [00:15<00:00,  8.64it/s]

speechbrain.utils.train_logger - epoch: 21, lr: 6.13e-04, steps: 3990, optimizer: Adam - train loss: 2.50e+02 - valid loss: 1.55e+02, valid CER: 41.22, valid WER: 80.50
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-47-49+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-40-41+00
speechbrain.utils.epoch_loop - Going into epoch 22


100%|██████████| 190/190 [00:26<00:00,  7.20it/s, train_loss=244]
100%|██████████| 137/137 [00:15<00:00,  8.73it/s]

speechbrain.utils.train_logger - epoch: 22, lr: 5.99e-04, steps: 4180, optimizer: Adam - train loss: 2.44e+02 - valid loss: 1.54e+02, valid CER: 40.13, valid WER: 80.85
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-48-32+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-41-23+00
speechbrain.utils.epoch_loop - Going into epoch 23


100%|██████████| 190/190 [00:26<00:00,  7.12it/s, train_loss=240]
100%|██████████| 137/137 [00:16<00:00,  8.47it/s]

speechbrain.utils.train_logger - epoch: 23, lr: 5.86e-04, steps: 4370, optimizer: Adam - train loss: 2.40e+02 - valid loss: 1.52e+02, valid CER: 39.89, valid WER: 79.93
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-49-16+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-42-05+00
speechbrain.utils.epoch_loop - Going into epoch 24


100%|██████████| 190/190 [00:26<00:00,  7.09it/s, train_loss=235]
100%|██████████| 137/137 [00:15<00:00,  8.81it/s]

speechbrain.utils.train_logger - epoch: 24, lr: 5.74e-04, steps: 4560, optimizer: Adam - train loss: 2.35e+02 - valid loss: 1.51e+02, valid CER: 39.46, valid WER: 79.24
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-49-59+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-42-48+00
speechbrain.utils.epoch_loop - Going into epoch 25


100%|██████████| 190/190 [00:26<00:00,  7.12it/s, train_loss=232]
100%|██████████| 137/137 [00:15<00:00,  8.85it/s]

speechbrain.utils.train_logger - epoch: 25, lr: 5.62e-04, steps: 4750, optimizer: Adam - train loss: 2.32e+02 - valid loss: 1.51e+02, valid CER: 39.13, valid WER: 79.08
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-50-42+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-43-30+00
speechbrain.utils.epoch_loop - Going into epoch 26


100%|██████████| 190/190 [00:26<00:00,  7.28it/s, train_loss=228]
100%|██████████| 137/137 [00:16<00:00,  8.37it/s]

speechbrain.utils.train_logger - epoch: 26, lr: 5.51e-04, steps: 4940, optimizer: Adam - train loss: 2.28e+02 - valid loss: 1.49e+02, valid CER: 38.72, valid WER: 78.24
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-51-26+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-44-13+00
speechbrain.utils.epoch_loop - Going into epoch 27


100%|██████████| 190/190 [00:26<00:00,  7.16it/s, train_loss=224]
100%|██████████| 137/137 [00:15<00:00,  8.83it/s]

speechbrain.utils.train_logger - epoch: 27, lr: 5.41e-04, steps: 5130, optimizer: Adam - train loss: 2.24e+02 - valid loss: 1.49e+02, valid CER: 38.48, valid WER: 78.68
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-52-09+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-44-56+00
speechbrain.utils.epoch_loop - Going into epoch 28


100%|██████████| 190/190 [00:26<00:00,  7.10it/s, train_loss=220]
100%|██████████| 137/137 [00:15<00:00,  8.71it/s]

speechbrain.utils.train_logger - epoch: 28, lr: 5.31e-04, steps: 5320, optimizer: Adam - train loss: 2.20e+02 - valid loss: 1.48e+02, valid CER: 38.10, valid WER: 77.44
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-52-52+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-45-39+00
speechbrain.utils.epoch_loop - Going into epoch 29


100%|██████████| 190/190 [00:25<00:00,  7.33it/s, train_loss=217]
100%|██████████| 137/137 [00:15<00:00,  8.97it/s]

speechbrain.utils.train_logger - epoch: 29, lr: 5.22e-04, steps: 5510, optimizer: Adam - train loss: 2.17e+02 - valid loss: 1.48e+02, valid CER: 37.88, valid WER: 77.30
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-53-34+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-46-22+00
speechbrain.utils.epoch_loop - Going into epoch 30


100%|██████████| 190/190 [00:26<00:00,  7.18it/s, train_loss=214]
100%|██████████| 137/137 [00:16<00:00,  8.47it/s]

speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, optimizer: Adam - train loss: 2.14e+02 - valid loss: 1.47e+02, valid CER: 37.75, valid WER: 77.26
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-54-18+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-17+12-47-05+00
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.


100%|██████████| 328/328 [00:44<00:00,  7.40it/s]

speechbrain.utils.train_logger - Epoch loaded: 30 - test loss: 83.46, test CER: 38.40, test WER: 78.30





83.4635204861803

# Task 2.2: The PowerConv Module
In this section, we update the Conformer encoder by replacing its convolution module with **PowerConv**. The rest of the architecture remains the same. Note that this change is applied on top of the inter-CTC setup from Part II(A).

In [23]:
task_hyperparameters = """

# Setup the directory to host experiment results
output_folder: !ref results/transformer/Task_2B
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

interctc_weight: 0.3
intermediate_layers: '2,4'

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        noam_scheduler: !ref <noam_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>
"""
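
The two inter-CTC settings above are plain values: `interctc_weight` is a float and `intermediate_layers` is a comma-separated string of 1-indexed encoder layers. As a rough sketch only (this is not the notebook's `ASR_2A` code), the helpers `parse_layers` and `combine_ctc_losses` below are hypothetical names, and the weighting assumes the common inter-CTC formulation: the final CTC loss scaled by `1 - w` plus the mean intermediate CTC loss scaled by `w`.

```python
def parse_layers(spec):
    """Turn a string such as '2,4' into a list of ints, e.g. [2, 4]."""
    return [int(tok) for tok in spec.split(",") if tok.strip()]


def combine_ctc_losses(final_loss, inter_losses, weight):
    """Blend the final-layer CTC loss with the mean intermediate CTC loss.

    Assumed formulation: (1 - weight) * final + weight * mean(intermediate).
    """
    if not inter_losses:
        return final_loss
    inter_mean = sum(inter_losses) / len(inter_losses)
    return (1.0 - weight) * final_loss + weight * inter_mean


# Illustrative loss values, not taken from the training log
layers = parse_layers("2,4")
loss = combine_ctc_losses(final_loss=100.0, inter_losses=[140.0, 120.0], weight=0.3)
print(layers, loss)
```

The same arithmetic applies unchanged if the losses are `torch` tensors instead of floats.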

In [24]:
import torch
import speechbrain as sb

from speechbrain.nnet.attention import (
    RelPosMHAXL,
    MultiheadAttention,
    PositionalwiseFeedForward,
)
from speechbrain.nnet.normalization import LayerNorm
from speechbrain.nnet.activations import Swish
from speechbrain.nnet.CNN import Conv1d

class PowerConv(torch.nn.Module):
    def __init__(
        self,
        input_size,
        kernel_size=31,
        dropout=0.0,
    ):
        super().__init__()

        # Double the feature dimension; the upsampled features are split into two halves in forward()
        input_size = input_size * 2
        self.input_size = input_size

        n_channels = input_size // 2  # split input channels

        # TODO: First projection feedforward layer to upsample the input
        self.channel_proj1 = torch.nn.Linear(n_channels,input_size)
        self.norm = LayerNorm(n_channels)
        self.conv = Conv1d(in_channels=n_channels, out_channels=n_channels, kernel_size=kernel_size)
        ### TODO: Use the groups parameter in the Conv1D class and set it to n_channels
        ### TODO: Note that this conv operator does not change the feature dimensionality.
        ### TODO: Use the appropriate value for the padding parameter in the Conv1D class to keep the feature dimensionality unaltered.

        # TODO: Second projection feedforward layer
        self.channel_proj2 = torch.nn.Linear(n_channels,n_channels)
        self.dropout = torch.nn.Dropout(dropout)

        # Initialize the convolution bias with ones.
        torch.nn.init.ones_(self.conv.conv.bias)

    def forward(self, x):
        """
            Shape of input x: (B, T, D)
            Return output of shape: (B, T, D)
        """
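        # TODO: Implement the PowerConv module as described in the assignment pdf
        x = self.norm(x)
        # Project up to 2 * D features, then split into two halves of size D
        V = self.channel_proj1(x)
        half = self.input_size // 2
        v1, v2 = V[..., :half], V[..., half:]
        # Normalize and convolve one half, then use it to gate the other half
        v2 = self.norm(v2)
        v2 = self.conv(v2)
        z = v1 * v2
        x = self.dropout(z)
        x = self.channel_proj2(x)
        return x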

class CustomConformerEncoderLayer(torch.nn.Module):
    def __init__(
        self,
        d_model,
        d_ffn,
        nhead,
        kernel_size=31,
        kdim=None,
        vdim=None,
        activation=Swish,
        bias=True,
        dropout=0.0,
        causal=False,
        attention_type="RelPosMHAXL",
    ):
        super().__init__()

        # Self attention block
        if attention_type == "regularMHA":
            self.mha_layer = MultiheadAttention(
                nhead=nhead,
                d_model=d_model,
                dropout=dropout,
                kdim=kdim,
                vdim=vdim,
            )
        elif attention_type == "RelPosMHAXL":
            # transformerXL style positional encoding
            self.mha_layer = RelPosMHAXL(
                num_heads=nhead,
                embed_dim=d_model,
                dropout=dropout,
                mask_pos_future=causal,
            )
        else:
            raise ValueError("Unknown attention type")

        # Create instance of our custom convolution block
        self.convolution_module = PowerConv(
            d_model, kernel_size, dropout
        )

        # Feed forward macaron block
        self.ffn_module1 = torch.nn.Sequential(
            torch.nn.LayerNorm(d_model),
            PositionalwiseFeedForward(
                d_ffn=d_ffn,
                input_size=d_model,
                dropout=dropout,
                activation=activation,
            ),
            torch.nn.Dropout(dropout),
        )

        # Feed forward block
        self.ffn_module2 = torch.nn.Sequential(
            torch.nn.LayerNorm(d_model),
            PositionalwiseFeedForward(
                d_ffn=d_ffn,
                input_size=d_model,
                dropout=dropout,
                activation=activation,
            ),
            torch.nn.Dropout(dropout),
        )

        self.norm1 = LayerNorm(d_model)
        self.norm2 = LayerNorm(d_model)
        self.drop = torch.nn.Dropout(dropout)

    def forward(
        self,
        x,
        src_mask = None,
        src_key_padding_mask = None,
        pos_embs = None,
    ):
        conv_mask = None
        if src_key_padding_mask is not None:
            conv_mask = src_key_padding_mask.unsqueeze(-1)

        # ffn module
        x = x + 0.5 * self.ffn_module1(x)

        # multi-head attention module
        skip = x
        x = self.norm1(x)
        x, self_attn = self.mha_layer(
            x,
            x,
            x,
            attn_mask=src_mask,
            key_padding_mask=src_key_padding_mask,
            pos_embs=pos_embs,
        )
        x = x + skip

        # convolution module
        x = x + self.convolution_module(x)

        # ffn module
        x = self.norm2(x + 0.5 * self.ffn_module2(x))

        return x, self_attn


class CustomConformerEncoder(torch.nn.Module):
    def __init__(
        self,
        num_layers,
        d_model,
        d_ffn,
        nhead,
        kernel_size=31,
        kdim=None,
        vdim=None,
        activation=Swish,
        bias=True,
        dropout=0.0,
        causal=False,
        attention_type="RelPosMHAXL",
    ):
        super().__init__()

        # Create layers using our custom encoder layer that utilizes PowerConv
        self.layers = torch.nn.ModuleList(
            [
                CustomConformerEncoderLayer(
                    d_ffn=d_ffn,
                    nhead=nhead,
                    d_model=d_model,
                    kdim=kdim,
                    vdim=vdim,
                    dropout=dropout,
                    activation=activation,
                    kernel_size=kernel_size,
                    bias=bias,
                    causal=causal,
                    attention_type=attention_type,
                )
                for i in range(num_layers)
            ]
        )
        self.norm = LayerNorm(d_model, eps=1e-6)
        self.attention_type = attention_type

    def forward(
        self,
        src,
        src_mask = None,
        src_key_padding_mask = None,
        pos_embs = None,
    ):

        if self.attention_type == "RelPosMHAXL":
            if pos_embs is None:
                raise ValueError(
                    "The chosen attention type for the Conformer is RelPosMHAXL. For this attention type, the positional embeddings are mandatory"
                )

        output = src
        attention_lst = []
        # Loop through the encoder layers
        for enc_layer in self.layers:
            output, attention = enc_layer(
                output,
                src_mask=src_mask,
                src_key_padding_mask=src_key_padding_mask,
                pos_embs=pos_embs,
            )
            attention_lst.append(attention)
        output = self.norm(output)

        return output, attention_lst
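
The TODO comments in `PowerConv` ask for a convolution whose `groups` equals the number of channels and whose padding leaves the shape unchanged. A minimal sketch of what those two settings do, written with plain `torch.nn.Conv1d` rather than SpeechBrain's `Conv1d` wrapper (so the `(B, T, D)` tensor is transposed by hand; all sizes are arbitrary illustration values):

```python
import torch

B, T, D, K = 2, 50, 16, 31  # batch, time steps, channels, kernel size (illustrative)

# groups=D gives one filter per channel (depthwise);
# padding=(K - 1) // 2 keeps T unchanged for an odd kernel with stride 1.
depthwise = torch.nn.Conv1d(D, D, kernel_size=K, padding=(K - 1) // 2, groups=D)

x = torch.randn(B, T, 2 * D)     # features after an upsampling projection
v1, v2 = x[..., :D], x[..., D:]  # split into two halves of size D

# torch.nn.Conv1d expects (B, D, T), so transpose in and out
v2 = depthwise(v2.transpose(1, 2)).transpose(1, 2)
gated = v1 * v2                  # element-wise gating of one half by the other

n_params = sum(p.numel() for p in depthwise.parameters())
print(gated.shape)  # (B, T, D)
print(n_params)     # D * K weights + D biases
```

With `groups=D` the parameter count grows linearly in `D`, whereas a full convolution (`groups=1`) needs `D * D * K` weights.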

In [25]:
class ASR_2B(ASR_2A):
    def __init__(
        self, device="cpu", *args, **kwargs
    ):
        super().__init__(*args, **kwargs)

        # Remove the hooks registered by ASR_2A; they are attached to the encoder that is replaced below
        for hook in self.hooks:
            hook.remove()

        # Instantiate our custom encoder that uses PowerConv
        encoder = CustomConformerEncoder(
            nhead=self.hparams.nhead,
            num_layers=self.hparams.num_encoder_layers,
            d_ffn=self.hparams.d_ffn,
            d_model=self.hparams.d_model,
            dropout=self.hparams.transformer_dropout,
            activation=self.hparams.activation,
            attention_type=self.hparams.attention_type,
        ).to(device)

        # Replace the standard encoder with our encoder
        self.modules.Transformer.encoder = encoder

        def get_intermediate_output(module, input, output):
            self.inter_logits.append(output)

        # TODO: Copy this code from your implementation in Part II(A) within the __init__ function of ASR_2A that populates self.hooks
        # Layer numbers in self.intermediate_layers are 1-indexed, hence i - 1
        self.hooks = [
            self.modules.Transformer.encoder.layers[i - 1].register_forward_hook(
                get_intermediate_output
            )
            for i in self.intermediate_layers
        ]
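
Forward hooks are what collect the intermediate encoder outputs here. A small self-contained sketch of the same pattern on a toy stack of linear layers, including removal of the hooks afterwards (the toy model and the `captured` list are stand-ins, not part of the notebook):

```python
import torch

layers = torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(4)])
intermediate_layers = [2, 4]  # 1-indexed, as in the hyperparameters
captured = []


def get_intermediate_output(module, inputs, output):
    # Called after the module's forward pass; `output` is whatever forward returned
    captured.append(output)


hooks = [
    layers[i - 1].register_forward_hook(get_intermediate_output)
    for i in intermediate_layers
]

x = torch.randn(3, 8)
for layer in layers:
    x = layer(x)
n_captured = len(captured)
print(n_captured)  # one entry per hooked layer

# The handles returned by register_forward_hook can be removed when no longer needed
for hook in hooks:
    hook.remove()
captured.clear()
for layer in layers:
    x = layer(x)
print(len(captured))  # the hooks are gone, so nothing is captured
```

Because each `CustomConformerEncoderLayer` returns a tuple `(x, self_attn)`, a hook registered on such a layer receives that whole tuple as `output`.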

In [26]:
hyperparams = global_hyperparams + task_hyperparameters
hparams = load_hyperpyyaml(hyperparams)

# Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

# We download the pretrained LM from HuggingFace (or elsewhere depending on
# the path given in the YAML file). The tokenizer is loaded at the same time.
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# Trainer initialization
asr_brain = ASR_2B(
    modules=hparams["modules"],
    opt_class=hparams["Adam"],
    hparams=hparams,
    checkpointer=hparams["checkpointer"],
    run_opts=run_opts,
    tokenizer=tokenizer,
    device=device
)

# Dataloader options for training and validation
train_dataloader_opts = hparams["train_dataloader_opts"]
valid_dataloader_opts = hparams["valid_dataloader_opts"]

# Training
asr_brain.fit(
    asr_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=train_dataloader_opts,
    valid_loader_kwargs=valid_dataloader_opts
)

# Testing

asr_brain.hparams.test_wer_file = asr_brain.hparams.wer_file
asr_brain.evaluate(
    test_data,
    max_key="ACC",
    test_loader_kwargs=hparams["test_dataloader_opts"],
)

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/transformer/Task_2B
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in model_checkpoints/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Set local path in self.paths[tokenizer] = model_checkpoints/tokenizer.ckpt
speechbrain.utils.parameter_transfer - Loading pretrained files for: tokenizer
speechbrain.utils.parameter_transfer - Redirecting (loading from local path): model_checkpoints/tokenizer.ckpt -> model_checkpoints/tokenizer.ckpt
speechbrain.core - Info: max_grad_norm arg from hparam file is used
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 698.9k trainable parameters in ASR_2B
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1


100%|██████████| 190/190 [00:26<00:00,  7.21it/s, train_loss=687]
100%|██████████| 137/137 [00:08<00:00, 15.88it/s]

speechbrain.utils.train_logger - epoch: 1, lr: 1.26e-04, steps: 190, optimizer: Adam - train loss: 6.87e+02 - valid loss: 3.14e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-49-34+00
speechbrain.utils.epoch_loop - Going into epoch 2


100%|██████████| 190/190 [00:25<00:00,  7.56it/s, train_loss=564]
100%|██████████| 137/137 [00:08<00:00, 15.73it/s]

speechbrain.utils.train_logger - epoch: 2, lr: 2.53e-04, steps: 380, optimizer: Adam - train loss: 5.64e+02 - valid loss: 3.05e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-50-08+00
speechbrain.utils.epoch_loop - Going into epoch 3


100%|██████████| 190/190 [00:25<00:00,  7.51it/s, train_loss=560]
100%|██████████| 137/137 [00:08<00:00, 16.22it/s]

speechbrain.utils.train_logger - epoch: 3, lr: 3.79e-04, steps: 570, optimizer: Adam - train loss: 5.60e+02 - valid loss: 3.04e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-50-42+00
speechbrain.utils.epoch_loop - Going into epoch 4


100%|██████████| 190/190 [00:25<00:00,  7.36it/s, train_loss=548]
100%|██████████| 137/137 [00:08<00:00, 16.06it/s]

speechbrain.utils.train_logger - epoch: 4, lr: 5.06e-04, steps: 760, optimizer: Adam - train loss: 5.48e+02 - valid loss: 2.89e+02, valid CER: 94.74, valid WER: 99.98
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-51-17+00





speechbrain.utils.epoch_loop - Going into epoch 5


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=495]
100%|██████████| 137/137 [00:10<00:00, 12.71it/s]

speechbrain.utils.train_logger - epoch: 5, lr: 6.33e-04, steps: 950, optimizer: Adam - train loss: 4.95e+02 - valid loss: 2.50e+02, valid CER: 74.13, valid WER: 94.99
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-51-54+00





speechbrain.utils.epoch_loop - Going into epoch 6


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=428]
100%|██████████| 137/137 [00:12<00:00, 10.91it/s]

speechbrain.utils.train_logger - epoch: 6, lr: 7.59e-04, steps: 1140, optimizer: Adam - train loss: 4.28e+02 - valid loss: 2.21e+02, valid CER: 61.64, valid WER: 90.92
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-52-33+00





speechbrain.utils.epoch_loop - Going into epoch 7


100%|██████████| 190/190 [00:25<00:00,  7.54it/s, train_loss=380]
100%|██████████| 137/137 [00:13<00:00, 10.29it/s]

speechbrain.utils.train_logger - epoch: 7, lr: 8.86e-04, steps: 1330, optimizer: Adam - train loss: 3.80e+02 - valid loss: 2.02e+02, valid CER: 56.48, valid WER: 88.89
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-53-12+00





speechbrain.utils.epoch_loop - Going into epoch 8


100%|██████████| 190/190 [00:25<00:00,  7.36it/s, train_loss=345]
100%|██████████| 137/137 [00:13<00:00,  9.98it/s]

speechbrain.utils.train_logger - epoch: 8, lr: 9.94e-04, steps: 1520, optimizer: Adam - train loss: 3.45e+02 - valid loss: 1.91e+02, valid CER: 52.93, valid WER: 87.48
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-53-51+00





speechbrain.utils.epoch_loop - Going into epoch 9


100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=318]
100%|██████████| 137/137 [00:14<00:00,  9.29it/s]

speechbrain.utils.train_logger - epoch: 9, lr: 9.37e-04, steps: 1710, optimizer: Adam - train loss: 3.18e+02 - valid loss: 1.83e+02, valid CER: 49.79, valid WER: 85.76
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-54-32+00





speechbrain.utils.epoch_loop - Going into epoch 10


100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=295]
100%|██████████| 137/137 [00:14<00:00,  9.66it/s]

speechbrain.utils.train_logger - epoch: 10, lr: 8.89e-04, steps: 1900, optimizer: Adam - train loss: 2.95e+02 - valid loss: 1.78e+02, valid CER: 48.34, valid WER: 84.88
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-55-12+00





speechbrain.utils.epoch_loop - Going into epoch 11


100%|██████████| 190/190 [00:25<00:00,  7.51it/s, train_loss=278]
100%|██████████| 137/137 [00:14<00:00,  9.52it/s]

speechbrain.utils.train_logger - epoch: 11, lr: 8.47e-04, steps: 2090, optimizer: Adam - train loss: 2.78e+02 - valid loss: 1.76e+02, valid CER: 46.98, valid WER: 84.28
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-55-53+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-49-34+00
speechbrain.utils.epoch_loop - Going into epoch 12


100%|██████████| 190/190 [00:26<00:00,  7.25it/s, train_loss=262]
100%|██████████| 137/137 [00:14<00:00,  9.32it/s]

speechbrain.utils.train_logger - epoch: 12, lr: 8.11e-04, steps: 2280, optimizer: Adam - train loss: 2.62e+02 - valid loss: 1.73e+02, valid CER: 45.32, valid WER: 83.64
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-56-34+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-50-08+00
speechbrain.utils.epoch_loop - Going into epoch 13


100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=249]
100%|██████████| 137/137 [00:14<00:00,  9.22it/s]

speechbrain.utils.train_logger - epoch: 13, lr: 7.79e-04, steps: 2470, optimizer: Adam - train loss: 2.49e+02 - valid loss: 1.72e+02, valid CER: 44.52, valid WER: 83.80
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-57-16+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-50-42+00
speechbrain.utils.epoch_loop - Going into epoch 14


100%|██████████| 190/190 [00:25<00:00,  7.36it/s, train_loss=238]
100%|██████████| 137/137 [00:15<00:00,  9.05it/s]

speechbrain.utils.train_logger - epoch: 14, lr: 7.51e-04, steps: 2660, optimizer: Adam - train loss: 2.38e+02 - valid loss: 1.71e+02, valid CER: 44.24, valid WER: 83.78
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-57-57+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-51-17+00
speechbrain.utils.epoch_loop - Going into epoch 15


100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=229]
100%|██████████| 137/137 [00:15<00:00,  8.70it/s]

speechbrain.utils.train_logger - epoch: 15, lr: 7.26e-04, steps: 2850, optimizer: Adam - train loss: 2.29e+02 - valid loss: 1.71e+02, valid CER: 43.85, valid WER: 82.83
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-58-39+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-51-54+00
speechbrain.utils.epoch_loop - Going into epoch 16


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=219]
100%|██████████| 137/137 [00:15<00:00,  8.72it/s]

speechbrain.utils.train_logger - epoch: 16, lr: 7.03e-04, steps: 3040, optimizer: Adam - train loss: 2.19e+02 - valid loss: 1.72e+02, valid CER: 43.51, valid WER: 82.59
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-59-21+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-52-33+00
speechbrain.utils.epoch_loop - Going into epoch 17


100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=211]
100%|██████████| 137/137 [00:14<00:00,  9.26it/s]

speechbrain.utils.train_logger - epoch: 17, lr: 6.82e-04, steps: 3230, optimizer: Adam - train loss: 2.11e+02 - valid loss: 1.71e+02, valid CER: 43.43, valid WER: 81.91
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-00-02+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-53-12+00
speechbrain.utils.epoch_loop - Going into epoch 18


100%|██████████| 190/190 [00:25<00:00,  7.42it/s, train_loss=204]
100%|██████████| 137/137 [00:15<00:00,  8.79it/s]

speechbrain.utils.train_logger - epoch: 18, lr: 6.62e-04, steps: 3420, optimizer: Adam - train loss: 2.04e+02 - valid loss: 1.71e+02, valid CER: 42.95, valid WER: 81.41
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-00-44+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-53-51+00
speechbrain.utils.epoch_loop - Going into epoch 19


100%|██████████| 190/190 [00:25<00:00,  7.35it/s, train_loss=198]
100%|██████████| 137/137 [00:15<00:00,  9.06it/s]

speechbrain.utils.train_logger - epoch: 19, lr: 6.45e-04, steps: 3610, optimizer: Adam - train loss: 1.98e+02 - valid loss: 1.71e+02, valid CER: 42.58, valid WER: 81.62
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-01-26+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-54-32+00
speechbrain.utils.epoch_loop - Going into epoch 20


100%|██████████| 190/190 [00:25<00:00,  7.37it/s, train_loss=192]
100%|██████████| 137/137 [00:16<00:00,  8.52it/s]

speechbrain.utils.train_logger - epoch: 20, lr: 6.28e-04, steps: 3800, optimizer: Adam - train loss: 1.92e+02 - valid loss: 1.72e+02, valid CER: 42.45, valid WER: 82.17
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-02-08+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-55-12+00
speechbrain.utils.epoch_loop - Going into epoch 21


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=186]
100%|██████████| 137/137 [00:15<00:00,  8.92it/s]

speechbrain.utils.train_logger - epoch: 21, lr: 6.13e-04, steps: 3990, optimizer: Adam - train loss: 1.86e+02 - valid loss: 1.73e+02, valid CER: 42.34, valid WER: 81.94
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-02-50+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-55-53+00
speechbrain.utils.epoch_loop - Going into epoch 22


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=181]
100%|██████████| 137/137 [00:16<00:00,  8.19it/s]

speechbrain.utils.train_logger - epoch: 22, lr: 5.99e-04, steps: 4180, optimizer: Adam - train loss: 1.81e+02 - valid loss: 1.72e+02, valid CER: 42.20, valid WER: 82.21





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-03-33+00
speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-56-34+00
speechbrain.utils.epoch_loop - Going into epoch 23


100%|██████████| 190/190 [00:25<00:00,  7.47it/s, train_loss=177]
100%|██████████| 137/137 [00:15<00:00,  8.92it/s]

speechbrain.utils.train_logger - epoch: 23, lr: 5.86e-04, steps: 4370, optimizer: Adam - train loss: 1.77e+02 - valid loss: 1.74e+02, valid CER: 42.15, valid WER: 82.08
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-04-15+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-57-16+00
speechbrain.utils.epoch_loop - Going into epoch 24


100%|██████████| 190/190 [00:25<00:00,  7.36it/s, train_loss=173]
100%|██████████| 137/137 [00:16<00:00,  8.51it/s]

speechbrain.utils.train_logger - epoch: 24, lr: 5.74e-04, steps: 4560, optimizer: Adam - train loss: 1.73e+02 - valid loss: 1.75e+02, valid CER: 42.04, valid WER: 82.33
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-04-58+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-57-57+00
speechbrain.utils.epoch_loop - Going into epoch 25


100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=169]
100%|██████████| 137/137 [00:15<00:00,  8.92it/s]

speechbrain.utils.train_logger - epoch: 25, lr: 5.62e-04, steps: 4750, optimizer: Adam - train loss: 1.69e+02 - valid loss: 1.76e+02, valid CER: 42.02, valid WER: 81.83
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-05-40+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-58-39+00
speechbrain.utils.epoch_loop - Going into epoch 26


100%|██████████| 190/190 [00:26<00:00,  7.12it/s, train_loss=165]
100%|██████████| 137/137 [00:15<00:00,  8.89it/s]

speechbrain.utils.train_logger - epoch: 26, lr: 5.51e-04, steps: 4940, optimizer: Adam - train loss: 1.65e+02 - valid loss: 1.78e+02, valid CER: 42.20, valid WER: 81.97
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-06-22+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+14-59-21+00
speechbrain.utils.epoch_loop - Going into epoch 27


100%|██████████| 190/190 [00:25<00:00,  7.42it/s, train_loss=161]
100%|██████████| 137/137 [00:15<00:00,  8.90it/s]

speechbrain.utils.train_logger - epoch: 27, lr: 5.41e-04, steps: 5130, optimizer: Adam - train loss: 1.61e+02 - valid loss: 1.78e+02, valid CER: 41.67, valid WER: 80.88
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-07-04+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-00-02+00
speechbrain.utils.epoch_loop - Going into epoch 28


100%|██████████| 190/190 [00:26<00:00,  7.25it/s, train_loss=158]
100%|██████████| 137/137 [00:15<00:00,  8.88it/s]

speechbrain.utils.train_logger - epoch: 28, lr: 5.31e-04, steps: 5320, optimizer: Adam - train loss: 1.58e+02 - valid loss: 1.79e+02, valid CER: 41.81, valid WER: 81.95
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-07-47+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-00-44+00
speechbrain.utils.epoch_loop - Going into epoch 29


100%|██████████| 190/190 [00:25<00:00,  7.45it/s, train_loss=154]
100%|██████████| 137/137 [00:15<00:00,  8.77it/s]

speechbrain.utils.train_logger - epoch: 29, lr: 5.22e-04, steps: 5510, optimizer: Adam - train loss: 1.54e+02 - valid loss: 1.81e+02, valid CER: 41.80, valid WER: 81.16
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-08-29+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-01-26+00
speechbrain.utils.epoch_loop - Going into epoch 30


100%|██████████| 190/190 [00:26<00:00,  7.24it/s, train_loss=151]
100%|██████████| 137/137 [00:15<00:00,  8.88it/s]

speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, optimizer: Adam - train loss: 1.51e+02 - valid loss: 1.81e+02, valid CER: 41.42, valid WER: 81.17
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-09-12+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-17+15-02-08+00
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.


100%|██████████| 328/328 [00:45<00:00,  7.21it/s]

speechbrain.utils.train_logger - Epoch loaded: 30 - test loss: 1.11e+02, test CER: 41.80, test WER: 81.63





111.36919463553075
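
The `lr` values in the training log above rise through epoch 8 and decay afterwards, which is the shape of a Noam schedule (the `noam_scheduler` entry in the checkpointer). A rough sketch of such a schedule follows. `lr_initial=1e-3` and `n_warmup_steps=1500` are assumed values that do not appear in this section; they were picked because they land close to the logged numbers, not because they are known to be the recipe's settings.

```python
def noam_lr(step, lr_initial=1e-3, n_warmup_steps=1500):
    """Linear warmup up to n_warmup_steps, then inverse-square-root decay."""
    scale = n_warmup_steps ** 0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5)
    return lr_initial * scale


# Steps at the end of epochs 1, 8 and 30 (190 steps per epoch in the log)
for step in (190, 1520, 5700):
    print(step, f"{noam_lr(step):.2e}")
```

Under these assumptions the peak learning rate is reached at step 1500, which falls inside epoch 8 and matches where the logged `lr` stops increasing.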