**This is the task for the second homework (HW2)**

Download the Armenian Mozilla Common Voice (MCV) dataset and train an Armenian ASR model.

The quality metric is word error rate (WER) on the Armenian MCV test split.
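For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, summed over the whole corpus and divided by the total number of reference words. The pure-Python sketch below illustrates that definition; it is not NeMo's `word_error_rate` (imported later in this notebook), although it is intended to follow the same corpus-level formula. The helper names are illustrative, and `use_cer=True` computes the analogous character error rate.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit cost for sub/del/ins)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str], use_cer: bool = False) -> float:
    """Total edit errors divided by total reference units (words, or characters if use_cer)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors, total = 0, 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = list(ref) if use_cer else ref.split()
        hyp_units = list(hyp) if use_cer else hyp.split()
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


if __name__ == "__main__":
    # One substitution (cat -> bat) and one insertion (down) over 3 reference words
    print(corpus_wer(["the cat sat"], ["the bat sat down"]))  # prints 0.6666666666666666
```

Because insertions count as errors, WER can exceed 1.0 for very poor hypotheses.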

In [1]:
"""
You can run this notebook either locally (if you have all the dependencies and a GPU) or on Google Colab.

Instructions for setting up Colab are as follows:
1. Open a new Python 3 notebook.
2. Import this notebook from GitHub (File -> Upload Notebook -> "GITHUB" tab -> copy/paste GitHub URL).
3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select "GPU" for hardware accelerator).
4. Run this cell to set up dependencies.
5. Restart the runtime (Runtime -> Restart Runtime) for any upgraded packages to take effect.


NOTE: The user is responsible for checking the content of datasets and the applicable licenses and determining whether they are suitable for the intended use.
"""

# Install dependencies
!pip install wget
!apt-get install sox libsndfile1 ffmpeg libsox-fmt-mp3
!pip install text-unidecode
!pip install "matplotlib>=3.3.2"  # quoted so the shell does not treat >= as an output redirect

## Install NeMo
BRANCH = 'main'
!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]

"""
Remember to restart the runtime for the kernel to pick up any upgraded packages (e.g. matplotlib)!
Alternatively, you can uncomment the exit() below to crash and restart the kernel, in the case
that you want to use the "Run All Cells" (or similar) option.
"""
# exit()


In [2]:
import os
import glob
import subprocess
import tarfile
import wget
import copy
from omegaconf import OmegaConf, open_dict


In [3]:
data_dir = 'datasets/'

if not os.path.exists(data_dir):
  os.makedirs(data_dir, exist_ok=True)

if not os.path.exists("scripts"):
  os.makedirs("scripts")

import nemo
import nemo.collections.asr as nemo_asr
from nemo.collections.asr.metrics.wer import word_error_rate
from nemo.utils import logging, exp_manager

**Download dataset**

We will use the NeMo script `convert_hf_dataset_to_nemo.py` to download and prepare the Mozilla Common Voice (MCV) dataset for Armenian.

The data preparation script downloads the audio files and their transcripts, converts the audio into mono-channel 16 kHz WAV files that can be used directly for training ASR models, and writes NeMo manifest files.
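Each NeMo manifest is a JSON-lines file with one utterance per line; the standard keys are `audio_filepath`, `duration` (in seconds) and `text`. As a sketch, the hypothetical helper below reads such a file and reports how many utterances and hours fall inside a duration window. The defaults `min_duration=0.1` and `max_duration=20` are taken from this model's `train_ds` config shown later; the filtering rule is only meant to resemble the one behind NeMo's "Dataset loaded with N files totalling H hours" log messages, not to reproduce NeMo's code.

```python
import json
from typing import Dict


def manifest_stats(path: str, min_duration: float = 0.1, max_duration: float = 20.0) -> Dict[str, float]:
    """Count utterances and hours in a NeMo JSON-lines manifest, split by a duration window.

    Each line is assumed to look like:
    {"audio_filepath": "/path/to/clip.wav", "duration": 4.2, "text": "..."}
    """
    kept_files = filtered_files = 0
    kept_sec = filtered_sec = 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            duration = float(json.loads(line)["duration"])
            if min_duration <= duration <= max_duration:
                kept_files += 1
                kept_sec += duration
            else:
                filtered_files += 1
                filtered_sec += duration
    return {
        "kept_files": kept_files,
        "kept_hours": kept_sec / 3600,
        "filtered_files": filtered_files,
        "filtered_hours": filtered_sec / 3600,
    }
```

After the conversion step, calling for example `manifest_stats(test_manifest)` gives a quick sanity check of each split.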

**Hugging Face**

Next, we download the Armenian (`hy-AM`) subset of Mozilla Common Voice 16.1 from Hugging Face. We download every split (train, validation, test, other and invalidated) and use all of them except the test split for training. Even so, the Armenian subset is small (about 18.8 hours of training audio after filtering, per the data loader output below), so strong results would likely require additional data.

Website steps:

1. Visit https://huggingface.co/settings/profile
2. Open "Access Tokens" in the settings menu.
3. Create a new token: give it a name; "read" access is sufficient.
4. Save the token; you will paste it in the next step.
5. Visit the Hugging Face dataset page for [Mozilla Common Voice 16.1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1).
6. There should be a section that asks for your approval of the dataset terms.
7. Make sure you are logged in, then read that agreement.
8. If and only if you agree to the text, accept the terms.

Code steps:

* Run `login()` in the cell below.

* Paste your saved Hugging Face token into the text box.

In [4]:
from huggingface_hub import login
login()


In [5]:
VERSION = "mozilla-foundation/common_voice_16_1"
LANGUAGE = "hy-AM"

In [6]:
tokenizer_dir = os.path.join('tokenizers', LANGUAGE)
manifest_dir = os.path.join('datasets', LANGUAGE, VERSION, LANGUAGE)

In [7]:
# If something goes wrong during data processing, uncomment the following line to delete the cached dataset
# !rm -rf datasets/$LANGUAGE
!mkdir -p datasets

The conversion step below downloads the Armenian MCV corpus, preprocesses the audio and prepares manifest files that NeMo models can use directly.

We use the `convert_hf_dataset_to_nemo.py` script from `scripts/speech_recognition` in the NeMo repository. If the script is not already present (for example, because you have not cloned NeMo), the next cell downloads it.

In [8]:
if not os.path.exists("convert_hf_dataset_to_nemo.py"):
    !wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/speech_recognition/convert_hf_dataset_to_nemo.py

--2024-04-07 11:38:24--  https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/speech_recognition/convert_hf_dataset_to_nemo.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14735 (14K) [text/plain]
Saving to: ‘convert_hf_dataset_to_nemo.py’


2024-04-07 11:38:25 (131 MB/s) - ‘convert_hf_dataset_to_nemo.py’ saved [14735/14735]



In [9]:
!python convert_hf_dataset_to_nemo.py \
    output_dir=datasets/$LANGUAGE \
    path=$VERSION \
    name=$LANGUAGE \
    split="train" \
    ensure_ascii=False \
    use_auth_token=True

!python convert_hf_dataset_to_nemo.py \
    output_dir=datasets/$LANGUAGE \
    path=$VERSION \
    name=$LANGUAGE \
    split="validation" \
    ensure_ascii=False \
    use_auth_token=True

!python convert_hf_dataset_to_nemo.py \
    output_dir=datasets/$LANGUAGE \
    path=$VERSION \
    name=$LANGUAGE \
    split="test" \
    ensure_ascii=False \
    use_auth_token=True

!python convert_hf_dataset_to_nemo.py \
    output_dir=datasets/$LANGUAGE \
    path=$VERSION \
    name=$LANGUAGE \
    split="other" \
    ensure_ascii=False \
    use_auth_token=True

!python convert_hf_dataset_to_nemo.py \
    output_dir=datasets/$LANGUAGE \
    path=$VERSION \
    name=$LANGUAGE \
    split="invalidated" \
    ensure_ascii=False \
    use_auth_token=True

Map: 100% 744/744 [01:43<00:00,  7.16 examples/s]
Processing mozilla-foundation/common_voice_16_1 (split : invalidated):: 100% 744/744 [00:00<00:00, 976.23 samples/s]

Dataset conversion finished !

In [12]:
train_manifest = f"{manifest_dir}/train/train_mozilla-foundation_common_voice_16_1_manifest.json"
dev_manifest = f"{manifest_dir}/validation/validation_mozilla-foundation_common_voice_16_1_manifest.json"
test_manifest = f"{manifest_dir}/test/test_mozilla-foundation_common_voice_16_1_manifest.json"
other_manifest = f"{manifest_dir}/other/other_mozilla-foundation_common_voice_16_1_manifest.json"
invalidated_manifest = f"{manifest_dir}/invalidated/invalidated_mozilla-foundation_common_voice_16_1_manifest.json"
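The conversion cell above runs the same command five times, changing only `split`. A more compact alternative, sketched below, builds the argument list once per split and runs it with `subprocess`. The arguments mirror the `!python` calls in this notebook; `convert_cmd`, `SPLITS` and the `RUN_CONVERSION` switch are hypothetical names introduced for illustration.

```python
import subprocess
import sys
from typing import List

# Splits converted in this notebook, in the same order as the conversion cell
SPLITS = ["train", "validation", "test", "other", "invalidated"]


def convert_cmd(split: str,
                language: str = "hy-AM",
                version: str = "mozilla-foundation/common_voice_16_1") -> List[str]:
    """Argument list equivalent to one `!python convert_hf_dataset_to_nemo.py ...` call."""
    return [
        sys.executable, "convert_hf_dataset_to_nemo.py",
        f"output_dir=datasets/{language}",
        f"path={version}",
        f"name={language}",
        f"split={split}",
        "ensure_ascii=False",
        "use_auth_token=True",
    ]


RUN_CONVERSION = False  # hypothetical switch: set True to actually download and convert
if RUN_CONVERSION:
    for split in SPLITS:
        # check=True stops at the first failing split instead of silently continuing
        subprocess.run(convert_cmd(split), check=True)
```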

In [13]:
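train_manifest_full = f"{manifest_dir}/train_full_mozilla-foundation_common_voice_16_1_manifest.json"
# Merge train + other + invalidated into one training manifest.
# Note: "invalidated" clips were voted down by Common Voice reviewers, so their audio or transcripts may be noisy.
!cat $train_manifest $other_manifest $invalidated_manifest > $train_manifest_full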

**Hint**: Convert texts to lowercase and remove punctuation to improve WER.
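One way to follow this hint is sketched below, assuming the manifests use NeMo's usual `text` key. `normalize_text` lowercases the text (Python's `str.lower` handles the Armenian alphabet) and deletes every character whose Unicode category starts with `P`. That covers Armenian marks such as the full stop `։`, the comma `՝` and the question mark `՞`, as well as ASCII punctuation and guillemets. Punctuation is deleted rather than replaced with a space because the Armenian question, emphasis and exclamation marks are written inside the word (e.g. `ինչո՞ւ`). The function names are illustrative, not part of NeMo.

```python
import json
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase, delete Unicode punctuation (categories P*), and collapse whitespace."""
    text = text.lower()
    # Delete (not space-replace) punctuation: Armenian ՞ ՛ ՜ appear inside words
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return re.sub(r"\s+", " ", text).strip()


def normalize_manifest(in_path: str, out_path: str) -> None:
    """Write a copy of a NeMo JSON-lines manifest with normalized `text` fields."""
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue
            entry = json.loads(line)
            entry["text"] = normalize_text(entry["text"])
            # ensure_ascii=False keeps Armenian characters readable in the output file
            fout.write(json.dumps(entry, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    print(normalize_text("Ինչո՞ւ ես այստեղ։"))  # prints: ինչու ես այստեղ
```

In this notebook, the helper would be applied to the full training, dev and test manifests before training the tokenizer, with the tokenizer and data configs then pointed at the normalized copies. Note that normalizing the test references also changes what the reported WER measures.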

In [14]:
if not os.path.exists("scripts/process_asr_text_tokenizer.py"):
  !wget -P scripts/ https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tokenizers/process_asr_text_tokenizer.py

--2024-04-07 12:13:04--  https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tokenizers/process_asr_text_tokenizer.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16631 (16K) [text/plain]
Saving to: ‘scripts/process_asr_text_tokenizer.py’


2024-04-07 12:13:05 (123 MB/s) - ‘scripts/process_asr_text_tokenizer.py’ saved [16631/16631]



**Hint**: Play with `VOCAB_SIZE` to improve WER.

In [15]:
TOKENIZER_TYPE = "bpe"  # SentencePiece model type: "bpe" or "unigram"

# Changing the vocabulary size did not noticeably improve WER in our experiments
VOCAB_SIZE = 200 + 2
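To follow the `VOCAB_SIZE` hint more systematically, one option is to build one tokenizer per candidate size and train a model with each. The sketch below only constructs the tokenizer commands and the matching output directory names. The flags are copied from the tokenizer cell in this notebook and the `tokenizer_spe_{type}_v{size}` pattern matches how `TOKENIZER_DIR` is derived later; the candidate sizes, the `RUN_SWEEP` switch and the helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def tokenizer_cmd(manifests: List[str], vocab_size: int, data_root: str, spe_type: str = "bpe") -> List[str]:
    """Argument list matching the process_asr_text_tokenizer.py call used in this notebook."""
    return [
        sys.executable, "scripts/process_asr_text_tokenizer.py",
        f"--manifest={','.join(manifests)}",
        f"--vocab_size={vocab_size}",
        f"--data_root={data_root}",
        "--tokenizer=spe",
        f"--spe_type={spe_type}",
        "--spe_character_coverage=1.0",
        "--log",
    ]


def tokenizer_dir_for(data_root: str, vocab_size: int, spe_type: str = "bpe") -> str:
    """Directory the script is expected to write to, following this notebook's naming."""
    return os.path.join(data_root, f"tokenizer_spe_{spe_type}_v{vocab_size}")


CANDIDATE_SIZES = [64, 128, 256, 512]  # hypothetical sweep values
RUN_SWEEP = False  # set True inside the notebook to actually build the tokenizers
if RUN_SWEEP:
    for size in CANDIDATE_SIZES:
        subprocess.run(tokenizer_cmd([train_manifest_full, dev_manifest], size, tokenizer_dir), check=True)
        print("built", tokenizer_dir_for(tokenizer_dir, size))
```

Each resulting directory would then be passed to `model.change_vocabulary(...)` and trained with otherwise identical settings, so that the test WERs are comparable.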

In [None]:
!python scripts/process_asr_text_tokenizer.py \
  --manifest=$train_manifest_full,$dev_manifest \
  --vocab_size=$VOCAB_SIZE \
  --data_root=$tokenizer_dir \
  --tokenizer="spe" \
  --spe_type=$TOKENIZER_TYPE \
  --spe_character_coverage=1.0 \
  --log

**Hint**: Try different models.

In [17]:
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_ctc_large", map_location='cpu')

[NeMo I 2024-04-07 12:13:43 cloud:68] Downloading from: https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_en_fastconformer_ctc_large/versions/1.0.0/files/stt_en_fastconformer_ctc_large.nemo to /root/.cache/torch/NeMo/NeMo_1.23.0rc0/stt_en_fastconformer_ctc_large/00a071a9dac048acc3aeea942b0bfa40/stt_en_fastconformer_ctc_large.nemo
[NeMo I 2024-04-07 12:14:04 common:815] Instantiating model from pre-trained checkpoint
[NeMo I 2024-04-07 12:14:06 mixins:172] Tokenizer SentencePieceTokenizer initialized with 1024 tokens


[NeMo W 2024-04-07 12:14:06 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 1
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 20
    min_duration: 0.1
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    bucketing_strategy: fully_randomized
    bucketing_batch_size: null
    
[NeMo W 2024-04-07 12:14:06 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 32
    shuffle: false
    num_workers: 8
    pin_m

[NeMo I 2024-04-07 12:14:06 features:289] PADDING: 0
[NeMo I 2024-04-07 12:14:09 save_restore_connector:263] Model EncDecCTCModelBPE was successfully restored from /root/.cache/torch/NeMo/NeMo_1.23.0rc0/stt_en_fastconformer_ctc_large/00a071a9dac048acc3aeea942b0bfa40/stt_en_fastconformer_ctc_large.nemo.


In [18]:
import torch
import torch.nn as nn

freeze_encoder = True  # set to False to fine-tune the whole encoder (slower, needs more GPU memory)

def enable_bn_se(m):
    """Keep BatchNorm and SqueezeExcite submodules trainable inside an otherwise frozen encoder."""
    if isinstance(m, nn.BatchNorm1d) or 'SqueezeExcite' in type(m).__name__:
        m.train()
        for param in m.parameters():
            param.requires_grad_(True)

if freeze_encoder:
  model.encoder.freeze()
  # Unfreeze the normalization layers so they can adapt to the new data
  model.encoder.apply(enable_bn_se)
  logging.info("Model encoder has been frozen")
else:
  model.encoder.unfreeze()
  logging.info("Model encoder has been un-frozen")

[NeMo I 2024-04-07 12:14:23 <ipython-input-18-180610cd99a3>:20] Model encoder has been frozen


In [19]:
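TOKENIZER_DIR = os.path.join(tokenizer_dir, f"tokenizer_spe_{TOKENIZER_TYPE}_v{VOCAB_SIZE}")

# NeMo expects "bpe" for any SentencePiece tokenizer (including unigram) and "wpe" for WordPiece,
# so pass "bpe" here rather than TOKENIZER_TYPE
model.change_vocabulary(new_tokenizer_dir=TOKENIZER_DIR, new_tokenizer_type="bpe")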

[NeMo W 2024-04-07 12:14:23 modelPT:258] You tried to register an artifact under config key=tokenizer.model_path but an artifact for it has already been registered.
[NeMo W 2024-04-07 12:14:23 modelPT:258] You tried to register an artifact under config key=tokenizer.vocab_path but an artifact for it has already been registered.
[NeMo W 2024-04-07 12:14:23 modelPT:258] You tried to register an artifact under config key=tokenizer.spe_tokenizer_vocab but an artifact for it has already been registered.


[NeMo I 2024-04-07 12:14:23 mixins:172] Tokenizer SentencePieceTokenizer initialized with 202 tokens
[NeMo I 2024-04-07 12:14:23 ctc_bpe_models:248] 
    Replacing old number of classes (1024) with new number of classes - 202
[NeMo I 2024-04-07 12:14:24 ctc_bpe_models:290] Changed tokenizer to ['<unk>', 'ու', 'ան', 'եր', 'ար', 'ակ', 'ում', '▁է', '▁հ', 'ներ', 'այ', 'ին', '▁ե', 'որ', '▁մ', '▁կ', 'ել', 'ութ', 'ուն', '▁ա', 'ությ', 'ատ', '▁ն', 'ամ', 'ական', '▁բ', '▁տ', '▁գ', 'ած', '▁ս', 'աց', 'աս', 'ով', '▁պ', 'ավ', 'են', '▁դ', 'ություն', '▁են', '▁եւ', 'եւ', 'ագ', 'ների', 'ող', '▁վ', 'իր', '▁այ', 'եղ', 'ես', '▁եր', 'ետ', '▁ան', '▁շ', 'ված', 'եց', 'առ', 'ալ', 'ները', 'ահ', 'աղ', 'ային', 'ման', 'ից', 'ության', '▁նա', '▁մի', '▁լ', 'վում', '▁թ', '▁ար', '▁ք', 'ազ', 'ոն', 'վել', '▁խ', 'րա', 'իս', '▁չ', '▁որ', '▁ու', 'ույ', '▁համ', 'իկ', '▁մե', '▁զ', 'ադ', '▁առ', 'ապ', '▁փ', 'ըն', 'ործ', '▁էր', '▁պատ', '▁տար', 'ուր', 'աբ', 'իմ', 'անի', 'րան', '▁իր', 'ստ', '▁ըն', 'անակ', 'աք', 'իտ', '▁ծ', '▁հետ', '
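The log above shows the CTC decoder being resized from 1024 to 202 token classes. A CTC decoder also predicts one extra blank symbol, and NeMo's `ConvASRDecoder` is a kernel-size-1 convolution with a bias, so its parameter count is `d_model * (V + 1) + (V + 1)`. Assuming an encoder width of `d_model = 512` for this FastConformer-Large checkpoint (an assumption; the width is not printed in this notebook), the arithmetic below reproduces the 104 K decoder parameters reported in the training summary.

```python
def ctc_decoder_params(d_model: int, vocab_size: int) -> int:
    """Weights plus biases of a 1x1 Conv1d mapping d_model features to vocab_size + 1 (blank) classes."""
    num_classes = vocab_size + 1  # CTC blank token
    return d_model * num_classes + num_classes


D_MODEL = 512  # assumed encoder width of stt_en_fastconformer_ctc_large

old = ctc_decoder_params(D_MODEL, 1024)  # original English tokenizer
new = ctc_decoder_params(D_MODEL, 202)   # Armenian tokenizer from this notebook
print(old, new)  # prints: 525825 104139
```

With the encoder frozen, this decoder accounts for most of the 122 K trainable parameters in the summary; the remainder comes from the normalization layers re-enabled by `enable_bn_se`.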

In [20]:
cfg = copy.deepcopy(model.cfg)

# Setup new tokenizer
cfg.tokenizer.dir = TOKENIZER_DIR
cfg.tokenizer.type = "bpe"

# Set tokenizer config
model.cfg.tokenizer = cfg.tokenizer

In [21]:
# Inspect the default training-data config before overriding it
print(OmegaConf.to_yaml(cfg.train_ds))

manifest_filepath: null
sample_rate: 16000
batch_size: 1
shuffle: true
num_workers: 8
pin_memory: true
use_start_end_token: false
trim_silence: false
max_duration: 20
min_duration: 0.1
is_tarred: false
tarred_audio_filepaths: null
shuffle_n: 2048
bucketing_strategy: fully_randomized
bucketing_batch_size: null



In [22]:
# Setup train, validation, test configs
with open_dict(cfg):
  # Train dataset
  cfg.train_ds.manifest_filepath = f"{train_manifest_full},{dev_manifest}"
  cfg.train_ds.batch_size = 32
  cfg.train_ds.num_workers = 8
  cfg.train_ds.pin_memory = True
  cfg.train_ds.use_start_end_token = False
  cfg.train_ds.trim_silence = True

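  # Validation dataset
  # NOTE: the test split doubles as the validation set and checkpoints are selected by val_wer,
  # so the reported test WER is not an unbiased estimate (dev_manifest is used for training above).
  cfg.validation_ds.manifest_filepath = test_manifest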
  cfg.validation_ds.batch_size = 8
  cfg.validation_ds.num_workers = 8
  cfg.validation_ds.pin_memory = True
  cfg.validation_ds.use_start_end_token = False
  cfg.validation_ds.trim_silence = True

  # Test dataset
  cfg.test_ds.manifest_filepath = test_manifest
  cfg.test_ds.batch_size = 8
  cfg.test_ds.num_workers = 8
  cfg.test_ds.pin_memory = True
  cfg.test_ds.use_start_end_token = False
  cfg.test_ds.trim_silence = True

In [23]:
# setup model with new configs
model.setup_training_data(cfg.train_ds)
model.setup_multiple_validation_data(cfg.validation_ds)
model.setup_multiple_test_data(cfg.test_ds)

[NeMo I 2024-04-07 12:14:28 collections:196] Dataset loaded with 12702 files totalling 18.82 hours
[NeMo I 2024-04-07 12:14:28 collections:197] 5 files were filtered totalling 0.13 hours


    


[NeMo I 2024-04-07 12:14:28 collections:196] Dataset loaded with 2853 files totalling 4.55 hours
[NeMo I 2024-04-07 12:14:28 collections:197] 0 files were filtered totalling 0.00 hours
[NeMo I 2024-04-07 12:14:29 collections:196] Dataset loaded with 2853 files totalling 4.55 hours
[NeMo I 2024-04-07 12:14:29 collections:197] 0 files were filtered totalling 0.00 hours


In [24]:
print(OmegaConf.to_yaml(cfg.optim))

name: adamw
lr: 0.001
betas:
- 0.9
- 0.98
weight_decay: 0.001
sched:
  name: CosineAnnealing
  warmup_steps: 15000
  warmup_ratio: null
  min_lr: 0.0001



In [25]:
with open_dict(model.cfg.optim):
  model.cfg.optim.lr = 0.025
  model.cfg.optim.weight_decay = 0.001
  model.cfg.optim.sched.warmup_steps = None  # Remove default number of steps of warmup
  model.cfg.optim.sched.warmup_ratio = 0.10  # 10 % warmup
  model.cfg.optim.sched.min_lr = 1e-9

with open_dict(model.cfg.spec_augment):
  model.cfg.spec_augment.freq_masks = 2
  model.cfg.spec_augment.freq_width = 25
  model.cfg.spec_augment.time_masks = 10
  model.cfg.spec_augment.time_width = 0.05

model.spec_augmentation = model.from_config_dict(model.cfg.spec_augment)
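The scheduler log during training reports `max_steps: 1985`. The arithmetic behind that number, and the shape of a linear-warmup plus cosine-decay schedule with the settings above (`lr=0.025`, `warmup_ratio=0.10`, `min_lr=1e-9`), can be sketched as follows. The step count assumes one optimizer step per batch of 32 over the 12,702 training utterances reported by the data loader and the 5 epochs configured below; `lr_at` is an approximation written for illustration, and NeMo's `CosineAnnealing` implementation may differ in boundary details.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, accumulate_grad_batches: int = 1) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    batches = math.ceil(num_samples / batch_size)
    return math.ceil(batches / accumulate_grad_batches)


def lr_at(step: int, max_steps: int, peak_lr: float, min_lr: float, warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr at max_steps."""
    warmup_steps = int(warmup_ratio * max_steps)
    if step < warmup_steps:
        return peak_lr * (step + 1) / (warmup_steps + 1)
    decay_steps = max_steps - warmup_steps
    progress = min(step - warmup_steps, decay_steps) / decay_steps  # goes from 0 to 1
    return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    spe = steps_per_epoch(12702, 32)  # 397, matching "global step 397" after epoch 0
    max_steps = 5 * spe               # 1985, matching the scheduler log
    for step in (0, 99, 198, 1000, 1985):
        print(step, lr_at(step, max_steps, peak_lr=0.025, min_lr=1e-9, warmup_ratio=0.10))
```

With `warmup_ratio=0.10`, roughly the first 198 steps (about half of the first epoch) are warmup, after which the learning rate decays toward `min_lr` by the end of training.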

In [26]:
use_cer = False
log_prediction = False

model.wer.use_cer = use_cer
model.wer.log_prediction = log_prediction

In [27]:
import torch
import pytorch_lightning as ptl

if torch.cuda.is_available():
  accelerator = 'gpu'
else:
  accelerator = 'cpu'

EPOCHS = 5  # took about 28 minutes of wall time on the GPU used for this run

trainer = ptl.Trainer(devices=1,
                      accelerator=accelerator,
                      max_epochs=EPOCHS,
                      accumulate_grad_batches=1,
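                      enable_checkpointing=False,  # exp_manager (next cell) adds its own checkpointing
                      logger=False,  # exp_manager (next cell) also sets up the TensorBoard logger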
                      log_every_n_steps=50,
                      check_val_every_n_epoch=1)

# Setup model with the trainer
model.set_trainer(trainer)

# finally, update the model's internal config
model.cfg = model._cfg

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs


In [28]:
from nemo.utils import exp_manager

# Environment variable generally used for multi-node multi-gpu training.
# In notebook environments, this flag is unnecessary and can cause logs of multiple training runs to overwrite each other.
os.environ.pop('NEMO_EXPM_VERSION', None)

config = exp_manager.ExpManagerConfig(
    exp_dir=f'experiments/lang-{LANGUAGE}/',
    name=f"ASR-Model-Language-{LANGUAGE}",
    checkpoint_callback_params=exp_manager.CallbackParams(
        monitor="val_wer",
        mode="min",
        always_save_nemo=True,
        save_best_model=True,
    ),
)

config = OmegaConf.structured(config)

logdir = exp_manager.exp_manager(trainer, config)

[NeMo I 2024-04-07 12:14:57 exp_manager:396] Experiments will be logged at experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57
[NeMo I 2024-04-07 12:14:57 exp_manager:856] TensorboardLogger has been set up


In [29]:
%%time
trainer.fit(model)

INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2024-04-07 12:15:05 modelPT:724] Optimizer config = AdamW (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.98]
        capturable: False
        differentiable: False
        eps: 1e-08
        foreach: None
        fused: None
        lr: 0.025
        maximize: False
        weight_decay: 0.001
    )
[NeMo I 2024-04-07 12:15:05 lr_scheduler:915] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7ab8aaf446a0>" 
    will be used during training (effective maximum steps = 1985) - 
    Parameters : 
    (warmup_steps: null
    warmup_ratio: 0.1
    min_lr: 1.0e-09
    max_steps: 1985
    )


INFO:pytorch_lightning.callbacks.model_summary:
  | Name              | Type                              | Params
------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0     
1 | encoder           | ConformerEncoder                  | 115 M 
2 | spec_augmentation | SpectrogramAugmentation           | 0     
3 | wer               | WER                               | 0     
4 | decoder           | ConvASRDecoder                    | 104 K 
5 | loss              | CTCLoss                           | 0     
------------------------------------------------------------------------
122 K     Trainable params
115 M     Non-trainable params
115 M     Total params
460.715   Total estimated model params size (MB)



[NeMo I 2024-04-07 12:15:27 preemption:56] Preemption requires torch distributed to be initialized, disabling preemption



INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 397: 'val_wer' reached 0.77138 (best 0.77138), saving model to '/content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.7714-epoch=0.ckpt' as top 3


[NeMo I 2024-04-07 12:20:54 nemo_model_checkpoint:217] New best .nemo model saved to: /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM.nemo



INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 794: 'val_wer' reached 0.69065 (best 0.69065), saving model to '/content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.6906-epoch=1.ckpt' as top 3


[NeMo I 2024-04-07 12:26:33 nemo_model_checkpoint:217] New best .nemo model saved to: /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM.nemo



INFO:pytorch_lightning.utilities.rank_zero:Epoch 2, global step 1191: 'val_wer' reached 0.66601 (best 0.66601), saving model to '/content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.6660-epoch=2.ckpt' as top 3


[NeMo I 2024-04-07 12:32:01 nemo_model_checkpoint:217] New best .nemo model saved to: /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM.nemo



INFO:pytorch_lightning.utilities.rank_zero:Epoch 3, global step 1588: 'val_wer' reached 0.62019 (best 0.62019), saving model to '/content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.6202-epoch=3.ckpt' as top 3


[NeMo I 2024-04-07 12:37:39 nemo_model_checkpoint:217] New best .nemo model saved to: /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM.nemo



INFO:pytorch_lightning.utilities.rank_zero:Epoch 4, global step 1985: 'val_wer' reached 0.61735 (best 0.61735), saving model to '/content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.6174-epoch=4.ckpt' as top 3


[NeMo I 2024-04-07 12:43:12 nemo_model_checkpoint:217] New best .nemo model saved to: /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM.nemo


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.
INFO:pytorch_lightning.utilities.rank_zero:Restoring states from the checkpoint path at /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.6174-epoch=4.ckpt
INFO:pytorch_lightning.utilities.rank_zero:Restored all states from the checkpoint at /content/experiments/lang-hy-AM/ASR-Model-Language-hy-AM/2024-04-07_12-14-57/checkpoints/ASR-Model-Language-hy-AM--val_wer=0.6174-epoch=4.ckpt


CPU times: user 21min 23s, sys: 1min 2s, total: 22min 26s
Wall time: 28min 28s


Evaluate the model on the test split, then save it and download the `.nemo` file.

In [30]:
trainer.test(model)

INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



[{'global_step': 1985.0,
  'test_loss': 0.9107210040092468,
  'test_wer': 0.6173529028892517}]

In [31]:
save_path = f"Model-{LANGUAGE}.nemo"
model.save_to(save_path)
print(f"Model saved at path : {os.path.abspath(save_path)}")

Model saved at path : /content/Model-hy-AM.nemo
