# Training an Acoustic Model with Subword Tokenization

In [1]:
# Install dependencies
!pip install wget
!apt-get install sox libsndfile1 ffmpeg
!pip install unidecode
!pip install "matplotlib>=3.3.2"

## Install NeMo
BRANCH = 'main'
!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]

## Grab the config we'll use in this example
!mkdir configs
!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/asr/conf/citrinet/config_bpe.yaml

"""
Remember to restart the runtime for the kernel to pick up any upgraded packages (e.g. matplotlib)!
Alternatively, you can uncomment the exit() below to crash and restart the kernel, in the case
that you want to use the "Run All Cells" (or similar) option.
"""
# exit()

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Reading package lists... Done
Building dependency tree       
Reading state information... Done
ffmpeg is already the newest version (7:4.2.4-1ubuntu0.1).
libsndfile1 is already the newest version (1.0.28-7ubuntu0.1).
sox is already the newest version (14.4.2+git20190427-2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nemo_toolkit[all]
  Cloning https://github.com/NVIDIA/NeMo.git (to revision main) to /tmp/pip-install-3hdq52c5/nemo-toolkit_041a7e8147f4492ea4407389b985771b
  Running command git clone -q https://github.com/NVIDIA/NeMo.git /tmp/pip-install-3hdq52c5/nemo-toolkit_041a7e8147f4492ea4407389b985771b
  Resolved https://github.com/NVIDIA/NeMo.git to commit 7a9a8f012729f481b6ef5d6aabddce4a891124eb
mkdir: cannot create directory ‘configs


Let's begin constructing an ASR model that will use the subword tokenizer for its dataset pre-processing and post-processing steps.

We will use a Citrinet model to demonstrate the usage of subword tokenization models for training and inference. Citrinet is a [QuartzNet-like architecture](https://arxiv.org/abs/1910.10261), but it uses subword-tokenization along with 8x subsampling and [Squeeze-and-Excitation](https://arxiv.org/abs/1709.01507) to achieve strong accuracy in transcriptions while still using non-autoregressive decoding for efficient inference.
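To make "non-autoregressive decoding" concrete, here is a minimal sketch of greedy CTC decoding in plain Python: the most likely token id per frame is taken independently, consecutive repeats are merged, and the blank symbol is dropped. The function name, the toy frame ids, and the choice of `3` as the blank id are all illustrative assumptions and not NeMo's API.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id):
    """Collapse per-frame argmax ids: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Toy example: id 3 plays the role of the CTC blank (an assumption for this sketch).
frames = [3, 0, 0, 3, 1, 1, 1, 3, 3, 1, 2, 2]
print(ctc_greedy_collapse(frames, blank_id=3))  # [0, 1, 1, 2]
```

Because every frame is decoded on its own, the whole utterance can be processed in one pass. The blank between the two runs of `1` is what lets the same token appear twice in a row in the output.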

We'll be using the **Neural Modules (NeMo) toolkit** for this part, so if you haven't already, you should download and install NeMo and its dependencies. To do so, just follow the directions on the [GitHub page](https://github.com/NVIDIA/NeMo), or in the [documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/).

NeMo lets us easily hook together the components (modules) of our model, such as the data layer, intermediate layers, and various losses, without worrying too much about implementation details of individual parts or connections between modules. NeMo also comes with complete models which only require your data and hyperparameters for training.

In [2]:
# NeMo's "core" package
import nemo
print(nemo.__version__)
# NeMo's ASR collection - this collection contains complete ASR models and
# building blocks (modules) for ASR
import nemo.collections.asr as nemo_asr

1.10.0rc0


################################################################################
###          (please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)
###          (or run as: KALDI_ROOT=<your_path> python <your_script>.py)
################################################################################



## Cross-Language Transfer Learning

Transfer learning is an important machine learning technique that uses a model’s knowledge of one task to perform better on another. Fine-tuning is one of the techniques to perform transfer learning. It is an essential part of the recipe for many state-of-the-art results where a base model is first pretrained on a task with abundant training data and then fine-tuned on different tasks of interest where the training data is less abundant or even scarce.

In ASR you might want to do fine-tuning in multiple scenarios, for example, when you want to improve your model's performance on a particular domain (medical, financial, etc.) or accented speech. You can even transfer learn from one language to another! Check out [this paper](https://arxiv.org/abs/2005.04290) for examples.

Transfer learning with NeMo is simple.

-----
First, let's load a pretrained English Conformer-CTC model. Its SentencePiece tokenizer has only 128 tokens; we will then replace it with a larger, 1024-token SentencePiece BPE tokenizer built for the target language.

In [3]:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="stt_en_conformer_ctc_large")

[NeMo I 2022-05-31 14:19:12 cloud:56] Found existing object /root/.cache/torch/NeMo/NeMo_1.10.0rc0/stt_en_conformer_ctc_large/010120d9959425c7862c9843960b3235/stt_en_conformer_ctc_large.nemo.
[NeMo I 2022-05-31 14:19:12 cloud:62] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.10.0rc0/stt_en_conformer_ctc_large/010120d9959425c7862c9843960b3235/stt_en_conformer_ctc_large.nemo
[NeMo I 2022-05-31 14:19:12 common:789] Instantiating model from pre-trained checkpoint
[NeMo I 2022-05-31 14:19:15 mixins:166] Tokenizer SentencePieceTokenizer initialized with 128 tokens


[NeMo W 2022-05-31 14:19:15 modelPT:148] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: /data/NeMo_ASR_SET/English/v2.0/train/tarred_audio_manifest.json
    sample_rate: 16000
    batch_size: 32
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 20.0
    min_duration: 0.1
    shuffle_n: 2048
    is_tarred: true
    tarred_audio_filepaths: /data/NeMo_ASR_SET/English/v2.0/train/audio__OP_0..4095_CL_.tar
    
[NeMo W 2022-05-31 14:19:15 modelPT:155] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath:
    - /data/ASR/LibriSpeech/librispeech_withs

[NeMo I 2022-05-31 14:19:15 features:200] PADDING: 0
[NeMo I 2022-05-31 14:19:19 save_restore_connector:243] Model EncDecCTCModelBPE was successfully restored from /root/.cache/torch/NeMo/NeMo_1.10.0rc0/stt_en_conformer_ctc_large/010120d9959425c7862c9843960b3235/stt_en_conformer_ctc_large.nemo.


In [4]:
# Check what kind of vocabulary/alphabet the model has right now
print(asr_model.decoder.vocabulary)

['<unk>', 's', '▁', 'e', 't', 'u', 'd', 'a', 'o', 'n', 'i', '▁the', '▁a', 'm', 'y', 'l', 'h', 'p', 're', '▁s', 'g', 'r', '▁to', '▁i', 'ing', '▁and', 'f', '▁p', 'an', 'c', 'w', 'er', 'ed', '▁of', '▁in', 'k', "'", '▁w', 'ar', 'or', '▁f', 'b', '▁b', 'en', '▁you', 'al', 'le', 'in', 'll', '▁that', '▁he', 'ro', '▁t', 'es', '▁it', '▁be', 've', 'v', 'ly', '▁c', 'th', '▁o', 'ent', 'ch', 'ur', '▁we', '▁re', '▁n', 'it', '▁so', '▁co', '▁g', '▁on', '▁for', 'on', 'ce', 'ri', '▁do', '▁is', '▁ha', '▁ma', 'ver', 'li', 'ra', '▁was', 'ic', 'la', '▁e', 'se', 'ter', 'ct', 'ion', '▁ca', '▁st', '▁me', 'ir', '▁mo', '▁with', '▁but', '▁have', '▁go', '▁de', '▁ho', '▁di', '▁not', '▁know', '▁lo', '▁this', 'ation', 'ther', 'ate', '▁com', '▁like', '▁uh', 'ck', '▁his', 'j', '▁yeah', '▁my', '▁ex', '▁what', '▁will', '▁mi', 'q', 'ight', 'x', 'z', '-']


In [5]:
len(asr_model.decoder.vocabulary)

128
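The `▁` character in the vocabulary above is the SentencePiece word-boundary marker: a piece that starts with it begins a new word, and all other pieces attach to the previous one. The helper below is a small sketch of how a sequence of pieces maps back to text. The function name and the example sequence are made up for illustration, although each piece is taken from the printed vocabulary.

```python
WORD_START = "\u2581"  # the "▁" marker that SentencePiece puts at the start of a word


def pieces_to_text(pieces):
    """Join subword pieces and turn word-start markers into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()


# Every piece below appears in the 128-token vocabulary printed above.
pieces = ["▁know", "ing", "▁the", "▁w", "a", "y"]
print(pieces_to_text(pieces))  # knowing the way
```

With only 128 pieces, rarer words such as "way" have to be spelled out from short fragments, which is one reason a larger vocabulary can be attractive.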

Now let's update the vocabulary in this model

In [6]:
# Let's change the tokenizer vocabulary by passing the path to the new tokenizer directory,
# together with the tokenizer type
asr_model.change_vocabulary(
    new_tokenizer_dir="../data_preparation/data/processed/tokenizer/tokenizer_spe_bpe_v1024/",
    new_tokenizer_type="bpe"
)

[NeMo W 2022-05-31 14:19:19 modelPT:215] You tried to register an artifact under config key=tokenizer.model_path but an artifact for it has already been registered.
[NeMo W 2022-05-31 14:19:19 modelPT:215] You tried to register an artifact under config key=tokenizer.vocab_path but an artifact for it has already been registered.
[NeMo W 2022-05-31 14:19:19 modelPT:215] You tried to register an artifact under config key=tokenizer.spe_tokenizer_vocab but an artifact for it has already been registered.


[NeMo I 2022-05-31 14:19:19 mixins:166] Tokenizer SentencePieceTokenizer initialized with 1024 tokens
[NeMo I 2022-05-31 14:19:19 ctc_bpe_models:244] 
    Replacing old number of classes (128) with new number of classes - 1024
[NeMo I 2022-05-31 14:19:19 ctc_bpe_models:273] Changed tokenizer to ['<unk>', 'en', 'er', 'ch', '▁d', 'ei', 'ie', '▁s', 'un', '▁a', '▁w', '▁i', 'st', 'ein', 'ge', '▁die', 'ich', '▁b', '▁m', 'an', '▁un', 'te', '▁v', 'sch', '▁h', '▁da', 'es', '▁n', 'on', '▁z', '▁k', '▁f', '▁der', 'in', '▁ein', '▁au', 'gen', '▁und', 'it', 'll', 'or', 'ur', '▁in', 'ar', 'ss', 'at', '▁ge', 'ir', 'hr', 'ung', '▁er', 'ten', '▁g', 'em', 'den', 'al', '▁zu', 'au', '▁l', 'der', '▁p', 'icht', 'de', '▁wir', '▁r', '▁ver', 'lich', 'ter', '▁be', '▁an', '▁das', 'ig', 'ber', 'ier', 'isch', 'ür', '▁ist', '▁e', 'ach', 'ben', '▁t', 'eit', 'mm', '▁den', 'se', '▁sie', 'ion', '▁sch', '▁mit', 'tz', '▁nicht', '▁j', '▁auf', '▁es', '▁st', 'ent', 'el', 'ol', 'ra', 'um', 'ro', '▁auch', '▁ich', '▁von', 'ck', 

After this, our decoder has completely changed, but our encoder (where most of the weights are) remains intact. Let's fine-tune this model for 20 epochs on the German training set. We will also use the smaller learning rate from `new_opt`, which is defined in a cell below.

**Note**: For this demonstration, we will also freeze the encoder to speed up finetuning (since both tokenizers are built on the same train set), but in general it should not be done for proper training on a new language (or on a different corpus than the original train corpus).
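A quick back-of-the-envelope check shows how little of the model is replaced when the vocabulary changes. The sketch below assumes that the CTC decoder is a single pointwise (kernel size 1) projection with bias, from the encoder width to the vocabulary size plus one blank symbol. That is an assumption about the decoder's structure, not something this notebook prints, but it agrees with the 525 K decoder parameters reported in the trainer summary further down.

```python
def pointwise_decoder_params(feat_in, vocab_size):
    """Parameter count of a 1x1 projection to vocab_size + 1 classes (CTC blank)."""
    num_classes = vocab_size + 1  # one extra class for the CTC blank
    weights = feat_in * num_classes
    biases = num_classes
    return weights + biases


# 512 is the encoder width (d_model) read from the config in this notebook.
print(pointwise_decoder_params(512, 128))   # 66177  (original English vocabulary)
print(pointwise_decoder_params(512, 1024))  # 525825 (new 1024-token vocabulary)
```

Under this assumption the new decoder holds roughly half a million weights that start from scratch, while the 121 M encoder parameters keep their pretrained values.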

In [7]:
vars(asr_model)

{'tokenizer_cfg': {},
 'tokenizer_dir': '../data_preparation/data/processed/tokenizer/tokenizer_spe_bpe_v1024/',
 'tokenizer_type': 'bpe',
 'hf_tokenizer_kwargs': {},
 'artifacts': {'tokenizer.model_path': ArtifactItem(path='/localhome/local-vinhn/riva-german-sample-new/New-language-adaptation/German/data_preparation/data/processed/tokenizer/tokenizer_spe_bpe_v1024/tokenizer.model', path_type=<ArtifactPathType.LOCAL_PATH: 0>, hashed_path=None),
  'tokenizer.vocab_path': ArtifactItem(path='/localhome/local-vinhn/riva-german-sample-new/New-language-adaptation/German/data_preparation/data/processed/tokenizer/tokenizer_spe_bpe_v1024/vocab.txt', path_type=<ArtifactPathType.LOCAL_PATH: 0>, hashed_path=None),
  'tokenizer.spe_tokenizer_vocab': ArtifactItem(path='/localhome/local-vinhn/riva-german-sample-new/New-language-adaptation/German/data_preparation/data/processed/tokenizer/tokenizer_spe_bpe_v1024/tokenizer.vocab', path_type=<ArtifactPathType.LOCAL_PATH: 0>, hashed_path=None)},
 'model_p

In [8]:
## Grab the config we'll use in this example
BRANCH='main'
!mkdir configs
!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/asr/conf/conformer/conformer_ctc_bpe.yaml

mkdir: cannot create directory ‘configs’: File exists
--2022-05-31 14:19:20--  https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/asr/conf/conformer/conformer_ctc_bpe.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8205 (8.0K) [text/plain]
Saving to: ‘configs/conformer_ctc_bpe.yaml.2’


2022-05-31 14:19:20 (51.9 MB/s) - ‘configs/conformer_ctc_bpe.yaml.2’ saved [8205/8205]



In [9]:
import os
from hydra import initialize, initialize_config_module, initialize_config_dir, compose
from omegaconf import OmegaConf

with initialize(config_path="./configs/"):
    cfg = compose(config_name="conformer_ctc_bpe.yaml")
    print(cfg)
    

{'name': 'Conformer-CTC-BPE', 'model': {'sample_rate': 16000, 'log_prediction': True, 'ctc_reduction': 'mean_batch', 'skip_nan_grad': False, 'train_ds': {'manifest_filepath': '???', 'sample_rate': '${model.sample_rate}', 'batch_size': 16, 'shuffle': True, 'num_workers': 8, 'pin_memory': True, 'use_start_end_token': False, 'trim_silence': False, 'max_duration': 16.7, 'min_duration': 0.1, 'is_tarred': False, 'tarred_audio_filepaths': None, 'shuffle_n': 2048, 'bucketing_strategy': 'synced_randomized', 'bucketing_batch_size': None}, 'validation_ds': {'manifest_filepath': '???', 'sample_rate': '${model.sample_rate}', 'batch_size': 16, 'shuffle': False, 'num_workers': 8, 'pin_memory': True, 'use_start_end_token': False}, 'test_ds': {'manifest_filepath': None, 'sample_rate': '${model.sample_rate}', 'batch_size': 16, 'shuffle': False, 'num_workers': 8, 'pin_memory': True, 'use_start_end_token': False}, 'tokenizer': {'dir': '???', 'type': 'bpe'}, 'preprocessor': {'_target_': 'nemo.collections.a

In [10]:
print(OmegaConf.to_yaml(cfg))

name: Conformer-CTC-BPE
model:
  sample_rate: 16000
  log_prediction: true
  ctc_reduction: mean_batch
  skip_nan_grad: false
  train_ds:
    manifest_filepath: ???
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 16.7
    min_duration: 0.1
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    bucketing_strategy: synced_randomized
    bucketing_batch_size: null
  validation_ds:
    manifest_filepath: ???
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
  test_ds:
    manifest_filepath: null
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
  tokenizer:
    dir: ???
    type: bpe
  preprocessor:
    _target_: nemo.collections.asr.modul

In [11]:
cfg['model']['train_ds']

{'manifest_filepath': '???', 'sample_rate': '${model.sample_rate}', 'batch_size': 16, 'shuffle': True, 'num_workers': 8, 'pin_memory': True, 'use_start_end_token': False, 'trim_silence': False, 'max_duration': 16.7, 'min_duration': 0.1, 'is_tarred': False, 'tarred_audio_filepaths': None, 'shuffle_n': 2048, 'bucketing_strategy': 'synced_randomized', 'bucketing_batch_size': None}

In [12]:
sample_rate = cfg.model.sample_rate
print(f"The sample_rate = {sample_rate}")

ds_sample_rate = cfg.model.train_ds.sample_rate
print(f"The ds_sample_rate = {ds_sample_rate}")


The sample_rate = 16000
The ds_sample_rate = 16000
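Both values print as 16000 because `train_ds.sample_rate` is stored as the reference `${model.sample_rate}` and is resolved on access. The toy resolver below only illustrates the idea in plain Python. It is not how OmegaConf is implemented, and the names `lookup`, `resolve`, and `toy_cfg` are hypothetical.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def lookup(cfg, dotted_key):
    """Walk a nested dict using a dotted key such as 'model.sample_rate'."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(cfg, value):
    """Resolve a value that is exactly one ${a.b.c} reference; return others as is."""
    if isinstance(value, str):
        match = _REFERENCE.fullmatch(value)
        if match:
            return resolve(cfg, lookup(cfg, match.group(1)))
    return value


toy_cfg = {
    "model": {
        "sample_rate": 16000,
        "train_ds": {"sample_rate": "${model.sample_rate}"},
    }
}
print(resolve(toy_cfg, toy_cfg["model"]["train_ds"]["sample_rate"]))  # 16000
```

The practical benefit is that the sample rate is defined once; the dataset sections follow it automatically unless they are overwritten with a literal value.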


In [13]:
params = cfg

import copy
new_opt = copy.deepcopy(params.model.optim)
new_opt.lr = 0.1

In [78]:
# Update paths to dataset
params.model.train_ds.manifest_filepath = '../data_preparation/data/processed/train_manifest_merged.json'
params.model.train_ds.sample_rate = 16000
params.model.train_ds.batch_size = 1

params.model.validation_ds.manifest_filepath = ['../data_preparation/data/processed/test_manifest_merged.json', '../data_preparation/data/processed/dev_manifest_merged.json']
params.model.validation_ds.sample_rate = 16000
params.model.validation_ds.batch_size = 1

In [79]:
params['model']['train_ds']

{'manifest_filepath': '../data_preparation/data/processed/train_manifest_merged.json', 'sample_rate': 16000, 'batch_size': 1, 'shuffle': True, 'num_workers': 8, 'pin_memory': True, 'use_start_end_token': False, 'trim_silence': False, 'max_duration': 16.7, 'min_duration': 0.1, 'is_tarred': False, 'tarred_audio_filepaths': None, 'shuffle_n': 2048, 'bucketing_strategy': 'synced_randomized', 'bucketing_batch_size': None}

In [80]:
params.model.encoder.d_model

512

In [81]:
# Use the smaller learning rate we set before
asr_model.setup_optimization(optim_config=new_opt)

[NeMo W 2022-05-31 14:55:26 modelPT:478] Trainer wasn't specified in model constructor. Make sure that you really wanted it.


[NeMo I 2022-05-31 14:55:26 modelPT:579] Optimizer config = AdamW (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.98]
        eps: 1e-08
        lr: 0.1
        weight_decay: 0.001
    )


[NeMo W 2022-05-31 14:55:26 lr_scheduler:816] Neither `max_steps` nor `iters_per_batch` were provided to `optim.sched`, cannot compute effective `max_steps` !
    Scheduler will not be instantiated !


(AdamW (
 Parameter Group 0
     amsgrad: False
     betas: [0.9, 0.98]
     eps: 1e-08
     lr: 0.1
     weight_decay: 0.001
 ),
 None)

In [82]:
asr_model._cfg.optim.sched.d_model = params.model.encoder.d_model
asr_model._cfg.optim.sched.d_model

512
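The scheduler section has a `d_model` field, which suggests a Noam-style schedule (linear warmup followed by inverse-square-root decay, scaled by the model width). The printed config is cut off before the optimizer section, so this is an assumption, and the `warmup_steps` value below is purely hypothetical. Note also that the log above reports that no scheduler is instantiated in this run because `max_steps` is missing. The sketch only shows what the `d_model` value would feed into.

```python
def noam_lr(step, base_lr, d_model, warmup_steps):
    """Noam-style learning rate: linear warmup, then inverse-square-root decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    scale = d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * scale


# base_lr=0.1 matches new_opt.lr; warmup_steps=10000 is a hypothetical value.
for step in (1, 1000, 10000, 40000):
    print(step, noam_lr(step, base_lr=0.1, d_model=512, warmup_steps=10000))
```

The rate peaks at `step == warmup_steps`, where it equals `base_lr * (d_model * warmup_steps) ** -0.5`, so the effective learning rate is far below the nominal `lr` whenever such a schedule is active.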

In [86]:
params.model.optim.sched.d_model = params.model.encoder.d_model
print(params.model.optim.sched.d_model)

# Point to the data we'll use for fine-tuning as the training set
asr_model._cfg.train_ds.sample_rate = 16000
asr_model._cfg.validation_ds.sample_rate = 16000
asr_model.setup_training_data(train_data_config=params.model.train_ds)

512
[NeMo I 2022-05-31 14:56:08 collections:192] Dataset loaded with 309291 files totalling 569.59 hours
[NeMo I 2022-05-31 14:56:08 collections:193] 11862 files were filtered totalling 60.25 hours
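The "loaded" and "filtered" counts in this log come from the `min_duration` and `max_duration` settings in the dataset config (0.1 s and 16.7 s here). A NeMo manifest is a text file with one JSON object per line containing `audio_filepath`, `duration`, and `text`. The sketch below mimics that bookkeeping on a few invented entries; the function name and the sample lines are hypothetical, and the inclusive bounds are an assumption about the exact comparison.

```python
import json


def summarize_manifest(lines, min_duration, max_duration):
    """Split manifest entries into kept/filtered by duration and total their hours."""
    stats = {"kept": 0, "kept_hours": 0.0, "filtered": 0, "filtered_hours": 0.0}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        duration = float(json.loads(line)["duration"])
        bucket = "kept" if min_duration <= duration <= max_duration else "filtered"
        stats[bucket] += 1
        stats[bucket + "_hours"] += duration / 3600.0
    return stats


# Invented entries in the one-JSON-object-per-line manifest layout.
toy_manifest = [
    json.dumps({"audio_filepath": "clip_a.wav", "duration": 3.2, "text": "guten morgen"}),
    json.dumps({"audio_filepath": "clip_b.wav", "duration": 18.0, "text": "ein langer satz"}),
    json.dumps({"audio_filepath": "clip_c.wav", "duration": 0.05, "text": "ja"}),
    json.dumps({"audio_filepath": "clip_d.wav", "duration": 7.5, "text": "vielen dank"}),
]
print(summarize_manifest(toy_manifest, min_duration=0.1, max_duration=16.7))
```

Running a check like this before training is a cheap way to see how much audio a given `max_duration` discards.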


In [87]:
# Point to the new validation data for fine-tuning
asr_model.setup_validation_data(val_data_config=params.model.validation_ds)

# Freeze the encoder layers (should not be done for finetuning, only done for demo)
asr_model.encoder.freeze()

[NeMo I 2022-05-31 14:56:10 collections:192] Dataset loaded with 41167 files totalling 85.44 hours
[NeMo I 2022-05-31 14:56:10 collections:193] 0 files were filtered totalling 0.00 hours


In [85]:
!ln -s ../data_preparation/data data

In [88]:
# And now we can create a PyTorch Lightning trainer and call `fit` again.
import pytorch_lightning as pl

trainer = pl.Trainer(devices=1, accelerator='gpu', max_epochs=20, log_every_n_steps = 100000)
trainer.fit(asr_model)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2022-05-31 14:56:13 modelPT:478] Trainer wasn't specified in model constructor. Make sure that you really wanted it.


[NeMo I 2022-05-31 14:56:13 modelPT:579] Optimizer config = AdamW (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.98]
        eps: 1e-08
        lr: 0.1
        weight_decay: 0.001
    )


[NeMo W 2022-05-31 14:56:13 lr_scheduler:816] Neither `max_steps` nor `iters_per_batch` were provided to `optim.sched`, cannot compute effective `max_steps` !
    Scheduler will not be instantiated !

  | Name              | Type                              | Params
------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0     
1 | encoder           | ConformerEncoder                  | 121 M 
2 | spec_augmentation | SpectrogramAugmentation           | 0     
3 | _wer              | WERBPE                            | 0     
4 | decoder           | ConvASRDecoder                    | 525 K 
5 | loss              | CTCLoss                           | 0     
------------------------------------------------------------------------
525 K     Trainable params
121 M     Non-trainable params
121 M     Total params
487.844   Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

[NeMo I 2022-05-31 14:56:15 wer_bpe:212] 
    
[NeMo I 2022-05-31 14:56:15 wer_bpe:213] reference:denken sie soeben weilten meine gedanken bei ihnen in adelaide und ich wünschte mir sie herzaubern zu können nun der zauber ist gelungen lachte münchhausen da bin ich und was mich herführt
[NeMo I 2022-05-31 14:56:15 wer_bpe:214] predicted:weu ichht bei war iche von vonuku wurdebeüen in warö war. niewwewerwewe war immerö rath warout
[NeMo I 2022-05-31 14:56:15 wer_bpe:212] 
    
[NeMo I 2022-05-31 14:56:15 wer_bpe:213] reference:also bei ihren technischen kenntnissen und ihrer erfindungsgabe auf diesem gebiet glaubt der lord keinen besseren ingenieur und kapitän für sein weltschiff finden zu können als sie
[NeMo I 2022-05-31 14:56:15 wer_bpe:214] predicted:ich dabei vonö tünkdenünken arierenannüwehünwekokenlweüwebeweal nie sein dabeiieren keine dennannierenü an beihkße vonweheit arwecheisweüö in in jetztwefer seinferweferününöweundenfer valierenöüö ichfer.


Training: 0it [00:00, ?it/s]

[NeMo I 2022-05-31 14:56:17 wer_bpe:212] 
    
[NeMo I 2022-05-31 14:56:17 wer_bpe:213] reference:diese versammlung trifft sich regelmäßig im abstand von einem bis drei monate.
[NeMo I 2022-05-31 14:56:17 wer_bpe:214] predicted:die soeogandw le tbewe dieseweie hoischeie landenmonatachen beg
[NeMo I 2022-05-31 14:56:17 wer_bpe:212] 
    
[NeMo I 2022-05-31 14:56:17 wer_bpe:213] reference:meistens stellt es asiatische motive oder ein tier dar.
[NeMo I 2022-05-31 14:56:17 wer_bpe:214] predicted:verwe verst tie vonandieren von.
[NeMo I 2022-05-31 14:56:17 wer_bpe:212] 
    
[NeMo I 2022-05-31 14:56:17 wer_bpe:213] reference:parallel der strecke entstanden zahlreiche grubenbahnen, die die strecke kreuzten.
[NeMo I 2022-05-31 14:56:17 wer_bpe:214] predicted:ver. weiterck ver tigbe war regel tmmck war..
[NeMo I 2022-05-31 14:56:17 wer_bpe:212] 
    
[NeMo I 2022-05-31 14:56:17 wer_bpe:213] reference:angeblich ist das aktuelle defizit dadurch von vier hundert sechs und vierzig millionen auf vi

So we get fast convergence even though the decoder vocabulary is eight times the size (1024 tokens instead of 128) and we freeze the encoder.
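Convergence is usually judged by word error rate (WER) computed over reference/predicted pairs like the ones in the `wer_bpe` log lines above. The function below is a plain-Python sketch of the standard definition (word-level edit distance divided by the number of reference words). It is not NeMo's own implementation, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / max(len(ref), 1)


# A pair copied from the log above, produced before any fine-tuning step.
reference = "meistens stellt es asiatische motive oder ein tier dar."
predicted = "verwe verst tie vonandieren von."
print(word_error_rate(reference, predicted))  # 1.0
```

A WER of 1.0 on the first batches is expected here, because the freshly initialized decoder has not yet learned the new vocabulary.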

### Fast Training

Last but not least, we could simply speed up training our model! If you have the resources, you can speed up training by splitting the workload across multiple GPUs. Otherwise (or in addition), there's always mixed precision training, which allows you to increase your batch size.

You can use [PyTorch Lightning's Trainer object](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html?highlight=Trainer) to handle mixed-precision and distributed training for you. Below are some examples of flags you would pass to the `Trainer` to use these features:

```python
# Mixed precision:
trainer = pl.Trainer(devices=1, accelerator='gpu', precision=16)

# Trainer with a distributed backend:
trainer = pl.Trainer(devices=2, num_nodes=2, accelerator='gpu', strategy='ddp')

# Of course, you can combine these flags as well.
```
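When training is distributed with a DDP-style strategy, every process draws its own batches, so the effective batch size grows with the number of devices and nodes. The helper below is a small sketch of that arithmetic; the function names are illustrative, and the figures reuse this notebook's `batch_size = 1` and the 309291 training files reported in the log.

```python
import math


def effective_batch_size(per_device_batch, devices=1, num_nodes=1, accumulate_grad_batches=1):
    """Samples contributing to one optimizer step under DDP-style data parallelism."""
    return per_device_batch * devices * num_nodes * accumulate_grad_batches


def optimizer_steps_per_epoch(num_samples, batch):
    """Optimizer steps needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch)


single = effective_batch_size(1)
multi = effective_batch_size(1, devices=2, num_nodes=2)
print(single, optimizer_steps_per_epoch(309291, single))  # 1 309291
print(multi, optimizer_steps_per_epoch(309291, multi))    # 4 77323
```

Since the effective batch size changes with the hardware layout, the learning rate and any warmup length usually need to be revisited when scaling out.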

Finally, have a look at the [example scripts in the NeMo repository](https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py), which can handle mixed precision and distributed training using command-line arguments.