# Automatic Speech Recognition with Transducer Models

This notebook is a basic tutorial on creating a Transducer ASR model and training it on a small dataset (AN4). It discusses how to reduce memory usage when training such models and demonstrates how to change the decoding strategy after training. Finally, it gives a brief look at extracting alignment information from a trained Transducer model.

As we will see in this tutorial, apart from the config and the class used to instantiate the model, nearly every step is the same as in CTC-based model training. Concepts such as data loader setup, optimizer setup, and loading pre-trained checkpoint weights are nearly identical between CTC and Transducer models.

In essence, NeMo makes it seamless to take a config for a CTC ASR model, add a few Transducer-specific components (often without any modifications), and use a different class to instantiate a Transducer model!

--------

**Note**: This tutorial assumes that you have worked through the previous tutorial, "Intro-to-Transducers", and are familiar with the config components of Transducer models.


# Preparing the dataset

In this tutorial, we will use the `AN4` dataset, also known as the Alphanumeric dataset, which was collected and published by Carnegie Mellon University. It consists of recordings of people spelling out addresses, names, telephone numbers, etc., one letter or number at a time, along with their corresponding transcripts. We chose AN4 for this tutorial because it is relatively small, with 948 training and 130 test utterances, and so it trains quickly.

Let's first download the dataset and convert its audio from the NIST Sphere (`.sph`) format to `.wav` using `sox`:

In [1]:
import os
import wget
import tarfile
import subprocess
import glob

data_dir = "datasets"
os.makedirs(data_dir, exist_ok=True)

# Download the dataset. This will take a few moments...
print("******")
an4_path = os.path.join(data_dir, 'an4_sphere.tar.gz')
if not os.path.exists(an4_path):
    an4_url = 'https://dldata-public.s3.us-east-2.amazonaws.com/an4_sphere.tar.gz'
    an4_path = wget.download(an4_url, data_dir)
    print(f"Dataset downloaded at: {an4_path}")
else:
    print("Tarfile already exists.")

if not os.path.exists(os.path.join(data_dir, 'an4')):
    # Untar and convert .sph to .wav (using sox)
    with tarfile.open(an4_path) as tar:
        tar.extractall(path=data_dir)

    print("Converting .sph to .wav...")
    sph_list = glob.glob(os.path.join(data_dir, 'an4', '**', '*.sph'), recursive=True)
    for sph_path in sph_list:
        wav_path = sph_path[:-4] + '.wav'
        # check=True raises if sox fails instead of silently skipping the file
        subprocess.run(["sox", sph_path, wav_path], check=True)

print("Finished conversion.\n******")

******
Tarfile already exists.
Finished conversion.
******
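The conversion loop runs only when `datasets/an4/` does not exist yet, and `subprocess.run` does not raise on a failed `sox` call unless it is called with `check=True`, so an interrupted or partially failed run can leave `.sph` files without a `.wav` counterpart. A quick sanity check might look like the following sketch; `find_unconverted` is a hypothetical helper, not part of NeMo.

```python
import glob
import os


def find_unconverted(root):
    """Return .sph files under `root` that have no sibling .wav file (hypothetical helper)."""
    missing = []
    for sph_path in glob.glob(os.path.join(root, "**", "*.sph"), recursive=True):
        # The conversion cell writes each .wav next to its .sph with the same stem.
        wav_path = os.path.splitext(sph_path)[0] + ".wav"
        if not os.path.exists(wav_path):
            missing.append(sph_path)
    return sorted(missing)


# Usage with the directory layout used in this notebook:
# print(find_unconverted("datasets/an4"))
```

An empty list means every audio file was converted.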


In [2]:
# --- Building Manifest Files --- #
import json
import librosa

# Function to build a manifest
def build_manifest(transcripts_path, manifest_path, wav_path):
    with open(transcripts_path, 'r') as fin:
        with open(manifest_path, 'w') as fout:
            for line in fin:
                # Lines look like this:
                # <s> transcript </s> (fileID)
                transcript = line[: line.find('(')-1].lower()
                transcript = transcript.replace('<s>', '').replace('</s>', '')
                transcript = transcript.strip()

                file_id = line[line.find('(')+1 : -2]  # e.g. "cen4-fash-b"
                audio_path = os.path.join(
                    data_dir, wav_path,
                    file_id[file_id.find('-')+1 : file_id.rfind('-')],
                    file_id + '.wav')

                duration = librosa.core.get_duration(path=audio_path)

                # Write the metadata to the manifest
                metadata = {
                    "audio_filepath": audio_path,
                    "duration": duration,
                    "text": transcript
                }
                json.dump(metadata, fout)
                fout.write('\n')

# Building Manifests
print("******")
train_transcripts = os.path.join(data_dir, 'an4/etc/an4_train.transcription')
train_manifest = os.path.join(data_dir, 'an4/train_manifest.json')
if not os.path.isfile(train_manifest):
    build_manifest(train_transcripts, train_manifest, 'an4/wav/an4_clstk')
    print("Training manifest created.")

test_transcripts = os.path.join(data_dir, 'an4/etc/an4_test.transcription')
test_manifest = os.path.join(data_dir, 'an4/test_manifest.json')
if not os.path.isfile(test_manifest):
    build_manifest(test_transcripts, test_manifest, 'an4/wav/an4test_clstk')
    print("Test manifest created.")
print("***Done***")
# Manifest filepaths
TRAIN_MANIFEST = train_manifest
TEST_MANIFEST = test_manifest

******
***Done***
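The slicing in `build_manifest` relies on the fixed layout of AN4 transcription lines, `<s> transcript </s> (fileID)`, where the middle field of the file ID names the speaker directory. The sketch below pulls that logic into a pure function so it can be checked on a single line without any audio. `parse_an4_line` is a hypothetical helper; unlike the cell above, it locates the closing parenthesis with `rfind` instead of the fixed `-2` offset, which assumes every line (including the last) ends with `)\n`.

```python
import os


def parse_an4_line(line, data_dir, wav_subdir):
    """Split one AN4 transcription line into (transcript, audio_path).

    Lines look like "<s> some words </s> (fileID)", where fileID has the
    form "<utterance>-<speaker>-<suffix>", e.g. "cen4-fash-b".
    """
    line = line.strip()
    open_paren = line.rfind("(")
    # Lowercase first (as build_manifest does), then drop the sentence markers.
    transcript = line[:open_paren].lower().replace("<s>", "").replace("</s>", "").strip()
    file_id = line[open_paren + 1 : line.rfind(")")]
    # The middle field of the fileID is the speaker directory.
    speaker = file_id[file_id.find("-") + 1 : file_id.rfind("-")]
    audio_path = os.path.join(data_dir, wav_subdir, speaker, file_id + ".wav")
    return transcript, audio_path


# Usage:
# parse_an4_line("<s> RUBOUT G M E F THREE NINE </s> (an251-fash-b)",
#                "datasets", "an4/wav/an4_clstk")
```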


## Preparing the tokenizer

Now that the dataset is ready, we need to decide between a character-based and a sub-word-based model. For completeness' sake, we will use a sub-word tokenizer so that we can leverage a modern encoder architecture such as ContextNet or Conformer-T.
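To build intuition for what the `bpe` option (`SPE_TYPE = "bpe"` below) learns, here is a toy byte-pair-encoding trainer: every word starts as a sequence of characters, and each step merges the most frequent adjacent pair into a new symbol, so the symbol inventory grows toward a target vocabulary size. This is an illustrative sketch of the classic merge procedure only, not SentencePiece's implementation (which, among other differences, trains on raw text and marks whitespace with a special `▁` symbol).

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(word) for word in words)  # word as a tuple of symbols -> count
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        # Rewrite every word, replacing occurrences of `best` with the merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe_merges(["three", "three", "three", "tree"], num_merges=2)
print(merges)
```

Frequent spellings such as "three" quickly collapse into a few sub-word units, which shortens the target sequences the Transducer has to predict.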

In [3]:
VOCAB_SIZE = 64  # can be any value above 29
TOKENIZER_TYPE = "spe"  # can be wpe or spe
SPE_TYPE = "bpe"  # can be bpe or unigram

# ------------------------------------------------------------------- #
!rm -rf tokenizers/

os.makedirs("tokenizers", exist_ok=True)

!python scripts/process_asr_text_tokenizer.py \
   --manifest=$TRAIN_MANIFEST \
   --data_root="tokenizers" \
   --tokenizer=$TOKENIZER_TYPE \
   --spe_type=$SPE_TYPE \
   --no_lower_case \
   --log \
   --vocab_size=$VOCAB_SIZE

INFO:root:Finished extracting manifest : datasets/an4/train_manifest.json
INFO:root:Finished extracting all manifests ! Number of sentences : 948
[NeMo I 2025-07-18 16:10:48 sentencepiece_tokenizer:425] Processing tokenizers/text_corpus/document.txt and store at tokenizers/tokenizer_spe_bpe_v64
sentencepiece_trainer.cc(178) LOG(INFO) Running command: --input=tokenizers/text_corpus/document.txt --model_prefix=tokenizers/tokenizer_spe_bpe_v64/tokenizer --vocab_size=64 --shuffle_input_sentence=true --hard_vocab_limit=false --model_type=bpe --character_coverage=1.0 --bos_id=-1 --eos_id=-1 --remove_extra_whitespaces=false
sentencepiece_trainer.cc(78) LOG(INFO) Starts training with : 
trainer_spec {
  input: tokenizers/text_corpus/document.txt
  input_format: 
  model_prefix: tokenizers/tokenizer_spe_bpe_v64/tokenizer
  model_type: BPE
  vocab_size: 64
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  

In [4]:
# Tokenizer path
if TOKENIZER_TYPE == 'spe':
  TOKENIZER = os.path.join("tokenizers", f"tokenizer_spe_{SPE_TYPE}_v{VOCAB_SIZE}")
  TOKENIZER_TYPE_CFG = "bpe"
else:
  TOKENIZER = os.path.join("tokenizers", f"tokenizer_wpe_v{VOCAB_SIZE}")
  TOKENIZER_TYPE_CFG = "wpe"

## Load model config

In [5]:
from omegaconf import OmegaConf, open_dict

config = OmegaConf.load("configs/contextnet_rnnt_1.yaml")
# config = OmegaConf.load("configs/fast-conformer_transducer_bpe.yaml")
config.model.train_ds.manifest_filepath = TRAIN_MANIFEST
config.model.validation_ds.manifest_filepath = TEST_MANIFEST
config.model.test_ds.manifest_filepath = TEST_MANIFEST

config.model.tokenizer.dir = TOKENIZER
config.model.tokenizer.type = TOKENIZER_TYPE_CFG

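# Remove logging of sample predictions and the LR warmup, since the dataset is small (similar to CTC models)
config.model.log_prediction = False
config.model.optim.sched.warmup_steps = None

# Disable SpecAugment by setting the number of frequency and time masks to 0
config.model.spec_augment.freq_masks = 0
config.model.spec_augment.time_masks = 0

# Shrink the encoder. The active overrides (n_layers, n_heads, conv_kernel_size) assume a
# Conformer-style encoder; the model summary below reports a ConformerEncoder.
# The commented-out lines are alternative overrides, left disabled.
# config.model.encoder.jasper = config.model.encoder.jasper[:5]
# config.model.encoder.jasper[-1].filters = '${model.model_defaults.enc_hidden}'
config.model.encoder.n_layers = 6
# config.model.encoder.d_model = 176
config.model.encoder.n_heads = 1
# config.model.train_ds.max_duration = 5
config.model.encoder.conv_kernel_size = 17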

## Initialize a Transducer ASR Model

Now let us create a Transducer model. If you already have a script that creates CTC models, this is as easy as changing the model class. We will use a small model, since the training set contains less than an hour of speech (about 0.71 hours, per the dataset log below).
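Much of the memory pressure when training Transducers comes from the joint network, whose output is a 4-D tensor of logits with shape `[B, T, U + 1, V + 1]`: batch size, encoder frames, target length plus one, and vocabulary size plus the blank symbol. The back-of-the-envelope estimate below is a sketch under stated assumptions: `rnnt_joint_bytes` is a hypothetical helper, the 100 frames per second (10 ms hop) and 4x subsampling defaults are illustrative, and you should read the real values from your preprocessor and encoder config. It counts only the forward logits; the loss and backward pass need additional memory.

```python
import math


def rnnt_joint_bytes(batch, audio_seconds, max_tokens, vocab_size,
                     frames_per_second=100, subsampling=4, bytes_per_elem=4):
    """Estimate the size in bytes of the RNN-T joint logits [B, T, U + 1, V + 1].

    frames_per_second and subsampling are assumptions (10 ms hop, 4x
    subsampling by default); bytes_per_elem=4 corresponds to fp32.
    """
    # Encoder frames after subsampling (T).
    frames = math.ceil(audio_seconds * frames_per_second / subsampling)
    # U + 1 target positions (start state) and V + 1 outputs (blank).
    return batch * frames * (max_tokens + 1) * (vocab_size + 1) * bytes_per_elem


# 32 utterances of 5 s, up to 30 tokens each, VOCAB_SIZE = 64
print(rnnt_joint_bytes(32, 5, 30, 64) / 1e6, "MB")
```

With these assumptions the joint logits take about 32 MB; because every factor multiplies, longer audio, longer transcripts, larger batches, or larger vocabularies grow this quickly, which is why small batches and small vocabularies help on limited GPU memory.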

In [6]:
import torch
from lightning.pytorch import Trainer

# Fall back to CPU when no GPU is available
accelerator = 'gpu' if torch.cuda.is_available() else 'cpu'

EPOCHS = 10

# Initialize a Trainer for the Transducer model
trainer = Trainer(devices=1, accelerator=accelerator, max_epochs=EPOCHS,
                  enable_checkpointing=False, logger=False,
                  log_every_n_steps=5, check_val_every_n_epoch=2)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


In [7]:
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecRNNTBPEModel(cfg=config.model, trainer=trainer)
model.summarize()

[NeMo I 2025-07-18 16:10:55 mixins:181] Tokenizer SentencePieceTokenizer initialized with 64 tokens
[NeMo I 2025-07-18 16:10:55 collections:201] Dataset loaded with 948 files totalling 0.71 hours
[NeMo I 2025-07-18 16:10:55 collections:202] 0 files were filtered totalling 0.00 hours
[NeMo I 2025-07-18 16:10:55 collections:201] Dataset loaded with 130 files totalling 0.10 hours
[NeMo I 2025-07-18 16:10:55 collections:202] 0 files were filtered totalling 0.00 hours
[NeMo I 2025-07-18 16:10:55 collections:201] Dataset loaded with 130 files totalling 0.10 hours
[NeMo I 2025-07-18 16:10:55 collections:202] 0 files were filtered totalling 0.00 hours
[NeMo I 2025-07-18 16:10:55 features:305] PADDING: 16
[NeMo I 2025-07-18 16:10:56 rnnt_models:226] Using RNNT Loss : warprnnt_numba
    Loss warprnnt_numba_kwargs: {'fastemit_lambda': 0.001, 'clamp': -1.0}
[NeMo I 2025-07-18 16:10:56 rnnt_models:226] Using RNNT Loss : warprnnt_numba
    Loss warprnnt_numba_kwargs: {'fastemit_lambda': 0.001, 'clam

  | Name              | Type                              | Params | Mode 
--------------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0      | train
1 | encoder           | ConformerEncoder                  | 5.1 M  | train
2 | decoder           | RNNTDecoder                       | 3.3 M  | train
3 | joint             | RNNTJoint                         | 565 K  | train
4 | loss              | RNNTLoss                          | 0      | train
5 | spec_augmentation | SpectrogramAugmentation           | 0      | train
6 | wer               | WER                               | 0      | train
--------------------------------------------------------------------------------
9.0 M     Trainable params
0         Non-trainable params
9.0 M     Total params
35.964    Total estimated model params size (MB)
201       Modules in train mode
0         Modules in eval mode
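The "Total estimated model params size (MB)" line is, as far as we understand Lightning's model summary, simply the parameter count times the bytes per parameter (4 bytes at the default 32-bit precision), with 1 MB = 10^6 bytes. Inverting it is a quick consistency check on the summary; both helpers below are hypothetical.

```python
def size_mb_from_params(num_params, bytes_per_param=4):
    """Parameter memory in MB (10**6 bytes); 4 bytes per value for fp32."""
    return num_params * bytes_per_param / 1e6


def params_from_size_mb(size_mb, bytes_per_param=4):
    """Invert the estimate: MB (10**6 bytes) divided by bytes per parameter."""
    return size_mb * 1e6 / bytes_per_param


# Summary above: 35.964 MB / 4 bytes is roughly 8.99 M parameters (shown as 9.0 M).
print(params_from_size_mb(35.964))
```

The result agrees with the per-module counts (5.1 M + 3.3 M + 565 K) up to rounding. Note that this covers weights only, not optimizer state or activations.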

# Training on AN4

Now that the model is ready, we can finally train it!

In [8]:
# Prepare NeMo's Experiment manager to handle checkpoint saving and logging for us
from nemo.utils import exp_manager

# Environment variable generally used for multi-node multi-gpu training.
# In notebook environments, this flag is unnecessary and can cause logs of multiple training runs to overwrite each other.
os.environ.pop('NEMO_EXPM_VERSION', None)

exp_config = exp_manager.ExpManagerConfig(
    exp_dir='experiments/',
    name="Transducer-Model",
    checkpoint_callback_params=exp_manager.CallbackParams(
        monitor="val_wer",
        mode="min",
        always_save_nemo=True,
        save_best_model=True,
    ),
)

exp_config = OmegaConf.structured(exp_config)

logdir = exp_manager.exp_manager(trainer, exp_config)

[NeMo I 2025-07-18 16:10:56 exp_manager:561] ExpManager schema
[NeMo I 2025-07-18 16:10:56 exp_manager:562] {'explicit_log_dir': None, 'exp_dir': None, 'name': None, 'version': None, 'use_datetime_version': True, 'resume_if_exists': False, 'resume_past_end': False, 'resume_ignore_no_checkpoint': False, 'resume_from_checkpoint': None, 'create_tensorboard_logger': True, 'summary_writer_kwargs': None, 'create_wandb_logger': False, 'wandb_logger_kwargs': None, 'create_mlflow_logger': False, 'mlflow_logger_kwargs': {'experiment_name': None, 'tracking_uri': None, 'tags': None, 'save_dir': './mlruns', 'prefix': '', 'artifact_location': None, 'run_id': None, 'log_model': False}, 'create_dllogger_logger': False, 'dllogger_logger_kwargs': {'verbose': False, 'stdout': False, 'json_file': './dllogger.json'}, 'create_clearml_logger': False, 'clearml_logger_kwargs': {'project': None, 'task': None, 'connect_pytorch': False, 'model_name': None, 'tags': None, 'log_model': False, 'log_cfg': False, 'log_
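The checkpoint callback monitors `val_wer` with `mode="min"` because lower word error rate (WER) is better. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words; a value of 1.0, as in the first validation epochs of the log below, means as many errors as reference words (for example, when the model emits nothing). The following is a minimal reference sketch for intuition, not NeMo's `WER` class (which also decodes the model outputs before scoring); corpus-level WER is usually computed by summing errors and reference words over the whole set rather than averaging per-utterance rates.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    # prev[j] = distance between the reference prefix processed so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words if words else 0.0


print(corpus_wer([("g m e f", "g n e f"), ("three nine", "three nine five")]))
```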

In [9]:
try:
  from google import colab
  COLAB_ENV = True
except ImportError:  # also covers ModuleNotFoundError, which is a subclass
  COLAB_ENV = False

# Load the TensorBoard notebook extension
if COLAB_ENV:
  %load_ext tensorboard
  %tensorboard --logdir /content/experiments/Transducer-Model/
else:
  print("To use TensorBoard, please use this notebook in a Google Colab environment.")

To use TensorBoard, please use this notebook in a Google Colab environment.


In [10]:
# Release resources prior to training
import gc
gc.collect()

if accelerator == 'gpu':
  torch.cuda.empty_cache()

In [None]:
# Train the model
trainer.fit(model)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2025-07-18 16:10:56 modelPT:802] Optimizer config = Novograd (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.0]
        eps: 1e-08
        grad_averaging: False
        lr: 0.05
        weight_decay: 0.001
    )
[NeMo I 2025-07-18 16:10:56 lr_scheduler:950] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x75edecee1660>" 
    will be used during training (effective maximum steps = 600) - 
    Parameters : 
    (warmup_steps: null
    warmup_ratio: null
    min_lr: 1.0e-06
    last_epoch: -1
    max_steps: 600
    )



  | Name              | Type                              | Params | Mode 
--------------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0      | train
1 | encoder           | ConformerEncoder                  | 5.1 M  | train
2 | decoder           | RNNTDecoder                       | 3.3 M  | train
3 | joint             | RNNTJoint                         | 565 K  | train
4 | loss              | RNNTLoss                          | 0      | train
5 | spec_augmentation | SpectrogramAugmentation           | 0      | train
6 | wer               | WER                               | 0      | train
--------------------------------------------------------------------------------
9.0 M     Trainable params
0         Non-trainable params
9.0 M     Total params
35.964    Total estimated model params size (MB)
201       Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s][NeMo I 2025-07-18 16:10:57 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:10:57 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:01<00:00,  1.30it/s][NeMo I 2025-07-18 16:10:58 optional_cuda_graphs:53] Disabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:10:58 optional_cuda_graphs:53] Disabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding
Training: |          | 0/? [00:00<?, ?it/s]                                [NeMo I 2025-07-18 16:10:59 preemption:56] Preemption requires torch distributed to be initialized, disabling preemption

Epoch 0: 100%|██████████| 60/60 [00:10<00:00,  5.77it/s, v_num=0-56, train_step_timing in s=0.108][NeMo I 2025-07-18 16:11:09 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:11:09 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding
Epoch 1:   0%|          | 0/60 [00:00<?, ?it/s, v_num=0-56, train_step_timing in s=0.108]         [NeMo I 2025-07-18 16:11:09 optional_cuda_graphs:53] Disabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:11:09 optional_cuda_graphs:53] Disabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding
Epoch 1: 100%|██████████| 60/60 [00:08<00:00,  6.93it/s, v_num=0-56, train_step_timing in s=0.113][NeMo I 2025-07-18 16:11:18 optional

Epoch 1, global step 120: 'val_wer' reached 1.00000 (best 1.00000), saving model to '/home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model--val_wer=1.0000-epoch=1.ckpt' as top 3


[NeMo I 2025-07-18 16:11:19 nemo_model_checkpoint:546] Checkpoint save for step 120 started at 1752855079.9710252.
[NeMo I 2025-07-18 16:11:20 nemo_model_checkpoint:546] Checkpoint save for step 120 started at 1752855080.3671467.
[NeMo I 2025-07-18 16:11:20 nemo_model_checkpoint:236] New best .nemo model saved to: /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model.nemo
Epoch 1: 100%|██████████| 60/60 [00:11<00:00,  5.29it/s, v_num=0-56, train_step_timing in s=0.113][NeMo I 2025-07-18 16:11:21 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:11:21 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding
Epoch 2:   0%|          | 0/60 [00:00<?, ?it/s, v_num=0-56, train_step_timing in s=0.113]         [NeMo I 2025-07-18 16:11:21 opti

Epoch 3, global step 240: 'val_wer' reached 1.00000 (best 1.00000), saving model to '/home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model--val_wer=1.0000-epoch=3.ckpt' as top 3


[NeMo I 2025-07-18 16:11:40 nemo_model_checkpoint:546] Checkpoint save for step 240 started at 1752855100.1497211.
[NeMo I 2025-07-18 16:11:40 nemo_model_checkpoint:546] Checkpoint save for step 240 started at 1752855100.5654378.
Epoch 3: 100%|██████████| 60/60 [00:11<00:00,  5.38it/s, v_num=0-56, train_step_timing in s=0.108][NeMo I 2025-07-18 16:11:40 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:11:40 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding
Epoch 4:   0%|          | 0/60 [00:00<?, ?it/s, v_num=0-56, train_step_timing in s=0.108]         [NeMo I 2025-07-18 16:11:40 optional_cuda_graphs:53] Disabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:11:40 optional_cuda_graphs:53] Disabled CU

Epoch 5, global step 360: 'val_wer' reached 0.97413 (best 0.97413), saving model to '/home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model--val_wer=0.9741-epoch=5.ckpt' as top 3


[NeMo I 2025-07-18 16:12:00 nemo_model_checkpoint:546] Checkpoint save for step 360 started at 1752855120.60349.
[NeMo I 2025-07-18 16:12:01 nemo_model_checkpoint:546] Checkpoint save for step 360 started at 1752855121.0226743.
[NeMo I 2025-07-18 16:12:01 nemo_model_checkpoint:316] /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model.nemo already exists, moving existing checkpoint to /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model-v1.nemo
[NeMo I 2025-07-18 16:12:01 nemo_model_checkpoint:236] New best .nemo model saved to: /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model.nemo
[NeMo I 2025-07-18 16:12:01 nemo_model_checkpoint:245] Removing old .nemo backup /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model-v1.nemo
Epoch 5: 1

Epoch 7, global step 480: 'val_wer' reached 0.95213 (best 0.95213), saving model to '/home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model--val_wer=0.9521-epoch=7.ckpt' as top 3


[NeMo I 2025-07-18 16:12:22 nemo_model_checkpoint:546] Checkpoint save for step 480 started at 1752855142.0145965.
[NeMo I 2025-07-18 16:12:22 nemo_model_checkpoint:546] Checkpoint save for step 480 started at 1752855142.4441109.
[NeMo I 2025-07-18 16:12:22 nemo_model_checkpoint:316] /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model.nemo already exists, moving existing checkpoint to /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model-v1.nemo
[NeMo I 2025-07-18 16:12:22 nemo_model_checkpoint:236] New best .nemo model saved to: /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model.nemo
[NeMo I 2025-07-18 16:12:22 nemo_model_checkpoint:245] Removing old .nemo backup /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model-v1.nemo
Epoch 7:

Epoch 9, global step 600: 'val_wer' was not in top 3


[NeMo I 2025-07-18 16:12:43 nemo_model_checkpoint:546] Checkpoint save for step 600 started at 1752855163.8975737.
Epoch 9: 100%|██████████| 60/60 [00:11<00:00,  5.06it/s, v_num=0-56, train_step_timing in s=0.118][NeMo I 2025-07-18 16:12:44 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.models.rnnt_bpe_models.EncDecRNNTBPEModel'>.decoding.decoding
[NeMo I 2025-07-18 16:12:44 optional_cuda_graphs:79] Enabled CUDA graphs for module <class 'nemo.collections.asr.metrics.wer.WER'>joint._wer.decoding.decoding


`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 60/60 [00:11<00:00,  5.06it/s, v_num=0-56, train_step_timing in s=0.118]
[NeMo I 2025-07-18 16:12:44 nemo_model_checkpoint:546] Checkpoint save for step 600 started at 1752855164.3016355.


Restoring states from the checkpoint path at /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model--val_wer=0.9521-epoch=7.ckpt
Restored all states from the checkpoint at /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model--val_wer=0.9521-epoch=7.ckpt


[NeMo I 2025-07-18 16:12:44 nemo_model_checkpoint:316] /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model.nemo already exists, moving existing checkpoint to /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model-v1.nemo
[NeMo I 2025-07-18 16:12:45 nemo_model_checkpoint:288] Removing old .nemo backup /home/ubuntu/nvidia_nemo/tutorials/asr/experiments/Transducer-Model/2025-07-18_16-10-56/checkpoints/Transducer-Model-v1.nemo
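At the end of `fit`, the log shows Lightning restoring the best checkpoint (`val_wer=0.9521`, epoch 7), and since `always_save_nemo=True` the `Transducer-Model.nemo` file holds that best model. If you instead need to pick the best `.ckpt` from the checkpoints directory yourself, the metric is encoded in each filename. The helper below is a hypothetical sketch that assumes the filename pattern seen in this log.

```python
import re

# Matches names like "Transducer-Model--val_wer=0.9521-epoch=7.ckpt" (pattern from the log above).
CKPT_PATTERN = re.compile(r"val_wer=(?P<wer>\d+\.\d+)-epoch=(?P<epoch>\d+)\.ckpt$")


def best_checkpoint(paths):
    """Return the path with the lowest val_wer in its filename, or None if none match."""
    scored = []
    for path in paths:
        match = CKPT_PATTERN.search(path)
        if match:
            # Sort key: lowest WER first, then earliest epoch on ties.
            scored.append((float(match.group("wer")), int(match.group("epoch")), path))
    if not scored:
        return None
    return min(scored)[2]


# Usage, assuming `logdir` returned by exp_manager is the run directory:
# import glob, os
# print(best_checkpoint(glob.glob(os.path.join(str(logdir), "checkpoints", "*.ckpt"))))
```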

