
ONNX decoding fails when using a non-default blank_id during training #2017

@Rabah-IA


Bug Description

When a Zipformer model is trained with a non-default blank_id (e.g., blank_id=1), the exported ONNX model fails to produce correct transcriptions when using icefall's ONNX decoding scripts (pruned_transducer_stateless7_streaming/onnx_pretrained.py or .../decode.py --onnx).

Depending on which tokens.txt file is used, the decoding process outputs either a stream of <unk> tokens or a stream of <blk> tokens, but never the correct transcription.

However, the exact same ONNX model files decode perfectly when used with an external inference engine like sherpa-onnx, which confirms that the exported ONNX model is valid and that the issue lies within the icefall ONNX decoding scripts.
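The symptoms are consistent with a decoding loop that filters blanks against a hardcoded ID 0 instead of the model's actual blank_id. The following simplified, CTC-style sketch (hypothetical code for illustration, not icefall's actual implementation) shows that failure mode:

```python
# Simplified sketch (NOT icefall's actual code) showing how a blank filter
# that assumes blank_id == 0 mangles output from a model trained with
# blank_id == 1.
import numpy as np

def greedy_search(logits_per_frame, blank_id):
    """Emit the per-frame argmax token, skipping frames whose argmax is blank."""
    hyp = []
    for logits in logits_per_frame:
        token = int(np.argmax(logits))
        if token != blank_id:  # a hardcoded 0 here breaks non-default blank IDs
            hyp.append(token)
    return hyp

# Toy logits: 3 frames over a 4-token vocabulary.
# The model was trained with blank_id = 1, so frames whose argmax is 1 are blanks.
logits = np.array([
    [0.1, 5.0, 0.2, 0.3],  # blank (ID 1)
    [0.1, 0.2, 6.0, 0.3],  # real token, ID 2
    [0.1, 4.0, 0.2, 0.3],  # blank (ID 1)
])

print(greedy_search(logits, blank_id=1))  # correct filter -> [2]
print(greedy_search(logits, blank_id=0))  # hardcoded 0 -> [1, 2, 1]: blanks leak through
```

With the hardcoded filter, every emitted ID 1 survives into the hypothesis and is then rendered by whatever label tokens.txt attaches to ID 1, matching the two failure cases described in the reproduction steps below.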

To Reproduce

Prepare Data: Generate a vocabulary. This creates a tokens.txt where <blk> is at ID 0 and <unk> is at ID 1.

Train the Model: Train a streaming Zipformer model using pruned_transducer_stateless7_streaming/train.py, but override the default blank ID by passing the argument --blank-id=1. The training converges successfully.
Sample training log snippet:

"blank_id": 1,
"bpe_model": "/content/drive/MyDrive/everyayah_1_zipformer/bpe_model/everyayah_bpe_500.model",
...
"vocab_size": 500,
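For reference, the training invocation looks roughly like the following (paths and auxiliary flags are placeholders; --blank-id=1 is the only non-default argument relevant here):

```shell
# Sketch of the training command; paths are placeholders.
./pruned_transducer_stateless7_streaming/train.py \
  --exp-dir ./exp \
  --bpe-model ./bpe_model/everyayah_bpe_500.model \
  --blank-id 1 \
  --num-epochs 30
```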

Verify with PyTorch Decoding: Decode a test set using pruned_transducer_stateless7_streaming/decode.py (without the --onnx flag). The model decodes perfectly with a very low WER (e.g., 0.98%), confirming the trained PyTorch model is correct. The log shows it correctly identifies the blank_id as 1.
Sample decode.py log snippet:

"blank_id": 1,
"unk_id": 1,
...
%WER 0.98% [58 / 5931, 21 ins, 26 del, 11 sub ]
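The PyTorch decoding run above was produced with a command along these lines (flag values are placeholders):

```shell
# Sketch of the PyTorch decoding command (no --onnx flag); placeholders.
./pruned_transducer_stateless7_streaming/decode.py \
  --exp-dir ./exp \
  --epoch 30 --avg 1 \
  --decoding-method greedy_search
```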

Export to ONNX: Export the model (either a single checkpoint or an averaged one) using pruned_transducer_stateless7_streaming/export-onnx.py.
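The export step is a standard invocation of the recipe's export script (paths and checkpoint selection are placeholders):

```shell
# Sketch of the ONNX export command; paths are placeholders.
./pruned_transducer_stateless7_streaming/export-onnx.py \
  --exp-dir ./exp \
  --bpe-model ./bpe_model/everyayah_bpe_500.model \
  --epoch 30 --avg 1
```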

Attempt ONNX Decoding (Failure Case): Attempt to decode an audio file using pruned_transducer_stateless7_streaming/onnx_pretrained.py with the exported ONNX models and the original tokens.txt (<blk> 0, <unk> 1).

Observed Behavior: The output is a stream of <unk> tokens.

Reasoning: The model is emitting blank ID 1, which the original tokens.txt maps to <unk>.
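The failing invocation looks roughly like this (model and audio filenames are placeholders):

```shell
# Sketch of the failing icefall ONNX decoding command; paths are placeholders.
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
  --encoder-model-filename ./exp/encoder.onnx \
  --decoder-model-filename ./exp/decoder.onnx \
  --joiner-model-filename ./exp/joiner.onnx \
  --tokens ./data/lang_bpe_500/tokens.txt \
  ./wav/test.wav
```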

Attempt ONNX Decoding with Corrected Tokens (Failure Case 2): Create a tokens_corrected.txt where <unk> is 0 and <blk> is 1 to match the training configuration. Rerun the onnx_pretrained.py script.

Observed Behavior: The output is now a stream of <blk> tokens, but still no meaningful transcription. This confirms the script is now correctly mapping ID 1 to <blk>, but it shows the decoded hypothesis consists almost entirely of blanks.

Sample Output:

<blk><blk> اهْدِ<blk><blk> بِ<blk><blk> الصِّ<blk> الصِّ<blk> الصَّا<blk><blk> الْمُ<blk><blk> الْمُ<blk><blk> الْمُسْتَ<blk> الْمُسْتَ<blk>...
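The pair of failure cases can be reproduced in isolation with a toy token table (hypothetical, for illustration): the decoded IDs stay the same between the two runs, and only the label that tokens.txt attaches to ID 1 changes:

```python
# Toy illustration: the same emitted IDs render differently under the two
# token tables, producing the two failure cases observed above.
original_tokens = {0: "<blk>", 1: "<unk>"}    # tokens.txt as generated
corrected_tokens = {0: "<unk>", 1: "<blk>"}   # tokens_corrected.txt

emitted_ids = [1, 1, 1]  # the script keeps ID 1 because it only filters ID 0

print("".join(original_tokens[i] for i in emitted_ids))   # <unk><unk><unk>
print("".join(corrected_tokens[i] for i in emitted_ids))  # <blk><blk><blk>
```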

Verify ONNX Model with Sherpa-ONNX (Success Case): Use the exact same ONNX models and the original tokens.txt with sherpa-onnx.

Observed Behavior: The transcription is perfect. This proves the exported ONNX model is not corrupted.

Sample Output:

/content/drive/MyDrive/everyayah_R_zipformer/wav/train4.wav
اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ
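The successful decode used the sherpa-onnx command-line tool with the same model files, roughly as follows (paths are placeholders):

```shell
# Sketch of the sherpa-onnx CLI invocation that decodes correctly; paths are placeholders.
sherpa-onnx \
  --tokens=./data/lang_bpe_500/tokens.txt \
  --encoder=./exp/encoder.onnx \
  --decoder=./exp/decoder.onnx \
  --joiner=./exp/joiner.onnx \
  ./wav/train4.wav
```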

Environment

icefall-git-sha1: 34fc1fd-dirty

k2-version: 1.24.4

torch-version: 2.6.0+cu124

onnx: 1.16.0

onnxruntime-gpu: 1.18.0
