Bug Description
When a Zipformer model is trained with a non-default blank_id (e.g., blank_id=1), the exported ONNX model fails to produce correct transcriptions when using icefall's ONNX decoding scripts (pruned_transducer_stateless7_streaming/onnx_pretrained.py or .../decode.py --onnx).
The decoding process outputs either a stream of `<unk>` tokens or a stream of `<blk>` tokens, depending on which tokens.txt file is used, but never the correct transcription.
However, the exact same ONNX model files decode perfectly when used with an external inference engine like sherpa-onnx, which confirms that the exported ONNX model is valid and that the issue lies within the icefall ONNX decoding scripts.
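A plausible root cause (an assumption on my part, not verified against icefall's source) is that the greedy-search loop in the ONNX decoding scripts treats token 0 as blank regardless of the trained `blank_id`. A minimal, self-contained sketch of that failure mode (simplified to a frame-synchronous argmax loop, not a full transducer search):

```python
# Hypothetical sketch (not icefall's actual code) of how a greedy-search
# loop breaks when it assumes blank is token 0 but the model was
# trained with blank_id=1.

def greedy_search(logits_per_frame, blank_id):
    """Take the argmax token per frame and skip frames that emit blank."""
    hyp = []
    for logits in logits_per_frame:
        token = max(range(len(logits)), key=logits.__getitem__)
        if token != blank_id:
            hyp.append(token)
    return hyp

# Toy model output: trained with blank_id=1, so "silent" frames put
# their probability mass on token 1, not token 0.
frames = [
    [0.1, 0.8, 0.1],  # blank (ID 1)
    [0.1, 0.1, 0.8],  # real token (ID 2)
    [0.0, 0.9, 0.1],  # blank (ID 1)
]

print(greedy_search(frames, blank_id=0))  # [1, 2, 1] -- blanks leak into the hypothesis
print(greedy_search(frames, blank_id=1))  # [2] -- blanks are skipped as intended
```

If the script's inner loop compares against a hardcoded `0` instead of the model's `blank_id`, it would produce exactly the symptoms below.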
To Reproduce
Prepare Data: Generate a BPE vocabulary. This creates a tokens.txt where `<blk>` is at ID 0 and `<unk>` is at ID 1.
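For illustration, the symbol table layout just described can be read with a few lines of Python (the file contents below are an inline stand-in matching this report; the subword entries are placeholders, only IDs 0 and 1 matter here):

```python
# Stand-in for the tokens.txt produced by data preparation:
# symbol and integer ID separated by whitespace, one entry per line.
tokens_txt = """\
<blk> 0
<unk> 1
▁the 2
▁and 3
"""

id2token = {}
for line in tokens_txt.splitlines():
    sym, idx = line.rsplit(maxsplit=1)
    id2token[int(idx)] = sym

print(id2token[0], id2token[1])  # <blk> <unk>
```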
Train the Model: Train a streaming Zipformer model using pruned_transducer_stateless7_streaming/train.py, but override the default blank ID by passing the argument --blank-id=1. The training converges successfully.
Sample training log snippet:
"blank_id": 1,
"bpe_model": "/content/drive/MyDrive/everyayah_1_zipformer/bpe_model/everyayah_bpe_500.model",
...
"vocab_size": 500,
Verify with PyTorch Decoding: Decode a test set using pruned_transducer_stateless7_streaming/decode.py (without the --onnx flag). The model decodes perfectly with a very low WER (e.g., 0.98%), confirming the trained PyTorch model is correct. The log shows it correctly identifies the blank_id as 1.
Sample decode.py log snippet:
"blank_id": 1,
"unk_id": 1,
...
%WER 0.98% [58 / 5931, 21 ins, 26 del, 11 sub ]
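As a quick arithmetic check, the reported WER line is consistent with its error counts:

```python
# Sanity check of the reported WER line: 58 errors over 5931 reference words.
ins, dels, subs, ref_words = 21, 26, 11, 5931
errors = ins + dels + subs          # 21 + 26 + 11 = 58
wer = 100.0 * errors / ref_words
print(f"%WER {wer:.2f}% [{errors} / {ref_words}]")  # %WER 0.98% [58 / 5931]
```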
Export to ONNX: Export the model (either a single checkpoint or an averaged one) using pruned_transducer_stateless7_streaming/export-onnx.py.
Attempt ONNX Decoding (Failure Case 1): Decode an audio file using pruned_transducer_stateless7_streaming/onnx_pretrained.py with the exported ONNX models and the original tokens.txt (`<blk>` 0, `<unk>` 1).
Observed Behavior: The output is a stream of `<unk>` tokens.
Reasoning: The model is emitting blank ID 1, which this tokens.txt maps to `<unk>`.
Attempt ONNX Decoding with Corrected Tokens (Failure Case 2): Create a tokens_corrected.txt where `<unk>` is 0 and `<blk>` is 1 to match the training configuration. Rerun the onnx_pretrained.py script.
Observed Behavior: The output is now a stream of `<blk>` tokens interleaved with subwords, but still no meaningful transcription. This confirms the script now correctly maps ID 1 to `<blk>`, but it also shows that blanks are being emitted into the hypothesis instead of being skipped.
Sample Output:

```
<blk><blk> اهْدِ<blk><blk> بِ<blk><blk> الصِّ<blk> الصِّ<blk> الصَّا<blk><blk> الْمُ<blk><blk> الْمُ<blk><blk> الْمُسْتَ<blk> الْمُسْتَ<blk>...
```
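The pattern above (real subwords separated by runs of `<blk>`) is exactly what one would expect if the decoding loop renders every emitted ID, blanks included. A toy illustration with a hypothetical token table:

```python
# If blanks are rendered instead of skipped, each blank shows up as the
# literal <blk> symbol between the real subwords.
id2token = {0: "<unk>", 1: "<blk>", 2: "smoke", 3: "test"}  # hypothetical table
emitted = [1, 1, 2, 1, 3, 1]  # model output stream; ID 1 is blank

rendered = "".join(id2token[i] for i in emitted)
print(rendered)  # <blk><blk>smoke<blk>test<blk>
```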
Verify ONNX Model with Sherpa-ONNX (Success Case): Use the exact same ONNX models and the original tokens.txt with sherpa-onnx.
Observed Behavior: The transcription is perfect. This proves the exported ONNX model is not corrupted.
Sample Output:

```
/content/drive/MyDrive/everyayah_R_zipformer/wav/train4.wav
اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ
```
Environment
```
icefall-git-sha1: 34fc1fd-dirty
k2-version: 1.24.4
torch-version: 2.6.0+cu124
onnx: 1.16.0
onnxruntime-gpu: 1.18.0
```