Bug Description
When a Zipformer model is trained with a non-default blank_id (e.g., blank_id=1), the exported ONNX model fails to produce correct transcriptions when using icefall's ONNX decoding scripts (pruned_transducer_stateless7_streaming/onnx_pretrained.py or .../decode.py --onnx).
The decoding process outputs either a stream of `<unk>` tokens or a stream of `<blk>` tokens, depending on which tokens.txt file is used, but never the correct transcription.
However, the exact same ONNX model files decode perfectly when used with an external inference engine like sherpa-onnx, which confirms that the exported ONNX model is valid and that the issue lies within the icefall ONNX decoding scripts.
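A plausible root cause (an assumption on my part, not verified against icefall's source) is that the greedy-search loop in the ONNX decoding scripts treats token 0 as blank regardless of the trained `blank_id`. A minimal, self-contained sketch of that failure mode (simplified to a frame-synchronous argmax loop, not a full transducer search):

```python
# Hypothetical sketch (not icefall's actual code) of how a greedy-search
# loop breaks when it assumes blank is token 0 but the model was
# trained with blank_id=1.

def greedy_search(logits_per_frame, blank_id):
    """Take the argmax token per frame and skip frames that emit blank."""
    hyp = []
    for logits in logits_per_frame:
        token = max(range(len(logits)), key=logits.__getitem__)
        if token != blank_id:
            hyp.append(token)
    return hyp

# Toy model output: trained with blank_id=1, so "silent" frames put
# their probability mass on token 1, not token 0.
frames = [
    [0.1, 0.8, 0.1],  # blank (ID 1)
    [0.1, 0.1, 0.8],  # real token (ID 2)
    [0.0, 0.9, 0.1],  # blank (ID 1)
]

print(greedy_search(frames, blank_id=0))  # [1, 2, 1] -- blanks leak into the hypothesis
print(greedy_search(frames, blank_id=1))  # [2] -- blanks are skipped as intended
```

If the script's inner loop compares against a hardcoded `0` instead of the model's `blank_id`, it would produce exactly the symptoms below.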
To Reproduce
Prepare Data: Generate a BPE vocabulary. This creates a tokens.txt where `<blk>` is at ID 0 and `<unk>` is at ID 1.
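For illustration, the symbol table layout just described can be read with a few lines of Python (the file contents below are an inline stand-in matching this report; the subword entries are placeholders, only IDs 0 and 1 matter here):

```python
# Stand-in for the tokens.txt produced by data preparation:
# symbol and integer ID separated by whitespace, one entry per line.
tokens_txt = """\
<blk> 0
<unk> 1
▁the 2
▁and 3
"""

id2token = {}
for line in tokens_txt.splitlines():
    sym, idx = line.rsplit(maxsplit=1)
    id2token[int(idx)] = sym

print(id2token[0], id2token[1])  # <blk> <unk>
```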
Train the Model: Train a streaming Zipformer model using pruned_transducer_stateless7_streaming/train.py, but override the default blank ID by passing the argument --blank-id=1. The training converges successfully.
Sample training log snippet:
"blank_id": 1,
"bpe_model": "/content/drive/MyDrive/everyayah_1_zipformer/bpe_model/everyayah_bpe_500.model",
...
"vocab_size": 500,
Verify with PyTorch Decoding: Decode a test set using pruned_transducer_stateless7_streaming/decode.py (without the --onnx flag). The model decodes perfectly with a very low WER (e.g., 0.98%), confirming the trained PyTorch model is correct. The log shows it correctly identifies the blank_id as 1.
Sample decode.py log snippet:
"blank_id": 1,
"unk_id": 1,
...
%WER 0.98% [58 / 5931, 21 ins, 26 del, 11 sub ]
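As a quick arithmetic check, the reported WER line is consistent with its error counts:

```python
# Sanity check of the reported WER line: 58 errors over 5931 reference words.
ins, dels, subs, ref_words = 21, 26, 11, 5931
errors = ins + dels + subs          # 21 + 26 + 11 = 58
wer = 100.0 * errors / ref_words
print(f"%WER {wer:.2f}% [{errors} / {ref_words}]")  # %WER 0.98% [58 / 5931]
```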
Export to ONNX: Export the model (either a single checkpoint or an averaged one) using pruned_transducer_stateless7_streaming/export-onnx.py.
Attempt ONNX Decoding (Failure Case 1): Decode an audio file using pruned_transducer_stateless7_streaming/onnx_pretrained.py with the exported ONNX models and the original tokens.txt (`<blk>` 0, `<unk>` 1).
Observed Behavior: The output is a stream of `<unk>` tokens.
Reasoning: The model is emitting blank ID 1, which this tokens.txt maps to `<unk>`.
Attempt ONNX Decoding with Corrected Tokens (Failure Case 2): Create a tokens_corrected.txt where `<unk>` is 0 and `<blk>` is 1 to match the training configuration. Rerun the onnx_pretrained.py script.
Observed Behavior: The output is now a stream of `<blk>` tokens interleaved with subwords, but still no meaningful transcription. This confirms the script now correctly maps ID 1 to `<blk>`, but it also shows that blanks are being emitted into the hypothesis instead of being skipped.
Sample Output:

```
<blk><blk> اهْدِ<blk><blk> بِ<blk><blk> الصِّ<blk> الصِّ<blk> الصَّا<blk><blk> الْمُ<blk><blk> الْمُ<blk><blk> الْمُسْتَ<blk> الْمُسْتَ<blk>...
```
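The pattern above (real subwords separated by runs of `<blk>`) is exactly what one would expect if the decoding loop renders every emitted ID, blanks included. A toy illustration with a hypothetical token table:

```python
# If blanks are rendered instead of skipped, each blank shows up as the
# literal <blk> symbol between the real subwords.
id2token = {0: "<unk>", 1: "<blk>", 2: "smoke", 3: "test"}  # hypothetical table
emitted = [1, 1, 2, 1, 3, 1]  # model output stream; ID 1 is blank

rendered = "".join(id2token[i] for i in emitted)
print(rendered)  # <blk><blk>smoke<blk>test<blk>
```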
Verify ONNX Model with Sherpa-ONNX (Success Case): Use the exact same ONNX models and the original tokens.txt with sherpa-onnx.
Observed Behavior: The transcription is perfect. This proves the exported ONNX model is not corrupted.
Sample Output:

```
/content/drive/MyDrive/everyayah_R_zipformer/wav/train4.wav
اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ
```
Environment
```
icefall-git-sha1: 34fc1fd-dirty
k2-version: 1.24.4
torch-version: 2.6.0+cu124
onnx: 1.16.0
onnxruntime-gpu: 1.18.0
```