
Streaming Zipformer with multi-dataset #984

Merged

Conversation


@marcoyang1998 marcoyang1998 commented Apr 3, 2023

This PR adds a multi-dataset setup for the streaming Zipformer. Unlike older recipes (e.g. pruned_transducer_stateless3 and pruned_transducer_stateless8), it uses a single head for both LibriSpeech and GigaSpeech.

The model achieves the following WERs:

| decoding method | chunk size | test-clean | test-other | comment | decoding mode |
|---|---|---|---|---|---|
| greedy search | 320ms | 2.43 | 6.0 | `--epoch 20 --avg 4` | simulated streaming |
| greedy search | 320ms | 2.47 | 6.13 | `--epoch 20 --avg 4` | chunk-wise |
| fast beam search | 320ms | 2.43 | 5.99 | `--epoch 20 --avg 4` | simulated streaming |
| fast beam search | 320ms | 2.8 | 6.46 | `--epoch 20 --avg 4` | chunk-wise |
| modified beam search | 320ms | 2.4 | 5.96 | `--epoch 20 --avg 4` | simulated streaming |
| modified beam search | 320ms | 2.42 | 6.03 | `--epoch 20 --avg 4` | chunk-wise |
| greedy search | 640ms | 2.26 | 5.58 | `--epoch 20 --avg 4` | simulated streaming |
| greedy search | 640ms | 2.33 | 5.76 | `--epoch 20 --avg 4` | chunk-wise |
| fast beam search | 640ms | 2.27 | 5.54 | `--epoch 20 --avg 4` | simulated streaming |
| fast beam search | 640ms | 2.37 | 5.75 | `--epoch 20 --avg 4` | chunk-wise |
| modified beam search | 640ms | 2.22 | 5.5 | `--epoch 20 --avg 4` | simulated streaming |
| modified beam search | 640ms | 2.25 | 5.69 | `--epoch 20 --avg 4` | chunk-wise |
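The two decoding modes in the table differ in how audio reaches the encoder: in simulated streaming, the whole utterance is forwarded at once with a limited-context attention mask, while in chunk-wise decoding the audio is fed in fixed-size chunks, as it would arrive in a real stream. A minimal sketch of the chunking (the 100 frames/s rate and the 32-frame chunk, roughly 320 ms, are assumptions for illustration, not taken from the recipe):

```python
# Hedged sketch of chunk-wise input feeding; NOT the icefall code.
# Assumption: encoder input at 100 frames per second, so a 320 ms
# chunk is 32 frames, matching the "320ms" rows in the table above.

def chunk_wise(frames, chunk_frames=32):
    """Split an utterance into the fixed-size chunks a streaming
    decoder would receive; the last chunk may be shorter."""
    for start in range(0, len(frames), chunk_frames):
        yield frames[start:start + chunk_frames]

utterance = list(range(100))         # a 1-second utterance at 100 frames/s
chunks = list(chunk_wise(utterance))
assert sum(chunks, []) == utterance  # nothing lost or duplicated
print([len(c) for c in chunks])      # [32, 32, 32, 4]
```

The small WER gap between the two modes in the table suggests the chunked forward closely matches the masked full-utterance forward.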

I will upload the pre-trained models to Hugging Face later.

@marcoyang1998 marcoyang1998 changed the title Zipformer libri small models Streaming Zipformer with multi-dataset Apr 3, 2023
@ezerhouni
Collaborator

@marcoyang1998 Thank you for this PR.
Did you try a multi-head streaming Zipformer?

@marcoyang1998
Collaborator Author

> Did you try a multi-head streaming Zipformer?

No. We did some internal testing and found no performance difference between one head and two heads. Using one head also simplifies inference, because the same head can be used for both LibriSpeech and GigaSpeech.

@marcoyang1998
Collaborator Author

A comparison with our previous best-performing streaming model, lstm_transducer_stateless2 (with two heads), on GigaSpeech:

| model | head | dev | test |
|---|---|---|---|
| lstm_transducer_stateless2 | libri | 15.07 | 15.45 |
| lstm_transducer_stateless2 | giga | 12.45 | 12.39 |
| this PR (pruned_transducer_stateless7_streaming_multi) | one head | 12.08 | 11.98 |

As the numbers show, the two-headed lstm_transducer_stateless2 must use its giga head to get good WERs on GigaSpeech, which complicates usage. Using only one head resolves this problem, because the single-head model achieves good WERs on LibriSpeech and GigaSpeech at the same time.
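To make the simplification concrete, here is a hypothetical sketch (not the icefall implementation; all class and method names are made up, and the string "logits" are stand-ins for real projection outputs): with two heads, inference must route each utterance to the corpus-matched output head, while a single shared head removes that decision entirely.

```python
# Hypothetical sketch contrasting two-head and one-head joiners.
# A real joiner would apply per-head linear projections; strings
# stand in for logits here.

class TwoHeadJoiner:
    """Old style (e.g. lstm_transducer_stateless2): one head per corpus."""
    def joint(self, hidden, head):
        # The caller must pick the right head; picking the wrong one hurts
        # WER (in the table above: 15.45 vs 12.39 on GigaSpeech test).
        projections = {"libri": "libri_logits", "giga": "giga_logits"}
        return f"{projections[head]}({hidden})"

class OneHeadJoiner:
    """This PR's style: one shared head trained on both corpora."""
    def joint(self, hidden):
        # No head selection needed at inference time.
        return f"shared_logits({hidden})"

print(TwoHeadJoiner().joint("h", "giga"))  # giga_logits(h)
print(OneHeadJoiner().joint("h"))          # shared_logits(h)
```

The design trade-off is that the shared head must fit both corpora, but per the internal testing mentioned above, this cost no measurable accuracy.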
