added triton config for streaming zipformer #430
Conversation
Many thanks. Would you mind adding a script so that we could launch the streaming service with one button? e.g. build_librispeech_pruned_transducer_stateless7_streaming.sh
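A one-button launcher of the kind requested above might be structured as a staged bash script. This is a hypothetical sketch, not the actual script from this PR; the stage layout, model names, and commented-out commands are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of build_librispeech_pruned_transducer_stateless7_streaming.sh.
# The real script in this PR may differ; the commented commands are placeholders.
set -euo pipefail

stage=${1:-0}
stop_stage=${2:-3}

if [ "$stage" -le 0 ] && [ "$stop_stage" -ge 0 ]; then
  echo "stage 0: download the pretrained streaming zipformer checkpoint"
  # git lfs clone <huggingface model repo>   (placeholder)
fi

if [ "$stage" -le 1 ] && [ "$stop_stage" -ge 1 ]; then
  echo "stage 1: export encoder/decoder/joiner to ONNX"
  # python3 export.py --onnx 1 ...           (placeholder)
fi

if [ "$stage" -le 2 ] && [ "$stop_stage" -ge 2 ]; then
  echo "stage 2: fill config.pbtxt templates in the model repo"
fi

if [ "$stage" -le 3 ] && [ "$stop_stage" -ge 3 ]; then
  echo "stage 3: start the Triton server"
  # tritonserver --model-repository=<model repo> (placeholder)
fi
```

A stage/stop_stage pair like this lets users rerun a single step (e.g. `bash build.sh 3 3` to restart only the server) without redoing the export.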
Would you mind using soft links for the files in this repo that are duplicated from https://github.com/k2-fsa/sherpa/tree/master/triton/model_repo_streaming?
Yes, give me one day; I will push everything and add a README.md as well.
…former in 'single click'
@yuekaizhang I have made the requested changes and tested them with the bash script: the Triton server comes up with all modules without any issues, and I also get ASR outputs when testing via the decode_manifest client script. I feel a README file is not required, since the build script already lists all the steps to start the server and users should be able to follow it easily. Let me know if anything else is required.
It should be a .gitkeep file.
All of the config.pbtxt, config.pbtxt.template, and model.py files under model_repo_streaming_zipformer/ might be the same as the files in model_repo_streaming, except for encoder/config.pbtxt.template. Would you mind checking them? (diff file1 file2 first, then decide whether we need a soft link.) The bash scripts look nice. Thanks again.
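The diff-then-symlink check suggested above can be done mechanically. A minimal sketch, with illustrative directory and file names (the real repo layout has more subdirectories):

```shell
# Replace a file with a soft link only when it is byte-identical to its
# counterpart in model_repo_streaming. Paths below are illustrative.
mkdir -p demo/model_repo_streaming demo/model_repo_streaming_zipformer
echo "same contents" > demo/model_repo_streaming/model.py
echo "same contents" > demo/model_repo_streaming_zipformer/model.py

if diff -q demo/model_repo_streaming/model.py \
           demo/model_repo_streaming_zipformer/model.py > /dev/null; then
  # Identical: swap the copy for a relative soft link.
  ln -sf ../model_repo_streaming/model.py \
         demo/model_repo_streaming_zipformer/model.py
  echo "linked"
else
  echo "kept separate copy"
fi
```

Using a relative link target keeps the link valid when the repository is cloned to a different path.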
check model_repo_streaming/decoder/config.pbtxt.template
check model_repo_streaming/feature_extractor/1/model.py
I already checked them one by one; they are not the same. I made some changes (for example, changed the type to FP32 to make it compatible with the pretrained model), and there was a tensor-name change for the feature extractor outputs (x and x_lens, renamed from speech and speech_lengths) in the feature_extractor model.py. The only identical file is scorer/1/model.py.
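As an illustration of the kind of changes described (renamed tensors and FP32 data type), the input section of an encoder config.pbtxt might look like the fragment below. This is a hypothetical sketch, not the actual file from this PR; the dims and exact layout are assumptions.

```
# Hypothetical config.pbtxt fragment, not the file from this PR.
input [
  {
    name: "x"             # renamed from "speech" in model_repo_streaming
    data_type: TYPE_FP32  # FP32 to match the pretrained model export
    dims: [ -1, 80 ]
  },
  {
    name: "x_lens"        # renamed from "speech_lengths"
    data_type: TYPE_INT64
    dims: [ 1 ]
  }
]
```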
check model_repo_streaming/feature_extractor
check model_repo_streaming/joiner
This might be a duplicate of model_repo_streaming/transducer.
There were some minor changes here, such as renaming tensors to make them compatible with the input and output names of the ONNX export script (export.py).
check model_repo_streaming/scorer
I see. https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L811-L820 Actually, you could set --fp16 to export fp16.onnx. Then the whole model_repo could run inference in fp16, which could increase throughput a lot. I was wondering if you would like to try fp16.
Yes, sure, I will try that; I hadn't used that option before. I will update the script(s) accordingly to make them as generic as possible.
Also, it would be better if we make a PR against https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L377-L394 to change the string "x" into "speech" and "x_lens" into "speech_lengths", so that we could reuse the feature_extractor code.
Or, if this way is more convenient for you, change the files in https://github.com/k2-fsa/sherpa/tree/master/triton/model_repo_streaming/feature_extractor and scripts/export_onnx.py from "speech" to "x".
I am still a little bit confused as to why we have so many different ONNX export scripts?
Ideally, we would like to reuse the scripts from the sherpa-onnx project. However, several minor differences block this, e.g. whether processed_len is regarded as a state tensor for streaming ASR. The workaround for now is to use the scripts/build_X bash scripts to avoid confusion. Also, the differences between conformer and zipformer mean we have to keep separate model_repo files and export scripts.
…, changed tensor names(speech and speech_lengths) in export_onnx.py to x and x_lens, and also same thing in model_repo_streaming/feature_extractor and model_repo_streaming/transducer
Added the required changes. Also tested with fp16; the script works, though Triton gives a warning when loading.
This warning is fine.
triton/scripts/export_onnx.py
```diff
@@ -390,16 +390,16 @@ def export_encoder_model_onnx_triton(
     verbose=False,
     opset_version=opset_version,
     input_names=[
-        "speech",
-        "speech_lengths",
+        "x",
```
This function's input names are for offline ASR, and should remain unchanged.
Many thanks @uni-saurabh-vyas
Based on #412