added triton config for streaming zipformer #430

Merged: 5 commits into k2-fsa:master on Jul 10, 2023

Conversation

@uni-saurabh-vyas (Contributor)

Based on #412

@yuekaizhang (Collaborator)

Many thanks. Would you mind adding a script such that we could launch the streaming service with one button? e.g. build_librispeech_pruned_transducer_stateless7_streaming.sh
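
For illustration, a one-button script of this kind might look roughly like the sketch below; every concrete path, flag, and value is an assumption for the example, not the script that was eventually added in this PR.

#!/usr/bin/env bash
# Hypothetical sketch of build_librispeech_pruned_transducer_stateless7_streaming.sh.
# Paths and flags below are illustrative assumptions only.
set -e

# 1. Obtain a pretrained streaming zipformer checkpoint (download step omitted here).
model_dir=./pretrained_streaming_zipformer

# 2. Export encoder/decoder/joiner to ONNX for Triton (exact flags depend on the export script).
python3 scripts/export_onnx.py --exp-dir $model_dir/exp --output-dir ./model_repo_streaming_zipformer

# 3. Fill in the config.pbtxt templates and start the Triton server against the generated repo.
tritonserver --model-repository=./model_repo_streaming_zipformer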

Review comment (Collaborator):

Would you mind using soft links for files in this repo that are duplicated from https://github.com/k2-fsa/sherpa/tree/master/triton/model_repo_streaming ?
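
For reference, replacing a duplicated file with a relative soft link would look roughly like this; the scorer file is used as the example and the relative path depth is an assumption about the final directory layout.

# Replace a duplicated file with a relative symlink to the copy in model_repo_streaming
# (the "../../.." depth depends on where the new model repo actually lives).
cd triton/model_repo_streaming_zipformer/scorer/1
rm model.py
ln -s ../../../model_repo_streaming/scorer/1/model.py model.py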

@uni-saurabh-vyas (Contributor, Author)

Yes, give me one day; I will push everything and add a README.md as well.

@uni-saurabh-vyas (Contributor, Author)

@yuekaizhang I have made the requested changes and tested with the bash script; the Triton server is up and running with all modules without any issues. I also tested via the decode_manifest client script and I get ASR outputs.

Also, I feel a README file is not required, as the build script already lists all the steps to start the server; users should be able to follow it easily. Let me know if anything else is required.

Review comment (Collaborator):

It should be a .gitkeep file.

@yuekaizhang (Collaborator)

All config.pbtxt, config.pbtxt.template, and model.py files under model_repo_streaming_zipformer/ might be the same as the files in model_repo_streaming, except for encoder/config.pbtxt.template. Would you mind checking them? (diff file1 file2 first, then decide if we need a soft link.)

The bash scripts look nice. Thanks again.
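
For reference, the suggested check can be scripted roughly as below; the list of files is illustrative and would be adapted to whatever actually exists in both model repos.

# Compare candidate files between the two model repos; identical pairs are
# candidates for soft links (the file list here is illustrative).
cd triton
for f in decoder/config.pbtxt.template joiner/config.pbtxt.template scorer/1/model.py; do
  if diff -q model_repo_streaming/$f model_repo_streaming_zipformer/$f >/dev/null; then
    echo "$f: identical -> could be a soft link"
  else
    echo "$f: differs -> keep a separate copy"
  fi
done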

Review comment (Collaborator):

check model_repo_streaming/decoder/config.pbtxt.template

Review comment (Collaborator):

check model_repo_streaming/feature_extractor/1/model.py

@uni-saurabh-vyas (Contributor, Author)

I already checked them one by one; they are not the same. I made some changes (for example, changed the type to FP32 to make it compatible with the pretrained model), and there was a tensor name change for the encoder tensors (x and x_lens instead of speech and speech_lengths) in the Python file for the feature_extractor.

The only identical file is scorer/1/model.py.

Review comment (Collaborator):

check model_repo_streaming/feature_extractor

Review comment (Collaborator):

check model_repo_streaming/joiner

Review comment (Collaborator):

This might be a duplicate of model_repo_streaming/transducer.

Review comment (Contributor, Author):

There were some minor changes here, such as changing tensor names (to make it compatible with the input and output names of the ONNX export script, export.py).

Review comment (Collaborator):

check model_repo_streaming/scorer

@yuekaizhang (Collaborator)

> I already checked them one by one; they are not the same … the only identical file is scorer/1/model.py.

I see. https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L811-L820 Actually, you could set --fp16 to export fp16.onnx. Then the whole model_repo could run inference in fp16, which could increase throughput a lot. I was wondering if you would like to try fp16.
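
For reference, the fp16 export would be invoked roughly as below; only the --fp16 option comes from the comment above, while the script location and the remaining arguments are assumptions to be checked against export.py --help.

# Export the streaming zipformer to ONNX with fp16 weights so the whole
# model_repo can run fp16 inference. Only --fp16 is taken from the discussion;
# the other arguments are illustrative.
cd icefall/egs/librispeech/ASR
python3 pruned_transducer_stateless7_streaming/export.py \
  --exp-dir pruned_transducer_stateless7_streaming/exp \
  --fp16 True   # or just --fp16, depending on how the flag is declared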

@uni-saurabh-vyas (Contributor, Author)

Yes, sure, I will try that; I didn't use that option before. I will update the script(s) accordingly to make them as generic as possible.

@yuekaizhang (Collaborator)

Also, it would be better if we make a PR against https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L377-L394 to change the string "x" into "speech" and "x_lens" into "speech_lengths", such that we could reuse the feature_extractor code.

Or change the files here https://github.com/k2-fsa/sherpa/tree/master/triton/model_repo_streaming/feature_extractor and scripts/export_onnx.py from "speech" to "x", if that way is more convenient for you.
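
A mechanical way to apply the first option would look roughly like the sketch below; editing the input_names/output_names lists in export.py by hand is just as easy, and note that the PR ultimately went with the second option instead (renaming speech/speech_lengths to x/x_lens on the sherpa side).

# Sketch of the first option: rename the ONNX tensor names in icefall's export.py
# so sherpa's existing feature_extractor code could be reused unchanged.
# Review the resulting diff carefully; a blanket sed may touch unintended lines.
sed -i 's/"x_lens"/"speech_lengths"/g; s/"x"/"speech"/g' \
  egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py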

@uni-saurabh-vyas (Contributor, Author)

I am still a little bit confused as to why we have so many different ONNX export scripts?

@yuekaizhang (Collaborator) commented Jul 7, 2023

> I am still a little bit confused as to why we have so many different ONNX export scripts?

Ideally, we would like to reuse the scripts from the sherpa-onnx project. However, several minor differences block that, e.g. whether processed_len is regarded as a state tensor for streaming ASR.

The workaround for now is to use the scripts/build_X bash scripts to avoid confusion. Also, the differences between conformer and zipformer mean we have to keep separate model_repo files and export scripts.

… changed tensor names (speech and speech_lengths) in export_onnx.py to x and x_lens, and did the same in model_repo_streaming/feature_extractor and model_repo_streaming/transducer
@uni-saurabh-vyas (Contributor, Author)

Added the required changes.

I also tested with fp16; the script works, however Triton gives a warning when loading:

I0708 04:34:38.286388 11446 model_lifecycle.cc:694] successfully loaded 'feature_extractor' version 1
2023-07-08 04:34:38.472352679 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow_1'
2023-07-08 04:34:38.472412110 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow'
2023-07-08 04:34:38.598780814 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow_1'
2023-07-08 04:34:38.598831005 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow'
2023-07-08 04:34:38.612756773 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow_1'
2023-07-08 04:34:38.612800814 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow'
2023-07-08 04:34:39.567382515 [W:onnxruntime:, session_state.cc:1030 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

@yuekaizhang (Collaborator)

> Also tested with fp16; the script works, however Triton gives a warning when loading. [onnxruntime log quoted above]

This warning is fine.

@@ -390,16 +390,16 @@ def export_encoder_model_onnx_triton(
         verbose=False,
         opset_version=opset_version,
         input_names=[
-            "speech",
-            "speech_lengths",
+            "x",
Review comment (Collaborator):

This function's input names are for offline ASR and should be kept unchanged.

@yuekaizhang (Collaborator)

Many thanks @uni-saurabh-vyas

@yuekaizhang merged commit 478b4bc into k2-fsa:master on Jul 10, 2023