Missing warmup process as in k2-sherpa #134

Closed · jingzhaoou opened this issue Apr 28, 2023 · 4 comments

@jingzhaoou (Contributor)

In k2-sherpa, during warm-up, the encoder is initialized with the model's initial states:

```cpp
void WarmUp(torch::Tensor features, torch::Tensor features_length) {
  torch::IValue states = GetEncoderInitStates();
  states = StackStates({states});
```

However, I don't see a similar warm-up process in sherpa-onnx. This may cause slightly worse performance at the beginning of a stream. It may be tricky to save the model's initial states during ONNX export, I guess.

Appreciate any suggestions.

@csukuangfj (Collaborator)

Could you help test the online or offline websocket server in sherpa-onnx and see whether the second request is much faster than the very first request?

By the way, we have methods in sherpa-onnx similar to those in k2-fsa/sherpa:

```cpp
virtual std::vector<Ort::Value> GetEncoderInitStates() = 0;

virtual std::vector<Ort::Value> StackStates(
    const std::vector<std::vector<Ort::Value>> &states) const = 0;

virtual std::vector<std::vector<Ort::Value>> UnStackStates(
    const std::vector<Ort::Value> &states) const = 0;
```

It would not be difficult to add warmup to sherpa-onnx if you find it helps reduce the latency of the first request.
> It may be tricky to save the model's initial states during ONNX export, I guess.

You don't need to save the initial states during model export. The initial states are constructed in code in sherpa-onnx; we only need some model parameters to construct them.

@jingzhaoou (Contributor, Author)

As I gained a better understanding of the code base, I realized that sherpa-onnx rewrites the Python initialization code in C++, so the warm-up process is effectively already in place. I am closing this issue.

@uni-sagar-raikar

@csukuangfj @jingzhaoou Could you point me to the code where the model initialization is rewritten in C++? I didn't quite understand the point of the previous comment. We also see that warmup is still not part of the online websocket server.

@csukuangfj (Collaborator)

There is no warmup in sherpa-onnx, I think.
