-
Notifications
You must be signed in to change notification settings - Fork 712
ASR JNI bindings #13525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ASR JNI bindings #13525
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13525
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New FailureAs of commit 6afb221 with merge base 2640a86 ( NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
...executorch_android/src/main/java/org/pytorch/executorch/extension/audio/WhisperCallback.java
Outdated
Show resolved
Hide resolved
|
I'm a little worried about maintaining model-specific APIs. Would it be possible to build it under examples or similar? There is definitely a need to provide these sort of task-specific driver APIs, though it might be nice to have a more general AudioToTextRunner (or something similar). If we do land this as is, can we make sure it's clear that it's unstable and should not be used in production? Even if it is marked experimental in code, if it's under the Android extension, there is an implicit assumption that it will be supported. My concern is that we'll eventually have a more general task-specific runner and will want to migrate to that without hitting BC guarantees. Having this Whisper-specific code under examples would potentially discourage production use until then. @jackzhxng @larryliu0820 How would you expect Whisper to fit into our runner strategy in the long term? |
Right now for LLMs the API is we have Java class LlmModule which wraps a C++ class which stores a pointer to a IRunner runner, and in the constructor we specify which runner to use. So one idea could be to have a ASRModule Java class wrap a C++ class which has a runner which implements some interface we could call AudioToTextRunner. Then the Whisper Runner in examples/qualcomm could implement this interface. |
b45941e to
cb2278a
Compare
cb2278a to
2ccb469
Compare
|
Ideally we can fit Whisper into our multimodal runner strategy but it's not super high priority at the moment. I think it's fine to have an ASR module though, but I agree we shouldn't have anything Whisper specific in there |
6bfa09d to
d5ed11f
Compare
|
@kirklandsign @jackzhxng -- Why did we approve this PR? This should be reviewed carefully, as we're adding first-class citizen APIs to our extension. Also, cc @larryliu0820 - please keep an eye on these kinds of PRs |
| class ET_EXPERIMENTAL ASRRunner { | ||
| public: | ||
| virtual ~ASRRunner() = default; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constructor for ASRRunner should contain configs such as temperature and other configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also what about taking tokenizer and module as part of the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the configs (temperature, echo, etc) which are encapsulated in the GenerationConfig struct for LLM runners aren't relevant for ASR. ASRRunner is supposed to be a general interface like IRunner so I didn't put any constructor here
| #endif | ||
| #if defined(EXECUTORCH_BUILD_QNN) | ||
| // create runner | ||
| runner_ = std::make_unique<example::WhisperRunner>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
top-level ASR component shouldn't depend on example Whisper
| static constexpr auto kMaxContextLen = "get_max_context_len"; | ||
| } // namespace | ||
| Runner::Runner( | ||
| WhisperRunner::WhisperRunner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should refactor out general purpose ASR logic out of this file and put into asr_runner.h/cpp
extension/android/jni/jni_layer.cpp
Outdated
| #ifdef EXECUTORCH_BUILD_ASR_JNI | ||
| extern void register_natives_for_asr(); | ||
| #else | ||
| void register_natives_for_asr() {} | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semi related to the diff, I think we started to recognize a plugin pattern here, it would be better if we can rely on static initialization to register these natives and just iterate through them in line 535. This way we don't need all these macros, we just link the necessary library and it register itself.
Added EXECUTORCH_BUILD_WHISPER_JNI flag
5a4ef35 to
6afb221
Compare
|
Hi @rohansjoshi! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
Summary
Added JNI bindings to run Qualcomm Whisper Runner in Android
Test plan
CI tests