Skip to content

Conversation

@rohansjoshi
Copy link
Contributor

Summary

Added JNI bindings to run Qualcomm Whisper Runner in Android

Test plan

CI tests

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13525

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 6afb221 with merge base 2640a86 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 19, 2025
@github-actions
Copy link

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@cccclai cccclai marked this pull request as draft August 20, 2025 15:19
@GregoryComer
Copy link
Member

GregoryComer commented Aug 20, 2025

I'm a little worried about maintaining model-specific APIs. Would it be possible to build it under examples or similar?

There is definitely a need to provide these sort of task-specific driver APIs, though it might be nice to have a more general AudioToTextRunner (or something similar). If we do land this as is, can we make sure it's clear that it's unstable and should not be used in production? Even if it is marked experimental in code, if it's under the Android extension, there is an implicit assumption that it will be supported.

My concern is that we'll eventually have a more general task-specific runner and will want to migrate to that without hitting BC guarantees. Having this Whisper-specific code under examples would potentially discourage production use until then.

@jackzhxng @larryliu0820 How would you expect Whisper to fit into our runner strategy in the long term?

@rohansjoshi
Copy link
Contributor Author

I'm a little worried about maintaining model-specific APIs. Would it be possible to build it under examples or similar?

There is definitely a need to provide these sort of task-specific driver APIs, though it might be nice to have a more general AudioToTextRunner (or something similar). If we do land this as is, can we make sure it's clear that it's unstable and should not be used in production? Even if it is marked experimental in code, if it's under the Android extension, there is an implicit assumption that it will be supported.

My concern is that we'll eventually have a more general task-specific runner and will want to migrate to that without hitting BC guarantees. Having this Whisper-specific code under examples would potentially discourage production use until then.

@jackzhxng @larryliu0820 How would you expect Whisper to fit into our runner strategy in the long term?

Right now for LLMs the API is we have Java class LlmModule which wraps a C++ class which stores a pointer to a IRunner runner, and in the constructor we specify which runner to use. So one idea could be to have a ASRModule Java class wrap a C++ class which has a runner which implements some interface we could call AudioToTextRunner. Then the Whisper Runner in examples/qualcomm could implement this interface.

@jackzhxng
Copy link
Contributor

Ideally we can fit Whisper into our multimodal runner strategy but it's not super high priority at the moment. I think it's fine to have an ASR module though, but I agree we shouldn't have anything Whisper specific in there

@rohansjoshi rohansjoshi force-pushed the whisper-jni branch 4 times, most recently from 6bfa09d to d5ed11f Compare September 12, 2025 22:30
@rohansjoshi rohansjoshi changed the title Qualcomm Whisper Runner JNI bindings ASR JNI bindings Sep 15, 2025
@mergennachin
Copy link
Contributor

mergennachin commented Sep 16, 2025

@kirklandsign @jackzhxng -- Why did we approve this PR? This should be reviewed carefully, as we're adding first-class citizen APIs to our extension.

Also, cc @larryliu0820 - please keep an eye on these kinds of PRs

class ET_EXPERIMENTAL ASRRunner {
public:
virtual ~ASRRunner() = default;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructor for ASRRunner should contain configs such as temperature and other configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what about taking tokenizer and module as part of the constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the configs (temperature, echo, etc) which are encapsulated in the GenerationConfig struct for LLM runners aren't relevant for ASR. ASRRunner is supposed to be a general interface like IRunner so I didn't put any constructor here

#endif
#if defined(EXECUTORCH_BUILD_QNN)
// create runner
runner_ = std::make_unique<example::WhisperRunner>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

top-level ASR component shouldn't depend on example Whisper

static constexpr auto kMaxContextLen = "get_max_context_len";
} // namespace
Runner::Runner(
WhisperRunner::WhisperRunner(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should refactor out general purpose ASR logic out of this file and put into asr_runner.h/cpp

Comment on lines 511 to 516
#ifdef EXECUTORCH_BUILD_ASR_JNI
extern void register_natives_for_asr();
#else
void register_natives_for_asr() {}
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semi related to the diff, I think we started to recognize a plugin pattern here, it would be better if we can rely on static initialization to register these natives and just iterate through them in line 535. This way we don't need all these macros, we just link the necessary library and it register itself.

@meta-cla
Copy link

meta-cla bot commented Sep 20, 2025

Hi @rohansjoshi!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants