
Android: LlmModule Closeable lifecycle and ModelRunner crash fix #19012

Merged
psiddh merged 10 commits into pytorch:main from psiddh:followup-crash-prevention
Apr 21, 2026

Conversation

@psiddh
Contributor

@psiddh psiddh commented Apr 20, 2026

Summary

  • LlmModule: Implement Closeable so modules can be used with try-with-resources. Add close() with explicit
    caller contract. Deprecate resetNative() in favor of close(). Add ReentrantLock to serialize access to
    non-thread-safe native state, matching Module.java's pattern.
  • ModelRunner: Add early return after loadMethod() failure to prevent cascading crash in warmup/inference
    loops. Fix native memory leak by wrapping benchmark body in try/finally for module.destroy().
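
A caller-side sketch of the Closeable contract from the first bullet. The paths, constructor arguments, and callback are placeholders rather than the exact API, and close() is assumed not to declare a checked exception.

```java
import org.pytorch.executorch.extension.llm.LlmCallback;
import org.pytorch.executorch.extension.llm.LlmModule;

// Usage sketch only: constructor arguments and the callback are placeholders,
// and close() is assumed not to declare a checked exception.
final class LlmModuleUsageSketch {
  static void runOnce(String modelPath, String tokenizerPath, LlmCallback callback) {
    try (LlmModule module = new LlmModule(modelPath, tokenizerPath, 0.8f)) {
      module.load();
      module.generate("Hello", callback);
    }
    // close() has run here; any further call such as generate() now throws
    // IllegalStateException instead of touching freed native state.
  }
}
```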

LlmModule lifecycle design

  • ReentrantLock serializes all operations — prevents accidental concurrent access (e.g. double-tap, rotation
    during generation) from causing native crashes
  • generate()/prefill*()/load()/resetContext() acquire the lock
  • close() uses tryLock() — fails fast with IllegalStateException instead of blocking (no ANR risk)
  • stop() remains a bare native method — lock-free, uses C++ atomic flag for cross-thread interrupt
  • Caller contract: call stop(), wait for generate() to return, then close()
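
A condensed sketch of the locking pattern described in the list above, under the stated caller contract. The field names and the *Native() hooks are illustrative, not the exact LlmModule internals.

```java
import java.io.Closeable;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the lock/close/stop split; names are illustrative.
abstract class LlmModuleLockingSketch implements Closeable {
  private final ReentrantLock mLock = new ReentrantLock();
  private boolean mClosed = false;

  public int generate(String prompt) {
    mLock.lock(); // load()/prefill*()/resetContext() take the same lock
    try {
      if (mClosed) {
        throw new IllegalStateException("LlmModule is closed");
      }
      return generateNative(prompt);
    } finally {
      mLock.unlock();
    }
  }

  @Override
  public void close() {
    // tryLock(): fail fast with IllegalStateException rather than block the
    // calling thread (possibly the UI thread) behind an in-flight generate().
    if (!mLock.tryLock()) {
      throw new IllegalStateException("close() while another operation is in flight");
    }
    try {
      if (!mClosed) {
        mClosed = true;
        destroyNative();
      }
    } finally {
      mLock.unlock();
    }
  }

  // stop() stays a bare native method: it flips a C++ atomic flag, so it can
  // interrupt generate() from another thread without taking the lock.
  public abstract void stop();

  abstract int generateNative(String prompt);
  abstract void destroyNative();
}
```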

Tests added

  • testUseAfterCloseThrows: generate() after close() throws IllegalStateException
  • testCloseIsIdempotent: double close() is safe
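
The actual tests are Kotlin instrumentation tests in LlmModuleInstrumentationTest.kt that load real model assets; the following Java sketch only illustrates what they assert. Paths, the constructor signature, and the null callback are placeholders.

```java
import static org.junit.Assert.assertThrows;

import org.junit.Test;
import org.pytorch.executorch.extension.llm.LlmCallback;
import org.pytorch.executorch.extension.llm.LlmModule;

// Illustrative sketch of the two lifecycle tests; signatures are assumptions.
public class LlmModuleLifecycleSketchTest {
  private static final String MODEL = "/data/local/tmp/llm/model.pte";          // placeholder
  private static final String TOKENIZER = "/data/local/tmp/llm/tokenizer.bin";  // placeholder

  @Test
  public void useAfterCloseThrows() {
    LlmModule module = new LlmModule(MODEL, TOKENIZER, 0.8f);
    module.close();
    // The closed-state check fires before any native call is attempted.
    assertThrows(IllegalStateException.class, () -> module.generate("hi", (LlmCallback) null));
  }

  @Test
  public void closeIsIdempotent() {
    LlmModule module = new LlmModule(MODEL, TOKENIZER, 0.8f);
    module.close();
    module.close(); // the second close() must be a silent no-op
  }
}
```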

Test plan

  • Existing LlmModuleInstrumentationTest tests pass
  • New lifecycle tests pass
  • ModelRunner benchmark handles load failures gracefully

This PR was authored with the help of Claude.

psiddh added 2 commits April 20, 2026 13:26
LlmModule: Add ReentrantReadWriteLock to prevent use-after-free when
close() races with generate(). generate()/prefill()/load() acquire the
read lock; close() acquires the write lock after calling stop(). The
fair lock ensures close() drains in-flight operations. stop() remains
lock-free for cross-thread interrupt capability.

LlmModule now implements Closeable. resetNative() is deprecated and
delegates to close().

ModelRunner: Add early return after loadMethod() failure to prevent
cascading crash in warmup/inference loops.

This commit was authored with the help of Claude.
Add 3 LlmModule lifecycle tests:
- testUseAfterCloseThrows: generate() after close() throws IllegalStateException
- testStopAfterCloseIsNoOp: stop() after close() is safe
- testCloseIsIdempotent: double close() is safe

Fix ModelRunner: call module.destroy() on both error and success paths
to prevent native memory leaks.

This commit was authored with the help of Claude.
Copilot AI review requested due to automatic review settings April 20, 2026 22:28
@pytorch-bot

pytorch-bot Bot commented Apr 20, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19012

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Pending, 2 Unrelated Failures

As of commit 078f425 with merge base 66e4656:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 20, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@psiddh psiddh changed the title from "Followup crash prevention" to "Android: LlmModule thread safety (Closeable + ReadWriteLock) and ModelRunner crash fix" on Apr 20, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR focuses on preventing lifecycle-related crashes in the Android benchmarking and LLM extension codepaths by tightening native resource cleanup and aligning JNI naming with the Java wrapper pattern.

Changes:

  • Ensure benchmark modules are destroyed on both success and load-failure paths.
  • Update LlmModule lifecycle handling (introduce close(), add locking, and wrap native calls with closed-state checks).
  • Rename the LLM JNI stop binding to stopNative and add instrumentation tests covering close/stop behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File descriptions:
  • extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/ModelRunner.java: Adds early-return metrics on load failure and ensures module.destroy() is called on both success and failure paths.
  • extension/android/jni/jni_layer_llama.cpp: Renames the JNI registration from stop to stopNative to match the Java wrapper API.
  • extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java: Introduces Closeable, adds read/write locking and closed-state checks, and splits stop() vs stopNative().
  • extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmModuleInstrumentationTest.kt: Adds lifecycle instrumentation tests for use-after-close, stop-after-close, and idempotent close.


@GregoryComer
Member

Regarding concurrent operations, we don't want to allow concurrent load, generate, or prefill, right? The runtime itself is not thread safe for concurrent execution of the same method, and there is shared KV cache state.

Remove ReentrantReadWriteLock — the native runtime is not thread-safe
for concurrent execution, so the lock gives a false sense of safety
without changing the caller's obligations. The caller contract is:
call stop(), wait for generate() to return, then close().

Plain boolean mDestroyed + checkNotDestroyed() catches same-thread
programmer errors with a clean IllegalStateException. stop() uses the
native atomic flag for cross-thread interrupt — that is the intended
mechanism.

This commit was authored with the help of Claude.
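
A minimal sketch of the plain-boolean guard this commit moves to; member names are illustrative, and later commits in this thread reinstate a single lock on top of it.

```java
// Sketch only: a plain boolean catches same-thread use-after-close with a
// clear IllegalStateException instead of a native crash.
abstract class DestroyGuardSketch implements java.io.Closeable {
  private boolean mDestroyed = false;

  final void checkNotDestroyed() {
    if (mDestroyed) {
      throw new IllegalStateException("LlmModule has been closed");
    }
  }

  @Override
  public void close() {
    if (!mDestroyed) {
      mDestroyed = true;
      destroyNative(); // free the native runner exactly once
    }
  }

  abstract void destroyNative();
}
```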
@psiddh psiddh changed the title from "Android: LlmModule thread safety (Closeable + ReadWriteLock) and ModelRunner crash fix" to "Android: LlmModule Closeable lifecycle and ModelRunner crash fix" on Apr 21, 2026
@psiddh
Contributor Author

psiddh commented Apr 21, 2026

Regarding concurrent operations, we don't want to allow concurrent load, generate, or prefill, right? The runtime itself is not thread safe for concurrent execution of the same method, and there is shared KV cache state.

Fair point — removed the ReentrantReadWriteLock. The native runtime isn't thread-safe for concurrent execution, so
the lock gives a false sense of safety without changing the caller's obligations. Now it's just plain mDestroyed +
checkNotDestroyed() to catch same-thread programmer errors, plus Closeable with an explicit contract: call stop(),
wait for generate() to return, then close(). Related lifecycle tests kept.

Open to suggestions.

Ensures module.destroy() runs even if forward() or etdump() throws.

This commit was authored with the help of Claude.
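
A sketch of the two ModelRunner fixes described in this PR, the early return after a loadMethod() failure and the try/finally around the benchmark body. Helper names such as reportError() and runBenchmarkLoops() are illustrative stand-ins for the surrounding benchmark code, not the exact ModelRunner API.

```java
import org.pytorch.executorch.Module;

// Sketch only: helper methods are placeholders for the benchmark code.
final class ModelRunnerSketch {
  static void benchmark(String modelPath) {
    Module module = Module.load(modelPath);
    try {
      int loadError = module.loadMethod("forward");
      if (loadError != 0) {
        // Early return: without this, the warmup/inference loops below would
        // run forward() on a method that never loaded and crash natively.
        reportError(loadError);
        return;
      }
      runBenchmarkLoops(module); // warmup + timed forward() calls, etdump, etc.
    } finally {
      // Runs on both the error and success paths, and even if forward() or
      // etdump() throws, so the native module memory is always released.
      module.destroy();
    }
  }

  private static void reportError(int code) { /* placeholder */ }
  private static void runBenchmarkLoops(Module module) { /* placeholder */ }
}
```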
Copilot AI review requested due to automatic review settings April 21, 2026 01:46
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



psiddh added 2 commits April 20, 2026 19:10
volatile is needed because stop() is callable from any thread and reads
mDestroyed — without it, a cross-thread close() write may not be visible.
Also fix ambiguous {@link #generate} to reference a specific overload.

This commit was authored with the help of Claude.
The caller contract requires single-threaded access: call stop(), wait
for generate() to return, then close(). In that sequence stop() always
runs before close(), so cross-thread visibility of mDestroyed is not
needed. Plain boolean is sufficient for same-thread programmer error
detection.

This commit was authored with the help of Claude.
Copilot AI review requested due to automatic review settings April 21, 2026 02:11
stop() does not need a Java wrapper or mDestroyed guard. The cross-thread
safety is handled entirely by the native atomic flag in the C++ runner.
Single-threaded lifecycle contract means callers coordinate stop() before
close() — same as every other method.

This commit was authored with the help of Claude.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.



@psiddh psiddh changed the title from "Android: LlmModule Closeable lifecycle and ModelRunner crash fix" to "Title: Android: LlmModule Closeable lifecycle and ModelRunner crash fix" on Apr 21, 2026
@psiddh psiddh changed the title from "Title: Android: LlmModule Closeable lifecycle and ModelRunner crash fix" back to "Android: LlmModule Closeable lifecycle and ModelRunner crash fix" on Apr 21, 2026
Member

@GregoryComer GregoryComer left a comment


Approving to unblock. Thanks for adding the close support - looks good.

Regarding locks, I think it might be worth keeping the lock, just changing to a single lock (as opposed to a RW lock). We do this in the non-LLM module, and it catches a lot of accidental unsafe concurrency.

It's very easy for users to accidentally violate thread safety when calling inference from app code. My other comment on locks was mainly that we probably want to avoid allowing the concurrency that RW lock allowed.

* @param llmCallback callback object to receive results.
*/
public void generate(String prompt, LlmCallback llmCallback) {
checkNotDestroyed();
Member


Did you intend to remove these checks? From the PR description, it sounds like the intent was to catch use after close.

Serializes access to non-thread-safe native state. Prevents accidental
concurrent calls (e.g. double-tap, rotation during generation) from
causing native crashes — second caller blocks until the first finishes.

- generate()/prefill()/load()/resetContext() use lock()
- close() uses tryLock() — fails fast, no ANR risk
- stop() stays lock-free (native atomic flag for cross-thread interrupt)

This commit was authored with the help of Claude.
@psiddh psiddh force-pushed the followup-crash-prevention branch from 3634913 to 0f67b86 Compare April 21, 2026 20:13
@psiddh
Contributor Author

psiddh commented Apr 21, 2026

Approving to unblock. Thanks for adding the close support - looks good.

Regarding locks, I think it might be worth keeping the lock, just changing to a single lock (as opposed to a RW lock). We do this in the non-LLM module, and it catches a lot of accidental unsafe concurrency.

It's very easy for users to accidentally violate thread safety when calling inference from app code. My other comment on locks was mainly that we probably want to avoid allowing the concurrency that RW lock allowed.

Thanks. The intent of adding the lock is to keep the app safe from crashes, not to enable concurrency (the underlying native runtime is not thread-safe). With the error propagation now in place, it should surface the right context and error scenario to the app, giving developers the information they need to triage and correct their flow.

Copilot AI review requested due to automatic review settings April 21, 2026 20:19
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.



If close() is called from an LlmCallback during generate() (same
thread, re-entrant lock acquisition), detect via getHoldCount() > 1
and throw instead of freeing native resources mid-call.

This commit was authored with the help of Claude.
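
A sketch of the re-entrancy guard from this commit. If close() is invoked from an LlmCallback while generate() already holds the lock on the same thread, tryLock() succeeds re-entrantly, so the hold count is checked before freeing native state. Field and method names are illustrative, not the exact LlmModule members.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: illustrates the getHoldCount() check described above.
abstract class ReentrantCloseGuardSketch implements java.io.Closeable {
  private final ReentrantLock mLock = new ReentrantLock();
  private boolean mClosed = false;

  @Override
  public void close() {
    if (!mLock.tryLock()) {
      // Another thread is mid-operation: fail fast instead of blocking.
      throw new IllegalStateException("close() while another operation is in flight");
    }
    try {
      if (mLock.getHoldCount() > 1) {
        // Same thread already holds the lock, i.e. close() was called from an
        // LlmCallback during generate(); freeing native state now would crash.
        throw new IllegalStateException("close() called from a callback during generate()");
      }
      if (!mClosed) {
        mClosed = true;
        destroyNative();
      }
    } finally {
      mLock.unlock();
    }
  }

  abstract void destroyNative();
}
```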
@psiddh psiddh merged commit 54b0148 into pytorch:main Apr 21, 2026
166 of 171 checks passed