
feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision#1099

Merged
NorbertKlockiewicz merged 8 commits into main from
@nk/llm-min-p-repetition-penalty
Apr 28, 2026

Conversation

@NorbertKlockiewicz (Contributor) commented Apr 24, 2026

Description

Adds min_p and repetition_penalty sampling parameters to GenerationConfig and plumbs them through the full stack (Sampler → TextDecoderRunner → TextTokenGenerator → BaseLLMRunner / TextRunner / MultimodalRunner → JSI bindings → LLMController). Introduces a per-model default generationConfig that is applied automatically on load, populated for Qwen3 and LFM2-VL from their upstream recommendations. Replaces the distorting cv::resize in VisionEncoder with the existing resizePadded helper so multimodal inputs keep their aspect ratio.

Also fixes three silent pre-existing bugs surfaced along the way: an xorshift PRNG seeded with 0 that made sampling deterministic, a renormalization gap in Sampler::apply_min_p, and inline {} no-op overrides in MultimodalRunner that would desync in future refactors.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Sampling parameter plumbing

  1. Open apps/llm, load any supported model (e.g. LFM2_VL_450M_QUANTIZED).
  2. Without any manual configure() call, send a prompt. The model card defaults are applied automatically — for LFM2-VL you should now see coherent, non-repetitive descriptions (previously the model often produced generic or looping replies at the library's default temperature=0.8, topp=0.9).
  3. Optionally override via useLLM(...)'s configure({ generationConfig: { temperature: 0.7, minP: 0.1, repetitionPenalty: 1.05 } }) and confirm the generation style changes.

Letterbox preprocessing

  1. With a multimodal model loaded on the multimodal_llm screen in apps/llm, attach a photo with a non-square aspect ratio (e.g. 3000×2250 from your camera roll).
  2. Ask the model to describe it. Before this PR the image was stretched into the PTE's square input shape — the model would sometimes misidentify subjects in wide/tall photos. After, the image is letterboxed so proportions are preserved.

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

Per-model recommended defaults

Model presets gain an optional generationConfig field; LLMController.load applies it before flipping isReady, so users see sensible sampling out of the box. User configure() calls still override per-field. Populated for:

  • Qwen3 family (temperature=0.6, topp=0.95, from generation_config.json)
  • LFM2-VL family (temperature=0.1, minP=0.15, repetitionPenalty=1.05, from the LiquidAI model card)

Other presets (Llama, SmolLM2, Hammer, Phi-4, Qwen2.5, LFM2 text) keep the library defaults — these model cards don't publish sampling recommendations, so adding arbitrary values would be guessing.
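
The load-time defaults plus per-field override behavior described above can be sketched in TypeScript. This is an illustrative stand-in, not the library's actual code; `mergeConfigs` and the simplified config shape are invented for the sketch (the real logic lives in LLMController):

```typescript
// Hypothetical stand-in for the preset-default + configure() merge.
interface GenerationConfig {
  temperature?: number;
  topp?: number;
  minP?: number;
  repetitionPenalty?: number;
}

function mergeConfigs(
  presetDefaults: GenerationConfig,
  userOverrides: GenerationConfig
): GenerationConfig {
  // Start from the model card's recommended defaults...
  const merged: GenerationConfig = { ...presetDefaults };
  // ...then let each user-supplied field win individually.
  for (const key of Object.keys(userOverrides) as (keyof GenerationConfig)[]) {
    if (userOverrides[key] !== undefined) {
      merged[key] = userOverrides[key];
    }
  }
  return merged;
}

// LFM2-VL defaults from this PR; the user overrides only temperature,
// so minP and repetitionPenalty keep their recommended values.
const lfm2vlDefaults: GenerationConfig = {
  temperature: 0.1,
  minP: 0.15,
  repetitionPenalty: 1.05,
};
const merged = mergeConfigs(lfm2vlDefaults, { temperature: 0.7 });
```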

@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch 6 times, most recently from 9a59396 to 3593b06 on April 24, 2026 at 14:57
@NorbertKlockiewicz NorbertKlockiewicz changed the title feat(llm): add min_p and repetition_penalty sampling + Lfm2Vl vision encoder feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision Apr 24, 2026
@NorbertKlockiewicz NorbertKlockiewicz marked this pull request as ready for review April 24, 2026 15:03
@NorbertKlockiewicz NorbertKlockiewicz self-assigned this Apr 24, 2026
@NorbertKlockiewicz NorbertKlockiewicz linked an issue Apr 24, 2026 that may be closed by this pull request
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch 2 times, most recently from 0043747 to 6a6c0bb on April 24, 2026 at 15:14
@NorbertKlockiewicz NorbertKlockiewicz added the model Issues related to exporting, improving, fixing ML models label Apr 24, 2026
@msluszniak
Member

I don't think this closes #1094, as topp and temperature are still no-ops for multimodal; it only partially solves it. Instead, please add this issue to the related issues section and drop the conditional closing.

@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch 7 times, most recently from 724db23 to df8d803 on April 27, 2026 at 09:05
Member

@msluszniak msluszniak left a comment


Also there is a wrong anchor in documentation, please fix this one as well.

Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/common/runner/encoders/vision_encoder.cpp Outdated
NorbertKlockiewicz and others added 3 commits April 27, 2026 17:26
Plumb two new sampling parameters end to end:

- GenerationConfig.min_p (default 0.0, disables) - filter tokens whose
  probability is below min_p * max_prob, post-softmax, before top-p.
- GenerationConfig.repetition_penalty (default 1.0, disables) - applies
  a multiplicative penalty to logits of recent tokens before softmax.

Sampler gains a new sample(logits, recent_tokens) overload that runs:
repetition penalty -> temperature -> softmax -> min_p truncation
(with renormalization) -> existing top-p nucleus sampling. Seeded with
std::time(nullptr) per call so the xorshift PRNG can actually advance.

TextDecoderRunner::logits_to_token gains matching parameters and
forwards them to the sampler; TextTokenGenerator::generate accumulates
a generated_tokens vector and passes it to every logits_to_token call
to power the penalty.

BaseLLMRunner exposes public set_min_p / set_repetition_penalty that
write to config_ then dispatch to virtual _impl hooks. TextRunner
forwards to its TextDecoderRunner; MultimodalRunner's previously empty
set_temperature_impl / set_topp_impl inline no-ops are replaced with
proper out-of-line defs so the base class setters actually update
config_ and generate_internal reads the new values.

LLM.h / LLM.cpp add setMinP / setRepetitionPenalty JSI bindings with
range validation; ModelHostObject registers them alongside
setTemperature / setTopp. LLMController.configure forwards minP /
repetitionPenalty from GenerationConfig to the native module.

Unit tests added:
- SamplerTest.cpp - 6 tests covering penalty direction on positive /
  negative logits, the no-op case, min-p tail filtering, minP=0
  short-circuit, and minP + topp stacking.
- RunnerTest.cpp - SetMinPUpdatesConfig, SetRepetitionPenaltyUpdatesConfig,
  SetTemperatureIsNotNoOp regression guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
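
The sampling order this commit describes (repetition penalty → temperature → softmax → min_p truncation with renormalization → top-p) can be sketched in TypeScript. This is an illustrative model of the behavior, not the C++ code in sampler.h; the function names are invented for the sketch:

```typescript
function softmax(logits: number[]): number[] {
  // Subtract the max for numerical stability before exponentiating.
  const m = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function applyRepetitionPenalty(
  logits: number[],
  recentTokens: number[],
  penalty: number
): number[] {
  if (penalty === 1.0) return logits; // 1.0 disables the penalty
  const out = logits.slice();
  const seen = new Set<number>();
  for (const t of recentTokens) {
    if (seen.has(t)) continue; // penalize each recent token at most once
    seen.add(t);
    // Common convention: divide positive logits, multiply negative ones,
    // so the penalty always lowers the token's probability.
    out[t] = out[t] > 0 ? out[t] / penalty : out[t] * penalty;
  }
  return out;
}

function applyMinP(probs: number[], minP: number): number[] {
  if (minP === 0) return probs; // 0 disables min-p truncation
  const threshold = minP * Math.max(...probs);
  const kept = probs.map((p) => (p >= threshold ? p : 0));
  // Renormalize the surviving mass (the renormalization gap this PR closes).
  const sum = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / sum);
}

// Worked example: tokens 0 and 2 appeared recently, penalty 2.0.
const penalized = applyRepetitionPenalty([2, 1, -1], [0, 2], 2.0); // [1, 1, -2]
const probs = applyMinP(softmax(penalized), 0.2); // token 2 filtered out
```

Top-p nucleus sampling then runs on the renormalized distribution, exactly where it did before this change.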
Plain cv::resize stretched wide or tall photos to the PTE's fixed
input shape, distorting aspect ratio. Scale the source image so the
longer side matches the target, centre it on a gray (127,127,127)
canvas, and feed that letterboxed tensor to the vision_encoder method.
Matches what HuggingFace preprocessors do for this model family.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
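
The geometry behind that letterboxing can be sketched as follows. This is an illustrative TypeScript model, not the C++ resizePadded helper, and the 512×512 target size is an assumed example:

```typescript
// Scale so the longer side fits the square target, then center the
// scaled image on the canvas; the real helper fills the remaining
// padding with gray (127, 127, 127) pixels.
function letterbox(srcW: number, srcH: number, target: number) {
  const scale = target / Math.max(srcW, srcH);
  const newW = Math.round(srcW * scale);
  const newH = Math.round(srcH * scale);
  return {
    newW,
    newH,
    padX: Math.floor((target - newW) / 2), // horizontal offset on the canvas
    padY: Math.floor((target - newH) / 2), // vertical offset on the canvas
  };
}

// The 3000×2250 camera photo from the testing instructions keeps its
// 4:3 ratio inside an assumed 512×512 input: 512×384 plus gray bands.
const box = letterbox(3000, 2250, 512);
```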
Model presets now carry an optional `generationConfig` field. The hook
forwards it to LLMController.load, which applies it before flipping
isReady, so users see the recommended sampling defaults without having
to call configure() manually. Subsequent configure() calls still
override on a per-field basis.

Populate defaults for models whose authors publish recommendations:
- Qwen3 family: temperature=0.6, topp=0.95 (from generation_config.json)
- LFM2-VL family: temperature=0.1, minP=0.15, repetitionPenalty=1.05
  (from the LiquidAI model card)

Also fix a latent bug in the applyGenerationConfig helper: checks for
`temperature !== undefined` / `topp !== undefined` instead of truthiness
so temperature=0 (greedy) and topp=0 now reach the native module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
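
The truthiness bug fixed here is easy to reproduce in isolation. The sketch below uses a hypothetical setter callback in place of the native module call:

```typescript
// Before: a truthy check silently drops valid zero values.
function applyBuggy(temperature: number | undefined, set: (t: number) => void) {
  if (temperature) set(temperature); // 0 is falsy, so greedy decoding is lost
}

// After: an explicit undefined check forwards 0 to the native side.
function applyFixed(temperature: number | undefined, set: (t: number) => void) {
  if (temperature !== undefined) set(temperature);
}

const received: number[] = [];
applyBuggy(0, (t) => received.push(t)); // pushes nothing
applyFixed(0, (t) => received.push(t)); // pushes 0
```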
NorbertKlockiewicz and others added 2 commits April 27, 2026 17:26
The bundled VLM weights are from the LFM2.5-VL family. Add new
LFM2_5_VL_1_6B_QUANTIZED / LFM2_5_VL_450M_QUANTIZED exports and keep
LFM2_VL_* as deprecated aliases so existing callers keep working.
Update the example app and current/v0.8 hook + module docs to use the
new names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch from 06d644a to 36bb523 on April 27, 2026 at 15:26
Comment thread docs/docs/03-hooks/01-natural-language-processing/useLLM.md Outdated
Comment thread docs/docs/04-typescript-api/01-natural-language-processing/LLMModule.md Outdated
Rename `topp` to `topP` in TS GenerationConfig to match the camelCase
of `minP` and `repetitionPenalty`. The legacy `topp` field stays as a
@deprecated alias and applyGenerationConfig reads `topP ?? topp`, so
existing callers keep working with no behaviour change. Update model
preset constants and current/v0.8 docs to use the new spelling.

Also add LLMTest cases for setMinP / setRepetitionPenalty mirroring
the existing setTemperature / setTopp pattern, covering valid ranges
and the InvalidConfig throw paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
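
The `topP ?? topp` resolution this commit describes can be illustrated with a small sketch; the config shape here is simplified, not the library's actual GenerationConfig:

```typescript
interface Cfg {
  topP?: number;
  /** @deprecated use topP instead */
  topp?: number;
}

// `??` only falls through on null/undefined, so an explicit topP of 0
// still beats a stale deprecated topp value.
function resolveTopP(cfg: Cfg): number | undefined {
  return cfg.topP ?? cfg.topp;
}

const legacy = resolveTopP({ topp: 0.9 }); // legacy callers keep working
const migrated = resolveTopP({ topP: 0.95, topp: 0.9 }); // new name wins
const explicitZero = resolveTopP({ topP: 0, topp: 0.9 }); // 0 survives
```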
@msluszniak msluszniak requested review from chmjkb and msluszniak April 27, 2026 16:27
@NorbertKlockiewicz NorbertKlockiewicz enabled auto-merge (squash) April 28, 2026 10:07
Collaborator


Just a note here, not really related to this PR:
I think we should do something about this file. Maybe each TS API should have a directory with its own model URLs, as this is getting hard to navigate, especially with the new configs.

Member


We can escalate this to a separate issue

Comment on lines +196 to +198
// `topp` is the legacy spelling kept for backwards compatibility — `topP`
// wins when both are set so callers migrating to the new name don't get
// surprised by stale values. Reading the deprecated alias is intentional.
Collaborator


If we're doing a breaking change (which I'm not sure is needed here; maybe just keep it lowercase and do the same for min/max), we should at least throw a warning if someone uses the deprecated one.

Contributor Author


The way users will see it: topp shows up in their IDE with an annotation that it is deprecated and will be removed in the next version. I think that should be enough.

@NorbertKlockiewicz NorbertKlockiewicz merged commit 9f1c89f into main Apr 28, 2026
5 of 7 checks passed
@NorbertKlockiewicz NorbertKlockiewicz deleted the @nk/llm-min-p-repetition-penalty branch April 28, 2026 13:26
@msluszniak msluszniak mentioned this pull request Apr 28, 2026
3 tasks
msluszniak added a commit that referenced this pull request Apr 28, 2026
The cherry-pick of #1099 landed sampling/VLM doc updates under
docs/docs/ (the "Next" version on main). On release/0.8 those
changes belong in docs/versioned_docs/version-0.8.x/, so revert the
unversioned files to the release/0.8 baseline and apply the same
edits to the versioned 0.8.x copies.
msluszniak added a commit that referenced this pull request Apr 29, 2026
## Summary

Patch release v0.8.4 — cherry-picks the following commits from `main`
into `release/0.8`:

- fix: graceful degradation when native libraries are unavailable
(Android) (#1067)
- fix(llm): normalize multimodal image paths to file:// URIs (#1090)
- fix(llm): auto-shape multimodal mediaPath messages in chat template
(#1089)
- feat(llm): min_p and repetition_penalty sampling, per-model defaults,
letterbox vision (#1099)

## Checklist

- [x] Commits cherry-picked from `main` in chronological order (with
`-x`)
- [x] Version bumped to `0.8.4` in
`packages/react-native-executorch/package.json`
- [x] Adapter packages (`bare-resource-fetcher`,
`expo-resource-fetcher`) untouched by cherry-picks — versions not bumped

## Docs

The unversioned doc edits that landed via the #1099 cherry-pick belong
on the "Next" version (i.e. `main`) and are already there from the
original PR. The corresponding `docs/versioned_docs/version-0.8.x/...`
updates will be done in a separate PR targeting `main` after `v0.8.4` is
published to npm — that PR will also regenerate the v0.8.x api-reference
snapshot so anchors for the new sampling fields resolve.

---------

Co-authored-by: Radek Czemerys <7029942+radko93@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Norbert Klockiewicz <Nklockiewicz12@gmail.com>
msluszniak added a commit that referenced this pull request Apr 29, 2026
Backport the sampling and multimodal-rename doc edits from #1099 into
the v0.8.x useLLM.md and LLMModule.md pages.
msluszniak added a commit that referenced this pull request Apr 29, 2026
## Description

Backports the sampling and multimodal-rename doc edits from #1099 into
the v0.8.x useLLM.md and LLMModule.md pages, plus a JSDoc fence fix on
`useInstanceSegmentation.ts`. New `minP` / `repetitionPenalty` / `topP`
field names are rendered as plain inline code rather than anchor links,
since the v0.8.x `GenerationConfig.md` snapshot doesn't have those
entries.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [ ] Android

### Testing instructions

`yarn build` in `docs/`.

### Screenshots

### Related issues

Follow-up to #1108 / `v0.8.4`.

### Checklist

- [x] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

Labels

model Issues related to exporting, improving, fixing ML models


Development

Successfully merging this pull request may close these issues.

Add parameter customisation for multi-modal LLMs

3 participants