
feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision#1099

Merged
NorbertKlockiewicz merged 8 commits into main from
@nk/llm-min-p-repetition-penalty
Apr 28, 2026

Conversation

@NorbertKlockiewicz (Contributor) commented Apr 24, 2026

Description

Adds min_p and repetition_penalty sampling parameters to GenerationConfig and plumbs them through the full stack (Sampler → TextDecoderRunner → TextTokenGenerator → BaseLLMRunner / TextRunner / MultimodalRunner → JSI bindings → LLMController). Introduces a per-model default generationConfig that is applied automatically on load, populated for Qwen3 and LFM2-VL from their upstream recommendations. Replaces the distorting cv::resize in VisionEncoder with the existing resizePadded helper so multimodal inputs keep their aspect ratio.

Also fixes three silent pre-existing bugs surfaced along the way: an xorshift PRNG seeded with 0 that made sampling deterministic, a renormalization gap in Sampler::apply_min_p, and inline {} no-op overrides in MultimodalRunner that would desync in future refactors.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Sampling parameter plumbing

  1. Open apps/llm, load any supported model (e.g. LFM2_VL_450M_QUANTIZED).
  2. Without any manual configure() call, send a prompt. The model card defaults are applied automatically — for LFM2-VL you should now see coherent, non-repetitive descriptions (previously the model often produced generic or looping replies at the library's default temperature=0.8, topp=0.9).
  3. Optionally override via useLLM(...)'s configure({ generationConfig: { temperature: 0.7, minP: 0.1, repetitionPenalty: 1.05 } }) and confirm the generation style changes.

Letterbox preprocessing

  1. With a multimodal model loaded on the multimodal_llm screen in apps/llm, attach a photo with a non-square aspect ratio (e.g. 3000×2250 from your camera roll).
  2. Ask the model to describe it. Before this PR the image was stretched into the PTE's square input shape — the model would sometimes misidentify subjects in wide/tall photos. After, the image is letterboxed so proportions are preserved.

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

Per-model recommended defaults

Model presets gain an optional generationConfig field; LLMController.load applies it before flipping isReady, so users see sensible sampling out of the box. User configure() calls still override per-field. Populated for:

  • Qwen3 family (temperature=0.6, topp=0.95, from generation_config.json)
  • LFM2-VL family (temperature=0.1, minP=0.15, repetitionPenalty=1.05, from the LiquidAI model card)

Other presets (Llama, SmolLM2, Hammer, Phi-4, Qwen2.5, LFM2 text) keep the library defaults — these model cards don't publish sampling recommendations, so adding arbitrary values would be guessing.
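
The load-time defaults plus per-field override behavior described above can be sketched in TypeScript. This is an illustrative stand-in, not the library's actual code; `mergeConfigs` and the simplified config shape are invented for the sketch (the real logic lives in LLMController):

```typescript
// Hypothetical stand-in for the preset-default + configure() merge.
interface GenerationConfig {
  temperature?: number;
  topp?: number;
  minP?: number;
  repetitionPenalty?: number;
}

function mergeConfigs(
  presetDefaults: GenerationConfig,
  userOverrides: GenerationConfig
): GenerationConfig {
  // Start from the model card's recommended defaults...
  const merged: GenerationConfig = { ...presetDefaults };
  // ...then let each user-supplied field win individually.
  for (const key of Object.keys(userOverrides) as (keyof GenerationConfig)[]) {
    if (userOverrides[key] !== undefined) {
      merged[key] = userOverrides[key];
    }
  }
  return merged;
}

// LFM2-VL defaults from this PR; the user overrides only temperature,
// so minP and repetitionPenalty keep their recommended values.
const lfm2vlDefaults: GenerationConfig = {
  temperature: 0.1,
  minP: 0.15,
  repetitionPenalty: 1.05,
};
const merged = mergeConfigs(lfm2vlDefaults, { temperature: 0.7 });
```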

@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch 6 times, most recently from 9a59396 to 3593b06 on April 24, 2026 at 14:57
@NorbertKlockiewicz NorbertKlockiewicz changed the title feat(llm): add min_p and repetition_penalty sampling + Lfm2Vl vision encoder feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision Apr 24, 2026
@NorbertKlockiewicz NorbertKlockiewicz marked this pull request as ready for review April 24, 2026 15:03
@NorbertKlockiewicz NorbertKlockiewicz self-assigned this Apr 24, 2026
@NorbertKlockiewicz NorbertKlockiewicz linked an issue Apr 24, 2026 that may be closed by this pull request
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch 2 times, most recently from 0043747 to 6a6c0bb on April 24, 2026 at 15:14
@NorbertKlockiewicz NorbertKlockiewicz added the model Issues related to exporting, improving, fixing ML models label Apr 24, 2026
@msluszniak
Member

I don't think this closes #1094, as topp and temperature are still no-ops for multimodal; it only partially solves it. Instead, please add this issue to the related issues section and drop the conditional closing.

@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch 7 times, most recently from 724db23 to df8d803 on April 27, 2026 at 09:05
Member

@msluszniak msluszniak left a comment


Also there is a wrong anchor in documentation, please fix this one as well.

Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.h Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/common/runner/encoders/vision_encoder.cpp Outdated
NorbertKlockiewicz and others added 3 commits April 27, 2026 17:26
Plumb two new sampling parameters end to end:

- GenerationConfig.min_p (default 0.0, disables) - filter tokens whose
  probability is below min_p * max_prob, post-softmax, before top-p.
- GenerationConfig.repetition_penalty (default 1.0, disables) - applies
  a multiplicative penalty to logits of recent tokens before softmax.

Sampler gains a new sample(logits, recent_tokens) overload that runs:
repetition penalty -> temperature -> softmax -> min_p truncation
(with renormalization) -> existing top-p nucleus sampling. Seeded with
std::time(nullptr) per call so the xorshift PRNG can actually advance.

TextDecoderRunner::logits_to_token gains matching parameters and
forwards them to the sampler; TextTokenGenerator::generate accumulates
a generated_tokens vector and passes it to every logits_to_token call
to power the penalty.

BaseLLMRunner exposes public set_min_p / set_repetition_penalty that
write to config_ then dispatch to virtual _impl hooks. TextRunner
forwards to its TextDecoderRunner; MultimodalRunner's previously empty
set_temperature_impl / set_topp_impl inline no-ops are replaced with
proper out-of-line defs so the base class setters actually update
config_ and generate_internal reads the new values.

LLM.h / LLM.cpp add setMinP / setRepetitionPenalty JSI bindings with
range validation; ModelHostObject registers them alongside
setTemperature / setTopp. LLMController.configure forwards minP /
repetitionPenalty from GenerationConfig to the native module.

Unit tests added:
- SamplerTest.cpp - 6 tests covering penalty direction on positive /
  negative logits, the no-op case, min-p tail filtering, minP=0
  short-circuit, and minP + topp stacking.
- RunnerTest.cpp - SetMinPUpdatesConfig, SetRepetitionPenaltyUpdatesConfig,
  SetTemperatureIsNotNoOp regression guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
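
The sampling order this commit describes (repetition penalty → temperature → softmax → min_p truncation with renormalization → top-p) can be sketched in TypeScript. This is an illustrative model of the behavior, not the C++ code in sampler.h; the function names are invented for the sketch:

```typescript
function softmax(logits: number[]): number[] {
  // Subtract the max for numerical stability before exponentiating.
  const m = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function applyRepetitionPenalty(
  logits: number[],
  recentTokens: number[],
  penalty: number
): number[] {
  if (penalty === 1.0) return logits; // 1.0 disables the penalty
  const out = logits.slice();
  const seen = new Set<number>();
  for (const t of recentTokens) {
    if (seen.has(t)) continue; // penalize each recent token at most once
    seen.add(t);
    // Common convention: divide positive logits, multiply negative ones,
    // so the penalty always lowers the token's probability.
    out[t] = out[t] > 0 ? out[t] / penalty : out[t] * penalty;
  }
  return out;
}

function applyMinP(probs: number[], minP: number): number[] {
  if (minP === 0) return probs; // 0 disables min-p truncation
  const threshold = minP * Math.max(...probs);
  const kept = probs.map((p) => (p >= threshold ? p : 0));
  // Renormalize the surviving mass (the renormalization gap this PR closes).
  const sum = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / sum);
}

// Worked example: tokens 0 and 2 appeared recently, penalty 2.0.
const penalized = applyRepetitionPenalty([2, 1, -1], [0, 2], 2.0); // [1, 1, -2]
const probs = applyMinP(softmax(penalized), 0.2); // token 2 filtered out
```

Top-p nucleus sampling then runs on the renormalized distribution, exactly where it did before this change.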
Plain cv::resize stretched wide or tall photos to the PTE's fixed
input shape, distorting aspect ratio. Scale the source image so the
longer side matches the target, centre it on a gray (127,127,127)
canvas, and feed that letterboxed tensor to the vision_encoder method.
Matches what HuggingFace preprocessors do for this model family.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
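
The geometry behind that letterboxing can be sketched as follows. This is an illustrative TypeScript model, not the C++ resizePadded helper, and the 512×512 target size is an assumed example:

```typescript
// Scale so the longer side fits the square target, then center the
// scaled image on the canvas; the real helper fills the remaining
// padding with gray (127, 127, 127) pixels.
function letterbox(srcW: number, srcH: number, target: number) {
  const scale = target / Math.max(srcW, srcH);
  const newW = Math.round(srcW * scale);
  const newH = Math.round(srcH * scale);
  return {
    newW,
    newH,
    padX: Math.floor((target - newW) / 2), // horizontal offset on the canvas
    padY: Math.floor((target - newH) / 2), // vertical offset on the canvas
  };
}

// The 3000×2250 camera photo from the testing instructions keeps its
// 4:3 ratio inside an assumed 512×512 input: 512×384 plus gray bands.
const box = letterbox(3000, 2250, 512);
```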
Model presets now carry an optional `generationConfig` field. The hook
forwards it to LLMController.load, which applies it before flipping
isReady, so users see the recommended sampling defaults without having
to call configure() manually. Subsequent configure() calls still
override on a per-field basis.

Populate defaults for models whose authors publish recommendations:
- Qwen3 family: temperature=0.6, topp=0.95 (from generation_config.json)
- LFM2-VL family: temperature=0.1, minP=0.15, repetitionPenalty=1.05
  (from the LiquidAI model card)

Also fix a latent bug in the applyGenerationConfig helper: checks for
`temperature !== undefined` / `topp !== undefined` instead of truthiness
so temperature=0 (greedy) and topp=0 now reach the native module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
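
The truthiness bug fixed here is easy to reproduce in isolation. The sketch below uses a hypothetical setter callback in place of the native module call:

```typescript
// Before: a truthy check silently drops valid zero values.
function applyBuggy(temperature: number | undefined, set: (t: number) => void) {
  if (temperature) set(temperature); // 0 is falsy, so greedy decoding is lost
}

// After: an explicit undefined check forwards 0 to the native side.
function applyFixed(temperature: number | undefined, set: (t: number) => void) {
  if (temperature !== undefined) set(temperature);
}

const received: number[] = [];
applyBuggy(0, (t) => received.push(t)); // pushes nothing
applyFixed(0, (t) => received.push(t)); // pushes 0
```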
NorbertKlockiewicz and others added 2 commits April 27, 2026 17:26
The bundled VLM weights are from the LFM2.5-VL family. Add new
LFM2_5_VL_1_6B_QUANTIZED / LFM2_5_VL_450M_QUANTIZED exports and keep
LFM2_VL_* as deprecated aliases so existing callers keep working.
Update the example app and current/v0.8 hook + module docs to use the
new names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/llm-min-p-repetition-penalty branch from 06d644a to 36bb523 on April 27, 2026 at 15:26
Comment thread docs/docs/03-hooks/01-natural-language-processing/useLLM.md Outdated
Comment thread docs/docs/04-typescript-api/01-natural-language-processing/LLMModule.md Outdated
Rename `topp` to `topP` in TS GenerationConfig to match the camelCase
of `minP` and `repetitionPenalty`. The legacy `topp` field stays as a
@deprecated alias and applyGenerationConfig reads `topP ?? topp`, so
existing callers keep working with no behaviour change. Update model
preset constants and current/v0.8 docs to use the new spelling.

Also add LLMTest cases for setMinP / setRepetitionPenalty mirroring
the existing setTemperature / setTopp pattern, covering valid ranges
and the InvalidConfig throw paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
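
The `topP ?? topp` resolution this commit describes can be illustrated with a small sketch; the config shape here is simplified, not the library's actual GenerationConfig:

```typescript
interface Cfg {
  topP?: number;
  /** @deprecated use topP instead */
  topp?: number;
}

// `??` only falls through on null/undefined, so an explicit topP of 0
// still beats a stale deprecated topp value.
function resolveTopP(cfg: Cfg): number | undefined {
  return cfg.topP ?? cfg.topp;
}

const legacy = resolveTopP({ topp: 0.9 }); // legacy callers keep working
const migrated = resolveTopP({ topP: 0.95, topp: 0.9 }); // new name wins
const explicitZero = resolveTopP({ topP: 0, topp: 0.9 }); // 0 survives
```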
@msluszniak msluszniak requested review from chmjkb and msluszniak April 27, 2026 16:27
@NorbertKlockiewicz NorbertKlockiewicz enabled auto-merge (squash) April 28, 2026 10:07
Collaborator


Just a note here, not really related to this PR:
I think we should do something about this file. Maybe each TS API should have a directory with its own model URLs, as this is getting hard to navigate, especially with the new configs.

Member


We can escalate this to a separate issue

Comment on lines +196 to +198
// `topp` is the legacy spelling kept for backwards compatibility — `topP`
// wins when both are set so callers migrating to the new name don't get
// surprised by stale values. Reading the deprecated alias is intentional.
Collaborator


If we're doing a breaking change (which I'm not sure is needed here; maybe just keep it lowercase and do the same for min/max), we should at least throw a warning if someone uses the deprecated one.

Contributor Author


The way users will see it: topp shows up in their IDE with an annotation that it is deprecated and will be removed in the next version. I think that should be enough.

@NorbertKlockiewicz NorbertKlockiewicz merged commit 9f1c89f into main Apr 28, 2026
5 of 7 checks passed
@NorbertKlockiewicz NorbertKlockiewicz deleted the @nk/llm-min-p-repetition-penalty branch April 28, 2026 13:26
@msluszniak msluszniak mentioned this pull request Apr 28, 2026
3 tasks
msluszniak added a commit that referenced this pull request Apr 28, 2026
The cherry-pick of #1099 landed sampling/VLM doc updates under
docs/docs/ (the "Next" version on main). On release/0.8 those
changes belong in docs/versioned_docs/version-0.8.x/, so revert the
unversioned files to the release/0.8 baseline and apply the same
edits to the versioned 0.8.x copies.
msluszniak added a commit that referenced this pull request Apr 29, 2026
## Summary

Patch release v0.8.4 — cherry-picks the following commits from `main`
into `release/0.8`:

- fix: graceful degradation when native libraries are unavailable
(Android) (#1067)
- fix(llm): normalize multimodal image paths to file:// URIs (#1090)
- fix(llm): auto-shape multimodal mediaPath messages in chat template
(#1089)
- feat(llm): min_p and repetition_penalty sampling, per-model defaults,
letterbox vision (#1099)

## Checklist

- [x] Commits cherry-picked from `main` in chronological order (with
`-x`)
- [x] Version bumped to `0.8.4` in
`packages/react-native-executorch/package.json`
- [x] Adapter packages (`bare-resource-fetcher`,
`expo-resource-fetcher`) untouched by cherry-picks — versions not bumped

## Docs

The unversioned doc edits that landed via the #1099 cherry-pick belong
on the "Next" version (i.e. `main`) and are already there from the
original PR. The corresponding `docs/versioned_docs/version-0.8.x/...`
updates will be done in a separate PR targeting `main` after `v0.8.4` is
published to npm — that PR will also regenerate the v0.8.x api-reference
snapshot so anchors for the new sampling fields resolve.

---------

Co-authored-by: Radek Czemerys <7029942+radko93@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Norbert Klockiewicz <Nklockiewicz12@gmail.com>
msluszniak added a commit that referenced this pull request Apr 29, 2026
Backport the sampling and multimodal-rename doc edits from #1099 into
the v0.8.x useLLM.md and LLMModule.md pages.
msluszniak added a commit that referenced this pull request Apr 29, 2026
## Description

Backports the sampling and multimodal-rename doc edits from #1099 into
the v0.8.x useLLM.md and LLMModule.md pages, plus a JSDoc fence fix on
`useInstanceSegmentation.ts`. New `minP` / `repetitionPenalty` / `topP`
field names are rendered as plain inline code rather than anchor links,
since the v0.8.x `GenerationConfig.md` snapshot doesn't have those
entries.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [ ] Android

### Testing instructions

`yarn build` in `docs/`.

### Screenshots

### Related issues

Follow-up to #1108 / `v0.8.4`.

### Checklist

- [x] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

Labels

model Issues related to exporting, improving, fixing ML models


Development

Successfully merging this pull request may close these issues.

Add parameter customisation for multi-modal LLMs

3 participants