feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision (#1099)
Conversation
Force-pushed from 9a59396 to 3593b06
Force-pushed from 0043747 to 6a6c0bb
I don't think this closes #1094 as the …
Force-pushed from 724db23 to df8d803
msluszniak left a comment:
Also, there is a wrong anchor in the documentation; please fix that one as well.
Force-pushed from e3e8be9 to 9ce3252
Force-pushed from 9ce3252 to 06d644a
Plumb two new sampling parameters end to end:

- GenerationConfig.min_p (default 0.0, disables) - filter tokens whose probability is below min_p * max_prob, post-softmax, before top-p.
- GenerationConfig.repetition_penalty (default 1.0, disables) - applies a multiplicative penalty to logits of recent tokens before softmax.

Sampler gains a new sample(logits, recent_tokens) overload that runs: repetition penalty -> temperature -> softmax -> min_p truncation (with renormalization) -> existing top-p nucleus sampling. Seeded with std::time(nullptr) per call so the xorshift PRNG can actually advance.

TextDecoderRunner::logits_to_token gains matching parameters and forwards them to the sampler; TextTokenGenerator::generate accumulates a generated_tokens vector and passes it to every logits_to_token call to power the penalty.

BaseLLMRunner exposes public set_min_p / set_repetition_penalty that write to config_ then dispatch to virtual _impl hooks. TextRunner forwards to its TextDecoderRunner; MultimodalRunner's previously empty set_temperature_impl / set_topp_impl inline no-ops are replaced with proper out-of-line defs so the base class setters actually update config_ and generate_internal reads the new values.

LLM.h / LLM.cpp add setMinP / setRepetitionPenalty JSI bindings with range validation; ModelHostObject registers them alongside setTemperature / setTopp. LLMController.configure forwards minP / repetitionPenalty from GenerationConfig to the native module.

Unit tests added:

- SamplerTest.cpp - 6 tests covering penalty direction on positive / negative logits, the no-op case, min-p tail filtering, minP=0 short-circuit, and minP + topp stacking.
- RunnerTest.cpp - SetMinPUpdatesConfig, SetRepetitionPenaltyUpdatesConfig, SetTemperatureIsNotNoOp regression guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
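For readers who want the order of operations concretely, here is a minimal TypeScript sketch of the pipeline this commit describes (repetition penalty -> temperature -> softmax -> min_p -> top-p). The shipped implementation is the C++ Sampler; the names below and the divide-positive/multiply-negative penalty rule (the common convention) are illustrative assumptions, not the library's API.

```ts
// Illustrative TypeScript port of the sampling order described above.
interface SamplingConfig {
  temperature: number;       // > 0; lower = greedier
  topP: number;              // nucleus mass, (0, 1]
  minP: number;              // 0 disables
  repetitionPenalty: number; // 1 disables
}

function sampleToken(logits: number[], recentTokens: number[], cfg: SamplingConfig): number {
  const scores = logits.slice();

  // 1. Repetition penalty: push down logits of recently generated tokens.
  if (cfg.repetitionPenalty !== 1) {
    for (const t of new Set(recentTokens)) {
      scores[t] = scores[t] > 0 ? scores[t] / cfg.repetitionPenalty
                                : scores[t] * cfg.repetitionPenalty;
    }
  }

  // 2. Temperature scaling, then a numerically stable softmax.
  const scaled = scores.map((s) => s / cfg.temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  let probs = exps.map((e) => e / total);

  // 3. min_p: drop tokens below minP * maxProb, then renormalize
  //    (this renormalization is the gap the PR fixes in apply_min_p).
  if (cfg.minP > 0) {
    const threshold = cfg.minP * Math.max(...probs);
    probs = probs.map((p) => (p < threshold ? 0 : p));
    const kept = probs.reduce((a, b) => a + b, 0);
    probs = probs.map((p) => p / kept);
  }

  // 4. Top-p nucleus sampling over whatever survived min_p.
  const order = probs.map((p, i) => ({ p, i })).sort((a, b) => b.p - a.p);
  let mass = 0;
  const nucleus: { p: number; i: number }[] = [];
  for (const e of order) {
    nucleus.push(e);
    mass += e.p;
    if (mass >= cfg.topP) break;
  }
  let r = Math.random() * mass;
  for (const e of nucleus) {
    r -= e.p;
    if (r <= 0) return e.i;
  }
  return nucleus[nucleus.length - 1].i;
}
```

Note how min_p uses a threshold relative to the current maximum probability, which adapts to how peaked the distribution is, unlike a fixed top-k cutoff.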
Plain cv::resize stretched wide or tall photos to the PTE's fixed input shape, distorting the aspect ratio. Scale the source image so the longer side matches the target, centre it on a gray (127,127,127) canvas, and feed that letterboxed tensor to the vision_encoder method. This matches what HuggingFace preprocessors do for this model family.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
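A sketch of the letterbox geometry only, assuming a square 512×512 target purely for the example (the real value comes from the PTE input shape, and the shipped code does the resize and padding in C++ with OpenCV):

```ts
// Letterbox placement: scale so the longer side hits the target, centre the
// result on the canvas, and fill the rest with gray (127,127,127).
interface LetterboxPlacement {
  scaledW: number;
  scaledH: number;
  offsetX: number; // left edge of the image on the canvas
  offsetY: number; // top edge of the image on the canvas
}

function letterbox(srcW: number, srcH: number, target: number): LetterboxPlacement {
  const scale = target / Math.max(srcW, srcH); // longer side -> target
  const scaledW = Math.round(srcW * scale);
  const scaledH = Math.round(srcH * scale);
  return {
    scaledW,
    scaledH,
    offsetX: Math.floor((target - scaledW) / 2),
    offsetY: Math.floor((target - scaledH) / 2),
  };
}

// A 3000x2250 camera photo into a 512x512 input:
// scale is about 0.171 -> 512x384 placed at (0, 64); rows 0-63 and 448-511 stay gray.
console.log(letterbox(3000, 2250, 512)); // { scaledW: 512, scaledH: 384, offsetX: 0, offsetY: 64 }
```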
Model presets now carry an optional `generationConfig` field. The hook forwards it to LLMController.load, which applies it before flipping isReady, so users see the recommended sampling defaults without having to call configure() manually. Subsequent configure() calls still override on a per-field basis.

Populate defaults for models whose authors publish recommendations:

- Qwen3 family: temperature=0.6, topp=0.95 (from generation_config.json)
- LFM2-VL family: temperature=0.1, minP=0.15, repetitionPenalty=1.05 (from the LiquidAI model card)

Also fix a latent bug in the applyGenerationConfig helper: check for `temperature !== undefined` / `topp !== undefined` instead of truthiness, so temperature=0 (greedy) and topp=0 now reach the native module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
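A minimal sketch of that truthiness fix. The helper applyGenerationConfig is real, but this shape of it and the NativeLLM interface are assumptions for illustration:

```ts
interface GenerationConfig {
  temperature?: number;
  topp?: number;
}

interface NativeLLM {
  setTemperature(value: number): void;
  setTopp(value: number): void;
}

function applyGenerationConfig(native: NativeLLM, config: GenerationConfig): void {
  // Check `!== undefined` rather than truthiness: temperature=0 (greedy)
  // and topp=0 are valid, falsy values that must still reach native code.
  if (config.temperature !== undefined) native.setTemperature(config.temperature);
  if (config.topp !== undefined) native.setTopp(config.topp);
}
```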
The bundled VLM weights are from the LFM2.5-VL family. Add new LFM2_5_VL_1_6B_QUANTIZED / LFM2_5_VL_450M_QUANTIZED exports and keep LFM2_VL_* as deprecated aliases so existing callers keep working. Update the example app and current/v0.8 hook + module docs to use the new names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 06d644a to 36bb523
Rename `topp` to `topP` in TS GenerationConfig to match the camelCase of `minP` and `repetitionPenalty`. The legacy `topp` field stays as a @deprecated alias and applyGenerationConfig reads `topP ?? topp`, so existing callers keep working with no behaviour change. Update model preset constants and current/v0.8 docs to use the new spelling.

Also add LLMTest cases for setMinP / setRepetitionPenalty mirroring the existing setTemperature / setTopp pattern, covering valid ranges and the InvalidConfig throw paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
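The alias pattern, sketched; the field names match the PR, while the surrounding type and helper are simplified assumptions:

```ts
interface GenerationConfig {
  topP?: number;
  /** @deprecated Use `topP`; kept so existing callers keep working. */
  topp?: number;
}

// `topP` wins when both spellings are set; the deprecated alias is only a
// fallback, so callers migrating to the new name never see stale values.
function effectiveTopP(config: GenerationConfig, defaultTopP: number): number {
  return config.topP ?? config.topp ?? defaultTopP;
}
```

The `@deprecated` JSDoc tag is also what produces the strikethrough annotation in IDEs that the review discussion below relies on.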
Just a note here, not really related to this PR: I think we should do something about this file. Maybe each TS API should have a directory with its own model URLs, as this is getting hard to navigate, especially with the new configs.
We can escalate this to a separate issue
// `topp` is the legacy spelling kept for backwards compatibility — `topP`
// wins when both are set so callers migrating to the new name don't get
// surprised by stale values. Reading the deprecated alias is intentional.
If we're doing a breaking change (which I'm not sure is needed here; maybe just keep it lowercase and do it for min/max), we should at least throw a warning if someone uses a deprecated one.
Users will see it like this: `topp` in their IDE with an annotation that it is deprecated and will be removed in the next version. I think that should be enough.
The cherry-pick of #1099 landed sampling/VLM doc updates under docs/docs/ (the "Next" version on main). On release/0.8 those changes belong in docs/versioned_docs/version-0.8.x/, so revert the unversioned files to the release/0.8 baseline and apply the same edits to the versioned 0.8.x copies.
## Summary

Patch release v0.8.4 — cherry-picks the following commits from `main` into `release/0.8`:

- fix: graceful degradation when native libraries are unavailable (Android) (#1067)
- fix(llm): normalize multimodal image paths to file:// URIs (#1090)
- fix(llm): auto-shape multimodal mediaPath messages in chat template (#1089)
- feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision (#1099)

## Checklist

- [x] Commits cherry-picked from `main` in chronological order (with `-x`)
- [x] Version bumped to `0.8.4` in `packages/react-native-executorch/package.json`
- [x] Adapter packages (`bare-resource-fetcher`, `expo-resource-fetcher`) untouched by cherry-picks — versions not bumped

## Docs

The unversioned doc edits that landed via the #1099 cherry-pick belong on the "Next" version (i.e. `main`) and are already there from the original PR. The corresponding `docs/versioned_docs/version-0.8.x/...` updates will be done in a separate PR targeting `main` after `v0.8.4` is published to npm — that PR will also regenerate the v0.8.x api-reference snapshot so anchors for the new sampling fields resolve.

---

Co-authored-by: Radek Czemerys <7029942+radko93@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Norbert Klockiewicz <Nklockiewicz12@gmail.com>
Backport the sampling and multimodal-rename doc edits from #1099 into the v0.8.x useLLM.md and LLMModule.md pages.
## Description

Backports the sampling and multimodal-rename doc edits from #1099 into the v0.8.x useLLM.md and LLMModule.md pages, plus a JSDoc fence fix on `useInstanceSegmentation.ts`. New `minP` / `repetitionPenalty` / `topP` field names are rendered as plain inline code rather than anchor links, since the v0.8.x `GenerationConfig.md` snapshot doesn't have those entries.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [ ] Android

### Testing instructions

`yarn build` in `docs/`.

### Screenshots

### Related issues

Follow-up to #1108 / `v0.8.4`.

### Checklist

- [x] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes
## Description

Adds `min_p` and `repetition_penalty` sampling parameters to `GenerationConfig`, plumbs them through the full stack (`Sampler` → `TextDecoderRunner` → `TextTokenGenerator` → `BaseLLMRunner`/`TextRunner`/`MultimodalRunner` → JSI bindings → `LLMController`), introduces a per-model default `generationConfig` that gets applied automatically on load (populated for Qwen3 and LFM2-VL from their upstream recommendations), and replaces the distorting `cv::resize` in `VisionEncoder` with the existing `resizePadded` helper so multimodal inputs keep their aspect ratio.

Also fixes three silent pre-existing bugs surfaced along the way: an xorshift PRNG seeded with `0` that made sampling deterministic, a `Sampler::apply_min_p` renormalization gap, and inline `{}` no-op overrides in `MultimodalRunner` that would desync in future refactors.
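On the PRNG bug: xorshift generators have an all-zero fixed point, so a zero seed freezes the stream entirely. A tiny TypeScript illustration (xorshift32 shown for simplicity; the sampler's exact variant may differ):

```ts
// Every step XORs shifted copies of the state, and 0 ^ 0 = 0, so an
// all-zero state maps to itself forever: the "random" stream never advances.
function xorshift32(state: number): number {
  let x = state >>> 0;
  x = (x ^ (x << 13)) >>> 0;
  x = (x ^ (x >>> 17)) >>> 0;
  x = (x ^ (x << 5)) >>> 0;
  return x;
}

console.log(xorshift32(0));     // 0: stuck forever
console.log(xorshift32(12345)); // nonzero: the state evolves
```

This is why the commit reseeds from `std::time(nullptr)`, which is nonzero in practice.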
### Introduces a breaking change?

### Type of change

### Tested on

### Testing instructions

**Sampling parameter plumbing**
1. Run `apps/llm` and load any supported model (e.g. `LFM2_VL_450M_QUANTIZED`).
2. Without any `configure()` call, send a prompt. The model card defaults are applied automatically — for LFM2-VL you should now see coherent, non-repetitive descriptions (previously the model often produced generic or looping replies at the library's default `temperature=0.8, topp=0.9`).
3. Override via `useLLM(...)`'s `configure({ generationConfig: { temperature: 0.7, minP: 0.1, repetitionPenalty: 1.05 } })` and confirm the generation style changes.

**Letterbox preprocessing**
In the `apps/llm` → `multimodal_llm` screen, attach a photo with a non-square aspect ratio (e.g. 3000×2250 from your camera roll).

### Screenshots
### Related issues

### Checklist

### Additional notes

#### Per-model recommended defaults
Model presets gain an optional `generationConfig` field; `LLMController.load` applies it before flipping `isReady`, so users see sensible sampling out of the box. User `configure()` calls still override per-field. Populated for:

- Qwen3 family (`temperature=0.6, topp=0.95`, from `generation_config.json`)
- LFM2-VL family (`temperature=0.1, minP=0.15, repetitionPenalty=1.05`, from the LiquidAI model card)

Other presets (Llama, SmolLM2, Hammer, Phi-4, Qwen2.5, LFM2 text) keep the library defaults — these model cards don't publish sampling recommendations, so adding arbitrary values would be guessing.
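To make the preset mechanism concrete, a sketch of what a preset carrying recommended defaults could look like. The constant names, the modelSource field, and the URLs are placeholders; only the generationConfig values are taken from the PR:

```ts
// Hypothetical preset shape; only generationConfig values come from the PR.
const LFM2_VL_450M_QUANTIZED = {
  modelSource: 'https://example.com/lfm2-vl-450m-quantized.pte', // placeholder
  generationConfig: {
    temperature: 0.1,
    minP: 0.15,
    repetitionPenalty: 1.05,
  },
};

const QWEN3_0_6B_QUANTIZED = {
  modelSource: 'https://example.com/qwen3-0.6b-quantized.pte', // placeholder
  generationConfig: { temperature: 0.6, topP: 0.95 },
};
```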