fix(hf): filter model_options for apply_chat_template via Jinja AST allowlist by planetf1 · Pull Request #1168 · generative-computing/mellea

planetf1 · 2026-05-27T13:29:23Z

The problem

Generation options such as temperature, max_new_tokens, or do_sample passed to LocalHFBackend were being forwarded into HuggingFace's apply_chat_template. That method accepts arbitrary **kwargs and passes them straight into the Jinja template's variable namespace.

Standard HF templates never reference these names, so the extra variables are silently ignored and nothing breaks for today's models. But this is a latent correctness hazard: any model whose Jinja template happens to reference a variable named temperature, num_beams, or any other GenerationConfig parameter would silently receive the caller's generation setting instead of the template author's intended value. It also makes the calling contract of model_options opaque: there is no principled boundary between what goes to apply_chat_template and what goes to generate().
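
The hazard can be reproduced with plain Jinja2, independent of transformers. The sketch below is illustrative only: the template string and the `render` helper are hypothetical, and it assumes nothing beyond the fact that keyword arguments passed at render time become template variables.

```python
import jinja2

# Hypothetical template whose author reads a variable named `temperature`.
TEMPLATE = "hint={{ temperature | default('unset') }}"


def render(template_source, **kwargs):
    """Render a template the way a chat template is rendered:
    every keyword argument becomes a Jinja variable."""
    return jinja2.Environment().from_string(template_source).render(**kwargs)


# Without the generation option, the template author's default applies.
without_option = render(TEMPLATE, messages=[])

# With the generation option forwarded, the caller's value leaks in.
with_option = render(TEMPLATE, messages=[], temperature=0.7)

print(without_option)  # hint=unset
print(with_option)     # hint=0.7
```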

How other backends handle this

For comparison, the OpenAI backend uses inspect.signature(Completions.create) to build a self-maintaining filter: it forwards only kwargs that appear in the API client's call signature. The HF backend's equivalent interface is the Jinja template itself — template variables are the "declared parameters" for apply_chat_template.
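
As a rough illustration of the signature-based approach, the sketch below filters an options dict against a function's declared parameters. The `create` function is a hypothetical stand-in for the client method, and `filter_by_signature` is an illustrative name, not the actual helper in the OpenAI backend.

```python
import inspect


def create(*, model, messages, temperature=None, max_tokens=None):
    """Hypothetical stand-in for a client method such as Completions.create."""
    return {"model": model, "messages": messages,
            "temperature": temperature, "max_tokens": max_tokens}


def filter_by_signature(func, options):
    """Keep only the options that appear as parameters of `func`."""
    accepted = set(inspect.signature(func).parameters)
    return {k: v for k, v in options.items() if k in accepted}


options = {"temperature": 0.2, "max_tokens": 32, "think": True}
kept = filter_by_signature(create, options)
print(kept)  # {'temperature': 0.2, 'max_tokens': 32}
```

The filter needs no maintenance when the signature gains or loses parameters, which is the property the Jinja-based allowlist aims to reproduce for templates.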

The fix

Rather than maintaining a hand-written denylist of GenerationConfig parameter names (which can never be complete — transformers has 60+ generation parameters), this PR uses the Jinja2 template as the source of truth.

At both apply_chat_template call sites, a new _filter_for_chat_template method is applied. It:

1. Parses the tokenizer's Jinja2 template with jinja2.meta.find_undeclared_variables to get the exact set of variables the template references.
2. Subtracts _HF_INTERNAL_TEMPLATE_VARS, the 5 variables HuggingFace injects automatically (see below).
3. Forwards only the keys that are in the resulting allowlist.

Generation-only options like temperature are dropped not because they appear on any list, but because no standard HF chat template ever references them as Jinja variables. The allowlist is computed once per backend instance (via functools.cached_property) and adapts automatically as models evolve.
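
A minimal sketch of the allowlist computation, assuming only the public jinja2 API. The function names, the `HF_INTERNAL` constant, and the template are illustrative and are not copied from mellea/backends/huggingface.py; the robustness handling described later is omitted here.

```python
import jinja2
import jinja2.meta

# Illustrative copy of the five names listed in this PR description.
HF_INTERNAL = frozenset(
    {"messages", "tools", "add_generation_prompt", "raise_exception", "strftime_now"}
)


def template_allowlist(chat_template):
    """Return the variables the template references, minus HF-injected ones."""
    ast = jinja2.Environment().parse(chat_template)
    return frozenset(jinja2.meta.find_undeclared_variables(ast)) - HF_INTERNAL


def filter_for_chat_template(options, allowlist):
    """Forward only the options that the template can actually consume."""
    return {k: v for k, v in options.items() if k in allowlist}


# Hypothetical chat template that reads one extra variable, `think`.
TEMPLATE = (
    "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    "{% if think %}<think>{% endif %}"
    "{% if add_generation_prompt %}assistant:{% endif %}"
)

allowlist = template_allowlist(TEMPLATE)
filtered = filter_for_chat_template(
    {"temperature": 0.7, "think": True, "max_new_tokens": 64}, allowlist
)
print(sorted(allowlist))  # ['think']
print(filtered)           # {'think': True}
```

The loop variable `m` is declared by the template itself, so it does not appear in the result; only names that must come from the caller do.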

The exclusion set — please review carefully

_HF_INTERNAL_TEMPLATE_VARS is the small set of variables that HuggingFace injects into the template namespace unconditionally. These must be excluded from the allowlist to avoid "got multiple values for keyword argument" TypeErrors or silent replacement of injected callables.

| Variable | Source | Why excluded |
| --- | --- | --- |
| `messages` | `render_jinja_template` named param | always injected as the first positional arg |
| `tools` | `render_jinja_template` named param | our call sites pass `tools=convert_tools_to_json(...)` explicitly |
| `add_generation_prompt` | `render_jinja_template` named param | passed as `add_generation_prompt=True` at the standard-generation call site |
| `raise_exception` | `_compile_jinja_template` Jinja env global | `find_undeclared_variables` cannot see env globals; treats this as undeclared |
| `strftime_now` | `_compile_jinja_template` Jinja env global | same as above |

Relevant transformers source locations (verified against installed version):

- Named params: PreTrainedTokenizerBase.apply_chat_template, at the render_jinja_template(...) call
- Jinja globals: transformers.utils.chat_template_utils._compile_jinja_template, at the jinja_env.globals["raise_exception"] and jinja_env.globals["strftime_now"] assignments

⚠️ If transformers is upgraded, verify that this set is still accurate. The test test_hf_internal_template_vars_contents fails immediately if the constant changes without a corresponding test update — making the exclusion set an explicit, auditable contract.
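
To see why the two helper callables need excluding, the sketch below parses a template with a plain jinja2.Environment, which knows nothing about helpers that transformers registers on its own environment. The template is hypothetical, and the final assertion only mirrors the idea of the exact-set test; it is not the test's actual code.

```python
import jinja2
import jinja2.meta

# Illustrative copy of the five names from the table above.
HF_INTERNAL = frozenset(
    {"messages", "tools", "add_generation_prompt", "raise_exception", "strftime_now"}
)

# Hypothetical template that calls both helper callables.
TEMPLATE = (
    "{% if not messages %}{{ raise_exception('no messages') }}{% endif %}"
    "Today is {{ strftime_now('%Y-%m-%d') }}"
)

# A plain environment has no such helpers, so they look like caller-supplied variables.
ast = jinja2.Environment().parse(TEMPLATE)
undeclared = jinja2.meta.find_undeclared_variables(ast)
print(sorted(undeclared))  # ['messages', 'raise_exception', 'strftime_now']

# After subtracting the exclusion set, nothing is left for the caller to override.
remaining = frozenset(undeclared) - HF_INTERNAL
print(remaining)  # frozenset()

# Exact-set check in the spirit of test_hf_internal_template_vars_contents:
# any change to the constant must be accompanied by a change to this literal.
assert HF_INTERNAL == {
    "messages", "tools", "add_generation_prompt", "raise_exception", "strftime_now"
}
```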

Behaviour changes reviewers should be aware of

- add_generation_prompt on the KV-cache path: because add_generation_prompt is in _HF_INTERNAL_TEMPLATE_VARS, any user-supplied value is now dropped rather than forwarded. This is correct: the KV-cache path formats context (not a generation turn), so HF's default of False is the right behaviour. A comment at the call site documents this explicitly.
- Graceful degradation for unsupported template extensions: for templates that a plain jinja2.Environment cannot parse, such as those using {% generation %} / {% endgeneration %} (DeepSeek-R1, Qwen3, transformers ≥ 4.47) or unusual Jinja extensions, the allowlist is frozenset(), so the filter forwards nothing to apply_chat_template rather than crashing. {% break %} / {% continue %} are handled correctly via jinja2.ext.loopcontrols.
- Non-string chat_template: some tokenizers store chat_template as a list of named template dicts. The filter now guards against this with an isinstance(..., str) check and returns frozenset() for non-string values, leaving apply_chat_template to handle that format itself.
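
The three robustness cases above can be sketched as follows. This is an approximation under stated assumptions: `safe_allowlist` and `HF_INTERNAL` are illustrative names, and the templates are made up for the example rather than taken from any real model.

```python
import jinja2
import jinja2.meta

HF_INTERNAL = frozenset(
    {"messages", "tools", "add_generation_prompt", "raise_exception", "strftime_now"}
)


def safe_allowlist(chat_template):
    """Compute the allowlist, falling back to an empty set when parsing is impossible."""
    # Non-string templates (None, list of named templates, dict) are not parsed.
    if not isinstance(chat_template, str):
        return frozenset()
    # loopcontrols adds support for {% break %} and {% continue %}.
    env = jinja2.Environment(extensions=["jinja2.ext.loopcontrols"])
    try:
        ast = env.parse(chat_template)
    except jinja2.TemplateSyntaxError:
        # Unknown tags such as {% generation %} end up here.
        return frozenset()
    return frozenset(jinja2.meta.find_undeclared_variables(ast)) - HF_INTERNAL


with_generation_tag = safe_allowlist("{% generation %}{{ think }}{% endgeneration %}")
with_continue = safe_allowlist(
    "{% for m in messages %}{% if m.role == 'system' %}{% continue %}{% endif %}"
    "{{ m.content }}{% endfor %}{% if think %}<think>{% endif %}"
)
non_string = safe_allowlist([{"name": "default", "template": "{{ think }}"}])

print(with_generation_tag)    # frozenset()
print(sorted(with_continue))  # ['think']
print(non_string)             # frozenset()
```

Returning an empty allowlist is the conservative choice: a template variable may go unforwarded, but a generation option can never leak into the template.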

Changes

mellea/backends/huggingface.py
- New module-level constant _HF_INTERNAL_TEMPLATE_VARS (5 entries, documented above)
- New _chat_template_allowlist cached property — parses the Jinja AST once per backend instance, with loopcontrols extension registered and graceful fallback on parse failure
- New _filter_for_chat_template method — renames sentinels then filters by allowlist
- Both apply_chat_template call sites updated to use _filter_for_chat_template
- _filter_generate_only_options removed (the 21-key denylist is gone)
- jinja2 and jinja2.meta imported at module level
test/backends/test_huggingface_filter_options.py — 21 unit tests (no GPU needed):
- Strict equality check on _HF_INTERNAL_TEMPLATE_VARS contents
- Allowlist: Jinja AST parsing, exclusion of all 5 HF-internal vars, absence of generate-only names in a realistic template, caching, empty template
- Robustness: {% generation %} tag (graceful degradation), {% break %} (loopcontrols), non-string template (isinstance guard)
- _filter_for_chat_template: sentinel renaming, generate-only drop, template-var pass-through, empty input, unknown key drop
- _filter_chat_template_only_options: regression guard on pre-existing method
- Integration tests (marked huggingface): Granite tokenizer allowlist includes think + guardian_config, excludes generate-only keys and HF-internal vars

Testing

uv run pytest test/backends/test_huggingface_filter_options.py -m "not huggingface" -v
# 18 passed

The e2e test test_generate_only_options_do_not_break_generation is @pytest.mark.qualitative and requires the real model fixture.

Where this fits

Standalone bugfix. No other phases.

Closes #1154

…ve-computing#1154)

`temperature`, `max_new_tokens`, and `do_sample` were splatted directly into the Jinja template variable namespace via `apply_chat_template`. HF templates silently ignore unknown kwargs today, but a future template that references any of those names would be silently shadowed.

Adds `_filter_generate_only_options` (the inverse of the existing `_filter_chat_template_only_options`) and applies it at both `apply_chat_template` call sites in `LocalHFBackend`.

Adds 12 unit tests in `test_huggingface_filter_options.py` covering each key individually and the full filter→backend-specific chain, plus a qualitative e2e regression guard in `test_huggingface.py`.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

…ilter fix

- Reword _filter_generate_only_options docstring: replace "Inverse of" with "Companion to" (it is a deny-list, not a mathematical complement); add a note that model_options arrive post-_simplify_and_merge so user string keys are already normalised to sentinels
- Clarify inline comment on MAX_NEW_TOKENS: the sentinel is renamed *downstream* by _make_backend_specific_and_remove, not in this method
- Update e2e test docstring to be honest: the qualitative test guards pipeline non-regression; filter correctness is covered by unit tests in test_huggingface_filter_options.py
- Expand test fixture docstring to document the __new__ invariant and the single attribute that must be set
- Expand module docstring to clarify that torch must be importable even though no GPU is needed to run the tests

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The initial fix covered only temperature, max_new_tokens, and do_sample. Expand the denylist to include all commonly-used HuggingFace GenerationConfig options that must not reach the Jinja template namespace:

- Sampling: top_k, top_p, typical_p, repetition_penalty, no_repeat_ngram_size, length_penalty, num_beams, num_beam_groups, diversity_penalty, penalty_alpha, early_stopping
- Length: min_new_tokens
- Sequence count: num_return_sequences
- Special token IDs: pad_token_id, bos_token_id, eos_token_id, forced_bos_token_id, forced_eos_token_id

Adds 19 parametrised unit tests (one per new key) and an integration test that verifies all template-only keys survive unchanged when every expanded denylist key is present in the same options dict.

Closes generative-computing#1170 (no longer needed as a follow-up).

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The previous approach maintained a hand-written denylist of GenerationConfig option names to strip before apply_chat_template. This is fundamentally incomplete: transformers has ~60+ GenerationConfig parameters and the list requires manual updates as HuggingFace adds new ones.

Replace it with a self-maintaining solution: parse the tokenizer's Jinja2 template with jinja2.meta.find_undeclared_variables to compute the exact set of kwargs the template can consume. Only keys in that set are forwarded to apply_chat_template. Generation-only options are silently dropped not because they are named in a list, but because no chat template references them.

_HF_INTERNAL_TEMPLATE_VARS (5 entries) excludes variables HuggingFace injects automatically: named params to render_jinja_template (messages, tools, add_generation_prompt) and Jinja env globals from _compile_jinja_template (raise_exception, strftime_now). Sourced by reading the transformers source directly; see the constant's docstring for the relevant function references.

The allowlist is computed lazily and cached on the instance (cached_property), so the Jinja parse happens once per backend lifetime.

Tests:
- test_hf_internal_template_vars_contents: strict equality check on the 5-item exclusion set; fails immediately if transformers changes what it injects
- allowlist tests: Jinja AST parsing, exclusion of HF internals, absence of generate-only option names in a realistic template, caching, empty template
- _filter_for_chat_template tests: sentinel renaming, generate-only drop, template-var pass-through, empty input, unknown key drop
- _filter_chat_template_only_options: regression guard (pre-existing method)

Closes generative-computing#1154

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Adds comprehensive unit tests for the new _chat_template_allowlist cached property and _filter_for_chat_template method introduced in the preceding commit, plus tokenizer-only integration tests against the real Granite 3.3 8B template (no GPU required; skipped automatically when the tokenizer is not in the local HF cache).

Unit tests cover:
- _HF_INTERNAL_TEMPLATE_VARS exact-set assertion (load-bearing upgrade guard)
- allowlist includes template variables / excludes HF-internal / generate-only vars
- empty allowlist for missing chat_template
- cached_property returns same object on repeated access
- _filter_for_chat_template: passes referenced vars, drops generate-only, renames Mellea sentinels, handles empty input, drops unknown keys
- _filter_chat_template_only_options regression guard (pre-existing method)

Integration tests (marked huggingface, skip when tokenizer not cached):
- Granite allowlist includes 'think' and 'guardian_config'
- Granite allowlist excludes generate-only options
- Granite allowlist excludes HF-internal vars

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

_FakeTokenizer() is not a PreTrainedTokenizer, so assigning it to b._tokenizer caused a mypy [assignment] error in CI. Use object.__setattr__ instead: its value param is typed Any in typeshed, so mypy accepts the assignment without suppression.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
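
The fixture pattern described in these commits (bypass `__init__` with `__new__`, set the one required attribute, then rely on `cached_property`) can be sketched with the standard library alone. The `Backend` and `FakeTokenizer` classes below are hypothetical stand-ins, and the "parse" step is a placeholder rather than real Jinja parsing.

```python
from functools import cached_property


class FakeTokenizer:
    """Hypothetical stand-in exposing only the attribute the allowlist reads."""
    chat_template = "{{ think }}"


class Backend:
    """Hypothetical stand-in for a backend whose __init__ is too heavy for unit tests."""

    def __init__(self, model_id):
        raise RuntimeError("would load model weights")

    @cached_property
    def allowlist(self):
        # Count evaluations so the caching behaviour is observable.
        self.parse_count = getattr(self, "parse_count", 0) + 1
        # Placeholder for the real template parse.
        return frozenset({"think"}) if self._tokenizer.chat_template else frozenset()


def make_backend():
    backend = Backend.__new__(Backend)  # allocate without running __init__
    # object.__setattr__ sets the attribute without a typed assignment statement.
    object.__setattr__(backend, "_tokenizer", FakeTokenizer())
    return backend


backend = make_backend()
first = backend.allowlist
second = backend.allowlist
print(first is second)      # True
print(backend.parse_count)  # 1
```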

…variants

Three robustness fixes to the Jinja AST allowlist:

1. Register jinja2.ext.loopcontrols so templates using {% break %} or {% continue %} parse without TemplateSyntaxError.
2. Wrap env.parse() in try/except jinja2.TemplateSyntaxError: templates that use unsupported extensions such as {% generation %} / {% endgeneration %} (DeepSeek-R1, Qwen3, transformers >= 4.47) return frozenset() instead of crashing during inference.
3. Guard against non-string chat_template (None, list-of-alternates, dict) with an isinstance check before attempting to parse.

Move the inline 'import jinja2.meta' to module level (already done for jinja2).

Add a comment at the KV-cache apply_chat_template call site explaining why add_generation_prompt is intentionally absent there (it is HF-internal and the KV-cache path formats context, not a generation turn).

Three new unit tests cover each robustness case:
- test_allowlist_non_string_template_returns_empty
- test_allowlist_graceful_on_generation_tag
- test_allowlist_break_continue_tags_parsed_correctly

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

github-actions Bot added the bug Something isn't working label May 27, 2026

planetf1 mentioned this pull request May 27, 2026

fix: extend _filter_generate_only_options denylist to cover full HF GenerationConfig surface #1170

Closed

planetf1 marked this pull request as ready for review May 27, 2026 14:09

planetf1 requested a review from a team as a code owner May 27, 2026 14:09

planetf1 requested review from AngeloDanducci and akihikokuroda May 27, 2026 14:09

planetf1 marked this pull request as draft May 27, 2026 14:20

planetf1 mentioned this pull request May 27, 2026

fix: Bumps to transformers==5.0.0 #418

Open

11 tasks

planetf1 changed the title ~~fix: strip generate-only options before apply_chat_template~~ fix(hf): filter model_options for apply_chat_template via Jinja AST allowlist May 27, 2026

planetf1 marked this pull request as ready for review May 27, 2026 15:35

planetf1 added 7 commits May 28, 2026 23:53

planetf1 force-pushed the worktree-issue-1154 branch from 3346b68 to bc01569 Compare May 28, 2026 22:55

planetf1 wants to merge 7 commits into generative-computing:main from planetf1:worktree-issue-1154
