POC: Jinja + Markup middle-ground for outer repr template #8

Open: katosh wants to merge 1 commit into html_rep from jinja-markup-poc
@katosh (Collaborator) commented Apr 20, 2026

Summary

Proof-of-concept showing how a Jinja + Markup middle-ground can be layered onto the existing _repr pipeline without touching formatter internals. The branch routes only the top-level repr through an autoescape-enabled Jinja template and wraps all existing formatter-produced fragments in markupsafe.Markup at the boundary.

Full visual comparison

The existing tests/visual_inspect_repr_html.py harness (26 scenarios covering empty/minimal/view/dense/sparse/lazy/backed/nested AnnData, many-categories, custom formatters, TreeData, SpatialData, MuData, raw, unknown sections, serialization warnings, adversarial robustness, ecosystem extensibility, array-API devices) runs unchanged on this branch and produces the full comparison artifact.

Regenerate locally with:

.venv/bin/python tests/visual_inspect_repr_html.py
# → tests/repr_html_visual_test.html

Rendered output should be visually identical to the html_rep reference, because the same formatter code produces the same fragments; Jinja just concatenates them.

Why

Context for the discussion on scverse#2236 around whether to adopt Jinja. The concrete question: what does a minimal middle-ground look like, and does it give a real safety uplift?

This branch is evidence that:

  1. The entry-point can be Jinja-ified without a wholesale rewrite. One template, ~50 lines of new code, 2 new dependencies.
  2. The Markup / str type distinction is load-bearing at the outer template. Autoescape closes the "forgot to html.escape()" class of bug for every direct insertion in anndata.j2 (container_id, depth, style). Every trusted fragment is a single Markup(...) call, greppable at review time.
  3. No formatter code changes. formatters.py, registry.py, components.py, sections.py, core.py are untouched. FormattedOutput, TypeFormatter, SectionFormatter, render_section, render_formatted_entry still work as-is.
  4. Third-party packages are unaffected. register_formatter() still accepts the same signature; _repr_html_ output formats remain backwards-compatible. If we later want to tighten preview_html: str to preview_html: Markup | str, that's a follow-up — not required by this POC.
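The Markup / str distinction in point 2 can be seen in isolation with markupsafe alone. The following is a minimal sketch, not code from this branch; the variable names are illustrative.

```python
from markupsafe import Markup, escape

# Illustrative values, not taken from the branch.
untrusted = '"><script>alert(1)</script>'
trusted = Markup('<div class="trusted">header</div>')

# escape() turns a plain str into an escaped Markup instance.
escaped = escape(untrusted)

# escape() leaves an existing Markup untouched, so trusted fragments
# are not double-escaped.
passthrough = escape(trusted)

# Concatenating Markup with a plain str escapes the str operand.
combined = trusted + untrusted
```

Because the type carries the trust decision, a reviewer only needs to audit the places where Markup(...) is constructed.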

What changed

| File | Change |
| --- | --- |
| pyproject.toml | Added jinja2>=3.1, markupsafe>=3.0 |
| src/anndata/_repr/environment.py | New. Cached Environment with autoescape, loading from anndata._repr.templates |
| src/anndata/_repr/templates/anndata.j2 | New. Outer template; ~20 lines |
| src/anndata/_repr/html.py | generate_repr_html() gathers existing fragments, wraps each in Markup, calls env.get_template('anndata.j2').render(...) |

Diff stats: 4 files, +100 / -43.
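For readers who want to experiment without checking out the branch, a cached autoescaping Environment might look roughly like the sketch below. This is an assumption-laden stand-in, not the contents of environment.py: the real module loads templates from anndata._repr.templates, whereas this sketch swaps in a DictLoader with a hypothetical, much-reduced anndata.j2 so that it runs on its own.

```python
from functools import cache

from jinja2 import DictLoader, Environment
from markupsafe import Markup

# Hypothetical stand-in for anndata.j2; the real template is not reproduced here.
_TEMPLATES = {
    "anndata.j2": (
        '<div class="anndata-repr" id="{{ container_id }}">'
        "{{ header }}"
        "{% for section in sections %}{{ section }}{% endfor %}"
        "</div>"
    )
}


@cache
def get_env() -> Environment:
    """Build the Environment once and reuse it on later calls."""
    # autoescape=True applies to every template regardless of file extension.
    return Environment(loader=DictLoader(_TEMPLATES), autoescape=True)


html = get_env().get_template("anndata.j2").render(
    container_id='x"y',                  # plain str: autoescaped
    header=Markup("<h1>AnnData</h1>"),   # Markup: passed through
    sections=[Markup("<p>obs</p>")],
)
```

The sketch uses autoescape=True rather than select_autoescape() because the latter's default extension list (html, htm, xml) would not match a template named anndata.j2.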

The trust contract

return get_env().get_template("anndata.j2").render(
    container_id=container_id,    # plain str — autoescaped
    depth=depth,                   # plain value — autoescaped
    style=style,                   # plain str — autoescaped
    css=Markup(get_css()),         # Markup — passthrough
    header=Markup(_render_header(...)),
    sections=[Markup(s) for s in _render_all_sections(...)],
    footer=Markup(_render_footer(adata)),
    javascript=Markup(get_javascript(container_id)),
)
  • Markup passes through {{ … }} verbatim.
  • Plain str is autoescaped. Forgetting to escape such a value was formerly a latent XSS vector; under this contract it is structurally prevented at this layer.
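One consequence of the two rules above is that mistakes fail safe: a trusted fragment that someone forgets to wrap in Markup shows up as visibly escaped text rather than as live HTML. A minimal sketch, using an inline template as an assumed stand-in for anndata.j2:

```python
from jinja2 import Environment
from markupsafe import Markup

env = Environment(autoescape=True)
# Inline template used only for illustration.
template = env.from_string("<section>{{ fragment }}</section>")

fragment = "<b>obs</b>"

# Wrapped: the fragment is treated as trusted HTML and inserted verbatim.
wrapped = template.render(fragment=Markup(fragment))

# Not wrapped: the same string is escaped, so the omission is visible
# in the rendered output instead of becoming an injection point.
unwrapped = template.render(fragment=fragment)
```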

Minimal demonstration:

>>> from anndata._repr.environment import get_env
>>> from markupsafe import Markup
>>> get_env().get_template("anndata.j2").render(
...     container_id='"><script>alert(1)</script>',   # attacker input
...     depth=0, style="color:red",
...     css=None,
...     header=Markup('<div class="trusted">header</div>'),
...     index_preview=None, sections=[], footer=None, hints=None, javascript=None,
... )[:200]
'\n<div class="anndata-repr" id="&#34;&gt;&lt;script&gt;alert(1)&lt;/script&gt;" data-depth="0" style="color:red">\n<div class="trusted">header</div>  <div class="anndata-repr__sections">\n  </div>\n</div>\n'

The attacker-controlled container_id is escaped to &#34;&gt;&lt;script&gt;… by autoescape, while the Markup-wrapped trusted fragment passes through unchanged, both within the same render() call.

Scope — what this POC does not claim

  • It does not argue that internal formatters benefit enough from Jinja templating to justify converting them individually. That's the broader architectural question for scverse/anndata#2236 (feat: Add HTML representation) to resolve.
  • It does not introduce ChoiceLoader / PackageLoader wiring for third-party template contribution. That's a follow-up if ecosystem template authorship becomes the agreed direction.
  • It does not modify escape_html() calls in existing formatters. The safety gain is purely at the outer template layer.
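For context on the second point, the sketch below shows what ChoiceLoader wiring could look like in general. It is purely illustrative and not part of this POC: both loaders are DictLoaders with hypothetical template names, standing in for a core loader and a third-party one.

```python
from jinja2 import ChoiceLoader, DictLoader, Environment

# Hypothetical core template that optionally includes a contributed one.
core = DictLoader(
    {"anndata.j2": "<div>{% include 'extra.j2' ignore missing %}</div>"}
)
# Hypothetical template contributed by a third-party package.
third_party = DictLoader({"extra.j2": "<span>{{ label }}</span>"})

# ChoiceLoader tries each loader in order until one finds the template.
env = Environment(loader=ChoiceLoader([core, third_party]), autoescape=True)

# The included template sees the same context and the same autoescaping.
html = env.get_template("anndata.j2").render(label="<b>")
```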

Takeaway

A Jinja + Markup migration can be incremental and low-cost at the entry point while leaving formatter internals and third-party integrations untouched. If the maintainers want to evaluate the middle-ground concretely, this branch is the smallest viable starting point; further conversion can proceed formatter-by-formatter from here.

Routes the top-level repr through a single autoescape-enabled Jinja template
and wraps existing formatter-produced HTML fragments in markupsafe.Markup at
the boundary. Formatter internals (formatters.py, registry.py, components.py,
sections.py, core.py) are untouched.

The safety contract at the outer template:
- plain-str values (container_id, depth, style) are autoescaped by default
- Markup-wrapped fragments (header, sections, css, js, hints) pass through

Adds jinja2>=3.1 and markupsafe>=3.0 to dependencies. Adds a minimal
Environment module and one outer anndata.j2 template.

The existing tests/visual_inspect_repr_html.py visual harness runs cleanly
against this branch and produces the full 26-scenario comparison artifact.
Repr test suite: 614 passed, 1 skipped — zero regressions.