Skip to content

[Mirror] feat: jinja engine improvements for reka-edge#96

Closed
ngxson wants to merge 4 commits into
ngxson:masterfrom
kwajiehao:feat/jinja-engine-additions-reka-edge
Closed

[Mirror] feat: jinja engine improvements for reka-edge#96
ngxson wants to merge 4 commits into
ngxson:masterfrom
kwajiehao:feat/jinja-engine-additions-reka-edge

Conversation

@ngxson

@ngxson ngxson commented Apr 8, 2026

Copy link
Copy Markdown
Owner

@coderabbitai use "mirror" preset for review

ggml-org#21623

Summary by CodeRabbit

  • New Features

    • String repetition with * in either operand order (e.g., 'ab' * 3 and 3 * 'ab''ababab')
    • int and float numeric builtins (identity behavior)
    • tojson(ensure_ascii=true) now escapes non‑ASCII as \uXXXX (preserves formatting and existing escapes; handles invalid UTF‑8 safely)
  • Tests

    • Added tests for string repetition, tojson ensure_ascii behavior (formatting and escape preservation), invalid UTF‑8 fuzzing, and int builtin

Port three Jinja engine improvements needed for the reka-edge model:
1. Python-style string repetition ("ab" * 3 → "ababab")
2. ensure_ascii=true support for tojson filter (escapes non-ASCII to \uXXXX)
3. int() builtin on value_int_t (identity, needed for Reka Edge template)
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e23a78e6-0924-460d-ab2d-ca878b9fd7d1

📥 Commits

Reviewing files that changed from the base of the PR and between 0bd6415 and 16af73b.

📒 Files selected for processing (1)
  • common/jinja/value.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • common/jinja/value.cpp

📝 Walkthrough

Walkthrough

Adds Python-style string repetition for * when mixing strings and integers, implements ensure_ascii=true post-processing for tojson to escape non‑ASCII code points (including surrogate pairs) while preserving JSON layout and escapes, adds int and float builtins, and extends tests.

Changes

Cohort / File(s) Summary
String Repetition
common/jinja/runtime.cpp
Adds * operator handling for value_string × value_int (either operand order); derives repeat count from the integer, returns empty string for repeat ≤ 0; placed before membership (in/not in) logic.
JSON, UTF‑8, & Builtins
common/jinja/value.cpp, common/jinja/unicode.h
Adds UTF‑8 helper utilities and unicode.h; changes tojson to always produce JSON then, when ensure_ascii is truthy, post‑process JSON string literals to replace non‑ASCII code points with \uXXXX escapes or surrogate pairs, preserve existing escapes, emit \ufffd on UTF‑8 parse errors; adds {"int", ...} and {"float", ...} builtins that return the numeric value unchanged.
Tests
tests/test-jinja.cpp
Adds tests for 'ab' * 3 and 3 * 'ab', expanded tojson(ensure_ascii=true) coverage (nested objects, exact indent/newline formatting, preservation of existing escapes, invalid UTF‑8 fuzz), and an int builtin identity test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐇 I hopped through code and found a star,
* makes echoes of "ab" both near and far,
JSON shields odd letters with a careful cloak,
int and float stay simple — no folk joke,
Tests twinkle bright; I nibble a carrot, hark! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is largely incomplete, missing required sections like Overview, detailed information about the changes, and proper AI usage disclosure as required by the template. Complete the description with an Overview section explaining the changes, provide detailed additional information, and explicitly state AI usage disclosure (YES/NO).
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: Jinja engine improvements for reka-edge support, which aligns with the substantial modifications to Jinja runtime, value handling, and test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test-jinja.cpp (1)

450-460: Add boundary coverage for repetition counts.

Please add 0 and negative count cases so semantics are locked down ('' output), not just positive-path behavior.

🧪 Suggested test additions
     test_template(t, "reversed string repetition",
         "{{ 3 * 'ab' }}",
         json::object(),
         "ababab"
     );
+
+    test_template(t, "string repetition zero",
+        "{{ 'ab' * 0 }}",
+        json::object(),
+        ""
+    );
+
+    test_template(t, "string repetition negative",
+        "{{ 'ab' * -2 }}",
+        json::object(),
+        ""
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-jinja.cpp` around lines 450 - 460, Add boundary tests for zero and
negative repetition counts by extending the existing repetition tests using
test_template; specifically, add cases for "{{ 'ab' * 0 }}" and "{{ 0 * 'ab' }}"
expecting an empty string, and similarly for a negative count e.g. "{{ 'ab' * -1
}}" and "{{ -1 * 'ab' }}" expecting an empty string. Place these new
test_template calls alongside the existing "string repetition" and "reversed
string repetition" tests in tests/test-jinja.cpp so the semantics for zero and
negative repetition are explicitly asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@common/jinja/value.cpp`:
- Around line 221-225: The code currently pushes the raw byte 'ch' into output
when UTF-8 parsing fails (in the block handling parsed.status !=
utf8_parse_result::SUCCESS), which breaks the ASCII-only guarantee when
ensure_ascii is true; update that branch (around variables parsed, json_str,
pos, ch, and the ensure_ascii flag) to emit an ASCII-safe escape for the
offending byte (e.g., a \u00XX hex escape or a \\xXX sequence consistent with
the surrounding JSON emitter) instead of pushing the raw byte, and then advance
pos as before so invalid/incomplete UTF-8 bytes are consumed while preserving
the ensure_ascii contract.

---

Nitpick comments:
In `@tests/test-jinja.cpp`:
- Around line 450-460: Add boundary tests for zero and negative repetition
counts by extending the existing repetition tests using test_template;
specifically, add cases for "{{ 'ab' * 0 }}" and "{{ 0 * 'ab' }}" expecting an
empty string, and similarly for a negative count e.g. "{{ 'ab' * -1 }}" and "{{
-1 * 'ab' }}" expecting an empty string. Place these new test_template calls
alongside the existing "string repetition" and "reversed string repetition"
tests in tests/test-jinja.cpp so the semantics for zero and negative repetition
are explicitly asserted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d01b0fb5-11e0-478e-9722-ce641ab984e7

📥 Commits

Reviewing files that changed from the base of the PR and between 87f4744 and d29575d.

📒 Files selected for processing (3)
  • common/jinja/runtime.cpp
  • common/jinja/value.cpp
  • tests/test-jinja.cpp

Comment thread common/jinja/value.cpp
The json_ensure_ascii_preserving_format function does not correctly
handle an edge case where if UTF-8 parsing fails, it adds the non-ascii
character back to the output as a raw byte.

This commit fixes that by adding the unicode standard replacement
character \\ufffd to the output instead. This is the standard behavior
for various programming languages like Python, Rust, Go, etc.
@ngxson

ngxson commented Apr 8, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

kwajiehao and others added 2 commits April 9, 2026 12:35
1. Add todo comment for supporting string repetition for array/tuples
2. Add support for float identity operation
3. Move invalid ascii test case to test_fuzzing
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants