Copilot AI commented Nov 26, 2025

  • Understand the problem: preprocessing approach was fundamentally flawed
  • Revert previous changes
  • Implement post-tokenization whitespace control handling
  • Handle comments specially during tokenization
  • Detect and remove hyphen markers (both UnaryOperator and AdditiveBinaryOperator)
  • Strip whitespace from adjacent text tokens correctly
  • Test with xenova's case: {123}
  • Test with original bug case: parses correctly ✓
  • Existing tests pass (536 of 537; the only failure is a network test)
  • Linter passes
  • Build succeeds
  • Add unit tests for whitespace control (7 test cases)
  • Optimize to single-pass during lexing (per xenova's request)

Implementation

Refactored to use a single-pass approach during tokenization:

  • Check for {%-, {{- before text consumption and strip trailing whitespace immediately
  • Check for -%}, -}} after whitespace consumption and skip following whitespace immediately
  • Comments {#- and -#} are handled during their own tokenization
  • Eliminates the need for postProcessWhitespaceControl() second pass
Original prompt

There is currently a bug with whitespace control in jinja.js. In particular, the preprocessing steps in lexer.ts include these lines:

template
    .replace(/-%}\s*/g, "%}")
    .replace(/\s*{%-/g, "{%")

which remove any whitespace around the block tags with hyphens. However, this can cause some issues with certain templates. Here is one for example:

{
{%- for i in [1, 2, 3] %}
  {{ i }}
{%- endfor %}
}

What happens here is that the whitespace (the newline) between the opening { and the first {%- is removed because of the \s* in the regex, resulting in:

{{%- for i in [1, 2, 3] %}

which then gets interpreted as {{ in the parsing stage, and the contents of the real statement {%- are considered part of the expression.

Your goal is to fix this bug. Carefully consider all options for how to do so, and implement the best one.
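
The failure described above can be reproduced outside the lexer. The following is a minimal sketch, not the project's code: it applies the same two replacements to the example template as a plain string (the braces in the patterns are escaped here, which does not change what they match). Note that the second replacement also drops the hyphen, so the preprocessed text begins with {{% rather than {{%-; either way the lexer sees an expression opener.

```typescript
// The template from the report, written as a string literal.
const template = "{\n{%- for i in [1, 2, 3] %}\n  {{ i }}\n{%- endfor %}\n}";

// The same two replacements quoted from lexer.ts, with braces escaped.
const preprocessed = template
  .replace(/-%\}\s*/g, "%}")
  .replace(/\s*\{%-/g, "{%");

// \s* consumes the newline between "{" and "{%-", so the literal brace
// and the statement opener end up adjacent and read as "{{".
const startsWithDoubleBrace = preprocessed.startsWith("{{");

console.log(JSON.stringify(preprocessed), startsWithDoubleBrace);
```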



Fix the preprocessing regex patterns in lexer.ts that caused templates with
{%- or {{- to be incorrectly parsed when preceded by a { character.

The issue was that the regex /\s*{%-/g would match whitespace before {%-
including newlines, which would bring together { from the previous line
with {%-, creating {{%- which then gets parsed as {{ (expression) instead
of {% (statement).

Solution: Use negative lookbehind (?<!{) to prevent matching when preceded
by {, and add a second pass to remove just the hyphen in those cases,
preserving enough whitespace to prevent creating {{.

Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix whitespace control bug in jinja.js" to "Fix whitespace control preprocessing creating invalid syntax when preceded by {" Nov 26, 2025
Copilot finished work on behalf of xenova November 26, 2025 17:09
Copilot AI requested a review from xenova November 26, 2025 17:09

xenova commented Nov 26, 2025

@copilot This doesn't actually fix the bug. Here is a test case which shows the correct output:

{
{%- for i in [1, 2, 3] -%}
  {{ i }}
{%- endfor -%}
}

should render as

{123}

but your code renders it as

{
123}

These negative lookbehind strategies do not work. Instead, we may need to only process the whitespace control after tokenizing.

Implemented proper whitespace control by processing it after tokenization
instead of during preprocessing. This correctly handles all edge cases:

- Detect leading hyphens ({%-, {{-, {#-) and trailing hyphens (-%}, -}}, -#})
- Handle both UnaryOperator and AdditiveBinaryOperator tokens for trailing hyphens
- Strip whitespace from adjacent text tokens appropriately
- Special handling for comments during tokenization

This fixes the bug where preprocessing would create invalid syntax with
adjacent { characters, and also ensures whitespace is stripped correctly
in all cases as per Jinja spec.

Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>

Copilot AI commented Nov 26, 2025

@copilot This doesn't actually fix the bug. Here is a test case which shows the correct output:

{

...

You're absolutely right! I've reimplemented the fix to handle whitespace control after tokenization instead of during preprocessing. The new approach:

  1. Keeps hyphens during preprocessing
  2. Tokenizes them as operators
  3. Post-processes tokens to identify and remove hyphen markers
  4. Strips whitespace from adjacent text tokens

Your test case now renders correctly as {123}. Fixed in commit 9f1f00a.
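
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the token type names and the function applyWhitespaceControl are hypothetical, and the sketch uses a dedicated HyphenMarker token as a simplification, whereas the actual change detected hyphens that had been tokenized as UnaryOperator or AdditiveBinaryOperator.

```typescript
// Hypothetical token model, for illustration only.
type TokenType =
  | "Text" | "OpenStatement" | "CloseStatement"
  | "OpenExpression" | "CloseExpression" | "HyphenMarker" | "Other";

interface Token { type: TokenType; value: string; }

const isOpen = (t?: Token): boolean =>
  t !== undefined && (t.type === "OpenStatement" || t.type === "OpenExpression");
const isClose = (t?: Token): boolean =>
  t !== undefined && (t.type === "CloseStatement" || t.type === "CloseExpression");

// Second pass: drop hyphen markers and trim the neighbouring text tokens.
function applyWhitespaceControl(tokens: Token[]): Token[] {
  const out: Token[] = [];
  let stripNextText = false;
  tokens.forEach((tok, i) => {
    if (tok.type === "HyphenMarker" && isOpen(tokens[i - 1])) {
      // Leading marker: the opener is already last in `out`; trim the text before it.
      const before = out[out.length - 2];
      if (before !== undefined && before.type === "Text") {
        before.value = before.value.replace(/\s+$/, "");
      }
      return;
    }
    if (tok.type === "HyphenMarker" && isClose(tokens[i + 1])) {
      // Trailing marker: remember to trim the text that follows the closer.
      stripNextText = true;
      return;
    }
    const copy: Token = { ...tok };
    if (!isClose(tok)) {
      if (copy.type === "Text" && stripNextText) {
        copy.value = copy.value.replace(/^\s+/, "");
      }
      stripNextText = false;
    }
    out.push(copy);
  });
  // Text tokens that became empty carry no output and can be dropped.
  return out.filter((t) => t.type !== "Text" || t.value.length > 0);
}
```

Because the hyphen is resolved on tokens rather than on raw text, a literal { in a text token can never merge with a following {% delimiter.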

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • huggingface.co
    • Triggering command: /usr/local/bin/node node /home/REDACTED/work/huggingface.js/huggingface.js/node_modules/.bin/../vitest/vitest.mjs run (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

- Remove redundant comment-related conditions from postProcessWhitespaceControl
- Clarify array access in trailing hyphen processing
- Add comment explaining textTokenIndex calculation
- Simplify comment stripAfter handling

Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
Copilot AI changed the title from "Fix whitespace control preprocessing creating invalid syntax when preceded by {" to "Fix whitespace control by handling after tokenization instead of preprocessing" Nov 26, 2025
Copilot finished work on behalf of xenova November 26, 2025 17:38

xenova commented Nov 26, 2025

@copilot You should add the example I gave as a test case:

{
{%- for i in [1, 2, 3] -%}
  {{ i }}
{%- endfor -%}
}

should output

{123}

while you're at it, add some other unit tests to ensure correct whitespace control processing.

Added 7 test cases for whitespace control:
- WHITESPACE_CONTROL_1: { followed by {%- and -%} (xenova's test case)
- WHITESPACE_CONTROL_2: Basic {%- and -%} on same line
- WHITESPACE_CONTROL_3: {%- and -%} with surrounding spaces
- WHITESPACE_CONTROL_4: Expression {{- and -}}
- WHITESPACE_CONTROL_5: Comment {#- and -#} with text after
- WHITESPACE_CONTROL_6: Comment {#- and -#} with text before
- WHITESPACE_CONTROL_7: {%- set -%} statement

Also fixed a lint error (unused variable wsStart in lexer.ts).

Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>

Copilot AI commented Nov 26, 2025

@copilot You should add the example I gave as a test case:

{

...

Added 7 test cases for whitespace control in commit bee8364:

  • WHITESPACE_CONTROL_1: Your test case with { before {%- → {123}
  • WHITESPACE_CONTROL_2: Basic {%- for %}{{ i }}{% endfor -%} → 123
  • WHITESPACE_CONTROL_3: {%- if -%} with surrounding spaces → A
  • WHITESPACE_CONTROL_4: Expression {{- and -}} → A B C
  • WHITESPACE_CONTROL_5: Comment {#- -#} with text after → X
  • WHITESPACE_CONTROL_6: Comment {#- -#} with text before → X
  • WHITESPACE_CONTROL_7: {%- set -%} statement → 1

Copilot finished work on behalf of xenova November 26, 2025 18:16

xenova commented Nov 26, 2025

@copilot Good stuff, things are pretty much working. However, I'd like you to optimize your implementation. It seems unnecessary to do a two-pass approach during lexing. I'd prefer if this logic was correctly handled during the first pass.

Refactored the whitespace control handling to use a single-pass approach
during tokenization instead of a two-pass approach (tokenize then post-process):

- Removed postProcessWhitespaceControl() function
- Added helper functions: stripTrailingWhitespace() and skipLeadingWhitespace()
- Check for {%-, {{-, -%}, -}} during tokenization and handle immediately
- Moved -%} and -}} checks to after whitespace consumption to handle cases
  like `{% endfor -%}` where there's a space before the hyphen

This eliminates the need for the second pass while maintaining correct behavior.

Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>

Copilot AI commented Nov 26, 2025

@copilot Good stuff, things are pretty much working. However, I'd like you to optimize your implementation. It seems unnecessary to do a two-pass approach during lexing. I'd prefer if this logic was c...

Optimized to single-pass in commit c2c6633. Removed postProcessWhitespaceControl() and integrated whitespace control directly into the tokenization loop:

  • {%- and {{- are detected before text consumption
  • -%} and -}} are detected after whitespace consumption (to handle {% endfor -%} with space before hyphen)
  • No second pass needed
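
A rough sketch of the single-pass idea, under stated assumptions: the helper names stripTrailingWhitespace and skipLeadingWhitespace come from the commit message, but their signatures here are guesses, and the lex function and its two-kind token model are hypothetical. The sketch finds the closing delimiter with a plain indexOf and checks the tag body for a trailing hyphen, which is simpler than detecting -%} inside the tokenization loop after whitespace consumption, and would mishandle a string literal containing %} or }}.

```typescript
type Token = { type: "Text" | "Tag"; value: string };

// Assumed signature: returns the text without trailing whitespace.
function stripTrailingWhitespace(text: string): string {
  return text.replace(/\s+$/, "");
}

// Assumed signature: returns the first position at or after `pos` that is not whitespace.
function skipLeadingWhitespace(src: string, pos: number): number {
  while (pos < src.length && /\s/.test(src.charAt(pos))) pos++;
  return pos;
}

function lex(src: string): Token[] {
  const tokens: Token[] = [];
  let text = "";
  let pos = 0;
  while (pos < src.length) {
    const open = src.slice(pos, pos + 2);
    if (open !== "{%" && open !== "{{") {
      text += src.charAt(pos++); // ordinary character: accumulate as text
      continue;
    }
    const close = open === "{%" ? "%}" : "}}";
    const end = src.indexOf(close, pos + 2);
    if (end === -1) throw new SyntaxError("Missing closing " + close);

    let inner = src.slice(pos + 2, end);
    const stripBefore = inner.startsWith("-"); // {%- or {{-
    if (stripBefore) inner = inner.slice(1);
    const stripAfter = inner.endsWith("-"); // -%} or -}}
    if (stripAfter) inner = inner.slice(0, -1);

    // Leading hyphen: trim the pending text before emitting it.
    if (stripBefore) text = stripTrailingWhitespace(text);
    if (text.length > 0) tokens.push({ type: "Text", value: text });
    text = "";

    tokens.push({ type: "Tag", value: inner.trim() });
    pos = end + 2;
    // Trailing hyphen: jump over the whitespace that follows the tag.
    if (stripAfter) pos = skipLeadingWhitespace(src, pos);
  }
  if (text.length > 0) tokens.push({ type: "Text", value: text });
  return tokens;
}
```

For the test case from the review, the only text tokens left are { and }, so rendering the loop body between them would give {123}.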

Copilot finished work on behalf of xenova November 26, 2025 18:34
xenova marked this pull request as ready for review November 27, 2025 17:41
xenova merged commit 881f81d into main Nov 27, 2025
5 checks passed
xenova deleted the copilot/fix-jinja-whitespace-control-bug branch November 27, 2025 18:01
xenova mentioned this pull request Nov 27, 2025
xenova added a commit that referenced this pull request Nov 27, 2025
2 participants