Fix whitespace control by handling after tokenization instead of preprocessing #1859
Conversation
Fix the preprocessing regex patterns in lexer.ts that caused templates with
{%- or {{- to be incorrectly parsed when preceded by a { character.
The issue was that the regex /\s*{%-/g matched all whitespace before {%-,
including newlines, which could pull a { from the previous line up against
{%-, creating {{%-, which is then parsed as {{ (expression open) instead
of {% (statement open).
Solution: Use negative lookbehind (?<!{) to prevent matching when preceded
by {, and add a second pass to remove just the hyphen in those cases,
preserving enough whitespace to prevent creating {{.
Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
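The failure mode is easy to reproduce on a raw string (a standalone sketch; the template literal below is illustrative, not a fixture from the repo):

```typescript
// Buggy preprocessing: collapse any whitespace before "{%-" into nothing.
const template = "{\n  {%- set x = 1 -%}\n}";
const collapsed = template.replace(/\s*{%-/g, "{%");

// The "{" from the previous line is now adjacent to "{%", producing
// "{{%...", which the lexer reads as "{{" (expression open) rather
// than "{%" (statement open).
console.log(JSON.stringify(collapsed)); // "{{% set x = 1 -%}\n}"
```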
@copilot This doesn't actually fix the bug. Here is a test case which shows the problem: the template should render one way, but your code renders it differently. These negative lookbehind strategies do not work. Instead, we may need to only process the whitespace control after tokenizing.
Implemented proper whitespace control by processing it after tokenization
instead of during preprocessing. This correctly handles all edge cases:
- Detect leading hyphens ({%-, {{-, {#-) and trailing hyphens (-%}, -}}, -#})
- Handle both UnaryOperator and AdditiveBinaryOperator tokens for trailing hyphens
- Strip whitespace from adjacent text tokens appropriately
- Special handling for comments during tokenization
This fixes the bug where preprocessing would create invalid syntax with
adjacent { characters, and also ensures whitespace is stripped correctly
in all cases as per Jinja spec.
Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
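The approach above can be sketched as a pass over the token stream. This is a minimal model, not the actual lexer.ts code; the token shape and names (Open/Close, stripBefore/stripAfter) are assumptions for illustration, and the real implementation distinguishes UnaryOperator vs. AdditiveBinaryOperator hyphen tokens:

```typescript
// Minimal token model (illustrative; the real lexer.ts tokens differ).
type Token =
  | { type: "Text"; value: string }
  | { type: "Open"; stripBefore: boolean } // {%- / {{- / {#-
  | { type: "Close"; stripAfter: boolean }; // -%} / -}} / -#}

// Post-tokenization pass: a leading hyphen trims the Text token before
// it; a trailing hyphen trims the Text token after it.
function applyWhitespaceControl(tokens: Token[]): void {
  for (let i = 0; i < tokens.length; ++i) {
    const t = tokens[i];
    if (t.type === "Open" && t.stripBefore && i > 0) {
      const prev = tokens[i - 1];
      if (prev.type === "Text") prev.value = prev.value.replace(/\s+$/, "");
    }
    if (t.type === "Close" && t.stripAfter && i + 1 < tokens.length) {
      const next = tokens[i + 1];
      if (next.type === "Text") next.value = next.value.replace(/^\s+/, "");
    }
  }
}

// Demo: the token stream for "a \n{%- ... -%}\n b".
const demo: Token[] = [
  { type: "Text", value: "a \n" },
  { type: "Open", stripBefore: true },
  { type: "Close", stripAfter: true },
  { type: "Text", value: "\n b" },
];
applyWhitespaceControl(demo);
```

Because trimming happens on already-lexed tokens, a `{` sitting in a Text token can never fuse with a following delimiter, which is the core of the fix.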
You're absolutely right! I've reimplemented the fix to handle whitespace control after tokenization instead of during preprocessing. Your test case now renders correctly.
- Remove redundant comment-related conditions from postProcessWhitespaceControl
- Clarify array access in trailing hyphen processing
- Add comment explaining textTokenIndex calculation
- Simplify comment stripAfter handling
Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
@copilot You should add the example I gave as a test case. While you're at it, add some other unit tests to ensure correct whitespace control processing.
Added 7 test cases for whitespace control:
- WHITESPACE_CONTROL_1: { followed by {%- and -%} (xenova's test case)
- WHITESPACE_CONTROL_2: Basic {%- and -%} on same line
- WHITESPACE_CONTROL_3: {%- and -%} with surrounding spaces
- WHITESPACE_CONTROL_4: Expression {{- and -}}
- WHITESPACE_CONTROL_5: Comment {#- and -#} with text after
- WHITESPACE_CONTROL_6: Comment {#- and -#} with text before
- WHITESPACE_CONTROL_7: {%- set -%} statement
Also fixed a lint error (unused variable wsStart in lexer.ts).
Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
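The exact fixture strings are not shown in this thread, but the first case can be worked by hand. Assuming a template like `{ {%- set x = 1 -%} }` (the set statement here is an assumption; it emits no output), both hyphens strip the adjoining spaces, so the render is `{}`:

```typescript
// "{%-" strips the whitespace to its left; "-%}" strips it to its right.
const before = "{ ".replace(/\s+$/, ""); // text token preceding {%-
const after = " }".replace(/^\s+/, ""); // text token following -%}
const rendered = before + after; // the set statement emits nothing
console.log(rendered); // "{}"
```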
Added 7 test cases for whitespace control in commit bee8364.
@copilot Good stuff, things are pretty much working. However, I'd like you to optimize your implementation. It seems unnecessary to do a two-pass approach during lexing. I'd prefer if this logic was correctly handled during the first pass.
Refactored the whitespace control handling to use a single-pass approach
during tokenization instead of a two-pass approach (tokenize then post-process):
- Removed postProcessWhitespaceControl() function
- Added helper functions: stripTrailingWhitespace() and skipLeadingWhitespace()
- Check for {%-, {{-, -%}, -}} during tokenization and handle immediately
- Moved -%} and -}} checks to after whitespace consumption to handle cases
like `{% endfor -%}` where there's a space before the hyphen
This eliminates the need for the second pass while maintaining correct behavior.
Co-authored-by: xenova <26504141+xenova@users.noreply.github.com>
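A sketch of what those two helpers might look like (the names match the commit message, but the signatures and token shape are assumptions; the real lexer.ts operates on its own token type):

```typescript
// Called when "{%-" / "{{-" is seen: trim trailing whitespace from the
// most recently emitted Text token, if any.
function stripTrailingWhitespace(tokens: { type: string; value: string }[]): void {
  const last = tokens[tokens.length - 1];
  if (last && last.type === "Text") {
    last.value = last.value.replace(/\s+$/, "");
  }
}

// Called when "-%}" / "-}}" is seen: advance the cursor past any
// following whitespace so it never lands in the next Text token.
function skipLeadingWhitespace(source: string, index: number): number {
  while (index < source.length && /\s/.test(source[index])) {
    ++index;
  }
  return index;
}
```

Checking for `-%}` only after whitespace is consumed is what makes a case like `{% endfor -%}` work in a single pass: the space before the hyphen has already been skipped by the time the closing delimiter is matched.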
Optimized to single-pass in commit c2c6633.
Weird, CI didn't pick up this lint error in #1859. PR ([success](https://github.com/huggingface/huggingface.js/actions/runs/19744569408/job/56576090622)), Main ([failure](https://github.com/huggingface/huggingface.js/actions/runs/19744642524/job/56576326408)). cc @coyotte508 probably something to look into
Implementation
Refactored to use a single-pass approach during tokenization:
- {%- and {{- are detected before text consumption, and trailing whitespace is stripped immediately
- -%} and -}} are checked after whitespace consumption, and following whitespace is skipped immediately
- {#- and -#} are handled during their own tokenization