Fix attribute parsing edge cases for curly braces, colons, and escapes #110

Merged: dereuromark merged 1 commit into master from fix/attribute-edge-cases on Mar 23, 2026

@dereuromark
Contributor

Summary

This PR fixes three issues in attribute parsing to match the reference JS implementation behavior:

1. Curly braces in quoted values

Use findAttributeEnd() instead of naive strpos() for finding attribute block boundaries. This properly handles quoted strings containing { and } characters.

Before:

Input:  [text]{code="{foo}"}
Output: <p><span>text</span>"}</p>  ← broken

After:

Input:  [text]{code="{foo}"}
Output: <p><span code="{foo}">text</span></p>  ✓
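
As an illustration of the idea (not the actual PHP code from this PR), a quote-aware scan for the closing brace might look like the Python sketch below. The name `find_attribute_end` mirrors `findAttributeEnd()`, but the signature and the handling of backslashes inside quotes are assumptions, and the sketch ignores other attribute syntax such as comments.

```python
def find_attribute_end(text: str, start: int) -> int:
    """Return the index of the '}' closing the block opened at text[start], or -1.

    Assumed behavior for illustration: braces inside double-quoted
    values are skipped, and a backslash inside quotes escapes the
    following character.
    """
    if start >= len(text) or text[start] != "{":
        return -1
    in_quotes = False
    i = start + 1
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == "\\":
                i += 2  # skip the escaped character inside the quoted value
                continue
            if ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == "}":
            return i  # first closing brace outside any quoted value
        i += 1
    return -1  # unterminated attribute block


text = '[text]{code="{foo}"}'
print(text.find("}", 6))            # 17: the naive search stops inside the quotes
print(find_attribute_end(text, 6))  # 19: the real end of the attribute block
```

With the naive result, everything after index 17 (`"}`) is left over as plain text, which is consistent with the broken output shown above.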

2. Colon in attribute keys

Allow colons in attribute key names to support namespaced attributes like xml:lang=en. The key patterns are also more permissive: keys prefixed with an underscore or hyphen are now accepted after whitespace.

Before:

Input:  [text]{xml:lang=en}
Output: <p><span lang="en">text</span></p>  ← colon prefix stripped

After:

Input:  [text]{xml:lang=en}
Output: <p><span xml:lang="en">text</span></p>  ✓
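
The effect of widening the key pattern can be sketched in Python as follows. Both regular expressions are hypothetical stand-ins chosen to reproduce the behavior described here; the real patterns in AttributeParser.php are not shown in this PR description.

```python
import re

# Hypothetical patterns for illustration only.
# OLD_PAIR: key must start with a letter and may not contain a colon.
OLD_PAIR = re.compile(r'(?P<key>[A-Za-z][A-Za-z0-9_-]*)=(?P<value>[^\s"{}]+)')
# NEW_PAIR: key may contain colons and may start with '_' or '-'.
NEW_PAIR = re.compile(r'(?P<key>[A-Za-z0-9_:-]+)=(?P<value>[^\s"{}]+)')


def parse_pairs(block, pattern=NEW_PAIR):
    """Collect unquoted key=value pairs from the inside of an attribute block."""
    return {m["key"]: m["value"] for m in pattern.finditer(block)}


print(parse_pairs("xml:lang=en", OLD_PAIR))  # {'lang': 'en'}
print(parse_pairs("xml:lang=en"))            # {'xml:lang': 'en'}
print(parse_pairs("_a=1 -b=2"))              # {'_a': '1', '-b': '2'}
```

Under the narrower pattern the match starts after the colon, so the `xml:` prefix is silently dropped, which matches the "before" output.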

3. Backslash escape handling

A backslash followed by ASCII punctuation (\\, \", \*, etc.) is now processed as an escape, while a backslash before an alphanumeric character stays literal (\n, \t, \U are left as-is).

Before:

Input:  [text]{path="C:\Users\test"}
Output: <p><span path="C:Userstest">text</span></p>  ← backslashes removed

After:

Input:  [text]{path="C:\Users\test"}
Output: <p><span path="C:\Users\test">text</span></p>  ✓
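
The escape rule can be sketched in Python as below. The function name mirrors `processEscapes()`, but the body is an assumption about its behavior based on the description above, not a port of the PHP code.

```python
import string


def process_escapes(value: str) -> str:
    """Drop a backslash only when it precedes an ASCII punctuation character."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        # string.punctuation holds the 32 ASCII punctuation characters.
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in string.punctuation:
            out.append(value[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(ch)  # ordinary character, or a backslash that stays literal
            i += 1
    return "".join(out)


print(process_escapes(r'C:\Users\test'))  # C:\Users\test
print(process_escapes(r'say \"hi\"'))     # say "hi"
print(process_escapes(r'a\\b'))           # a\b
```

A trailing backslash with nothing after it is kept as-is in this sketch.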

Changes

  • src/Parser/InlineParser.php: Replace strpos($text, '}', ...) with findAttributeEnd() in 4 places
  • src/Parser/Utility/AttributeParser.php:
    • Update key patterns to allow colons and be more permissive
    • Fix processEscapes() to unescape only backslash-escaped ASCII punctuation, leaving a backslash before alphanumerics literal
  • tests/TestCase/Parser/AttributeParserTest.php: Add 17 new tests

All verified against reference JS implementation (@djot/djot).

@codecov

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.64%. Comparing base (e35734f) to head (6797754).
⚠️ Report is 3 commits behind head on master.

Files with missing lines       Patch %   Lines
src/Parser/InlineParser.php    75.00%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #110      +/-   ##
============================================
+ Coverage     93.61%   93.64%   +0.03%     
- Complexity     2328     2332       +4     
============================================
  Files            79       79              
  Lines          6168     6182      +14     
============================================
+ Hits           5774     5789      +15     
+ Misses          394      393       -1     

@dereuromark dereuromark merged commit c2acf13 into master Mar 23, 2026
6 checks passed
@dereuromark dereuromark deleted the fix/attribute-edge-cases branch March 23, 2026 06:37