
Separate adjacent same-delimiter inline runs in HtmlToDjot + doc fixes #205

Merged
dereuromark merged 1 commit into master from fix/htmltodjot-adjacent-inline
Jun 2, 2026


@dereuromark (Contributor)

Follow-up to the HtmlToDjot round-trip work (#202, #203, #204).

Problem (G2)

Two adjacent inline elements that share a Djot delimiter merged into a single malformed token on the round-trip:

<em>a</em><em>b</em>                  ->  _a__b_   ->  <em>a_</em>b_
<strong>a</strong><strong>b</strong>  ->  *a**b*   ->  <strong>a*</strong>b*
<code>a</code><code>b</code>          ->  `a``b`   ->  <code>a``b</code>

The same happened with <sub> and <sup>. Brace-delimited inlines (<del>, <mark>, <ins>) and mixed delimiters (<em> followed by <strong>) were already handled correctly.

Fix

When concatenating inline children, insert an empty attribute group {} between two pieces where the left ends and the right begins with the same delimiter (_, *, ~, ^, or backtick). {} renders to nothing, so the content is unchanged while the two runs stay distinct:

<em>a</em><em>b</em>  ->  _a_{}_b_  ->  <em>a</em><em>b</em>
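The join rule can be sketched in a few lines. This is a Python illustration, not the PHP code in src/Converter/HtmlToDjot.php. The names `needs_separator` and `join_inline` are hypothetical, each piece is assumed to be already-serialized Djot, and escaped delimiters are deliberately not handled here.

```python
# Hypothetical sketch of the separator rule, not the library's PHP implementation.
# Single-character delimiters whose touching runs Djot would merge.
SAME_CHAR_DELIMITERS = frozenset("_*~^`")


def needs_separator(left: str, right: str) -> bool:
    """Return True when `left` ends and `right` begins with the same delimiter."""
    if not left or not right:
        return False
    return left[-1] == right[0] and right[0] in SAME_CHAR_DELIMITERS


def join_inline(pieces: list[str]) -> str:
    """Concatenate serialized inline pieces, inserting an empty attribute
    group "{}" wherever two same-delimiter runs would otherwise touch."""
    out = ""
    for piece in pieces:
        if needs_separator(out, piece):
            out += "{}"
        out += piece
    return out


print(join_inline(["_a_", "_b_"]))         # _a_{}_b_
print(join_inline(["*a*", "*b*", "*c*"]))  # *a*{}*b*{}*c*
print(join_inline(["_a_", "*b*"]))         # _a_*b*
print(join_inline(["{-a-}", "{-b-}"]))     # {-a-}{-b-}
```

Brace-delimited pieces such as `{-a-}` end in `}` and start in `{`, so they never trigger the separator, which matches the observation that <del>, <mark> and <ins> were already fine.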

A trailing escaped delimiter (\_, produced for literal text) is recognized and left alone, so genuine literal characters are not affected.
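A minimal sketch of recognizing an escaped trailing delimiter, assuming a backslash-parity rule: an odd number of backslashes directly before the final delimiter escapes it, while an even number means the backslashes escape each other. The parity rule and the function names are assumptions for illustration, not taken from the converter's source.

```python
# Hypothetical sketch; the backslash-parity rule is an assumption.
SAME_CHAR_DELIMITERS = frozenset("_*~^`")


def ends_with_live_delimiter(text: str, delim: str) -> bool:
    """True if `text` ends with `delim` and that character is markup rather
    than an escaped literal (odd backslash count = escaped)."""
    if not text.endswith(delim):
        return False
    backslashes = 0
    i = len(text) - len(delim) - 1
    while i >= 0 and text[i] == "\\":
        backslashes += 1
        i -= 1
    return backslashes % 2 == 0


def needs_separator(left: str, right: str) -> bool:
    """Insert "{}" only when a live delimiter ending `left` matches the one starting `right`."""
    first = right[:1]
    return first in SAME_CHAR_DELIMITERS and ends_with_live_delimiter(left, first)


print(needs_separator("_a_", "_b_"))    # True: two emphasis runs touch
print(needs_separator("a\\_", "_b_"))   # False: "\_" is a literal underscore
```

With this check, literal text such as `a\_` followed by `_b_` is joined unchanged.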

The round-trip property test gains an adjacency sweep over em, strong, sub, sup, code, del, mark, ins (pairs and triples), plus the mixed-delimiter and space-separated cases as regression guards.
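The shape of such a sweep can be sketched as follows, assuming "pairs and triples" means two or three adjacent elements of the same tag. The real test is PHP; here the round-trip function (HTML -> Djot -> HTML) is passed in by the caller, and `adjacency_cases` and `round_trip_failures` are hypothetical names.

```python
# Hypothetical sketch of enumerating sweep inputs; the real test is PHP.
from typing import Callable

TAGS = ["em", "strong", "sub", "sup", "code", "del", "mark", "ins"]


def adjacency_cases() -> list[str]:
    """Two and three adjacent same-tag elements, plus regression guards."""
    cases = []
    for tag in TAGS:
        for count in (2, 3):
            cases.append("".join(f"<{tag}>{ch}</{tag}>" for ch in "abc"[:count]))
    cases.append("<em>a</em><strong>b</strong>")  # mixed delimiters
    cases.append("<em>a</em> <em>b</em>")         # space-separated
    return cases


def round_trip_failures(round_trip: Callable[[str], str], cases: list[str]) -> list[str]:
    """Return every input whose round-trip result differs from the input."""
    return [case for case in cases if round_trip(case) != case]
```

An empty list from `round_trip_failures` means every adjacency case survived the round-trip.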

Docs (G3)

Fixes two overclaims in the converter guide:

  • Round-trip mode is now described as "lossless for supported constructs" instead of "perfect" (HTML is many-to-one with Djot, so perfect round-trips of arbitrary input are not possible).
  • Code blocks preserve language and content; the fence length is normalized to the shortest safe length, not preserved. The table entry now says so.
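One common way to pick a safe fence length is one more than the longest backtick run in the content, with a floor of three. The sketch below counts runs anywhere in the content, which is safe but can be longer than strictly necessary; the converter's exact rule may differ, and `fence_for` and `render_code_block` are hypothetical names.

```python
# Hypothetical sketch; the converter's actual rule may differ.
import re


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in `content`,
    and never shorter than `minimum`."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def render_code_block(content: str, lang: str = "") -> str:
    """Wrap `content` in a fence, keeping the language tag."""
    fence = fence_for(content)
    return f"{fence}{lang}\n{content}\n{fence}"


print(fence_for("print(1)"))            # ```
print(fence_for("```\ncode\n```"))      # ````
```

Because the fence is recomputed from the content, the language and body survive a round-trip while the original fence length does not, which is the behavior the table entry now describes.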

Two adjacent inline elements sharing a Djot delimiter merged into one
malformed token on the round-trip: `<em>a</em><em>b</em>` serialized to
`_a__b_` and re-parsed as `<em>a_</em>b_`. The same happened for strong,
sub, sup and code spans.

Insert an empty attribute group `{}` between two pieces when the left
ends and the right begins with the same delimiter (`_`, `*`, `~`, `^` or
backtick). `{}` renders to nothing, so the value is unchanged while the
two runs stay separate. A trailing escaped delimiter (`\_`) is literal
text and is left alone.

Extend the round-trip property test with an adjacency sweep over em,
strong, sub, sup, code, del, mark and ins.

Also fix two documentation overclaims in the converter guide: round-trip
mode is lossless for supported constructs (not "perfect"), and code-block
fence length is normalized rather than preserved.
@dereuromark dereuromark marked this pull request as ready for review June 2, 2026 17:57
@dereuromark dereuromark merged commit a2aa1bb into master Jun 2, 2026
4 checks passed
@dereuromark dereuromark deleted the fix/htmltodjot-adjacent-inline branch June 2, 2026 17:58
codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.79%. Comparing base (637623e) to head (f9363cb).
⚠️ Report is 1 commit behind head on master.

Files with missing lines         Patch %   Lines
src/Converter/HtmlToDjot.php     84.61%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #205      +/-   ##
============================================
- Coverage     91.80%   91.79%   -0.02%     
- Complexity     3450     3457       +7     
============================================
  Files           104      104              
  Lines          9786     9798      +12     
============================================
+ Hits           8984     8994      +10     
- Misses          802      804       +2     


dereuromark added a commit that referenced this pull request Jun 2, 2026
…206)

The same-delimiter separator added in #205 only ran through
processChildren(), so it covered paragraphs but not the two other places
that buffer inline output by hand: processBlock() (bare top-level inline)
and processList() (list-item content). There `<em>a</em><em>b</em>`
still serialized to `_a__b_` and round-tripped to `<em>a_</em>b_`.

Extract the join into a shared appendInline() helper and use it on all
three concatenation paths, so adjacency is handled the same everywhere.

Extend the property test with bare and list contexts across em, strong,
sub, sup and code.
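The difference between a hand-rolled concatenation path and one routed through a shared helper can be sketched in Python. `append_inline` mirrors the idea of the PHP appendInline() helper but is not its code; `list_item_unshared` and `list_item_shared` are hypothetical, the "- " prefix stands in for list-item output, and the escaped-delimiter check is omitted for brevity.

```python
# Hypothetical sketch of the #206 fix, not the library's PHP code.
DELIMITERS = frozenset("_*~^`")


def append_inline(buffer: str, piece: str) -> str:
    """Append one serialized inline piece, inserting "{}" when the buffer's
    last character and the piece's first are the same delimiter."""
    if buffer and piece[:1] in DELIMITERS and buffer[-1] == piece[0]:
        buffer += "{}"
    return buffer + piece


def list_item_unshared(pieces: list[str]) -> str:
    """Before: a path that concatenates by hand and bypasses the helper."""
    return "- " + "".join(pieces)


def list_item_shared(pieces: list[str]) -> str:
    """After: the list path funnels every piece through append_inline()."""
    buffer = "- "
    for piece in pieces:
        buffer = append_inline(buffer, piece)
    return buffer


print(list_item_unshared(["_a_", "_b_"]))  # - _a__b_
print(list_item_shared(["_a_", "_b_"]))    # - _a_{}_b_
```

Routing paragraphs, bare top-level inlines and list items through one helper means the adjacency rule cannot drift between call sites.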