Separate adjacent same-delimiter inline runs in HtmlToDjot + doc fixes by dereuromark · Pull Request #205 · php-collective/djot-php

dereuromark · 2026-06-02T17:56:39Z

Follow-up to the HtmlToDjot round-trip work (#202, #203, #204).

Problem (G2)

Two adjacent inline elements that share a Djot delimiter merged into a single malformed token on the round-trip:

<em>a</em><em>b</em>                  ->  _a__b_   ->  <em>a_</em>b_
<strong>a</strong><strong>b</strong>  ->  *a**b*   ->  <strong>a*</strong>b*
<code>a</code><code>b</code>          ->  `a``b`   ->  <code>a``b</code>

Same for <sub> and <sup>. Brace-delimited inlines (<del>, <mark>, <ins>) and mixed delimiters (one element type followed by a different one) were already fine.

Fix

When concatenating inline children, insert an empty attribute group {} between two pieces where the left ends and the right begins with the same delimiter (_, *, ~, ^, or backtick). {} renders to nothing, so the content is unchanged while the two runs stay distinct:

<em>a</em><em>b</em>  ->  _a_{}_b_  ->  <em>a</em><em>b</em>

A trailing escaped delimiter (\_, produced for literal text) is recognized and left alone, so genuine literal characters are not affected.
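The joining rule above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the PHP code from this PR: the names `join_inline` and `ends_with_unescaped` are hypothetical, and the sketch works on already-serialized Djot strings, so it treats any odd run of backslashes before a trailing delimiter as an escape (it cannot tell whether that backslash sits inside a code span).

```python
# Delimiters that open and close with the same character.
DELIMITERS = set("_*~^`")


def ends_with_unescaped(text, delim):
    """Return True if text ends with delim and that delimiter is not backslash-escaped."""
    if not text.endswith(delim):
        return False
    # Count the backslashes directly before the final character.
    # An odd count means the delimiter is escaped literal text.
    backslashes = 0
    i = len(text) - 2
    while i >= 0 and text[i] == "\\":
        backslashes += 1
        i -= 1
    return backslashes % 2 == 0


def join_inline(pieces):
    """Concatenate serialized inline pieces, inserting {} between same-delimiter neighbours."""
    out = ""
    for piece in pieces:
        if out and piece and piece[0] in DELIMITERS and ends_with_unescaped(out, piece[0]):
            out += "{}"  # empty attribute group: renders to nothing
        out += piece
    return out


print(join_inline(["_a_", "_b_"]))   # _a_{}_b_
print(join_inline(["_a_", "*b*"]))   # _a_*b*
```

Brace-delimited pieces such as `{-a-}` never trigger the separator here, because they begin with `{` rather than one of the listed delimiters.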

The round-trip property test gains an adjacency sweep over em, strong, sub, sup, code, del, mark, ins (pairs and triples), plus the mixed-delimiter and space-separated cases as regression guards.
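As a rough picture of what such a sweep might enumerate, the sketch below builds the HTML inputs only. It is an assumption about the shape of the test, not a copy of it: `adjacency_cases` and `adjacent` are hypothetical names, "pairs and triples" is read as the same tag repeated two or three times, and the mixed-delimiter example (em followed by strong) is chosen for illustration.

```python
TAGS = ["em", "strong", "sub", "sup", "code", "del", "mark", "ins"]


def adjacent(tag, count, sep=""):
    """Build `count` adjacent elements of the same tag, e.g. <em>a</em><em>b</em>."""
    letters = "abc"
    return sep.join(f"<{tag}>{letters[i]}</{tag}>" for i in range(count))


def adjacency_cases():
    """Enumerate HTML inputs for the round-trip check."""
    cases = []
    for tag in TAGS:
        for count in (2, 3):  # pairs and triples
            cases.append(adjacent(tag, count))
    # Regression guards: these never needed a separator.
    cases.append("<em>a</em><strong>b</strong>")  # mixed delimiters (illustrative choice)
    cases.append(adjacent("em", 2, sep=" "))      # space-separated
    return cases


for html in adjacency_cases()[:3]:
    print(html)
```

A property test would then feed each case through HTML to Djot and back, and assert that the result equals the input.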

Docs (G3)

Fixes two overclaims in the converter guide:

Round-trip mode is described as "lossless for supported constructs" instead of "perfect" (HTML is many-to-one with Djot, so perfect round-trips of arbitrary input are not possible).
Code blocks preserve language and content; the fence length is normalized to the shortest safe length, not preserved. The table entry now says so.
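To make "shortest safe length" concrete, here is one plausible reading, sketched in Python. The converter's actual rule is not shown in this PR, so the function name `shortest_safe_fence`, the minimum of three backticks, and the choice to look only at lines consisting solely of backticks are all assumptions.

```python
import re

# A content line that could be mistaken for a closing fence:
# optional indentation, three or more backticks, nothing else.
FENCE_LINE = re.compile(r"^[ \t]*(`{3,})[ \t]*$")


def shortest_safe_fence(content, minimum=3):
    """Return the shortest backtick fence that no line of `content` can close."""
    longest = 0
    for line in content.splitlines():
        match = FENCE_LINE.match(line)
        if match:
            longest = max(longest, len(match.group(1)))
    # One backtick longer than any fence-like line, but never below the minimum.
    return "`" * max(minimum, longest + 1)


print(len(shortest_safe_fence("print(1)")))        # 3
print(len(shortest_safe_fence("a\n```\nb")))       # 4
```

Under this reading, an input fenced with five backticks around ordinary code comes back with three, which is why the fence length is described as normalized rather than preserved.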

Two adjacent inline elements sharing a Djot delimiter merged into one malformed token on the round-trip: `<em>a</em><em>b</em>` serialized to `_a__b_` and re-parsed as `<em>a_</em>b_`. The same happened for strong, sub, sup and code spans.

Insert an empty attribute group `{}` between two pieces when the left ends and the right begins with the same delimiter (`_`, `*`, `~`, `^` or backtick). `{}` renders to nothing, so the content is unchanged while the two runs stay separate. A trailing escaped delimiter (`\_`) is literal text and is left alone. Extend the round-trip property test with an adjacency sweep over em, strong, sub, sup, code, del, mark and ins.

Also fix two documentation overclaims in the converter guide: round-trip mode is lossless for supported constructs (not "perfect"), and code-block fence length is normalized rather than preserved.

codecov · 2026-06-02T17:59:14Z

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.79%. Comparing base (637623e) to head (f9363cb).
⚠️ Report is 1 commit behind head on master.

Files with missing lines	Patch %	Lines
src/Converter/HtmlToDjot.php	84.61%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master     #205      +/-   ##
============================================
- Coverage     91.80%   91.79%   -0.02%     
- Complexity     3450     3457       +7     
============================================
  Files           104      104              
  Lines          9786     9798      +12     
============================================
+ Hits           8984     8994      +10     
- Misses          802      804       +2


…206) The same-delimiter separator added in #205 only ran through processChildren(), so it covered paragraphs but not the two other places that buffer inline output by hand: processBlock() (bare top-level inline) and processList() (list-item content). There `<em>a</em><em>b</em>` still serialized to `_a__b_` and round-tripped to `<em>a_</em>b_`. Extract the join into a shared appendInline() helper and use it on all three concatenation paths, so adjacency is handled the same everywhere. Extend the property test with bare and list contexts across em, strong, sub, sup and code.
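The refactoring described in that follow-up can be pictured as one append step shared by every place that builds inline output. The Python sketch below is illustrative only: `append_inline` stands in for the PHP appendInline() helper, whose real signature is not shown here, and the three `render_*` functions are hypothetical stand-ins for the paragraph, bare top-level, and list-item paths.

```python
from functools import reduce

DELIMITERS = "_*~^`"


def append_inline(buffer, piece):
    """Append piece to buffer, separating same-delimiter neighbours with {}."""
    if buffer and piece and piece[0] in DELIMITERS and buffer.endswith(piece[0]):
        # An odd number of backslashes before the trailing delimiter
        # means it is escaped literal text, so no separator is needed.
        body = buffer[:-1]
        escaped = (len(body) - len(body.rstrip("\\"))) % 2 == 1
        if not escaped:
            buffer += "{}"
    return buffer + piece


def render_paragraph(pieces):
    """Paragraph path: inline content followed by a newline."""
    return reduce(append_inline, pieces, "") + "\n"


def render_bare(pieces):
    """Bare top-level inline path: inline content only."""
    return reduce(append_inline, pieces, "")


def render_list_item(pieces):
    """List-item path: bullet marker plus inline content."""
    return "- " + reduce(append_inline, pieces, "") + "\n"


print(render_bare(["_a_", "_b_"]))              # _a_{}_b_
print(render_list_item(["*a*", "*b*"]), end="")  # - *a*{}*b*
```

Routing every path through the same helper means a future change to the adjacency rule only has to be made once.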

dereuromark marked this pull request as ready for review June 2, 2026 17:57

dereuromark merged commit a2aa1bb into master Jun 2, 2026
4 checks passed

dereuromark deleted the fix/htmltodjot-adjacent-inline branch June 2, 2026 17:58

dereuromark mentioned this pull request Jun 2, 2026

Apply adjacent-inline separator on all inline buffers in HtmlToDjot #206

Merged
