[EuiMarkdownFormat] Fix intraword underscore emphasis rendering#9408
[EuiMarkdownFormat] Fix intraword underscore emphasis rendering#9408acstll wants to merge 5 commits intoelastic:mainfrom
Conversation
…tic#9404) Add a remark AST transformer plugin that reverses incorrectly applied emphasis/strong nodes when underscore delimiters are flanked by alphanumeric characters on both sides (intraword context), matching the CommonMark specification behavior that remark-parse v8 does not implement. This prevents Salesforce-style identifiers like SBQQ__OrderProductBookings__c from being rendered with bold formatting. Co-authored-by: Cursor <cursoragent@cursor.com>
tkajtoch
left a comment
There was a problem hiding this comment.
I was initially thinking we'd handle that outside remark but having it as a plugin is also a valid solution :D So far the changes look good. Let me know when the PR is ready for final review!
nice! i'll go through it again and add a small unit test, and then i'll open 🙏 |
|
@tkajtoch this one is ready for review 🫡 |
|
I would propose to close this one in favor of dropping CJS support (https://github.com/elastic/eui-private/issues/553). It's not the first issue we've had for remark that a simple package update would fix. |
|
This issue has spun up a whole different discussion, and I believe at this point we just need to get things going. I'm going to run your code against remark test files to ensure there are no regressions. |
tkajtoch
left a comment
There was a problem hiding this comment.
I tested your PR against a CommonMark sample file and confirmed there are no regressions.
However, I've found one more thing related to underscores that this doesn't fix: Lorem__ipsum__ dolor sit amet should output a plaintext but instead it makes the ipsum bolded.
Here's the expected mdast:
{
"type": "root",
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "Lorem__ipsum__ dolor sit amet"
}
]
}
]
}|
The |
There was a problem hiding this comment.
Pull request overview
Adds a temporary remark AST transformer to undo incorrectly-parsed underscore emphasis/strong in “intraword” contexts (e.g., Salesforce API identifiers), aligning EuiMarkdownFormat output closer to the CommonMark rule for _ delimiters until remark-parse is upgraded.
Changes:
- Added
remark_intraword_underscoreplugin to flatten misappliedemphasis/strongnodes back into plain text underscores in intraword scenarios. - Added RTL/Jest coverage validating Salesforce-style identifiers render without
<em>/<strong>while standalone_italic_/__bold__still work. - Registered the plugin in default parsing plugins, updated plugin count assertions, and added a changelog entry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/eui/src/components/markdown_editor/plugins/remark/remark_intraword_underscore.ts | New remark plugin that rewrites intraword underscore emphasis/strong nodes into text. |
| packages/eui/src/components/markdown_editor/plugins/remark/remark_intraword_underscore.test.tsx | Tests verifying identifiers with underscores are preserved as plain text. |
| packages/eui/src/components/markdown_editor/plugins/markdown_default_plugins/parsing_plugins.ts | Registers the new parsing plugin in the default list. |
| packages/eui/src/components/markdown_editor/plugins/markdown_default_plugins/plugins.test.ts | Updates default plugin count expectations. |
| packages/eui/changelogs/upcoming/9408.md | Changelog entry for the bug fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } else if (isEmphasisOrStrong(child)) { | ||
| inner += flattenToText(child); | ||
| } else { | ||
| // Contains non-text children (links, images, etc.) — leave emphasis intact | ||
| return delimiter + collectText(node) + delimiter; | ||
| } |
There was a problem hiding this comment.
flattenToText() claims to "leave emphasis intact" when encountering non-text children, but it still returns a string and processParent() always replaces the node with a text node. This will strip nested nodes like inlineCode, link, etc. and can even drop their content (e.g. inlineCode has value but no children, so collectText() returns an empty string). Consider bailing out of the replacement when non-text descendants are present (e.g. have flattenToText return null and skip splice), or otherwise ensure all literal node types are preserved and non-text nodes are not lost.
| if (isEmphasisOrStrong(child) && isIntraword(parent, i, source)) { | ||
| const textValue = flattenToText(child); | ||
| const replacement: TextNode = { | ||
| type: 'text', | ||
| value: textValue, | ||
| } as TextNode; | ||
|
|
||
| parent.children.splice(i, 1, replacement); | ||
| modified = true; |
There was a problem hiding this comment.
The replacement text node created here drops the original node’s position. EuiMarkdownEditor relies on AST position.start/end.offset for cursor tracking and node replacement, so losing positions can break selection behavior. Consider copying child.position onto replacement (and ensuring it still reflects the same source span).
| (parent.children[i] as TextNode).value += ( | ||
| parent.children[i + 1] as TextNode | ||
| ).value; |
There was a problem hiding this comment.
mergeAdjacentText() concatenates adjacent text node values but does not update their position ranges. If the first node keeps its original position.end, the merged node’s offsets will no longer cover the full text span, which can break cursor-to-AST mapping in the markdown editor. When merging, update the kept node’s position.end (and related fields) based on the removed node’s end position.
| (parent.children[i] as TextNode).value += ( | |
| parent.children[i + 1] as TextNode | |
| ).value; | |
| const current = parent.children[i] as TextNode & { position?: any }; | |
| const next = parent.children[i + 1] as TextNode & { position?: any }; | |
| current.value += next.value; | |
| // Preserve and extend positional information so the merged node | |
| // covers the full text span of both original nodes. | |
| if (current.position && next.position && next.position.end) { | |
| current.position.end = next.position.end; | |
| } |
…ord-underscore-emphasis
💚 Build SucceededHistory
cc @acstll |
💚 Build Succeeded
History
cc @acstll |
Summary
Adds a remark AST transformer plugin (
remark_intraword_underscore) that reverses incorrectly appliedemphasis/strongnodes when underscore delimiters are flanked by alphanumeric characters on both sides (intraword context).remark-parsev8 does not implement the CommonMark specification rule that underscore delimiters which are both left-flanking and right-flanking (i.e. surrounded by word characters) cannot open or close emphasis. This plugin walks the parsed AST post-parse, detects these cases by checking sibling text nodes and verifying the delimiter is_(not*) via source position, and flattens the nodes back to plain text with underscores restored.This is a temporary workaround until
remark-parseis upgraded to a version that natively handles this rule.Why are we making this change?
Closes #9404
Salesforce API identifiers use underscores and double underscores extensively (e.g.
SBQQ__OrderProductBookings__c,Custom_Field__c).EuiMarkdownFormatrenders these with bold/italic formatting instead of preserving them verbatim, making the text unreadable and unusable for copy-paste.Impact to users
🟢 Intraword underscore sequences that were previously (incorrectly) rendered as bold or italic will now display as plain text, matching CommonMark spec behavior. Legitimate emphasis using
__bold__or_italic_with whitespace boundaries continues to work as expected. Asterisk-based emphasis (**bold**,*italic*) is completely unaffected.QA
Remove or strikethrough items that do not apply to your PR.
General checklist
Docs site QAAdded documentationProps have proper autodocsChecked Code Sandbox works for any docs examplesUpdated visual regression testsIf applicable, added the breaking change issue labelIf the changes unblock an issue in a different repo, smoke tested carefullyDesigner checklistMade with Cursor