Skip to content

AtlasEngine: Enable RTL text shaping via bidi analysis (#20156)#20159

Open
Qmoony wants to merge 2 commits into
microsoft:mainfrom
Qmoony:fix/arabic-rtl-text-shaping
Open

AtlasEngine: Enable RTL text shaping via bidi analysis (#20156)#20159
Qmoony wants to merge 2 commits into
microsoft:mainfrom
Qmoony:fix/arabic-rtl-text-shaping

Conversation

@Qmoony
Copy link
Copy Markdown

@Qmoony Qmoony commented Apr 29, 2026

Summary of the Pull Request

This fixes Persian/Arabic text rendering in AtlasEngine by enabling DirectWrite's bidirectional (bidi) text analysis and passing the correct isRightToLeft flag to GetGlyphs() and GetGlyphPlacements(). Previously these were hardcoded to 0 (LTR), which prevented Arabic contextual shaping and produced glyphs in reversed order.

References and Relevant Issues

Closes #20156

Detailed Description of the Pull Request / Additional comments

Previously, IDWriteTextAnalyzer::GetGlyphs and GetGlyphPlacements were called with isRightToLeft=0 for all text, which prevented DirectWrite from performing Arabic contextual shaping (cursive joining) and produced glyphs in LTR order for RTL text.

This change:

  • Calls AnalyzeBidi() to determine text direction per run
  • Passes the correct isRightToLeft value to GetGlyphs/GetGlyphPlacements
  • Reverses the resulting RTL glyph ordering to LTR for the backends
  • Builds ShapedRow directly for RTL runs, bypassing the column loop which assumes ascending clusterMap values

Validation Steps Performed

  • Built and tested on Windows 11 with WindowsTerminalDev app
  • Persian text (e.g., سلام) renders with correct RTL ordering and connected letters
  • Backspace deletion removes characters from the right (RTL) side
  • Mixed LTR/RTL text falls back to LTR direction per run
  • Existing tests continue to pass

PR Checklist

Previously, IDWriteTextAnalyzer::GetGlyphs and GetGlyphPlacements were
called with isRightToLeft=0 for all text, which prevented DirectWrite
from performing Arabic contextual shaping (cursive joining) and
produced glyphs in LTR order for RTL text.

This change:
- Calls AnalyzeBidi() to determine text direction per run
- Passes the correct isRightToLeft value to GetGlyphs/GetGlyphPlacements
- Reverses the resulting RTL glyph ordering to LTR for the backends
- Builds ShapedRow directly for RTL runs, bypassing the column loop
  which assumes ascending clusterMap values
@Qmoony
Copy link
Copy Markdown
Author

Qmoony commented Apr 29, 2026

@microsoft-github-policy-service agree

@DHowett
Copy link
Copy Markdown
Member

DHowett commented Apr 29, 2026

Do you have a screenshot of this working?

Comment thread src/renderer/atlas/AtlasEngine.cpp Outdated
{
// Determine if this script run is RTL by intersecting with bidi runs.
// If all overlapping bidi runs agree on direction → use that.
// If mixed (RTL+LTR within the same script run) → fall back to LTR.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the technically correct approach would be to refactor the text segmentation algorithm. This code currently looks somewhat like this:

for (each text attribute) {      // aka: UpdateDrawingBrushes
    for (each font fallback) {   // aka: _flushBufferLine
        for (each script run) {
            for (each glyph) {
                place glyph
            }
        }
    }
}

The correct approach should look somewhat like this:

// aka: UpdateDrawingBrushes, etc., calls
accumulate all text and attributes in a line

// ...then at the end of each line during flush()
LinkedList runs;
for (each text attribute) {
    for (each font fallback) {
        split runs (per font)
    }
}

perform glyph analysis
for (each analysis result) {
    split runs (per script)
}

// (because DWrite can split runs by accident)
merge adjacent identical runs again

for (each run) {
    map to glyphs
    foreach (glyph) {
        place glyph
    }
}

Among others, this solves the issue that your code only performs correct RTL if it is contained within an entire span. Making a part of RTL text bold for instance should break your code.


// RTL run fully handled. Skip the normal column loop below.
continue;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block above is some nice reference code to have! Thank you for working on it!
We should probably think about unifying it with the existing logic, however.

BTW judging by the lack of {} around conditions, I bet it was done by Claude though, right? 😄

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed review and the architecture outline!

You're right on both counts:

  1. The current approach only handles RTL correctly within a single script run — as soon as a space or attribute change splits the span, it falls back to LTR. That matches what @Avid29 observed with זאת בדיקה.
  2. And yes, Claude wrote it :) The missing braces gave it away? I did review and
    test before pushing though.

Your proposed refactoring makes sense — accumulate line state first, then
split/merge runs, then glyph placement last. That would also fix the mixed-direction
span case naturally.

I see two paths forward:

  • Option A: Keep this PR as a working proof-of-concept / stepping stone, and
    iterate on the full refactor in a follow-up.
  • Option B: Close this and do the full refactor from scratch, using the RTL
    reversal code here as reference.

I'm leaning toward Option A since it already improves the common case (pure RTL
text without mixed formatting), but I'll defer to your preference on this.
Happy to take this to a discussion issue first if you'd rather align on the design
up front.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude is bad at implicit research, e.g. checking for existing code styles, if you don't tell it to. That's perhaps also why today's system prompts are like 10000 lines long. "Alignment" my 🍑. lol

Jokes aside, I'd prefer option B, because we (the maintainers) are always pressed on time. If we don't do it right now, we won't ever have any incentive to fix it later. So, if we care to make it good, we have to make it good from the get go.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with B — pushed 13f06b9. The pipeline now does analysis → split runs:

  • AnalyzeScript + AnalyzeBidi run once on the whole buffer line in _flushBufferLine
  • _mapRegularText clamps font fallback and GetTextComplexity at script/bidi boundaries via a new _nextAnalysisBoundary helper, so every range entering _mapComplex is guaranteed uniform in direction and script
  • _mapComplex dropped its inline AnalyzeBidi/AnalyzeScript calls, the per-script-run loop, and the "mixed → LTR" fallback (structurally impossible now)

Verified locally:

  • זאת בדיקה (the Hebrew + space case @Avid29 reported)
  • Hello זאת / abc العربية xyz (mixed LTR+RTL on one line)
  • بسم الله الرحمن الرحيم (multi-word Arabic, per-word cursive joining intact)

No visible regressions in ASCII, CJK, emoji, combining marks, ligatures, or built-in box-drawing glyphs.

Still out of scope:

  • Full line-level visual reordering for mixed-direction text — the terminal cell grid is LTR; separate change
  • Cursive joining across direction/font boundaries — inherent limitation when text gets split into separate shaping runs

@Avid29
Copy link
Copy Markdown

Avid29 commented Apr 29, 2026

I deployed and tried this. It's close, but it doesn't handle non-directional characters properly.

Here I typed זאת בדיקה, but it says בדיקה זאת because the space reset the span.

image

This is without a doubt massively improved behavior though

@lhecker
Copy link
Copy Markdown
Member

lhecker commented Apr 30, 2026

Yeah this is one example for what I meant in my review above. It's somewhat annoying to fix this because it necessitates a refactor. The flip-side is that this fixes various non-RTL issues too.

@Qmoony
Copy link
Copy Markdown
Author

Qmoony commented Apr 30, 2026

@Avid29 Good catch, thanks for testing! The space character has bidi class WS
(whitespace) which resets the direction — the bidi algorithm places it outside the
RTL run and the next word starts a new analysis. This is the same limitation
@lhecker pointed out in code review: RTL correctness only holds within a single
script run right now. Fixing this properly requires the text segmentation refactor
that lhecker outlined above.

@Qmoony
Copy link
Copy Markdown
Author

Qmoony commented Apr 30, 2026

@DHowett Screenshot added below. For a quick test case, echo "مرحبا بالعالم" in
PowerShell or echo سلام in WSL shows the RTL shaping — cursive joining and
right-to-left glyph order are both working now.
屏幕截图 2026-04-30 200030

Comment thread CLAUDE.md Fixed
Comment thread CLAUDE.md Fixed
Comment thread CLAUDE.md Fixed
@github-actions

This comment has been minimized.

@Qmoony Qmoony force-pushed the fix/arabic-rtl-text-shaping branch from 29044b4 to ff22ca5 Compare May 11, 2026 08:56
@github-actions

This comment has been minimized.

@Qmoony Qmoony force-pushed the fix/arabic-rtl-text-shaping branch 2 times, most recently from a89d424 to ff22ca5 Compare May 11, 2026 09:43
The previous RTL implementation ran AnalyzeBidi and AnalyzeScript inside
_mapComplex, after font fallback and complexity analysis had already
segmented the text. When those upstream segmenters split a direction-
uniform region (e.g. at a space or attribute change), each fragment
re-analyzed in isolation and fell back to LTR.

This commit inverts the order so analysis is authoritative:

- Run AnalyzeScript + AnalyzeBidi once on the whole buffer line in
  _flushBufferLine.
- Add _nextAnalysisBoundary so font fallback and GetTextComplexity
  clamp every segment at the next script/bidi run end.
- _mapComplex receives ranges guaranteed uniform in script and
  direction; the inline analysis call, the per-script-run loop, and
  the "mixed -> LTR" fallback are removed.

Hebrew text with embedded spaces and mixed LTR/RTL lines now shape
correctly.
@Qmoony Qmoony force-pushed the fix/arabic-rtl-text-shaping branch 2 times, most recently from 13f06b9 to 25aa9c7 Compare May 15, 2026 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persian (Farsi) text rendered with reversed character order and no shaping

5 participants