AtlasEngine: Enable RTL text shaping via bidi analysis (#20156) by Qmoony · Pull Request #20159 · microsoft/terminal

Qmoony · 2026-04-29T10:23:31Z

Summary of the Pull Request

This fixes Persian/Arabic text rendering in AtlasEngine by enabling DirectWrite's bidirectional (bidi) text analysis and passing the correct isRightToLeft flag to GetGlyphs() and GetGlyphPlacements(). Previously these were hardcoded to 0 (LTR), which prevented Arabic contextual shaping and produced glyphs in reversed order.

References and Relevant Issues

Closes #20156

Detailed Description of the Pull Request / Additional comments

Previously, IDWriteTextAnalyzer::GetGlyphs and GetGlyphPlacements were called with isRightToLeft=0 for all text, which prevented DirectWrite from performing Arabic contextual shaping (cursive joining) and produced glyphs in LTR order for RTL text.

This change:

Calls AnalyzeBidi() to determine text direction per run
Passes the correct isRightToLeft value to GetGlyphs/GetGlyphPlacements
Reverses the resulting RTL glyph ordering to LTR for the backends
Builds ShapedRow directly for RTL runs, bypassing the column loop which assumes ascending clusterMap values

Validation Steps Performed

Built and tested on Windows 11 with WindowsTerminalDev app
Persian text (e.g., سلام) renders with correct RTL ordering and connected letters
Backspace deletion removes characters from the right (RTL) side
Mixed LTR/RTL text falls back to LTR direction per run
Existing tests continue to pass

PR Checklist

Closes Persian (Farsi) text rendered with reversed character order and no shaping #20156
Tests added/passed
Documentation updated
Schema updated (if necessary)

Previously, IDWriteTextAnalyzer::GetGlyphs and GetGlyphPlacements were called with isRightToLeft=0 for all text, which prevented DirectWrite from performing Arabic contextual shaping (cursive joining) and produced glyphs in LTR order for RTL text. This change: - Calls AnalyzeBidi() to determine text direction per run - Passes the correct isRightToLeft value to GetGlyphs/GetGlyphPlacements - Reverses the resulting RTL glyph ordering to LTR for the backends - Builds ShapedRow directly for RTL runs, bypassing the column loop which assumes ascending clusterMap values

Qmoony · 2026-04-29T16:52:31Z

@microsoft-github-policy-service agree

DHowett · 2026-04-29T16:58:01Z

Do you have a screenshot of this working?

lhecker · 2026-04-29T16:59:53Z

    {
+        // Determine if this script run is RTL by intersecting with bidi runs.
+        // If all overlapping bidi runs agree on direction → use that.
+        // If mixed (RTL+LTR within the same script run) → fall back to LTR.


I believe the technically correct approach would be to refactor the text segmentation algorithm. This code currently looks somewhat like this:

for (each text attribute) { // aka: UpdateDrawingBrushes for (each font fallback) { // aka: _flushBufferLine for (each script run) { for (each glyph) { place glyph } } } }

The correct approach should look somewhat like this:

// aka: UpdateDrawingBrushes, etc., calls accumulate all text and attributes in a line // ...then at the end of each line during flush() LinkedList runs; for (each text attribute) { for (each font fallback) { split runs (per font) } } perform glyph analysis for (each analysis result) { split runs (per script) } // (because DWrite can split runs by accident) merge adjacent identical runs again for (each run) { map to glyphs foreach (glyph) { place glyph } }

Among others, this solves the issue that your code only performs correct RTL if it is contained within an entire span. Making a part of RTL text bold for instance should break your code.

lhecker · 2026-04-29T17:02:01Z

+
+            // RTL run fully handled. Skip the normal column loop below.
+            continue;
+        }


The block above is some nice reference code to have! Thank you for working on it!
We should probably think about unifying it with the existing logic, however.

BTW judging by the lack of {} around conditions, I bet it was done by Claude though, right? 😄

Thanks for the detailed review and the architecture outline!

You're right on both counts:

The current approach only handles RTL correctly within a single script run — as soon as a space or attribute change splits the span, it falls back to LTR. That matches what @Avid29 observed with זאת בדיקה.

And yes, Claude wrote it :) The missing braces gave it away? I did review and
test before pushing though.

Your proposed refactoring makes sense — accumulate line state first, then
split/merge runs, then glyph placement last. That would also fix the mixed-direction
span case naturally.

I see two paths forward:

Option A: Keep this PR as a working proof-of-concept / stepping stone, and
iterate on the full refactor in a follow-up.

Option B: Close this and do the full refactor from scratch, using the RTL
reversal code here as reference.

I'm leaning toward Option A since it already improves the common case (pure RTL
text without mixed formatting), but I'll defer to your preference on this.
Happy to take this to a discussion issue first if you'd rather align on the design
up front.

Claude is bad at implicit research, e.g. checking for existing code styles, if you don't tell it to. That's perhaps also why today's system prompts are like 10000 lines long. "Alignment" my 🍑. lol

Jokes aside, I'd prefer option B, because we (the maintainers) are always pressed on time. If we don't do it right now, we won't ever have any incentive to fix it later. So, if we care to make it good, we have to make it good from the get go.

Went with B — pushed 13f06b9. The pipeline now does analysis → split runs:

AnalyzeScript + AnalyzeBidi run once on the whole buffer line in _flushBufferLine

_mapRegularText clamps font fallback and GetTextComplexity at script/bidi boundaries via a new _nextAnalysisBoundary helper, so every range entering _mapComplex is guaranteed uniform in direction and script

_mapComplex dropped its inline AnalyzeBidi/AnalyzeScript calls, the per-script-run loop, and the "mixed → LTR" fallback (structurally impossible now)

Verified locally:

זאת בדיקה (the Hebrew + space case @Avid29 reported)

Hello זאת / abc العربية xyz (mixed LTR+RTL on one line)

بسم الله الرحمن الرحيم (multi-word Arabic, per-word cursive joining intact)

No visible regressions in ASCII, CJK, emoji, combining marks, ligatures, or built-in box-drawing glyphs.

Still out of scope:

Full line-level visual reordering for mixed-direction text — the terminal cell grid is LTR; separate change

Cursive joining across direction/font boundaries — inherent limitation when text gets split into separate shaping runs

Avid29 · 2026-04-29T23:44:49Z

I deployed and tried this. It's close, but it doesn't handle non-directional characters properly.

Here I typed זאת בדיקה, but it says בדיקה זאת because the space reset the span.

This is without a doubt massively improved behavior though

lhecker · 2026-04-30T00:23:45Z

Yeah this is one example for what I meant in my review above. It's somewhat annoying to fix this because it necessitates a refactor. The flip-side is that this fixes various non-RTL issues too.

Qmoony · 2026-04-30T11:57:04Z

@Avid29 Good catch, thanks for testing! The space character has bidi class WS
(whitespace) which resets the direction — the bidi algorithm places it outside the
RTL run and the next word starts a new analysis. This is the same limitation
@lhecker pointed out in code review: RTL correctness only holds within a single
script run right now. Fixing this properly requires the text segmentation refactor
that lhecker outlined above.

Qmoony · 2026-04-30T12:00:52Z

@DHowett Screenshot added below. For a quick test case, echo "مرحبا بالعالم" in
PowerShell or echo سلام in WSL shows the RTL shaping — cursive joining and
right-to-left glyph order are both working now.

The previous RTL implementation ran AnalyzeBidi and AnalyzeScript inside _mapComplex, after font fallback and complexity analysis had already segmented the text. When those upstream segmenters split a direction- uniform region (e.g. at a space or attribute change), each fragment re-analyzed in isolation and fell back to LTR. This commit inverts the order so analysis is authoritative: - Run AnalyzeScript + AnalyzeBidi once on the whole buffer line in _flushBufferLine. - Add _nextAnalysisBoundary so font fallback and GetTextComplexity clamp every segment at the next script/bidi run end. - _mapComplex receives ranges guaranteed uniform in script and direction; the inline analysis call, the per-script-run loop, and the "mixed -> LTR" fallback are removed. Hebrew text with embedded spaces and mixed LTR/RTL lines now shape correctly.

lhecker reviewed Apr 29, 2026

View reviewed changes

github-advanced-security AI found potential problems May 11, 2026

View reviewed changes

Comment thread CLAUDE.md Fixed

Comment thread CLAUDE.md Fixed

Comment thread CLAUDE.md Fixed

This comment has been minimized.

Sign in to view

Qmoony force-pushed the fix/arabic-rtl-text-shaping branch from 29044b4 to ff22ca5 Compare May 11, 2026 08:56

This comment has been minimized.

Sign in to view

Qmoony force-pushed the fix/arabic-rtl-text-shaping branch 2 times, most recently from a89d424 to ff22ca5 Compare May 11, 2026 09:43

Qmoony force-pushed the fix/arabic-rtl-text-shaping branch 2 times, most recently from 13f06b9 to 25aa9c7 Compare May 15, 2026 18:45

Conversation

Qmoony commented Apr 29, 2026

Summary of the Pull Request

References and Relevant Issues

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

PR Checklist

Uh oh!

Qmoony commented Apr 29, 2026

Uh oh!

DHowett commented Apr 29, 2026

Uh oh!

lhecker Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

lhecker Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Qmoony Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

lhecker May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Qmoony May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Avid29 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lhecker commented Apr 30, 2026

Uh oh!

Qmoony commented Apr 30, 2026

Uh oh!

Qmoony commented Apr 30, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Avid29 commented Apr 29, 2026 •

edited

Loading