Fine-grained DWrite text analysis based on text complexity #9156

skyline75489 · 2021-02-14T01:21:28Z

Description of the new feature/enhancement

Inspired by microsoft/cascadia-code#411: depending on the font being used, certain ASCII characters can break the simplicity of the entire text. The current implementation skips DWrite analysis only when the entire text is simple:

if (!_isEntireTextSimple)
{
    // Call each of the analyzers in sequence, recording their results.
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeLineBreakpoints(this, 0, textLength, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeBidi(this, 0, textLength, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeScript(this, 0, textLength, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeNumberSubstitution(this, 0, textLength, this));
    // Perform our custom font fallback analyzer that mimics the pattern of the real analyzers.
    RETURN_IF_FAILED(_AnalyzeFontFallback(this, 0, textLength));
}

With Fira Code, for example, the optimization in most cases only applies to lines consisting entirely of spaces (such as a blank 120-column line), which is not good.

Proposed technical implementation details (optional)

`GetTextComplexity` can report which specific ranges of the text are simple, so we should be able to use it like this:

// `complexRanges` and its position/length fields are hypothetical here.
for (const auto& range : complexRanges)
{
    // Call each of the analyzers in sequence on this complex range only, recording their results.
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeLineBreakpoints(this, range.position, range.length, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeBidi(this, range.position, range.length, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeScript(this, range.position, range.length, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeNumberSubstitution(this, range.position, range.length, this));
    // Perform our custom font fallback analyzer that mimics the pattern of the real analyzers.
    RETURN_IF_FAILED(_AnalyzeFontFallback(this, range.position, range.length));
}
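The loop above assumes a collection of complex ranges already exists. One way to build it is to call `GetTextComplexity` repeatedly: it reports whether the text at the current position is simple and, via `textLengthRead`, how many code units share that simplicity. The sketch below is a portable stand-in, not the Terminal code: `MockGetTextComplexity`, `TextRange`, `CollectComplexRanges` and the "ASCII is simple" rule are hypothetical, chosen only so the example runs without DirectWrite (the real result depends on the font face, and the real call also takes a font face and fills glyph indices).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A [position, position + length) range of code units (hypothetical type).
struct TextRange
{
    uint32_t position;
    uint32_t length;
};

// Hypothetical stand-in for IDWriteTextAnalyzer1::GetTextComplexity.
// Like the real call, it reports whether the text starting at `text` is
// simple and how many code units share that simplicity. The "ASCII is
// simple" rule is an assumption for illustration only.
static void MockGetTextComplexity(const wchar_t* text, uint32_t length,
                                  bool* isTextSimple, uint32_t* textLengthRead)
{
    const auto isSimple = [](wchar_t ch) { return ch < 0x80; };
    const bool first = isSimple(text[0]);
    uint32_t read = 1;
    while (read < length && isSimple(text[read]) == first)
    {
        ++read;
    }
    *isTextSimple = first;
    *textLengthRead = read;
}

// Walks the text once and keeps only the ranges that need full analysis.
std::vector<TextRange> CollectComplexRanges(const std::wstring& text)
{
    std::vector<TextRange> complexRanges;
    const auto textLength = static_cast<uint32_t>(text.size());
    uint32_t pos = 0;
    while (pos < textLength)
    {
        bool isTextSimple = false;
        uint32_t lengthRead = 0;
        MockGetTextComplexity(text.c_str() + pos, textLength - pos, &isTextSimple, &lengthRead);
        if (!isTextSimple)
        {
            complexRanges.push_back({ pos, lengthRead });
        }
        pos += lengthRead;
    }
    return complexRanges;
}
```

For `L"ab\u4E2D\u6587cd"` this yields a single complex range starting at 2 with length 2; the simple ASCII ranges around it never reach the analyzers.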

See #6695 for the introduction of text complexity analysis.


skyline75489 · 2021-02-14T01:24:21Z

This should also help users with non-English locales, for example by avoiding analysis entirely in cases like this:

skyline75489 · 2021-02-14T02:27:36Z

/cc @miniksa for both sanity & technical check

skyline75489 · 2021-02-15T01:36:01Z

I've done some experiments and found that the text complexity ranges are not the same as the run splitting. For example, with the following text:

The text complexity analysis reports the following (each a, b is a position, length pair):

0, 4: Complex
4, 26: Simple
30, 8: Complex
38, 70: Simple

The run analysis splits it into the following runs:

0, 6
6, 25
31, 77

We might also need some sort of RLE (run-length encoding) implementation to find out whether a run is entirely simple, and then optimize the shaping process for that run.
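A minimal sketch of such an RLE lookup, assuming the complexity results are stored as sorted, contiguous ranges exactly as listed above. `ComplexityRange` and `IsRunEntirelySimple` are hypothetical names; the lookup binary-searches for the first range that ends after the run starts, then walks forward until the run ends.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One run-length-encoded entry from the complexity analysis:
// [position, position + length) is uniformly simple or complex.
struct ComplexityRange
{
    uint32_t position;
    uint32_t length;
    bool isSimple;
};

// Returns true only if every code unit in [runStart, runStart + runLength)
// falls inside simple ranges. `ranges` must be sorted, contiguous and
// non-overlapping, as produced by repeated GetTextComplexity calls.
bool IsRunEntirelySimple(const std::vector<ComplexityRange>& ranges,
                         uint32_t runStart, uint32_t runLength)
{
    const uint32_t runEnd = runStart + runLength;
    // Binary search for the first range that ends after runStart.
    auto it = std::upper_bound(ranges.begin(), ranges.end(), runStart,
                               [](uint32_t pos, const ComplexityRange& r) {
                                   return pos < r.position + r.length;
                               });
    // Every range that starts before the run ends must be simple.
    for (; it != ranges.end() && it->position < runEnd; ++it)
    {
        if (!it->isSimple)
        {
            return false;
        }
    }
    return true;
}
```

Applied to the example, none of the three runs (0, 6), (6, 25) and (31, 77) is entirely simple, because each one overlaps a complex range; only a run lying fully inside (4, 26) or (38, 70) would qualify.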

miniksa · 2021-02-16T22:14:28Z

I agree that we should make use of the additional analysis information to improve performance in this way.

I think we could just further split the Runs during the initial _AnalyzeTextComplexity and give each one an additional simple-or-not parameter (bool). _AnalyzeRuns would pick it up to decide between full analysis and skipping it, and _ShapeGlyphRuns would use it again to choose between quick and slow mapping to glyphs. Instead of the whole text being simple or not, each Run would be simple or not.
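A sketch of how the two passes could consult such a flag, under stated assumptions: `Run` here is a hypothetical, trimmed-down struct (the real CustomTextLayout run carries bidi level, script, font face and more), `PassStats` and `ProcessRuns` are illustrative names, and the counters stand in for the real DirectWrite analyzer and shaping calls.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down Run carrying only the proposed flag.
struct Run
{
    uint32_t textStart;
    uint32_t textLength;
    bool isTextSimple; // set during the complexity pass
};

struct PassStats
{
    int fullyAnalyzed = 0;      // runs sent through bidi/script/number-substitution/fallback
    uint32_t skippedLength = 0; // code units that skipped those analyzers
    int quickMapped = 0;        // runs mapped straight to nominal glyphs
    int shaped = 0;             // runs sent through full glyph shaping
};

// Stand-in for _AnalyzeRuns followed by _ShapeGlyphRuns.
PassStats ProcessRuns(const std::vector<Run>& runs)
{
    PassStats stats;
    // Analysis pass: simple runs skip the expensive analyzers entirely.
    for (const auto& run : runs)
    {
        if (run.isTextSimple)
        {
            stats.skippedLength += run.textLength;
        }
        else
        {
            ++stats.fullyAnalyzed;
        }
    }
    // Shaping pass: simple runs already have nominal glyph indices from
    // GetTextComplexity, so they can take the quick mapping.
    for (const auto& run : runs)
    {
        if (run.isTextSimple)
        {
            ++stats.quickMapped;
        }
        else
        {
            ++stats.shaped;
        }
    }
    return stats;
}
```

The point of carrying the flag on the Run itself is that both passes make the same per-run decision without re-querying complexity.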

I'm not quite sure why your example maps as it does. Are some of those characters UTF-16 surrogate pairs?

skyline75489 · 2021-02-16T23:49:33Z

Those are just normal Chinese characters. Originally I thought text complexity analysis would split the text the same way as run splitting. I just wanted to add an example to show that it doesn't.

a Run would be simple or not

That's likely not something we can determine. In the example above:

This is a Run. But according to the text complexity analysis, its first 4 characters are complex and its last 2 characters are simple. That's what frustrates me: we can't easily tell whether a Run is simple or not and optimize based on that.

miniksa · 2021-02-17T00:22:41Z

Yeah, but what I'm saying is that we can just call _SetCurrentRun and _SplitCurrentRun inside _AnalyzeTextComplexity as we read each complexity length, and attach the additional data.

So then you have a [0,4) complex run, a [4,6) simple run, a [6,30) simple run, etc.
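A standalone sketch of that splitting, assuming both the existing runs and the complexity ranges are sorted and cover the same text. It mimics the effect of calling _SetCurrentRun/_SplitCurrentRun from the complexity pass, but `Range`, `ComplexityRange`, `TaggedRun` and `SplitRunsByComplexity` are hypothetical names, not the CustomTextLayout code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Range
{
    uint32_t position;
    uint32_t length;
};

struct ComplexityRange
{
    uint32_t position;
    uint32_t length;
    bool isSimple;
};

struct TaggedRun
{
    uint32_t position;
    uint32_t length;
    bool isTextSimple;
};

// Splits each run wherever a complexity boundary falls inside it and tags
// every piece with the simplicity of the range it came from.
std::vector<TaggedRun> SplitRunsByComplexity(const std::vector<Range>& runs,
                                             const std::vector<ComplexityRange>& complexity)
{
    std::vector<TaggedRun> result;
    std::size_t c = 0; // index of the current complexity range
    for (const auto& run : runs)
    {
        uint32_t pos = run.position;
        const uint32_t runEnd = run.position + run.length;
        while (pos < runEnd && c < complexity.size())
        {
            const auto& cr = complexity[c];
            const uint32_t crEnd = cr.position + cr.length;
            if (crEnd <= pos)
            {
                ++c; // this complexity range ended before the current position
                continue;
            }
            const uint32_t pieceEnd = std::min(runEnd, crEnd);
            result.push_back({ pos, pieceEnd - pos, cr.isSimple });
            pos = pieceEnd;
        }
    }
    return result;
}
```

With the example data this produces six pieces: [0,4) complex, [4,6) simple, [6,30) simple, [30,31) complex, [31,38) complex and [38,108) simple. Adjacent pieces with the same simplicity are deliberately not merged, so the original run boundaries survive; this is the extra fragmentation asked about in the next comment.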

skyline75489 · 2021-02-17T00:29:52Z

Doesn’t that bring more fragmentation into the process? Will it affect the line breaking and script analysis results? I need to dig more into this...

miniksa · 2021-02-17T17:00:17Z

To your questions: oh probably. It's worth a try though to see if it just works. Sometimes the simple answer is "good enough". If it turns out to not be, we can refine further from there. Feel free to try/dig!

This PR aims to optimize the text analysis process by breaking the text into simple and complex runs according to the result of `GetTextComplexity`. For simple runs, we can skip certain processing steps to improve analysis performance.

Prior to this PR, we relied on the results of `AnalyzeBidi`, `AnalyzeScript` and `AnalyzeNumberSubstitution` both to break the text into runs and to attach the corresponding bidi/script/number_substitution information to each run. Thanks to #6695, we can skip the expensive analysis when the *entire text* is determined to be simple. Inspired by microsoft/cascadia-code#411 and the discussion in #9156, I found that this "entire text simplicity" condition is often hard to meet.

To fully utilize the complexity information, we first break the text into simple and complex ranges. These ranges also serve as the initial runs prior to the bidi/script/number_substitution analysis. This way we can skip the text analysis for simple runs and speed up the process.

VALIDATION

Build & run cmatrix, cacafire, and cat big.txt with it.

Initial simple run PR: #6695

Closes #9156

skyline75489 · 2021-07-20T02:46:42Z

Can we reopen this? #9202 was reverted.

#10036 is an unsuccessful attempt to patch #9202.

lhecker · 2022-10-12T21:39:57Z

AtlasEngine does this! 💖

skyline75489 added the Issue-Feature label Feb 14, 2021

ghost added the Needs-Triage and Needs-Tag-Fix labels Feb 14, 2021

ghost removed the Needs-Tag-Fix label Feb 16, 2021

zadjii-msft added this to the Terminal Backlog milestone Feb 16, 2021

skyline75489 mentioned this issue Feb 18, 2021

Initial implementation of fine-grained text analysis #9202

Merged

DHowett removed the Needs-Triage label Feb 18, 2021

ghost added the In-PR label Feb 19, 2021

ghost closed this as completed in #9202 Apr 28, 2021

ghost removed the In-PR label Apr 28, 2021

ghost added the Resolution-Fix-Committed label Apr 28, 2021

skyline75489 mentioned this issue Jun 20, 2021

Add a DxRenderer based on a glyph atlas #10461

Closed

skyline75489 mentioned this issue Jul 20, 2021

[DRAFT] Re-add #9202, but with less crashes #10036

Closed


zadjii-msft reopened this Jul 20, 2021

zadjii-msft removed the Resolution-Fix-Committed label Jul 20, 2021

skyline75489 mentioned this issue Aug 5, 2021

Fix setting wght axis font bugs #10863

Merged


zadjii-msft modified the milestones: Terminal Backlog, Backlog Jan 4, 2022

lhecker closed this as completed Oct 12, 2022

ghost added the Needs-Tag-Fix label Oct 12, 2022
