Fine-grained DWrite text analysis based on text complexity #9156
Comments
This should also help users who use non-English locales, for example by avoiding the analysis entirely:
|
/cc @miniksa for both sanity & technical check |
I've done some experiments and found that the text complexity ranges are not the same as the run splits. For example, take the following text: 版权所有 (C) Microsoft Corporation。保留所有权利。 The text complexity analysis reports the following (each pair a, b is a position and a length):
The run analysis splits it into the following runs:
We might also need some sort of RLE implementation to find out whether a run is entirely simple, and then optimize the shaping process for that run. |
I agree that we should make use of the additional analysis information to improve performance in this way. I do think that we could just further split the I'm not quite sure why your example maps as it does. Are some of those characters UTF-16 surrogate pairs? |
Those are just normal Chinese characters. Originally I thought the text complexity analysis would split the text the same way as run splitting. I just wanted to add an example to show that it doesn't.
This is likely undetermined. In the example above, “版权所有 (” is a single run, but according to the text complexity analysis its first 4 characters are complex and its last 2 characters are simple. This is what frustrates me: we can't easily tell whether a run is simple and optimize based on that. |
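One way to answer the "is this run entirely simple?" question is to compare the run against the complexity ranges directly. The following is only a minimal sketch, assuming the complexity ranges are already available as a sorted, non-overlapping list. `ComplexityRange` and `IsRunEntirelySimple` are hypothetical names and not part of the Terminal code base or of DirectWrite.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record for one range reported by the complexity analysis.
struct ComplexityRange
{
    std::size_t pos;
    std::size_t length;
    bool isSimple;
};

// Returns true if no complex range overlaps the run [runPos, runPos + runLength).
// Assumes `ranges` is sorted by position and does not overlap.
bool IsRunEntirelySimple(const std::vector<ComplexityRange>& ranges,
                         std::size_t runPos,
                         std::size_t runLength)
{
    const std::size_t runEnd = runPos + runLength;
    for (const auto& range : ranges)
    {
        const std::size_t rangeEnd = range.pos + range.length;
        if (rangeEnd <= runPos)
        {
            continue; // range ends before the run starts
        }
        if (range.pos >= runEnd)
        {
            break; // range starts after the run ends; later ranges do too
        }
        if (!range.isSimple)
        {
            return false; // a complex range overlaps the run
        }
    }
    return true;
}
```

With the numbers from the example above (4 complex characters followed by 2 simple ones), the run covering all 6 characters is not entirely simple, while a run covering only the last 2 would be.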
Yeah but what I'm saying is that we can just call So then you have a [0,4) complex run. [6,8) simple run. [8, 26) simple run. etc. etc. |
Doesn’t that bring more fragmentation into the process? Will it affect the line breaking and script analysis result? I need to dig more into this...
|
To your questions: oh probably. It's worth a try though to see if it just works. Sometimes the simple answer is "good enough". If it turns out to not be, we can refine further from there. Feel free to try/dig! |
This PR aims to optimize the text analysis process by breaking the text into simple and complex runs according to the result of `GetTextComplexity`. For simple runs, we can skip certain processing steps to improve analysis performance.
Before this PR, we relied on the results of `AnalyzeBidi`, `AnalyzeScript` and `AnalyzeNumberSubstitution` both to break the text into runs and to attach the corresponding bidi/script/number_substitution information to each run. Thanks to #6695, we can skip the expensive analysis when the *entire text* is determined to be simple.
Inspired by microsoft/cascadia-code#411 and the discussion in #9156, I found that "entire text simplicity" is often hard to achieve. To make full use of the complexity information, we first need to break the text into simple and complex ranges. These ranges also serve as the initial runs prior to the bidi/script/number_substitution analysis. This way we can skip the text analysis for simple runs and speed up the process.
VALIDATION
Build and run cmatrix, cacafire and cat big.txt with it.
Initial simple run PR: #6695
Closes #9156
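The "skip analysis for simple runs" idea can be illustrated with a small sketch. This is not the PR's actual code: `Run`, `AnalyzeFn` and `AnalyzeComplexRuns` are hypothetical names, and the callback merely stands in for the expensive bidi/script/number-substitution analysis.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical initial run produced by the complexity split.
struct Run
{
    std::size_t pos;
    std::size_t length;
    bool isSimple;
    bool analyzed;
};

// Stand-in for the expensive analysis of one text range (an assumption,
// not a DirectWrite signature).
using AnalyzeFn = std::function<void(std::size_t pos, std::size_t length)>;

// Runs the expensive analysis only on complex runs and returns how many
// characters were actually analyzed.
std::size_t AnalyzeComplexRuns(std::vector<Run>& runs, const AnalyzeFn& analyze)
{
    std::size_t analyzedLength = 0;
    for (auto& run : runs)
    {
        if (run.isSimple)
        {
            continue; // simple runs keep their default attributes
        }
        analyze(run.pos, run.length);
        run.analyzed = true;
        analyzedLength += run.length;
    }
    return analyzedLength;
}
```

In this sketch a line that is mostly simple text pays the analysis cost only for its complex ranges. A real implementation may still need to split complex runs further based on the analysis results.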
AtlasEngine does this! 💖 |
Description of the new feature/enhancement
This is inspired by microsoft/cascadia-code#411: certain ASCII characters sometimes break the simplicity of the entire text, depending on the font being used. The current implementation skips the DWrite analysis only when the entire text is simple:
With `Fira Code`, for example, in most cases the optimization only applies to lines with 120 spaces, which is not good.
Proposed technical implementation details (optional)
`GetTextComplexity` can provide a breakdown report of the text, showing which specific ranges of the text are simple. We should be able to utilize it like this:
See #6695 for the introduction of text complexity analysis.
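The proposed breakdown could look roughly like the loop below. This is a hedged sketch rather than the actual implementation: `ComplexityProbe` is a hypothetical callback standing in for a call to `GetTextComplexity` at a given offset, and `AsciiProbe` is a toy rule (ASCII counts as simple) used only to make the example self-contained. Real results depend on the font.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result of probing the text at one offset.
struct ComplexityResult
{
    bool isSimple;
    std::size_t lengthRead;
};

// Hypothetical stand-in for querying complexity starting at `pos`.
using ComplexityProbe = std::function<ComplexityResult(const std::u16string& text, std::size_t pos)>;

struct ComplexityRange
{
    std::size_t pos;
    std::size_t length;
    bool isSimple;
};

// Walks the text, probing repeatedly, and collects simple/complex ranges.
std::vector<ComplexityRange> SplitByComplexity(const std::u16string& text, const ComplexityProbe& probe)
{
    std::vector<ComplexityRange> ranges;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        const ComplexityResult result = probe(text, pos);
        const std::size_t remaining = text.size() - pos;
        // Always advance by at least one unit and never past the end.
        const std::size_t length = std::min(std::max<std::size_t>(result.lengthRead, 1), remaining);
        if (!ranges.empty() && ranges.back().isSimple == result.isSimple)
        {
            ranges.back().length += length; // merge with the previous range
        }
        else
        {
            ranges.push_back({ pos, length, result.isSimple });
        }
        pos += length;
    }
    return ranges;
}

// Toy probe for illustration only: treats ASCII code units as simple.
ComplexityResult AsciiProbe(const std::u16string& text, std::size_t pos)
{
    const bool isSimple = text[pos] < 0x80;
    std::size_t length = 0;
    while (pos + length < text.size() && (text[pos + length] < 0x80) == isSimple)
    {
        ++length;
    }
    return { isSimple, length };
}
```

The resulting ranges can then serve as the initial runs, with the full analysis applied only to the complex ones.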