Cascadia Code unexpected complexity analysis result #411
Comments
Thanks for the feedback! Do you happen to know how |
From MSDN:
Interestingly I've also seen similar behaviour with Fira Code, where |
Hmm. This is weird. @skyline75489 it doesn't also happen with |
To be specific, with Cascadia and the following string:
The following characters are reported as complex: |
Hmm. I first thought it would be the |
@DHowett, you might be on to something, actually. The To further prove this theory, if you run |
I don’t know that much about fonts. This seems like a feature rather than a bug? I wonder if it’s possible to stop some ASCII characters from being recognized as complex, to improve overall performance.
Get Outlook for iOS<https://aka.ms/o0ukef>
…________________________________
From: Aaron <notifications@github.com>
Sent: Saturday, February 13, 2021 11:12:29 PM
To: microsoft/cascadia-code <cascadia-code@noreply.github.com>
Cc: Chester Liu <skyline75489@outlook.com>; Mention <mention@noreply.github.com>
Subject: Re: [microsoft/cascadia-code] Cascadia Code unexpected complexity analysis result (#411)
@DHowett<https://github.com/DHowett>, you might be on to something, actually. The locl table contains a variety of glyphs, but among the base Latin set, the only ones mentioned are j, J, l, and L. I would expect, then, that í, Ć and Д would be seen as complex too.
To further prove this theory, if you run GetTextComplexity with Hack on Şş, do these register as complex? Hack has a much more limited OpenType feature set, and their locl table only includes those two glyphs.
|
This is spam. Please fix the signature in your phone email app. |
@kenmcd at least Chester’s message had meaningful content in addition to his signature, unlike yours which mailed the entire subscriber list for... this. 😄 A signature is not spam. A comment that does not contribute to the discussion is also not spam, but it's much closer to spam than an e-mail signature is. |
That "signature" is put there by the app developer to promote the app. If there is another easier/better way to report spam on GitHub, please advise. |
Hi @kenmcd, sorry if my email response upset you with its automatic signature. I’ve been around GitHub for about 8 years and I’ve only recently started using email responses because of a special security policy that only applies to Microsoft employees (Dustin knows what I mean). Overall it’s because of my laziness. I’ve seen a lot of email-reply comments that come with various signatures. I don’t find them spam-like unless the comments themselves have no meaningful content, or the promotion is about a car or something. You do realize that some of us around the repo work for Microsoft, right? I feel somewhat eligible to promote the products of our own company. And it’s an app that sends emails. Surely someone would need an app like this. Anyway, this is way off topic. I hope this discussion about signatures can just end here. |
@DHowett if the complex result is unavoidable, I think we can still optimize the analysis process with the partial-simple text. I’ll open an issue about this in the Terminal repo later. |
I thought I'd wait until next week when y'all aren't on a long weekend, but let's talk :) FYI @skyline75489, the purpose of the My understanding is that the way things work is that when the string is analyzed, if any letter found to be in the The two letters in question, Potentially we can remove those substitutions which I think will improve your speed tests (and I'm thinking about making an update). But removing them won't fully achieve parity with a font like Hack that has fewer To actually solve the issue it would be better to improve the analysis process. For example, when Of course, all this is predicated on that Anyway, the most important thing is to understand what drives the logic of |
@aaronbell thank you for the detailed explanation. I’ll also try to find a way to improve the performance on the terminal side. By the way, I may be wrong about the exact character that breaks the simplicity in Fira Code since I don’t use Fira Code that much, but I’m positive that there are a lot of characters that belong to this category. |
This PR aims to optimize the text analysis process by breaking the text into simple and complex runs according to the result of `GetTextComplexity`. For simple runs, we can skip certain processing steps to improve analysis performance.

Before this PR, we relied on the results of `AnalyzeBidi`, `AnalyzeScript` and `AnalyzeNumberSubstitution` both to break the text into runs and to attach the corresponding bidi/script/number_substitution information to each run. Thanks to #6695, we can skip the expensive analysis when the *entire text* is determined to be simple.

Inspired by microsoft/cascadia-code#411 and the discussion in #9156, I found that "entire text simplicity" is often hard to achieve. To make full use of the text's complexity information, we first need to break the text into simple and complex ranges. These ranges also serve as the initial runs before the bidi/script/number_substitution analysis. This way we can skip text analysis for simple runs and speed up the process.

VALIDATION
Build and run cmatrix, cacafire, and cat big.txt with it.

Initial simple run PR: #6695
Closes #9156
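The splitting step described in this PR can be sketched roughly as follows. This is not the Terminal code: `ComplexityProbe`, `ComplexityResult`, `Run`, `SplitBySimplicity` and `ToyProbe` are hypothetical names made up for illustration. The probe stands in for a single `GetTextComplexity` call, on the assumption that each call reports whether the leading range is simple and how many characters that verdict covers. The toy probe simply pretends that `J`, `j`, `L` and `l` are complex, modeled on the `locl` observation earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the result of one GetTextComplexity call:
// whether the leading range is simple, and how many characters it covers.
struct ComplexityResult {
    bool isSimple;
    std::size_t lengthRead;
};

// Hypothetical probe type; in real code this would wrap the DirectWrite analyzer.
using ComplexityProbe = std::function<ComplexityResult(std::wstring_view)>;

// A contiguous range of text that is uniformly simple or uniformly complex.
struct Run {
    std::size_t start;
    std::size_t length;
    bool isSimple;
};

// Walk the text, asking the probe about the remaining suffix each time,
// and merge adjacent ranges that share the same simplicity.
std::vector<Run> SplitBySimplicity(std::wstring_view text, const ComplexityProbe& probe)
{
    std::vector<Run> runs;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const ComplexityResult r = probe(text.substr(pos));
        // Always advance by at least one character, never past the end.
        const std::size_t len =
            std::min(std::max<std::size_t>(r.lengthRead, 1), text.size() - pos);
        if (!runs.empty() && runs.back().isSimple == r.isSimple) {
            runs.back().length += len;
        } else {
            runs.push_back(Run{pos, len, r.isSimple});
        }
        pos += len;
    }
    return runs;
}

// Toy probe (an assumption for illustration only): treats J, j, L, l as
// complex and every other character as simple.
ComplexityResult ToyProbe(std::wstring_view text)
{
    assert(!text.empty());
    const auto isComplex = [](wchar_t c) {
        return c == L'J' || c == L'j' || c == L'L' || c == L'l';
    };
    const bool firstComplex = isComplex(text[0]);
    std::size_t n = 1;
    while (n < text.size() && isComplex(text[n]) == firstComplex) {
        ++n;
    }
    return ComplexityResult{!firstComplex, n};
}
```

With the toy probe, `SplitBySimplicity(L"abJcd", ToyProbe)` yields three runs: a simple run covering `ab`, a complex run covering `J`, and a simple run covering `cd`. Only the middle run would then need the full bidi/script/number-substitution analysis.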
Environment
Steps to reproduce
Run cmatrix in Windows Terminal with no font specified (which falls back to Cascadia Mono, I think) and with Hack.
Expected behavior
The performance should be about the same.
Actual behavior
This is the CPU usage breakdown:
Cascadia:
Hack:
Significantly more CPU is consumed with Cascadia.
The reason is that `analyzer->GetTextComplexity()` gives an unexpected result for pure ASCII strings when using Cascadia.
For example, take a random string produced by `cmatrix`:
With Cascadia, `GetTextComplexity` reports this result:
However, with Hack the entire text is reported as simple, so we can optimize the layout process.
I've seen a dozen examples. The letter `J` seems to be the cause. I have no idea why the letter `J` is so special.