-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Initial implementation of fine-grained text analysis (#9202)
This PR aims to optimize the text analysis process by breaking the text into simple & complex runs according to the result of `GetTextComplexity`. For simple runs, we can skip certain processing steps to improve the analysis performance. Previous to this PR, we rely on the result of `AnalyzeBidi`, `AnalyzeScript` and `AnalyzeNumberSubstitution` to both break the text into different runs and attach the corresponding bidi/script/number_substitution information to the run. Thanks to #6695 we have the chance to skip the expensive analysis process when we found the *entire text* is determined to be simple. Inspired by microsoft/cascadia-code#411 and discussions in #9156, I found that the "entire text simplicity" is often hard to meet. In order to fully utilize the complexity information of the text, we need to first break the text into simple & complex ranges. These ranges are also the initial runs prior to the bidi/script/number_substitution analysis. This way we can skip the text analysis for simple runs to speed up the process. VALIDATION Build & run cmatrix, cacafire, cat big.txt with it. Initial simple run PR: #6695 Closes #9156
- Loading branch information
1 parent
8f93f76
commit 1c414a7
Showing
2 changed files
with
53 additions
and
42 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters