Use uniscribe to calculate character offsets where allowed #10550
Link to issue number:
Summary of the issue:
When moving by character in an NVDA virtualBuffer, each and every Unicode code point is treated as its own character, even if it is visually combined with another code point to create one composite character. Examples are:
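One illustrative case (my example, not necessarily from the author's list): the letter "é" can be written as a base letter plus a combining acute accent, i.e. two code points that render as a single composite character.

```python
# "é" written as base letter "e" + combining acute accent (U+0301):
# two Unicode code points, one visually composite character.
s = "e\u0301"
print(len(s))         # 2 -- moving by character would stop twice here
print(s == "\u00e9")  # False -- not the same string as precomposed "é",
                      # even though both render identically
```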
Description of how this pull request fixes the issue:
Just like how we use the Windows uniscribe library to calculate word offsets in some places, use it to also calculate character offsets.
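The idea can be sketched roughly as follows (a minimal illustration; `char_stops` and `expand_to_character` are hypothetical names, not NVDA's actual API): uniscribe's `ScriptBreak` fills a `SCRIPT_LOGATTR` array whose `fCharStop` flag marks offsets where a new character cluster may begin, and expanding a raw offset to its cluster boundaries is then two small scans.

```python
def expand_to_character(char_stops, offset):
    """Expand `offset` to the boundaries of its character cluster.

    `char_stops[i]` is True when a new character (cluster) may start at
    code-unit index i -- analogous to uniscribe's SCRIPT_LOGATTR.fCharStop.
    Hypothetical helper for illustration only.
    """
    # Scan backwards to the start of the cluster containing `offset`.
    start = offset
    while start > 0 and not char_stops[start]:
        start -= 1
    # Scan forwards to the start of the next cluster (exclusive end).
    end = offset + 1
    while end < len(char_stops) and not char_stops[end]:
        end += 1
    return start, end

# "e" + combining acute accent (2 code units): a cluster may only start
# at index 0, so any offset inside it expands to the whole pair.
print(expand_to_character([True, False], 1))  # (0, 2)
```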
Known issues with pull request:
Although NVDA now matches the behaviour of Notepad and other standard edit controls, which treat acutes, variation selectors and some other modifiers as part of the previous symbol, complex emoji sequences that join multiple Unicode characters with a tie (ZERO WIDTH JOINER, U+200D) are still not treated as one single composite character. However, if we did treat them as one, NVDA would differ from the behaviour of Windows' own standard edit controls.
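To illustrate the remaining limitation (my example, not one given in the PR): a family emoji built from several emoji joined with U+200D renders as one glyph but is still stepped through piecewise.

```python
# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL -- renders as one glyph,
# but is five code points joined by U+200D (ZERO WIDTH JOINER).
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
print(len(family))             # 5 code points
print(family.count("\u200d"))  # 2 joiners
# Splitting on the joiner yields the individual emoji that are still
# reported separately when moving by character:
print(family.split("\u200d"))  # ['👨', '👩', '👧']
```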
Change log entry:
Some quick benchmarking, with the World War I Wikipedia article loaded in Firefox and the review cursor at the top of the document in browse mode:

```
import time
r=review.copy()
t=time.time()
r.move(textInfos.UNIT_CHARACTER,4000)
time.time()-t
```

Runs take between 0.9 and 1.5 seconds both with and without this change. In other words, both the new and the old code are affected quite significantly by other things in the environment (which is not surprising, as this is a loop that runs 4000 times), and the added usage of uniscribe does not seem to slow things down as far as I can tell. It is also worth noting that _getCharacterOffsets always fetched the text for the current line; the only difference is the actual uniscribe call. I accept that there is usage in the wild, such as the placeMarkers add-on, that calls move with a large number. However, with any other text API (UIA, other object models etc.) this call would probably be much, much worse. Still, if we do notice a performance decrease in real usage, we should of course take this into consideration.
I did a second test once my machine had stopped installing a Windows update in the background :p This time I compared runs with and without the change, each moving 20000 characters. Without the change: 5.2 seconds. With the change: 5.7 seconds. That is an increase of about 1.1 times, so yes, if the move is very large (like 20000 characters) the difference is noticeable.
@leonardder I believe I have addressed all your review actions. When abstracting _calculateUniscribeOffsets in textUtils.cpp, I still copied the two basic for loops that walk the offsets; switching dynamically between fWordStop and fCharStop based on the unit would otherwise have made the code very hard to read.