Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use uniscribe to calculate character offsets where allowed #10550

Merged
merged 6 commits into from Nov 28, 2019

Conversation

@michaelDCurran
Copy link
Contributor

michaelDCurran commented Nov 26, 2019

Link to issue number:

None.

Summary of the issue:

When moving by character in an NVDA virtualBuffer, each and every unicode code point is treated as its own character, even if is is visually combined with another code point to create one composit character. Examples are:

  • é: e with an acute
  • ❤️: red heart with a variation selector
  • 8⃣: number 8 inside a keycap
    Similarly, when moving by character in Notepad with the arrow keys, NVDA only reads the first code point in composit characters.

Description of how this pull request fixes the issue:

Just like how we use the Windows uniscribe library to calculate word offsets in some places, use it to also calculate character offsets.
This involved:

  • Abstracted code from nvdaHelperLocal's calculateWordOffsets into _calculateUniscribeOffsets which takes a unit argument of either character or word. For word the code is as it was. For character, it widens the offsets until hitting a code point marked with fCharStop. calculateCharacterOffsets has been added which just calls _calculateUniscribeOffsets with a unit of character.
  • Abstract out all the uniscribe code from OffsetsTextInfo._getWordOffsets into a new _getUniscribeOffsets method, and use this also in _getCharacterOffsets.

Testing performed:

  • Using the example characters above in a virtualBuffer, moved throw them with the arrow keys making sure that they are all treated as single composit characters.
  • Using the example characters above in a notepad document, move through them with left and right arrow and make sure that NVDA announces the entire composit character.

Known issues with pull request:

Although NVDA now matches the behaviour of notepad and other standard edit controls, which includes treating acutes, variation selectors and some other modifiers as being a part of the previous symbol, complex tied emojies that use multiple unicode characters connected with a tie u+200d symbol, are still not treated as one single composit character. But if we did, this would differ from Windows' own standard edit control behaviour.

Change log entry:

Bug fixes:

  • NVDA will now treat certain composit unicode characters such as e-acute as one single character when moving through text.
… calculate the bounds for a character. This allows us to treat something like e-acute as one character.
@michaelDCurran michaelDCurran requested a review from leonardder Nov 26, 2019
…cterOffsets, allowing unit tests to pass again.
@leonardder

This comment has been minimized.

Copy link
Collaborator

leonardder commented Nov 27, 2019

@michaelDCurran

This comment has been minimized.

Copy link
Contributor Author

michaelDCurran commented Nov 27, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

leonardder commented Nov 27, 2019

@michaelDCurran

This comment has been minimized.

Copy link
Contributor Author

michaelDCurran commented Nov 27, 2019

nvdaHelper/local/textUtils.cpp Outdated Show resolved Hide resolved
source/textInfos/offsets.py Outdated Show resolved Hide resolved
source/textInfos/offsets.py Show resolved Hide resolved
source/textInfos/offsets.py Outdated Show resolved Hide resolved
source/textInfos/offsets.py Outdated Show resolved Hide resolved
source/textInfos/offsets.py Show resolved Hide resolved
… code in both calculateWordOffsets and calculateCharacterOffsets.
@michaelDCurran

This comment has been minimized.

Copy link
Contributor Author

michaelDCurran commented Nov 27, 2019

@leonardder I have addressed all your review actions I believe. When abstracting _calculateUniscribeOffsets in textUtils.cpp, I still copied the two basic for loops that walk the offsets, otherwise it would have become very complex to read with fWordStop changed to fCharStop dynamically changed based on the unit somehow.
Also, that comment about uniscribe being broken without the two extra chars added: I'm not sure if this is still true, but as there are many Operating System variations to test on, I'd prefer not to address that at the moment. Moving by word logic should not be changed, just code moved a bit.

Copy link
Collaborator

leonardder left a comment

Thanks, looks awesome now!

@michaelDCurran michaelDCurran merged commit 1045d2d into master Nov 28, 2019
1 check passed
1 check passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@nvaccessAuto nvaccessAuto added this to the 2019.3 milestone Nov 28, 2019
michaelDCurran added a commit that referenced this pull request Nov 28, 2019
@leonardder

This comment has been minimized.

Copy link
Collaborator

leonardder commented Dec 3, 2019

I'm afraid that this pull request introduces off by n errors in braille.TextInfoRegion.getTextInfoForBraillePos. However, it is pretty difficult to avoid this. We should somehow be able to decouple braille positions from characters.

@michaelDCurran

This comment has been minimized.

Copy link
Contributor Author

michaelDCurran commented Dec 3, 2019

@leonardder

This comment has been minimized.

Copy link
Collaborator

leonardder commented Dec 4, 2019

No, no need to revert this. At least for uniscribe, we can disable it at the TextInfo object level before moving by character.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.