Emojis Do Not Speak when Arrowing by Character #8782
Steps to reproduce:
Note: These steps apply when running NVDA Master 16003 with pull request #8758 included, which adds reading of Unicode emoji characters.
NVDA does not speak anything when landing on an emoji character, which is the same behavior as before Master 16003.
The emoji character should be spoken.
NVDA Installed/portable/running from source:
NVDA Master 16003,
Name and version of other software in use when reproducing the issue:
Other information about your system:
Does the issue still occur after restarting your PC?
Have you tried any other versions of NVDA?
n/a, based off recent PR
My comment regarding Python 3 was related to handling of repeated emoji; i.e.
As for the issue at hand, it's complicated. :) These emoji are Unicode code points outside the Basic Multilingual Plane, so each one consumes two UTF-16 code units (a surrogate pair). How this gets handled depends on the underlying text implementation. For example, if you try this in WordPad, it does work because we ask ITextDocument for its idea of a character and it does account for UTF-16 encoding. It doesn't work in Notepad because Notepad is a standard Edit control and we use our OffsetsTextInfo implementation for that.
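To make the "two UTF-16 code units" point concrete, here is a small Python 3 sketch. The helper name `utf16_units` is made up for illustration and is not part of NVDA; it only shows how a single emoji code point splits into a surrogate pair.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units that encode ``text``."""
    data = text.encode("utf-16-le")
    # Every UTF-16 code unit is two bytes; little-endian avoids a BOM.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


grinning_face = "\U0001F600"  # a single code point outside the BMP
units = utf16_units(grinning_face)
# One code point, but two code units: a high surrogate then a low surrogate.
print(len(grinning_face), [hex(u) for u in units])
```

An offset-based implementation that moves one code unit at a time lands in the middle of such a pair, which is why nothing sensible gets spoken.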
I wrote a fix for this in OffsetsTextInfo 2 years ago. It's in the offsetsUnicodeBeyond16 branch in my fork. I didn't ship it because I have concerns it might affect performance (since it has to fetch text when calculating characters) and I never found the time to test it extensively. In practical terms, it should be fine - it's only fetching one character and that should be fairly fast - but it should be tested with various controls to be sure. This should fix Edit controls like Notepad, as well as NVDA virtual buffers.
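For readers who want the general idea without digging through the branch, this is a minimal sketch of surrogate-aware character offsets. It is not the code from offsetsUnicodeBeyond16; the function name `character_offsets` and the choice to pass the text as a list of UTF-16 code units are assumptions made for the example.

```python
def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def character_offsets(units: list[int], offset: int) -> tuple[int, int]:
    """Return (start, end) of the character containing ``offset``.

    ``units`` is a sequence of UTF-16 code units. A valid surrogate pair is
    treated as one character; anything else is one code unit wide.
    """
    if not 0 <= offset < len(units):
        raise IndexError("offset out of range")
    unit = units[offset]
    # Offset is on the first half of a pair: extend forward by one unit.
    if is_high_surrogate(unit) and offset + 1 < len(units) and is_low_surrogate(units[offset + 1]):
        return offset, offset + 2
    # Offset is on the second half of a pair: extend backward by one unit.
    if is_low_surrogate(unit) and offset > 0 and is_high_surrogate(units[offset - 1]):
        return offset - 1, offset + 1
    return offset, offset + 1


# "a", U+1F600 as a surrogate pair, "b"
units = [0x61, 0xD83D, 0xDE00, 0x62]
print(character_offsets(units, 1), character_offsets(units, 2))
```

Only the units immediately next to the offset are inspected, which matches the expectation above that the extra text fetch stays small.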
Note that there is a further complication, which is that some Emoji are actually multiple Unicode code points; e.g.
My branch does not fix this second issue for OffsetsTextInfo. This could be fixed for OffsetsTextInfo by retrieving several characters before and after (ideally in one call) and then looking for zero width joiner and Variation Selector-16 code points to determine the boundaries. Alternatively (and perhaps better), we should be able to use Uniscribe for this, just as we do for word offsets. It looks like the SCRIPT_LOGATTR data returned by ScriptBreak has an fCharStop attribute as well as the fWordStop attribute we already use for words. See
I don't think we can fix this for ITextDocument or any other non-offset implementation.
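The zero width joiner / Variation Selector-16 idea for OffsetsTextInfo could look roughly like the sketch below. It works on Python 3 code points rather than UTF-16 offsets, the name `cluster_bounds` is hypothetical, and the rule is deliberately simplified: it ignores skin tone modifiers, flags, and the rest of the Unicode grapheme cluster rules, so treat it as an illustration rather than a full solution.

```python
ZWJ = "\u200d"   # zero width joiner
VS16 = "\ufe0f"  # Variation Selector-16


def _joined(text: str, i: int) -> bool:
    """True if there is no character boundary between text[i - 1] and text[i]."""
    return text[i] in (ZWJ, VS16) or text[i - 1] == ZWJ


def cluster_bounds(text: str, index: int) -> tuple[int, int]:
    """Return (start, end) of the emoji sequence containing ``index``."""
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    start = index
    # Walk backward while the current code point is glued to the previous one.
    while start > 0 and _joined(text, start):
        start -= 1
    end = index + 1
    # Walk forward while the next code point is glued to the one before it.
    while end < len(text) and _joined(text, end):
        end += 1
    return start, end


# "a", person facepalming, ZWJ, female sign, VS-16, "b"
text = "a\U0001F926\u200d\u2640\ufe0fb"
print(cluster_bounds(text, 1), cluster_bounds(text, 3))
```

In practice the surrounding text would need to be fetched in one call, as noted above, and Uniscribe's fCharStop data may well be the more robust source of these boundaries.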
Note that we're already trying to report spelling errors in previous words, and I've never seen this cause a major performance hit for OffsetsTextInfo, except for some Firefox cases. Is it safe to assume that the change in your branch has the same impact?
This also applies when I force UIA in WordPad, which probably bridges to ITextDocument anyway. It is definitely an issue in ITextDocument.
Note that arrowing in Notepad and Firefox detects 🤦 and ♀️ as two separate entities. Therefore, I think I'll stick with your implementation for OffsetsTextInfo for now, as that fixes a major issue, at least within this area.