Emojis Do Not Speak when Arrowing by Character #8782

Closed
jage9 opened this Issue Sep 25, 2018 · 4 comments

Contributor

jage9 commented Sep 25, 2018

Steps to reproduce:

Note: These steps apply when running NVDA Master 16003 with pull request #8758 included, which reads Unicode emoji characters.

  1. Open Notepad
  2. Insert an emoji such as 🌮 (the taco symbol).
  3. Arrow over the emoji by character.
    You can also arrow over the taco emoji above to observe the same results. Using the up and down arrow keys reads it correctly.
    I'm not sure if some emojis are represented by two characters, especially in browse mode.

Actual behavior:

NVDA does not speak anything when landing on an emoji character, which is the same behavior as before Master 16003.

Expected behavior:

The emoji character should be spoken.

System configuration:

NVDA Installed/portable/running from source:

installed

NVDA version:

NVDA Master 16003

Windows version:

1803

Name and version of other software in use when reproducing the issue:

n/a

Other information about your system:

Eloquence speech.

Other questions:

Does the issue still occur after restarting your PC?

yes

Have you tried any other versions of NVDA?

n/a, based off recent PR

Collaborator

leonardder commented Sep 25, 2018

@jcsteh: I recall you said something about Python 3 making it easier to fix this, but I'm not yet sure why or how.
Interestingly, moving by word in notepad reads the emojis just fine.

Contributor

jcsteh commented Sep 26, 2018

My comment regarding Python 3 was related to handling of repeated emoji; i.e. 🌮🌮🌮🌮 being reported as 4 taco. That is a valid issue, but separate from this one.

As for the issue at hand, it's complicated. :) These emoji are Unicode code points above U+FFFF (outside the Basic Multilingual Plane), so each one takes two UTF-16 code units (a surrogate pair). How this gets handled depends on the underlying text implementation. For example, if you try this in Wordpad, it does work because we ask ITextDocument for its idea of a character, and it does account for UTF-16 encoding. It doesn't work in Notepad because Notepad is a standard Edit control and we use our OffsetsTextInfo implementation for that.
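
To make the "two UTF-16 code units" point concrete, here is a small Python 3 sketch. It is an illustration only, not NVDA code, and the helper name `utf16_code_units` is made up for this example. It shows that the taco emoji is one code point but two code units, which is what an offset-based text implementation counting in UTF-16 sees.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    # Every UTF-16 code unit is exactly two bytes.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


taco = "\U0001F32E"  # the taco emoji
units = utf16_code_units(taco)

# One code point in Python 3, but a high and a low surrogate in UTF-16.
print(len(taco), [hex(u) for u in units])
```

An implementation that moves one code unit per arrow press therefore lands on half of a surrogate pair, which has no meaning on its own.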

I wrote a fix for this in OffsetsTextInfo 2 years ago. It's in the offsetsUnicodeBeyond16 branch in my fork. I didn't ship it because I have concerns it might affect performance (since it has to fetch text when calculating characters) and I never found the time to test it extensively. In practical terms, it should be fine - it's only fetching one character and that should be fairly fast - but it should be tested with various controls to be sure. This should fix Edit controls like Notepad, as well as NVDA virtual buffers.
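
As a rough idea of what such a fix involves, the sketch below expands a single code unit offset to cover a whole surrogate pair. This is not the code from the offsetsUnicodeBeyond16 branch; the function names and the list-of-ints representation are assumptions made for illustration. It does show why the text around the offset has to be fetched before the character boundaries can be computed.

```python
def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def character_offsets(units, offset):
    """Return (start, end) in code units for the character at offset.

    units is a sequence of UTF-16 code units (ints). Hypothetical helper,
    not an NVDA API.
    """
    start, end = offset, offset + 1
    unit = units[offset]
    if is_high_surrogate(unit) and end < len(units) and is_low_surrogate(units[end]):
        # Offset is on the first half of a pair: extend forward.
        end += 1
    elif is_low_surrogate(unit) and offset > 0 and is_high_surrogate(units[offset - 1]):
        # Offset is on the second half of a pair: extend backward.
        start -= 1
    return start, end


# "a", taco (two code units), "b"
units = [0x61, 0xD83C, 0xDF2E, 0x62]
print([character_offsets(units, i) for i in range(len(units))])
```

Only the code unit at the offset and its immediate neighbours are inspected, which matches the expectation above that the extra text fetch should be cheap.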

Note that there is a further complication, which is that some Emoji are actually multiple Unicode code points; e.g. 🤦‍♂️ consists of the characters person facepalming, zero width joiner, male sign, Variation Selector-16. Even Wordpad doesn't seem to get this right. Right arrow skips over the whole combined character, but ITextDocument only returns the first character (🤦).
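
The composition of such a sequence can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only, and `describe` is a name invented for this example. The formal Unicode character names it prints can differ from the spoken emoji names used above.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each code point in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


# Man facepalming: base emoji + zero width joiner + male sign + Variation Selector-16
facepalm = "\U0001F926\u200D\u2642\uFE0F"

for code_point, name in describe(facepalm):
    print(code_point, name)
```

Four code points come back for what the user perceives as a single character, and the first one alone already needs a surrogate pair in UTF-16.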

My branch does not fix this second issue for OffsetsTextInfo. This could be fixed for OffsetsTextInfo by retrieving several characters before and after (ideally in one call) and then looking for zero width joiner and Variation Selector-16 code points to determine the boundaries. Alternatively (and perhaps better), we should be able to use Uniscribe for this, just as we do for word offsets. It looks like the SCRIPT_LOGATTR data returned by ScriptBreak has an fCharStop attribute as well as the fWordStop attribute we already use for words. See _getWordOffsets in source/textInfos/offsets.py and calculateWordOffsets in nvdaHelper/local/textUtils.cpp. Note that the word calculation code takes a whole line of text. I'm hoping we don't have to use a whole line for characters (maybe just, say, 6 chars before and 6 after?), as calculating a line might be expensive in some implementations and moving by character needs to be really snappy.
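
A minimal sketch of the first idea (scanning for zero width joiner and Variation Selector-16 around the caret) might look like the following. It is only an approximation under stated assumptions: it works on Python 3 code points rather than UTF-16 offsets, `cluster_offsets` is a hypothetical name, and it ignores other joining mechanisms such as skin tone modifiers, flag sequences and combining marks. The Uniscribe approach would presumably handle those cases.

```python
ZWJ = "\u200D"   # zero width joiner
VS16 = "\uFE0F"  # Variation Selector-16


def cluster_offsets(text, offset):
    """Return (start, end) of the joined sequence containing text[offset].

    A code point is treated as glued to its predecessor when it is a ZWJ
    or VS16 itself, or when the predecessor is a ZWJ. Hypothetical helper.
    """
    start = offset
    while start > 0 and (text[start] in (ZWJ, VS16) or text[start - 1] == ZWJ):
        start -= 1
    end = offset + 1
    while end < len(text) and (text[end] in (ZWJ, VS16) or text[end - 1] == ZWJ):
        end += 1
    return start, end


sample = "a\U0001F926\u200D\u2642\uFE0Fb"  # "a", man facepalming, "b"
print([cluster_offsets(sample, i) for i in range(len(sample))])
```

Because the loops stop at the first code point that is not joined, only a handful of neighbouring characters are examined, in line with the hope above that a small window rather than a whole line is enough.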

I don't think we can fix this for ITextDocument or any other non-offset implementation.

Collaborator

leonardder commented Sep 26, 2018

> I wrote a fix for this in OffsetsTextInfo 2 years ago. It's in the offsetsUnicodeBeyond16 branch in my fork. I didn't ship it because I have concerns it might affect performance (since it has to fetch text when calculating characters) and I never found the time to test it extensively. In practical terms, it should be fine - it's only fetching one character and that should be fairly fast - but it should be tested with various controls to be sure. This should fix Edit controls like Notepad, as well as NVDA virtual buffers.

Note that we're already trying to report spelling errors in previous words, and I've never seen this cause a major performance hit for OffsetsTextInfo, except for some Firefox cases. Is it safe to assume that the change in your branch has the same impact?

> Even Wordpad doesn't seem to get this right. Right arrow skips over the whole combined character, but ITextDocument only returns the first character (🤦).

This also applies when I force UIA in Wordpad, which probably bridges to ITextDocument anyway. It is definitely an issue in ITextDocument.

> My branch does not fix this second issue for OffsetsTextInfo.

Note that arrowing in Notepad and Firefox detects 🤦 and ♂️ as two separate entities. Therefore, I think I'll stick with your implementation for OffsetsTextInfo for now, as that fixes a major issue, that is, major within this area.

Contributor Author

jage9 commented Dec 6, 2018

Thanks for the fix. When arrowing in FF, the first character reads the emoji while the second reads right paren. I assume this is expected behavior based on the comments in the PR?
