
nvda can read emojies #6523

Closed
mrdin8877 opened this issue Oct 30, 2016 · 23 comments · Fixed by #8758
Labels
p4 https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority

Comments

@mrdin8877

Hi, and sorry for asking this. I know that if you install a specific add-on, NVDA can read emoticons, but those are the text kind, for example :) :( :@ etc. What I'm asking about are the symbols that represent emojis. Sorry if my information is not enough. Thanks.

@feerrenrut feerrenrut added the p4 https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority label Nov 1, 2016
@feerrenrut
Contributor

P3. This is becoming more important as more and more sites and apps are using emojis rather than emoticons. There is some work to implement this, and some thought will be required on how to deal with translations.

This may also be a good candidate for an add-on. Could the existing one be extended?

@kaveinthran

It would also be nice if we had emoji, symbol and emoticon selectors.


@nvdaes
Sponsor Contributor

nvdaes commented Nov 1, 2016

Hi. Regarding the question about the Emoticons add-on, whose main author is Chris Leo: it is being extended to speak emojis too. Work is being done to support inserting emojis as well as emoticons.
Anyway, the add-on works with speech dictionaries (braille is supported in symbols insertion only).
For the add-on see this branch:
https://github.com/nvdaaddons/emoticons/tree/emojis

@LeonarddeR
Collaborator

The translation part will probably be a bit difficult. I wonder whether Apple, for example, does all the emoji translations internally. I don't think we can expect our translation teams to translate all the emojis.

@PratikP1

PratikP1 commented Feb 2, 2018

To what extent should speaking emoji characters be left to synthesizers? It might be worth testing OneCore, its language packs in different languages, and the Windows 10 emoji panel once the emoji panel becomes available in other languages. People using synthesizers in languages other than English can also test using this emoji catalog. For development purposes, the iamcal/emoji-data repository on GitHub looks useful.

@jcsteh
Contributor

jcsteh commented Mar 28, 2018

I looked into this a bit. The Unicode CLDR (Common Locale Data Repository) includes TTS descriptions for emoji in a whole bunch of languages. You can find them in the common/annotations and common/annotationsDerived directories.

Some notes/thoughts:

  1. The data is in XML. I imagine we'd process that data into our own symbol dictionary format for each language and allow NVDA to load this as an additional symbols dictionary.
  2. The TTS descriptions are an annotation with type "tts". For example:

    <annotation cp="😂" type="tts">face with tears of joy</annotation>

  3. We'd need to include the data from both annotations and annotationsDerived. The derived annotations include emoji combined with skin tone modifiers, country flags (with info derived from the Unicode territory name data), etc.
  4. Derived annotations include punctuation which we'd probably want to strip; e.g.

    <annotation cp="👶🏻" type="tts">baby: light skin tone</annotation>

  5. We could also derive annotations ourselves at runtime using code. While it would decrease dictionary size (since we wouldn't have to include multiple instances of each modified emoji), it'd be a fair bit of work and we'd have to include separate country data, etc.
  6. For English, we need to include en_001 as well as en. en is for US English. Some other English locales derive directly from en_001 instead of en, but it seems a lot of stuff is still only in en, and NVDA doesn't have anything other than "en" anyway.
  7. These dictionaries are going to be several hundred KB each. It's possibly worth it - emoji are used a lot these days - but we'll have to keep an eye on performance and memory usage at runtime.
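A minimal sketch of points 1, 2 and 4 in Python (NVDA's own language): read a CLDR annotations XML document, keep only the type="tts" annotations, strip the colons and commas that derived annotations contain, and emit tab-separated dictionary lines. The function names are hypothetical, and the output layout (a `symbols:` header, then identifier, replacement and level separated by tabs, with `\#` escaping so a leading `#` isn't read as a comment) is an assumption about NVDA's symbol file format, not a confirmed spec.

```python
import re
import xml.etree.ElementTree as ET


def clean_description(text: str) -> str:
    # Derived annotations such as "baby: light skin tone" contain punctuation
    # we probably don't want spoken; turn colons/commas into spaces.
    text = re.sub(r"[:,]", " ", text)
    return " ".join(text.split())


def parse_tts_annotations(xml_text: str) -> dict[str, str]:
    """Map each emoji (the "cp" attribute) to its type="tts" description."""
    root = ET.fromstring(xml_text)
    result: dict[str, str] = {}
    for node in root.iter("annotation"):
        # Keyword annotations (no type attribute) are skipped.
        if node.get("type") == "tts" and node.text:
            result[node.get("cp")] = clean_description(node.text)
    return result


def escape_identifier(identifier: str) -> str:
    # Assumed escaping: backslash and "#" get a backslash prefix, so that
    # e.g. the keycap "#" emoji isn't mistaken for a comment line.
    return identifier.replace("\\", "\\\\").replace("#", "\\#")


def to_symbol_lines(annotations: dict[str, str], level: str = "none") -> list[str]:
    """Render annotations as lines of an assumed tab-separated symbols section."""
    lines = ["symbols:"]
    for identifier, description in annotations.items():
        lines.append(f"{escape_identifier(identifier)}\t{description}\t{level}")
    return lines


# Trimmed-down sample in the shape of a CLDR annotations file.
SAMPLE = """<ldml><annotations>
<annotation cp="😂">face | joy | laugh | tear</annotation>
<annotation cp="😂" type="tts">face with tears of joy</annotation>
<annotation cp="👶🏻" type="tts">baby: light skin tone</annotation>
</annotations></ldml>"""

if __name__ == "__main__":
    print("\n".join(to_symbol_lines(parse_tts_annotations(SAMPLE))))
```

Running the same functions over every file in both annotations and annotationsDerived (point 3) and merging the results would give one dictionary per language.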

@LeonarddeR
Collaborator

@jcsteh commented on 28 Mar 2018, 04:05 CEST:

  1. The data is in XML. I imagine we'd process that data into our own symbol dictionary format for each language and allow NVDA to load this as an additional symbols dictionary.

This makes sense.

  5. We could also derive annotations ourselves at runtime using code. While it would decrease dictionary size (since we wouldn't have to include multiple instances of each modified emoji), it'd be a fair bit of work and we'd have to include separate country data, etc.

This sounds suitable for a version 2 of the implementation.

  6. For English, we need to include en_001 as well as en. en is for US English. Some other English locales derive directly from en_001 instead of en, but it seems a lot of stuff is still only in en, and NVDA doesn't have anything other than "en" anyway.

It's not entirely clear to me what the differences are. Could you give an example of what you observed?

@jcsteh
Contributor

jcsteh commented Jul 31, 2018

@leonardder commented on Aug 1, 2018, 12:26 AM GMT+10:

  5. We could also derive annotations ourselves at runtime using code. While it would decrease dictionary size (since we wouldn't have to include multiple instances of each modified emoji), it'd be a fair bit of work and we'd have to include separate country data, etc.

This sounds suitable for a version 2 of the implementation.

Note that if the database doesn't cover some of the languages we need, our translators may want to translate them. At that point, we won't want to go for a "version 2" because we'll break all of their work.

  6. For English, we need to include en_001 as well as en. en is for US English. Some other English locales derive directly from en_001 instead of en, but it seems a lot of stuff is still only in en, and NVDA doesn't have anything other than "en" anyway.

It's not entirely clear to me what the differences are. Could you give an example of what you observed?

As an example, the 😂 only appears in en (which is US). en_001 (which is the base for English) doesn't include it, nor does en_GB (which inherits directly from en_001). That means that en_GB doesn't include 😂 at all. See English Inheritance for details about inheritance for specific English locales.

@LeonarddeR
Collaborator

@jcsteh commented on 1 Aug 2018, 00:58 CEST:

Note that if the database doesn't cover some of the languages we need, our translators may want to translate them. At that point, we won't want to go for a "version 2" because we'll break all of their work.

Fair point. Furthermore, it seems that the derived annotations mainly just stick the main annotations together, separated by a colon, which we'd like to ignore anyway. So if I'm correct, sticking to the base annotations gives us results that are close to what we want, unless:

The derived annotations include emoji combined with skin tone modifiers, country flags (with info derived from the Unicode territory name data), etc.

I've seen these skin tone modifiers as part of the main annotations, but maybe this is not the case for country flags?
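For comparison, the runtime derivation idea (point 5 above) could look roughly like this for the two simplest cases in this discussion: a base emoji followed by a skin tone modifier, and a country flag made of two regional indicator symbols. This is a hypothetical sketch: `derive_annotation` and the `territories` map are illustrative, ZWJ sequences aren't handled, and the word "flag" would itself need translating, which is where the separate country data comes in.

```python
REGIONAL_INDICATOR_A = 0x1F1E6
REGIONAL_INDICATOR_Z = 0x1F1FF

# Emoji modifier code points (U+1F3FB..U+1F3FF) and their CLDR English names.
SKIN_TONES = {
    "\U0001F3FB": "light skin tone",
    "\U0001F3FC": "medium-light skin tone",
    "\U0001F3FD": "medium skin tone",
    "\U0001F3FE": "medium-dark skin tone",
    "\U0001F3FF": "dark skin tone",
}


def derive_annotation(sequence, base, territories):
    """Hypothetical runtime derivation for two-code-point sequences.

    base: emoji -> description from the main (non-derived) annotations.
    territories: two-letter region code -> localized territory name.
    Returns None when nothing can be derived.
    """
    if sequence in base:
        return base[sequence]
    if len(sequence) != 2:
        return None
    first, second = sequence
    # A flag is a pair of regional indicator symbols spelling a region code.
    if all(REGIONAL_INDICATOR_A <= ord(c) <= REGIONAL_INDICATOR_Z for c in sequence):
        code = "".join(chr(ord(c) - REGIONAL_INDICATOR_A + ord("A")) for c in sequence)
        name = territories.get(code)
        return f"flag {name}" if name else None
    # A base emoji followed by a skin tone modifier.
    if second in SKIN_TONES and first in base:
        return f"{base[first]} {SKIN_TONES[second]}"
    return None
```

For example, `derive_annotation("\U0001F476\U0001F3FB", {"\U0001F476": "baby"}, {})` gives "baby light skin tone", matching the cleaned-up derived annotation without storing it.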

@LeonarddeR
Collaborator

Looks like this repository might be a good way to go; it is kept well up to date and contains exactly what we want: https://github.com/fujiwarat/cldr-emoji-annotation

@LeonarddeR
Collaborator

@jcsteh commented on 1 Aug 2018, 00:58 CEST:

As an example, the 😂 only appears in en (which is US). en_001 (which is the base for English) doesn't include it, nor does en_GB (which inherits directly from en_001). That means that en_GB doesn't include 😂 at all. See English Inheritance for details about inheritance for specific English locales.

Based on this, it looks like en is currently the dictionary that is the most complete one, and as we don't specify an English dialect in NVDA, using en here might make sense.

@LeonarddeR
Collaborator

I have a prototype implementation in the @LeonarddeR i6523 branch (note that I'm using my private fork).

  1. It loads an additional emojis.dic file per language. I chose "emojis" as the plural for code clarity.
  2. It includes https://github.com/fujiwarat/cldr-emoji-annotation as a submodule and generates dictionaries from both annotations and annotationsDerived when running scons source. See emojiDict_sconscript in the root of the repo.

@jcsteh
Contributor

jcsteh commented Aug 2, 2018 via email

@LeonarddeR
Collaborator

LeonarddeR commented Aug 3, 2018

@jcsteh commented on 2 Aug 2018, 23:21 CEST:

I think you might need to include en_001 and en, as en_001 is supposed to
be the base for all English locales according to that inheritance document.

You're right. I've updated the code to support multiple sources per locale, which is also required for pt_BR, which must include both the pt and pt_BR sources.
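Supporting multiple sources per locale amounts to an ordered merge. A minimal sketch, assuming sources are listed from most general to most specific so that the more specific entries win and anything missing falls back to the general source; `merge_sources`, `LOCALE_SOURCES` and `build_locale` are hypothetical names, not the prototype's code.

```python
def merge_sources(sources):
    """Merge annotation maps ordered from most general to most specific.

    Later (more specific) sources override earlier ones; entries missing
    from the specific source fall back to the general one.
    """
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


# Hypothetical mapping from an NVDA language to the CLDR files it draws on.
LOCALE_SOURCES = {"pt_BR": ["pt", "pt_BR"]}


def build_locale(locale, load):
    # load(name) is an assumed helper returning one CLDR file's annotation map.
    names = LOCALE_SOURCES.get(locale, [locale])
    return merge_sources([load(name) for name in names])
```

With placeholder data such as `{"pt": {"a": "generic", "b": "pt only"}, "pt_BR": {"a": "brazil"}}`, building pt_BR yields the pt_BR text for "a" and still keeps "b" from pt.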

A different question, though, is whether we want to let the user enable or disable processing of emojis. When it is enabled or disabled in the GUI, we can simply invalidate all data in the data map, so emojis either will or won't be loaded. However, config profile switches are more complex, as we don't want to invalidate all data on every config profile switch.
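One way to avoid invalidating everything on a profile switch is to key the processed data by both locale and the emoji setting, so switching profiles only selects a different cache entry. A hypothetical sketch (`SymbolDataCache` is not an NVDA class, and the loader is assumed):

```python
class SymbolDataCache:
    """Hypothetical cache of processed symbol data keyed by (locale, includeEmoji)."""

    def __init__(self, loader):
        # loader(locale, includeEmoji) builds the processed symbol data.
        self._loader = loader
        self._cache = {}

    def get(self, locale, includeEmoji):
        key = (locale, includeEmoji)
        if key not in self._cache:
            # Built once per combination; a profile switch that flips the
            # setting back and forth reuses the entries already built.
            self._cache[key] = self._loader(locale, includeEmoji)
        return self._cache[key]

    def invalidate(self):
        # Still needed when the underlying dictionaries change.
        self._cache.clear()
```

The trade-off is memory: if profiles alternate between the two settings, both variants of a locale stay loaded, which matters given the dictionary sizes mentioned earlier in this thread.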

Also, how should we treat emojis that are added as part of the user dictionary? It is easy to exclude the built-in emoji dictionaries, as they are in separate files. The only way I can think of to handle user-added emojis is adding an emojis category to the symbols files, besides symbols and complexSymbols. But then data invalidation is still required, unless we use a separate regex for emojis within the symbol processor.
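The "separate regex" option could mean precompiling two patterns, one for ordinary symbols and one that also includes emojis (built-in or from a user emojis category), and choosing between them per call, so the setting can change without rebuilding anything. A hypothetical sketch; this `SymbolProcessor` is illustrative and far simpler than NVDA's real symbol processing (no levels, no complex symbols):

```python
import re


def _alternation(identifiers):
    # Longest identifiers first so multi-code-point sequences (e.g. an emoji
    # plus a skin tone modifier) win over their shorter prefixes.
    ordered = sorted(identifiers, key=len, reverse=True)
    # "(?!)" never matches, which keeps an empty table from matching "".
    return "|".join(re.escape(i) for i in ordered) or "(?!)"


class SymbolProcessor:
    """Hypothetical processor that keeps emojis in their own table."""

    def __init__(self, symbols, emojis):
        self._symbols = dict(symbols)
        # When emojis are enabled, emoji entries take precedence on a clash.
        self._combined = {**self._symbols, **emojis}
        self._symbolsOnly = re.compile(_alternation(self._symbols))
        self._withEmojis = re.compile(_alternation(self._combined))

    def process(self, text, includeEmojis):
        # Pick a precompiled pattern instead of invalidating any data.
        if includeEmojis:
            pattern, table = self._withEmojis, self._combined
        else:
            pattern, table = self._symbolsOnly, self._symbols
        # Single pass, so replacement text is never re-processed.
        return pattern.sub(lambda m: f" {table[m.group()]} ", text)
```

Compiling both patterns up front costs some memory, but toggling the option (from the GUI or a profile) becomes a simple flag check at processing time.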

@LeonarddeR
Copy link
Collaborator

I noticed that the annotations also contain symbols like ®, © and ™, which, strictly speaking, are not emojis, I believe.

@LeonarddeR
Collaborator

cc @feerrenrut @michaelDCurran @josephsl, Do you have any thoughts about this issue? The emoji dictionary creation code is there and works quite well, the question is how to integrate these annotations into NVDA. Is it really necessary to have the ability to enable/disable them?

@nvdaes
Sponsor Contributor

nvdaes commented Aug 25, 2018

I mentioned Chris (@Christianlm), the main author of the Emoticons add-on, but not correctly.
In case he has something to say. Hope this helps.
Thanks

@kvark128
Contributor

kvark128 commented Oct 2, 2018

Sorry for being emotional, but this is a very, very bad innovation.
Today I opened the Punctuation/symbol dialog and found more than 2000 little-used emojis, many of which have a very bad translation into my language.
Why was it necessary to add all this to the base NVDA distribution? For users who need it, there are appropriate add-ons.
And there isn't even a way to disable this shit. The only option is to remove the file cldr.dic.

@kvark128
Contributor

kvark128 commented Oct 2, 2018

Found the checkbox "Include Unicode Consortium data (including emoji) when processing characters and symbols". Thanks so much for the ability to disable this garbage.

@LeonarddeR
Collaborator

@kvark128 : what language are you using? Could you elaborate on what's so bad about the emoji translations for your language?

@marlon-sousa
Contributor

Hello,

I have found what I believe to be a bug.

How to reproduce?

1- Open Notepad and paste the text below:

doesn't include 😂 at all. See

2- Press Home to go to the beginning of the line.

3- Press NVDA+Down Arrow to read all the text.

4- You should hear "doesn't include face with tears of joy at all. See".

5- Up to this point, everything is OK.

6- Now press Home again and press Ctrl+Right Arrow twice. You will hear "doesn't include".

7- Now press Right Arrow. You will hear "space", and if you press Right Arrow again, you will hear "symbol d 8 3 d".

This demonstrates that when reading character by character, the symbol is somehow not processed, or is processed incorrectly.

If you keep pressing Ctrl+Right Arrow, though, the symbol is read correctly.

I got the same results using eSpeak NG in English or Portuguese, with NVDA itself in either English or Portuguese. The OneCore voices (the two Brazilian ones), with NVDA in either Portuguese or English, showed much the same behavior, except that the character was not read at all: there was silence when reading with the arrow keys. The symbol was read correctly by say all and with Ctrl+arrows.
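The "d 8 3 d" in step 7 is a clue: 😂 is U+1F602, which UTF-16 (the encoding Windows wide-character text APIs use) stores as the surrogate pair D83D DE02. So character navigation appears to land on a single UTF-16 code unit, the high surrogate, rather than on the whole character. A sketch of the idea, with hypothetical helper names, showing how an offset could be expanded to the full code point before the symbol lookup:

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def char_at(units, offset):
    """Return the whole character covering a UTF-16 offset."""
    # If the offset is on the second half of a pair, step back to the first.
    if is_low_surrogate(units[offset]) and offset > 0 and is_high_surrogate(units[offset - 1]):
        offset -= 1
    unit = units[offset]
    if is_high_surrogate(unit) and offset + 1 < len(units) and is_low_surrogate(units[offset + 1]):
        low = units[offset + 1]
        # Standard UTF-16 decoding of a surrogate pair into one code point.
        return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
    return chr(unit)
```

For "a 😂", the code units are 0x61, 0x20, 0xD83D, 0xDE02, and `char_at` at offset 2 or 3 returns 😂 rather than the lone surrogate that gets spelled out as "d 8 3 d".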

@LeonarddeR
Copy link
Collaborator

LeonarddeR commented Dec 18, 2018 via email

@marlon-sousa
Copy link
Contributor

marlon-sousa commented Dec 18, 2018 via email
