nvda can read emojies #6523

Closed
mrdin8877 opened this Issue Oct 30, 2016 · 20 comments

9 participants
mrdin8877 commented Oct 30, 2016

Hi, and sorry for asking this. I know that if you install a specific add-on, NVDA can read emoticons, but those are the letter type, for example :) :( :@ etc. What I mean here are the symbols that represent emojis. Sorry if my information is not enough. Thanks.

@feerrenrut feerrenrut added the p3 label Nov 1, 2016

Contributor

feerrenrut commented Nov 1, 2016

P3. This is becoming more important as more and more sites / apps are using emojis rather than emoticons. There is some work to implement this, and some thought will be required on how to deal with translations.

This may also be a good candidate for an addon, could the existing one be extended?

kaveinthran commented Nov 1, 2016

It would also be nice to have emoji, symbol and emoticon selectors.

On 11/1/16, Reef Turner notifications@github.com wrote:

P3. This is becoming more important as more and more sites / apps are using
emojis rather than emoticons. There is some work to implement this, and
some thought will be required on how to deal with translations.

This may also be a good candidate for an addon, could the existing one be
extended?

Contributor

nvdaes commented Nov 1, 2016

Hi, regarding the question about the Emoticons add-on, whose main author is Chris Leo: it's being extended to speak emojis too. Work is being done to insert emojis as well as emoticons.
Anyway, the add-on works with speech dictionaries (braille is supported in symbols insertion only).
For the add-on see this branch:
https://github.com/nvdaaddons/emoticons/tree/emojis

Collaborator

leonardder commented Feb 2, 2018

The translation part will probably be a bit difficult. I wonder whether Apple, for example, does all the emoji translations internally. I don't think we can expect our translation teams to translate all the emojis.

PratikP1 commented Feb 2, 2018

To what extent should the speaking of emoji characters be left to synthesizers? It might be worth testing OneCore, its language packs in different languages, and the Windows 10 emoji panel when the emoji panel functionality becomes available in other languages. People using synthesizers in languages other than English can also test by using this emoji catalog. For development purposes this GitHub iamcal/emoji-data repository looks useful.

Contributor

jcsteh commented Mar 28, 2018

I looked into this a bit. The Unicode CLDR (Common Locale Data Repository) includes TTS descriptions for emoji in a whole bunch of languages. You can find them in the common/annotations and common/annotationsDerived directories.

Some notes/thoughts:

  1. The data is in XML. I imagine we'd process that data into our own symbol dictionary format for each language and allow NVDA to load this as an additional symbols dictionary.
  2. The TTS descriptions are an annotation with type "tts". For example:

    <annotation cp="😂" type="tts">face with tears of joy</annotation>

  3. We'd need to include the data from both annotations and annotationsDerived. The derived annotations include emoji combined with skin tone modifiers, country flags (with info derived from the Unicode territory name data), etc.
  4. Derived annotations include punctuation which we'd probably want to strip; e.g.

    <annotation cp="👶🏻" type="tts">baby: light skin tone</annotation>

  5. We could also derive annotations ourselves at runtime using code. While it would decrease dictionary size (since we wouldn't have to include multiple instances of each modified emoji), it'd be a fair bit of work and we'd have to include separate country data, etc.
  6. For English, we need to include en_001 as well as en. en is for US English. Some other English locales derive directly from en_001 instead of en, but it seems a lot of stuff is still only in en, and NVDA doesn't have anything other than "en" anyway.
  7. These dictionaries are going to be several hundred kb each. It's possibly worth it - emoji are used a lot these days - but we'll have to keep an eye on performance and memory usage at runtime.
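
The processing described in notes 1, 2 and 4 could be sketched roughly as follows. This is only an illustration, not the code that ended up in NVDA: the function names, the inline sample XML, and the tab-separated output layout (symbol, replacement, level) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Code points written as escapes so the sample is unambiguous.
JOY = "\U0001F602"                 # face with tears of joy
BABY_LIGHT = "\U0001F476\U0001F3FB"  # baby + light skin tone modifier

# Tiny stand-in for a CLDR annotations file (real files are much larger).
SAMPLE_XML = f"""<ldml>
  <annotations>
    <annotation cp="{JOY}">face | joy | laugh | tear</annotation>
    <annotation cp="{JOY}" type="tts">face with tears of joy</annotation>
    <annotation cp="{BABY_LIGHT}" type="tts">baby: light skin tone</annotation>
  </annotations>
</ldml>"""


def extract_tts(xml_text):
    """Return {emoji: description} for annotations whose type is "tts"."""
    root = ET.fromstring(xml_text)
    descriptions = {}
    for element in root.iter("annotation"):
        if element.get("type") != "tts":
            continue  # keyword annotations are not meant to be spoken
        # Derived annotations contain a colon that we strip before speaking.
        descriptions[element.get("cp")] = (element.text or "").replace(":", "")
    return descriptions


def to_dictionary_lines(descriptions):
    """Render the mapping in a hypothetical tab-separated dictionary layout."""
    lines = ["symbols:"]
    for cp, text in descriptions.items():
        lines.append(f"{cp}\t{text}\tnone")
    return lines


tts = extract_tts(SAMPLE_XML)
dictionary_lines = to_dictionary_lines(tts)
```

Running the same extraction over both the annotations and annotationsDerived files and merging the results would cover note 3.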

Collaborator

leonardder commented Jul 31, 2018

@jcsteh commented on 28 Mar 2018 04:05 CEST:

  1. The data is in XML. I imagine we'd process that data into our own symbol dictionary format for each language and allow NVDA to load this as an additional symbols dictionary.

This makes sense.

  5. We could also derive annotations ourselves at runtime using code. While it would decrease dictionary size (since we wouldn't have to include multiple instances of each modified emoji), it'd be a fair bit of work and we'd have to include separate country data, etc.

This sounds suitable for a version 2 of the implementation.

  6. For English, we need to include en_001 as well as en. en is for US English. Some other English locales derive directly from en_001 instead of en, but it seems a lot of stuff is still only in en, and NVDA doesn't have anything other than "en" anyway.

It's not entirely clear to me what the differences are. Could you give an example of what you observed?

Contributor

jcsteh commented Jul 31, 2018

@leonardder commented on Aug 1, 2018, 12:26 AM GMT+10:

  5. We could also derive annotations ourselves at runtime using code. While it would decrease dictionary size (since we wouldn't have to include multiple instances of each modified emoji), it'd be a fair bit of work and we'd have to include separate country data, etc.

This sounds suitable for a version 2 of the implementation.

Note that if the database doesn't cover some of the languages we need, our translators may want to translate them. At that point, we won't want to go for a "version 2" because we'll break all of their work.

  6. For English, we need to include en_001 as well as en. en is for US English. Some other English locales derive directly from en_001 instead of en, but it seems a lot of stuff is still only in en, and NVDA doesn't have anything other than "en" anyway.

It's not entirely clear to me what the differences are. Could you give an example of what you observed?

As an example, the 😂 only appears in en (which is US). en_001 (which is the base for English) doesn't include it, nor does en_GB (which inherits directly from en_001). That means that en_GB doesn't include 😂 at all. See English Inheritance for details about inheritance for specific English locales.

Collaborator

leonardder commented Aug 1, 2018

@jcsteh commented on 1 aug. 2018 00:58 CEST:

Note that if the database doesn't cover some of the languages we need, our translators may want to translate them. At that point, we won't want to go for a "version 2" because we'll break all of their work.

Fair point. Furthermore, it seems that the derived annotations mainly just stick the main annotations together, separated by a colon, which we'd like to ignore anyway. So if I'm correct, sticking to the base annotations gives us results that are close to what we want, unless:

The derived annotations include emoji combined with skin tone modifiers, country flags (with info derived from the Unicode territory name data), etc.

I've seen these skin tone modifiers as part of the main annotations, but maybe this is not the case for country flags?
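
For comparison, the runtime derivation discussed in this exchange (composing a base annotation with a skin tone modifier, or a flag with a territory name) might look something like the sketch below. The tables are tiny hand-written samples and the `describe` helper is hypothetical; real data would come from the CLDR annotations and territory names.

```python
# Sample data only; a real implementation would load these from CLDR.
BASE = {"\U0001F476": "baby", "\U0001F44D": "thumbs up"}
SKIN_TONES = {
    "\U0001F3FB": "light skin tone",
    "\U0001F3FC": "medium-light skin tone",
    "\U0001F3FD": "medium skin tone",
    "\U0001F3FE": "medium-dark skin tone",
    "\U0001F3FF": "dark skin tone",
}
TERRITORIES = {"NL": "Netherlands", "AU": "Australia"}

# Regional indicator symbols A..Z occupy U+1F1E6..U+1F1FF.
REGIONAL_INDICATOR_A = 0x1F1E6


def describe(sequence):
    """Return a spoken description for an emoji sequence, or None if unknown."""
    if sequence in BASE:
        return BASE[sequence]
    if len(sequence) == 2:
        first, second = sequence
        # Base emoji followed by a skin tone modifier.
        if first in BASE and second in SKIN_TONES:
            return f"{BASE[first]} {SKIN_TONES[second]}"
        # Two regional indicators spell a two-letter territory code.
        offsets = [ord(char) - REGIONAL_INDICATOR_A for char in sequence]
        if all(0 <= offset < 26 for offset in offsets):
            code = "".join(chr(ord("A") + offset) for offset in offsets)
            if code in TERRITORIES:
                return f"flag {TERRITORIES[code]}"
    return None
```

This keeps the stored data small, at the cost of needing a separate territory table per language, which is the trade-off raised in note 5.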

Collaborator

leonardder commented Aug 1, 2018

Looks like this repository might be a good way to go, it is perfectly kept up to date and contains exactly what we want: https://github.com/fujiwarat/cldr-emoji-annotation

Collaborator

leonardder commented Aug 2, 2018

@jcsteh commented on 1 aug. 2018 00:58 CEST:

As an example, the 😂 only appears in en (which is US). en_001 (which is the base for English) doesn't include it, nor does en_GB (which inherits directly from en_001). That means that en_GB doesn't include 😂 at all. See English Inheritance for details about inheritance for specific English locales.

Based on this, it looks like en is currently the dictionary that is the most complete one, and as we don't specify an English dialect in NVDA, using en here might make sense.

Collaborator

leonardder commented Aug 2, 2018

I have a prototype implementation in the @leonardder i6523 branch (note that I'm using my private fork).

  1. It loads an additional emojis.dic file per language. I chose emojis as the plural of emoji for code clarity.
  2. It includes https://github.com/fujiwarat/cldr-emoji-annotation as a submodule and generates dictionaries from both annotations and annotationsDerived when running scons source. See emojiDict_sconscript in the root of the repo.

Contributor

jcsteh commented Aug 2, 2018

Collaborator

leonardder commented Aug 3, 2018

@jcsteh commented on 2 Aug 2018, 23:21 CEST:

I think you might need to include en_001 and en, as en_001 is supposed to
be the base for all English locales according to that inheritance document.

You're right. I've updated the code to support multiple sources per locale, which is also required for pt_br, which must include both pt and pt_BR sources.
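
As a rough illustration of "multiple sources per locale": the sketch below merges annotation sets in order, so a more specific source overrides the base. The `LOCALE_SOURCES` mapping, the `merge_sources` helper and the placeholder descriptions are assumptions for this example, not the actual build code.

```python
# Hypothetical mapping from an NVDA locale to the CLDR sources it needs,
# listed from most general to most specific.
LOCALE_SOURCES = {
    "en": ["en_001", "en"],
    "pt_BR": ["pt", "pt_BR"],
}


def merge_sources(locale, annotations_by_source):
    """Combine the annotation dicts for a locale; later sources win."""
    merged = {}
    for source in LOCALE_SOURCES.get(locale, [locale]):
        merged.update(annotations_by_source.get(source, {}))
    return merged


# Placeholder data: the description strings are made up to show precedence.
sample = {
    "pt": {"\U0001F602": "description from pt", "\u00A9": "description from pt"},
    "pt_BR": {"\U0001F602": "description from pt_BR"},
}
merged_pt_br = merge_sources("pt_BR", sample)
```

Here the entry that exists in both sources takes the pt_BR text, while the entry that only exists in pt is still included.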

A different problem though, is whether we want the user to enable or disable processing of emojis. When we either disable or enable it in the gui, we can simply invalidate all data in the data map, so emojis either will or won't be loaded. However, config profile switches are more complex, as we don't want to invalidate all data for every config profile switch.

Also, how should we treat emojis that are added as part of the user dictionary? It is easy to exclude the built-in emoji dictionaries, as they are in different files. The only way I can think of for user-added emojis is adding an emojis category to the symbols files, besides symbols and complexSymbols. But then data invalidation is still required, unless we use a separate regex for emojis within the symbol processor.
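
The "separate regex" idea could be sketched as below: symbols and emojis are compiled into two patterns, and a flag decides at processing time whether the emoji pattern is applied, so toggling it (for example on a config profile switch) does not require reloading any data. The `SymbolProcessor` class here is a hypothetical stand-in and does not reflect NVDA's real symbol processing code.

```python
import re


class SymbolProcessor:
    """Toy processor that keeps emoji replacements separate from symbols."""

    def __init__(self, symbols, emojis):
        self._symbols = symbols
        self._emojis = emojis
        self._symbol_re = self._compile(symbols)
        self._emoji_re = self._compile(emojis)
        # Flipping this flag needs no data invalidation.
        self.include_emojis = True

    @staticmethod
    def _compile(mapping):
        if not mapping:
            return re.compile(r"(?!)")  # pattern that never matches
        # Longest keys first so multi-character sequences win over prefixes.
        keys = sorted(mapping, key=len, reverse=True)
        return re.compile("|".join(re.escape(key) for key in keys))

    def process(self, text):
        text = self._symbol_re.sub(
            lambda match: f" {self._symbols[match.group()]} ", text
        )
        if self.include_emojis:
            text = self._emoji_re.sub(
                lambda match: f" {self._emojis[match.group()]} ", text
            )
        return text


processor = SymbolProcessor(
    {"\u00A9": "copyright"},
    {"\U0001F602": "face with tears of joy"},
)
```

With `processor.include_emojis = False`, emoji characters pass through untouched while ordinary symbols are still replaced.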

Collaborator

leonardder commented Aug 10, 2018

I noticed that the annotations also contain symbols like ®, © and ™ which, strictly speaking, are not emojis, I believe.

Collaborator

leonardder commented Aug 25, 2018

cc @feerrenrut @michaelDCurran @josephsl, Do you have any thoughts about this issue? The emoji dictionary creation code is there and works quite well, the question is how to integrate these annotations into NVDA. Is it really necessary to have the ability to enable/disable them?

Contributor

nvdaes commented Aug 25, 2018

I mentioned Chris, @Christianlm, the main author of the emoticons addon, but not correctly.
In case he has something to say, I hope this helps.
Thanks

Contributor

kvark128 commented Oct 2, 2018

Sorry for the emotion, but this is a very, very bad innovation.
When I opened the Punctuation / symbol dialog today, I found more than 2000 little-used emoji, many of which have a very bad translation into my language.
Why was it necessary to add all this to the base NVDA distribution? For users who need it, there are appropriate add-ons.
And there isn't even a way to disable this shit. The only option is to remove the file cldr.dic.

Contributor

kvark128 commented Oct 2, 2018

Found the checkbox "Include Unicode Consortium data (including emoji) when processing characters and symbols". Thanks so much for the ability to disable this garbage.

Collaborator

leonardder commented Oct 3, 2018

@kvark128 : what language are you using? Could you elaborate on what's so bad about the emoji translations for your language?

@pzajda pzajda referenced this issue Oct 5, 2018

Open

Emojis support #49
