Integrating Orca's symbol translations? #11105

Closed
sthibaul opened this issue May 4, 2020 · 35 comments · Fixed by #11110

@sthibaul
Contributor

sthibaul commented May 4, 2020

Hello,

On the Linux side, we are working on making symbol processing more coherent, basically by basing it on the NVDA tables, so that we can share open-source effort on fixes and translations.

There are a few symbols that Orca currently translates which are not yet in the NVDA tables, such as □, ○, ◆, ✗, ¼, ½, ¾, ⁰, ⁴, etc., for which many language translations are available. I can easily provide a patch that simply adds them all to source/locale/*/symbols.dic. Would that be enough for NVDA, or is there more to it (e.g. integrating with the translation process)?

Samuel

@leonardder
Collaborator

leonardder commented May 4, 2020

Interesting suggestion. You'd have to add them to the English symbols only and they would be automatically provided to be translated.

@ruifontes
Contributor

ruifontes commented May 4, 2020

But if they are already translated, why ask the translators to translate them again?
If the translations are placed in the respective locale, it will be only a question of review...

@sthibaul
Contributor Author

sthibaul commented May 4, 2020

If the translations are placed in the respective locale, it will be only a question of review...

Ok, that's what I was wondering about, whether just committing the changes in locale/*/symbols.dic will work with the translation workflow.

@lukaszgo1
Contributor

lukaszgo1 commented May 4, 2020

No, it would not. Translations are for now managed through SVN, so it is not as simple as that.

@sthibaul
Contributor Author

sthibaul commented May 4, 2020

No, it would not. Translations are for now managed through SVN, so it is not as simple as that.

Ok, I see the symbols.dic files in http://subversion.assembla.com/svn/screenReaderTranslations , should I then produce a patch against these?

I can adapt my script, I just need to know what makes it easiest to integrate :)

@josephsl
Collaborator

josephsl commented May 4, 2020

@leonardder
Collaborator

leonardder commented May 4, 2020

Indeed, the problem is unification. NVDA has always aimed at the shortest possible wording, hence the word "bang" rather than "exclamation mark" for !, and "question" (not "question mark") for ?. Does Orca also follow this pattern?

@sthibaul
Contributor Author

sthibaul commented May 4, 2020

What will make it a bit harder is if some languages have these symbols translated

I can filter them out, that's not a problem.

Does Orca also follow this pattern?

Orca has the same kind of users, so the same habits, yes :)

But anyway, we can inject the symbols for which there is currently nothing in NVDA, that will be better than nothing for these, and translators will be able to fine-tune afterwards if they want.
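The filtering step described above could look roughly like the following Python sketch. It assumes a simplified view of the symbols.dic format: a `symbols:` section of tab-separated `identifier`, `replacement`, `level` lines, with lines starting with `#` treated as comments. It also assumes the Orca translations for one language are already available as a plain dictionary. The function names and the default `some` level are hypothetical, and escape sequences in identifiers are not decoded.

```python
"""Sketch: keep only Orca symbol translations that a locale's symbols.dic lacks.

Assumptions (hypothetical, not taken from NVDA's code): symbols.dic has a
"symbols:" section of tab-separated "identifier<TAB>replacement<TAB>level" lines,
and the Orca translations are given as a plain {symbol: spoken name} dict.
"""


def existing_symbols(dic_text: str) -> set[str]:
    """Return the identifiers defined in the "symbols:" section of a symbols.dic text."""
    found = set()
    in_symbols = False
    for line in dic_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.rstrip() in ("complexSymbols:", "symbols:"):
            # Section header: only entries under "symbols:" are collected.
            in_symbols = line.rstrip() == "symbols:"
            continue
        if in_symbols:
            found.add(line.split("\t", 1)[0])  # identifier is the first field
    return found


def missing_entries(dic_text: str, orca: dict[str, str], level: str = "some") -> list[str]:
    """Format symbols.dic-style lines for Orca symbols the locale does not define yet."""
    have = existing_symbols(dic_text)
    return [f"{sym}\t{name}\t{level}" for sym, name in sorted(orca.items()) if sym not in have]
```

With this approach, any symbol a translator has already handled in the locale file is left untouched, and only genuinely missing symbols are proposed. Translators could then review the proposed lines and adjust the wording or level.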

@josephsl
Collaborator

josephsl commented May 4, 2020

@zstanecic
Contributor

zstanecic commented May 4, 2020

@josephsl
Collaborator

josephsl commented May 4, 2020

@zstanecic
Contributor

zstanecic commented May 4, 2020

@josephsl
Collaborator

josephsl commented May 4, 2020

@lukaszgo1
Contributor

lukaszgo1 commented May 4, 2020

@sthibaul wrote:

Does Orca also follow this pattern?

Orca has the same kind of users, so the same habits, yes :)

I don't think so. While Orca's users might have similar needs, the translation is managed quite differently. Last time I looked, translations were done as part of the GNOME project. This in itself would not be bad, except that the people behind the translations mostly do not use screen readers themselves. The Polish translation, for example, is IMO hardly usable: the fact that these people do not have even a vague idea of the conventions used by screen readers is visible in almost every message.

@zstanecic
Contributor

zstanecic commented May 4, 2020

@zstanecic
Contributor

zstanecic commented May 4, 2020

@sthibaul
Contributor Author

sthibaul commented May 4, 2020

Ok, I understand that we probably don't want to just commit the translations outright. I was just thinking it'd be a bit sad not to try to take benefit from them.

Instead I can produce one patch per language, put them somewhere for translators to have a look at, and let them take what they like?

@josephsl
Collaborator

josephsl commented May 4, 2020

@sthibaul
Contributor Author

sthibaul commented May 4, 2020

no need for per-language patch, as this is done as part of translations workflow

I mean, for providing the Orca translations, that translators can take if they like them.

@josephsl
Collaborator

josephsl commented May 4, 2020

sthibaul added a commit to sthibaul/nvda that referenced this issue May 5, 2020
@Brian1Gaff

Brian1Gaff commented May 5, 2020

@CyrilleB79
Contributor

CyrilleB79 commented May 5, 2020

Just a question: is there a guideline or common practice for deciding which symbols should be added to symbols.dic and which should not? Is there a big performance penalty for adding many symbols?
I raise these questions because anyone may want to add more symbols, e.g. see #11015.

IMO, when someone asks for a new symbol to be included, they should give a justification. For example:

  • ¼ (one fourth): this seems to be supported by synthesizers already, so give an example of a synthesizer that announces this symbol badly or not at all.
  • all subscripts and superscripts: give a real-life example of where they are used.
    Maybe the symbol level should also be justified.

@gregjozk
Contributor

gregjozk commented May 5, 2020

As a Slovenian translator of NVDA, I will gladly cooperate with you and the Slovenian GNOME/Orca translator.

sthibaul added a commit to sthibaul/nvda that referenced this issue May 13, 2020
sthibaul added a commit to sthibaul/nvda that referenced this issue May 13, 2020
feerrenrut pushed a commit that referenced this issue May 15, 2020
* Add symbols from Orca

Fixes #11105
@nvaccessAuto nvaccessAuto added this to the 2020.2 milestone May 15, 2020
@bhavyashah

bhavyashah commented Aug 7, 2020

I would really, really, really appreciate answers to the questions posed in #11105 (comment). I am working through a math textbook and have come across several critical symbols: some that eSpeak NG reads but Eloquence doesn't, some that Eloquence reads but eSpeak NG doesn't, and some that neither reads. Here are the relevant questions again:

  • Is there a performance penalty for adding too many symbols? If there is none, or it is negligible, can we add all the Unicode symbols that exist?
  • If there is a non-trivial penalty, what is the bar we set for including or excluding a symbol from NVDA itself?

@bhavyashah

bhavyashah commented Aug 10, 2020

Could someone please answer my questions in #11105 (comment)? Otherwise, I will go ahead and create a new issue about adding some critical math symbols to NVDA.

@gregjozk
Contributor

gregjozk commented Aug 10, 2020

@Adriani90
Collaborator

Adriani90 commented Aug 10, 2020

To answer @CyrilleB79's and @bhavyashah's questions,

  1. Regarding subscripts and superscripts: these symbols are often used in physics and mathematics when writing formulas or postulates.
  2. Regarding performance: there is no standard answer, because it depends on the symbol. I think the only way to find the limit is to add important symbols in batches and test at which point performance degrades. I didn't experience any performance lag with the mathematical symbols.
  3. Regarding the bar for including or excluding symbols: not every symbol is needed, but there are certain domains where important symbols can be added, e.g. mathematical symbols, musical symbols, etc. Even within these domains, one has to assess which symbols are commonly used and which would cover the most use cases. I can speak for mathematics and physics because I completed advanced courses in these domains, but I cannot speak for musical symbols. However, there are lots of forums where you can get a rough idea of which symbols are most important. So in the end, whoever wants to add symbols to NVDA should assess the need for them qualitatively.
  4. Last but not least, adding symbols to NVDA's dictionary means they will be spoken by every synthesizer. Otherwise, some synths support certain symbols and many others do not. Integrating as many symbols as possible into NVDA's symbol dictionary at least guarantees some consistency in how they are pronounced. However, as mentioned above, this has to be weighed qualitatively, through testing and by looking at the performance limits.

@bhavyashah

bhavyashah commented Aug 10, 2020

Thank you so much @gregjozk and @Adriani90 .
Here is a proposal from me: the fact that Unicode has encoded a symbol as a distinct character means that every Unicode character is used, some more than others. Since we don't know whether there is a limit to how many symbols we can add, let us assume that none exists. Therefore, let us add every Unicode symbol to NVDA, regardless of our assessment of how widely each one is used.

  • If we notice no performance penalty, then all is well and we have just resolved the issue of some symbols important to some people missing in NVDA once and for all. (Unicode could come up with new ones in the future, yes, but the point still largely stands.)
  • If we notice only a slight performance penalty, we can identify the least useful groups of these symbols and remove them. I think eliminating chunks would be easier than adjudicating the usefulness of every newly requested symbol on a case-by-case basis.
  • If we notice a significant performance penalty, we can revert to the status quo.

Do you think compiling and adding all Unicode symbols to NVDA would be too much implementation work, or can it be automated?
Please share your thoughts on the above suggestion.
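On the automation question, Python's standard `unicodedata` module makes it straightforward to enumerate candidate characters. The sketch below collects every named character in the Unicode symbol categories (Sm, Sc, Sk, So) and drafts tab-separated lines in the same assumed symbols.dic layout. The `all` level and the function names are hypothetical. The raw Unicode names are also far from the short wording NVDA prefers, so the output would only be a starting point for translators.

```python
import sys
import unicodedata

# Unicode general categories for symbols: math (Sm), currency (Sc),
# modifier (Sk) and other (So).
SYMBOL_CATEGORIES = {"Sm", "Sc", "Sk", "So"}


def unicode_symbols(categories=SYMBOL_CATEGORIES):
    """Yield (character, lowercase Unicode name) for every named character in `categories`."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in categories:
            name = unicodedata.name(ch, "")
            if name:  # skip code points without a name; there is nothing to speak
                yield ch, name.lower()


def draft_entries(level="all"):
    """Draft symbols.dic-style lines (hypothetical layout) using raw Unicode names."""
    return [f"{ch}\t{name}\t{level}" for ch, name in unicode_symbols()]
```

Fractions such as ½ and superscripts such as ⁴ belong to category No (number, other), so they are only picked up if "No" is added to the category set. The exact number of characters returned depends on the Unicode version bundled with the Python interpreter.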

@gregjozk
Contributor

gregjozk commented Aug 11, 2020

@nvdaes
Sponsor Contributor

nvdaes commented Aug 11, 2020

@gregjozk
Contributor

gregjozk commented Aug 11, 2020

@nvdaes
Sponsor Contributor

nvdaes commented Aug 11, 2020

@bhavyashah

bhavyashah commented Aug 11, 2020

Thanks so, so much for the Wiktionary link. I can see how this might get overwhelming for translators. To be clear, though, I don't think that something being difficult to translate and make available to all NVDA users means the feature should be left out; the User Guide is an example. I'm not saying that @gregjozk is advocating for that, just clarifying preemptively. Whether in core or in an add-on, I think including these symbols would be immensely valuable. Is this request something you might be able to take care of, by any chance? @nvdaes @Christianlm @Adriani90

@nvdaes
Sponsor Contributor

nvdaes commented Aug 12, 2020

@Christianlm

Christianlm commented Aug 29, 2020

IMO when one asks new symbol inclusion, he/she should give a justification.

I agree with @CyrilleB79.
Furthermore, many symbols call for different reading behavior depending on the language of the synthesizer.

For example:
Superscript o is used for ordinal numbers in Italian and Spanish, and maybe in other languages too.
E.g. 10º must be read in Italian as "decimo" instead of "10 superscript o".
The same applies to the superscript a character.

Another example:
In Spanish, the characters "¡" and "¿" add a slight pause and change the intonation of the part of the sentence they introduce.
These behaviors are built into synthesizers such as eSpeak NG, Eloquence, Loquendo, and Microsoft SAPI5.
I know the problems mentioned in my examples can be solved by translators in the language-specific symbols.dic files, but how many translators know how the synthesizers they use behave?
