NVDA isn't ignoring soft hyphens properly #9343
Comments
@Michael-Detmers: Thank you very much for opening this issue, because I planned to do exactly the same. But I would suggest that the user should still have the option to enable and disable filtering of the soft hyphen (U+00AD) via the Browse Mode NVDA Settings. As a web developer you should have the opportunity to check the correct position of soft hyphens in all web browsers. But normally this character has no useful benefit for screen reader users. And sadly, because of responsive web design, this character is used more and more often. Reading a news article that contains "hundreds" of them via speech and/or braille is extremely annoying. CC: @michaelDCurran, @jcsteh and @MarcoZehe |
One more question: What shall we do with the new HTML5 tag |
Since its purpose is to affect how a line of text is displayed, I'd vote for it to be generally ignored as well. And, as you suggest, there would have to be an option to read all punctuation and special characters verbatim for development and quality assurance purposes. |
A practical example where this is a huge problem is the CTAN repository for latex packages - for example, the page for the Amsmath package |
Sadly, no change. For the most part, hyphenation is unfortunately needed to meet the WCAG reflow requirements. Without it, long words will simply either flow out of visible areas, overlap each other or - ironically - will also be visually chopped up without any sign of continuation, since the hyphens are missing. So the current state is this: |
Switching the soft hyphen to be passed to the synthesiser (in Punctuation/symbol pronunciation...) fixes the issue, at least with eSpeak. I'll look into how it goes with other synths and see if I can change that to be the default and make a PR. |
Shouldn't have been so hasty. While eSpeak handles soft hyphens correctly, none of the other synths I have installed (SAPI5, One Core, Eloquence and Vocalizer) do. While handling them correctly probably should be up to the synthesiser, just switching them to be passed directly to the synth is not a very satisfactory solution. I'm not sure that having a setting to strip them is particularly satisfactory either. |
See also: #10634 (comment) |
I guess to fix this properly, we need an additional behavior in the speech symbol processor that simply discards the symbol as if it weren't there. |
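To illustrate the "discard the symbol as if it weren't there" idea, here is a minimal sketch in plain Python. It is not NVDA's speech symbol processor; the function name `strip_soft_hyphens` is made up for this example, and it only shows the effect the processor would need to have on the text before it reaches the synthesiser.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed, so words stay in one piece.

    Hypothetical helper for illustration only; not part of NVDA.
    """
    return text.replace(SOFT_HYPHEN, "")


# The word reaches the synthesiser whole instead of as "ap", "pli", "ca", "tion".
print(strip_soft_hyphens("ap\u00adpli\u00adca\u00adtion"))
```

The point of the sketch is that the character is dropped rather than replaced by a space or a spoken name, since either replacement would still split the word.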
@LeonarddeR: Please don't overlook the braille output, as ⠁⠏⠏⠇⠊⠉⠁⠞⠊⠕⠝ is also easier to read instead of ⠁⠏⠏⢤⠇⠊⢤⠉⠁⢤⠞⠊⠕⠝ (⢤ = SHY in German 8-dot), but both are needed depending on the situation (e.g. dictionaries, word processing, web/app development). I already pointed this out in my above linked comment. |
I think handling soft hyphens primarily should be a task of the braille table. In the Dutch 8 dot table for example, we ignore it completely. |
This is imho highly unwanted for the reasons I mentioned above, because you cannot check the correct position of the SHY character if you cannot use TTS at the same time. And TTS will only work correctly here if you navigate character by character, which is time-consuming and thus not really comfortable. As you already know me in such situations: the end user should have the power to change this behavior, not only (liblouis) devs on their behalf. And issue #10634 also handles additional Unicode characters, which should be ignored in braille and speech output in the same way. So it's easier to add SHY (U+00AD) to that list as well. |
Please, ignore SHY. For German we need lots of soft hyphens. In many projects we need automated hyphenation, which escalates this problem. |
The issue is fixed when setting "Punctuation/symbol level" to "some" in the Speech settings. I use NVDA version 2020.4. |
I don't understand why removing soft hyphens is not desirable. Normally it is invisible and should not be announced. And if it is shown, it should IMHO not be announced either. It conveys absolutely no information related to the contents. |
Removing soft hyphens is desirable. NVDA ignores and doesn't announce soft hyphens when the user has set "Punctuation/symbol level" to "none" or "some" in the NVDA Speech settings. "some" is the default. An NVDA user might want to verify the correct positioning of the soft hyphens and therefore needs an option to make NVDA announce them. Microsoft Word has a similar setting, the Show/Hide Paragraphs option: Word's help explains: "Show paragraph marks and other hidden formatting symbols. This is especially useful for advanced layout tasks." This option shows optional hyphens. The fact that this option exists proves that there are valid use cases for revealing hidden symbols, for example proofreading including formatting symbols. |
@julianladisch: Which TTS synthesizers are you using? And which languages? |
That makes sense. My fault that I didn't think of that use case. So it boils down to the question of whether soft hyphens should be announced with the setting "most" or only with "all". Or they could get their own setting. After all, proofreading is probably not what users do all the time. |
Just a quick reminder: this issue is NOT about the announcement of the "shy" character. It is about the odd pronunciation of whole words in which "shy" is used. |
cc: @michaelDCurran |
Thank you for clarification. The steps to reproduce should be extended:
The "Actual behavior" should be:
I confirm this bug. |
The behavior I'm seeing is this:
We could enhance this to only omit the symbol completely when certain conditions are met, such as level is none or character and "send symbol to synthesizer" is set to never. |
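One possible reading of that condition, as a small Python sketch. Everything here (`Level`, `Preserve`, `SymbolRule`, `should_omit`, `process`) is a hypothetical name invented for illustration; none of it is NVDA's actual symbol-processing API. It assumes each symbol has a configured level and a "send symbol to synthesizer" value, and omits the symbol only when the level is none or character and the preserve value is never.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):  # hypothetical stand-in for a symbol's configured level
    NONE = "none"
    SOME = "some"
    MOST = "most"
    ALL = "all"
    CHARACTER = "character"


class Preserve(Enum):  # hypothetical stand-in for "send symbol to synthesizer"
    NEVER = "never"
    ALWAYS = "always"


@dataclass(frozen=True)
class SymbolRule:
    level: Level
    preserve: Preserve


def should_omit(rule: SymbolRule) -> bool:
    """Omit only when the level is none/character AND the symbol is never sent."""
    return rule.level in (Level.NONE, Level.CHARACTER) and rule.preserve is Preserve.NEVER


def process(text: str, rules: dict) -> str:
    """Drop every character whose rule says it should be omitted completely."""
    return "".join(ch for ch in text if not (ch in rules and should_omit(rules[ch])))


rules = {"\u00ad": SymbolRule(Level.CHARACTER, Preserve.NEVER)}
print(process("Sil\u00adben\u00adtren\u00adnung", rules))
```

With this shape, a user who switches the soft hyphen to "always" in the symbol settings would still get it passed through, which keeps the proofreading use case available.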
No progress on this? Soft hyphens have been around for ages and are a must-have for many languages - ok, for German at least. Just because English tends to have short words the problem should not be dismissed. Maybe it isn't, after all the issue has not been closed. |
See also #13668 |
Same in Finnish and Swedish @masi, this is a big need, and it really goes against all specs that they are pronounced. I've been hanging on to the hope that we can use soft hyphens, which are so important, while also meeting our accessibility requirements, which surely matter just as much to other Europeans at this point given the new EC directive and all the languages with long compound words. VoiceOver does it great, doesn't that put the fire under you to improve this product that so many people rely on? 🔥 😉 |
To summarize this issue, I think to bring this further, we need to do the following:
I'd personally leave braille out of the discussion for now, though my standpoint is still that this is the translator's responsibility, otherwise we're very likely getting into routing issues. |
Hmm, if the soft character is still navigable in character by character navigation, this will also affect the word by word navigation still. I think it might be worth thinking about a checkbox in the browse mode settings to ignore soft hyphens completely when navigating through the virtual document. It is a small additional setting I know, but it seems to have big impact. |
Adding an extra option to browse mode settings isn't as impactless as you may think. Filtering characters from TextInfo is never trivial, not even in browse mode. |
How is this displayed visually? Are there any visual spaces in place of the hyphens themselves? If yes, I agree with you. But the UX will still be confusing when people navigate word by word while the word is split into several parts.
On 31.05.2024 at 17:47, Leonard de Ruijter wrote:
Adding an extra option to browse mode settings isn't as impactful as you may think. Filtering characters from TextInfo is never trivial, even not with browse mode.
Furthermore, character navigation should represent reality. If there is a soft hyphen in the text, I want to see that with character nav, just because it is there.
|
Source: https://en.wikipedia.org/wiki/Soft_hyphen See also my previous comments, e.g. #9343 (comment), #9343 (comment), #9343 (comment) and #10634 (comment) |
That means words with soft hyphens in the middle of the line are not stretched apart visually by any means? Is this correct?
On 31.05.2024 at 18:29, Daniel Mayr wrote:
In computing and typesetting, a soft hyphen (Unicode U+00AD SOFT HYPHEN ()) or syllable hyphen, is a code point reserved in some coded character sets for the purpose of breaking words across lines by inserting visible hyphens if they fall on the line end but remain invisible within the line.
|
Yes, based on my knowledge 20 years ago. I worked with soft hyphens visually during my time at business school. Additional quotation from Wikipedia:
Furthermore, there is an option in Microsoft Word 2010 (others not yet checked) and LibreOffice Writer 7.6 (others not yet checked) to toggle the on-screen visibility of specific characters like spaces, non-breaking spaces, tabs and soft hyphens. In other words: the end user (or creator) must be able to see them in order to check their correct position, and the end user (or consumer) must also be able not to see them, which makes it easier to visually read a document. And the exact same option should be available in NVDA, as I already pointed out five years ago. |
Here is a codepen (not by me) which illustrates the use of soft hyphens compared to "normal" hyphens: https://codepen.io/InSightGraphics/pen/KKaMEr |
The soft hyphens are completely invisible under most circumstances but enable the word to wrap when space is limited, in which case the hyphen is shown, giving users a visual indication that the word has wrapped at the container boundary. In most cases this would not be useful information for non-visual users. Perhaps if a developer wants to inspect the characters, this could be enabled with an option like other editors' (Word, VS Code) "display special characters" (navigate soft hyphens), but for regular users the words should be read naturally, ignoring the characters completely. A common case for these soft hyphens would be heading text, for example in compound words, so that on a mobile view longer words can be broken at the correct locations (between syllables or component words). CSS hyphens: auto works fairly well in English, but not in other languages. Hyphen positioning is dictionary based and browsers have their own proprietary implementations. For important cases, it's possible to deliberately affect the points at which a word wraps by manually inserting the soft hyphen character (U+00AD), but this is too much to expect from content managers, so using a library like Hypen is usually the best way to ensure breakpoints are applied consistently. Soft hyphens are therefore a useful tool allowing the best possible display of content dynamically and responsively in different circumstances, but the fact that NVDA reads these aloud means they can't actually be used anywhere. |
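A small Python sketch of what "manually inserting" break points amounts to at the character level. The helper names `hyphenate` and `visible_text` are made up for this example. It relies only on the standard `html` module, whose `unescape` turns the HTML entity `&shy;` into U+00AD.

```python
import html

SOFT_HYPHEN = "\u00ad"  # U+00AD, written as &shy; in HTML source


def hyphenate(parts):
    """Join word parts with soft hyphens to mark permitted break points."""
    return SOFT_HYPHEN.join(parts)


def visible_text(text):
    """Return the text as it looks when no line break falls inside a word."""
    return text.replace(SOFT_HYPHEN, "")


word = hyphenate(["Donau", "dampf", "schiff", "fahrt"])
from_markup = html.unescape("Donau&shy;dampf&shy;schiff&shy;fahrt")

# Both routes produce the same string; the extra code points are invisible
# on screen but are still present for a screen reader to stumble over.
print(word == from_markup)
print(visible_text(word), len(word) - len(visible_text(word)))
```

This is why the thread keeps coming back to the same point: the visible word is unchanged, yet the string handed to assistive technology is longer than what sighted users see.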
@LeonarddeR reading the comments above, in this case I think there should be an optional setting in NVDA voice settings called "speak and navigate word wrapping characters" or something like that. This would at least be consistent with braille settings as well, where word wrapping can be turned on and off. |
I totally agree with this. And I would like to add, that this setting should be switched OFF by default. This makes it possible, to use the |
As far as I can see, the only major issue with soft hyphens currently is that they break up words when speaking them. It's pretty evident that they shouldn't. Apart from that, I don't think anything should be done within the scope of this issue. Let's not make it more complex than necessary. |
@LeonarddeR: I think fixing issue #9343 and issue #10634 at once would make more sense, and more work of course. But in the end we will be able to add more characters which are not visible, like the zero-width space (U+200B), but are currently still sent to the TTS and to braille output. See: https://en.wikipedia.org/wiki/Zero-width_space What we need is a list of characters similar to the symbols.dic (tsv file). These characters would be removed directly after the string is received by NVDA and before NVDA sends the string to braille translation and to the speech symbol and word dictionaries. During this process, the total number of removed characters must be counted and their positions must be stored in a temp array to fix braille routing problems. The end user should be able to define which characters should not be sent to speech and/or to braille output by enabling or disabling their checkboxes. They should also be able to add characters to and remove characters from this list of ignored characters, as is already the case with the NVDA GUI for the symbols.dic. Word-by-word navigation with CTRL+ArrowLeft/ArrowRight would be another problem, which in my opinion cannot be fixed by NVDA at all. But if I remember correctly, when you pressed ArrowRight in Microsoft Word, the visible cursor didn't change its visible position when moving across a soft hyphen, as long as the option for visually showing soft hyphens is disabled, which is normally the case. But this behavior may have changed within the last 20 years. So, please check this, as I have used Microsoft Word/LibreOffice Writer less than five times a year since 2011. Therefore my memories regarding this could be incorrect or no longer correct. |
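The filtering step described above could be sketched roughly as follows. This is an illustration under assumptions, not NVDA code: `IGNORED` stands in for the user-editable list, and `filter_ignored` is a hypothetical helper. It removes the listed characters and records, for every remaining character, its index in the original string, which is the information braille routing would need to map a cell back to the right position.

```python
# Hypothetical default list; in the proposal the user could edit it.
IGNORED = frozenset({"\u00ad", "\u200b"})  # soft hyphen, zero-width space


def filter_ignored(text, ignored=IGNORED):
    """Remove ignored characters and keep an offset map back to the original.

    Returns (filtered_text, positions, removed) where positions[i] is the
    index in `text` of filtered_text[i], and removed lists the indexes of
    the characters that were dropped.
    """
    kept = []
    positions = []
    removed = []
    for index, ch in enumerate(text):
        if ch in ignored:
            removed.append(index)
        else:
            kept.append(ch)
            positions.append(index)
    return "".join(kept), positions, removed


filtered, positions, removed = filter_ignored("ap\u00adple\u200b!")
print(filtered, positions, removed, len(removed))
```

Routing from the third character of the filtered text would then look up `positions[2]` to find the matching offset in the original string, so the cursor lands after the soft hyphen rather than on it.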
Please note that #13668 has been closed as a duplicate but contained useful information, more specifically the link provided in #13668 (comment). So I have just checked visually:
Unicode character 173 = 0xAD
This is the soft hyphen used in HTML (
Character 31 = 0x1F
It is a control character that Word calls "soft hyphen" and uses as such, i.e. it is not visible except when it is the last character of a line.
Conclusion
Please do not mix up the two characters. |
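The difference between the two code points can be checked with Python's standard `unicodedata` module. The helper `describe` is made up for this example, and the `"<unnamed>"` placeholder is simply the default passed to `unicodedata.name`, since control characters have no Unicode name.

```python
import unicodedata


def describe(code_point):
    """Return (name, general category) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch)


for code_point in (0x00AD, 0x001F):
    name, category = describe(code_point)
    print(f"U+{code_point:04X}: {name} ({category})")

# A filter written for the HTML soft hyphen does not touch the 0x1F control
# character, so the two cases need to be handled separately.
sample = "co\u00adop\x1ferate"
print(repr(sample.replace("\u00ad", "")))
```

U+00AD is a format character (category Cf) while 0x1F is a control character (category Cc), which matches the conclusion that the two should not be treated as the same thing.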
Steps to reproduce:
Actual behavior:
Soft hyphens are splitting words and causing odd pronunciations.
Expected behavior:
Soft hyphens are ignored.
System configuration
NVDA installed/portable/running from source:
installed
NVDA version:
2018.4.1
Windows version:
Win 7 64 bit
Name and version of other software in use when reproducing the issue:
Firefox 65.0.2
Other information about your system:
Default language is German (but that shouldn't matter, should it?)
Other questions
Does the issue still occur after restarting your PC?
yes
Have you tried any other versions of NVDA?
no
Log
nvda.log