
Support for W3C's CSS Speech Module #4242

Open
nvaccessAuto opened this issue Jul 2, 2014 · 17 comments
@nvaccessAuto commented Jul 2, 2014

Reported by mgifford on 2014-07-02 00:20
I'm trying to see if there is a way to improve the accessibility of http://kushagragour.in/lab/hint/, which is now part of Drupal 8.

I'd like to see support for http://www.w3.org/TR/css3-speech/, so that we could either insert a pause or change the voice family before the tooltip is used.
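If the module were supported, that might look something like the following sketch, using properties from the css3-speech draft. The selector assumes a tooltip rendered from a `data-hint` attribute, as hint.css does; the values are illustrative, and none of this is honoured by current browsers or screen readers.

```css
/* Illustrative only: set the tooltip apart aurally from the text it describes. */
[data-hint]::after {
  pause-before: medium;          /* pause before the tooltip is spoken */
  voice-family: female;          /* switch voice for the tooltip content */
  cue-before: url(ping.wav);     /* optional audio cue before speaking */
}
```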

Right now in VoiceOver it is all read together. In ChromeVox it gets ignored. However, there should be some means to convey that the tooltip is distinct aurally from the text it is describing.

This is probably a lot bigger than NVDA. Does NVDA support the CSS Speech Module?

@nvaccessAuto commented Jul 2, 2014

Comment 2 by jteh on 2014-07-02 06:18
To directly answer your question, no, the CSS speech module is not supported. This would need significant work in all existing browsers and screen readers and may even require additions to current accessibility APIs. This is not likely to happen any time soon.

Whether we should even do this is somewhat controversial. A screen reader is a bit different to an interface designed specifically for speech. The intention is to represent all functionality available to a "screen" user, even if, in doing so, the speech might not be as "friendly" as one might expect from a specialised speech interface. Being able to tell a screen reader how numbers should be read or a name should be pronounced might be ideal, though even here, we would hit problems mapping this back to screen position, for example. However, we wouldn't want the content to be made entirely different.

As to this specific case, generally, secondary content such as a tooltip is exposed separately from the primary content; e.g. as the "description" of the accessible element. For example, if you use the @title attribute on a link, the link content will be the link's name and the title will be its description. This way, the two types of content are separated and the screen reader can choose how to handle them. This can be done with ARIA attributes; e.g. aria-labelledby and aria-describedby. I feel this would be the more appropriate way to go here; i.e. expose them separately so that the AT decides how to handle them, rather than the library choosing a specific speech experience. The experience chosen by the library might be completely different from how a given screen reader normally reports tooltips.
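As a sketch of that separation (the markup here is illustrative):

```html
<!-- The link's content becomes its accessible name;
     the title becomes its description. -->
<a href="/help" title="Opens the help centre">Help</a>

<!-- The same separation with ARIA: the tooltip element is referenced
     as a description, so the screen reader decides how and when to
     report it, rather than the page dictating a speech experience. -->
<button aria-describedby="save-tip">Save</button>
<div role="tooltip" id="save-tip">Saves the draft without publishing</div>
```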

I'm leaving this open because it certainly needs further discussion, but it's very low priority at this stage.

@nvaccessAuto commented Jul 2, 2014

Comment 3 by mgifford on 2014-07-02 13:09
Very interesting! Thanks for taking the time to detail this.

I have asked in Firefox https://bugzilla.mozilla.org/show_bug.cgi?id=47159 and Chrome https://code.google.com/p/chromium/issues/detail?id=369863&q=css3%20speech&colspec=ID%20Pri%20M%20Iteration%20ReleaseBlock%20Cr%20Status%20Owner%20Summary%20OS%20Modified

But neither supports it yet: http://css3test.com/

I am sure that any of these elements could be easily abused in a way that makes it less accessible.

speak-as, pause, rest and cue all seem like they could be quite useful if done properly. But as with the title attribute, it's so easy to get it wrong. I've felt it would be nice to use voice-family consistently with, say, an admin theme, or perhaps administration functions provided by the CMS. If there were support for this, it might provide the same aural cues that we have visually. Are there places where the pros/cons of this have been publicly debated?

But yes, on the specific issue of tooltips, my sense is that the @title attribute has been badly abused and confused with alt text in general. My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.

I don't know that there is a "normal" for tooltips. I'm assuming the Open Ajax Alliance & Dojo nightly examples are still great ones: http://www.w3.org/WAI/PF/aria-practices/#tooltip

I'm assuming NVDA supports the role="tooltip" and it does really feel like a describedby type of event.

Hopefully we can keep this conversation going a bit more.

@nvaccessAuto commented Jul 3, 2014

Comment 4 by jteh (in reply to comment 3) on 2014-07-03 22:50
Replying to mgifford:

I've felt that it would be nice to use the voice-family consistently with say an admin theme or perhaps administration functions provided by the CMS. If there was support for this, it might provide the same aural cues that we have visually.

It's certainly a tricky issue. On the surface, it does seem to make sense that if you can style something visually, you should be able to style it aurally. However, a visual user doesn't require an intermediary tool to present information to them in a primarily linear fashion, so it is a more direct mapping. One problem is that a screen reader might use certain voices for specific purposes, so if something else uses these, it might be very confusing.

Are there places where the pros/cons for this have been publicly debated?

Not that I know of.

But yes, on the specific issue of tooltips, my sense is that the @title attribute has been badly abused and confused with alt text in general. My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.

That's not really my experience, especially on form fields and links.

I'm assuming NVDA supports the role="tooltip" and it does really feel like a describedby type of event.

Actually, NVDA doesn't really care about the tooltip role here. The key point is that aria-describedby references the tooltip, so the tooltip content becomes the "description" of the element in question. An NVDA user can then query this on demand and it is also reported when the element is focused, just as a sighted user would generally have to mouse over the element (or interact with it in some other way).

@bhavyashah commented Sep 13, 2017

@jcsteh's #4242 (comment) provides a series of seemingly compelling arguments about why this issue is extremely difficult to resolve, why it might be controversial to implement in the first place, etc. Keeping that in mind, I would kindly invite developers to further the discussion of this support request, which concerns a module I don't believe many NVDA users desire to work with in the first place, which requires significant code rewrites according to Jamie, and which poses several other UX/technical challenges. On the surface at least, wontfix or P4 sounds justified.

@sKopheK commented Apr 6, 2018

Any chance of supporting "@media speech" at least? It seems to be totally ignored by NVDA :/

@brennanyoung commented May 16, 2018

I'd also like to keep this discussion warm, and argue against closing the issue just yet.

Certainly, the rationale for not implementing CSS 3 Speech support in screen readers is opaque and under-described, and even though there may be strong arguments against such an implementation, there are also strong arguments in favour. The debate needs a proper and public airing, so that content developers can easily understand the reasoning. I've not found it easy to find relevant discussions on this subject.

The w3c speech API has barely begun to get out there in the wild. I think the wisest course of action is to follow that rollout closely, and see whether it can somehow enrich the experience in NVDA and other screenreaders. If it still seems like a canard at that point, then by all means close.

FWIW, I've already noticed web developers rushing ahead and implementing 'styled speech' in ways that conflict with WCAG recommendations. If I unilaterally get my website to voice its content (using the speech api or just extensive use of pre-recorded html5 audio), how will screenreaders handle the collision? It might be a rare thing today, but I expect it will be more common in the future as developers attempt to be WCAG compliant. At the very least, this particular issue should not be ignored.

Back to CSS 3 speech: There is (I think) a compelling argument for mapping different semantics onto different 'kinds' of speech. There seems to be a use case for (say) aria-live regions to be distinguished from control labels, and each of those distinguished again from static text content. (etc.) More fine-grained or content-specific semantic differences are easy to imagine.

When I say 'distinguish', I mean that it could be spoken in a different kind of voice (perhaps something as subtle as using the azimuth setting, or as radical as a different gender).

One way this might be done could be to link particular aria roles to particular voice settings using css 3 speech properties. Another way might be to offer options to make such mappings in the screenreader preferences, though they are already very complex.

I'd like to invite anyone interested to read this article, which breaks down audio into four 'typologies' (essentially, semantic categories). These categories might not be the best fit for general web content, but they could help to form a 'mental model' for how different audio characteristics could be used to denote different semantics.

@dd8 commented May 22, 2018

any chance to support "@media speech" at least? seems to be totally ignored by NVDA :/

@sKopheK the Media Queries 4 spec makes it explicit that screen readers should match the 'screen' media type (and not 'speech') because they read the screen
https://drafts.csswg.org/mediaqueries-4/#media-types

All the screen readers we tested (VoiceOver, JAWS, NVDA, WindowEyes, System Access and Dolphin) match @media screen and @media all, but not @media speech or @media aural
https://www.powermapper.com/tests/screen-readers/content/media-query-speech/
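In practice that means a rule like the following (illustrative; `speak` is from the CSS Speech draft) never reaches a screen reader, while the `screen` block does apply in the screen reader's host browser:

```css
/* Per Media Queries 4, screen readers match "screen", not "speech",
   so this block is ignored by all the screen readers tested above. */
@media speech {
  .icon::before { speak: never; }
}

/* This block IS applied, and its generated content may be spoken. */
@media screen {
  .icon::before { content: "\e001"; }
}
```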

@sKopheK commented May 22, 2018

Thanks for the explanation.
We ran into this issue when trying to prevent screen readers from reading icons rendered with web fonts (via the content property in CSS). Using an aria-hidden attribute on a separate tag would help, but it's a lot of unnecessary HTML for something that usually has only visual meaning.

@dd8 commented May 22, 2018

@sKopheK There is a way to provide alternative text for content: in CSS - but I don't know how well supported it is https://www.w3.org/TR/css-content-3/#accessibility
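The alternative-text syntax from that draft looks like this (illustrative class names; support was patchy at the time, so test before relying on it):

```css
/* The string after the slash is the alternative text exposed to
   accessibility APIs; an empty string hides the icon from speech. */
.icon-star::before {
  content: "\e001" / "";          /* decorative: no speech output */
}
.icon-warning::before {
  content: "\e002" / "Warning";   /* meaningful: spoken as "Warning" */
}
```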

@brennanyoung commented May 24, 2018

Our live region updates every couple of seconds, and our product is all about training rapid responses (for first aid). Urgency is an intentional part of the experience, but confusing the UI labels with the fictional accident is not.

We just did some user tests, and can confirm that in our web-app, users find that the babble of aria-live updates, spoken in a contiguous stream alongside UI accessible names in the exact same voice, cripples usability. This was with aria-live="polite", by the way, which is supposed to be the least pushy option short of pure silence. I hoped for gaps, at least.

We may have to abandon aria-live altogether and roll our own 'live region', just to get a different voice.

We really need to be able to distinguish semantics with different voice settings, whether that be with CSS, with distinct aria-live 'channels', or via some other mechanism.

By all means, let it be up to the user what the details of those voice choices are, in much the same way as the user can choose font-family settings for 'serif', 'sans-serif', 'monospace' or 'fantasy' in the browser preferences.

@brennanyoung commented May 29, 2018

Just found this, which states:

Ideas for Settings and Heuristics
Allow for a different voice (in text-to-speech) or other varying presentational characteristics to set live changes apart.

@Adriani90 commented Jan 2, 2019

@derekriemer, @jcsteh, @michaelDCurran, @feerrenrut your thoughts would be very appreciated.

@josephsl commented Jan 2, 2019

Also @MarcoZehe and anyone from Microsoft as well.

@oferb commented May 1, 2020

Yet another use-case:

Being able to create something like Emacspeak for code, where the semantic meaning of words is mapped to a different pitch (e.g. variable names sound different than class names). This is similar to syntax highlighting for sighted developers.

https://en.m.wikipedia.org/wiki/Emacspeak

I personally think that if speech hints are written specifically for screen readers, they're there to help screen reader users, with good intentions and probably not as an afterthought.
Why not give developers the option to provide richer experiences?

@oferb commented May 19, 2020

Could this kind of support be implemented as an NVDA add-on?
For example, having NVDA read the following while emphasizing "lazy":
"The quick brown fox jumps over the lazy dog"

Emphasis could be done using different pitch, volume, delay etc.
This would be similar to how people would actually say the sentence when reading it out loud.
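As a thought experiment, the core of such an add-on would be a mapping from marked-up words to speech commands. This is not NVDA's actual add-on API; `PitchCommand` below is a stand-in dataclass for the kind of command a real add-on would emit into the speech sequence.

```python
# Stand-alone sketch: wrap emphasized words in pitch changes.
# PitchCommand is a hypothetical stand-in, not NVDA's real API.
from dataclasses import dataclass

@dataclass
class PitchCommand:
    offset: int  # percentage offset from the synthesizer's base pitch

def build_speech_sequence(text, emphasized):
    """Split text into words; raise the pitch around emphasized words."""
    sequence = []
    for word in text.split():
        if word.strip(".,!?").lower() in emphasized:
            sequence += [PitchCommand(20), word, PitchCommand(0)]
        else:
            sequence.append(word)
    return sequence

seq = build_speech_sequence(
    "The quick brown fox jumps over the lazy dog", {"lazy"}
)
```

The same shape would work for volume or rate changes, or for inserting short breaks before the emphasized word.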

@feerrenrut commented Jun 12, 2020

Just reading through this now; given I'm not familiar with the background of this, hopefully I haven't totally misunderstood the point. Apologies if so.

The reasoning given in support on this issue seems mostly to be about allowing web developers control over how differing semantics are presented to the user. I argue this is the wrong place to map the presentation of semantics. The likely outcome would be different websites providing conflicting, or at least inconsistent, presentations of semantics. This will only be more confusing for the user. It also ensures inconsistency with desktop applications. I strongly think this mapping should be done by the screen reader. Ideally it would be configurable by the user to account for any specific preferences or needs they may have. The experiment with aria-live is an interesting one, and likely something we could resolve within NVDA.

I can imagine use-cases for entertainment type applications, eg ebooks, games, or similar. However, to reduce cognitive load, and meet the preferences and needs of the user, the screen reader should provide a consistent experience for consuming information and interacting with applications (web or otherwise).

@oferb commented Jun 12, 2020

Cool, so what do you have in mind for this mapping that is done by the screen reader?
