Support for W3C's CSS Speech Module #4242
Reported by mgifford on 2014-07-02 00:20
Which is now part of Drupal 8.
I'd like to see support for http://www.w3.org/TR/css3-speech/ so that we could either insert a pause or change the voice family before the tooltip is read.
Right now, VoiceOver reads it all together and ChromeVox ignores it entirely. There should be some means of conveying aurally that the tooltip is distinct from the text it is describing.
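For illustration, the requested styling might look something like this. This is a hypothetical sketch: no current browser or screen reader applies these CSS Speech properties, and the markup is invented.

```html
<style>
  /* Hypothetical: speak the tooltip after a pause, in a different voice. */
  [role="tooltip"] {
    pause-before: strong;  /* CSS Speech: insert a pause before speaking */
    voice-family: female;  /* CSS Speech: switch to a different voice */
  }
</style>
<button aria-describedby="tip">Save</button>
<span role="tooltip" id="tip">Saves the current draft.</span>
```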
This is probably a lot bigger than NVDA. Does NVDA support the CSS Speech Module?
Comment 2 by jteh on 2014-07-02 06:18
Whether we should even do this is somewhat controversial. A screen reader is a bit different to an interface designed specifically for speech. The intention is to represent all functionality available to a "screen" user, even if, in doing so, the speech might not be as "friendly" as one might expect from a specialised speech interface. Being able to tell a screen reader how numbers should be read or a name should be pronounced might be ideal, though even here, we would hit problems mapping this back to screen position, for example. However, we wouldn't want the content to be made entirely different.
As to this specific case, generally, secondary content such as a tooltip is exposed separately from the primary content; e.g. as the "description" of the accessible element. For example, if you use the @title attribute on a link, the link content will be the link's name and the title will be its description. This way, the two types of content are separated and the screen reader can choose how to handle them. This can be done with ARIA attributes; e.g. aria-labelledby and aria-describedby. I feel this would be the more appropriate way to go here; i.e. expose them separately so that the AT decides how to handle them, rather than the library choosing a specific speech experience. The experience chosen by the library might be completely different from how a given screen reader normally reports tooltips.
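A minimal sketch of the separation Jamie describes (element names and text invented for illustration):

```html
<!-- The link text "Reports" becomes the accessible name;
     the @title becomes the accessible description. -->
<a href="/reports" title="Opens the monthly report archive">Reports</a>

<!-- The same separation made explicit with ARIA relationships. -->
<span id="del-label">Delete</span>
<span id="del-desc">Removes the item permanently.</span>
<button aria-labelledby="del-label" aria-describedby="del-desc">X</button>
```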
I'm leaving this open because it certainly needs further discussion, but it's very low priority at this stage.
Comment 3 by mgifford on 2014-07-02 13:09
I have asked in Firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=47159) and Chrome (https://code.google.com/p/chromium/issues/detail?id=369863&q=css3%20speech&colspec=ID%20Pri%20M%20Iteration%20ReleaseBlock%20Cr%20Status%20Owner%20Summary%20OS%20Modified).
Neither supports it yet: http://css3test.com/
I am sure any of these elements could easily be abused in ways that make content less accessible.
speak-as, pause, rest and cue all seem like they could be quite useful if done properly. But as with the title attribute, it's so easy to get it wrong. I've felt it would be nice to use voice-family consistently with, say, an admin theme, or with administration functions provided by the CMS. If there was support for this, it might provide the same aural cues that we have visually. Are there places where the pros/cons for this have been publicly debated?
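To make that concrete, here is a hypothetical sketch of how those properties might be used. The selectors are invented, and again no browser honours these properties today:

```html
<style>
  /* Give everything in the admin toolbar a consistent, distinct voice. */
  #admin-toolbar { voice-family: male; }

  /* Read order numbers digit by digit instead of as one large number. */
  .order-number { speak-as: digits; }

  /* Play a short earcon before warnings, and rest briefly afterwards. */
  .warning {
    cue-before: url(warning.wav);
    rest-after: weak;
  }
</style>
```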
But yes, on the specific issue of tooltips, my sense is that the @title attribute has been badly abused and confused with alt text in general. My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.
I don't know that there is a "normal" for tooltips. I'm assuming the Open Ajax Alliance and Dojo nightly examples at http://www.w3.org/WAI/PF/aria-practices/#tooltip are still good references.
I'm assuming NVDA supports role="tooltip", and it does feel like a describedby kind of relationship.
Hopefully we can keep this conversation going a bit more.
Comment 4 by jteh (in reply to comment 3) on 2014-07-03 22:50
It's certainly a tricky issue. On the surface, it does seem to make sense that if you can style something visually, you should be able to style it aurally. However, a visual user doesn't require an intermediary tool to present information to them in a primarily linear fashion, so it is a more direct mapping. One problem is that a screen reader might use certain voices for specific purposes, so if something else uses these, it might be very confusing.
> Are there places where the pros/cons for this have been publicly debated?

Not that I know of.
> My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.

That's not really my experience, especially on form fields and links.
Actually, NVDA doesn't really care about the tooltip role here. The key point is that aria-describedby references the tooltip, so the tooltip content becomes the "description" of the element in question. An NVDA user can then query this on demand and it is also reported when the element is focused, just as a sighted user would generally have to mouse over the element (or interact with it in some other way).
@jcsteh's #4242 (comment) provides a series of seemingly compelling arguments about why this issue is extremely difficult to resolve, why it might be controversial to implement in the first place, etc. Keeping that in mind, I would like to kindly invite developers to further the discussion of this support request, for a module I don't believe many NVDA users want to work with in the first place, which requires significant code rewrites according to Jamie, and which poses several other UX/technical challenges. On the surface at least, wontfix or P4 sounds justified.
I'd also like to keep this discussion warm, and argue against closing the issue just yet.
Certainly, the rationale for not implementing CSS 3 Speech support in screen readers is opaque and under-described; even though there may be strong arguments against such an implementation, there are also strong arguments in favour. The debate needs a proper and public airing, so that content developers can easily understand the reasoning. I've not found it easy to find relevant discussions on this subject.
The W3C speech API has barely begun to get out into the wild. I think the wisest course of action is to follow that rollout closely and see whether it can somehow enrich the experience in NVDA and other screen readers. If it still seems like a canard at that point, then by all means close.
FWIW, I've already noticed web developers rushing ahead and implementing 'styled speech' in ways that conflict with WCAG recommendations. If I unilaterally get my website to voice its content (using the Speech API, or just extensive use of pre-recorded HTML5 audio), how will screen readers handle the collision? It might be a rare thing today, but I expect it will become more common as developers attempt to be WCAG compliant. At the very least, this particular issue should not be ignored.
Back to CSS 3 speech: There is (I think) a compelling argument for mapping different semantics onto different 'kinds' of speech. There seems to be a use case for (say) aria-live regions to be distinguished from control labels, and each of those distinguished again from static text content. (etc.) More fine-grained or content-specific semantic differences are easy to imagine.
When I say 'distinguish', I mean that it could be spoken in a different kind of voice (perhaps something as subtle as using the azimuth setting, or as radical as a different gender).
One way this might be done could be to link particular ARIA roles to particular voice settings using CSS 3 Speech properties. Another way might be to offer options to make such mappings in the screen reader preferences, though those are already very complex.
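A hypothetical sketch of the first approach, assuming a browser or screen reader honoured CSS Speech properties on ARIA role selectors (none does today):

```html
<style>
  /* Map broad semantic categories to audibly distinct voices. */
  [aria-live]            { voice-family: female; voice-rate: fast; }
  [role="button"], label { voice-family: male; }
  p                      { voice-family: neutral; }
</style>
```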
I'd like to invite anyone interested to read this article, which breaks down audio into four 'typologies' (essentially, semantic categories). These categories might not be the best fit for general web content, but they could help to form a 'mental model' for how different audio characteristics could be used to denote different semantics.
@sKopheK: the Media Queries 4 spec makes it explicit that screen readers should match the 'screen' media type (and not 'speech'), because they read the screen.
All the screen readers we tested (VoiceOver, JAWS, NVDA, WindowEyes, System Access and Dolphin) match @media screen and @media all, but not @media speech or @media aural.
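In CSS terms, per the Media Queries 4 wording and the test results quoted above:

```html
<style>
  @media screen {
    /* Matched by VoiceOver, JAWS, NVDA, etc.:
       screen readers narrate what is rendered to the screen. */
  }
  @media speech {
    /* Not matched by any of the screen readers tested; intended for
       pure speech renderers such as a voice browser. */
  }
</style>
```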
Thanks for the explanation.
Our live region updates every couple of seconds, and our product is all about training rapid responses (for first aid). Urgency is an intentional part of the experience, but confusing the UI labels with the fictional accident is not.
We just did some user tests and can confirm that in our web app, the babble of aria-live content spoken in a contiguous stream alongside UI accessible names, all announced in the exact same voice, cripples usability. This was with aria-live="polite", by the way, which is supposed to be the least intrusive setting short of pure silence. I had hoped for gaps, at least.
We may have to abandon aria-live altogether and roll our own 'live region', just to get a different voice; a sketch of what that might look like follows.
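For what it's worth, a roll-your-own 'live region' of this kind might use the Web Speech API directly, which is exactly the collision scenario raised earlier in this thread. A minimal sketch; the announce helper and the pitch/rate values are arbitrary:

```html
<!-- Visual fallback; hidden from AT to avoid double announcements. -->
<div id="status" aria-hidden="true"></div>
<script>
  // Bypass aria-live entirely and speak updates ourselves, just to get
  // a voice that differs from the screen reader's own.
  function announce(text) {
    var u = new SpeechSynthesisUtterance(text);
    u.pitch = 1.5; // audibly different from typical screen reader output
    u.rate = 1.2;
    window.speechSynthesis.speak(u);
    document.getElementById('status').textContent = text;
  }
  announce('Casualty is not breathing.');
</script>
```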
We really need to be able to distinguish semantics with different voice settings, whether via CSS, distinct aria-live 'channels', or some other mechanism.
By all means, let it be up to the user what the details of those voice choices are, in much the same way as the user can choose font-family settings for 'serif', 'sans-serif', 'monospace' or 'fantasy' in the browser preferences.
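The analogy in CSS terms: the CSS Speech draft already defines generic voice families (male, female, neutral, optionally aged), which could play the role that generic font families play today:

```html
<style>
  /* Today: the author names a generic font family and the user's
     browser settings pick the concrete font. */
  pre { font-family: "Fira Code", monospace; }

  /* The aural parallel: name a generic voice and let the user's
     screen reader settings choose the concrete voice. */
  [aria-live] { voice-family: young female; }
</style>
```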
Yet another use-case:
Being able to create something like Emacspeak for code, where the semantic meaning of words is translated to a different pitch (e.g. variable names sound different than class names).
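Sketched with the kind of class names a syntax highlighter might emit (the .token-* classes are assumptions, as is any CSS Speech support):

```html
<style>
  /* Emacspeak-style: semantically different tokens get different pitches. */
  .token-variable { voice-pitch: high; }
  .token-class    { voice-pitch: low; }
  .token-comment  { voice-volume: soft; voice-rate: fast; }
</style>
```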
Personally, I think that if a page provides speaking hints specifically for screen readers, they're there to help screen reader users; added with good intentions, and probably not as an afterthought.
Could this kind of support be implemented as an NVDA add-on?
Emphasis could be done using different pitch, volume, delay, etc.
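For instance, again assuming hypothetical CSS Speech support (the .alert class is invented):

```html
<style>
  em     { voice-stress: moderate; }  /* emphasised prosody/pitch */
  strong { voice-volume: loud; }      /* volume */
  .alert { pause-before: strong; }    /* a delay before speaking */
</style>
```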
Just reading through this now. Given I'm not familiar with the background, hopefully I haven't totally misunderstood the point; apologies if so.
The reasoning given in support of this issue seems mostly to be about allowing web developers control over how differing semantics are presented to the user. I'd argue this is the wrong place to map the presentation of semantics. The likely outcome would be different websites providing conflicting, or at least inconsistent, presentations of semantics, which will only be more confusing for the user. It also ensures inconsistency with desktop applications. I strongly think this mapping should be done by the screen reader, ideally configurable by the user to account for any specific preferences or needs they may have. The experiment with aria-live is an interesting one, and likely something we could resolve within NVDA.
I can imagine use cases for entertainment-type applications, e.g. ebooks, games, or similar. However, to reduce cognitive load and to meet the preferences and needs of the user, the screen reader should provide a consistent experience for consuming information and interacting with applications (web or otherwise).