Support for W3C's CSS Speech Module #4242
Reported by mgifford on 2014-07-02 00:20
Which is now part of Drupal 8.
I'd like to see support for http://www.w3.org/TR/css3-speech/ so that we could either insert a pause or change the voice family before the tooltip is read.
Right now, VoiceOver reads it all together and ChromeVox ignores it entirely. There should be some means of conveying aurally that the tooltip is distinct from the text it is describing.
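For illustration, the requested styling might look something like this. This is a hypothetical sketch: no current browser or screen reader applies these CSS Speech properties, and the markup is invented.

```html
<style>
  /* Hypothetical: speak the tooltip after a pause, in a different voice. */
  [role="tooltip"] {
    pause-before: strong;  /* CSS Speech: insert a pause before speaking */
    voice-family: female;  /* CSS Speech: switch to a different voice */
  }
</style>
<button aria-describedby="tip">Save</button>
<span role="tooltip" id="tip">Saves the current draft.</span>
```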
This is probably a lot bigger than NVDA. Does NVDA support the CSS Speech Module?
Comment 2 by jteh on 2014-07-02 06:18
Whether we should even do this is somewhat controversial. A screen reader is a bit different to an interface designed specifically for speech. The intention is to represent all functionality available to a "screen" user, even if, in doing so, the speech might not be as "friendly" as one might expect from a specialised speech interface. Being able to tell a screen reader how numbers should be read or a name should be pronounced might be ideal, though even here, we would hit problems mapping this back to screen position, for example. However, we wouldn't want the content to be made entirely different.
As to this specific case, generally, secondary content such as a tooltip is exposed separately from the primary content; e.g. as the "description" of the accessible element. For example, if you use the @title attribute on a link, the link content will be the link's name and the title will be its description. This way, the two types of content are separated and the screen reader can choose how to handle them. This can be done with ARIA attributes; e.g. aria-labelledby and aria-describedby. I feel this would be the more appropriate way to go here; i.e. expose them separately so that the AT decides how to handle them, rather than the library choosing a specific speech experience. The experience chosen by the library might be completely different from how a given screen reader normally reports tooltips.
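A minimal sketch of the separation Jamie describes (element names and text invented for illustration):

```html
<!-- The link text "Reports" becomes the accessible name;
     the @title becomes the accessible description. -->
<a href="/reports" title="Opens the monthly report archive">Reports</a>

<!-- The same separation made explicit with ARIA relationships. -->
<span id="del-label">Delete</span>
<span id="del-desc">Removes the item permanently.</span>
<button aria-labelledby="del-label" aria-describedby="del-desc">X</button>
```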
I'm leaving this open because it certainly needs further discussion, but it's very low priority at this stage.
Comment 3 by mgifford on 2014-07-02 13:09
I have asked in Firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=47159) and Chrome (https://code.google.com/p/chromium/issues/detail?id=369863&q=css3%20speech&colspec=ID%20Pri%20M%20Iteration%20ReleaseBlock%20Cr%20Status%20Owner%20Summary%20OS%20Modified).
Neither supports it yet: http://css3test.com/
I am sure any of these elements could easily be abused in ways that make content less accessible.
speak-as, pause, rest and cue all seem like they could be quite useful if done properly. But as with the title attribute, it's so easy to get it wrong. I've felt it would be nice to use voice-family consistently with, say, an admin theme, or with administration functions provided by the CMS. If there was support for this, it might provide the same aural cues that we have visually. Are there places where the pros/cons for this have been publicly debated?
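To make that concrete, here is a hypothetical sketch of how those properties might be used. The selectors are invented, and again no browser honours these properties today:

```html
<style>
  /* Give everything in the admin toolbar a consistent, distinct voice. */
  #admin-toolbar { voice-family: male; }

  /* Read order numbers digit by digit instead of as one large number. */
  .order-number { speak-as: digits; }

  /* Play a short earcon before warnings, and rest briefly afterwards. */
  .warning {
    cue-before: url(warning.wav);
    rest-after: weak;
  }
</style>
```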
But yes, on the specific issue of tooltips, my sense is that the @title attribute has been badly abused and confused with alt text in general. My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.
I don't know that there is a "normal" for tooltips. I'm assuming the Open Ajax Alliance and Dojo nightly examples at http://www.w3.org/WAI/PF/aria-practices/#tooltip are still good references.
I'm assuming NVDA supports role="tooltip", and it does feel like a describedby kind of relationship.
Hopefully we can keep this conversation going a bit more.
Comment 4 by jteh (in reply to comment 3) on 2014-07-03 22:50
It's certainly a tricky issue. On the surface, it does seem to make sense that if you can style something visually, you should be able to style it aurally. However, a visual user doesn't require an intermediary tool to present information to them in a primarily linear fashion, so it is a more direct mapping. One problem is that a screen reader might use certain voices for specific purposes, so if something else uses these, it might be very confusing.
> Are there places where the pros/cons for this have been publicly debated?

Not that I know of.
> My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.

That's not really my experience, especially on form fields and links.
Actually, NVDA doesn't really care about the tooltip role here. The key point is that aria-describedby references the tooltip, so the tooltip content becomes the "description" of the element in question. An NVDA user can then query this on demand and it is also reported when the element is focused, just as a sighted user would generally have to mouse over the element (or interact with it in some other way).
@jcsteh's #4242 (comment) provides a series of seemingly compelling arguments about why this issue is extremely difficult to resolve, why it might be controversial to implement in the first place, etc. Keeping that in mind, I would like to kindly invite developers to further the discussion of this support request, for a module I don't believe many NVDA users want to work with in the first place, which requires significant code rewrites according to Jamie, and which poses several other UX/technical challenges. On the surface at least, wontfix or P4 sounds justified.
I'd also like to keep this discussion warm, and argue against closing the issue just yet.
Certainly, the rationale for not implementing CSS 3 Speech support in screen readers is opaque and under-described; even though there may be strong arguments against such an implementation, there are also strong arguments in favour. The debate needs a proper and public airing, so that content developers can easily understand the reasoning. I've not found it easy to find relevant discussions on this subject.
The W3C speech API has barely begun to get out into the wild. I think the wisest course of action is to follow that rollout closely and see whether it can somehow enrich the experience in NVDA and other screen readers. If it still seems like a canard at that point, then by all means close.
FWIW, I've already noticed web developers rushing ahead and implementing 'styled speech' in ways that conflict with WCAG recommendations. If I unilaterally get my website to voice its content (using the Speech API, or just extensive use of pre-recorded HTML5 audio), how will screen readers handle the collision? It might be a rare thing today, but I expect it will become more common as developers attempt to be WCAG compliant. At the very least, this particular issue should not be ignored.
Back to CSS 3 speech: There is (I think) a compelling argument for mapping different semantics onto different 'kinds' of speech. There seems to be a use case for (say) aria-live regions to be distinguished from control labels, and each of those distinguished again from static text content. (etc.) More fine-grained or content-specific semantic differences are easy to imagine.
When I say 'distinguish', I mean that it could be spoken in a different kind of voice (perhaps something as subtle as using the azimuth setting, or as radical as a different gender).
One way this might be done could be to link particular ARIA roles to particular voice settings using CSS 3 Speech properties. Another way might be to offer options to make such mappings in the screen reader preferences, though those are already very complex.
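A hypothetical sketch of the first approach, assuming a browser or screen reader honoured CSS Speech properties on ARIA role selectors (none does today):

```html
<style>
  /* Map broad semantic categories to audibly distinct voices. */
  [aria-live]            { voice-family: female; voice-rate: fast; }
  [role="button"], label { voice-family: male; }
  p                      { voice-family: neutral; }
</style>
```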
I'd like to invite anyone interested to read this article, which breaks down audio into four 'typologies' (essentially, semantic categories). These categories might not be the best fit for general web content, but they could help to form a 'mental model' for how different audio characteristics could be used to denote different semantics.
@sKopheK: the Media Queries 4 spec makes it explicit that screen readers should match the 'screen' media type (and not 'speech'), because they read the screen.
All the screen readers we tested (VoiceOver, JAWS, NVDA, WindowEyes, System Access and Dolphin) match @media screen and @media all, but not @media speech or @media aural.
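In CSS terms, per the Media Queries 4 wording and the test results quoted above:

```html
<style>
  @media screen {
    /* Matched by VoiceOver, JAWS, NVDA, etc.:
       screen readers narrate what is rendered to the screen. */
  }
  @media speech {
    /* Not matched by any of the screen readers tested; intended for
       pure speech renderers such as a voice browser. */
  }
</style>
```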
Thanks for the explanation.
Our live region updates every couple of seconds, and our product is all about training rapid responses (for first aid). Urgency is an intentional part of the experience, but confusing the UI labels with the fictional accident is not.
We just did some user tests and can confirm that in our web app, the babble of aria-live content spoken in a contiguous stream alongside UI accessible names, all announced in the exact same voice, cripples usability. This was with aria-live="polite", by the way, which is supposed to be the least intrusive setting short of pure silence. I had hoped for gaps, at least.
We may have to abandon aria-live altogether and roll our own 'live region', just to get a different voice; a sketch of what that might look like follows.
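For what it's worth, a roll-your-own 'live region' of this kind might use the Web Speech API directly, which is exactly the collision scenario raised earlier in this thread. A minimal sketch; the announce helper and the pitch/rate values are arbitrary:

```html
<!-- Visual fallback; hidden from AT to avoid double announcements. -->
<div id="status" aria-hidden="true"></div>
<script>
  // Bypass aria-live entirely and speak updates ourselves, just to get
  // a voice that differs from the screen reader's own.
  function announce(text) {
    var u = new SpeechSynthesisUtterance(text);
    u.pitch = 1.5; // audibly different from typical screen reader output
    u.rate = 1.2;
    window.speechSynthesis.speak(u);
    document.getElementById('status').textContent = text;
  }
  announce('Casualty is not breathing.');
</script>
```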
We really need to be able to distinguish semantics with different voice settings, whether via CSS, distinct aria-live 'channels', or some other mechanism.
By all means, let it be up to the user what the details of those voice choices are, in much the same way as the user can choose font-family settings for 'serif', 'sans-serif', 'monospace' or 'fantasy' in the browser preferences.
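The analogy in CSS terms: the CSS Speech draft already defines generic voice families (male, female, neutral, optionally aged), which could play the role that generic font families play today:

```html
<style>
  /* Today: the author names a generic font family and the user's
     browser settings pick the concrete font. */
  pre { font-family: "Fira Code", monospace; }

  /* The aural parallel: name a generic voice and let the user's
     screen reader settings choose the concrete voice. */
  [aria-live] { voice-family: young female; }
</style>
```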
Yet another use-case:
Being able to create something like Emacspeak for code, where the semantic meaning of words is translated to a different pitch (e.g. variable names sound different than class names).
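Sketched with the kind of class names a syntax highlighter might emit (the .token-* classes are assumptions, as is any CSS Speech support):

```html
<style>
  /* Emacspeak-style: semantically different tokens get different pitches. */
  .token-variable { voice-pitch: high; }
  .token-class    { voice-pitch: low; }
  .token-comment  { voice-volume: soft; voice-rate: fast; }
</style>
```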
Personally, I think that if a page provides speaking hints specifically for screen readers, they're there to help screen reader users; added with good intentions, and probably not as an afterthought.
Could this kind of support be implemented as an NVDA add-on?
Emphasis could be done using different pitch, volume, delay, etc.
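For instance, again assuming hypothetical CSS Speech support (the .alert class is invented):

```html
<style>
  em     { voice-stress: moderate; }  /* emphasised prosody/pitch */
  strong { voice-volume: loud; }      /* volume */
  .alert { pause-before: strong; }    /* a delay before speaking */
</style>
```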
Just reading through this now. Given I'm not familiar with the background, hopefully I haven't totally misunderstood the point; apologies if so.
The reasoning given in support of this issue seems mostly to be about allowing web developers control over how differing semantics are presented to the user. I'd argue this is the wrong place to map the presentation of semantics. The likely outcome would be different websites providing conflicting, or at least inconsistent, presentations of semantics, which will only be more confusing for the user. It also ensures inconsistency with desktop applications. I strongly think this mapping should be done by the screen reader, ideally configurable by the user to account for any specific preferences or needs they may have. The experiment with aria-live is an interesting one, and likely something we could resolve within NVDA.
I can imagine use cases for entertainment-type applications, e.g. ebooks, games, or similar. However, to reduce cognitive load and to meet the preferences and needs of the user, the screen reader should provide a consistent experience for consuming information and interacting with applications (web or otherwise).