Accessibility Shortcuts Portal Proposal (org.freedesktop.portal.AT.Shortcuts) #1046
As an aside, I expect many more accessibility features will want a portal implementation at some point. For example, if an application already has permission to be an assistive technology, it may want to zoom in, or move the user's mouse around (not read the position; I'm aware of the mouse position idea). Both are things that have traditionally at least been triggerable by the screen reader, even if the screen reader itself does not provide the functionality.
And, when talking about a screen reader portal, we might allow the screen reader to request a mouse event for a window at particular window-relative coordinates; unfortunately you need that operation for some broken websites.
Mouse events at particular coordinates are best handled by AT-SPI, not portals.
A counter-proposal for this, which makes it harder for applications to "sneakily" make themselves accessibility technology by applying bad design: instead of "requesting accessibility", have the accessibility utility advertise via its desktop file that it implements a certain aspect of accessibility. With that, require Settings (or the equivalent in e.g. KDE) to make these discoverable and possible to explicitly enable. Instead of nagging the user with "please let me eavesdrop on you", an accessibility application would issue a request via the portal to open the accessibility settings, instructing the user to actively select $APP as the "screen reader", for example. This is not new; it is how some permissions are dealt with in Android.
Zooming in can only practically be implemented by the compositor itself. In GNOME this is possible via the a11y menu, but it could perhaps be something that can be triggered via a potential accessibility portal API.
This shouldn't require making a screen reader your default to have it usable. I don't mind a combination of these features where:
Fair. And I brought it up, but let's leave this for another portal.
Correct me if I'm wrong, but I assume one would only have one screen reader active at any point in time, as otherwise I imagine it'd be quite a noisy experience. So if you have a default bundled screen reader, it'd already be selected, and thus wouldn't need anything to work. The only time you need to change that setting is if you have another screen reader you want to switch to. If you have non-screen-reader-like assistive technology that needs the same level of "eavesdropping", then that would need to go into that Settings panel, but installing such a thing would mean it would be discoverable and possible to enable.

There is a general problem with "nag"-like permissions, which is that the user sooner or later "gives up" or doesn't care when faced with a yes/no permission. In other words, they don't really help. Portals are, when possible, designed to avoid this issue: for access to files, a file selector is always presented; for sharing the screen, one has to select the screen to share; and so on. For something as problematic as getting an introspected view of everything going on on the computer, we should try hard to avoid ending up with a yes/no-style dialog, and taking inspiration from Android, making it explicit configuration the user is asked to do seems like a good way to mitigate things.
Agreed... but this makes my situation (where I switch back and forth between two screen readers for testing) an absolute nightmare. I agree that from a normal user's perspective this makes sense, though. EDIT: As long as it is possible for a distribution to ship with a default screen reader, and then automatically run that without user interaction, that's fine by me. I'm sure that as a dev, I can find a script that'll just swap this setting for me.
A rather peculiar use case :) but I imagine this could be scriptable in one way or the other in most DEs, e.g. a gsetting in GNOME.
This is critical, yes, but doable with the solution I'm suggesting, I believe.
I think this is really important. Android's accessibility portal-equivalent literally does get used by malware (for example), precisely because the interface is so powerful.
The first question is: are being aware of input events and emitting input events (shortcuts) valid reasons to say for sure that an application has accessibility features? If not (which is the case here), then an app certainly can't request accessibility access (whether through a dialog or by opening the accessibility settings); this would be a lie. (If relevant) The second question is: is it okay to let an app potentially control the system or other apps, even if asked? Example: accessibility can send auto-generated shortcuts within apps. I don't think so.
This is an attempt to summarize a rough proposal that was discussed on the GNOME a11y Matrix room yesterday:

Assistive Technology

Assistive technologies (ATs) are rather special when it comes to the type and breadth of access they need to a user's system. They need to be able to read out loud which widget is focused in which window, and which letter is entered in which text field. From a privacy and sandbox perspective, the needs of ATs are very problematic, as for all practical purposes they need to perform total surveillance of everything the user "sees" and does. It would be disastrous if a rogue application got the same level of access that an AT gets, but at the same time, people may want to install additional ATs or replace existing ones to help them use the computer.

So, in one way or the other, if we want ATs to be distributable in a safe and relatively sandbox-friendly way (e.g. Flatpak), we need a portal that can handle access to the resources the system has to make available for the AT to work. At the same time, we need to be very careful about exactly how a user can install and use an AT, without accidentally enabling malware to get the same level of access to resources that regular applications shouldn't have. It also needs to be easy enough, and discoverable, how to e.g. switch to another screen reader or add additional ATs.

Access types

Initially, two types of access have high priority and are critical to ATs, so they are focused upon first.

Priority keyboard shortcuts

Previously, on X11, this was implemented by grabbing key events from the X server, but doing so is problematic, and is seen as something very undesirable in Wayland compositors, as having to round-trip key events through one or more ATs is very problematic.

Instead, a solution to this is to provide something similar to the global shortcuts portal. As with the global shortcuts portal, the display server would translate a stream of key events into triggered shortcuts that the AT would then be signalled about. For this, the shortcuts xdg-spec might need to be expanded to handle the use cases needed by ATs. This would avoid any display-server/AT round trips, but still allow shortcuts for ATs to have priority over other shortcuts on the system.

Access to the accessibility bus

The accessibility bus is a dedicated D-Bus bus where applications describe what the user is currently interacting with. Access to this is the most problematic, as it allows the application to fully introspect what is going on on the computer, including reading any text, reading everything the user types, etc. I'll leave out the details of how to practically open such a bus, but in one way or the other, e.g. by opening file descriptors, it could be done with API on an accessibility portal.

Handling access granting requirements

As mentioned, we must try hard to avoid rogue applications that want to trick users into letting them spy on them, but we also need to make it possible for distributions to pre-install a screen reader that has access without needing to query the user, as without said screen reader, the user wouldn't be able to interact with the computer at all. What this means in practice are these things:

Access handling proposal

Make giving access to an AT an explicit operation similar to other system configuration, not something directly requested by the AT application itself. The way this would work is by making it possible to discover, switch and add ATs via the accessibility panel of the settings application used in the desktop environment.

Discovery

Discovery would be handled by AT applications adding a field to their desktop file. The desktop file and the new field would have no use other than helping with discovery. The primary way, after having installed a new AT, would be to go to the accessibility panel of the settings app and switch to or enable the newly installed AT.

Assisted discovery

It might be desirable to allow a window used for e.g. configuring an AT to assist the user with finding the Accessibility panel in the Settings app. This could work by, for example, having a portal method for opening that panel. Note that, in theory, it would be possible for portal backends to implement "give me permission" dialogs with such a method call, but the advice would be not to, given the reasons listed earlier.

Granularity

Having granular access control might be desirable, and doing so is not necessarily more complicated. A DE might want to simplify things and e.g. give access to both unrestricted keyboard shortcuts and the accessibility bus with a single option, while others might want to give more granular control up front. Manipulating the permission store via third-party applications (e.g. Flatseal) would be possible if the permission store is used in a portable manner.

Sane defaults

Distributions should be able to pre-install a screen reader and make it usable without any user interaction. With this model, this would be achievable by making sure distributions can pre-populate e.g. the permission store with any installed screen reader application, while ensuring it is launched in a way that makes the portal correctly identify its app ID. With the permission store set up for each new user, it should not matter whether the screen reader is pre-installed as a Flatpak, via a traditional package, or as part of an immutable OS image, as long as it is launched correctly.

Development & testing experience

A concern raised with a method like this was the development experience when developing e.g. a screen reader, or when testing different ones often; having to interact with Settings in a very manual way can be quite annoying if one has to do it often. This can be mitigated by making sure changing permissions is possible via scripts. If permission handling is done using the permission store, this should be relatively simple. Improved xdg-desktop-portal documentation about how to run an executable from the command line so that portals correctly identify the app ID would also help, and developers would not need to do much more than just run the executable.

Edit: added part about distribution default.
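To make the discovery idea concrete, here is a minimal sketch of what such a desktop file might look like. The key name `X-Accessibility-Features` and its value are assumptions for illustration only; the actual field would have to be defined by the spec.

```ini
# Hypothetical .desktop file for a sandboxed screen reader.
# X-Accessibility-Features is an invented key name, used here only
# to illustrate the discovery mechanism described above.
[Desktop Entry]
Type=Application
Name=Example Screen Reader
Exec=example-screen-reader
# Advertises which AT role(s) this app can fill, so the Settings
# app can list it in its Accessibility panel:
X-Accessibility-Features=screen-reader
```

The field carries no permission by itself; it only lets the Settings app discover the application so the user can explicitly enable it there.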
Thank you, @jadahl! This is a great encapsulation of what we discussed. I'm going to create a simple set of calls for the portal here, and see if there are further comments: Methods

Signals

And finally, the standard properties. Should this be a PR at this point? Continue the discussion there?
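As a rough illustration of what such a set of calls could look like, here is a hypothetical D-Bus interface sketch loosely modeled on the existing org.freedesktop.portal.GlobalShortcuts portal. Every method, signal, and argument name here is an assumption, not the final API:

```xml
<!-- Hypothetical sketch only; names and signatures are assumptions
     modeled on org.freedesktop.portal.GlobalShortcuts. -->
<interface name="org.freedesktop.portal.AT.Shortcuts">
  <!-- Replace or extend the set of bound shortcuts in bulk. -->
  <method name="BindShortcuts">
    <arg type="a(sa{sv})" name="shortcuts" direction="in"/>
    <arg type="o" name="request_handle" direction="out"/>
  </method>
  <!-- Emitted by the backend when a bound shortcut is triggered. -->
  <signal name="Activated">
    <arg type="s" name="shortcut_id"/>
    <arg type="a{sv}" name="options"/>
  </signal>
  <property name="version" type="u" access="read"/>
</interface>
```

A real proposal would also need to specify how the backend reports which shortcuts it actually managed to bind, as discussed elsewhere in this thread.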
Questions:
As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.
Yes. The user would be aware of that by virtue of adding the application to the list of accessibility applications (which will be opened by the AccessibilitySettings method). And yes, the accessibility bus is what would allow an AT to read the contents of things on the screen.
An application that sets up realtime macro shortcuts could use this. They could set F8 to "bind new macro", then trap all keys, get a combination, followed by a set of keys to reproduce later. Then, set up an action with this protocol that would replay the sequence of keys via some other method. Niche, but not unheard of on other operating systems (Windows).
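The macro-recorder flow described above can be sketched in a few lines. This is plain illustrative Python with no real input APIs; the key names and the `MacroRecorder` class are hypothetical, and real key trapping/replay would go through whatever the portal eventually provides:

```python
# Illustrative sketch of the macro-recorder idea: a toggle key
# ("F8" here, an arbitrary choice) starts and stops recording, and
# the captured sequence can be replayed later by some other means.
class MacroRecorder:
    def __init__(self, toggle_key="F8"):
        self.toggle_key = toggle_key
        self.recording = False
        self.macro = []

    def on_key(self, key):
        """Feed every trapped key event through here."""
        if key == self.toggle_key:
            self.recording = not self.recording
            if self.recording:
                self.macro = []  # start a fresh macro
            return
        if self.recording:
            self.macro.append(key)

    def replay(self):
        """Return the recorded sequence for later re-injection."""
        return list(self.macro)

rec = MacroRecorder()
for key in ["F8", "h", "e", "y", "F8"]:
    rec.on_key(key)
print(rec.replay())  # -> ['h', 'e', 'y']
```

The interesting part for this proposal is only the binding of `F8` and the trapping of subsequent keys; the replay side would need a separate emulation mechanism, as noted later in the thread.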
I may be misunderstanding the question, so feel free to correct me. Shortcuts are redefined based on context. So for example, the simple fact that a user is inside a document (web or libreoffice) would set different shortcuts than being in a simple GUI application. Being in a text box changes the shortcuts, your focus on certain types of items changes the shortcuts. It is extremely dependent on the current context, and would not generally change because a user "pressed a button".
The AT should not generate an input event, or something which would become a shortcut, no.
Indeed, I was not clear. Here's an example scenario: the "accessibility" app has access to all content, so it knows when you type text and what actions you trigger, and may possibly misread things on purpose. Since it can reassign shortcuts at will, can we imagine that the app could assign the action "delete" or "press the button" (like a delete button) to a shortcut that you use frequently (but which is not the shortcut you defined)?
I suppose that would technically be possible, @Mikenux
I think this is likely the only method needed regarding shortcuts, if the intention is for a11y shortcuts to always take precedence without any user interaction. It might be useful to let the backend communicate which shortcuts it managed to set, though; it cannot really be 100% unconditional, as it'll depend on implementation abilities and on a limited set of combinations (e.g. an escape hatch) the compositor might want to keep. Changing between "modes" would just set new shortcuts.
Yes, it'd be a tricky design task to somehow educate the user while they are configuring things.
I imagine A-Z, Delete, Backspace and Enter could perhaps be "shortcuts" that the portal backend can disallow even for an AT, but fundamentally, the possibility that an app disguising itself as an AT can use an a11y portal to do really terrible things is a real problem, and hard to solve.
This will not be possible. I can't say for sure about Backspace and Enter, but individual characters, and Shift plus a single character, are very common shortcuts used by a screen reader. EDIT: I've just confirmed that Backspace and Enter are also used in some modes of operation.
Right now, any binary can just read and interact with the accessibility layer with no permissions at all. So this will still be massive progress.
It would be better to warn the user when shortcuts are assigned to delete/destructive actions. However, even if it were possible to detect such actions, these shortcuts must be stable across contexts, and the system screen reader must read them instead of the app's screen reader (or at least give a hint). The same may be true for the "push the button" action, although it could be limited to destructive actions. The main thing is to avoid any destructive actions. Any other bad but non-destructive behavior (e.g. misreading) is something the user should notice; therefore, a way to easily disable the problematic app is needed.
What exactly do you mean by destructive actions?
Ah I see. Thanks for the clarification.
Yes, this is probably a good idea. This would be sent by the
The only reason I suggested otherwise is that changing the events would be a fairly large request, potentially in the 1-2 KB+ range, since every possible shortcut, with namespaced actions attached, could be quite a large list, and it needs to be updated nearly instantaneously for a good experience; I was worried about the round-trip time for such a large piece of data. Perhaps I'm thinking a bit too low-level for a portal? I'd need input from others on what latencies would be considered acceptable for this. I'm trying to avoid a situation where a user presses two shortcuts close together, and the first one changes what shortcuts are available. Ideally this would never happen; under the current "the AT grabs all input events" system, it is not possible, which at least has the advantage of always being correct, even if it is an order of magnitude less secure.
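The "add or remove in bulk by string key" idea discussed above can be sketched in plain Python. This is a hypothetical helper, not portal API: shortcuts are grouped under a string key (a "mode"), loosely mirroring a D-Bus `{sa(sa{sv})}` map, so a mode switch transmits one small group instead of re-sending the whole table:

```python
# Sketch of a keyed shortcut table: mode name -> list of
# (shortcut_id, options) pairs. Entering or leaving a mode only
# adds or removes one group, keeping each update small.
class ShortcutTable:
    def __init__(self):
        self.groups = {}

    def add_group(self, mode, shortcuts):
        """Bind a group of shortcuts under a string key."""
        self.groups[mode] = list(shortcuts)

    def remove_group(self, mode):
        """Unbind an entire group by its string key."""
        self.groups.pop(mode, None)

    def active_shortcuts(self):
        """All currently bound shortcut ids, across all groups."""
        return {sid for group in self.groups.values() for sid, _ in group}

table = ShortcutTable()
table.add_group("base", [("Capslock+a", {"description": "toggle browse mode"})])
# Entering browse mode only transmits the "browse" group:
table.add_group("browse", [("h", {"description": "next heading"})])
assert "h" in table.active_shortcuts()
# Leaving browse mode removes the group by its string key:
table.remove_group("browse")
assert "h" not in table.active_shortcuts()
```

Whether group-sized deltas are small and fast enough to close the "Capslock+a immediately followed by h" race is exactly the latency question raised above.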
What do you mean here? I'm not sure I understand the distinction between a system screen reader and "an app's screen reader". The vast, vast majority of applications rely on external screen readers to provide accessibility, and those that do not generally just require the user to disable their current screen reader to use them.
I used "destructive" just to be general, referring to the term used in GNOME. Another destructive action besides "Delete" is "Discard", for example. If it's already communicated, that's fine. Thank you for the clarification on the difference between a screen reader and a "self-voicing" application.
Yes, so in this case focusing a button labeled "delete" or "remove" would speak the label of the button. So the user would be aware of what they are doing.
Would the appropriate libei portal implementations be useful here, as well as libei itself? The compositor could redirect all input events to libei when using assistive technology, and the AT tools can then use libei to manipulate input as they see fit. And then strap on any extra accessibility features one would need via this portal that aren't related to input manipulation. This might also be a good chance to move some accessibility features from GNOME-specific settings (or so it seems) to a generic portal interface that apps can read, the same as the Settings portal.
Just FTR, implementation-wise this would require two libei sockets: one for the compositor to send events to the AT, and another to receive emulated events back. But yes, this would allow re-routing input events through some other process altogether.
Looking at X11-specific code in at-spi2-core, in addition to https://gitlab.gnome.org/GNOME/at-spi2-core/-/blob/main/atspi/atspi-device-x11.c (which handles shortcuts like this), there's also https://gitlab.gnome.org/GNOME/at-spi2-core/-/blob/main/registryd/deviceeventcontroller-x11.c, which implements virtual input functions. Presumably that's something used by accessibility tools that's also lacking on Wayland, and would be a natural fit for libei? Is there anything where a screen reader (or other tool) would need to monitor input events, other than for "accessibility shortcuts" like this? Those things may be handled separately, but it would be good to get confirmation of exactly what is needed in this general area.
I can imagine a feature where the mouse pointer is monitored and the object under it is read. That way, a visually impaired user can get a sense of the visual layout of the window, for example. Of course, in this case the screen reader does not want to consume the mouse events; it just wants to observe them, and it would need information about which window they belong to and the window-relative coordinates. And then, of course, there's the question of possibly allowing the screen reader to be controlled by consuming touch gestures, aka VoiceOver on an iPhone.
This has been discussed a few times, but repeating here as well: perhaps the at-spi API can learn how to forward mouse pointer events applications received from the windowing system, allowing them to forward the events to the AT using window-local coordinates? Wayland by design lacks the concept of global window coordinates, but making it something that happens between the application and the AT, bypassing anything "global", would avoid that obstacle.
I think that much is already handled fine with AT-SPI? The application already tracks which element has cursor focus. Orca has a key binding to read what's under the cursor. That keybinding currently doesn't work on Wayland with GTK4 without an accessibility shortcuts mechanism like this (but it does on GTK3, which uses a legacy mechanism that sends all the key events over the AT-SPI bus).
That feature is actually okay, e.g. it uses focus and selection events, but the mouse review functionality will need events sent somehow as well.
Not sure how I missed some of the comments here. As for mouse emulation: is a portal the correct place for this feature, and is an accessibility portal specifically the best place for it? As of right now, the accessibility portal will simply allow keybindings to be bound arbitrarily, with priority above any other system keybindings. This portal, at this time, does not offer any functionality for key emulation, and I'm not sure that it should. Likewise with mouse I/O: are we sure that emulation should be part of this portal? Could this be on the back burner for version 2 or 3 of the portal? Adding bindings for keyboard, mice, etc., plus emulation of those events, makes for a much more complex portal that will be harder to get merged anywhere. Are we sure we want to go down that road? EDIT: The advantage is that this would obviously open the door to "AutoHotkey-style" programs across Wayland and X11 boundaries without reading events directly. And is that an accessibility portal, or a completely different beast? That's basically an "active event manager" more so than any accessibility feature, although of course it could still be used by assistive technologies.
Note that there already is a "mouse emulation" portal. Either way, I think it makes sense to start with an a11y portal that starts somewhere, e.g. shortcuts.
I think this makes more sense.
FWIW, I think it makes sense to work on this in bits and pieces at a time, like @jadahl said, but to keep it under one name. Another path would be to make another D-Bus name owner, specifically for accessibility.
IMO, anything intercepting keybinds this way needs a hard limit that only one AT can use it at a time. Probably the behavior there is to just auto-"disconnect" the first AT and use the newest one. Anything more complicated than that will very quickly run back into the status quo of X11 grabs fighting over priority, which we all know is not desirable, and which would be even worse to try to present to a typical AT user...
If we use the method in #1046 (comment), then I think this can be handled by the DE actively selecting the active AT / screen reader, properly handling the transition from one reader to another, while allowing the user to revert back to the first if the new one fails to work (a bit like gnome-shell / gnome-control-center handles applying monitor configs).
Finally, what type of information does a screen reader want to access?
And still: Can the screen reader use its own text-to-speech engine and braille system or do we assume it will use the system's? |
@Mikenux all of the above. Screen readers need access to a lot in order to be useful.
Pretty much all of that information is already provided by AT-SPI (and is already able to negotiate between multiple ATs for the info), so it is not relevant to this issue. The issue here is only about how a third-party AT handles keybindings.
Yeah, most of these things are already handled through AT-SPI. Shortcuts are the main thing that needs a separate protocol to handle effectively on Wayland. Some other things may be needed, but I'm somewhat unsure about that (see my earlier comment); I think we'll need clarification from AT-SPI/Orca/etc. maintainers before considering adding anything else. But if it's established that a portal like this is the best solution for handling shortcuts, that can be done before adding anything else. This is, as I understand it, the largest accessibility issue (at least as far as screen readers go) on Wayland at the moment.
@TTWNO is working on it
I am no longer working on this. If somebody else would like to take over, they are welcome to. Always willing to help, and answer questions. |
Thanks @TTWNO for paving the way. GNOME Foundation is planning on picking this up after we implement global shortcuts. In the meantime, it would be very helpful for someone else to go ahead and get a prototype of this in another compositor / portal backend.
This comment from a sibling discussion clarifies that:
and
Now, how representative is Orca among screen readers? Should the accessibility portal have xkb-like modifiers (hold and press) as well as Orca-like latching modifiers (press, then press)? Or is one enough? Normal shortcuts can be triggered with multiple modifiers; is that still a thing for Orca's latching modifiers? Does it even make sense? On the technical side: if latches are desired, what should their syntax be? Normal modifiers have the following syntax according to XDG shortcuts: "Shift+Alt+J". Is latching a property of a key that can be composed? That would result in syntax like "Ins-Shift-J".
@dcz-self: The Orca key is meant to function like a real modifier key. For instance, in an app with a Help menu, I would expect Alt+H to open that menu. Similarly in Orca, Orca+H puts you in Orca's "learn mode." In both cases, one holds down the modifier (Alt or Orca) and then presses H. The Orca modifier can be used with other, official modifiers. For instance, Orca+Ctrl+Space opens Orca's preferences dialog for the active application (e.g. Orca preferences for Gedit). Most screen readers work in this same fashion, including NVDA and JAWS on Windows and VoiceOver on macOS. They all have a screen reader key (NVDA and JAWS also use CapsLock and KP Insert, last time I checked). I'm afraid I do not understand what you mean by "latches". Please clarify. Thanks!
Thanks. Meanwhile someone else explained to me that latching is what Caps Lock does (even though the name indicates "locking" ;)). |
At least in XKB it can be either, latching or locking. Locking generally means "until the next key press of the same key", and latching means "until a key press of any other key". The sequence Caps, A, A, A will thus produce "AAA" when locking and "Aaa" when latching.
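The latching/locking distinction can be shown with a toy simulation (plain Python, not real XKB code; the function and its arguments are hypothetical):

```python
# Toy model of latching vs. locking modifiers: Caps then A, A, A
# yields "AAA" when locking and "Aaa" when latching.
def type_keys(keys, caps_behavior):
    out, caps_on = [], False
    for key in keys:
        if key == "Caps":
            # locking toggles on each press; latching arms the
            # modifier until the next non-modifier key press
            caps_on = (not caps_on) if caps_behavior == "lock" else True
            continue
        out.append(key.upper() if caps_on else key.lower())
        if caps_behavior == "latch":
            caps_on = False  # released by any other key press
    return "".join(out)

assert type_keys(["Caps", "a", "a", "a"], "lock") == "AAA"
assert type_keys(["Caps", "a", "a", "a"], "latch") == "Aaa"
```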
The Problem
Accessibility is broken in a big way on Wayland. This is because intercepting and re-transmitting global keybindings is no longer permitted, and fair enough! What a security nightmare!
In order to bring security to "normal" applications, the Global Shortcuts portal was proposed, adopted, and even implemented by KDE (so I hear).
Assistive technology, however, is traditionally considered a component with "exceptions" to the rules: "yeah sure, just snoop the keyboard", "it's inherently insecure, so who cares". This is because in the most obvious case (a screen reader), it actually has access to what is on your screen anyway (any text, of any document, webpage, terminal output, etc.) and therefore often seems insecure by definition.
In an effort to integrate accessibility into Linux in a way that does not inherently require insecurity, just permission, we've turned to portals.
The Solution
The most viable path for permissions-based accessibility in Linux is to model it after other systems which have already done the hard work of finding the (mostly) right abstractions.
In this case, I'm going to recommend we emulate the behaviour of Android, since it: a) is already Linux-based, b) has an accessibility permissions system, and c) sandboxes most applications from the operating system, which seems to be the direction of Linux as well; the only difference for Linux is that we'll have both native applications and sandboxed applications to deal with.
After reaching out for advice over in the GNOME accessibility Matrix room, and some chats over on a wayland-protocols issue, there seems to be quite the consensus on implementing the global shortcuts portal for assistive technologies to be able to do their job.
However, the existing global shortcuts portal does not quite have all the features and permission-granularity requirements for an assistive technology to use it at this point.
The Requirements
There are a few requirements for an accessibility portal:

- Binding shortcuts that normal applications would never be allowed (e.g. Insert + a, Capslock + x, or h all on its own).
- Binding shortcuts in bulk, grouped under a string key (e.g. a map of type {sa(sa{sv})}), then allowing the set of bound shortcuts to be added to or removed in bulk using the string key of that map. These changes often happen multiple times per second, and are somewhat time sensitive: if a user presses a key combination like Capslock + a (toggle browse mode), immediately followed by h (a key used in browse mode), a screen reader user would expect that h is already bound, even if it was not before the Capslock + a.

I'm looking for comments, implementation concerns, links to related issues, requirements and edge cases I have not yet covered, and to gauge general interest in this proposal.
Once I've chatted with a few of you here on the issue page, or by email (tait@tait.tech) if you don't have a GH account, I'll begin a portal draft, go through RFC, and see if we can get this hammered into a standard.
I will help with the implementation of the portal.
(I am being paid to work on this process, including implementation; responses will be fast during UTC-6 working hours.)