
Accessibility Shortcuts Portal Proposal (org.freedesktop.portal.AT.Shortcuts) #1046

Open
TTWNO opened this issue Jun 29, 2023 · 63 comments
Labels: needs discussion (Needs discussion on how to implement or fix the corresponding task), new portals (This requires creating a new portal interface)

Comments

TTWNO commented Jun 29, 2023

The Problem

Accessibility is broken in a big way on Wayland. This is because intercepting and re-transmitting global keybindings is no longer permitted, and fair enough! What a security nightmare!
In order to bring security to "normal" applications, the Global Shortcuts portal was proposed, adopted, and even implemented by KDE (so I hear).

Assistive technology, however, is traditionally considered a component with "exceptions" to the rules: "yeah sure, just snoop the keyboard", "it's inherently insecure so who cares". This is because in the most obvious case (a screen reader), it already has access to what is on your screen anyway (any text, of any document, webpage, terminal output, etc.) and therefore often seems insecure by definition.

In an effort to integrate accessibility into Linux in a way that does not inherently require insecurity, just permission, we've turned to portals.

The Solution

The most viable path for permissions-based accessibility in Linux is to model it after other systems which have already done the hard work of finding the (mostly) right abstractions.
In this case, I'm going to recommend we emulate the behaviour of Android, since it: a) is already Linux-based, b) has an accessibility permissions system, and c) sandboxes most applications from the operating system, which seems to be the direction Linux is heading as well; the only difference for Linux is that we'll have both native applications and sandboxed applications to deal with.

After reaching out for advice over in the GNOME accessibility Matrix room, and some chats over on a wayland-protocols issue, there seems to be quite the consensus on implementing the global shortcuts portal for assistive technologies to be able to do their job.
However, the existing global shortcuts portal does not yet provide all the features or permission granularity that an assistive technology requires.

The Requirements

There are a few requirements for an accessibility portal:

  1. The user must be asked once whether the application should be allowed as an assistive technology. The "never", "just this time", and "always" options seem appropriate here (remember what Android displays when asking for your location).
  2. After giving permission, the assistive technology should be able to have a similar API to the global shortcuts API, but with some more stringent requirements:
    • It must be allowed to bind and unbind actions at will.
    • It must be able to bind to any input event, even ones which would not "normally" be expected (e.g., Insert + a, Capslock + x, or h all on its own).
    • Since a screen reader frequently and quickly adds and removes the same set of bindings, it may be worth allowing the assistive technology to define a map from string keys to lists of shortcuts (a{sa(sa{sv})}), then enable or disable bound shortcuts in bulk via those keys (see the sketch after this list). These changes often happen multiple times per second and are somewhat time sensitive: if a user presses a key combination like Capslock + a (toggle browse mode) immediately followed by h (a key used in browse mode), a screen reader user would expect that h is already bound, even if it was not before the Capslock + a.
  3. It must have a way for an implementer to skip showing the prompt to the user at all and simply accept an application outright as a default assistive technology. If a screen reader cannot start properly, bind the keys it requires, access the contents of the screen, etc., then someone who needs a screen reader will not be able to select an option without sighted assistance (this is very, very bad).
  4. All events bound via this portal must have priority above the global shortcuts portal. A currently running application should never be able to hold a shortcut that an assistive technology wants. This is another thing that can create the need for sighted assistance (again, very bad).
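
To make the map in requirement 2 concrete, here is a minimal sketch of what such a keyed shortcut map could look like, written in Python with PyGObject. It is purely illustrative: the group names, shortcut IDs, and the "description"/"preferred_trigger" metadata keys are assumptions modelled on the existing GlobalShortcuts portal, not part of any agreed specification.

from gi.repository import GLib

# Hypothetical shortcut map: each string key names a group that can be
# enabled or disabled in bulk; each entry is (shortcut_id, metadata),
# mirroring the a(sa{sv}) shape used by the GlobalShortcuts portal.
shortcut_map = GLib.Variant("a{sa(sa{sv})}", {
    "always-on": [
        ("toggle-browse-mode", {
            "description": GLib.Variant("s", "Toggle browse mode"),
            "preferred_trigger": GLib.Variant("s", "CapsLock+a"),
        }),
    ],
    "browse-mode": [
        ("next-heading", {
            "description": GLib.Variant("s", "Move to next heading"),
            "preferred_trigger": GLib.Variant("s", "h"),
        }),
    ],
})

Enabling the "browse-mode" key would then activate every shortcut in that group at once, without re-transmitting each binding individually.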

I'm looking for comments, implementation concerns, links to related issues, requirements and edge cases I have not yet covered, and to gauge general interest in this proposal.
Once I've chatted with a few of you here on the issue page, or by email (tait@tait.tech) if you don't have a GH account, I'll begin a portal draft, go through RFC, and see if we can get this hammered into a standard.
I will help with the implementation of the portal.

(I am being paid to work on this process, including implementation; responses will be fast during UTC-6 working hours.)

TTWNO (Author) commented Jun 29, 2023

As an aside, I expect many more accessibility features will want a portal implementation at some point. For example, if an application already has permission to act as an assistive technology, it may want to zoom in, or move the user's mouse around (not read the position; I'm aware of the mouse position idea). Both are things that are traditionally at least triggerable by the screen reader, even if the screen reader itself does not provide the functionality.

tyrylu commented Jun 29, 2023

And, when talking about a screen reader portal, we might allow the screen reader to request a mouse event for a window at particular window-relative coordinates; you unfortunately need that operation for some broken websites.

TTWNO (Author) commented Jun 29, 2023

And, when talking about a screen reader portal, we might allow the screen reader to request a mouse event for a window at particular window-relative coordinates; you unfortunately need that operation for some broken websites.

Mouse events at particular coordinates are best handled by AT-SPI, not portals.

jadahl (Collaborator) commented Jun 29, 2023

The user must be asked once whether the application should be allowed as an assistive technology. The "never", "just this time", and "always" options seem appropriate here (remember what Android displays when asking for your location).

A counter-proposal for this, which makes it harder for applications to "sneakily" make themselves accessibility technology through bad design: instead of "requesting accessibility", have the accessibility utility advertise via its desktop file that it implements a certain aspect of accessibility. With that, require Settings (or the equivalent in e.g. KDE) to make these discoverable and possible to explicitly enable.

An accessibility application would then, instead of nagging the user with "please let me eavesdrop on you", issue a request via the portal to open the accessibility settings, instructing the user to actively select $APP as the "screen reader", for example.

This is not new, it is how some permissions are dealt with in Android.

jadahl (Collaborator) commented Jun 29, 2023

it may want to zoom in

Zooming in can only practically be implemented by the compositor itself. In GNOME this is possible via the a11y menu, but it could perhaps be something that can be triggered via a potential accessibility portal API.

TTWNO (Author) commented Jun 29, 2023

An accessibility application would then, instead of nagging the user with "please let me eavesdrop on you", issue a request via the portal to open the accessibility settings, instructing the user to actively select $APP as the "screen reader", for example.

You shouldn't need to make a screen reader your default to have it usable. I don't mind a combination of these features where:

  1. The default screen reader does not need to ask permission explicitly (i.e., the implementer can say "you are my default screen reader, go ahead").
  2. Any other assistive technology that would like to launch would be able to ask explicitly.

TTWNO (Author) commented Jun 29, 2023

Zooming in can only practically be implemented by the compositor itself. In GNOME this is possible via the a11y menu, but it could perhaps be something that can be triggered via a potential accessibility portal API.

Fair. And I brought it up, but let's leave this for another portal.

jadahl (Collaborator) commented Jun 29, 2023

You shouldn't need to make a screen reader your default to have it usable. I don't mind a combination of these features where:

Correct me if I'm wrong, but I assume one would only have one screen reader active at any point in time, as otherwise I imagine it'd be quite a noisy experience. So if you have a default bundled screen reader, it'd already be selected, and thus you wouldn't need to do anything for it to work. The only time you need to change that setting is when you have another screen reader you want to switch to.

If you have non-screen-reader-like assistive technology that needs the same level of "eavesdropping-ness", then that would need to go into that Settings panel; but installing such a thing would mean it is discoverable, and possible to enable.

There is a general problem with "nag"-like permissions, which is that the user sooner or later "gives up" or doesn't care if there is a yes/no permission. In other words, they don't really help. Portals are, when possible, designed to avoid this issue. For access to files, a file selector is always presented; for sharing the screen, one has to select the screen to share; and so on.

For something as problematic as getting an introspected view of everything going on on the computer, we should try hard to avoid ending up with a yes/no-like dialog; taking inspiration from Android and making it explicit configuration the user is asked to perform seems like a good way to mitigate things.

TTWNO (Author) commented Jun 29, 2023

There is a general problem with "nag"-like permissions, which is that the user sooner or later "gives up" or doesn't care if there is a yes/no permission. In other words, they don't really help. Portals are, when possible, designed to avoid this issue. For access to files, a file selector is always presented; for sharing the screen, one has to select the screen to share; and so on.

Agreed... but this makes my situation (where I switch back and forth between two screen readers for testing) an absolute nightmare. I agree that from a normal user's perspective this makes sense, though.

EDIT: As long as it is possible for a distribution to ship a default screen reader and automatically run it without user interaction, that's fine by me. I'm sure that, as a dev, I can find a script that'll just swap this setting for me.

jadahl (Collaborator) commented Jun 29, 2023

Agreed... but this makes my situation (where I switch back and forth between two screen readers for testing) an absolute nightmare. I agree that from a normal user's perspective this makes sense, though.

A rather peculiar use case :) but I imagine this could be scriptable one way or the other in most DEs, e.g. a GSetting in GNOME.

jadahl (Collaborator) commented Jun 29, 2023

EDIT: As long as it is possible for a distribution to ship a default screen reader and automatically run it without user interaction, that's fine by me. I'm sure that, as a dev, I can find a script that'll just swap this setting for me.

This is critical, yes, but doable with the solution I'm suggesting, I believe.

TTWNO changed the title from "Accessibility Shortcuts Portal Proposal" to "Accessibility Shortcuts Portal Proposal (org.freedesktop.portal.Accessibility)" Jun 29, 2023
smcv (Collaborator) commented Jun 29, 2023

makes it harder for applications to "sneakily" make themselves accessibility technology through bad design

I think this is really important. Android's accessibility portal-equivalent literally does get used by malware (for example), precisely because the interface is so powerful.

Mikenux commented Jun 29, 2023

The first question is: are being aware of input events and emitting input events (shortcuts) valid grounds to say for sure that an application has accessibility features? If not (which is the case here), then an app certainly can't request accessibility access (whether through a dialog or by opening the accessibility settings). That would be a lie.

(If relevant) The second question is: is it okay to let an app potentially control the system or other apps, even if the user is asked? Example: an accessibility app can send auto-generated shortcuts within apps. I don't think so.

jadahl (Collaborator) commented Jun 30, 2023

This is an attempt to summarize a rough proposal that was discussed on the GNOME a11y Matrix room yesterday:

Assistive Technology

Assistive technologies (ATs) are rather special when it comes to the type and breadth of access they need to a user's system. They need to be able to read out loud which widget is focused in which window, and what letter is entered in which text field. From a privacy and sandbox perspective, the needs of ATs are very problematic, as for all practical purposes they need to perform total surveillance of everything the user "sees" and does. It would be disastrous if a rogue application got the same level of access that an AT gets, but at the same time, people may want to install additional ATs or replace existing ones to help them use the computer.

So, in one way or another, if we want ATs to be distributable in a safe and relatively sandbox-friendly way (e.g. Flatpak), we need a portal that can handle access to the resources the system has to make available for the AT to work. At the same time, we need to be very careful about exactly how a user can install and use an AT, without accidentally enabling malware to get a level of access to resources that regular applications shouldn't have. It also needs to be easy and discoverable how to e.g. switch to another screen reader or add additional ATs.

Access types

Initially, two access types have high priority and are critical to ATs; these are focused on first.

Priority keyboard shortcuts

Previously, in X11, this was implemented by grabbing key events from the X server, but doing so is problematic and seen as very undesirable in Wayland compositors, as having to round-trip key events through one or more ATs is costly and fragile.

Instead, a solution to this is to provide something similar to org.freedesktop.portal.GlobalShortcuts, with the difference being that the AT can freely register keyboard shortcuts that are unconditionally and immediately respected by the display server.

As with the global shortcuts portal, the display server would translate a stream of key events into triggered shortcuts, which the AT would then be signalled about.

For this the shortcuts xdg-spec might need to be expanded to handle the use cases needed by ATs.

This would avoid any display-server-to-AT round trips, but still allow shortcuts for ATs to have priority over other shortcuts on the system.

Access to the accessibility bus

The accessibility bus is a dedicated D-Bus bus where applications describe what the user is currently interacting with. Access to this is the most problematic, as it allows the application to fully introspect what is going on on the computer, including reading any text, reading everything the user types, etc.

I'll leave out the details of how to practically open such a bus, but one way or another, e.g. by passing file descriptors, it could be done with API on an accessibility portal.
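
For context, here is how an AT finds the accessibility bus today: it asks the org.a11y.Bus service on the session bus for the dedicated bus address. A portal method could hand back an address or file descriptor in a similar shape. A minimal sketch in Python with PyGObject; the org.a11y.Bus lookup exists today, while any portal-mediated variant of it is hypothetical:

from gi.repository import Gio, GLib

session_bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)

# Ask org.a11y.Bus for the address of the dedicated accessibility bus.
reply = session_bus.call_sync(
    "org.a11y.Bus", "/org/a11y/bus", "org.a11y.Bus", "GetAddress",
    None, GLib.VariantType("(s)"), Gio.DBusCallFlags.NONE, -1, None)
address = reply.unpack()[0]

# Connect to the accessibility bus itself.
a11y_bus = Gio.DBusConnection.new_for_address_sync(
    address,
    Gio.DBusConnectionFlags.AUTHENTICATION_CLIENT
    | Gio.DBusConnectionFlags.MESSAGE_BUS_CONNECTION,
    None, None)

Under a portal, the same kind of connection would presumably only be handed out after the permission checks described below.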

Handling access granting requirements

As mentioned, we must try hard to avoid rogue applications that want to trick users into letting them spy. But we also need to make it possible for distributions to pre-install a screen reader that has access without needing to query the user, as without said screen reader, the user wouldn't be able to interact with the computer at all. In practice this means:

  • We cannot grant access by default to any installed application, as that would mean any application can freely spy on the user as much as it wants.
  • We should avoid designing a portal that depends on portal dialogs asking the user to grant access. The reason is that users who get nagged with dialogs practically asking them to "make things work" will eventually give up and just accept whatever the application is asking.
  • We must make it possible for distributions to pre-install an AT application that has been granted enough access to e.g. act as a screen reader.
  • It'd be good if e.g. an AT settings app could still "guide" and help the user with granting access.

Access handling proposal

Make giving access to an AT an explicit operation similar to other system configuration, and not something directly requested by the AT application itself.

The way this would work is making it possible to discover, switch and add AT via the accessibility panel of the settings application used in the desktop environment.

Discovery

Discovery would be handled by AT applications adding a field to their .desktop file declaring that they provide AT. The settings app would, for example, show a list of discovered ATs, and add a way to exchange one AT for another (e.g. change screen reader) or add additional ATs (e.g. braille integration software). Behind the scenes, the settings app would configure e.g. the permission store with access rules. Ideally, settings apps should handle "confirming" a change, to make sure one doesn't switch to a screen reader that doesn't actually work.

The desktop file and the new field would have no use other than helping with discovery.
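
As a sketch of what such a declaration could look like (the X- key name below is entirely hypothetical; the actual field name and values would have to be standardized alongside the portal):

[Desktop Entry]
Type=Application
Name=Example Screen Reader
Exec=example-screen-reader
# Hypothetical discovery field; the real key and its allowed values
# would be defined by the spec.
X-XDG-AssistiveTechnology=ScreenReader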

The primary flow after installing a new AT would be to go to the accessibility panel of the settings app and switch to or enable the newly installed AT.

Assisted discovery

It might be desirable to allow a window used for e.g. configuring an AT to assist the user in finding the Accessibility panel in the Settings app. This could work by, for example, having an OpenAccessibilitySettings() method on the portal, or a generic portal for opening a particular Settings panel.

Note that, in theory, it would be possible for portal backends to implement "give me permission" dialogs with such a method call, but the advice would be not to, for the reasons listed earlier.

Granularity

Having granular access control might be desirable, and doing so is not necessarily more complicated. A DE might want to simplify things and e.g. grant access to both unrestricted keyboard shortcuts and the accessibility bus with a single option, while others might want to offer more granular control up front. Manipulating the permission store via third-party applications (e.g. Flatseal) would be possible if the permission store is used in a portable manner.

Sane defaults

Distributions should be able to pre-install a screen reader and make it usable without any user interaction. With this model, that would be achievable by making sure distributions can pre-populate e.g. the permission store with any installed screen reader application, while ensuring it is launched in a way that makes the portal correctly identify its app ID. With the permission store set up for each new user, it should not matter whether the screen reader is pre-installed as a Flatpak, via a traditional package, or as part of an immutable OS image, as long as it is launched correctly.

Development & testing experience

A concern raised with an approach like this was the development experience when working on e.g. a screen reader, or frequently testing different ones; having to interact with Settings in a very manual way can be quite annoying if one has to do it often.

This can be mitigated by making sure permissions can be changed via scripts. If permission handling is done using the permission store, this should be relatively simple; a hedged sketch of such a script follows. Improved xdg-desktop-portal documentation on how to run an executable from the command line so that portals correctly identify its app ID would also help, and developers would not need to do much more than just run the executable.
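
For illustration, such a script could talk to the permission store directly. The PermissionStore interface and its SetPermission method exist in xdg-desktop-portal today, but the "accessibility" table, the "shortcuts" resource ID, and the permission value below are hypothetical; they would be defined by the portal spec. A sketch in Python with PyGObject:

from gi.repository import Gio, GLib

bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)

# Grant a hypothetical accessibility permission to one app ID.
bus.call_sync(
    "org.freedesktop.impl.portal.PermissionStore",
    "/org/freedesktop/impl/portal/PermissionStore",
    "org.freedesktop.impl.portal.PermissionStore",
    "SetPermission",
    GLib.Variant("(sbssas)", (
        "accessibility",        # hypothetical table name
        True,                   # create the table if it does not exist
        "shortcuts",            # hypothetical resource ID within the table
        "org.example.Reader",   # app ID being granted access
        ["yes"],                # permission values are table-specific
    )),
    None, Gio.DBusCallFlags.NONE, -1, None)

A developer switching between two screen readers could run a script like this instead of opening Settings each time.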

Edit: added part about distribution default.

TTWNO (Author) commented Jun 30, 2023

Thank you, @jadahl! This is a great encapsulation of what we discussed. I'm going to sketch a simple set of calls for the portal here, and see if there are further comments:

Methods

// if this fails, it is expected that the client will call AccessibilitySettings
CreateSession (IN  a{sv}     options,
               OUT o         handle);
// set all possible shortcuts this assistive technology will use;
// all shortcuts are disabled by default
SetShortcuts (IN  o         session_handle,
               IN  a{sa(sa{sv})} shortcuts,
               IN  s         parent_window,
               IN  a{sv}     options,
               OUT o         request_handle);
// change the active shortcuts used by the implementation;
// since shortcuts are defined as a dictionary with a string key and a list of shortcuts as the value, we can enable and disable them en masse via the keys.
// this is seen as a convenience method, since ATs often change hundreds of keybindings within the span of a keystroke.
ChangeActiveShortcuts (IN o        session_handle,
               IN as          enabled_shortcut_lists,
               IN as          disabled_shortcut_lists,
               OUT o        request_handle);
// return a list of *all* shortcuts defined via this portal
ListShortcuts (IN  o         session_handle,
               IN  a{sv}     options,
               OUT o         request_handle);
// open an implementation-defined accessibility settings panel, where additional assistive technologies can be granted permission to use this portal
// the lack of a session handle means this method may be called without a successful CreateSession; a client will normally call this if CreateSession fails
AccessibilitySettings (IN a{sv}       options,
               OUT o         handle);
// request the name of the global accessibility bus
AccessibilityBus (IN o          session_handle,
               IN a{sv}        options,
               OUT o          request_handle);

Signals

Activated        (o         session_handle,
                  s         shortcut_id,
                  t         timestamp,
                  a{sv}     options);
Deactivated      (o         session_handle,
                  s         shortcut_id,
                  t         timestamp,
                  a{sv}     options);
ShortcutsChanged (o         session_handle,
                  a{sa(sa{sv})} shortcuts);
ActivatedShortcutsChanged(o         session_handle,
                  as     activated_shortcut_groups,
                  as     deactivated_shortcut_groups);

And finally, the standard property: version (readable, type u).
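
As a rough illustration of how a client might use this draft API, here is a minimal sketch in Python with PyGObject. The interface itself is hypothetical (nothing here is a shipped portal), and the usual xdg-desktop-portal request/response pattern, where results arrive via a Response signal on a Request object, is elided for brevity:

from gi.repository import Gio, GLib

PORTAL = "org.freedesktop.portal.Desktop"
PATH = "/org/freedesktop/portal/desktop"
IFACE = "org.freedesktop.portal.AT.Shortcuts"  # draft name, not a shipped interface

bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)

# Create a session; per the draft, a client falls back to
# AccessibilitySettings() if this fails.
session = bus.call_sync(
    PORTAL, PATH, IFACE, "CreateSession",
    GLib.Variant("(a{sv})", ({},)),
    GLib.VariantType("(o)"), Gio.DBusCallFlags.NONE, -1, None).unpack()[0]

def on_activated(conn, sender, path, iface, signal, params):
    # Activated carries (session_handle, shortcut_id, timestamp, options).
    _session, shortcut_id, _timestamp, _options = params.unpack()
    print("shortcut activated:", shortcut_id)

bus.signal_subscribe(None, IFACE, "Activated", PATH, None,
                     Gio.DBusSignalFlags.NONE, on_activated)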

Should this be a PR at this point? Continue the discussion there?

Mikenux commented Jul 1, 2023

Questions:

  • If the shortcuts can be changed without user approval, are they shortcuts such as "pressing" a button?
  • Are shortcuts managed by the system (i.e. the application cannot generate such an event by itself)?
  • Are there any other apps that could use this feature other than just for accessibility reasons?

As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.

TTWNO (Author) commented Jul 1, 2023

As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.

Yes. The user would be made aware of that by virtue of adding the application to the list of accessibility applications (which is what the AccessibilitySettings method opens). And yes, the accessibility bus is what would allow an AT to read the contents of the screen.

Are there any other apps that could use this feature other than just for accessibility reasons?

An application that sets up realtime macro shortcuts could use this. It could bind F8 to "record new macro", then trap all keys to capture a trigger combination, followed by the set of keys to reproduce later. Then it would set up an action with this protocol that replays the sequence of keys via some other method. Niche, but not unheard of on other operating systems (Windows).

If the shortcuts can be changed without user approval, are they shortcuts such as "pressing" a button?

I may be misunderstanding the question, so feel free to correct me. Shortcuts are redefined based on context. For example, the simple fact that a user is inside a document (web or LibreOffice) would set different shortcuts than being in a simple GUI application. Being in a text box changes the shortcuts; focus on certain types of items changes the shortcuts. It is extremely dependent on the current context, and would not generally change because a user "pressed a button".

Are shortcuts managed by the system (i.e. the application cannot generate such an event by itself)?

The AT should not generate an input event, or anything that would become a shortcut, no.

Mikenux commented Jul 1, 2023

If the shortcuts can be changed without user approval, are they shortcuts such as "pressing" a button?

I may be misunderstanding the question, so feel free to correct me. Shortcuts are redefined based on context. For example, the simple fact that a user is inside a document (web or LibreOffice) would set different shortcuts than being in a simple GUI application. Being in a text box changes the shortcuts; focus on certain types of items changes the shortcuts. It is extremely dependent on the current context, and would not generally change because a user "pressed a button".

Indeed, I was not clear.

Here's an example scenario: the "accessibility" app has access to all content, so it knows when you type text and what actions you trigger, and may possibly misread things on purpose. Since it can reassign shortcuts at will, can we imagine the app assigning the action "delete" or "press the button" (like a delete button) to a shortcut that you use frequently (but which is not the shortcut you defined)?

TTWNO (Author) commented Jul 1, 2023

I suppose that would technically be possible, @Mikenux

jadahl (Collaborator) commented Jul 3, 2023

Methods

// set all possible shortcuts this assistive technology will use;
// all shortcuts are disabled by default
SetShortcuts (IN  o         session_handle,
               IN  a{sa(sa{sv})} shortcuts,
               IN  s         parent_window,
               IN  a{sv}     options,
               OUT o         request_handle);

I think this is likely the only method needed regarding shortcuts, if the intention is for a11y shortcuts to always take precedence without any user interaction. It also means parent_window isn't needed, since there would never be any dialogs.

It might be useful to let the backend communicate which shortcuts it managed to set, though; it cannot really be 100% unconditional. It'll depend on implementation abilities and on a limited set of combinations (e.g. an escape hatch) the compositor might want to keep.

Changing between "modes" would just set new shortcuts.
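Continuing the hypothetical client sketch from the comment above (same PORTAL/PATH/IFACE/session names, signature per the draft, with parent_window passed as an empty string), a mode switch would then be nothing more than:

# Entering browse mode just replaces the active shortcut map wholesale.
browse_mode = {
    "browse-mode": [
        ("next-heading", {"preferred_trigger": GLib.Variant("s", "h")}),
    ],
}
bus.call_sync(PORTAL, PATH, IFACE, "SetShortcuts",
              GLib.Variant("(oa{sa(sa{sv})}sa{sv})",
                           (session, browse_mode, "", {})),
              None, Gio.DBusCallFlags.NONE, -1, None)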

jadahl (Collaborator) commented Jul 3, 2023

As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.

Yes, it'd be a tricky design task to somehow educate the user while they are configuring things.

Since it can reassign shortcuts at will, can we imagine the app assigning the action "delete" or "press the button" (like a delete button) to a shortcut that you use frequently (but which is not the shortcut you defined)?

I imagine A-Z, Delete, Backspace and Enter could perhaps be "shortcuts" that the portal backend can disallow even for an AT, but fundamentally, the possibility that an app disguising itself as an AT can use an a11y portal to do really terrible things is a real problem and hard to solve.

TTWNO (Author) commented Jul 3, 2023

I imagine A-Z, Delete, Backspace and Enter could perhaps be "shortcuts" that the portal backend can disallow even for an AT

This will not be possible. I can't say for sure about Backspace and Enter, but individual characters, and Shift plus a single character, are very common shortcuts used by a screen reader.

EDIT: I've just confirmed that Backspace and Enter are also used in some modes of operation.

TTWNO (Author) commented Jul 3, 2023

the possibility that an app disguising itself as an AT can use an a11y portal to do really terrible things is a real problem and hard to solve.

Right now, any binary can just read and interact with the accessibility layer with no permissions at all. So this will still be massive progress.

Mikenux commented Jul 3, 2023

It would be better to warn the user when shortcuts are assigned to delete/destructive actions. However, even if it were possible to detect such actions, these shortcuts would have to be stable across contexts, and the system screen reader would have to read them instead of the app's screen reader (or at least give a hint). The same may be true for the "push the button" action, although it could be limited to destructive actions.

The main thing is to avoid any destructive actions. Any other bad but non-destructive behavior (e.g. misreading) is something the user should notice. Therefore, a way to easily disable the problematic app is needed.

TTWNO (Author) commented Jul 3, 2023

It would be better to warn the user when shortcuts are assigned to delete/destructive actions.

What exactly do you mean by destructive actions?

TTWNO (Author) commented Jul 3, 2023

I think this is likely the only method needed regarding shortcuts, if the intention is for a11y shortcuts to always take precedence without any user interaction. It also means parent_window isn't needed, since there would never be any dialogs.

Ah I see. Thanks for the clarification.

It might be useful to let the backend communicate which shortcuts it managed to set, though; it cannot really be 100% unconditional. It'll depend on implementation abilities and on a limited set of combinations (e.g. an escape hatch) the compositor might want to keep.

Yes, this is probably a good idea. This would be sent by the ShortcutsChanged signal.

Changing between "modes" would just set new shortcuts.

The only reason I suggested otherwise is that changing the events would be a fairly large request, potentially in the 1-2 KB+ range, since every possible shortcut, with namespaced actions attached, could be quite a large list, and it needs to be updated nearly instantaneously for a good experience. I was worried about the round-trip time for such a large piece of data.

Perhaps I'm thinking a bit too low-level for a portal? I'd need input from others on what latencies would be considered acceptable here. I'm trying to avoid a situation where a user presses two shortcuts close together and the first one changes which shortcuts are available. Ideally this would never happen; under the current "the AT grabs all input events" system it is not possible, which at least has the advantage of always being correct, even if it is an order of magnitude less secure.

TTWNO (Author) commented Jul 3, 2023

the system screen reader would have to read them instead of the app's screen reader (or at least give a hint).

What do you mean here? I'm not sure I understand the distinction between a system screen reader and "an app's screen reader".
Generally, a screen reader handles accessibility across all applications on a system, and it is extremely rare for individual apps to have their own "screen readers". Such apps are generally called "self-voicing" applications, since they do not require a screen reader to function, but that would still not be called a screen reader.

The vast, vast majority of applications rely on an external screen reader to provide accessibility, and those that do not generally just require that the user disable their current screen reader to use them.

Mikenux commented Jul 3, 2023

I used the "destructive" action just to be general, referring to the term used in GNOME. Another destructive action other than "Delete" is "Discard", for example. If it's already communicated, that's fine.

Thank you for the clarification on the difference between a screen reader and a "self-voicing" application.

TTWNO (Author) commented Jul 3, 2023

If it's already communicated, that's fine.

Yes, so in this case, focusing a button labeled "delete" or "remove" would speak the label of the button, so the user would be aware of what they are doing.
Is that what you're trying to say?

orowith2os commented Jul 19, 2023

Would the appropriate libei portal implementations be useful here, as well as libei itself? The compositor could redirect all input events to libei when using assistive technology, and the AT tools can then use libei to manipulate input as they see fit.

And then strap on any extra accessibility features one would need via this portal that aren't related to input manipulation.

This might also be a good chance to move some accessibility features from GNOME-specific settings (or so it seems) to a generic portal interface that apps can read, the same as the Settings portal.

whot (Contributor) commented Jul 19, 2023

Just FTR, implementation-wise this would require two libei sockets: one for the compositor to send events to the AT, and another to receive emulated events back. But yes, this would allow re-routing input events through some other process altogether.

ids1024 commented Aug 15, 2023

Looking at X11-specific code in at-spi2-core, in addition to https://gitlab.gnome.org/GNOME/at-spi2-core/-/blob/main/atspi/atspi-device-x11.c (which handles shortcuts like this) there's also https://gitlab.gnome.org/GNOME/at-spi2-core/-/blob/main/registryd/deviceeventcontroller-x11.c, which implements virtual functions like synth_keycode_press, synth_keycode_release, spi_dec_x11_emit_modifier_event, and spi_dec_x11_generate_mouse_event.

Presumably that's something used by accessibility tools that's also lacking on Wayland, and would be a natural fit for libei? Is there anything else where a screen reader (or other tool) would need to monitor input events, other than for "accessibility shortcuts" like this?

Those things may be handled separately, but it would be good to get confirmation of exactly what things are needed in this general area.

tyrylu commented Aug 16, 2023

I can imagine a feature where the mouse pointer would be monitored, and the object under it read. By doing so, a visually impaired user can get a sense of the visual layout of the window, for example. Of course, in this case the screen reader does not want to consume the mouse events; it just wants to observe them, and it would need to know which window they belong to and the window-relative coordinates. And, of course, there's also the question of possibly allowing control of the screen reader by consuming touch gestures, as VoiceOver does on an iPhone.

jadahl (Collaborator) commented Aug 16, 2023

I can imagine a feature where the mouse pointer would be monitored, and the object under it read.

This has been discussed a few times, but repeating here as well: perhaps the at-spi API could learn how to forward mouse pointer events that applications receive from the windowing system, allowing them to forward the events to the AT using window-local coordinates? Wayland by design lacks the concept of global window coordinates, but making this something that happens between the application and the AT, bypassing anything "global", would avoid that obstacle.

ids1024 commented Aug 16, 2023

I can imagine a feature where the mouse pointer would be monitored, and the object under it read.

I think that much is already handled fine with AT-SPI? The application already tracks which element has cursor focus. Orca has a key binding to read what's under the cursor. The keybinding currently doesn't work on Wayland with GTK4 without an accessibility shortcuts mechanism like this. (But it does on GTK3, which uses a legacy mechanism that sends all the key events over the AT-SPI bus.)

tyrylu commented Aug 17, 2023

That feature is actually okay, as it uses focus and selection events, but the mouse review functionality will need events sent to it somehow as well.

TTWNO (Author) commented Aug 22, 2023

Not sure how I missed some of the comments here.

As for mouse emulation:

Is a portal the correct place for this feature, and is an accessibility portal specifically the best place for it? As of right now, the accessibility portal will simply allow keybindings to be bound arbitrarily, with priority above any other system keybindings.

This portal, at this time, does not offer any functionality for key emulation, and I'm not sure that it should. Likewise with mouse I/O; are we sure that emulation should be a part of this portal? Could this be on the backburner for version 2 or 3 of a portal?

Adding bindings for keyboard, mice, etc. and emulation of those events makes for a much more complex portal that will be harder to get merged anywhere. Are we sure we want to go down that road?

EDIT:

The advantage is that this would obviously open the door to "AutoHotkey-style" programs across Wayland and X11 boundaries without reading events directly from evdev, which I do personally believe would be a boon to the Linux community. But I'm extremely nervous about feature creep, and about the accessibility portal's primary concern (getting keybinds working for screen readers on Wayland) being sidelined by a much bigger, more complex issue.

And is that an accessibility portal, or a completely different beast? That's basically an "active event manager" more than any accessibility feature; although, of course, it could still be used by assistive technologies.

jadahl (Collaborator) commented Aug 22, 2023

Note that there already is a "mouse emulation" portal: org.freedesktop.portal.RemoteDesktop. With that said, every feature that ATs in particular use should ideally be seen from an "end goal" perspective, to see whether it can be done in a better way (e.g. binding shortcuts instead of key event round trips). Not sure how to plan things the best way, but comments in issues are very easy to lose track of.

Either way, I think it makes sense to start with an a11y portal that starts somewhere, e.g. shortcuts. Whether it should be org.freedesktop.portal.Accessibility or e.g. org.freedesktop.portal.AT.Shortcuts I don't know; feature creep is a valid concern.

TTWNO (Author) commented Aug 22, 2023

org.freedesktop.portal.AT.Shortcuts

I think this makes more sense.

TTWNO changed the title from "Accessibility Shortcuts Portal Proposal (org.freedesktop.portal.Accessibility)" to "Accessibility Shortcuts Portal Proposal (org.freedesktop.portal.AT.Shortcuts)" Aug 22, 2023
orowith2os commented:

FWIW I think it makes sense to work on this in bits and pieces at a time, like @jadahl said, but keep it under one name: org.freedesktop.portal.Accessibility. Having things like shortcuts implemented as methods probably makes sense.

Another path would be to make another D-Bus name owner, specifically for org.freedesktop.portal.Accessibility, but have the interface names be different for each feature of the portal: shortcuts, mouse emulation, screen contents, and so on. More or less emulating what the portals are right now.

jf2048 commented Sep 6, 2023

Agreed... but this makes my situation (where I switch back and forth between two screen readers for testing) an absolute nightmare. I agree that from a normal user's perspective this makes sense, though.

EDIT: As long as it is possible for a distribution to ship a default screen reader and automatically run it without user interaction, that's fine by me. I'm sure that, as a dev, I can find a script that'll just swap this setting for me.

IMO anything intercepting keybinds this way needs a hard limit that only one AT can use it at a time. Probably the behaviour there is to just auto-"disconnect" the first AT and then use the newest one. If there is anything more complicated than that, it will very quickly run back into the status quo of X11 grabs fighting over priority, which we all know is not desirable, and would be even worse to try to present to a typical AT user...

jadahl (Collaborator) commented Sep 7, 2023

IMO anything intercepting keybinds this way needs a hard limit that only one AT can use it at a time.

If we use the method in #1046 (comment), then I think this can be handled by the DE actively selecting the active AT / screen reader, properly handling the transition from one reader to another, while allowing a revert back to the first if the new one fails to work (a bit like how gnome-shell / gnome-control-center handles applying monitor configs).

Mikenux commented Sep 7, 2023

Finally, what type of information does a screen reader want to access?

  • App/Window names?
  • UI Structure (e.g. a button, titlebar, main pane)?
  • Text of a document?
  • Other?

And still: Can the screen reader use its own text-to-speech engine and braille system or do we assume it will use the system's?

orowith2os commented:

@Mikenux all of the above. Screen readers need access to a lot in order to be useful.

jf2048 commented Sep 7, 2023

Pretty much all of that information is already provided by AT-SPI (which is already able to negotiate between multiple ATs for the info), so it is not relevant to this issue. The issue here is only about how a third-party AT handles keybindings.

ids1024 commented Sep 7, 2023

Yeah, most of these things are already handled through AT-SPI. Shortcuts are the main thing that needs a separate protocol to be handled effectively on Wayland.

Some other things may be needed, but I'm somewhat unsure on that (see my earlier comment). I think we'll need clarification from AT-SPI/Orca/etc. maintainers before considering adding anything else.

But if it's established that a portal like this is the best solution for handling shortcuts, that can be done before adding anything else. This is, as I understand, the largest accessibility issue (at least as far as screen readers go) on Wayland at the moment.

Mikenux commented Sep 7, 2023

The issue here is only about how a third-party AT handles keybindings.

This issue is also about privacy, as in #283 (depending on the use case, and to a lesser extent), #304, #565, #653, and #1064. This one poses the biggest problem with regard to privacy.

GeorgesStavracas added the "needs discussion" label Oct 9, 2023
GeorgesStavracas (Member) commented:

@TTWNO is working on it

TTWNO (Author) commented Mar 11, 2024

I am no longer working on this. If somebody else would like to take over, they are welcome to. Always willing to help, and answer questions.

sonnyp commented Mar 11, 2024

Thanks @TTWNO for paving the way

The GNOME Foundation is planning to pick this up after we implement global shortcuts.
There is a practical and technical dependency for us.

In the meantime, it would be very helpful for someone else to go ahead and get a prototype of this in another compositor / portal backend.

dcz-self commented Apr 7, 2024

This comment from a sibling discussion clarifies that:

an Orca modifier key and XKB modifier key are two very different things and one doesn't work like the other.

and

Orca's key isn't a modifier, it just messages that you want the next key(s) to be handled by orca. It doesn't affect the subsequent keys at all, aiui.

Now, how representative is Orca among screen readers? Should the accessibility portal have XKB-like modifiers (hold and press) as well as Orca-like latching modifiers (press, then press)? Or is one enough?

Normal shortcuts can be triggered with multiple modifiers; is that still a thing for Orca's latching modifiers? Does it even make sense?

On the technical side:

If latches are desired, what should their syntax be? Normal modifiers have the following syntax according to XDG shortcuts: "Shift+Alt+J". Is latching a property of a key that can be composed? That would result in syntax like "Ins-Shift-J".
Or is latching a property of the entire shortcut, with a syntax like "Latch:Ins+Alt+J"?

joanmarie commented:

@dcz-self: The Orca key is meant to function like a real modifier key. For instance in an app with a Help menu, I would expect Alt+H to open that menu. Similarly in Orca, Orca+H puts you in Orca's "learn mode." In both cases, one holds down the modifier (Alt or Orca) and then presses the H.

The Orca modifier can be used with other, official modifiers. For instance, Orca+Ctrl+Space opens Orca's preferences dialog for the active application (e.g. Orca preferences for Gedit).

Most screen readers work in this same fashion, including NVDA and JAWS on Windows and VoiceOver on macOS. They all have a screen reader key (NVDA and JAWS also use CapsLock and KP Insert last time I checked).

I'm afraid I do not understand what you mean by "latches". Please clarify. Thanks!

dcz-self commented Apr 9, 2024

Thanks. Meanwhile someone else explained to me that latching is what Caps Lock does (even though the name indicates "locking" ;)).

whot (Contributor) commented Apr 10, 2024

Thanks. Meanwhile someone else explained to me that latching is what Caps Lock does (even though the name indicates "locking" ;)).

At least in XKB it can be either, latching or locking. Locking generally means "until the next key press of the same key" and latching means "until a key press of any other key".

The sequence Caps, A, A, A will thus produce "AAA" when locking and "Aaa" when latching.
