
Push to talk does not work in Wayland / Gnome3 #3243

Open
itsrachelfish opened this issue Oct 4, 2017 · 12 comments


@itsrachelfish

commented Oct 4, 2017

I recently upgraded my OS to use Gnome 3.24.2 which now uses Wayland instead of X. Gnome has used Wayland by default since version 3.16, which was released in 2015.

Push to talk works as expected when Mumble has focus and also works when certain applications like Firefox or my text editor are in focus.

However, push to talk does not work when a Wayland native application is in focus, like gnome-terminal or gnome-files.

If I start pressing push to talk with Mumble focused and switch tabs to a Wayland native application, Mumble does not detect when I stop pressing the push to talk key and it stays open until I switch back.

I've tried using both keyboard and mouse hotkeys, but the behavior is the same.

I couldn't find any other issues related to this bug on GitHub, but I did find this similar bug report in the Red Hat bug tracker: https://bugzilla.redhat.com/show_bug.cgi?id=1417576

Any further information about what could be causing this bug would be appreciated, thank you!

@sardemff7

commented Oct 5, 2017

Hi,

Not allowing clients to sniff input when they don’t have focus is a feature of Wayland, and it will stay that way. The user can then trust their input to go where it is expected.
Since X11 clients actually share one Wayland connection, through the Xwayland server, focusing any X11 client effectively gives input to all of them. That is why PTT still works for you with some clients in focus.

There are four cases of global bindings that I am aware of:

  • WM bindings to manage windows
  • xbindkeys daemon and friends, with two main usages that I know about:
    • launching stuff
    • translating from one binding to another (like, a mouse button to a key, allowing to use mouse buttons with applications with keyboard-only bindings)
  • Media players
  • Push-to-talk

Under Wayland, the first case will work as usual, since the compositor has control over the input. The second case would ideally be split between the first one (for launching stuff) and a non-problem, see below. Also, the average user doesn’t need such tweaks in the first place.
The media player case is mostly solved in DEs by having the compositor handle media keys and send e.g. MPRIS commands.
We’re left with the fourth case (and half of the second one), which DEs have had, until now, little interest in.

So the ideal protocol would:

  • not leak keyboard information
  • allow non-keyboard bindings

To have both, I went with an action-based protocol. I sent a proposal a few years ago and more recently, I made a cleaner one to allow for global action bindings.
To work as expected, clients (or toolkits) wanting to support global bindings would have to implement it. On the other side, compositors would have to implement the protocol and provide their user a way to link (key, mouse, touch, mind-control) bindings to said actions.

The actions are namespaced, and you are expected to use fallbacks. For Mumble, that means you would ask for the "mumble/push-to-talk voip/push-to-talk" action. The user could then have a Mumble-specific binding (or not) plus a generic binding. Say Teamspeak is running too: pressing the key for voip/push-to-talk would lead to the action event being sent to either Mumble or Teamspeak (for example, based on whichever was focused last).
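As a rough illustration of the shape such a protocol could take, here is a sketch in Wayland protocol XML. Every interface, request, and event name below is invented for this sketch and not taken from the actual proposal:

```xml
<!-- Hypothetical sketch only; names are invented, not from the real proposal. -->
<interface name="zwp_global_action_binder_v1" version="1">
  <request name="bind_action">
    <description summary="bind a namespaced action with fallbacks">
      The actions string lists namespaced names in preference order,
      e.g. "mumble/push-to-talk voip/push-to-talk". The compositor,
      not the client, decides which input triggers the action.
    </description>
    <arg name="actions" type="string"/>
    <arg name="id" type="new_id" interface="zwp_bound_action_v1"/>
  </request>
</interface>

<interface name="zwp_bound_action_v1" version="1">
  <event name="pressed"/>
  <event name="released"/>
</interface>
```

The key property is that the client never sees the key, button, or gesture itself, only the begin/end events for the action it bound, so no keyboard information leaks.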

(I should probably write all that to the mailing list, for the record if nothing else.)

If there is any interest from Mumble developers for this solution, I am willing to implement the compositor side for Weston (and all libweston-based compositors) as well as WLC-based compositors (at least via an LD_PRELOAD hack), and I may convince wlroots developers too (to be used by Sway and way-cooler, two important tiling compositors).

@mkrautz
Member

commented Nov 26, 2017

Sorry for taking so long to respond.

@itsrachelfish Mumble currently defaults to using XInput2 to "sniff" keypresses and mouse events. However, it can also use raw evdev.

Usually, the default device node permissions on OSes allow you to read mouse clicks, but keyboards are off-limits (for obvious reasons).

However, if you adjust the device node permissions, Mumble can happily use your raw evdev device nodes to read key events, which should work on Wayland (or anywhere, really).

The setting is "shortcut/linux/evdev/enable". To configure it on Linux, you'd add

[shortcut/linux/evdev]
enable=true

to $HOME/.config/Mumble/Mumble.conf.

However, there is a bit of a misbehavior right now: Mumble will fall back to XInput2 when no keyboards can be opened via evdev. This behavior dates from when evdev was our default.
It would now make more sense for Mumble to keep using evdev whenever it is enabled, and warn the user if no keyboard device nodes are available.

That's the workaround until we figure something proper out for Wayland.
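For reference, one common way to "configure the device node permissions" is a udev rule that grants an existing group read access to the event devices. This is an example only (the `input` group is mentioned for Ubuntu later in this thread; the filename and mode are arbitrary):

```
# /etc/udev/rules.d/99-mumble-evdev.rules  (example; adjust group/mode to taste)
# Grant members of the 'input' group read access to evdev event nodes.
KERNEL=="event*", SUBSYSTEM=="input", GROUP="input", MODE="0640"
```

After reloading udev rules and adding your user to that group, Mumble's evdev backend can open the keyboard and mouse event nodes directly. Note that this re-enables exactly the kind of input sniffing Wayland is designed to prevent, so it is a workaround, not a recommendation.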

@mkrautz
Member

commented Nov 26, 2017

The evdev misbehavior is tracked at #3269.

@mkrautz
Member

commented Nov 26, 2017

@sardemff7 I think your proposal (the August-dated one) looks solid. Is there an implementation of this stuff anywhere, or is it just a spec for now?

@mkrautz
Member

commented Nov 26, 2017

@sardemff7:

...But I'm not sure it's enough in its current incarnation. At least it doesn't map fully onto the way global shortcuts work in Mumble currently. (And obviously: it doesn't need to. We're willing to use different UI if we need to, for different platforms.)

It seems like, if we were to use the current API, Mumble would simply bind to "mumble/push-to-talk", "mumble/volume-up", "mumble/volume-down", etc. -- and we wouldn't be able to show the actual bound key to the user, because that part is handled by the compositor. That means the UI for shortcuts would be less than ideal for users.

Perhaps we need a way to query which keys/events are bound to an action, so we can show that to the user?

How would the flow work from a user perspective? Do you configure the actions outside the app itself?
Otherwise, how do you map a keypress/event to an action using the exposed API?

Kind of ties into my previous comment, but I suppose the current API requires us to bind to the actions on startup, correct? If we don't, we won't receive notifications when the action is triggered?

@mkrautz
Member

commented Nov 26, 2017

Usually, the default device node permissions on OSes allow you to read mouse clicks, but keyboards are off-limits (for obvious reasons).

Hmmm. Actually, on my Ubuntu 16.10 test VM, /dev/input/mice (and mouseXX) are also only root-readable (or rw by group 'input').

@sardemff7

commented Nov 26, 2017

Preventing sniffing on evdev directly is also a goal (and most OSes now get that right by making the device nodes root-only, as you noticed).

For now, there is no code behind my proposal, because nobody actually had (code-backed) interest in it. If Mumble is willing to implement it, I can make a Weston implementation, but I think at least a GNOME or KDE implementation would be needed to really push that protocol forward.

The client (Mumble) binds actions at startup, as you guessed.

As for the UI/UX, it would be compositor-dependent. Each DE/compositor would have its own UI (for Weston, it would just be the configuration file, at first, but writing a GUI tool is not really hard to do either). I can imagine GNOME and KDE having a new thing in their control panel, with a list of action strings and the corresponding binding(s).
The client could either have no UI at all (or a message “see your compositor configuration”) or we add a request in the protocol to make the compositor pop the UI directly, even filtered to the action bound by the client.

Mumble would bind actions in the mumble/ and voip/ namespaces, and receive events for the relevant one. Say you have Mumble and Teamspeak running at the same time, the compositor would decide which one to send the events to (for example, last focused).
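To make "for Weston, it would just be the configuration file" concrete, the compositor side could be as small as an ini-style fragment. This syntax is entirely hypothetical; no such section exists in weston.ini today:

```
# Hypothetical weston.ini fragment -- illustrative only, not implemented.
[global-binding]
action=voip/push-to-talk
binding=KEY_F13
dispatch=last-focused   # or: all, preference-order, last-fullscreened
```

The client never needs to know which physical binding the user chose; it only binds the action string and receives the resulting events.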

@detrout

commented Nov 7, 2018

I was impacted by this and came up with a possible solution.

I extended Mumble's D-Bus API to include startTalk and stopTalk calls.

Then I wrote a small program, run with root permissions, that watches for the mouse button event I use for push to talk and sends startTalk on button down and stopTalk on button up.

In the long term, the desktops need to define some Wayland accessibility system where users can bind global hotkeys, and having such a thing send D-Bus messages seems reasonable.

But for the moment my little hard coded program will get me through my next gaming session.

Mumble patch

--- a/src/mumble/DBus.cpp
+++ b/src/mumble/DBus.cpp
@@ -101,3 +101,11 @@
 bool MumbleDBus::isSelfDeaf() {
 	return g.s.bDeaf;
 }
+
+void MumbleDBus::startTalk() {
+	g.mw->on_PushToTalk_triggered(true, 0);
+}
+
+void MumbleDBus::stopTalk() {
+	g.mw->on_PushToTalk_triggered(false, 0);
+}
--- a/src/mumble/DBus.h
+++ b/src/mumble/DBus.h
@@ -52,6 +52,8 @@
 		void setSelfDeaf(bool deafen);
 		bool isSelfMuted();
 		bool isSelfDeaf();
+		void startTalk();
+		void stopTalk();
 };
 
 #endif

Hackish program.
You'd need to change the button it's looking for, set the right device, set the user id, and export DBUS_SESSION_BUS_ADDRESS in the root session before it would work for someone else.

#include <errno.h>
#include <linux/input.h>
#include <linux/input-event-codes.h>
#include <unistd.h>

#include <QtCore/QCoreApplication>
#include <QtDBus/QtDBus>

#include <stdio.h>

#define SERVICE_NAME "net.sourceforge.mumble.mumble"

int main(int argc, char **argv) {
  QCoreApplication app(argc, argv);
  FILE *mice = NULL;
  struct input_event e;
  size_t nread;

  // Open the device node while we still have root privileges.
  mice = fopen("/dev/input/by-id/usb-Logitech_USB_Receiver-if01-event-mouse", "rb");
  if (mice == NULL) {
    fprintf(stderr, "unable to open mice device: %d\n", errno);
    return -1;
  }

  // Drop privileges so we can reach the user's session bus.
  fprintf(stderr, "uid %d, euid %d\n", getuid(), geteuid());
  if (setuid(1000) != 0) {
    fprintf(stderr, "setuid failed: %d\n", errno);
    return -1;
  }
  fprintf(stderr, "uid %d, euid %d\n", getuid(), geteuid());

  if (!QDBusConnection::sessionBus().isConnected()) {
    fprintf(stderr, "Cannot connect to the D-Bus session bus.\n");
    return -2;
  }

  QDBusInterface mumble(SERVICE_NAME, "/", "", QDBusConnection::sessionBus());
  if (!mumble.isValid()) {
    fprintf(stderr, "Failed to connect to %s\n", SERVICE_NAME);
    return -1;
  }

  while (1) {
    nread = fread(&e, sizeof(struct input_event), 1, mice);
    if (nread == 1) {
      if (e.type == EV_KEY && e.code == BTN_EXTRA) {
        QDBusReply<void> reply;
        if (e.value) {
          reply = mumble.call("startTalk"); // button down
        } else {
          reply = mumble.call("stopTalk");  // button up
        }
      }
    }
  }
}
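The dispatch logic at the heart of the program above can be isolated into a small self-contained sketch: map an evdev event to the D-Bus method it should trigger. The EV_KEY and BTN_EXTRA values are hard-coded from <linux/input-event-codes.h> so the snippet needs no kernel headers; the `Event` struct and `dispatch` function are illustrative names, not part of Mumble:

```cpp
#include <cstdint>
#include <string>

// Values from <linux/input-event-codes.h>, hard-coded for portability.
constexpr uint16_t EV_KEY_CODE    = 0x01;  // EV_KEY
constexpr uint16_t BTN_EXTRA_CODE = 0x114; // BTN_EXTRA

// Minimal stand-in for struct input_event's relevant fields.
struct Event { uint16_t type; uint16_t code; int32_t value; };

// Return which D-Bus method a given event should trigger, or "" for none.
std::string dispatch(const Event &e) {
    if (e.type != EV_KEY_CODE || e.code != BTN_EXTRA_CODE)
        return "";                  // not our push-to-talk button
    return e.value ? "startTalk"    // button pressed
                   : "stopTalk";    // button released
}
```

Keeping this mapping in one function makes it easy to swap BTN_EXTRA for another button or key code without touching the device-reading loop.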

@zevdg

commented May 24, 2019

Let’s say Teamspeak is running too, pressing the key for voip/push-to-talk would lead to the action event being sent to either Mumble or Teamspeak (for example, based on the last focused one).

It seems weird to disallow broadcast (or multicast) in the spec. I can imagine examples where the user would want to broadcast voip/push-to-talk to multiple apps. Imagine a gamer with their friends on mumble, but also other randomly matched teammates using the voip built into the game. They'd have one key bound to mumble/push-to-talk for their friends, but when they want to talk to their whole team, they want one button to activate push-to-talk in both apps.

Of course, usually in this case, the game would be in focus and could get the key presses that way, but that seems like a weird and unnecessary limitation. As that user, I would expect my global push-to-talk key to keep working even if I alt-tab out to my browser for a second.

Could we leave it entirely up to the compositor to decide where to send the actions?

@sardemff7

commented May 24, 2019

From my proposed protocol:

Here are some examples of dispatching choice: all applications, last
focused, user-defined preference order, latest fullscreened application.

It is up to the compositor. With that protocol, anyway.
But there are still no implementations that I know of, and the protocol was never accepted anywhere.

@zevdg

commented May 24, 2019

Doh. I should have looked at the actual proposal instead of only reading this issue.

@setpill

commented Jul 25, 2019

Would it be an idea to allow push-to-talk to be triggered through the RPC (both start and stop)? That way, under Wayland, one can simply configure their desktop environment to run mumble rpc speak-start on keydown and mumble rpc speak-stop on keyup (or whatever naming is decided).

Edit: upon closer inspection, the D-Bus PR already covers my use case.
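For completeness: with the startTalk/stopTalk D-Bus patch from earlier in this thread applied, a desktop environment's custom-keybinding facility could invoke the methods directly with dbus-send. The interface name below is a guess (the patch registers the methods on Mumble's well-known bus name, but the exact exported interface depends on how the adaptor is set up), so treat this as a sketch:

```shell
# Sketch; assumes the startTalk/stopTalk patch is applied and that the
# methods are exported on the guessed interface net.sourceforge.mumble.Mumble.
dbus-send --session --type=method_call \
    --dest=net.sourceforge.mumble.mumble / net.sourceforge.mumble.Mumble.startTalk   # keydown
dbus-send --session --type=method_call \
    --dest=net.sourceforge.mumble.mumble / net.sourceforge.mumble.Mumble.stopTalk    # keyup
```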
