Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Startup issues with the old dongle #14

Closed
andreashuetter opened this issue Dec 25, 2019 · 9 comments
Closed

Startup issues with the old dongle #14

andreashuetter opened this issue Dec 25, 2019 · 9 comments
Labels
bug Something isn't working compatibility Compatibility with a certain system/device
Milestone

Comments

@andreashuetter
Copy link

andreashuetter commented Dec 25, 2019

Did more testing now, as promised. I am not quite sure what the reason is, but sometimes it happens that xow fails on startup (exit status=255), but if I re-start it later then it works:

desktop@steamos:~$ sudo systemctl status xow -l
● xow.service - Xbox One Wireless Dongle Driver
   Loaded: loaded (/etc/systemd/user/xow.service; enabled)
   Active: failed (Result: exit-code) since Mi 2019-12-25 20:57:42 CET; 36s ago
  Process: 646 ExecStart=/usr/local/bin/xow (code=exited, status=255)
 Main PID: 646 (code=exited, status=255)

Dez 25 20:57:42 steamos systemd[1]: Starting Xbox One Wireless Dongle Driver...
Dez 25 20:57:42 steamos systemd[1]: Started Xbox One Wireless Dongle Driver.
Dez 25 20:57:42 steamos xow[646]: INFO  - xow  ©Severin v. W.
Dez 25 20:57:42 steamos xow[646]: ERROR - Error opening device: LIBUSB_ERROR_NO_DEVICE
Dez 25 20:57:42 steamos systemd[1]: xow.service: main process exited, code=exited, status=255/n/a
Dez 25 20:57:42 steamos systemd[1]: Unit xow.service entered failed state.

desktop@steamos:~$ sudo systemctl stop xow 
desktop@steamos:~$ sudo systemctl start xow 
desktop@steamos:~$ sudo systemctl status xow -l
● xow.service - Xbox One Wireless Dongle Driver
   Loaded: loaded (/etc/systemd/user/xow.service; enabled)
   Active: active (running) since Mi 2019-12-25 20:58:22 CET; 2s ago
 Main PID: 1693 (xow)
   CGroup: /system.slice/xow.service
           └─1693 /usr/local/bin/xow

Dez 25 20:58:22 steamos systemd[1]: Started Xbox One Wireless Dongle Driver.
Dez 25 20:58:22 steamos xow[1693]: INFO  - xow  ©Severin v. W.
Dez 25 20:58:22 steamos xow[1693]: INFO  - Dongle plugged in
Dez 25 20:58:22 steamos xow[1693]: INFO  - Chip address: 62:45:b4:fa:b0:d2
Dez 25 20:58:22 steamos xow[1693]: INFO  - Dongle initialized

So far this only happened with the old bulky dongle (and not always), but until now never with the new small one.

@medusalix
Copy link
Owner

Does adding the libusb_set_auto_detach_kernel_driver line make it work reliably again?

@andreashuetter
Copy link
Author

andreashuetter commented Dec 26, 2019

Not really, unfortunately.

To me it looks like some kind of timeout issue or something similar. As if sometimes the driver gets started too early on system startup, before the device is known by the OS (LIBUSB_ERROR_NO_DEVICE). Perhaps it could be sufficient to perform a few seconds delay before initializiation?

@medusalix
Copy link
Owner

It seems to be a configuration/hardware issue, as I found a nearly identical topic concerning this error on the ArchLinux forums. The open source driver for the Microsoft Kinect had a similar issue. Running xow with LIBUSB_DEBUG=4 could give some insights about the error (and you should take a look at the dmesg output).

@andreashuetter
Copy link
Author

andreashuetter commented Dec 26, 2019

dmesg only prints the already well-known messages about the missing firmware (which can be ignored, as xow contains its own firmware for the dongles).

[    4.620082] mt76x2u 1-4:1.0: firmware: failed to load mediatek/mt7662u_rom_patch.bin (-2)
[    4.620086] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
[    4.620089] mt76x2u 1-4:1.0: Direct firmware load for mediatek/mt7662u_rom_patch.bin failed with error -2

Running xow with LIBUSB_DEBUG=4 doesn't give much new information:

desktop@steamos:~$ cat /etc/systemd/user/xow.service
[Unit]
Description=Xbox One Wireless Dongle Driver

[Service]
Environment="LIBUSB_DEBUG=4"
ExecStart=/usr/local/bin/xow

[Install]
WantedBy=multi-user.target
desktop@steamos:~$ systemctl status xow -l
● xow.service - Xbox One Wireless Dongle Driver
   Loaded: loaded (/etc/systemd/user/xow.service; enabled)
   Active: failed (Result: exit-code) since Do 2019-12-26 19:34:58 CET; 50s ago
  Process: 634 ExecStart=/usr/local/bin/xow (code=exited, status=255)
 Main PID: 634 (code=exited, status=255)

Dez 26 19:34:58 steamos xow[634]: [ 0.014299] [0000027a] libusb: debug [handle_events] poll() 2 fds with timeout in 0ms
Dez 26 19:34:58 steamos xow[634]: [ 0.014303] [0000027a] libusb: debug [handle_events] poll() returned 1
Dez 26 19:34:58 steamos xow[634]: [ 0.014305] [0000027a] libusb: debug [handle_events] caught a fish on the event pipe
Dez 26 19:34:58 steamos xow[634]: [ 0.014307] [0000027a] libusb: debug [handle_events] hotplug message received
Dez 26 19:34:58 steamos xow[634]: [ 0.014312] [0000027a] libusb: debug [usbi_remove_pollfd] remove fd 6
Dez 26 19:34:58 steamos xow[634]: [ 0.014320] [0000027a] libusb: debug [usbi_remove_pollfd] remove fd 8
Dez 26 19:34:58 steamos xow[634]: [ 0.014333] [00000286] libusb: debug [linux_udev_event_thread_main] udev event thread exiting
Dez 26 19:34:58 steamos xow[634]: ERROR - Error opening device: LIBUSB_ERROR_NO_DEVICE
Dez 26 19:34:58 steamos systemd[1]: xow.service: main process exited, code=exited, status=255/n/a
Dez 26 19:34:58 steamos systemd[1]: Unit xow.service entered failed state.

But it seems that my thinking that it could be some kind of timing issue was actually correct, because inserting a 5 seconds delay before the initializiation of xow seems to resolve this issue:

desktop@steamos:~$ cat /etc/systemd/user/xow.service
[Unit]
Description=Xbox One Wireless Dongle Driver

[Service]
ExecStartPre=/bin/sleep 5
ExecStart=/usr/local/bin/xow

[Install]
WantedBy=multi-user.target

sleep 5 was just a guess, maybe 10 or 20 seconds would be better to be really on the safe side, or perhaps even less than 5 seconds could also be sufficient. I don't know because I have only tried it with 5 seconds so far, and on my machine this leads to a reliabe connection on every startup.

Perhaps a delay like this for the xow service should be added to the Makefile, so that it would reliably work out-of-the-box on any machine and with both dongles (old and new one), without the need of manually modifying the xow.service file in /etc/systemd/user.

@medusalix medusalix added this to the v0.3 milestone Dec 27, 2019
@medusalix medusalix changed the title Old bulky dongle vs. new smaller dongle - differences? Startup issues with the old dongle Dec 29, 2019
@medusalix medusalix added bug Something isn't working compatibility Compatibility with a certain system/device labels Dec 29, 2019
@medusalix
Copy link
Owner

medusalix commented Dec 30, 2019

I'd rather like to find the cause of the problem than adding an arbitrary delay to the service. I suspect it's caused by the mt76 kernel module. Could you please try blacklisting the module? Instructions are here (just remember to use mt76 as the module name).

@andreashuetter
Copy link
Author

andreashuetter commented Dec 31, 2019

Well, this issue is a little bit difficult to test, because it doesn't always occur, but only sometimes. Though, it seems to happen more often when cold-booting the machine than at warm-booting. After inserting the 5 seconds delay I tried about 10 cold-boots and 10 warm-boots and during those tests the issue did never occur again. This does not guarantee, though, that this permanently solves the issue, it might (probably) happen again.
To me this looks like some weird race condition.

Because of your suggestion to blacklist the module I created a blacklist.conf file in /etc/modprobe.d with the following content
blacklist mt76
and ran sudo update-initramfs -u afterwards.
This doesn't seem to help much, in 5 cold-boot tests it happened once that xow didn't work. So I re-added the 5 seconds delay again and on the next 5 cold-boots it always worked. So I am not quite sure what to think of it. Race conditions are hard to track.
What puzzles me, though, is that lsmod still says the mt76 module is loaded:

desktop@steamos:~$ lsmod|grep mt76
mt76x2u                28672  0 
mt76_usb               32768  1 mt76x2u
mt76x2_common          49152  1 mt76x2u
mt76                   40960  3 mt76_usb,mt76x2u,mt76x2_common
mac80211              835584  5 mt76,iwlmvm,mt76_usb,mt76x2u,mt76x2_common
cfg80211              786432  6 wl,iwlmvm,iwlwifi,mac80211,mt76x2u,mt76x2_common
usbcore               294912  10 xhci_hcd,usbhid,mt76_usb,xpad,usb_storage,btusb,xhci_pci,uas,mt76x2u

Shouldn't the blacklist prevent this module from being loaded?

Edit:
On my other Steam Machine where the new dongle is installed, no blacklisting seems to be needed at all, because the mt76 module does not get loaded there anyway:

desktop@steamos:~$ lsmod|grep mt76
desktop@steamos:~$ 
desktop@steamos:~$ sudo dmesg|grep mt76
desktop@steamos:~$ 

So if it's needed to blacklist the mt76 module then it seems only necessary for the old dongle.

To show the difference, here's the dmesg output of the machine that currently has the old dongle connected:

desktop@steamos:~$ sudo dmesg|grep mt76
[    4.570999] mt76x2u 1-4:1.0: ASIC revision: 76320044
[    4.596090] mt76x2u 1-4:1.0: firmware: failed to load mediatek/mt7662u_rom_patch.bin (-2)
[    4.596096] mt76x2u 1-4:1.0: Direct firmware load for mediatek/mt7662u_rom_patch.bin failed with error -2
[    5.302183] mt76x2u: probe of 1-4:1.0 failed with error -2
[    5.302205] usbcore: registered new interface driver mt76x2u

@medusalix
Copy link
Owner

medusalix commented Dec 31, 2019

Yes, the blacklisting should only be needed for the old dongle because the mt76 kernel driver doesn't support the newer dongle (yet). I think the race condition occurs because the two drivers try to access the dongle at the same time. As a last resort you could delete the mt76 driver entirely but I'd look for other solutions first.

EDIT: Please try using mt76x2u for the blacklist file (and rebuild initramfs). That should work.

@andreashuetter
Copy link
Author

andreashuetter commented Dec 31, 2019

So far it does look good:

desktop@steamos:~$ cat /etc/modprobe.d/blacklist_mt76x2u.conf 
blacklist mt76x2u

I did a sudo update-initramfs -u, removed the delay from /etc/systemd/user/xow.service and rebooted the machine. lsmod and dmesg are clean now (even with the old dongle):

desktop@steamos:~$ lsmod|grep mt76
desktop@steamos:~$ 
desktop@steamos:~$ sudo dmesg|grep mt76
desktop@steamos:~$ 

After that I tried a couple of cold-boots and encountered no problems so far.
Let's hope the best that this fixes the issue permanently.

Is it possible to integrate the blacklisting into the Makefile?

@medusalix
Copy link
Owner

Is it possible to integrate the blacklisting into the Makefile?

I'm currently making a few changes to the Makefile (blacklisting being part of that), that'll be included in the 0.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compatibility Compatibility with a certain system/device
Projects
None yet
Development

No branches or pull requests

2 participants