HackRF one not working on some USB ports (Linux) #783

Closed
koparebu opened this issue Sep 4, 2020 · 14 comments
Labels: enhancement (potential new feature), technical support (request for technical support)

Comments

@koparebu

koparebu commented Sep 4, 2020

Hi,

I've been using HackRF on a new Linux computer, and I found that it doesn't work correctly when it is plugged into certain USB ports (the most "modern" USB ports on the motherboard). I don't have any other SDR devices to test these ports with, but at least for USB mass storage and HID they seem to work OK.

While HackRF works at first (e.g. hackrf_transfer, or running a GRC flowchart), if I stop it and try to launch the same command/flowchart again, it doesn't work anymore. I have to disconnect HackRF and plug it in again. If I plug it into the other USB ports, it works without issues any number of times without having to disconnect it.

If there is anything else I could provide to help diagnose the problem and determine whether it's related to the HackRF drivers themselves or to the operating system, I'd be glad to do it. Thanks a lot!

Steps to reproduce

  1. Plug HackRF in
  2. Launch hackrf_transfer and stop it after a while
  3. Launch hackrf_transfer again

Expected behaviour

Behaviour from (3) should be the same as (2)

Actual behaviour

The program doesn't transfer any data and then exits. Here's the output:

$ hackrf_transfer -r output.dat -f 90000000
call hackrf_set_sample_rate(10000000 Hz/10.000 MHz)
call hackrf_set_freq(90000000 Hz/90.000 MHz)
Stop with Ctrl-C
 0.0 MiB / 1.000 sec =  0.0 MiB/second

Couldn't transfer any bytes for one second.

Exiting... hackrf_is_streaming() result: HACKRF_TRUE (1)
[Rest of the output removed]
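
For readers unfamiliar with this output, here is a minimal sketch of how a per-second status line and the zero-byte stall check can be produced. It is not taken from the hackrf_transfer source: the function names are hypothetical, and the format string is an assumption chosen only because it reproduces the line shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one per-interval status line in the style of the output above.
   The format string is an assumption chosen to match that output. */
static int format_rate_line(char *out, size_t out_len,
                            uint32_t bytes, double seconds)
{
    assert(out != NULL && seconds > 0.0);
    double mib = (double)bytes / (1024.0 * 1024.0);
    return snprintf(out, out_len, "%4.1f MiB / %5.3f sec = %4.1f MiB/second",
                    mib, seconds, mib / seconds);
}

/* Treat the stream as stalled when a whole interval moved no bytes. */
static bool stream_stalled(uint32_t bytes_in_interval)
{
    return bytes_in_interval == 0;
}
```

With zero bytes in a one-second interval, this yields the same line as in the report, and the stall check is what would lead to the "Couldn't transfer any bytes for one second." exit. For comparison, a healthy 10 MHz sample-rate stream at two bytes per sample (20000000 bytes per second) would read about 19.1 MiB/second.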

Version information

Operating system: Ubuntu 20.04.1, kernel 5.4

hackrf package: 2018.01.1-2 (everything SDR related has been installed from the official Ubuntu repositories)

hackrf_info output:

hackrf_info version: unknown
libhackrf version: unknown (0.5)
Found HackRF
Index: 0
Serial number: ---
Board ID Number: 2 (HackRF One)
Firmware Version: 2015.07.2 (API:1.00)
Part ID Number: ---
@schneider42
Contributor

Does dmesg show some USB related messages when this happens?

@koparebu
Author

koparebu commented Sep 4, 2020

Yes, after taking a look at dmesg, it looks like the following line is printed every time I stop hackrf_transfer when it's working correctly. After that, it won't work again until a reset/re-connection:

xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

I don't really know what that means, I'll try to get more info to understand the issue.

@miek
Member

miek commented Sep 14, 2020

@ktemkin any ideas on this? you usually know about weird xhci errors :)

@ehoffman2

ehoffman2 commented Nov 28, 2020

I can confirm, exact same issue here.

uname -a
Linux lx-ryzen 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Tried with fw version 2017.02.1 and 2018.01.1

There's a kernel bug report regarding Ubuntu kernel 4.20+; I don't know if it's related:

https://bugzilla.kernel.org/show_bug.cgi?id=202541

Edit:
A few people on the kernel bug tracker seemed to point toward the Ryzen B350 chipset. Does anyone else with this issue also have a B350? I have an ASUS ROG STRIX B350-F GAMING motherboard.

@ktemkin
Contributor

ktemkin commented Nov 30, 2020

> @ktemkin any ideas on this? you usually know about weird xhci errors :)

I usually see this error when a device suddenly changes connected/disconnected state in a way the host doesn't expect. Usually, it's because of EMI (e.g. from a bad cable), but here it looks like there might actually be a chipset bug that might need a kernel quirk.

Either way, this may be related to high-throughput USB devices, but it doesn't seem likely to be an issue on the HackRF side.

@WGH-

WGH- commented Dec 1, 2020

Interestingly, killing hackrf_sweep so it doesn't have a chance to clean up the transfers (?) avoids this problem, and hackrf_sweep can be run again.

This is what happens when you Ctrl+C it (with debug output enabled via echo 'file drivers/usb/host/* +p' > /sys/kernel/debug/dynamic_debug/control):

[ 3944.894483] xhci_hcd 0000:01:00.0: Transfer error for slot 28 ep 2 on endpoint
[ 3944.894494] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3944.894602] xhci_hcd 0000:01:00.0: Ignoring reset ep completion code of 1
[ 3945.396550] xhci_hcd 0000:01:00.0: Cancel URB 000000008085c3f5, dev 5, ep 0x81, starting at offset 0x1fa7ae1c90
[ 3945.396558] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396690] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1c90 (dma).
[ 3945.396710] xhci_hcd 0000:01:00.0: Cancel URB 000000000098c2b5, dev 5, ep 0x81, starting at offset 0x1fa7ae1990
[ 3945.396712] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396836] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1990 (dma).
[ 3945.396839] xhci_hcd 0000:01:00.0: Finding endpoint context
[ 3945.396841] xhci_hcd 0000:01:00.0: Cycle state = 0x1
[ 3945.396843] xhci_hcd 0000:01:00.0: New dequeue segment = 000000008a0bf921 (virtual)
[ 3945.396845] xhci_hcd 0000:01:00.0: New dequeue pointer = 0x1fa7ae1a90 (DMA)
[ 3945.396847] xhci_hcd 0000:01:00.0: Set TR Deq Ptr cmd, new deq seg = 000000008a0bf921 (0x1fa7ae1000 dma), new deq ptr = 00000000e5711e6d (0x1fa7ae1a90 dma), new cycle = 1
[ 3945.396851] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396869] xhci_hcd 0000:01:00.0: Cancel URB 000000008b1032dd, dev 5, ep 0x81, starting at offset 0x1fa7ae1a90
[ 3945.396871] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396904] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 3945.396909] xhci_hcd 0000:01:00.0: Slot state = 3, EP state = 2
[ 3945.397028] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1a90 (dma).
[ 3945.397040] xhci_hcd 0000:01:00.0: Cancel URB 00000000b1b43562, dev 5, ep 0x81, starting at offset 0x1fa7ae1b90
[ 3945.397042] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.397172] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1b90 (dma).

Subsequent attempts to run it will now give the following:

[ 4076.243019] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4076.243029] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4076.243044] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4076.243051] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4077.749450] xhci_hcd 0000:01:00.0: Cancel URB 0000000063c2cde4, dev 5, ep 0x81, starting at offset 0x1fa7ae1d90
[ 4077.749456] xhci_hcd 0000:01:00.0: // Ding dong!
[ 4077.749592] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1d90 (dma).
[ 4077.749620] xhci_hcd 0000:01:00.0: Cancel URB 00000000564ffbd2, dev 5, ep 0x81, starting at offset 0x1fa7ae1e90
[ 4077.749622] xhci_hcd 0000:01:00.0: // Ding dong!
[ 4077.749748] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1e90 (dma).
[ 4077.749761] xhci_hcd 0000:01:00.0: Cancel URB 00000000ff7fb480, dev 5, ep 0x81, starting at offset 0x1fa7ae1f90
[ 4077.749763] xhci_hcd 0000:01:00.0: // Ding dong!
[ 4077.749892] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1f90 (dma).
[ 4077.749906] xhci_hcd 0000:01:00.0: Cancel URB 00000000f83271e0, dev 5, ep 0x81, starting at offset 0x1fa7ae00a0
[ 4077.749908] xhci_hcd 0000:01:00.0: // Ding dong!
[ 4077.750035] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae00a0 (dma).
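
To help read the "Slot state = 3, EP state = 2" line in the first log, here is a minimal sketch that extracts the two numbers and maps them to names. The value tables follow the Slot State and EP State fields defined in the xHCI specification; the function names are hypothetical and the parsing assumes the exact line shape shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* EP State values from the xHCI Endpoint Context. */
static const char *ep_state_name(int s)
{
    switch (s) {
    case 0: return "Disabled";
    case 1: return "Running";
    case 2: return "Halted";
    case 3: return "Stopped";
    case 4: return "Error";
    default: return "Reserved";
    }
}

/* Slot State values from the xHCI Slot Context. */
static const char *slot_state_name(int s)
{
    switch (s) {
    case 0: return "Disabled/Enabled";
    case 1: return "Default";
    case 2: return "Addressed";
    case 3: return "Configured";
    default: return "Reserved";
    }
}

/* Extract both values from a debug line such as
   "... Slot state = 3, EP state = 2". Returns 1 on success, 0 otherwise. */
static int parse_states(const char *line, int *slot, int *ep)
{
    assert(line != NULL && slot != NULL && ep != NULL);
    const char *p = strstr(line, "Slot state = ");
    if (p == NULL)
        return 0;
    return sscanf(p, "Slot state = %d, EP state = %d", slot, ep) == 2;
}
```

For the line in this log, the slot decodes as Configured and the endpoint as Halted, which fits the later "WARN halted endpoint, queueing URB anyway." messages. As far as I understand the xHCI specification, the Set TR Dequeue Pointer command is only valid for an endpoint in the Stopped or Error state, which would explain why the command is rejected here.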

@ehoffman2

ehoffman2 commented Dec 1, 2020

I think I found the issue, or at least a workaround. I don't know why it happens, though; my guess is that the root cause really is in the kernel, since the HackRF behaves the same way on a system that shows no issue.

The problem arises because the following sequence of events takes place when the program is stopped:

  • The event thread is terminated (and the main thread join()s it).
  • The transfer queues are canceled.
  • After that, you still get completion events for the transfers. The event callback is called one last time for each transfer that may have completed between the time the transfer callback thread was killed and the time the transfer queues were canceled.
  • Since the transfer thread has terminated, the transfer event callback is called in the main thread's context. I'm not sure exactly when this occurs; possibly, when the main thread calls another libusb function, libusb sees that a queue is filled and calls the callback at that point. The point, however, is that this seems to be normal behavior, because it occurs on both systems (the one with the B350 chipset that shows the issue, and my laptop, on which the issue does not manifest).
  • The transfer callback function calls the user callback, then calls libusb_submit_transfer() again, and this is where the issue occurs.

On a system that does not show the issue, that call to libusb_submit_transfer() after the transfer thread has terminated does not seem to cause any trouble. Here is what is logged for three callbacks, each followed by libusb_submit_transfer(), after the transfer thread has terminated (I put some sleeps in the callback to help me debug):

First callback:

[46873.599396] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46873.599411] xhci_hcd 0000:00:14.0: // Ding dong!
[46873.599435] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46873.599511] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46873.599520] xhci_hcd 0000:00:14.0: // Ding dong!
[46873.599548] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46873.599634] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46873.599648] xhci_hcd 0000:00:14.0: // Ding dong!
[46873.599665] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46873.599755] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46873.599766] xhci_hcd 0000:00:14.0: // Ding dong!
[46873.599785] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46873.599865] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46873.599877] xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[46873.599881] xhci_hcd 0000:00:14.0: Finding endpoint context
[46873.599884] xhci_hcd 0000:00:14.0: Cycle state = 0x1
[46873.599888] xhci_hcd 0000:00:14.0: New dequeue segment = 00000000e9a24515 (virtual)
[46873.599892] xhci_hcd 0000:00:14.0: New dequeue pointer = 0xa5092a10 (DMA)
[46873.599895] xhci_hcd 0000:00:14.0: Queueing new dequeue state
[46873.599899] xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 00000000e9a24515 (0xa5092000 dma), new deq ptr = 00000000a70f297c (0xa5092a10 dma), new cycle = 1
[46873.599905] xhci_hcd 0000:00:14.0: // Ding dong!
[46873.599913] xhci_hcd 0000:00:14.0: Giveback URB 000000009a48fe1c, len = 0, expected = 262144, status = -71
[46873.599926] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46873.599933] xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @a5092a10

2nd callback:

[46875.600001] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46875.600017] xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[46875.600021] xhci_hcd 0000:00:14.0: Finding endpoint context
[46875.600025] xhci_hcd 0000:00:14.0: Cycle state = 0x1
[46875.600030] xhci_hcd 0000:00:14.0: New dequeue segment = 00000000e9a24515 (virtual)
[46875.600033] xhci_hcd 0000:00:14.0: New dequeue pointer = 0xa5092b10 (DMA)
[46875.600036] xhci_hcd 0000:00:14.0: Queueing new dequeue state
[46875.600041] xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 00000000e9a24515 (0xa5092000 dma), new deq ptr = 0000000084b24244 (0xa5092b10 dma), new cycle = 1
[46875.600046] xhci_hcd 0000:00:14.0: // Ding dong!
[46875.600054] xhci_hcd 0000:00:14.0: Giveback URB 00000000d444ded8, len = 0, expected = 262144, status = -71
[46875.600069] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46875.600077] xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @a5092b10

3rd callback:

[46877.600571] xhci_hcd 0000:00:14.0: Transfer error for slot 12 ep 2 on endpoint
[46877.600586] xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[46877.600590] xhci_hcd 0000:00:14.0: Finding endpoint context
[46877.600594] xhci_hcd 0000:00:14.0: Cycle state = 0x1
[46877.600599] xhci_hcd 0000:00:14.0: New dequeue segment = 00000000e9a24515 (virtual)
[46877.600602] xhci_hcd 0000:00:14.0: New dequeue pointer = 0xa5092c10 (DMA)
[46877.600605] xhci_hcd 0000:00:14.0: Queueing new dequeue state
[46877.600610] xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 00000000e9a24515 (0xa5092000 dma), new deq ptr = 00000000c548fa22 (0xa5092c10 dma), new cycle = 1
[46877.600615] xhci_hcd 0000:00:14.0: // Ding dong!
[46877.600623] xhci_hcd 0000:00:14.0: Giveback URB 00000000d2b3a7bc, len = 0, expected = 262144, status = -71
[46877.600636] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[46877.600644] xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @a5092c10

Then closing libusb does not produce any further log output.
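
As an aside on the "Giveback URB" lines in the logs above, here is a minimal sketch for pulling the three numeric fields out of such a line. The struct and function names are hypothetical, and the parsing assumes the exact line shape shown in these logs. On Linux, errno 71 is EPROTO, so status = -71 reads as a protocol error, and expected = 262144 is a 256 KiB transfer buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct giveback {
    int len;       /* bytes actually transferred */
    int expected;  /* bytes requested */
    int status;    /* negative errno value reported by the kernel */
};

/* Parse "... Giveback URB <ptr>, len = N, expected = M, status = S".
   Returns 1 on success, 0 if the line has a different shape. */
static int parse_giveback(const char *line, struct giveback *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "Giveback URB");
    if (p == NULL)
        return 0;
    p = strstr(p, "len = ");
    if (p == NULL)
        return 0;
    return sscanf(p, "len = %d, expected = %d, status = %d",
                  &out->len, &out->expected, &out->status) == 3;
}

/* Human-readable text for the status; the wording is platform dependent. */
static const char *status_text(int status)
{
    return status < 0 ? strerror(-status) : "no error";
}
```

All three givebacks in the working system's log have len = 0, so none of the transfers submitted after shutdown moved any data.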

On a system that shows the issue, that call seems to put the kernel into a strange state.

First callback:

[68492.952815] xhci_hcd 0000:02:00.0: Transfer error for slot 79 ep 2 on endpoint
[68492.952821] xhci_hcd 0000:02:00.0: // Ding dong!
[68492.952949] xhci_hcd 0000:02:00.0: Ignoring reset ep completion code of 1

2nd callback:
(nothing)

3rd callback:
(nothing)

hackrf_close():

[68498.956918] xhci_hcd 0000:02:00.0: Cancel URB 00000000fc8857f7, dev 10, ep 0x81, starting at offset 0xfff1f8c0
[68498.956925] xhci_hcd 0000:02:00.0: // Ding dong!
[68498.956929] xhci_hcd 0000:02:00.0: Cancel URB 00000000275d3e56, dev 10, ep 0x81, starting at offset 0xfff1f7c0
[68498.956933] xhci_hcd 0000:02:00.0: Cancel URB 000000005f87d6f9, dev 10, ep 0x81, starting at offset 0xfff1f6c0
[68498.957051] xhci_hcd 0000:02:00.0: Removing canceled TD starting at 0xfff1f8c0 (dma).
[68498.957053] xhci_hcd 0000:02:00.0: Removing canceled TD starting at 0xfff1f7c0 (dma).
[68498.957055] xhci_hcd 0000:02:00.0: Removing canceled TD starting at 0xfff1f6c0 (dma).

hackrf_exit():

[68498.957057] xhci_hcd 0000:02:00.0: Finding endpoint context
[68498.957058] xhci_hcd 0000:02:00.0: Cycle state = 0x1
[68498.957060] xhci_hcd 0000:02:00.0: New dequeue segment = 00000000fcfc8838 (virtual)
[68498.957062] xhci_hcd 0000:02:00.0: New dequeue pointer = 0xfff1f7c0 (DMA)
[68498.957065] xhci_hcd 0000:02:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000fcfc8838 (0xfff1f000 dma), new deq ptr = 0000000063cda460 (0xfff1f7c0 dma), new cycle = 1
[68498.957067] xhci_hcd 0000:02:00.0: // Ding dong!
[68498.957123] xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[68498.957126] xhci_hcd 0000:02:00.0: Slot state = 3, EP state = 2

Then the device is unusable until reset.

As a workaround, make sure that the transfer callback does not call libusb_submit_transfer() while the program is in the process of terminating.

In hackrf_libusb_transfer_callback(), do not submit the transfer if the transfer thread is not started:


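if( device->callback(&transfer) == 0 )
{
    /* Workaround: the transfer thread is no longer running, so do not
       resubmit. Resubmitting here is what triggers the issue. */
    if( device->transfer_thread_started == false ) {
        return;
    }

    /* Normal streaming path: queue the same transfer again. */
    if( libusb_submit_transfer(usb_transfer) < 0)
    {
        request_exit(device);
    }else {
        return;
    }
}else {
    /* The user callback asked to stop streaming. */
    request_exit(device);
}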

Eric

@ehoffman2

ehoffman2 commented Dec 1, 2020

To add...

Alternatively, to be in line with other places in the code, the check should probably be

if( device->do_exit != false )

Edit:
Proposed patch

diff --git a/host/libhackrf/src/hackrf.c b/host/libhackrf/src/hackrf.c
index bc1d5fe..710dad2 100644
--- a/host/libhackrf/src/hackrf.c
+++ b/host/libhackrf/src/hackrf.c
@@ -1499,6 +1499,10 @@ static void LIBUSB_CALL hackrf_libusb_transfer_callback(struct libusb_transfer*
 {
        hackrf_device* device = (hackrf_device*)usb_transfer->user_data;
 
+       if( device->do_exit != false ) {
+               return;
+       }
+
        if(usb_transfer->status == LIBUSB_TRANSFER_COMPLETED)
        {
                hackrf_transfer transfer = {

@ehoffman2

I added more info to the kernel bug thread: https://bugzilla.kernel.org/show_bug.cgi?id=202541#c137

@koparebu
Author

koparebu commented Dec 4, 2020

Hi,

> A few people on the kernel bugtracker seemed to point the issue toward Ryzen B350 chipset. Does anyone having the issue here also have B350? I do have ASUS ROG STRIX B350-F GAMING motherboard.

In my case it was ASUS B550, so it may be related to this family of chipsets, or to the manufacturer itself.

@straithe added the technical support (request for technical support) and enhancement (potential new feature) labels on Jan 24, 2021
@selukov

selukov commented Apr 7, 2021

I have this error too; the connection drops on all USB ports:

[  263.881013] xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

However, if I use VirtualBox with a Linux guest (pentoo-full-amd64-hardened-2021.0_p20210402) or a Windows 7 guest and pass the HackRF One through over USB, the error does not occur. The HackRF One works stably.

Motherboard: ASUS ROG STRIX B550-F GAMING (WI-FI)
CPU: AMD Ryzen 9 3900XT
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on acpi_enforce_resources=lax quiet splash"
linux-firmware  1.187.10
kernel 5.4.0-70.78 or 5.8.0-48.54~20.04.1
virtualbox-6.1 6.1.18-142142~Ubuntu~eoan
libhackrf version: unknown (0.5)
Firmware Version: 2018.01.1 (API:1.02)

@selukov

selukov commented Apr 7, 2021

libhackrf.so.0.6.0 fixed my problem. I built it from source:

sudo apt install libpthread-stubs0-dev libusb-1.0-0-dev
git clone https://github.com/mossmann/hackrf.git
cd hackrf/host
mkdir build
cd build
cmake ..
make
sudo make install
sudo ldconfig

@straithe
Member

@koparebu are you still experiencing this issue?

@straithe self-assigned this on Apr 24, 2021
@straithe
Member

I'm going to close this as there hasn't been a response in a while, but please re-open this issue or open a new one if you still need assistance.
