Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB failing after several hours - xhci_hcd 0000:01:00.0: Ring expansion failed #5088

Closed
Vivivil opened this issue Jul 8, 2022 · 41 comments
Closed

Comments

@Vivivil
Copy link

Vivivil commented Jul 8, 2022

Describe the bug

Hi, I was hoping someone here may be able to help me.

I've been using a disk image (Buster) that was working fine with RPI4 R1.4

However since the RPI4 R1.5's have started to filter through I've had to update the system to get the disk image to start up.
This seems to have caused an issue with my USB FTDI serial adapter dropping off between maybe 8 - 14 hours and requiring either a hardware reboot or an unplug and re-plug of the serial adapter.

A dmesg while the USB has stopped shows that the system is being flooded by clock change errors ie:

[73254.053318] xhci_hcd 0000:01:00.0: Ring expansion failed
[73254.053346] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[73254.125213] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[73254.209306] xhci_hcd 0000:01:00.0: Ring expansion failed
[73254.209328] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[73254.348480] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12

From what I've read it seems like the -12 relates to a memory issue, however I'm running 16GB disk image with 53% free and the memory usage is around 444mb/3.33G. So I can't see that being the issue

Here's a few things I've attempted to get it working:
Tried on Kernel 5.10, 5.15 both produce the same issue.
Set the coherent_pool=4M in config file to see if that helps by mirroring the older firmware I used - Still happens in same timeframe.
Updated Bootloader.
Changed PI to another Rev1.5 in case it was a board issue
Attempted on an older pi Rev1.4 to see if it was newer board issue, with my updated disk image the same issue occurs.
Tried to replace the start4.elf and the other boot files from the later disk image to my older disk image, not including kernel, I might not have done this one correctly but I'm assuming they don't play nicely together being mismatched, (it broke things).
Rebuilt on new OS (Bullseye) with latest updates/upgrades, last around 8 hours longer before it drops off again.

This is the closest I could find of a similar issue, which doesn't seem to come to any real conclusion:
#3479

Oddly enough trying to run a test script with some neopixel LEDs I get the following message:

raise RuntimeError('ws2811_init failed with code {0} ({1})'.format(resp, str_resp))
RuntimeError: ws2811_init failed with code -2 (Out of memory)

In addition to this I've discovered that if I run the following:

sudo sh -c "echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove"
sudo sh -c "echo 1 > /sys/bus/pci/rescan"

As an attempt to get the USB host to reset, seemed to clear what every memory issues was previously there and allowed me to start the pixel test script and also stopped the error spam with the changing of clock frequency, this however did not redetect the FTDI serial adapter that was connected and it dropped off completely.

My latest attempt at resolving this is now using the 64bit Kernel and I will report back on how this goes.

Would appreciate any help or things that I could try to get this resolved or at the very least narrow down why it's happening in the first place.
Many Thanks in advance.

Steps to reproduce the behaviour

Unknown. I plug in USB 232 serial adapter and read from via python script, wait 4-48 hours.
USB will stop reading and dmesg gets spammed with errors.

Device (s)

Raspberry Pi 4 Mod. B

System

Previous Working Kernel:

Linux pi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

Latest with Issue:

Linux rpi 5.15.32-v7l+ #1538 SMP Thu Mar 31 19:39:41 BST 2022 armv7l GNU/Linux
#lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 3, If 1, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 2: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 4: Dev 4, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
#lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 03eb:214e Atmel Corp. 
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Logs

[79710.013177] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79710.176265] xhci_hcd 0000:01:00.0: Ring expansion failed
[79710.176397] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.196834] xhci_hcd 0000:01:00.0: Ring expansion failed
[79711.197013] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.387118] xhci_hcd 0000:01:00.0: Ring expansion failed
[79711.387220] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.590789] xhci_hcd 0000:01:00.0: Ring expansion failed
[79711.590814] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79711.626451] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79711.697099] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79712.557077] xhci_hcd 0000:01:00.0: Ring expansion failed
[79712.557252] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79712.735030] xhci_hcd 0000:01:00.0: Ring expansion failed
[79712.735092] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79712.899047] xhci_hcd 0000:01:00.0: Ring expansion failed
[79712.899074] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79712.996485] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79712.998423] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79713.979218] xhci_hcd 0000:01:00.0: Ring expansion failed
[79713.979277] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79714.169234] xhci_hcd 0000:01:00.0: Ring expansion failed
[79714.169269] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79714.336196] xhci_hcd 0000:01:00.0: Ring expansion failed
[79714.336258] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79715.371434] xhci_hcd 0000:01:00.0: Ring expansion failed
[79715.371485] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79715.543992] xhci_hcd 0000:01:00.0: Ring expansion failed
[79715.544072] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79715.719450] xhci_hcd 0000:01:00.0: Ring expansion failed
[79715.719476] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79716.666478] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79716.715792] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79716.743567] xhci_hcd 0000:01:00.0: Ring expansion failed
[79716.743592] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79716.905628] xhci_hcd 0000:01:00.0: Ring expansion failed
[79716.905654] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79717.067880] xhci_hcd 0000:01:00.0: Ring expansion failed
[79717.067921] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79718.041774] xhci_hcd 0000:01:00.0: Ring expansion failed
[79718.041801] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79718.237846] xhci_hcd 0000:01:00.0: Ring expansion failed
[79718.237914] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79718.401208] xhci_hcd 0000:01:00.0: Ring expansion failed
[79718.401309] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79719.438021] xhci_hcd 0000:01:00.0: Ring expansion failed
[79719.438061] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79719.622485] xhci_hcd 0000:01:00.0: Ring expansion failed
[79719.622574] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79719.817889] xhci_hcd 0000:01:00.0: Ring expansion failed
[79719.818039] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79720.826277] xhci_hcd 0000:01:00.0: Ring expansion failed
[79720.826345] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79721.012315] xhci_hcd 0000:01:00.0: Ring expansion failed
[79721.012344] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79721.184273] xhci_hcd 0000:01:00.0: Ring expansion failed
[79721.184323] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79721.656574] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79721.737147] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79722.174484] xhci_hcd 0000:01:00.0: Ring expansion failed
[79722.174511] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79722.338479] xhci_hcd 0000:01:00.0: Ring expansion failed
[79722.338518] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79722.506646] xhci_hcd 0000:01:00.0: Ring expansion failed
[79722.506800] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79723.498602] xhci_hcd 0000:01:00.0: Ring expansion failed
[79723.498629] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79723.668693] xhci_hcd 0000:01:00.0: Ring expansion failed
[79723.668720] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79723.840706] xhci_hcd 0000:01:00.0: Ring expansion failed
[79723.840730] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79724.826872] xhci_hcd 0000:01:00.0: Ring expansion failed
[79724.826997] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79724.996958] xhci_hcd 0000:01:00.0: Ring expansion failed
[79724.997148] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79725.164853] xhci_hcd 0000:01:00.0: Ring expansion failed
[79725.164909] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.193101] xhci_hcd 0000:01:00.0: Ring expansion failed
[79726.193128] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.379373] xhci_hcd 0000:01:00.0: Ring expansion failed
[79726.379420] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.542444] xhci_hcd 0000:01:00.0: Ring expansion failed
[79726.542517] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79726.706640] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79726.816875] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79727.547519] xhci_hcd 0000:01:00.0: Ring expansion failed
[79727.547667] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79727.727380] xhci_hcd 0000:01:00.0: Ring expansion failed
[79727.727508] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79727.891358] xhci_hcd 0000:01:00.0: Ring expansion failed
[79727.891385] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79728.919568] xhci_hcd 0000:01:00.0: Ring expansion failed
[79728.919635] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79729.101772] xhci_hcd 0000:01:00.0: Ring expansion failed
[79729.101800] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79729.257746] xhci_hcd 0000:01:00.0: Ring expansion failed
[79729.257789] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79730.299676] xhci_hcd 0000:01:00.0: Ring expansion failed
[79730.299723] ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
[79730.336683] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79730.337995] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79730.483732] xhci_hcd 0000:01:00.0: Ring expansion failed
[79768.907163] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79769.050667] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
[79773.267203] raspberrypi_fw_set_rate: 8 callbacks suppressed
[79773.267226] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.488489] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.607247] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.661073] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12
[79774.787223] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-v3d frequency: -12

Additional context

dtparam=i2c_arm=on
dtparam=spi=on

[pi4]
# Run as fast as firmware / board allows
arm_boost=1

[all]
enable_uart=1
gpu_mem=256
dtoverlay=vpe-2701c
dtoverlay=i2c-rtc,mcp7940x
dtoverlay=disable-bt

Edit: Formatting

@P33M
Copy link
Contributor

P33M commented Jul 11, 2022

This has been cross-posted to the forum here: https://forums.raspberrypi.com/viewtopic.php?t=336947
Setting gpu_mem is unnecessary on a Pi 4, and you should remove it.

Some condition is causing the xhci driver to think that it needs to expand a transfer ring, which it should almost never need to do. This has been seen before (and fixed) in #3919.

Linux pi 4.19.97-v7l+ 

Linux rpi 5.15.32-v7l+

Which kernel version is running when the out-of-memory errors occur?

Also, please post the results of the commands in #3919 (comment) .

@Vivivil
Copy link
Author

Vivivil commented Jul 11, 2022

Thanks for coming back to me on this, I very much appreciate it.

I've currently been running stable for almost 4 days after switching to the 64bit version of the kernel. I'm hoping that this has resolved the issue rather than just delayed it.

When I find a moment I'll try and set up an additional device for testing the original problem and following the steps posted in the above link, certainly an interesting read and I apologise for not being able to find it myself!

I've removed the gpu_mem setting and my applications are running as expected on my test unit, so that's great thanks. Must have been a relic from the past!

Linux rpi 5.15.32-v7l+ is the version I was having problems with, I had the same issues with a 5.10 version as well. The 4.19.97 had been working fine for years running the same application.

@Vivivil
Copy link
Author

Vivivil commented Jul 11, 2022

As an update, I've run the suggested trbs command on the 64bit currently running build and here's the result:

512 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep00/trbs
65536 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep02/trbs
512 /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep03/trbs

Looking at this in comparison to the linked thread, the 65536 seems like its bad and has expanded from the 512 that it should be. At this point I'm assuming I'll be expecting the error to reappear in the next day or 2.

Following in the footsteps of the previous solution:

# lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 03eb:214e Atmel Corp. 
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
# sudo lsusb -v -d 0403:

Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         8
  idVendor           0x0403 Future Technology Devices International, Ltd
  idProduct          0x6001 FT232 Serial (UART) IC
  bcdDevice            6.00
  iManufacturer           1 FTDI
  iProduct                2 USB-RS485 Cable
  iSerial                 3 FT54KAOF
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              300mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              2 USB-RS485 Cable
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
Device Status:     0x0000
  (Bus Powered)

If you need any further information, I'm happy to provide.

@P33M
Copy link
Contributor

P33M commented Jul 13, 2022

Do you have a minimal python script that will run on a basic Pi OS install that stimulates the ring expansion?

@Vivivil
Copy link
Author

Vivivil commented Jul 14, 2022

So after a hefty bit of playing around, it looks as though for the trbs to expand as expected I need the modbus device I'm using actually connected to the usb serial otherwise I can't seem to replicate the issue. I think if I'm going to stand a chance of getting this fixed I'll have to send in the modbus device that i'm using and the usb converter in your direction, is that at all possible?

Haven't been able to test on a basic pi OS just yet, but my build has very minimal surface level changes.

It may be possible to to use the below script to replicate this if you have a modbus device that you can talk to via rs485, unfortunately I don't have enough hardware available to test alternatives at the moment.

This script seems to do the trick to expand the buffer though, just requires:

python3 -m pip install pymodbus

Should blow through the 512 mark quite rapidly, will loop through 100 reads then check the trbs for a change in size then will print out the size and time it was detected.

from pymodbus.client.sync import ModbusSerialClient
from datetime import datetime
from subprocess import check_output
from time import sleep

# Establish port using pymodbus, timeout at 0.01 to speed things up if not connected to modbus device

port = ModbusSerialClient(method='rtu', port="/dev/ttyUSB0", baudrate=19200, timeout=0.01, parity='N', stopbits=1, bytesize=8)

# Read from an address of the modbus device
def read_serial():
    try:
        address = 34000
        val = port.read_input_registers(address=address, count=1, unit=100)
    except Exception as e:
        print(e)

class ReadTrbs():
    def __init__(self):
        self.current_trbs_size = 0
        self.read_count = 0

    def check_trbs(self):
        # After 100 loops of reading modbus, check trbs size
        if self.read_count >= 100:
            self.read_count = 0
            trbs_check = check_output(["sudo", "wc", "-l", "/sys/kernel/debug/usb/xhci/0000:01:00.0/devices/03/ep02/trbs"])
            trbs_size = int(trbs_check.decode('utf-8').split(" ")[0])

            # Check if trbs size has increased since last check then print out if it has.
            if self.current_trbs_size != trbs_size:
                self.current_trbs_size = trbs_size
                print("TRBS Size Updated - Size:{} - {}".format(trbs_size, datetime.now()))
        else:
            self.read_count += 1


trbs = ReadTrbs()

# Set current TRBS
trbs.check_trbs()

while True:
    read_serial()
    trbs.check_trbs()

For the mean time as a temporary fix I've thrown together script that's just checking the trbs size and if it exceeds X size then use usbutils to reset the device which also resets the trbs.

@P33M
Copy link
Contributor

P33M commented Jul 19, 2022

Modbus RTU is just a framed serial protocol. I doubt the actual messages matter, just the fact that bytes are being received by the UART. If you have two rs-485 adapters, can you generate messages on a different Pi and cause the ring expansion on the board running the test script?

@Jorl17
Copy link

Jorl17 commented Sep 7, 2022

We have also bumped into this problem using the Modbus protocol and an FTDI serial adapter.

Other than the proposed USB reset, which we are experimenting with right now (hopefully it will work!), does anyone know of any workarounds or adjustments to kernel/driver parameters that might help? We have several application-based ideas (such as reducing message frequency), but I'd be interested in knowing if anyone has any other ideas.

@P33M
Copy link
Contributor

P33M commented Sep 7, 2022

It must be something specific in how the FTDI driver is manipulating its URBs. RS232-mode FTDI serial converters don't have this issue, or at least nobody has reported that they do, and these are far more widely used than RS485 converters.

@Jorl17
Copy link

Jorl17 commented Sep 7, 2022

I wonder if extending the fix from df30883 to support this device does anything. If I get some time I might try it out, though I doubt it works.

@P33M
Copy link
Contributor

P33M commented Sep 7, 2022

The patch is entirely driver-agnostic (and mentioned earlier in the thread). This doesn't discount the possibility that there might be some other ring condition that triggers expansion that the patch doesn't cater for.

@Jorl17
Copy link

Jorl17 commented Sep 7, 2022

I saw it mentioned earlier in the thread. However, I thought that because of this:

dev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483

it might be specific for that device. But if you say it is driver agnostic, I believe you.

@mhartoft
Copy link

mhartoft commented Sep 9, 2022

It must be something specific in how the FTDI driver is manipulating its URBs. RS232-mode FTDI serial converters don't have this issue, or at least nobody has reported that they do, and these are far more widely used than RS485 converters.

We are facing this issue with an 8-port RS232 FTDI serial converter (StarTech ICUSB2328I).

# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 012: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 011: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 010: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
Bus 001 Device 009: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 008: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 007: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 006: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 005: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 003: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/7p, 480M
            |__ Port 1: Dev 4, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
            |__ Port 2: Dev 5, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
            |__ Port 3: Dev 6, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
            |__ Port 4: Dev 7, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
            |__ Port 5: Dev 8, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
            |__ Port 6: Dev 9, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
            |__ Port 7: Dev 10, If 0, Class=Hub, Driver=hub/7p, 480M
                |__ Port 1: Dev 11, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
                |__ Port 2: Dev 12, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M

Example dmesg error log after arbitrary uptime:

# journalctl -k -b -1
[...]
Sep 09 06:58:48 rpi kernel: xhci_hcd 0000:01:00.0: Ring expansion failed
Sep 09 06:58:48 rpi kernel: ftdi_sio ttyUSB6: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
Sep 09 06:58:48 rpi kernel: raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
Sep 09 06:58:48 rpi kernel: raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
Sep 09 06:58:49 rpi kernel: xhci_hcd 0000:01:00.0: Ring expansion failed
Sep 09 06:58:49 rpi kernel: ftdi_sio ttyUSB4: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
Sep 09 06:58:49 rpi kernel: xhci_hcd 0000:01:00.0: Ring expansion failed
Sep 09 06:58:49 rpi kernel: ftdi_sio ttyUSB1: usb_serial_generic_submit_read_urb - usb_submit_urb failed: -12
Sep 09 06:58:49 rpi kernel: raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -12
[...]

All ports seem to fail simultaneously. usbreset is not a feasible option for us.
This occured on current stock stable kernel for Raspberry Pi OS 32-bit, and persists after compiling and installing kernel branch rpi-5.19.y (HEAD was 4ba8fe5) which includes df30883.

I am happy to provide any further information as needed.

@P33M
Copy link
Contributor

P33M commented Sep 9, 2022

How are the ports being used? Do you get the out-of-memory condition if a single port is used? What userspace program is accessing the ports?

@mhartoft
Copy link

mhartoft commented Sep 9, 2022

A Python app spawns a thread per port with a loop structured like this to check if a serial device is connected:

import serial
import time
while True:
  try:
    with serial.Serial(interface, baudrate, timeout=1) as ser:
      ser.write("PING")
      res = ser.readline()
  except:
    pass
  time.sleep(2)

I get the OOM condition even if no device is ever connected. The ports just stop working after an arbitrary amount of time. Everything works fine until then, and devices can connect, communicate and disconnect at will.
For context, the same Python application has been deployed to Windows boxes with the same 8-port converters for years without this or similar issue.
The system has only used about 100MB of its 4GB RAM when the issue has occurred, if that is of interest.

I have not yet tried with only one thread for a single port.

@P33M
Copy link
Contributor

P33M commented Sep 9, 2022

The python fragment appears to be incomplete. Can you post a self-contained testcase that you know will stimulate the bug?

  File "serial.py", line 1, in <module>
    import serial
  File "/home/pi/serial.py", line 5, in <module>
    with serial.Serial(interface, baudrate, timeout=1) as ser:
AttributeError: module 'serial' has no attribute 'Serial'

@mhartoft
Copy link

mhartoft commented Sep 9, 2022

Of course.
I will construct one, wait for the error to occur to verify, and edit this comment.
EDIT: Successfully recreated the issue with the following python script "test.py", only targeting a single port. The port in question never has a physical device connected.

import serial
import time
import os

while True:
    try:
        with serial.Serial('/dev/ttyUSB0', 38400, timeout=1) as dev:
            dev.write('PING\n'.encode('utf-8'))
            response = dev.readline()
    except (serial.SerialException) as e:
        if str(e).find('Cannot allocate memory'):
            print('OOM error has occured. Exiting')
            os._exit(1)
        pass

    time.sleep(0.6)

The error you got from your fragment looks like a naming collision between module "serial" and your filename "serial.py". But to be sure, this is my environment:

# sudo apt install python3 python3-serial
# python3 --version
Python 3.9.2
# python3 test.py
OOM error has occured. Exiting

(after 12-18 hours). Like @Jorl17 this does lots of opens and closes.
As per #3919 (comment) here are the trbs afterward:

root# for d in /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/*/*; do if [ -d "$d" ]; then cd $d; echo "${d/?*\/devices\//}: $(wc -l trbs)"; fi done
01/ep00: 512 trbs
01/ep02: 512 trbs
02/ep00: 512 trbs
02/ep02: 512 trbs
03/ep00: 512 trbs
03/ep02: 32768 trbs
03/ep03: 512 trbs
04/ep00: 512 trbs
04/ep02: 512 trbs
04/ep03: 512 trbs
05/ep00: 512 trbs
05/ep02: 512 trbs
05/ep03: 512 trbs
06/ep00: 512 trbs
06/ep02: 512 trbs
06/ep03: 512 trbs
07/ep00: 512 trbs
07/ep02: 512 trbs
07/ep03: 512 trbs
08/ep00: 512 trbs
08/ep02: 512 trbs
08/ep03: 512 trbs
09/ep00: 512 trbs
09/ep02: 512 trbs
10/ep00: 512 trbs
10/ep02: 512 trbs
10/ep03: 512 trbs
11/ep00: 512 trbs
11/ep02: 512 trbs
11/ep03: 512 trbs

I have no expertise regarding what might cause this error in either of the USB drivers (or hardware), but stumbled upon this and couldn't tell from the current implementation if this reset is present.

@Jorl17
Copy link

Jorl17 commented Sep 9, 2022

For reference, this is also our scenario. We have a single serial port (bar the ones we have for an LTE modem) where we are running commands very frequently, and, as they are run, the trbs size just keeps on increasing.

Lowering the frequency of updates has a very clear impact. We changed it from 0.2s to 1s and the trbs is not growing so fast -- though it is certainly growing.

One additional situation we have is that our serial port is being opened and closed very fast, instead of being kept open all the time. Every time we send or receive data, we're opening it and closing it.

@mhartoft
Copy link

usbreset doesn't seem to fix when applied to a single port (using 3-digit bus and device number - BBB/DDD - as argument). Would introducing a separate USB hub change anything - if the issue is VL805-related?

@P33M
Copy link
Contributor

P33M commented Sep 19, 2022

usbreset doesn't seem to fix when applied to a single port (using 3-digit bus and device number - BBB/DDD - as argument). Would introducing a separate USB hub change anything - if the issue is VL805-related?

No. The device context needs to be destroyed and reallocated for the leak to be reset.

I can replicate the leak with the script posted above (note: github doesn't ping email notifications on comment edits). The first expansion happens in about 2-3 minutes, the second after 10+, so it's either random or the requisite interval doubles each time.

@P33M
Copy link
Contributor

P33M commented Sep 20, 2022

The bug is effectively random, but only occurs immediately after a Link TRB is traversed. I get this odd condition in room_on_ring():
xhci_hcd 0000:01:00.0: room_on_ring B num_trbs_free 248 num_trbs 1 deq_seg 248
Somehow a) we have half the TRBs that we should have in the condition check, and b) the driver thinks there's 248 TRBs in a dequeue segment (the TRBs forming the current TD).

Edit: the link TRB appears to be coincidental. There's a steady leak of num_free_trbs each time we stop and restart the bulk IN endpoint, which eventually breaks the condition check.

@P33M
Copy link
Contributor

P33M commented Sep 21, 2022

Spent several hours looking at trace logs and got mightily confused until I realised there's two "bugs" happening simultaneously.

usb-serial submits multiple URBs for the IN endpoint and this is the primary cause of the leak. In addition to this, there's a race between URB dequeue and resubmission - while the port is being closed, there's nothing stopping the callback from resubmitting the next URB in sequence, but this doesn't result in a leak when the next URB gets dequeued from the transfer ring.

When the port is shut down, the first dequeue request must stop the endpoint ring because the IN urb has been claimed by the xhci hardware. This process happens asynchronously, so other URBs can sneak in afterwards (and ring the endpoint doorbell, undoing the stop-endpoint command). Eventually after 1-2 Stop Endpoint commands the state ends up consistent with no more reubmissions.

Somewhere in the first call to Stop Endpoint and Set TR Dequeue Pointer is where the leak happens. It doesn't take into account the fact that if the dequeue pointer is moved past all of the TDs on the ring (on to the first TRB with a cycle state bit != the consumer cycle state), it needs to return those TRBs in the process.

If I change the number of URBs here: https://github.com/raspberrypi/linux/blob/rpi-5.15.y/include/linux/usb/serial.h#L92
then the bug shifts. A value of 1 prevents a leak, a value of 8 makes the leak happen faster.

Oh and the bug isn't specific to the VL805. The dwc3 xhci controller on the Pi's USB-C port is affected as well.

@P33M
Copy link
Contributor

P33M commented Sep 21, 2022

One obvious bug was in xhci_move_dequeue_past_td() where the ring state was advanced but not accounted for. With this fixed, I still get occasional drops across device open/close which makes me think there's a genuine race condition happening. In particular, I'm suspicious of td->cancel_status which attempts to keep track of whether the xHC has read (and cached) the TD. It has several states, touched in many places in the driver.

@P33M
Copy link
Contributor

P33M commented Sep 22, 2022

I think I've found the cause - there's a large number of gotchas when a URB is removed that is ahead of the ring's current dequeue pointer, i.e. the xHC hasn't yet processed the TD associated with it. This frequently happens if you have a mostly-idle endpoint such as a serial port. Removing an un-processed TD invalidates it by turning it into a no-op instead of moving the dequeue pointer (which was the case above), but doesn't account for the TRBs that got binned.

Strictly speaking, the space isn't "free" because if you subsequently moved the dequeue pointer to the end of the current cycle, or added more TDs before restarting the ring, the free count vs the actual pointer position would diverge.

@P33M
Copy link
Contributor

P33M commented Sep 22, 2022

@Jorl17 @mhartoft can you try compiling from this branch: https://github.com/P33M/linux/tree/trb_leak_fix

and testing with your hardware. Here, on either the vl805 or dwc3 controllers on a Pi 4, I no longer get stray num_trbs_free counts across 1, 2, 8 or 56 usb-serial bulk URBs pending.

@Jorl17
Copy link

Jorl17 commented Sep 22, 2022

Do you know if I can get away with just compiling some specific modules so as to not fully recompile the kernel? From an operational perspective it will be difficult

@P33M
Copy link
Contributor

P33M commented Sep 22, 2022

xhci_hcd is built-in, so /boot/kernelN.img has to be updated, and the machine has to be rebooted. It goes without saying that you shouldn't test this on a board you care about accessing remotely.

@mhartoft
Copy link

@Jorl17 @mhartoft can you try compiling from this branch: https://github.com/P33M/linux/tree/trb_leak_fix

Yes, I am on it

@mhartoft
Copy link

There is no leak detectable in shorter timeframes where it would definitely show without the patch. Will leave it running and continually monitor over the weekend. The ports are functional, and serial communication is flowing 🎉

@Jorl17
Copy link

Jorl17 commented Sep 24, 2022

@P33M Thank you very much for your hard work!

From what you said it seems like this affected all kinds of serial ports, right?

While we noticed this with the FTDI adapter, we also see it (albeit much slower) with a serial associated to an LTE modem.

Thanks again!

@P33M
Copy link
Contributor

P33M commented Sep 26, 2022

The bug is exercised by a driver indirectly manipulating endpoint ring state, and is not limited to a particular class of driver. For example one that frequently issues queued commands over a bulk or control pipe that end in stall responses would also hit the bug.

@Kappers
Copy link

Kappers commented Jun 16, 2023

Probably - @P33M's patch went into rpi-5.15.y at 5.15.70.

I'm on bad hardware to test Yocto (need to set up a VM I suspect) but recursively cloning trunk resolves the kernel .bb file to 5.15.34. Sad times, I'll see if I can get the dependency bumped.

Thought I'd also share this related-looking thread: https://bugzilla.kernel.org/show_bug.cgi?id=217242

@pelwell
Copy link
Contributor

pelwell commented Jun 16, 2023

Here are the most relevant portions of that bug:

Thanks, I see whats going on now.
Some transfers that are further ahead on the ring can simply be turned to no-op trbs by driver when cancelled. These are not added back to num_trbs_free.
This is the case when several URBs are queued for an endpoint and then cancelled in reverse order.
I have a untested fix for this that goes on top of previous debug patch.

--- a/drivers/usb/host/xhci-ring.c	
+++ a/drivers/usb/host/xhci-ring.c	
@@ -2214,6 +2214,7 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		     u32 trb_comp_code)
 {
 	struct xhci_ep_ctx *ep_ctx;
+	int trbs_freed;
 
 	ep_ctx = xhci_get_ep_ctx(xhci, ep->vdev->out_ctx, ep->ep_index);
 
@@ -2283,9 +2284,12 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	}
 
 	/* Update ring dequeue pointer */
+	trbs_freed = xhci_num_trbs_to(ep_ring->deq_seg, ep_ring->dequeue,
+				      td->last_trb_seg, td->last_trb,
+				      ep_ring->num_segs);
+	ep_ring->num_trbs_free += trbs_freed;
 	ep_ring->dequeue = td->last_trb;
 	ep_ring->deq_seg = td->last_trb_seg;
-	ep_ring->num_trbs_free += td->num_trbs - 1;
 	inc_deq(xhci, ep_ring);
 
 	return xhci_td_cleanup(xhci, td, ep_ring, td->status);

popcornmix pushed a commit that referenced this issue Jun 16, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
@nebbles
Copy link

nebbles commented Jun 16, 2023

Having found this upstream discussion, which we believe is a reproduction of our issue, we're interested to see it was patched recently here.

It was backported to 5.15.117, but now we're stuck until raspberrypi-linux integrates it so we can use in a downstream OS.

What's the usual pattern/timeline for integrating the latest changes from upstream into 5.15.y?

@bournehlm
Copy link

I have the same problem, LTE ttyUSB serial port trb leaks, LTE module --> vl805 --> CM4.
How to backport to 4.19.108 kernel?

popcornmix pushed a commit that referenced this issue Jun 20, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
@pelwell
Copy link
Contributor

pelwell commented Jun 20, 2023

We aren't expecting to build any kernels from rpi-5.15.y, so it is no longer being updated as a matter of course. A quick check has shown there are several merge conflicts, so it's more than just merge and push.

Have you considered moving to 6.1?

popcornmix pushed a commit that referenced this issue Jun 22, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jun 26, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jun 30, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 3, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 3, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 10, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 10, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 14, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 14, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
jglathe pushed a commit to jglathe/linux_ms_dev_kit that referenced this issue Jul 17, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See raspberrypi/linux#5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
unifreq pushed a commit to unifreq/linux-5.10.y-rk35xx that referenced this issue Jul 19, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

Link: raspberrypi/linux#5088
Change-Id: I858a32e6bcbb525cccff3a6d07fe77d2be67f5e7
Signed-off-by: William Wu <william.wu@rock-chips.com>
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 20, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 24, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Jul 27, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Aug 3, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Aug 9, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Aug 17, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Sep 5, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Sep 8, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Sep 13, 2023
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See #5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
jai-raptee pushed a commit to jai-raptee/iliteck1 that referenced this issue Apr 30, 2024
If a ring has a number of TDs enqueued past the dequeue pointer, and the
URBs corresponding to these TDs are dequeued, then num_trbs_free isn't
updated to show that these TDs have been converted to no-ops and
effectively "freed". This means that num_trbs_free creeps downwards
until the count is exhausted, which then triggers xhci_ring_expansion()
and effectively leaks memory by infinitely growing the transfer ring.

This is commonly encounted through the use of a usb-serial port where
the port is repeatedly opened, read, then closed.

Move the num_trbs_free crediting out of the Set TR Dequeue Pointer
handling and into xhci_invalidate_cancelled_tds().

There is a potential for overestimating the actual space on the ring if
the ring is nearly full and TDs are arbitrarily enqueued by a device
driver while it is dequeueing them, but dequeues are usually batched
during device close/shutdown or endpoint error recovery.

See raspberrypi/linux#5088

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants