Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvme-pci: Add quirk for disabling broken ASPM L0s/L1 implementations … #6229

Closed

Conversation

mikegapinski
Copy link
Contributor

@mikegapinski mikegapinski commented Jun 18, 2024

…on Phison E13 controller (1987:5013)

Drives with the Phison E13 tend to throw PCIe ASPM errors and they can't be used as boot drives unless ASPM L0s/L1 is disabled with the pciex1-compat-pi5 overlay.

This quirk applies the required configuration based on NVMe vendor and device identifier. This change makes those drives compatible with a clean install of Raspberry Pi OS.

ASPM errors on Phison E13 controllers show up in PCIe GEN 3 on PIP boards with poor layout and no impedance control on the ribbon cable. They are also present in PCIe GEN 2 when the drive is used behind a packet switch.

Implementation is based on the Hailo PCIe driver that also disables ASPM LOs/L1 when the device is initialized.

List of affected drives:

  • Pineboards Pinedrive 2280 [SSD98-256CG-PB]
  • Pineboards Pinedrive 2242 [SSD94-2563CG-PB]
  • Patriot M.2 P300
  • Apacer AS2280P4 256GB [AP256GAS2280P4-1] – Crucial P2 1000 [CT1000P2SSD8]
  • DEXP L3 PCIe Gen3x4 [EPA256GNLNXE30CD-DRL3]
  • Gigabyte GP-GSM2NE256GNTD 256GB
  • GIGABYTE NVMe SSD [GP-GSM2NE3256GNTD]
  • Kingston NV1 500GB SNVS-500G
  • MSI SPATIUM M370 1TB
  • MSI SPATIUM M371 1TB
  • Patriot P300 256GB
  • Seagate Barracuda Q5 500GB ZP500CV3A001
  • Smartbuy Stream E13T [SBSSD-512GT-PH13T-M2P4]
  • Smartbuy Stream E13T Pro [SBSSD-128GT-PH13P-M2P4] – Team Group MP33 256GB [TM8FP6256G0C101]
  • SSD OWC Aura P13 Pro 1TB M.2 2242 PCI-E x4 Gen3.1 NVMe [OWCS3DN3P3T10]

…on Phison E13 controller (1987:5013)

Drives with the Phison E13 tend to throw PCIe ASPM errors and they
can't be used as boot drives unless ASPM L0s/L1 is disabled with
the pciex1-compat-pi5 overlay.

This quirk applies the required configuration based on NVMe vendor
and device identifier. This change makes those drives compatible
with a clean install of Raspberry Pi OS.

ASPM errors on Phison E13 controllers show up in PCIe GEN 3 on PIP
boards with poor layout and no impedance control on the ribbon cable.
They are also present in PCIe GEN 2 when the drive is used behind a
packet switch.

Implementation is based on the Hailo PCIe driver that also disables
ASPM LOs/L1 when the device is initialized.

List of affected drives:

- Pineboards Pinedrive 2280 [SSD98-256CG-PB]
- Pineboards Pinedrive 2242 [SSD94-2563CG-PB]
- Patriot M.2 P300
- Apacer AS2280P4 256GB [AP256GAS2280P4-1]
– Crucial P2 1000 [CT1000P2SSD8]
– DEXP L3 PCIe Gen3x4 [EPA256GNLNXE30CD-DRL3]
– Gigabyte GP-GSM2NE256GNTD 256GB
– GIGABYTE NVMe SSD [GP-GSM2NE3256GNTD]
– Kingston NV1 500GB SNVS-500G
– MSI SPATIUM M370 1TB
– MSI SPATIUM M371 1TB
– Patriot P300 256GB
– Seagate Barracuda Q5 500GB ZP500CV3A001
– Smartbuy Stream E13T [SBSSD-512GT-PH13T-M2P4]
– Smartbuy Stream E13T Pro [SBSSD-128GT-PH13P-M2P4]
– Team Group MP33 256GB [TM8FP6256G0C101]
- SSD OWC Aura P13 Pro 1TB M.2 2242 PCI-E x4 Gen3.1 NVMe [OWCS3DN3P3T10]
@mikegapinski
Copy link
Contributor Author

@peterharperuk @timg236 This PR solves the ASPM errors on Phison E13 discovered in raspberrypi/firmware#1833

I also reproduced this problem on a rig without a switch in Gen 3 mode with a low quality ribbon cable. It doesn't render the drive useless in this config, but same dmesg problems show up less frequently.

I can also test other drives we have, I've seen reports of ASPM errors on Lexar drives with Maxio controllers.

The pciex1-compat-pi5 overlay is nice for debugging those problems, but since we know the affected NVMe disk ids the fix should be applied without forcing users to play with device tree overlays. Similar quirks exist for other PCIe features (MSI-X etc) in the NVMe driver. I've seen other vendors disabling ASPM similarly: https://github.com/hailo-ai/hailort-drivers/blob/43e8a9a575654c773edf95789f93cde40c708a6a/linux/pcie/src/pcie.c#L383

Merging this will improve compatibility with NVMe on the Pi 5 and reduce the number of support requests significantly on our end. I don't think it should go upstream - I couldn't reproduce this problem on X86_64 and other aarch64 platforms with different packet switches.

@P33M
Copy link
Contributor

P33M commented Jun 18, 2024

This patch affects unmodified upstream files. It should be submitted to linux-nvme@lists.infradead.org for discussion with maintainers.

I'm against including this downstream as-is, because maintainers may want it done a different way.

@popcornmix
Copy link
Collaborator

Agreed - quirks are particularly annoying to maintain downstream as they always produce conflicts when upstream adds a new quirk (very likely using the bit position you've chosen).

In fact it looks like the bit 21 chosen for NVME_QUIRK_BROKEN_ASPM_L0S is conflicting with NVME_QUIRK_BROKEN_MSI (quite likely due to a conflict when rebasing).

@mikegapinski
Copy link
Contributor Author

@P33M I can do that, but I don't see any sense in merging this upstream since this is a Pi5/bcm2712 specific thing. I'll fix the bit for L0s submit it and see what happens.

Would moving this to drivers/pci/controller/pcie-brcmstb.c work for you? It'll also enable other types of devices that need pciex1-compat-pi5 to have the quirks enabled based on device id. Same solution, different approach. I am trying to make things easier for the community - if the device is known to need a quirk asking users to enable a dtoverlay every time is not exactly efficient.

@P33M
Copy link
Contributor

P33M commented Jun 18, 2024

The root complex driver should be agnostic as to what the connected endpoint hardware is (and not doing this actually introduced a bug with existing setups on CM4). Devicetree is the way to tell it to go do something different.

@mikegapinski
Copy link
Contributor Author

Closing this until a better solution comes to my mind, most of issues like this can be solved in firmware of the endpoint device. Not ideal, since people can't often update them but we got the solution from Phison/ASMedia that fixed the L0/L1 ASPM issues on the Pi 5 for SSD98-256CG-PB and SSD94-2563CG-PB NVMe SSDs paired with ASM1182/4e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants