Fix irq source-id error #2

Open
quo opened this issue May 7, 2022 · 4 comments

Comments

@quo
Owner

quo commented May 7, 2022

The IOMMU gives the following error when trying to use the irq:

DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [01:05.0] fault index 0x2f [fault reason 0x26] Blocked an interrupt request due to source-id verification failure

No clue how to fix this. Maybe an ACPI bug?

@quo
Owner Author

quo commented May 26, 2022

It says 01:05.0, but the actual address of the THC is 00:10.6. I guess that's the problem? But I don't really know how any of this IOMMU stuff works. According to lspci there isn't even anything at 01:05.0...

Should probably compile a kernel with INTEL_IOMMU_DEBUGFS to get some more info.
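
To make the mismatch concrete, here is a minimal sketch of how a bus:device.function address maps onto the 16-bit PCI requester ID that the IOMMU compares. The helper names are made up for illustration; the bit layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0) is the standard PCI one, and the kernel expresses the same thing with its PCI_DEVFN() and PCI_DEVID() macros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack bus:device.function into a 16-bit requester ID.
 * Hypothetical helper, not a kernel function.
 */
static inline uint16_t bdf_to_sid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);	/* 5-bit device, 3-bit function */
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Unpack the individual fields again. */
static inline uint8_t sid_bus(uint16_t sid) { return (uint8_t)(sid >> 8); }
static inline uint8_t sid_dev(uint16_t sid) { return (sid >> 3) & 0x1f; }
static inline uint8_t sid_fn(uint16_t sid)  { return sid & 0x07; }
```

With these helpers, 00:10.6 (the THC) packs to 0x0086 and 01:05.0 (the ID in the fault message) packs to 0x0128. Note that lspci prints the device number in hex, so "10" is device 16. The two IDs differ in the bus byte as well as the devfn byte.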

@quo
Owner Author

quo commented Jul 15, 2022

The source/request ID check is set up in the kernel by set_msi_sid(). This checks for DMA alias ids with pci_for_each_dma_alias(). As far as I can tell, the only ways you can have aliases are:

  1. If the device is on a VMD bus (via pci_real_dma_dev()), which I don't think it is,
  2. If an alias was set up with pci_add_dma_alias(), but this can only create aliases on the same bus, or
  3. If the device is behind a PCI bridge, which I also don't think it is.

So since there are no aliases, set_msi_sid() creates a strict check for the 00:10.6 sid. And then for some reason the interrupt has request id 01:05.0.
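
Item 2 in the list above can be illustrated with a small user-space model. As far as I understand it, pci_add_dma_alias() records aliases in a per-device bitmap indexed only by devfn, so an alias always inherits the device's own bus number. The struct and function names below are hypothetical and only mimic that idea; this is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NR_DEVFNS 256	/* 32 devices x 8 functions on one bus */

/* Hypothetical stand-in for a per-device DMA alias bitmap. */
struct alias_mask {
	uint64_t bits[MAX_NR_DEVFNS / 64];
};

static inline void add_alias(struct alias_mask *m, uint8_t devfn)
{
	m->bits[devfn / 64] |= 1ULL << (devfn % 64);
}

static inline bool has_alias(const struct alias_mask *m, uint8_t devfn)
{
	return (m->bits[devfn / 64] >> (devfn % 64)) & 1;
}

/* The bitmap has no bus field, so an alias resolves on the owner's bus. */
static inline uint16_t alias_sid(uint8_t own_bus, uint8_t devfn)
{
	return (uint16_t)((own_bus << 8) | devfn);
}
```

In this model, adding devfn 0x28 (05.0) as an alias for a device on bus 0 yields the ID 0x0028, not the 0x0128 seen in the fault, so an alias of this kind cannot cover the THC's interrupts.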

We can't use a quirk to add an alias, since the bus number differs. We could add a hack to set_msi_sid() that disables id checking for the THC device only (instead of disabling it for all devices with intremap=nosid). We could also leave the check enabled and hardcode the sid to 01:05.0 for the THC, but since I don't understand where that value comes from, I don't know whether it might change.
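
The options above can be compared with a deliberately simplified model of the source-id check. It assumes only two validation modes, "no verification" and "compare all 16 bits", and ignores the other qualifier and bus-range modes the hardware supports. The enum names echo the kernel's SVT_* constants, but the struct and function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified validation modes; names borrowed from the kernel's SVT_* constants. */
enum svt_mode {
	SVT_NO_VERIFY,		/* accept any requester ID */
	SVT_VERIFY_SID_SQ,	/* modelled here as: all 16 bits must match */
};

/* Hypothetical stand-in for the source-id fields of a remapping entry. */
struct irte_sid {
	enum svt_mode svt;
	uint16_t sid;
};

/* Return true if an interrupt from req_id would pass the check. */
static inline bool request_allowed(struct irte_sid e, uint16_t req_id)
{
	switch (e.svt) {
	case SVT_NO_VERIFY:
		return true;
	case SVT_VERIFY_SID_SQ:
		return req_id == e.sid;
	}
	return false;
}
```

Under this model, a strict entry for 0x0086 (00:10.6) rejects a request from 0x0128 (01:05.0), which matches the fault in the log. Both workarounds let the request through: a no-verify entry accepts it, and so does a strict entry hardcoded to 0x0128, though the latter only holds as long as that value never changes.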

Patch to disable the sid check for ITHC device only:

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index a67319597884..9f9322a17810 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -396,6 +396,22 @@ static int set_msi_sid(struct irte *irte, struct pci_dev *dev)
 	data.busmatch_count = 0;
 	pci_for_each_dma_alias(dev, set_msi_sid_cb, &data);
 
+	/*
+	 * The Intel Touch Host Controller is at 00:10.6, but for some reason
+	 * the MSI interrupts have request id 01:05.0.
+	 * Disable id verification to work around this.
+	 * FIXME Find proper fix or turn this into a quirk.
+	 */
+	if (dev->vendor == PCI_VENDOR_ID_INTEL && (dev->class >> 8) == PCI_CLASS_INPUT_PEN) {
+		switch(dev->device) {
+		case 0x98d0: case 0x98d1: // LKF
+		case 0xa0d0: case 0xa0d1: // TGL LP
+		case 0x43d0: case 0x43d1: // TGL H
+			set_irte_sid(irte, SVT_NO_VERIFY, SQ_ALL_16, 0);
+			return 0;
+		}
+	}
+
 	/*
 	 * DMA alias provides us with a PCI device and alias.  The only case
 	 * where the it will return an alias on a different bus than the

@Headcrabed

Any news on this? The firmware seems to have been updated since then. Did the ACPI tables change?

@quo
Owner Author

quo commented Nov 19, 2022

It looks like nosid may no longer be required on the new Alder Lake devices, so I suspect it was a hardware bug on Tiger Lake.

I don't know if something could be done on the ACPI side to fix/workaround the problem (I don't really know how ACPI interacts with the iommu).

I believe the above patch will be added to the Surface kernel, which will remove the need for using nosid with that kernel at least.

I think a proper fix will involve adding support to the kernel for DMA aliases with different bus numbers; then I could add an alias in the ithc driver. But someone who really understands how PCI MSI and the Intel iommu work should have a look at this.

Edit: I could also add some code to detect if the irq is working, and automatically switch to polling mode if it isn't. Not optimal, but maybe the easiest fix.
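
A rough sketch of what such a fallback decision might look like, written as plain C rather than driver code. Everything here is hypothetical (the names, the parameters, and the timeout policy); it only illustrates the idea of switching to polling when the device clearly has data but no interrupt has ever arrived.

```c
/* Hypothetical input modes for the driver. */
enum input_mode {
	MODE_IRQ,
	MODE_POLL,
};

/*
 * Decide which mode to use.
 *   irqs_seen      - interrupts received since the device was started
 *   events_pending - events found by reading device status directly
 *   waited_ms      - how long we have waited for the first interrupt
 *   timeout_ms     - how long we are willing to wait
 */
static inline enum input_mode pick_mode(unsigned int irqs_seen,
					unsigned int events_pending,
					unsigned int waited_ms,
					unsigned int timeout_ms)
{
	if (irqs_seen > 0)
		return MODE_IRQ;	/* interrupts demonstrably work */
	if (events_pending > 0 && waited_ms >= timeout_ms)
		return MODE_POLL;	/* data is waiting but no irq came */
	return MODE_IRQ;		/* no evidence yet, keep waiting */
}
```

The check only falls back when there is positive evidence of a blocked interrupt (pending data and an expired timeout), so an idle device is not pushed into polling mode.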
