Kernel errors with mainline coreboot: INVALID_DEVICE_REQUEST and IO_PAGE_FAULT #240
Comments
|
Have a look at this issue, which has the same symptoms as yours: https://github.com/pcengines/coreboot/issues/216 You might also disable the IOMMU via the kernel command line by adding amd_iommu=off |
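To confirm that the amd_iommu=off suggestion above actually reached the running kernel, the command line can be inspected. The following is a minimal sketch in Python; the helper name `kernel_param` is hypothetical, and the parser assumes plain whitespace-separated tokens (quoted values containing spaces are not handled).

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line.

    Returns None if the parameter is absent and an empty string for a
    bare flag without '='. Simplification: tokens are split on
    whitespace, so quoted values with spaces are not supported.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == name:
            return value
    return None
```

On a running Linux system, `kernel_param(open("/proc/cmdline").read(), "amd_iommu")` should return `"off"` once the option has been added to the boot loader configuration and the machine has been rebooted.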
|
Thanks, I missed that issue. Interesting that a kernel update might resolve this. But upgrading to Debian Testing is not an option for me right now, so I'll stick to not using IOMMU for now (I don't have a need for it anyway). I guess this can be closed and marked as a duplicate then. Duplicate of pcengines/coreboot#216 |
|
Meanwhile, I upgraded my machine to Debian Buster and coreboot v4.10.0.0. But despite what was suggested in issue pcengines/coreboot#216, this problem is not resolved by the newer 4.19 kernel. If I leave the IOMMU enabled, I see these page faults again: And when I disable the IOMMU via the kernel command line (amd_iommu=off), I get stack traces as described in issue pcengines/coreboot#323, despite having the ath9k module option "use_msi=1" set. So, I guess this issue is not fully resolved, at least with my particular setup, which comprises both a Compex WLE600VX and a WLE200NX card. Any ideas how to properly resolve this? For now I have reverted to the legacy firmware, where none of these issues occur. |
|
Since v4.11.0.2 the IOMMU is runtime configurable, with the default state being disabled. The IOMMU still needs some work in the firmware, and certain modules have to be handled differently. I have started some work on those issues. |
|
Yeah, I saw that mentioned in the release announcement. I just didn't have the time to give it a try until now. I have now upgraded to 4.11.0.2 and it seems to work. I'm a bit curious to see whether the stack traces mentioned in pcengines/coreboot#323 will start to show up again, but so far so good. Or is there a difference between turning the IOMMU off in the firmware and using the kernel command line option amd_iommu=off? Btw, kudos for making the option default to disabled. That saved me the time of hooking up a serial cable, etc. |
|
@silentcreek yes, there is a difference. By disabling the IOMMU in the BIOS, we do not expose any information about the IOMMU to the OS and we disable the device itself. When |
|
Also, I have been digging into the IOMMU recently and saw some misconfiguration. See my comment: https://github.com/pcengines/coreboot/issues/200#issuecomment-573575702 For sure we will make some updates soon. |
|
I see. Well, it seems to run fine now with the IOMMU disabled in the firmware. So, from my side this issue can be closed now unless you'd like to keep it open and test it again with IOMMU enabled when the configuration issues have been sorted out. Thanks! |
|
I think this kind of output originates from the WLE200NX NIC. By default it uses legacy IRQs, a mode that is not supported when using the IOMMU. So I would assume that these INVALID_DEVICE_REQUEST messages appear because interrupt remapping is not supported by the WLE200NX in legacy IRQ mode. You can try to enable message signaled interrupts with a module parameter. |
|
Thanks, I just tried enabling MSI for the ath9k module again. And while I get no errors during boot (with the IOMMU disabled in the firmware) and hostapd starts up fine, my SSIDs are not broadcast at all. None of my clients see the AP anymore. So, this doesn't work for me. And looking through the discussion leading up to MSI support for ath9k landing in the kernel, I doubt this has been tested much. It may work for some people, but certainly not generally; see https://patchwork.kernel.org/patch/9999249/ Has anyone here with a WLE200NX card tested AP functionality with use_msi=1 and had more luck than me? |
|
@silentcreek our validation can add this test in this or the next release. @miczyg1 @artur-rs cc |
|
@silentcreek I've experienced such issues with "Core Performance Boost" enabled. My Wi-Fi was very unstable and got latency spikes every now and then. When using Qualcomm/Atheros Wi-Fi NICs on my APU2C4 boards, I therefore always disable the "Core Performance Boost" feature. Intel Wi-Fi NICs experienced no performance degradation with "Core Performance Boost" enabled. So you may want to test again with this particular feature disabled. |
|
@pietrushnic That would be nice, of course. But before you go to great lengths to implement a new test, I would first test it manually to see whether the WLE200NX card works with MSI at all. Or at least wait until somebody confirms it works for them. @thillux I tested it again today with CPB disabled. The result is the same as before. The system boots without errors and hostapd starts up fine without complaining, but the SSIDs are either not broadcast or not seen by any client. Have you tested the WLE200NX card and got it working with MSI? At this point, I suspect it simply doesn't work with MSI, at least on this platform. When I look at /proc/interrupts, there seems to be no activity with MSI enabled. After booting, the interrupt counter for ath9k stays at a very low number (~20-30). In normal operation with MSI disabled, I can see thousands of interrupts within minutes after booting. Now, about CPB and my two Qualcomm Atheros wireless cards: I have been using the legacy firmware releases with CPB enabled for about 9 months now and never experienced any issues. It was only with the mainline releases and the IOMMU enabled, or now when trying to use MSI with the ath9k driver, that I saw errors or a lack of functionality. So, I don't see any reason to disable CPB at this point. With the IOMMU disabled in mainline coreboot (or with the legacy releases that don't support the IOMMU), everything just works. I do use one non-default setting related to the CPU frequency, though. I set the minimum frequency for the cpufreq ondemand governor to 800 MHz, so it doesn't scale down as much as it would by default. I'm not sure, though, whether that difference would be enough to explain why it just works for me but doesn't for you. |
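The /proc/interrupts check described above can be made repeatable with a small script. This is a rough sketch in Python, not a tested diagnostic tool: the function name `irq_totals` is hypothetical, and it assumes the usual layout of /proc/interrupts (a header line naming the CPUs, then one line per IRQ with per-CPU counters followed by the controller and handler names).

```python
from typing import Dict


def irq_totals(interrupts_text: str, device: str) -> Dict[str, int]:
    """Sum per-CPU interrupt counts for every IRQ line that names `device`.

    `interrupts_text` is the content of /proc/interrupts. The result maps
    each matching IRQ label to its total count across all CPUs.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line: CPU0 CPU1 ...
    totals: Dict[str, int] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Remaining fields describe the controller and the handlers;
        # shared IRQs list several handlers separated by commas.
        names = [f.rstrip(",") for f in fields[len(counts):]]
        if device in names:
            totals[label.strip()] = sum(counts)
    return totals
```

Calling `irq_totals(open("/proc/interrupts").read(), "ath9k")` twice, a few minutes apart, shows whether the counter is actually advancing. A total that stays in the tens, as reported above with use_msi=1, suggests the card is not delivering interrupts in that mode.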
|
@pietrushnic Oh, I just realized you might have been talking about something else. In case you didn't mean my issue with the use_msi option in the ath9k driver, but rather my original issue, you may as well ignore my previous comment ;) |
|
@silentcreek it is our job to introduce such a test into regression if it makes sense, so your comment is valid IMO. Thank you for contributing to this issue and supporting PC Engines open-source firmware effort. |
Hi,
with 4.9.0.3 I see lots of kernel errors that seem to be related to IOMMU that I haven't seen before:
These messages go on and on. Now, it might well be the case that these errors are not limited to version 4.9.0.3 and would also occur on earlier versions. I recently added wireless mini PCIe cards to my device and I haven't used them with earlier mainline releases, so I can't tell. It might also be the case that these errors stem from this issue: https://github.com/pcengines/coreboot/issues/200
What I can tell for sure though, is that I don't see these errors on 4.0.25.