
Kernel errors with mainline coreboot: INVALID_DEVICE_REQUEST and IO_PAGE_FAULT #240

Open
silentcreek opened this issue Mar 20, 2019 · 15 comments

@silentcreek

Hi,

with 4.9.0.3 I see lots of kernel errors, apparently related to the IOMMU, that I haven't seen before:

kernel: [   13.134460] AMD-Vi: Event logged [
kernel: [   13.137925] INVALID_DEVICE_REQUEST device=00:00.1 address=0x000000fdf8000020 flags=0x0a00]
kernel: [   11.773399] AMD-Vi: Event logged [
kernel: [   11.776863] INVALID_DEVICE_REQUEST device=00:00.1 address=0x000000fdf8000020 flags=0x0a00]
kernel: [  221.562629] AMD-Vi: Event logged [
kernel: [  221.565929] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff6101b0 flags=0x0070]
kernel: [  221.565938] AMD-Vi: Event logged [
kernel: [  221.569235] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff6101c0 flags=0x0070]
kernel: [  221.569244] AMD-Vi: Event logged [
kernel: [  221.572523] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610200 flags=0x0070]
kernel: [  221.572531] AMD-Vi: Event logged [
kernel: [  221.575783] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610240 flags=0x0070]
kernel: [  221.575791] AMD-Vi: Event logged [
kernel: [  221.579072] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610280 flags=0x0070]
kernel: [  221.579080] AMD-Vi: Event logged [
kernel: [  221.582330] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff6102c0 flags=0x0070]
kernel: [  221.582339] AMD-Vi: Event logged [
kernel: [  221.585637] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610300 flags=0x0070]
kernel: [  221.585646] AMD-Vi: Event logged [
kernel: [  221.589154] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610340 flags=0x0070]
kernel: [  221.589164] AMD-Vi: Event logged [
kernel: [  221.592445] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610360 flags=0x0070]
kernel: [  221.592455] AMD-Vi: Event logged [
kernel: [  221.595707] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff6101b0 flags=0x0070]
kernel: [  221.595715] AMD-Vi: Event logged [
kernel: [  221.599131] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff6101c0 flags=0x0070]
kernel: [  221.599139] AMD-Vi: Event logged [
kernel: [  221.602398] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610200 flags=0x0070]
kernel: [  221.602407] AMD-Vi: Event logged [
kernel: [  221.605697] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610240 flags=0x0070]
kernel: [  221.605705] AMD-Vi: Event logged [
kernel: [  221.608941] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff6102c0 flags=0x0070]
kernel: [  221.608949] AMD-Vi: Event logged [
kernel: [  221.612198] IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=0x00000000ff610280 flags=0x0070]

These messages go on and on. It might well be that these errors are not limited to version 4.9.0.3 and would also occur with earlier versions. I recently added wireless mini PCIe cards to my device and haven't used them with earlier mainline releases, so I can't tell. These errors might also stem from this issue: https://github.com/pcengines/coreboot/issues/200

What I can say for sure, though, is that I don't see these errors on 4.0.25.
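When a log floods like this, it can help to count events per type and device before digging further. The following is a minimal Python sketch, not a PC Engines or kernel tool; the function name `summarize_amd_vi_events` is hypothetical, and the pattern only recognizes the older `EVENT_TYPE device=BB:DD.F ...` line format shown above (newer kernels format these events differently).

```python
import re
from collections import Counter

# Matches the older AMD-Vi event format shown above, e.g.
# "IO_PAGE_FAULT device=05:00.0 domain=0x0012 address=... flags=0x0070]"
EVENT_RE = re.compile(
    r"\b(INVALID_DEVICE_REQUEST|IO_PAGE_FAULT)\s+device=([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def summarize_amd_vi_events(log_text: str) -> Counter:
    """Count AMD-Vi events per (event type, PCI bus:device.function) pair."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Applied to the excerpt above, it would report 15 IO_PAGE_FAULT events for 05:00.0 and 2 INVALID_DEVICE_REQUEST events for 00:00.1; `lspci -s 05:00.0` then shows which card sits at that address.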

@miczyg1
Member

miczyg1 commented Mar 20, 2019

Have a look at this issue, which has the same symptoms as yours: https://github.com/pcengines/coreboot/issues/216

You can also disable the IOMMU via the kernel command line by adding amd_iommu=off.
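To confirm that the option actually reached the running kernel, one can inspect `/proc/cmdline`. The following is a simplified Python sketch; the helper names are hypothetical, and it only checks whether `off` appears among the comma-separated values of an `amd_iommu=` token rather than reproducing the kernel's own parsing. On Debian, the option is usually added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`.

```python
from pathlib import Path


def amd_iommu_off(cmdline: str) -> bool:
    """Return True if any amd_iommu= parameter lists "off" among its values.

    Simplified check: parameters are split on whitespace, and the value of
    amd_iommu= is assumed to be a comma-separated list.
    """
    for token in cmdline.split():
        if token.startswith("amd_iommu="):
            values = token.split("=", 1)[1].split(",")
            if "off" in values:
                return True
    return False


def running_kernel_has_amd_iommu_off(proc_cmdline: Path = Path("/proc/cmdline")) -> bool:
    """Apply the check to the command line of the running kernel."""
    return amd_iommu_off(proc_cmdline.read_text())
```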

@silentcreek
Author

Thanks, I missed that issue. Interesting that a kernel update might resolve this. But upgrading to Debian Testing is not an option for me right now, so I'll stick to not using the IOMMU (I don't need it anyway).

I guess this can be closed and marked as a duplicate then.

Duplicate of pcengines/coreboot#216

@silentcreek
Author

Meanwhile, I upgraded my machine to Debian Buster and coreboot v4.10.0.0. But despite what was suggested in pcengines/coreboot#216, the newer 4.19 kernel does not resolve this problem. If I leave the IOMMU enabled, I see these page faults again:
kernel: [ 99.558448] ath10k_pci 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fffab950 flags=0x0070]

And when I disable the IOMMU via the kernel command line (amd_iommu=off), I get stack traces as described in pcengines/coreboot#323, despite having set the ath9k module option "use_msi=1".

So I guess this issue is not fully resolved, at least with my particular setup, which consists of both a Compex WLE600VX and a WLE200NX card.

Any ideas how to properly resolve this? For now, I have reverted to the legacy firmware, where none of these issues occur.

@silentcreek silentcreek reopened this Aug 24, 2019
@silentcreek silentcreek changed the title Kernel errors with 4.9.0.3: INVALID_DEVICE_REQUEST and IO_PAGE_FAULT Kernel errors with mainline coreboot: INVALID_DEVICE_REQUEST and IO_PAGE_FAULT Aug 24, 2019
@miczyg1
Member

miczyg1 commented Jan 7, 2020

Since v4.11.0.2, the IOMMU is runtime-configurable and disabled by default.

The IOMMU still needs some work in the firmware, and certain modules have to be handled differently. I have started working on those issues.

@silentcreek
Author

Yeah, I saw that mentioned in the release announcement. I just didn't have time to give it a try until now. I have now upgraded to 4.11.0.2 and it seems to work. I'm a bit curious whether the stack traces mentioned in pcengines/coreboot#323 will show up again, but so far, so good.

Or is there a difference between turning the IOMMU off in the firmware and using the kernel command line option amd_iommu=off?

Btw, kudos for making the option default to disabled. That saved me from having to hook up a serial cable, etc.

@miczyg1
Member

miczyg1 commented Jan 13, 2020

@silentcreek yes, there is a difference. When the IOMMU is disabled in the BIOS, we do not expose any information about it to the OS, and we disable the device itself. When amd_iommu=off is in effect, the IOMMU device is still enabled by the BIOS (in the case of earlier versions), and the information about the IOMMU (memory address, handled devices, etc.) is still in place, which may cause those errors.
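One way to observe this difference from a running Linux system is to check whether the firmware still publishes the ACPI IVRS table (the table through which AMD IOMMUs are described to the OS) and whether the kernel has registered any IOMMU units under `/sys/class/iommu` (AMD units typically appear as `ivhd0` and so on). The Python sketch below assumes the usual sysfs layout, and the function name is hypothetical. The interpretation is an assumption, not verified on an APU board: with the IOMMU disabled in the firmware, neither should be present; with `amd_iommu=off`, the IVRS table would still be there but no units registered.

```python
from pathlib import Path
from typing import Dict, List, Union


def iommu_visibility(root: Path = Path("/")) -> Dict[str, Union[bool, List[str]]]:
    """Report whether firmware publishes an IVRS table and which IOMMUs the kernel registered.

    Assumed interpretation: IVRS present + no units suggests amd_iommu=off;
    neither present suggests the IOMMU is disabled (or not described) by firmware.
    """
    # ACPI tables provided by the firmware are exposed by signature under this directory.
    ivrs = root / "sys/firmware/acpi/tables/IVRS"
    # IOMMU units registered by the kernel driver appear here.
    iommu_class = root / "sys/class/iommu"
    units = sorted(p.name for p in iommu_class.iterdir()) if iommu_class.is_dir() else []
    return {"ivrs_table": ivrs.exists(), "kernel_iommus": units}
```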

@miczyg1
Member

miczyg1 commented Jan 13, 2020

Also, I have been digging into the IOMMU recently and found some misconfiguration. See my comment: https://github.com/pcengines/coreboot/issues/200#issuecomment-573575702

We will certainly make some updates soon.

@silentcreek
Author

I see. Well, it seems to run fine now with the IOMMU disabled in the firmware. So, from my side, this issue can be closed now, unless you'd like to keep it open and test again with the IOMMU enabled once the configuration issues have been sorted out. Thanks!

@thillux

thillux commented Jan 18, 2020

I think this kind of output originates from the WLE200NX NIC. By default, it uses legacy IRQs, a mode that is not supported when the IOMMU is in use. So I would assume that these INVALID_DEVICE_REQUEST messages appear because interrupt remapping is not supported by the WLE200NX in legacy IRQ mode. You can try to enable message signaled interrupts (MSI) with a module parameter.
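A minimal sketch of setting the parameter persistently and checking that it took effect, assuming a standard modprobe setup: the `options` line goes into a file under `/etc/modprobe.d/` (the file name `ath9k.conf` is arbitrary), and after the module is reloaded (depending on the setup, the initramfs may also need regenerating), the active value can be read back from sysfs. The helper name is hypothetical, and the sketch assumes the `use_msi` parameter is exposed under `/sys/module/ath9k/parameters/`.

```python
from pathlib import Path
from typing import Optional

# Line for a file such as /etc/modprobe.d/ath9k.conf (the file name is arbitrary).
MODPROBE_CONF = "options ath9k use_msi=1\n"


def ath9k_use_msi(sys_root: Path = Path("/sys")) -> Optional[int]:
    """Return the use_msi value of the loaded ath9k module, or None if it is not visible."""
    param = sys_root / "module/ath9k/parameters/use_msi"
    if not param.exists():
        return None  # module not loaded, or parameter not exposed in sysfs
    return int(param.read_text().strip())
```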

@silentcreek
Author

Thanks, I just tried enabling MSI for the ath9k module again. While I get no errors during boot (with the IOMMU disabled in the firmware) and hostapd starts up fine, my SSIDs are not broadcast at all. None of my clients see the AP anymore. So this doesn't work for me.

Looking through the discussion that led to MSI support for ath9k landing in the kernel, I doubt this has been tested much; it may work for some people, but certainly not in general. See https://patchwork.kernel.org/patch/9999249/

Has anyone here with a WLE200NX card tested AP functionality with use_msi=1 and had more luck than me?

@pietrushnic
Member

@silentcreek we can add this test to our validation in this or the next release. cc @miczyg1 @artur-rs

@thillux

thillux commented Jan 19, 2020

@silentcreek I've experienced such issues with "Core Performance Boost" (CPB) enabled. My Wi-Fi was very unstable and had latency spikes every now and then. When using Qualcomm/Atheros Wi-Fi NICs on my APU2C4 boards, I therefore always disable the "Core Performance Boost" feature. Intel Wi-Fi NICs showed no performance degradation with "Core Performance Boost" enabled. So you may want to test again with this feature disabled.

@silentcreek
Author

@pietrushnic That would be nice, of course. But before you go to great lengths to implement a new test, I would first test manually whether the WLE200NX card works with MSI at all, or at least wait until somebody confirms it works for them.

@thillux I tested it again today with CPB disabled. The result is the same as before: the system boots without errors and hostapd starts up without complaining, but the SSIDs are either not broadcast or not seen by any client. Have you tested the WLE200NX card and got it working with MSI? At this point I suspect it simply doesn't work with MSI, at least on this platform. When I look at /proc/interrupts, there seems to be no activity with MSI enabled: after booting, the interrupt counter for ath9k stays at a very low number (~20-30). In normal operation with MSI disabled, I see thousands of interrupts within minutes of booting.
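Comparing two snapshots of `/proc/interrupts` taken some time apart makes this observation easy to repeat. The Python sketch below sums the per-CPU counters of every IRQ line whose device column mentions a given name (for example `ath9k`); the function names are hypothetical, and the parsing assumes the usual layout of a `CPU0 CPU1 ...` header followed by lines of the form `IRQ: counters... chip trigger devices`.

```python
from typing import Dict


def irq_counts(proc_interrupts: str, name: str) -> Dict[str, int]:
    """Sum per-CPU counters for each IRQ whose device column mentions `name`."""
    lines = proc_interrupts.splitlines()
    ncpu = len(lines[0].split())  # header line: "CPU0 CPU1 ..."
    result: Dict[str, int] = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counters = fields[1:1 + ncpu]
        rest = fields[1 + ncpu:]  # interrupt chip, trigger type and device names
        if len(counters) < ncpu or not all(c.isdigit() for c in counters):
            continue  # summary lines such as "ERR:" carry a single counter
        if any(name in token for token in rest):
            result[fields[0].rstrip(":")] = sum(int(c) for c in counters)
    return result


def irq_delta(before: str, after: str, name: str) -> Dict[str, int]:
    """Interrupts per matching IRQ between two /proc/interrupts snapshots."""
    old = irq_counts(before, name)
    return {irq: n - old.get(irq, 0) for irq, n in irq_counts(after, name).items()}
```

With MSI working, the matching line would typically show a `PCI-MSI` chip and a delta that keeps growing while clients are associated; a delta that stays near zero matches the symptom described above.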

Now, about CPB and my two Qualcomm Atheros wireless cards: I have been using the legacy firmware releases with CPB enabled for about 9 months now and have never experienced any issues. It was only with the mainline releases, with the IOMMU enabled or now when trying to use MSI with the ath9k driver, that I saw errors or a lack of functionality. So I don't see any reason to disable CPB at this point. With the IOMMU disabled in mainline coreboot (or with the legacy releases, which don't support the IOMMU), everything just works. I do use one non-default setting related to the CPU frequency, though: I set the minimum frequency for the cpufreq ondemand governor to 800 MHz, so it doesn't scale down as much as it would by default. I'm not sure, though, whether that difference is enough to explain why it just works for me but doesn't for you.
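For reference, a similar minimum-frequency setting can be applied through the standard cpufreq sysfs interface, where `scaling_min_freq` is expressed in kHz (800 MHz = 800000 kHz). This is a hedged Python sketch, not necessarily how the setting above was applied (distributions also offer tools for this); the function name is hypothetical, writing requires root, and the change does not persist across reboots.

```python
from pathlib import Path
from typing import List


def set_scaling_min_freq(mhz: int, sys_root: Path = Path("/sys")) -> List[Path]:
    """Write a minimum frequency (given in MHz) to every CPU's cpufreq policy; return the files written."""
    khz = mhz * 1000  # cpufreq sysfs values are expressed in kHz
    written: List[Path] = []
    for path in sorted(sys_root.glob("devices/system/cpu/cpu[0-9]*/cpufreq/scaling_min_freq")):
        path.write_text(f"{khz}\n")
        written.append(path)
    return written
```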

@silentcreek
Author

@pietrushnic Oh, I just realized you might have been talking about something else. In case you didn't mean my issue with the use_msi option in the ath9k driver, but rather my original issue, you may as well ignore my previous comment ;)

@pietrushnic
Member

@silentcreek it is our job to introduce such a test into regression if it makes sense, so your comment is valid IMO. Thank you for contributing to this issue and supporting PC Engines open-source firmware effort.

@damiankaras damiankaras reopened this Oct 18, 2021
@damiankaras damiankaras transferred this issue from pcengines/coreboot Oct 18, 2021
5 participants