Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APU2] Kernel error: gpio-keys-polled gpio-keys-polled: unable to claim gpio 0, err=-517 #204

Open
silentcreek opened this issue Oct 2, 2021 · 32 comments
Assignees

Comments

@silentcreek
Copy link

Hi,

after upgrading my APU2 from Debian 10 (Kernel 4.19) to Debian 11 (Kernel 5.10) I'm getting a couple of these error messages at boot time:
kernel: gpio-keys-polled gpio-keys-polled: unable to claim gpio 0, err=-517

I'm also noticing that the first LED stays off which was not the case before.

I found a forum discussion [1] that suggests that this issue should be fixed in Linux 5.3 and that this is likely a module ordering issue. The former can't be true since my kernel is newer than 5.3, obviously. And as for the latter: I tried explicitely loading the mentioned modules leds-gpio and gpio_amd_fch in different orders by adding a cofiguration file in /etc/modules-load.d/ but that didn't work.

I also tried upgrading to the latest coreboot version v4.14.0.4 (was on some 4.12.x release before), but that didn't help either.

Does anyone know how to fix this? Thanks!

Timo

[1] http://pcengines.spruz.com/forums/?page=post&id=F1D75D99-D369-4310-B098-2F222A3CDBAC&fid=1A77794F-FF7D-44CA-AF64-CAA2588102ED

@miczyg1
Copy link
Member

miczyg1 commented Oct 4, 2021

HI, we have just recently noticed the issue. I doubt it is a module ordering issue, because the ACPI GPIOs definitions used by gpio-keys-polled come from the GPIO controller which binds to gpio_amd_fch, so one shouldn't work without the other. THe real problem lies within the Linux kernel itself... There is a separate driver added to the kernel for apus: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/pcengines-apuv2.c
So basically the error indicates that these GPIOs have been claimed by the driver already. I think we have no other choice but to remove the ACPI bindings for the GPIOs

@miczyg1 miczyg1 self-assigned this Oct 4, 2021
@silentcreek
Copy link
Author

Okay, thanks. I think I understand now... So, we have the modules pcengines_apuv2 and leds_gpio (or gpio_keys_polled?) fighting over the gpio control of these leds. That means as a workaround, we could simply blacklist pcengines_apuv2 until a better fix lands either in the kernel or firmware. I actually just tried that (blacklisting pcengins_apuv2) and the error messages are gone now, while the leds are still operational (and the first led is now on again, once the APU2 is booted). So, that seems to work. Is there any crucial functionality missing if I blacklist this module? I'm guessing no, since the module wasn't available on my older Debian 10 kernel anyway and looking at the code you linked, the only other thing that seems interesting to me is the front button, but I have never used it before either, so I guess I could sacrifice it for a temporary workaround.

@miczyg1
Copy link
Member

miczyg1 commented Oct 5, 2021

There is no difference between ACPI definitions of the buttons and leds and the kernel driver. Bot support the leds and front button in the same manner. However:

  • the ACPI is better because it will expose the leds and buttons specific to the platform, so the kernel does not need to know which GPIOs on which apu variant is used, e.g. apu5 can be also supported in such way
  • kernel driver will not support apu5 because it lacks GPIO definitions for it, but it will work for any firmware version unlike ACPI definitions which were added at some point in mainline firmware

There is no good-for-all solution in this case. If you use the latest firmware, then you are probably better blacklisting the kernel driver, because ACPI will handle it anyway.

@silentcreek
Copy link
Author

silentcreek commented Oct 5, 2021

There is no difference between ACPI definitions of the buttons and leds and the kernel driver.
[...] If you use the latest firmware, then you are probably better blacklisting the kernel driver, because ACPI will handle it anyway.

Just to clarify: By kernel driver you mean the module pcengines_apuv2?

@miczyg1
Copy link
Member

miczyg1 commented Oct 5, 2021

Just to clarify: By kernel driver you mean the module pcengines_apuv2?

Indeed

chr0mag pushed a commit to chr0mag/linux-wireless-router that referenced this issue Oct 20, 2021
Use ACPI/GPIO to handle LED config instead.

pcengines/apu2-documentation#204
@silentcreek
Copy link
Author

@miczyg1 Is the newly released firmware v4.14.0.5 supposed to fix this issue? Going by the release notes, I assumed so, but apparently, it does not.

I installed v4.14.0.5 today. After rebooting, the LEDs stay turned off and there are no sysfs entries for them anymore – which is to be expected, because I had blacklisted the pcengines_apuv2 module as a workaround before. If I undo this blacklisting configuration, the LEDs are functional again, but the kernel error messages from gpio-key-polled show up again. So maybe the ACPI GPIO entries are not the root cause for these error messages?

@miczyg1
Copy link
Member

miczyg1 commented Oct 25, 2021

@silentcreek we will be testing the pcengines_apuv2 again to let you know. @mkopec did the errors occur during the removal of ACPI bindings?

@bsdice
Copy link

bsdice commented Oct 28, 2021

Hey there, leaving this here for posterity and people finding it via search.

Just received one of the last APU 4D4 right before the Great Shortage. Flashed last available apu4_v4.14.0.5.rom to it. Since I prefer stability and need some patches, I compiled long-term supported kernel 5.4.156 on Arch. With this version LED control is unavailable because of the ACPI GPIO change in apu4_v4.14.0.5.rom vs apu4_v4.14.0.4.rom. Took me a fair bit of digging to figure this out. I recommend on page https://pcengines.github.io/index.html in known issues a new item:

  • LED support with Linux LTS kernel 5.4 is broken since 4.14.0.5, workaround: Use 4.14.0.4.

Support for 5.4 will end only in December 2025 according to https://en.wikipedia.org/wiki/Linux_kernel_version_history so a fair bit of life is left in this one. LEDs will show up in /sys/class/leds/ as per usual. I added ledtrig-heartbeat and ledtrig-netdev to /etc/modules-load.d/apu.conf and removed all blacklisted modules in /etc/modprobe.d, they are not needed.

Example:

echo 1 > "/sys/class/leds/apu4:green:led1/brightness"
echo heartbeat > "/sys/class/leds/apu4:green:led2/trigger"
echo disk-activity > "/sys/class/leds/apu4:green:led3/trigger"

Happy blinking. ;-)

@silentcreek
Copy link
Author

@bsdice Have you tried loading the module pcengines_apuv2 or is your custom kernel not compiled with CONFIG_PCENGINES_APU2?

@bsdice
Copy link

bsdice commented Oct 28, 2021

@bsdice Have you tried loading the module pcengines_apuv2 or is your custom kernel not compiled with CONFIG_PCENGINES_APU2?

@silentcreek I have CONFIG_PCENGINES_APU2=m but it is of no use because if you look at https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/platform/x86/pcengines-apuv2.c?h=linux-5.4.y there is no support for "apu4" in 5.4. There is an even older driver, https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/leds/leds-apu.c?h=linux-5.4.y which supports only APU1. This is also why on kernel 5.4 I get no conflicts when using a APU 4D4 board, simply not supported by the specialized drivers.

For newer kernels indeed it would appear that on BIOS 4.14.0.4 and below one could blacklist pcengines-apuv2 module and rely on ACPI support of leds-gpio. Whereas on BIOS 4.14.0.5, ACPI is no longer supported and you need a more recent kernel >= 5.5 and the updated pcengines-apuv2 module with APU4 support, see torvalds/linux@3d00da1#diff-c49c6019d36475a9c027f716107d819089d60cb74455e851cad9bf62f4d1e643

S1 seems available through /proc/interrupts on my system and raw GPIO access to switch over SIM slots etc. is also there by using /sys/class/gpio/gpiochip320/ngpio et al.

@silentcreek
Copy link
Author

@bsdice Ah, sorry, I completly disregarded the fact that you have a different model, namely the APU4. Then I guess that change between v4.14.0.4 and v4.14.0.5 really did more harm than good and should be reverted after all...

@mkopec
Copy link
Member

mkopec commented Nov 2, 2021

I have done some testing with the the pcengines-apuv2 and came to the conclusion that the driver does not in fact conflict with the ACPI bindings, and the the error is caused by the gpio-keys-polled driver not implementing the interface that the pcengines-apuv2 driver uses to export the GPIOs. The leds-gpio driver implements this interface and there are no errors from the leds-gpio driver in firmware v4.14.0.4, there are just duplicate entries in /sys/class/leds.

This issue seems to be fixed in the pcengines-apuv2 driver author's personal Linux fork: https://github.com/metux/linux/commits/apu2/defconf-5.5.0/drivers (see the last 3 commits), unfortunately not all of their fixes have been accepted or submitted into the kernel yet.

This, combined with the fact that the driver is only present in recent kernel versions, makes me think that the ACPI bindings should be restored.

@miczyg1 what do you think?

@miczyg1
Copy link
Member

miczyg1 commented Nov 2, 2021

@mkopec in such case we need to revert the changes which removed the ACPI bindings and explain in the documentation that the kernel driver is buggy and shall be blacklisted. Later we may try to update the status of these commits and let people know if these 3 commits work or not.

@macpijan
Copy link
Member

macpijan commented May 6, 2022

The documentation was updated here: https://pcengines.github.io/apu2-documentation/linux_issues/

@bsdice
Copy link

bsdice commented May 6, 2022

Just leaving this here for folks finding this through a search engine...

People running Arch may want to check out my Linux kernel 5.4 build script here: https://seitics.de/files/linux-5.4/

Features:

  • PC Engines APU4 patches, mentioned in the docs now. My pcengines-apuv2 will leave LEDs to ACPI and only offer gpio and button support. With this patch reboot using front button also works, because it delivers the correct scan code to systemd that is watching inputs.
  • Kernel is built with CONFIG_PINCTRL_AMD=y to avoid a module loading/init race seen on APU as well. These races are fixed only in the very very latest of the Linux kernels AFAIR.
  • Only thing I have in /etc/modules-load.d/apu.conf is
    i2c-dev
    ledtrig-heartbeat
    ledtrig-pattern
    ledtrig-netdev
  • Script will pull the latest 5.4 from AUR (currently 5.4.191, maintained by Jonathon Fernyhough @jf1077)
  • Build is in RAM (I have a 64 GB machine), so you need tons of it and a modified systemd logind like so:
    [Login]
    RuntimeDirectorySize=50%
    RuntimeDirectoryInodesMax=1000000
  • Kernel will not include htmldocs because waste of space, usually.
  • Kernel will include Wireguard 1.0.0, unrelated.
  • Kernel will include some ath9k patches to debug bands; do not run unsupervised; needs a different wireless-regdb package and crda package to be used properly, unrelated.
  • Mellanox patch to support virtual functions on mlx4, unrelated.
  • If you call the script with parameter debug, it will build a kernel with full debug symbols and a crashkernel gz for /var/crash.
  • This kernel is supported by gregkh et al until Dec 2025. Yep, Poettering is wrong. ;-)

Contact info is at the top of the domain. I've uploaded a test kernel+include package for Arch so it is easy to try. No warranties. I use it on Arch on APU4D4.

@metux
Copy link

metux commented Feb 21, 2023

Hi,

speaking as the official kernel maintainer of the apu board family driver.

Sorry for replying so late, but i just found this ticket only by accident - obviously nobody here ever considered
reporting a Linux kernel issue to the official maintainer.

after upgrading my APU2 from Debian 10 (Kernel 4.19) to Debian 11 (Kernel 5.10) I'm getting a couple of these error messages at boot time: kernel: gpio-keys-polled gpio-keys-polled: unable to claim gpio 0, err=-517

Due lack of exact information on your Kernel configuration, I can only guess:

a) you still got the old (meanwhile removed) apu1 driver running, and these two are conflicting
(not sure whether Debian ever enabled the old one). Solution: kernel upgrade or delete the old driver

b) you catched the intermediate BROKEN firmware versions, which added some arbitrary, unthoughtful and
incomplete "gpio support" to the ACPI tables. This one highjacks the driver or a completely different device,
just similar enough (both have some common ancestor) to make some things work by accident.
But this one is DANGEROUS: it hands out the whole address space of what looks like a gpio bank
as gpio devices, even though many of the registers are something entirely different, some gpios are used
internally in the SoC (chip designers forgot to unwire the registers).
Do NOT try to use these gpios, unless you know exactly what you're doing - you may break your hardware !
(that actually happened to me, that's why I wrote the new driver and mapped only the few documented and
successfully tested gpios).

I'm also noticing that the first LED stays off which was not the case before.

Works as designed. The kernel should never mess w/ gpios or leds on its, unless there's a special need for that.
The LEDs don't have any hardcoded function, the decision is entirely user/application specific.

I found a forum discussion [1] that suggests that this issue should be fixed in Linux 5.3 and that this is likely a module ordering issue. The former can't be true since my kernel is newer than 5.3, obviously.

Yes, there was a problem that leds_gpio or gpio_amd_fch might not have been loaded, when pcengines-apv2 tries to initialize them. Only happens if these are compiled as modules. You can test this by loading them in the correct order
manually (command line):

  1. gpio_amd_fch
  2. leds-gpio
  3. pcengines-apuv2

And you should blacklist (or not compile) amd-pinctrl - this is the wrong driver, which might be triggered by broken ACPI tables.

And as for the latter: I tried explicitely loading the mentioned modules leds-gpio and gpio_amd_fch in different orders by adding a cofiguration file in /etc/modules-load.d/ but that didn't work.

Do some gpio or led devices appear w/o loading the apu board driver ?
Is there some ACPI device "AMD0030" ?

In this case, you're one of the unlucky ones who got a broken firmware (see above).
You should certainly blacklist pinctrl-amd.

--mtx

@metux
Copy link

metux commented Feb 21, 2023

HI, we have just recently noticed the issue. I doubt it is a module ordering issue, because the ACPI GPIOs definitions used by gpio-keys-polled come from the GPIO controller which binds to gpio_amd_fch, so one shouldn't work without the other.

The APU board driver doesn't use APCI at all - and there shouldn't be any gpio ACPI bindings at all, since APCI just can't describe the whole setup in a sane way.

And certainly: DO NOT USE AMD0030 (!!!) - it's the WRONG device, and risk hardware damage.

I think we have no other choice but to remove the ACPI bindings for the GPIOs

The APCI bindings (which were introduced much later than the pcengines board driver - before you that, you folks actually suggest playing dangerous games with raw register access from userland!) have been broken from the beginning and should have never been introduced w/o at least speaking to the kernel maintainer.

You broke it, not us.

@metux
Copy link

metux commented Feb 21, 2023

Okay, thanks. I think I understand now... So, we have the modules pcengines_apuv2 and leds_gpio (or gpio_keys_polled?) fighting over the gpio control of these leds.

You have the (stable) apu board driver fighting with broken newer firmware. Downgrade the firmware.

Is there any crucial functionality missing if I blacklist this module?

Depending on firmware version: several gpios (eg. reset lines), different and now suddenly model specific naming of leds, different keycode, ... or maybe loosing it all.

And by the way: this is isn't the correct place for discussing kernel issues - just for reporting broken firmware.

--mtx

@bsdice
Copy link

bsdice commented Feb 21, 2023

@metux Thanks for the insight. I perceive an aggressive tone though. Why that? I realize this is sometimes needed on LKML and Torvalds has been doing it for decades, but this isn't LKML... ;-) I think you just made people delete their posts because I see comments in my inbox which are no longer here.

Anyhow. APU2 seems EOL because of chip shortages. There hasn't been a Coreboot update since half a year, so I presume a contract with Pascal ran out. Which probably means this ACPI issue will never be fixed in Coreboot PCengines branch. I am inclined to patch it locally and compile one last final ROM for APU4D4 myself. Because userland obviously should never be able to damage hardware.

I also checked kernel 6.2 and somewhere in last months CONFIG_PINCTRL_AMD became bool, so can't make it a module anymore and blacklist it. Would need to disable it entirely. I realize in the field out there basically no-one will be running 6.2.

@metux
Copy link

metux commented Feb 21, 2023

There is no difference between ACPI definitions of the buttons and leds and the kernel driver.

There's a lot. If you had talked to me (or maybe asked Pascal + friends - I had explained it detail), before making such wild hacks, you would have known, that this statement is wrong:

The apu board driver adds only those gpios that are known to be safe, the LED names are not model specific.
But the APCI definitions make the LEDs model specific, change the key code, drop the gpio naming, instantiate the wrong driver and expose the whole register bank - including entirely different things - as gpios.

  • the ACPI is better because it will expose the leds and buttons specific to the platform, so the kernel does not need to know which GPIOs on which apu variant is used, e.g. apu5 can be also supported in such way

As said, it names the LEDs in HW specific ways, thus userland needs to change again for every single board model.
userland now also needs to be changed for different key code, and we loose other gpio like reset lines.

  • kernel driver will not support apu5 because it lacks GPIO definitions for it,

Support is under way. But since hw/firmware vendor is so uncooperative, not even attempts to talk to the official kernel maintainer, I didn't have this on high priority.

There is no good-for-all solution in this case.

No, that's a very bad solution. As said: existing userland - in the field (or even on high seas) for many years - now needs to be changed. And it's using the WRONG driver.

And no: ACPI just cannot describe the full setup in a sane way.

@metux
Copy link

metux commented Feb 21, 2023

I have done some testing with the the pcengines-apuv2 and came to the conclusion that the driver does not in fact conflict with the ACPI bindings, and the the error is caused by the gpio-keys-polled driver not implementing the interface that the pcengines-apuv2 driver uses to export the GPIOs. The leds-gpio driver implements this interface and there are no errors from the leds-gpio driver in firmware v4.14.0.4, there are just duplicate entries in /sys/class/leds.

Whohoh! You're binding two drivers to the same register space ? Good luck that.

As the mess w/ broken firmware and uncorpative vendor continues, I'm really considering APCI blacklisting.

This, combined with the fact that the driver is only present in recent kernel versions, makes me think that the ACPI bindings should be restored.

This breaks stable distros and make industrial customers unhappy.

Why should anybody still spend so much money for such an unreliable platform instead of some cheap china stuff from Wish or Aliexpress ?

--mtx

@metux
Copy link

metux commented Feb 21, 2023

@metux Please see https://pcengines.github.io/apu2-documentation/linux_issues/ for the rationale as to why the ACPI code for LEDs and GPIO was restored. The error message still appeared for users after we removed the ACPI bindings.

Your "fix" made it worse. Loosing existing functionality and breaking existing userland.
And the commits this page mentions are quite unrelated to this.

Why didn't you just talk to me first ? Such things don't work well by a one man show. We have good reasons for our extensive open discussions and review process.

@metux
Copy link

metux commented Feb 21, 2023

@metux Thanks for the insight. I perceive an aggressive tone though.

Actually, I'm quite dissappointed and pissed. You probably don't know the long story behind this, so I give you a short overview:

Back when writing the apu board driver, I already was in contact w/ PCengines (they knew me and my role).
There hadn't been any ACPI bindings back then. PCengines still adviced users to play dangerous games with raw register access from userland - very unstable and dangerous.
(there only was some experimental, incomplete support in the apuv1 kernel driver, which practically no distro actually enabled - but these two are totally different device, so we split that out).

At that time there was no intent to change anything at PCengies side. The APU board driver (that doesn't use any acpi stuff) went into mainline and was shipped by Distros. So people now got mainline and Distro support for LEDs, front keys, sim switch and pcie/baseband reset. (assuming their hardware was correctly wired as documented in the schematics).

Some time later (too lazy to scan the git log again, when exactly that was), they suddenly added some pieces of ACPI bindings, in a quick shot (w/o even telling us first), changing the LED naming (now model specific) and key code and dropping all the other stuff. And the worst of all: highjacking an ACPI ID for some very different devices, that just happens to share some common ancestor with the actual one (creating potential for HW damage by uncautious users) Creating a situation where - depending on firmware version (!) - where users loose functionality and changing userland ABI - thus breaking existing userland.

And who's gonna have to deal with pissed users (or lots of broken installations in the field) ? Guess who !

I'm really fed up w/ cleaning up the mess of the firmware vendor again and again.
Yes, that's not unusual in the x86 / acpi world, since board/firmware vendors like to do fire and forget.

I hoped that PCengines would be different, being the one x86 vendor doing some decent work.

But I'm DEEPLY DISAPPOINTED.

The whole mess could have been prevented several years ago, if somebody there had spent a few mins to pick up the phone and call me (or at least drop a mail). With a bit of cooperation we could have full mainline support for the whole APU line, all models, including the fresh ones.

Which probably means this ACPI issue will never be fixed in Coreboot PCengines branch. I am inclined to patch it locally and compile one last final ROM for APU4D4 myself.

Please do so, and submit a pull request. (sadly I'm too busy for also taking care of FW)

I actually thought about the same, but just have limited range of board models right now, so testing will be tricky.
No idea why PCengines didn't just give the official maintainer some examples.

Because userland obviously should never be able to damage hardware.

Indeed. This was exactly the main reason for writing a new gpio driver.

I also checked kernel 6.2 and somewhere in last months CONFIG_PINCTRL_AMD became bool, so can't make it a
module anymore and blacklist it.

Yes, that's because it's just meant for some very low level board/soc specific stuff - it doesn't make any sense to load it later (eg. after PCI setup).

Sorry for being noisy, but I have to repeat it again: this is about an entirely different device. Pcengines highjacked the device id of a different device -- thus broke ACPI rules.

Would need to disable it entirely. I realize in the field out there basically no-one will
be running 6.2.

For the time being, I don't see much more practical chance except recompiling w/ CONFIG_PINCTRL_AMD.

Maybe (if I'm bored enought), I could add specific blacklisting in APCI core. But that will be last resort, we already have too much of such nasty quirks in the kernel. And at certain point it's really questionable whether we kernel folks should clean up the mess that HW vendors produce, just for free. At certain point we have to say: "sorry folks, not our problem, get some decent HW instead".

@bsdice
Copy link

bsdice commented Feb 21, 2023

@metux If I may ask, how did you stumble on this issue? Did you wreck a board? Wreck how much? Or did some commercial user wreck one? How is that your problem even?

Comparing an APU2..4 to one of those Made in China boards is too strong for me. Basically nobody does thermal management or signal integrity right. Pascal does, knows his stuff. And no-one has ECC RAM which works properly in an embedded AMD64 platform. Everything made in Asia will just crash instead.

I just built a release 4.19.0.1 for my APU4 using pce-fw-builder on coffee break. Now trying to figure out what I want to get rid of. pcengines/coreboot@bca61d7 and maybe something else.

@bsdice
Copy link

bsdice commented Feb 21, 2023

The file gpio.asl i.e. AMD0030 as mentioned by you was last touched in summer of 2019 (!) https://github.com/pcengines/coreboot/commits/release/src/mainboard/pcengines/apu2/acpi/gpio.asl

Are you certain that leds.asl and buttons.asl aren't just a minor nuissance which prevent pcengines_apuv2 from loading?

So delete

gpio.asl
leds.asl
buttons.asl

and also from dsdt.asl. Quite the lobotomization.

@metux
Copy link

metux commented Feb 21, 2023

@metux If I may ask, how did you stumble on this issue? Did you wreck a board? Wreck how much? Or did some commercial user wreck one? How is that your problem even?

After accidentially writing to some regs in the suppsed gpio bank, the board behaved erratic.
Don't recall the exact details after all these years, but it was very unstable, funny hangups (guess never seen a running kernel on this one anymore). No idea what really happened, must have hit something deeply internal.
(most likely the board went into trash)

What I do recall is the words "undefined" and "reserved" in some AMD FCH specs. Now we know that "reserved" really means "never ever touch it, or you break things".

Comparing an APU2..4 to one of those Made in China boards is too strong for me. Basically nobody does thermal management or signal integrity right. Pascal does, knows his stuff. And no-one has ECC RAM which works properly in an embedded AMD64 platform. Everything made in Asia will just crash instead.

Yes, Asian boards are usually a great deal of fun. If I wasn't under NDA, I could tell funny stories from various industrial places.

I just built a release 4.19.0.1 for my APU4 using pce-fw-builder on coffee break. Now trying to figure out what I want to get rid of. pcengines/coreboot@bca61d7 and maybe something else.

Look for the gpio ASL files.

@metux
Copy link

metux commented Feb 21, 2023

The file gpio.asl i.e. AMD0030 as mentioned by you was last touched in summer of 2019 (!)
https://github.com/pcengines/coreboot/commits/release/src/mainboard/pcengines/apu2/acpi/gpio.asl

Are we looking at the same branch ?
I've been looking in the latest release, and there was a relatively recent commit that reverted a removal of some gpio stuff from few years ago.

Are you certain that leds.asl and buttons.asl aren't just a minor nuissance which prevent pcengines_apuv2 from loading?

"minor" depends on whether you actually need this stuff and have existing applications written for the mainline kernel.

So delete

gpio.asl leds.asl buttons.asl

and also from dsdt.asl. Quite the lobotomization.

Yep, just drop that ASL and rebuild, should do the trick.

@silentcreek
Copy link
Author

silentcreek commented Feb 22, 2023

Due lack of exact information on your Kernel configuration, I can only guess:

I'm using the default Debian kernel configuration which enables the APU1 and APU2 drivers as modules:
CONFIG_LEDS_APU=m CONFIG_PCENGINES_APU2=m
That is for Debian 11 with kernel 5.10. On Debian 10 with kernel 4.19, the APU2 driver didn't exists, obviously.

a) you still got the old (meanwhile removed) apu1 driver running, and these two are conflicting (not sure whether Debian ever enabled the old one). Solution: kernel upgrade or delete the old driver

No. The APU1 module is not loaded by default on my system.

b) you catched the intermediate BROKEN firmware versions, which added some arbitrary, unthoughtful and incomplete "gpio support" to the ACPI tables. [...]

Which versions do you mean? Looking at the history of src/mainboard/pcengines/apu2/acpi/, the GPIO bindings were added in 2019, then they were removed in October 2021 and then the removal was reverted in November 2021. Do you mean all releases between July 2019 and today, except for then one between October and November 2021 are broken, intermediate releases?

In any case, yes, I was/am using those firmware releases.

I found a forum discussion [1] that suggests that this issue should be fixed in Linux 5.3 and that this is likely a module ordering issue. The former can't be true since my kernel is newer than 5.3, obviously.

Yes, there was a problem that leds_gpio or gpio_amd_fch might not have been loaded, when pcengines-apv2 tries to initialize them. Only happens if these are compiled as modules. You can test this by loading them in the correct order manually (command line):

1. gpio_amd_fch

2. leds-gpio

3. pcengines-apuv2

Doing this leads to the original kernel error message reported here. But that is, per your description, to due the APCI definitions in the firmware.

And you should blacklist (or not compile) amd-pinctrl - this is the wrong driver, which might be triggered by broken ACPI tables.

The module pinctrl_amd is not loaded on my system by default. So, it doesn't seem to be triggered by these ACPI tables (and I haven't blacklisted it).

Edit: Oh, Debian seems to compile-in the pinctrl_amd driver, rather than building it as a module (CONFIG_PINCTRL_AMD=y).

Do some gpio or led devices appear w/o loading the apu board driver ?

Yes, since I blacklisted the APU2 driver after this discussion here, the leds appear in sysfs. I didn't check other GPIO devices, because I was only interested in controlling the LEDs and that works just fine.

Is there some ACPI device "AMD0030" ?

I don't know the proper way to look for ACPI devices, but dmesg | grep -i acpi doesn't show any AMD0030 device.

Edit: I did find something under /sys/class/gpio:
$ cat /sys/class/gpio/gpiochip320/device/modalias
acpi:AMD0030:AMD0030:

@silentcreek
Copy link
Author

silentcreek commented Feb 22, 2023

Okay, thanks. I think I understand now... So, we have the modules pcengines_apuv2 and leds_gpio (or gpio_keys_polled?) fighting over the gpio control of these leds.

You have the (stable) apu board driver fighting with broken newer firmware. Downgrade the firmware.

From a user perspective, I think this is a bit shortsighted. The error manifested itself after a kernel upgrade (4.19 to 5.10) which brought a new driver. The firmware or hardware haven't changed and at that point, so it isn't too far-fetched to think it's a kernel regression - again, just as a mere user.

Now, I'm not trying to suggest that you're wrong (especially since I can't make any technical judgements here). I'm just trying to say, I wouldn't paint this as black and white, especially considering, that the first mainline release including your driver and the first firmware release with these ACPI bindings are just 3 months apart.

And by the way: this is isn't the correct place for discussing kernel issues - just for reporting broken firmware.

True, indeed. My bad. But to be fair, I think 3mdeb was in general really helpful with all sorts of issues, and unfortunately, I can't say I've always gotten replies to bug reports sent to LKML or Kernel Bugzilla.

@silentcreek
Copy link
Author

So delete
gpio.asl leds.asl buttons.asl
and also from dsdt.asl. Quite the lobotomization.

Yep, just drop that ASL and rebuild, should do the trick.

So, I built a firmware with these three asl files removed, and their respective include statements removed from dsdt.asl as well. Now, when I boot my system (Debian 11 with Linux 5.10), I get the original kernel error messages again!

I checked that the APU1 driver is not loaded and I can also see that the gpiochip with the alias "acpi:AMD0030:AMD0030:" is gone (there is a different gpiochip now with the label gpio_amd_fch). In addition, the leds show up with different names now in sysfs and there are no duplicate entries, leading me to believe that removing the ACPI bindings worked.

So, I checked to see if the modules are loaded in the correct order, but that doesn't make a difference either.

You can test this by loading them in the correct order manually (command line):

1. gpio_amd_fch

2. leds-gpio

3. pcengines-apuv2

Even if I remove the modules and load them manually in this order, the error message shows up.

Any ideas how to resolve this?

@silentcreek
Copy link
Author

@metux The more I think back to the history of this issue, the more I believe we're missing something here.

Due lack of exact information on your Kernel configuration, I can only guess: [...]
b) you catched the intermediate BROKEN firmware versions, which added some arbitrary, unthoughtful and incomplete "gpio support" to the ACPI tables.

This is actually not causing the kernel errors I reported. It does not matter which firmware you use. The error still happens. After reading through this issue here again, I noticed that I actually did try a firmware release where 3mdeb removed the ACPI GPIO definitions (v4.14.0.5) back in October 2021. I still got the kernel errors. So, it's no surprise that my newly custom built firmware without the APCI bindings doesn't solve the problem either.

We're still missing the root cause for the kernel errors, it seems.
(Now, again, I'm not trying to suggest, that your statement is wrong, that the ACPI bindings shouldn't be in the firmware, just, that it's off the point with regards to solving the issue at hand.)

@bsdice
Copy link

bsdice commented Feb 28, 2023

I tried to fix the error as well (Linux kernel 6.2.1, latest firmware lobotomized) but failed and ran out of time. Switched back to v4.16.0.1 03/07/2022 on APU4D4, which I think is one of the most stable for me. Updated to kernel 5.4.233 with 1004-gpio_keys_polled.patch and 1004-gpio_keys_polled.patch patches as mentioned, also with option CONFIG_PINCTRL_AMD=y to eliminate the driver race. Nothing blacklisted. Not using any GPIOs and I also don't care if LED names are platform-agnostic. ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants