2 x APU2E4 unstable with CPB enabled. #251
Comments
Enabling CPB additionally enables the core/package C6 states. I recently discovered some bugs in coreboot around C6 and its save state area in DRAM, which may be causing problems when CPB is enabled. The patches to coreboot have already been sent, so I can test whether they resolve your issue. If I understood correctly, there is no need to stress the firewall device to trigger this? |
Thank you very much for the feedback. That is correct, there is no need to run anything on the firewall. I don't even have network cables attached to the firewall. |
@MyGithubUser01 I left the OPNsense 21.1 installer running on an apu2 overnight with CPB enabled (20 hours have elapsed since I left the machine idling). Not a single MCA error on the serial console with apu2 v4.14.0.1, which contains the fixes I mentioned in the previous comment. Could you please give v4.14.0.1 a try? Let me know if it helps in your case. |
Hi, thank you very much for the update. I've now updated to 4.14.0.1 and made sure CPB is enabled (it looks like it was enabled after flashing). I started the firewall about 24h ago with the serial console and WAN connected, but this morning I only had 5h of uptime and found the below in the console/log. This means it happened after 16-20h. It looks like it doesn't happen as frequently as before, but I'm not sure. WARNING: attempt to domain_add(netgraph) after domainfinalize() |
Somehow I cannot reproduce it, and we never ran into MCA errors before. The warning |
Brand new APU4D4 here. Crashing multiple times a day when CPB is enabled. FW versions v4.14.0.1 and v4.14.0.2. Currently running Proxmox on Buster, and I get many errors of this kind before the APU eventually hangs or crashes:
mce: [Hardware Error]: Machine check events logged
Disabling CPB seems to make it stable for now. |
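To compare how often machine checks show up with CPB on and off, the kernel log can be saved to a text file and the matching lines counted. This is a minimal sketch, assuming the log text contains lines like the one quoted above; the function name `count_mce_lines` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import re

# Matches the Linux kernel message quoted in the comment above.
MCE_LINE = re.compile(r"mce: \[Hardware Error\]")


def count_mce_lines(log_text):
    """Count log lines that report a machine check hardware error."""
    return sum(1 for line in log_text.splitlines() if MCE_LINE.search(line))


if __name__ == "__main__":
    sample = (
        "[  100.1] mce: [Hardware Error]: Machine check events logged\n"
        "[  101.0] some unrelated kernel message\n"
        "[  250.7] mce: [Hardware Error]: Machine check events logged\n"
    )
    print(count_mce_lines(sample))
```

Running the same count over logs captured with CPB enabled and with CPB disabled, for a similar uptime, gives a rough number to report alongside the crash frequency.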
This sounds very similar to what I'm experiencing with CPB enabled, thanks for chiming in. Are these all related? |
Still, what is written in the forum is not exactly true. CPU Boost does not raise the memory clock frequency; it can't, because that would require retraining the memory at the new frequency (only the BIOS can train the memory, and once that is done the memory frequency is fixed). CPB is not an overclocking feature! It simply raises the CPU clock frequency to the limits allowed by the CPU specification. Overclocking would mean going higher than what CPB provides (i.e. higher than 1400MHz). |
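For those testing on Linux, the OS-level boost switch can be read from sysfs to confirm what state the machine is actually running in. This is a minimal sketch, assuming the acpi-cpufreq driver, which exposes `/sys/devices/system/cpu/cpufreq/boost`; the function name `read_boost_state` is hypothetical, the file may be absent with other drivers, and it does not exist on FreeBSD-based systems such as OPNsense or pfSense. The value reflects the OS-side setting and may not mirror the firmware option exactly.

```python
from pathlib import Path

# Assumption: acpi-cpufreq is the active cpufreq driver and exposes this file.
BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost_state(path=BOOST_PATH):
    """Return True if boost is enabled, False if disabled, None if unknown."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: driver does not expose a boost switch.
        return None
    if raw not in ("0", "1"):
        return None
    return raw == "1"


if __name__ == "__main__":
    state = read_boost_state()
    labels = {True: "enabled", False: "disabled", None: "unknown"}
    print("boost:", labels[state])
```

Including this state in a crash report makes it easier to tell whether a given lock-up happened with boost active.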
Wanted to chime in on this. I have had the same stability issues described here with my APU2E4 for many years. It would work sometimes for weeks at a time, then it would have stability issues daily over a period of several weeks, then go back to a few weeks between each failure. I have disabled CPB now, and so far it looks good. But it has only been a week, so I will have to wait a few months to really be sure. I've found others who point to the same thing, that CPB causes issues: https://forum.netgate.com/topic/156761/page-fault-while-in-kernel-mode-on-apu2-after-bios-coreboot-upgrade/38 edit: 37d uptime and no issues encountered after CPB was disabled |
After a power outage, my APU2E4 started to misbehave again, randomly locking up. I had to check whether CPB had for some reason been re-enabled, and sure enough, it was enabled again. I'll report back if it is still stable. |
Hi @toredash, did you experience any more lock-ups since disabling CPB? |
Apologies for a "me too" comment, but unfortunately the symptoms described in this issue are also occurring, to some extent, with my APU2E4. Alas, I'm long past the warranty period, as I purchased mine in the summer of 2020. I have not tested with Linux since I mainly run pfSense on this unit. This occurs with 2.6.0 and a few previous versions. I've only experienced it randomly crashing and restarting a few times, but since this unit operates as the firewall for my home connection, I'd rather turn off CPB to make the unit stable again than deal with the instability and MCA errors. If at all possible, I would appreciate a fix for this issue, as I could use the extra performance to handle bursty traffic flows on my gigabit internet connection. I am willing to turn CPB on again and help debug the problem. As mentioned just above, I also see the MCA errors with CPB enabled, but unfortunately I don't have the logs anymore because I have since done a fresh install (I stumbled across this issue randomly while reading the documentation for something unrelated). They appeared very similar, and the error seemed to be associated with CPU2 in my case. With CPB off, I never see them in the syslog. I do, however, still see errors in the pfSense logs that are seemingly related to the firmware and the first i210 NIC, and I don't know if they're related.
The relevant lines/errors:
and
Again, I don't know if the above lines/errors are relevant to the instability issue with CPB at hand. |
No, my device has been stable since CPB was disabled. |
FTR I've also had bad stability issues with an apu6 that seem to be solved by disabling CPB. Maybe CPB shouldn't be enabled by default? |
Update: My device is still stable after several months. |
v4.14.0.6
On Tue, 11 Oct 2022 at 09:46, kurtselbach wrote:
Perfect, thanks for the update. Got two unused APU2E4 in a drawer...
Which BIOS version are you running now?
|
FWIW, I have a very old APU2 that became extremely unstable with CPB and Linux 6.2.5. It generates a lot of different "null pointer dereference," "unable to handle page fault," and "soft lockup" panics. It lasts no more than a few hours per reboot, and sometimes only a few minutes. Disabling CPB in the BIOS seems to have solved it. Here is one of the common panics:
|
Hi all,
I'm having some serious stability issues with APU2E4 and CPB with BIOS 4.13.0.1 and 4.13.0.5.
This is brand new hardware which was believed to be "DOA", but the replacement I got had the exact same issue.
After disabling CPB the system appears to be stable and has a record-high uptime of 4 days and counting.
Operating systems tested: OPNsense 21.1 and 21.1.5.
I've tried booting from mSATA, SD card and USB, but it gives me the same issue.
I've also tried multiple power adapters.
The CPU temperature is typically in the range 54-56 °C, and the system isn't even connected to any network, just the console cable.
The system has been very unstable and core dumps every 4-12 hours on BIOS 4.13.0.1; I saw similar issues when testing 4.13.0.5.
From the logs/console I see the following:
FreeBSD/amd64 (OPNsense.localdomain) (ttyu0)
login: MCA: Bank 1, Status 0x9400000000000151
MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x730f01, APIC ID 0
MCA: CPU 0 COR ICACHE L1 IRD error
MCA: Address 0x282060
[HBSD SEGVGUARD] [/usr/local/bin/python3 (5880)] Suspension expired.
-> pid: 5880 ppid: 1302 p_pax: 0xa50<SEGVGUARD,ASLR,NOSHLIBRANDOM,NODISALLOWMAP32BIT>
And:
root@OPNsense:/var/db/rrd # MCA: Bank 1, Status 0xd400000000000151
MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x730f01, APIC ID 0
MCA: CPU 0 COR OVER ICACHE L1 IRD error
MCA: Address 0xffff80d1ff60
Let me know if additional details are required.
Broken hardware, BIOS bug, OPNsense/HardenedBSD compatibility issues?
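For anyone comparing their own MCA records with the two above, the Status value can be decoded by hand. This is a rough sketch, assuming the standard x86 machine-check status layout (VAL in bit 63, OVER in bit 62, UC in bit 61, EN in bit 60, MISCV in bit 59, ADDRV in bit 58, PCC in bit 57) and the memory-hierarchy error code pattern `0000 0001 RRRR TTLL` in the low 16 bits. The function name `decode_mca_status` is made up for illustration, the labels follow the FreeBSD console output, and model-specific bits are ignored.

```python
# Flag bits in the upper part of the MCA status register.
STATUS_FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten (overflow)
    (61, "UC"),     # uncorrected error; clear means corrected ("COR")
    (60, "EN"),     # reporting was enabled for this error
    (59, "MISCV"),  # the MISC register holds valid extra information
    (58, "ADDRV"),  # the address register holds a valid address
    (57, "PCC"),    # processor context may be corrupt
]

# Fields of the memory-hierarchy error code: 0000 0001 RRRR TTLL.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mca_status(status):
    """Decode the common flag bits and a memory-hierarchy error code."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF
    decoded = {
        "flags": flags,
        "corrected": "UC" not in flags,
        "error_code": code,
        "description": None,  # stays None for other error code types
    }
    if (code & 0xFF00) == 0x0100:
        request = REQUEST.get((code >> 4) & 0xF, "?")
        transaction = TRANSACTION.get((code >> 2) & 0x3, "?")
        level = LEVEL[code & 0x3]
        decoded["description"] = f"{transaction} {level} {request}"
    return decoded


if __name__ == "__main__":
    for value in (0x9400000000000151, 0xD400000000000151):
        print(hex(value), decode_mca_status(value))
```

Under these assumptions both values decode to a corrected L1 instruction cache error on an instruction fetch, and the only difference between them is the OVER bit, which is consistent with the "COR OVER" wording in the second record.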