coreos install fails using v1.35 with 'synchronous exception at 0x000000003384D000' #235
Comments
For the record I tested 1.35 with Debian 12 ARM64 right before the release (in ACPI mode), and saw no boot issue. Are you using DeviceTree or ACPI? If you use DeviceTree, please try ACPI and report your results. |
I am also using Fedora CoreOS and I am seeing issues booting with v1.35: it is painfully slow and I can't even get to the FCOS GRUB menu. Reverting to v1.34 still seems slow, but at least it gets to GRUB. |
I was able to get some more information on this with the 1.35 debug firmware.
|
I would like to help, but I am not sure where to begin when debugging this sort of thing.
The Pi uses ACPI.
I have been developing a USB security cam Ansible script for CoreOS since v1.33, and I am particularly interested in being certain that the USB pipeline is not corrupt in any way, so if I can help I would be happy to do so.
…On Thu, 22 Jun 2023, 01:55 Joe Doss, ***@***.***> wrote:
I was able to get some more information on this with the 1.35 debug
firmware.
FSOpen: Open '\EFI\BOOT\BOOTAA64.EFI' Success
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 352993C0
Loading driver at 0x0003388B000 EntryPoint=0x000338A9000
Loading driver at 0x0003388B000 EntryPoint=0x000338A9000
FSOpen: Open 'RPI_EFI.FD' Success
Variables dumped!
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 352E7E98
ProtectUefiImageCommon - 0x352993C0
- 0x000000003388B000 - 0x00000000000D5000
InstallProtocolInterface: 605DAB50-E046-4300-ABB6-3DD810DD8B23 3393E360
FSOpen: Open '\EFI\BOOT\fbaa64.efi' Success
FSOpen: Open '\EFI\BOOT\fbaa64.efi' Success
SetMemoryAttributes: BaseAddress == 0x33854000, Length == 0x1A000, Attributes == 0x4000
ClearMemoryAttributes: BaseAddress == 0x33854000, Length == 0x1A000, Attributes == 0x22000
Synchronous Exception at 0x0000000033858000
Synchronous Exception at 0x0000000033858000
PC 0x000033858000
PC 0x0000338AB288
PC 0x0000338AB338
PC 0x0000338AC1B4
PC 0x0000338A9030
PC 0x00003A11BE58 (0x00003A114000+0x00007E58) [ 1] DxeCore.dll
PC 0x000036E8C664 (0x000036E85000+0x00007664) [ 2] BdsDxe.dll
PC 0x000036E8FBB8 (0x000036E85000+0x0000ABB8) [ 2] BdsDxe.d/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll
[ 3] /home/runner/work/RPi4/RPi4/Build/RPi4/DEB13 0x0000000000000002 X14 0x0000000000000001 X15 0x0000000000000002
X16 0x0000000036F5F314 X17 0x000000001EA68734 X18 0x003386E098
> 000003B3FF5E0: 000000003B3FF640 00000000338AB338 0000000000000000 0000000033914000
000003B3FF600: 000000003392A000 000000003593FD98 00017F903B3FF640 0000000033858000
000003B3FF620: 0000000033854000 000000000000001A 0000000033888C18 000000003386E018
000003B3FF640: 000000003B3FF6A0 00000000338AC1B4 0000000000000000 00000000338A9428
000003B3FF660: 000000003393E39F 000000003593FD98 000000003B3FF710 000000003393E000
000003B3FF680: 00000000352993C0 0000000035966030 0000000035E08BA0 0000000035299CA0
000003B3FF6A0: 000000003B3FF740 00000000338A9030 0000000000000001 0000000000000000
000003B3FF6C0: 0000000000000000 0000000000000001 0000000036E9E168 0000000036E9A690
ASSERT [ArmCpuDxe] /home/runner/work/RPi4/RPi4/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(333): ((BOOLEAN)(0==1))
Watchdog Timer resetting system
UsbBusStop: usb bus stopped on 35962D18
FATAL ERROR - RaiseTpl with OldTpl(0x10) > NewTpl(0x8)
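One way to read the log above, as a minimal sketch rather than a diagnosis: decode the attribute masks from the SetMemoryAttributes/ClearMemoryAttributes lines and check whether the faulting PC falls inside the region they touch. The bit values are the memory attribute definitions from the UEFI specification; the helper names (`decode_attributes`, `in_region`, `module_offset`) are hypothetical and only exist for this illustration.

```python
# Memory attribute bits as defined in the UEFI specification.
EFI_MEMORY_RP = 0x2000    # read-protected
EFI_MEMORY_XP = 0x4000    # execute-protected (no-execute)
EFI_MEMORY_RO = 0x20000   # read-only

ATTRIBUTE_NAMES = {
    EFI_MEMORY_RP: "RP",
    EFI_MEMORY_XP: "XP",
    EFI_MEMORY_RO: "RO",
}


def decode_attributes(mask):
    """Return the names of the protection bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(ATTRIBUTE_NAMES.items()) if mask & bit]


def in_region(address, base, length):
    """True if `address` lies in the half-open range [base, base + length)."""
    return base <= address < base + length


def module_offset(pc, image_base):
    """Offset of `pc` inside an image loaded at `image_base`."""
    return pc - image_base


# Values copied from the debug log above.
region_base, region_length = 0x33854000, 0x1A000
set_mask, clear_mask = 0x4000, 0x22000
fault_address = 0x33858000

print("set:  ", decode_attributes(set_mask))
print("clear:", decode_attributes(clear_mask))
print("fault inside region:", in_region(fault_address, region_base, region_length))
# Cross-check against the "[ 1] DxeCore.dll" stack frame in the log.
print("DxeCore offset:", hex(module_offset(0x3A11BE58, 0x3A114000)))
```

With the values from this log, the set mask decodes to XP, the clear mask to RP and RO, and the faulting address lies inside the region. That would be consistent with code being executed from memory that was just marked no-execute, but the log alone does not prove that this is the cause.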
|
Not sure if I should open a separate bug, but I'm also getting this with a Raspberry Pi 3B+ and the RPi3 firmware on Fedora CoreOS and Ubuntu Server 22.04.2 LTS. However, with Alpine Linux the boot works when using the "Standard" aarch64 image together with the -rpi kernel from the "Raspberry Pi" .tar.gz. Passing acpi=off in grub.cfg didn't help either. To rule out stupid mistakes I also tried with the latest Tow-Boot release and with that the UEFI boot worked successfully with both Ubuntu and CoreOS. In case anyone wants to reproduce on an RPi3 which doesn't support GPT partitioning, I also had to use gdisk after the installation:
|
That might be it. I have a 1MB partition just before the EFI system partition, as a consequence of using coreos-installer. I am not sure how to use coreos-installer in a way that would remove that partition, as the coreos-installer command takes the firmware loader version as an argument. I guess I could install using v1.35, remove the 1MB Linux partition before booting, and see what happens. |
After running the installer I just stayed in the terminal and ran gdisk. But for more convenience I used qemu to build the disk image and then used dd from the host to copy the qemu image to an SD card. That way if the boot doesn't work you can continue modifying the existing system and copy over. |
Note that with this EDK2 firmware removing the 1MB partition didn't fix the issue for me. The error with that partition is earlier and different. |
I've run into the same issue (albeit with a slightly different exception address) with a regular Fedora 37 and 38 Workstation install without Secure Boot enabled. With Secure Boot enabled it wouldn't boot either (no surprise, since I didn't have the security keys enrolled in UEFI), but it did not fail with an exception. Rather, it stayed on the Raspberry logo page, informed me that it could not boot via net (again no surprise, as I don't have the netboot IP addresses set up), and finally failed with a message that no bootable option was found, but with no exception this time. After much fiddling and searching I found the following info (and possible workarounds): Is it possible that we are experiencing the same bug on a different architecture? |
Try disabling EFI_MEMORY_ATTRIBUTE_PROTOCOL. Refer to https://edk2.groups.io/g/devel/message/106181 |
No, I don't think that solves the problem (at least for me). From the outside, the situation stays the same. In case anybody else wants to try, the packages I built with the patch mentioned are here: https://github.com/cedel1/RPi4/releases/tag/untagged-938a1e41485373db6db4 (they include the patch as it is). |
The interesting part is that I am currently also trying with Fedora Server (ISO image) and that seems to work; at least it gets to GRUB and the installation. |
Sure enough, 1.34 works for me too. No luck with 1.35. CentOS Stream 9. |
Most likely this is the reason. |
Does this mean we will have to wait for v1.36 to be released? I just tried v1.35 again, using coreos-installer, and the same synchronous exception is reported. |
On the contrary. From where I stand, what it means is that v1.35 includes the patch mentioned above, and the reason some distros fail to boot is that they use an old GRUB version that relies on Distros that use newer GRUB versions (or at least distros that have applied patches from the recent GRUB mainline, since, very problematically, the GRUB project is sadly unable to release on a timely basis, which creates a huge amount of problems downstream) shouldn't have this issue, as evidenced by the fact that recent Debian ARM64 and other distros do not experience the synchronous exception issue. Thus, if you run into this issue, you may have to pressure the maintainers of the Linux distro you use to update their GRUB codebase, to ensure that it works with |
I believe @millerthegorilla is using coreos-installer to install Fedora CoreOS, so we're using the GRUB packages from Fedora. I could be wrong, but usually Fedora is on the cutting edge of software. The sources for the GRUB2 RPM are stored at https://src.fedoraproject.org/rpms/grub2. @millerthegorilla, can you report exactly what version of Fedora CoreOS you were installing so we can figure out what grub2 RPMs were being used? |
I was also installing CoreOS and my cluster says it has been up for 60 days, so…
38.20230709.3.0
|
but did you use v1.35? |
No, I tried and it didn’t work. I backed off to 1.34. Otherwise the cluster
wouldn’t be up.
|
Hi, I am using coreos-installer and v1.35 fails, whilst v1.34 works as expected. |
yes, but what exact version of Fedora CoreOS are you using? |
60 days ago, when I installed, I was using
38.20230709.3.0
An aarch64 raw image.
But since someone has confirmed the problem still exists, that’s not really relevant, is it?
As written above.
|
In my case, when trying to install Fedora Server 39 beta, the installation works correctly, but when booting the new system it fails with v1.35. With v1.34 both the installation and the new system work as expected. Doing the installation again with the |
Is there any update on this error? |
I was wondering the same, particularly after the Fedora 39 release was delayed by RPi4 bugs. I probably should have reported it to the F39 team at the time.
Should they be informed?
|
I suspect I just ran headlong into this with a recent rebase of my edk2 branch. Try dropping to the shell and running grub directly rather than the default shim->grub sequence. That allowed me to boot, but I've now further broken things. |
Right, so confirmation that it looks like rpm firmware that supports the EFI memory attributes protocol blows up with:
|
This is a known bug, and has several downstream bugs associated with it, the upstream bug is: rhboot/shim#614 |
Although the immediate crash can be fixed (at least in my testing) with https://src.fedoraproject.org/rpms/shim-unsigned-aarch64/pull-request/2 which is already merged to mainline shim. That doesn't mean the alignments are correct, only that at least Fedora 4k boots with that patch. |
FWIW, one of the RH Bugzilla entries associated with this is: https://bugzilla.redhat.com/show_bug.cgi?id=2259264 and there is a shim-aa64 package which has been built with the updated shim source. It should be in the Fedora updates repositories soon(ish), although it can be downloaded directly from Fedora Koji: https://koji.fedoraproject.org/koji/buildinfo?buildID=2420877 for those who wish/need to test now. |
I tried installing the new UEFI firmware after today's update to Fedora IoT 40, but the synchronous exception is still present on reboot, at a different address (I guess because the binary has changed a bit). Can anyone check and confirm? I thought Fedora should work now, or do I have to install something? @garybuhrmaster, could you comment on this? |
I have not tried a new install from a recent compose, but have validated that using the new shim and using v1.35 works with an existing RPi 4 EFI booting system (Server base, not IoT). You may need to check to be sure that the IoT compose is using the new shim. |
@garybuhrmaster does it have to be unsigned shim or the "normal" |
It's time to test again since there's a new release. I'll try in the coming days. Has anyone tested already? |
0x00000000363D2000. It's still the same, sadly. |
It could be the GRUB version used in Fedora 40. It's 2.06 with some patches, but I don't want to go through them because there are a lot of them. Maybe things will change with Fedora 41, which has GRUB 2.12; that's the only thing left that could be blocking this, as everything else has been updated. |
Feel free to test with |
@dustymabe I just tried rebasing IoT to 41, and it's still the same situation at the same address. |
Hi, when using v1.35 and following the instructions at https://docs.fedoraproject.org/en-US/fedora-coreos/provisioning-raspberry-pi4/#_edk2_combined_disk_mode_alternate_machine_disk_preparation the firmware boots, but then stops at the point of booting the OS, with an error:
synchronous exception at 0x000000003384D000
If I use v1.34, there is no problem at all.
This is on a CoreOS install on a Raspberry Pi 4B 8GB. I tried several different microSD cards and several different Ignition files, but always with the same issue. I am guessing there is a problem with the firmware addressing the bootloader incorrectly.