coreos install fails using v1.35 with 'synchronous exception at 0x000000003384D000' #235

Open
millerthegorilla opened this issue Jun 18, 2023 · 39 comments

Comments

@millerthegorilla

Hi, I followed the instructions at https://docs.fedoraproject.org/en-US/fedora-coreos/provisioning-raspberry-pi4/#_edk2_combined_disk_mode_alternate_machine_disk_preparation using v1.35. The firmware boots, but then stops when booting the OS with the error: Synchronous Exception at 0x000000003384D000.

With v1.34 there is no problem at all.

This is a CoreOS install on a Raspberry Pi 4B 8GB. I tried several different microSD cards and several different Ignition files, but always hit the same issue. My guess is that the firmware is addressing the bootloader incorrectly.

@pbatard
Member

pbatard commented Jun 18, 2023

For the record I tested 1.35 with Debian 12 ARM64 right before the release (in ACPI mode), and saw no boot issue.

Are you using DeviceTree or ACPI? If you use DeviceTree, please try ACPI and report your results.

@jdoss

jdoss commented Jun 20, 2023

I am also using Fedora CoreOS and am seeing issues booting with v1.35: it is painfully slow and I can't even get to the FCOS GRUB menu. Reverting to v1.34 still seems slow, but at least it gets to GRUB.

@jdoss

jdoss commented Jun 22, 2023

I was able to get some more information on this with the 1.35 debug firmware.

FSOpen: Open '\EFI\BOOT\BOOTAA64.EFI' Success
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 352993C0
Loading driver at 0x0003388B000 EntryPoint=0x000338A9000
Loading driver at 0x0003388B000 EntryPoint=0x000338A9000 
FSOpen: Open 'RPI_EFI.FD' Success
Variables dumped!
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 352E7E98
ProtectUefiImageCommon - 0x352993C0
  - 0x000000003388B000 - 0x00000000000D5000
InstallProtocolInterface: 605DAB50-E046-4300-ABB6-3DD810DD8B23 3393E360
FSOpen: Open '\EFI\BOOT\fbaa64.efi' Success
FSOpen: Open '\EFI\BOOT\fbaa64.efi' Success
SetMemoryAttributes: BaseAddress == 0x33854000, Length == 0x1A000, Attributes == 0x4000
ClearMemoryAttributes: BaseAddress == 0x33854000, Length == 0x1A000, Attributes == 0x22000


Synchronous Exception at 0x0000000033858000


Synchronous Exception at 0x0000000033858000
PC 0x000033858000
PC 0x0000338AB288
PC 0x0000338AB338
PC 0x0000338AC1B4
PC 0x0000338A9030
PC 0x00003A11BE58 (0x00003A114000+0x00007E58) [ 1] DxeCore.dll
PC 0x000036E8C664 (0x000036E85000+0x00007664) [ 2] BdsDxe.dll
PC 0x000036E8FBB8 (0x000036E85000+0x0000ABB8) [ 2] BdsDxe.d/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll
[ 3] /home/runner/work/RPi4/RPi4/Build/RPi4/DEB13 0x0000000000000002  X14 0x0000000000000001  X15 0x0000000000000002
 X16 0x0000000036F5F314  X17 0x000000001EA68734  X18 0x003386E098
> 000003B3FF5E0: 000000003B3FF640 00000000338AB338 0000000000000000 0000000033914000
  000003B3FF600: 000000003392A000 000000003593FD98 00017F903B3FF640 0000000033858000
  000003B3FF620: 0000000033854000 000000000000001A 0000000033888C18 000000003386E018
  000003B3FF640: 000000003B3FF6A0 00000000338AC1B4 0000000000000000 00000000338A9428
  000003B3FF660: 000000003393E39F 000000003593FD98 000000003B3FF710 000000003393E000
  000003B3FF680: 00000000352993C0 0000000035966030 0000000035E08BA0 0000000035299CA0
  000003B3FF6A0: 000000003B3FF740 00000000338A9030 0000000000000001 0000000000000000
  000003B3FF6C0: 0000000000000000 0000000000000001 0000000036E9E168 0000000036E9A690
ASSERT [ArmCpuDxe] /home/runner/work/RPi4/RPi4/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(333): ((BOOLEAN)(0==1))
Watchdog Timer resetting system
UsbBusStop: usb bus stopped on 35962D18
FATAL ERROR - RaiseTpl with OldTpl(0x10) > NewTpl(0x8)
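The two memory-attribute lines just before the crash can be read by decoding their Attributes values against the EFI_MEMORY_* bit definitions in the UEFI specification. In this minimal sketch (the helper names are hypothetical, not part of EDK2 or shim), 0x4000 decodes to EFI_MEMORY_XP and 0x22000 to EFI_MEMORY_RO | EFI_MEMORY_RP, i.e. the range is made non-executable while its read/write protection is removed. The faulting address 0x33858000 lies inside the 0x33854000 + 0x1A000 range, which is consistent with code being executed from memory that had just been marked non-executable.

```python
# EFI_MEMORY_* attribute bits as defined in the UEFI specification.
EFI_MEMORY_ATTRS = {
    0x1: "UC",
    0x2: "WC",
    0x4: "WT",
    0x8: "WB",
    0x10: "UCE",
    0x1000: "WP",
    0x2000: "RP",
    0x4000: "XP",
    0x8000: "NV",
    0x10000: "MORE_RELIABLE",
    0x20000: "RO",
    0x40000: "SP",
    0x80000: "CPU_CRYPTO",
}
KNOWN_BITS = sum(EFI_MEMORY_ATTRS)  # bits are distinct, so the sum equals their OR


def decode_attributes(value: int) -> list[str]:
    """Names of the EFI_MEMORY_* bits set in `value`, plus any unrecognized bits."""
    names = [name for bit, name in EFI_MEMORY_ATTRS.items() if value & bit]
    unknown = value & ~KNOWN_BITS
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:X})")
    return names


def in_range(addr: int, base: int, length: int) -> bool:
    """True if `addr` lies in the half-open range [base, base + length)."""
    return base <= addr < base + length


if __name__ == "__main__":
    # Values copied from the debug log above.
    base, length = 0x33854000, 0x1A000
    print("SetMemoryAttributes:", decode_attributes(0x4000))     # ['XP']
    print("ClearMemoryAttributes:", decode_attributes(0x22000))  # ['RP', 'RO']
    print("faulting PC in range:", in_range(0x33858000, base, length))  # True
```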

@millerthegorilla
Author

millerthegorilla commented Jun 22, 2023 via email

@wkornewald

Not sure if I should open a separate bug, but I'm also getting this with a Raspberry Pi 3B+ and the RPi3 firmware on Fedora CoreOS and Ubuntu Server 22.04.2 LTS. However, with Alpine Linux the boot works when using the "Standard" aarch64 image together with the -rpi kernel from the "Raspberry Pi" .tar.gz.

Passing acpi=off in grub.cfg didn't help either.

To rule out stupid mistakes I also tried with the latest Tow-Boot release and with that the UEFI boot worked successfully with both Ubuntu and CoreOS.

In case anyone wants to reproduce this on an RPi3, which doesn't support GPT partitioning, I also had to do the following with gdisk after the installation:

  • Convert from GPT to MBR.
  • If there is a tiny (e.g. 1MB) reserved partition before the EFI system partition, delete it and sort the partition table so that the EFI system partition is at position 1. This mostly affects CoreOS.
  • Change the EFI system partition type to 0x0C (FAT32-LBA).
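After the gdisk steps above, one way to double-check the result is to read the first 512 bytes of the card and inspect the classic MBR partition table: four 16-byte entries at offset 446, the 0x55AA signature at offset 510, the type byte at entry offset 4, and little-endian start LBA and sector count at entry offsets 8 and 12. This is a hypothetical read-only helper, not part of gdisk or coreos-installer. It treats a leftover type 0xEE entry (a GPT protective MBR) as a failed conversion, and the /dev/sdX path in the comment is a placeholder.

```python
import struct

MBR_SIZE = 512
PART_TABLE_OFFSET = 446      # four 16-byte partition entries start here
FAT32_LBA = 0x0C
GPT_PROTECTIVE = 0xEE        # type used by a GPT "protective MBR"


def read_mbr_entries(mbr: bytes) -> list[tuple[int, int, int, int]]:
    """Return (slot, type, start_lba, sectors) for each used entry, slots numbered 1-4."""
    if len(mbr) < MBR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) at offset 510")
    entries = []
    for slot in range(1, 5):
        off = PART_TABLE_OFFSET + 16 * (slot - 1)
        ptype = mbr[off + 4]
        start_lba, sectors = struct.unpack_from("<II", mbr, off + 8)
        if ptype != 0:
            entries.append((slot, ptype, start_lba, sectors))
    return entries


def check_rpi3_layout(mbr: bytes) -> bool:
    """True if this is a plain MBR (no GPT protective entry) whose slot 1 is FAT32-LBA."""
    entries = read_mbr_entries(mbr)
    if any(ptype == GPT_PROTECTIVE for _, ptype, _, _ in entries):
        return False
    return any(slot == 1 and ptype == FAT32_LBA for slot, ptype, _, _ in entries)


def make_mbr(entries) -> bytes:
    """Build a synthetic MBR from (slot, type, start_lba, sectors) tuples, for testing."""
    mbr = bytearray(MBR_SIZE)
    for slot, ptype, start_lba, sectors in entries:
        off = PART_TABLE_OFFSET + 16 * (slot - 1)
        mbr[off + 4] = ptype
        struct.pack_into("<II", mbr, off + 8, start_lba, sectors)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    # On real hardware you would read the card instead (placeholder path):
    #   with open("/dev/sdX", "rb") as f: mbr = f.read(512)
    mbr = make_mbr([(1, FAT32_LBA, 2048, 262144), (2, 0x83, 264192, 4194304)])
    print(read_mbr_entries(mbr), check_rpi3_layout(mbr))
```

The synthetic MBR only exists so the check can be exercised without a card; the 0x83 (Linux) root entry and the sector numbers are made up.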

@millerthegorilla
Author

  • If there is a tiny (e.g. 1MB) reserved partition before the EFI system partition, delete it and sort the partition table so that the EFI system partition is at position 1. This mostly affects CoreOS.

That might be it. I have a 1MB partition just before the EFI system partition, created by coreos-installer. I am not sure how to use coreos-installer in a way that would avoid that partition, as the coreos-installer command takes the firmware loader version as an argument. I guess I could install using v1.35, then remove the 1MB Linux partition before booting and see what happens.

@wkornewald

After running the installer, I just stayed in the terminal and ran gdisk. For more convenience, though, I used QEMU to build the disk image and then used dd on the host to copy the image to an SD card. That way, if the boot doesn't work, you can keep modifying the existing system and copy it over again.

@wkornewald

wkornewald commented Jun 25, 2023

Note that with this EDK2 firmware removing the 1MB partition didn't fix the issue for me. The error with that partition is earlier and different.

@cedel1

cedel1 commented Aug 15, 2023

I've run into the same issue (albeit with a slightly different exception address) with regular Fedora 37 and 38 Workstation installs without Secure Boot enabled.

With Secure Boot enabled it wouldn't boot either (no surprise, as I didn't have the security keys enrolled in UEFI), but it did not fail with an exception. Rather, it stayed on the Raspberry logo screen, reported that it could not boot via the network (again no surprise, as I don't have netboot IP addresses set up), and finally failed with a message that no bootable option was found, with no exception this time.

After much fiddling and searching I found the following info (and possible workarounds):
https://discussion.fedoraproject.org/t/install-media-dont-boot-in-uefi-mode-on-certain-motherboards/71376
https://bugzilla.redhat.com/show_bug.cgi?id=2113005

Is it possible that we are experiencing the same bug on a different architecture?

@nhivp

nhivp commented Aug 15, 2023

Try disabling EFI_MEMORY_ATTRIBUTE_PROTOCOL. Refer to https://edk2.groups.io/g/devel/message/106181

@cedel1

cedel1 commented Aug 16, 2023

@nhivp

No, I don't think that solves the problem (at least for me). On the outside, the situation stays the same.

In case anybody else wants to try, here are the packages I built with the patch mentioned:
https://github.com/cedel1/RPi4/releases

https://github.com/cedel1/RPi4/releases/tag/untagged-938a1e41485373db6db4 - includes the patch as it is.
https://github.com/cedel1/RPi4/releases/tag/untagged-033356db89331e36d576 - includes the patch with the PCD turned off.

@cedel1

cedel1 commented Aug 16, 2023

The interesting part is that I am currently also trying with Fedora server (iso image) and that seems to work - at least gets to grub and installation.

@alexwoellhaf

Sure enough, 1.34 works for me too. No luck with 1.35. CentOS Stream 9.

@mihalicyn

tianocore/edk2@2997ae3

commit 2997ae38739756ecba9b0de19e86032ebc689ef9
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Aug 2 11:48:04 2022 +0200

    ArmVirtPkg: make EFI_LOADER_DATA non-executable
    
    When the memory protections were implemented and enabled on ArmVirtQemu
    5+ years ago, we had to work around the fact that GRUB at the time
    expected EFI_LOADER_DATA to be executable, as that is the memory type it
    allocates when loading its modules.
    
    This has been fixed in GRUB in August 2017, so by now, we should be able
    to tighten this, and remove execute permissions from EFI_LOADER_DATA
    allocations.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Most likely this is the reason.
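The effect of such a change can be modeled as a bitmask with one bit per EFI_MEMORY_TYPE, where a set bit means allocations of that type are mapped non-executable. EDK2 exposes a per-type NX policy with this layout as PcdDxeNxMemoryProtectionPolicy, but the two policy values below are made up for illustration and are not the real platform settings. The point is only that once the EfiLoaderData bit is set, a bootloader that allocates EfiLoaderData memory and then jumps into it will fault.

```python
from enum import IntEnum


class EfiMemoryType(IntEnum):
    # First eight EFI_MEMORY_TYPE values, in UEFI specification order.
    EfiReservedMemoryType = 0
    EfiLoaderCode = 1
    EfiLoaderData = 2
    EfiBootServicesCode = 3
    EfiBootServicesData = 4
    EfiRuntimeServicesCode = 5
    EfiRuntimeServicesData = 6
    EfiConventionalMemory = 7


def is_non_executable(policy: int, mem_type: EfiMemoryType) -> bool:
    """True if the bit for `mem_type` is set in a one-bit-per-type NX policy."""
    return bool(policy & (1 << mem_type))


# Hypothetical policies, for illustration only (not real PCD values).
OLD_POLICY = 1 << EfiMemoryType.EfiBootServicesData            # loader data executable
NEW_POLICY = OLD_POLICY | (1 << EfiMemoryType.EfiLoaderData)  # loader data now NX

if __name__ == "__main__":
    for name, policy in (("old", OLD_POLICY), ("new", NEW_POLICY)):
        print(name, is_non_executable(policy, EfiMemoryType.EfiLoaderData))
```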

@millerthegorilla
Author

millerthegorilla commented Sep 28, 2023

Does this mean we will have to wait for v1.36 to be released? I just tried v1.35 again using coreos-installer, and the same synchronous exception is reported.
Can anyone recommend a point of contact I can message to ask whether v1.35 can be updated?

@pbatard
Member

pbatard commented Sep 28, 2023

Does this mean we will have to wait for v1.36 to be released?

On the contrary, from where I stand, what it means is that v1.35 includes the patch mentioned above, and the reason some distros fail to boot is that they use an old GRUB version that relies on EFI_LOADER_DATA being executable, which shouldn't be the case, and which is precisely what the patch changed.

Distros that use newer GRUB versions (or at least distros that have applied patches from the recent GRUB mainline, since, very problematically, the GRUB project is sadly unable to release on a timely basis, which creates a huge amount of problems downstream) shouldn't have this issue, as evidenced by the fact that recent Debian ARM64 and other distros do not experience the synchronous exception issue.

Thus, if you run into this issue, you may have to pressure the maintainers of the Linux distro you use to update their GRUB codebase, to ensure that it works with EFI_LOADER_DATA regions not being executable. Because, if the issue is what is being referenced above, then there is nothing that a new release of the Raspberry Pi 4 UEFI firmware will fix, as it already includes the 2022 patch above (and, just in case this is what you have in mind, I don't think we want to start reverting cherry-picked EDK2 commits just to make some distros that use a somewhat obsolete version of GRUB work...).

@dustymabe

@pbatard
Distros that use newer GRUB versions (or at least distros that have applied patches from the recent GRUB mainline, since, very problematically, the GRUB project is sadly unable to release on a timely basis, which creates a huge amount of problems downstream) shouldn't have this issue

I believe @millerthegorilla is using coreos-installer to install Fedora CoreOS, so we're using the GRUB packages from Fedora. I could be wrong, but Fedora is usually on the cutting edge of software. The sources for the GRUB2 RPM are stored here.

@millerthegorilla can you report exactly what version of Fedora CoreOS you were installing so we can figure out what grub2 rpms were being used?

@stevedcc

stevedcc commented Sep 28, 2023 via email

@dustymabe

I was also installing CoreOS and my cluster says it has been up for 60 days

but did you use v1.35?

@stevedcc

stevedcc commented Sep 28, 2023 via email

@millerthegorilla
Author

Hi, I am using coreos-installer and v1.35 fails, whilst v1.34 works as expected.

@dustymabe

Hi, I am using coreos-installer and v1.35 fails, whilst v1.34 works as expected.

yes, but what exact version of Fedora CoreOS are you using?

@stevedcc

stevedcc commented Sep 28, 2023 via email

@ignic

ignic commented Oct 5, 2023

In my case, when trying to install Fedora Server 39 beta, the installation works correctly, but when booting the new system it fails with v1.35. With v1.34 both the installation and the new system work as expected.

After redoing the installation with the inst.sdboot option (which uses systemd-boot instead of grub2, see link), the new system boots without problems with v1.35.

@nomi-ramzan

Is there any update on this error?

@millerthegorilla
Author

millerthegorilla commented Nov 14, 2023 via email

@jlinton
Member

jlinton commented Dec 21, 2023

I suspect I just ran headlong into this with a recent rebase of my edk2 branch.

Try dropping to the shell and running grub directly rather than the default shim->grub sequence. That allowed me to boot, but I've now further broken things.

@jlinton
Member

jlinton commented Dec 22, 2023

Right, so confirmation: it looks like RPi firmware that supports the EFI memory attributes protocol blows up with:

FSOpen: Open '\EFI\BOOT\fbaa64.efi' Success
SetMemoryAttributes: BaseAddress == 0x33797000, Length == 0x1A000, Attributes == 0x4000
ClearMemoryAttributes: BaseAddress == 0x33797000, Length == 0x1A000, Attributes == 0x22000


Synchronous Exception at 0x000000003379B000
PC 0x00003379B000
PC 0x0000337EB288
PC 0x0000337EB338
PC 0x0000337EC1B4
PC 0x0000337E9030
PC 0x000039E9B480 (0x000039E94000+0x00007480) [ 1] DxeCore.dll
PC 0x00003390E25C (0x000033903000+0x0000B25C) [ 2] UiApp.dll
PC 0x000033913A98 (0x000033903000+0x00010A98) [ 2] UiApp.dll
PC 0x000036E7B4B8 (0x000036E66000+0x000154B8) [ 3] SetupBrowser.dll
PC 0x000036E71A1C (0x000036E66000+0x0000BA1C) [ 3] SetupBrowser.dll
PC 0x00003390BD98 (0x000033903000+0x00008D98) [ 4] UiApp.dll
PC 0x000039E9B480 (0x000039E94000+0x00007480) [ 5] DxeCore.dll
PC 0x000036E43F9C (0x000036E3D000+0x00006F9C) [ 6] BdsDxe.dll
PC 0x000036E4733C (0x000036E3D000+0x0000A33C) [ 6] BdsDxe.dll
PC 0x000039E9EDA8 (0x000039E94000+0x0000ADA8) [ 7] DxeCore.dll
PC 0x000000027040
PC 0x00000002716C
[ 1] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
[ 2] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Application/UiApp/UiApp/DEBUG/UiApp.dll
[ 3] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SetupBrowserDxe/SetupBrowserDxe/DEBUG/SetupBrowser.dll
[ 4] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Application/UiApp/UiApp/DEBUG/UiApp.dll
[ 5] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
[ 6] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll
[ 7] /home/jlinton/rpi2/Build/RPi4/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll

  X0 0x0000000034B23798   X1 0x00000000373D0018   X2 0x000000003379B000   X3 0x0000000000000000
  X4 0x0000000036F5D068   X5 0x0000000000000001   X6 0x0000000000000000   X7 0x0000000000000000
  X8 0x0000000000000002   X9 0x0000000000000001  X10 0x00000000337CA018  X11 0x0000000000000000
 X12 0x0000000000000002  X13 0x0000000000000002  X14 0x0000000000000001  X15 0x00000000000000FF
 X16 0x0000000036F562D0  X17 0x000000001EA68734  X18 0x0000000000000011  X19 0x000000003386A000
 X20 0x0000000000000000  X21 0x0000000034B23798  X22 0x000000003387E2F0  X23 0x0000000000000001
 X24 0x000000003387E000  X25 0x000000003387E3B8  X26 0x000000003387E3C0  X27 0x000000003387E3C8
 X28 0x000000003387E3D0   FP 0x000000003B3FEFF0   LR 0x00000000337EB288  

  V0 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF   V1 0xFFFFFF80FFFFFFD0 000000003B3FEE90
  V2 0x4F43213A4C4C4100 3635324148535F4D   V3 0x0000000000000000 0000000000000400
  V4 0x0000000100000000 0000000000000000   V5 0x4010040140100401 4010040140100401
  V6 0x0100000000000004 0100000000000004   V7 0x0000000000000000 0000000000000000
  V8 0x0000000000000000 0000001B00000004   V9 0x0000000000000000 0000000000000000
 V10 0x0000000000000000 0000000000000000  V11 0x0000000000000000 0000000000000000
 V12 0x0000000000000000 0000000000000000  V13 0x0000000000000000 0000000000000000
 V14 0x0000000000000000 0000000000000000  V15 0x0000000000000000 0000000000000000
 V16 0x0000000000000000 0000000000000000  V17 0x0000000000000000 0000000000000000
 V18 0x0000000000000000 0000000000000000  V19 0x0000000000000000 0000000000000000
 V20 0x0000000000000000 0000000000000000  V21 0x0000000000000000 0000000000000000
 V22 0x0000000000000000 0000000000000000  V23 0x0000000000000000 0000000000000000
 V24 0x0000000000000000 0000000000000000  V25 0x0000000000000000 0000000000000000
 V26 0x0000000000000000 0000000000000000  V27 0x0000000000000000 0000000000000000
 V28 0x0000000000000000 0000000000000000  V29 0x0000000000000000 0000000000000000
 V30 0x0000000000000000 0000000000000000  V31 0x0000000000000000 0000000000000000

  SP 0x000000003B3FEFF0  ELR 0x000000003379B000  SPSR 0x60000209  FPSR 0x00000000
 ESR 0x8600000F          FAR 0x000000003379B000

 ESR : EC 0x21  IL 0x1  ISS 0x0000000F

Instruction abort: Permission fault, third level

Stack dump:
  000003B3FEEF0: 0000000000000001 000000003387E000 000000003387E3B8 000000003387E3C0
  000003B3FEF10: 000000003387E3C8 000000003387E3D0 000000003B3FEF60 00000000337B11F0
  000003B3FEF30: 000000003B3FF038 00000000337A6000 00000000337A6009 000000003385862A
  000003B3FEF50: 00000000337AF000 00000000000000DF 000000003B3FEF80 D31AAEC05D36E324
  000003B3FEF70: EB867F1A51BB14DB 000000006CA176BC ABA93E539C790EF5 2852AEF3743D7964
  000003B3FEF90: A2F2674AB971207B 1125D02FD20A3F40 0000000000000000 0000000000019000
  000003B3FEFB0: 0000000000004000 0000000000000400 000010000007EFF0 00000000337B11A0
  000003B3FEFD0: 00000000337B1148 00000000337B1140 0000000000000010 00000000337B1098
> 000003B3FEFF0: 000000003B3FF050 00000000337EB338 0000000000000000 0000000033854000
  000003B3FF010: 000000003386A000 0000000034B23798 00017F903B3FF050 000000003379B000
  000003B3FF030: 0000000033797000 000000000000001A 0000000033900C18 00000000337B1018
  000003B3FF050: 000000003B3FF0B0 00000000337EC1B4 0000000000000000 00000000337E9428
  000003B3FF070: 000000003387E39F 0000000034B23798 000000003B3FF120 000000003387E000
  000003B3FF090: 00000000351B6AC0 000000003525E030 0000000034B2CEA0 0000000035C605A0
  000003B3FF0B0: 000000003B3FF150 00000000337E9030 0000000000000001 000000003392B000
  000003B3FF0D0: 0000000035C74898 0000000000000001 0000000000000000 0000000000000001
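The firmware's own interpretation of ESR 0x8600000F ("Instruction abort: Permission fault, third level") can be reproduced from the ESR_ELx field layout in the Arm Architecture Reference Manual: EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and for aborts the fault status code in ISS[5:0]. This hypothetical helper covers only the abort classes and fault codes needed here. Note also that ELR and FAR (0x3379B000) fall inside the 0x33797000 + 0x1A000 range from the SetMemoryAttributes line above.

```python
# Exception classes (ESR_ELx.EC) for aborts, per the Arm Architecture Reference Manual.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}
# Fault status codes (ISS[5:0]) for translation and permission faults, levels 1-3.
FSC_NAMES = {
    0x05: "Translation fault, level 1",
    0x06: "Translation fault, level 2",
    0x07: "Translation fault, level 3",
    0x0D: "Permission fault, level 1",
    0x0E: "Permission fault, level 2",
    0x0F: "Permission fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into EC, IL and ISS, and name the fault for aborts."""
    ec = (esr >> 26) & 0x3F   # bits [31:26]: exception class
    il = (esr >> 25) & 0x1    # bit 25: instruction length
    iss = esr & 0x1FFFFFF     # bits [24:0]: instruction-specific syndrome
    decoded = {"EC": ec, "IL": il, "ISS": iss, "class": EC_NAMES.get(ec, "other")}
    if ec in EC_NAMES:
        fsc = iss & 0x3F      # IFSC/DFSC
        decoded["FSC"] = fsc
        decoded["fault"] = FSC_NAMES.get(fsc, "other")
    return decoded


if __name__ == "__main__":
    d = decode_esr(0x8600000F)  # ESR from the dump above
    print(f"EC=0x{d['EC']:X} IL={d['IL']} ISS=0x{d['ISS']:X}")
    print(d["class"], "/", d["fault"])
```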

@jlinton
Member

jlinton commented Jan 11, 2024

This is a known bug with several associated downstream bugs; the upstream bug is rhboot/shim#614.

@jlinton
Member

jlinton commented Jan 22, 2024

That said, the immediate crash can be fixed (at least in my testing) with https://src.fedoraproject.org/rpms/shim-unsigned-aarch64/pull-request/2, which is already merged into mainline shim. That doesn't mean the alignments are correct, only that at least Fedora 4K boots with that patch.
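The alignment caveat can be illustrated generically: memory permissions are applied per page (4 KiB is assumed here), so if an image's code and data sections are not page-aligned, at least one page holds both, and no single permission setting suits it. This sketch uses a made-up section layout and hypothetical helper names; it is not shim's actual logic and does not reflect the contents of the linked patch.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def pages(start: int, size: int, page: int = PAGE_SIZE) -> range:
    """Indices of the pages touched by the byte range [start, start + size)."""
    return range(start // page, (start + size - 1) // page + 1)


def conflicting_pages(sections, page: int = PAGE_SIZE) -> set[int]:
    """Pages touched by both an executable and a non-executable section.

    `sections` is an iterable of (name, start, size, executable) tuples.
    """
    exec_pages: set[int] = set()
    data_pages: set[int] = set()
    for _name, start, size, executable in sections:
        if size == 0:
            continue
        (exec_pages if executable else data_pages).update(pages(start, size, page))
    return exec_pages & data_pages


if __name__ == "__main__":
    # Hypothetical layouts; offsets are relative to the image base.
    aligned = [(".text", 0x1000, 0x3000, True), (".data", 0x4000, 0x800, False)]
    misaligned = [(".text", 0x1000, 0x2200, True), (".data", 0x3200, 0x800, False)]
    print(conflicting_pages(aligned))     # set()
    print(conflicting_pages(misaligned))  # {3}
```

In the misaligned case, page 3 would need to be executable for the tail of .text and non-executable for .data at the same time, so whichever attribute is applied last wins.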

@garybuhrmaster

Although the immediate crash can be fixed (at least in my testing) with https://src.fedoraproject.org/rpms/shim-unsigned-aarch64/pull-request/2 which is already merged to mainline shim. That doesn't mean the alignments are correct, only that at least fedora 4k boots with that patch.

FWIW, one of the RH bugzillas associated with this is: https://bugzilla.redhat.com/show_bug.cgi?id=2259264 and there is a shim-aa64 package which has been built with the updated shim source. It should be in the fedora updates repositories soon(ish), although it can be downloaded directly from fedora koji: https://koji.fedoraproject.org/koji/buildinfo?buildID=2420877 for those that wish/need to test now.

@pm4rcin

pm4rcin commented Apr 23, 2024

I tried installing the new UEFI firmware after today's update to Fedora IoT 40, but the synchronous exception is still present on reboot, at a different address (I guess because the binary has changed a bit). Can anyone check and confirm? I thought Fedora should work now, or do I have to install something? @garybuhrmaster could you tell me something about this?
Compose version: Fedora-IoT-40-20240422.3
Mem address: 3641D

@garybuhrmaster

@garybuhrmaster could you tell something about this?

I have not tried a new install from a recent compose, but I have validated that using the new shim with v1.35 works on an existing RPi 4 EFI-booting system (server base, not IoT). You may need to check that the IoT compose is using the new shim.

@pm4rcin

pm4rcin commented Apr 26, 2024

@garybuhrmaster does it have to be the unsigned shim, or should the "normal" shim-aa64 be fine too?
EDIT: Tried today; shim-aa64 is the correct version, but it still fails. The address was 3641D000 on the debug firmware.

@pm4rcin

pm4rcin commented Aug 28, 2024

It's time to test again, since there's a new release. I'll try in the coming days. Has anyone tested it already?

@pm4rcin

pm4rcin commented Sep 5, 2024

Sadly, it's still the same: the exception is now at 0x00000000363D2000.

@pm4rcin

pm4rcin commented Sep 5, 2024

It could be the GRUB version used in Fedora 40. It's 2.06 with a lot of patches, which I don't want to go through one by one. Maybe things will change with Fedora 41, which has GRUB 2.12, because that's the only thing left that could be causing this; everything else has been updated.

@dustymabe
Copy link

Feel free to test with rawhide (Fedora 42) or branched (Fedora 41) builds now to test your theory: https://builds.coreos.fedoraproject.org/browser?stream=rawhide&arch=aarch64

@pm4rcin

pm4rcin commented Sep 19, 2024

@dustymabe I just tried rebasing IoT to 41 and it's still the same situation, at the same address.
