Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #2474: add AMD firmware to OPAL PBA if necessary #2507

Merged
merged 2 commits into from Nov 5, 2020

Conversation

OliverO2
Copy link
Contributor

@OliverO2 OliverO2 commented Nov 3, 2020

Pull Request Details:
  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL): Opal PBA does not boot on AMD APU/GPU systems: Missing firmware for amdgpu module #2474

  • How was this pull request tested?

    As AMD graphics hardware was not available on the test system, tests were run by mocking as follows:

    • Add these lines in usr/share/rear/lib/mkopalpba-workflow.sh after SourceStage "prep":

          LogPrintError "FIRMWARE_FILES=<${FIRMWARE_FILES[*]}>"
          exit
    • Run usr/sbin/rear mkopalpba with default configuration (Intel graphics), expect result: FIRMWARE_FILES=<no>.

    • In default.conf use these single-line configuration changes:

      • FIRMWARE_FILES=( yes ), expect result: FIRMWARE_FILES=<no>.
      • FIRMWARE_FILES=( a b c ), expect result: FIRMWARE_FILES=<a b c>.
      • OPAL_PBA_FIRMWARE_FILES=( one two ), expect result: FIRMWARE_FILES=<one two>.
    • In usr/share/rear/conf/GNU/Linux.conf, change grep expression AMD to Intel, expect result: FIRMWARE_FILES=<amdgpu>.

  • Brief description of the changes in this pull request:

    OPAL PBA images only:

    • Add a required firmware file when AMD graphics hardware is present.
    • Allow setting a PBA-specific firmware configuration via the OPAL_PBA_FIRMWARE_FILES configuration variable.

    The issue report Opal PBA does not boot on AMD APU/GPU systems: Missing firmware for amdgpu module #2474 indicates that a single firmware file is all that is required to make a PBA boot with AMD graphics hardware. If field testing shows that more changes are necessary, this PR's solution should be easy to extend.

# lspci output is expected to look similar to this:
# 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 98e4 (rev 81)
if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
OPAL_PBA_FIRMWARE_FILES+=( amdgpu )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPAL_PBA_FIRMWARE_FILES+=( amdgpu ) will lead to
FIRMWARE_FILES=( amdgpu ) but that will not work because
find /lib*/firmware -ipath 'amdgpu' will find nothing
cf. the code in build/GNU/Linux/420_copy_firmware_files.sh
and see the FIRMWARE_FILES description in default.conf:

# The FIRMWARE_FILES array contains filename globbing patterns
# that are used as '-ipath' arguments for 'find' calls to find
# firmware files in the /lib*/firmware/ directories that will
# get included in the rescue/recovery system for example like
# FIRMWARE_FILES=( '*rtl*' '*nvidia*' '*radeon*' )

so what should work is
OPAL_PBA_FIRMWARE_FILES+=( '*amdgpu*' )
because with that
find /lib*/firmware -ipath '*amdgpu*' finds firmware files
(at least on my openSUSE Leap 15.1 system).

Furthermore if I understand #2474 correctly
the /lib/firmware/amdgpu/* firmware files plus the amdgpu kernel module
are needed in any case to boot the ReaR recovery system
on systems with AMD-GPU graphics hardware
so I think we should better have

# Explicitly add amdgup kernel module and its firmware files
# to boot the recovery system on AMD graphics hardware (if present)
# in particular for TCG Opal pre-boot authentication (PBA) images
# that do not include firmware files by default
# cf. https://github.com/rear/rear/issues/2474
# lspci output is expected to look similar to this:
#   00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 98e4 (rev 81)
if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
    MODULES+=( amdgpu )
    FIRMWARE_FILES+=( '*amdgpu*' )
    OPAL_PBA_FIRMWARE_FILES+=( '*amdgpu*' )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that excellent discovery! Indeed, amdgpu being a firmware directory will not be discovered the way I thought it would.

Copy link
Member

@jsmeix jsmeix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsmeix
Copy link
Member

jsmeix commented Nov 4, 2020

@OliverO2
as always thank you so much for your continuous valuable contributions to ReaR
that help so much to make ReaR step by step better and better.

I think it is not right that a single firmware file is all that is required
to make a system boot with AMD graphics hardware because on my
openSUSE Leap 15.1 system find /lib*/firmware -ipath '*amdgpu*'
lists more than 300 firmware files and their names indicate that
each kind of AMD GPU chips has and needs its own matching firmware
which is what I would expect for all those many different AMD GPU chips,
cf. https://en.wikipedia.org/wiki/Radeon

On my openSUSE Leap 15.1 system there are 24MB of AMD GPU firmware files:

# du -hs /lib/firmware/amdgpu/
24M     /lib/firmware/amdgpu/

I hope this is not too much for a PBA?

@jsmeix jsmeix self-assigned this Nov 4, 2020
@jsmeix jsmeix added the enhancement Adaptions and new features label Nov 4, 2020
@jsmeix jsmeix added this to the ReaR v2.7 milestone Nov 4, 2020
@OliverO2
Copy link
Contributor Author

OliverO2 commented Nov 4, 2020

@jsmeix
Thank you for the thorough review, it seems to be a bit more tricky than I had hoped for.

Note that all of the following only applies to PBA images. Standard ReaR images will contain the amdgpu module and all firmware by default.

The PBA image on my Ubuntu 20.04 desktop is about 46 MB (du -hs). AMD firmware looks like this:

$ du -hs /lib/firmware/amdgpu
33M     /lib/firmware/amdgpu

On Ubuntu 20.04 the amdgpu firmware directory contains 336 files for 30 different GPU models. While we do not know which of those might trigger the boot-time freeze, at least some AMD GPUs seem to be affected:

Without this package installed, poor 2D/3D performance is commonly experienced. Some GPUs may require firmware to function properly at all.

The spec. allows a PBA size up to 128 MB. So we could just throw in that entire firmware directory here to avoid boot-time freezes. Downside: Slows down PBA uploading to the device (can take minutes).

Additional notes:

  1. We should not change FIRMWARE_FILES: By default (non-PBA), it already includes all available files. If the user chose to restrict firmware files, we should not override this choice by auto-configuration.

  2. With respect to modules, I don't think we need to change anything (PBA or non-PBA):

    • default.conf contains MODULES=( 'all_modules' ), which should include the amdgpu module automatically.
    • usr/share/rear/prep/OPALPBA/Linux-i386/001_configure_workflow.sh contains MODULES=( 'loaded_modules' ) which apparently also includes the amdgpu module, otherwise the issue Opal PBA does not boot on AMD APU/GPU systems: Missing firmware for amdgpu module #2474 would not have occured as described (the kernel would have never tried to load AMD firmware).

@OliverO2
Copy link
Contributor Author

OliverO2 commented Nov 4, 2020

With 0a1ba2f on Ubuntu 20.04 desktop, the PBA size is

  • 48 MB for systems without AMD GPU,
  • 59 MB for systems with AMD GPU.

Looks like we get a good compression ratio on these files in initrd.cgz.

@jsmeix
Copy link
Member

jsmeix commented Nov 5, 2020

@OliverO2

thank you for your explanatory reply.
It helped me a lot to avoid a mistake.

I fully agree with your reasoning in
#2507 (comment)
therein in particular

We should not change FIRMWARE_FILES:
By default (non-PBA), it already includes all available files.
If the user chose to restrict firmware files, we should not override this choice by auto-configuration.

Yesterday I was too fast with my (false) conclusion that because of

/lib/firmware/amdgpu/* firmware files plus the amdgpu kernel module
are needed in any case to boot the ReaR recovery system
on systems with AMD-GPU graphics hardware

we should have in ReaR some hardcoded

if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
    MODULES+=( amdgpu )
    FIRMWARE_FILES+=( '*amdgpu*' )

My conclusion is wrong because it would be against what I always demand:
"Final power to the user!"

In contrast with such hardcoded things it would be ReaR that has the final power
what gets included in the recovery system regardless what the user may have
specified in his etc/rear/local.conf.

@jsmeix jsmeix requested a review from a team November 5, 2020 09:01
@jsmeix
Copy link
Member

jsmeix commented Nov 5, 2020

@rear/contributors
if there are no objections I would like to merge it today afternoon.

@jsmeix
Copy link
Member

jsmeix commented Nov 5, 2020

@OliverO2
I think my argument in
#2507 (comment)
also applies to

if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
    OPAL_PBA_FIRMWARE_FILES+=( '*/amdgpu/*' )
fi

which hardcodedly adds amdgpu firmware files regardless what
the user has specified for OPAL_PBA_FIRMWARE_FILES in his local.conf.

So I wonder if not something like

if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
    # Add amdgup firmware files unless the user has specified to not add any firmware files:
    is_false "$OPAL_PBA_FIRMWARE_FILES" || OPAL_PBA_FIRMWARE_FILES+=( '*/amdgpu/*' )
fi

would be better?

I know this is probably somewhat nitpicking or even over the top
but I like to understand how OPAL_PBA_FIRMWARE_FILES is meant to be used.

@OliverO2
Copy link
Contributor Author

OliverO2 commented Nov 5, 2020

@jsmeix
Yes, in general you are right that OPAL_PBA_FIRMWARE_FILES+=... always extends the user's configuration. In this respect it works more like COPY_AS_IS and others in usr/share/rear/conf/GNU/Linux.conf.

If we would like to give the user more control, supporting just a "no firmware" decision seems insufficient. More likely an expert user would want to include some firmware as in

OPAL_PBA_FIRMWARE_FILES=( '*/amdgpu/vega20*' )

To support this, we could choose to enable AMD firmware auto-detection only if OPAL_PBA_FIRMWARE_FILES is empty. The downside would be that if a user wanted to include some unrelated firmware like */intel-ucode/*, the user would also have to remember dealing with AMD GPU quirks, probably without knowing.

The question is, given that

  • the added weight of AMD firmware files is not that significant and
  • those files are only present when an AMD GPU is detected, and
  • the probable outcome of not correctly including AMD firmware is a non-booting system,

is it worth it?

@jsmeix
Copy link
Member

jsmeix commented Nov 5, 2020

@OliverO2
I will merge it as is because in its current state it is at least "good enough".
If further enhancements are needed we can add them as actually needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adaptions and new features fixed / solved / done
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants