
Implement compatibility with DKMS (Nvidia, etc.) #1091

Open
dustymabe opened this issue Nov 7, 2017 · 26 comments

@dustymabe (Collaborator) commented Nov 7, 2017

rpm-ostree version info:

  centos-atomic-host:centos-atomic-host/7/x86_64/standard
                Version: 7.1708 (2017-09-15 15:32:30)
                 Commit: 33b4f0442242a06096ffeffadcd9655905a41fbd11f36cd6f33ee0d974fdb2a8
           GPGSignature: 1 signature
                         Signature made Fri 15 Sep 2017 05:17:39 PM UTC using RSA key ID F17E745691BA8335
                         Good signature from "CentOS Atomic SIG <security@centos.org>"

When installing nvidia kmod it fails:

# rpm-ostree install epel-release && reboot
#
# cat <<EOF > /etc/yum.repos.d/nvidia.repo
[nvidia]
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/
enabled=1
gpgcheck=0
EOF
#
# rpm-ostree install nvidia-kmod
....
....
Resolving dependencies... done
Overlaying... done

Creating symlink /var/lib/dkms/nvidia/384.81/source ->
                 /usr/src/nvidia-384.81

DKMS: add completed.

Creating symlink /var/lib/dkms/nvidia/384.81/source ->
                 /usr/src/nvidia-384.81

DKMS: add completed.

Creating symlink /var/lib/dkms/nvidia/384.81/source ->
                 /usr/src/nvidia-384.81

DKMS: add completed.
error: Running %post for nvidia-kmod: Executing bwrap(/usr/nvidia-kmod.post): Child process exited with code 8

From the journal:

Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: /usr/share/info/dir: Read-only file system
Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: /usr/share/info/dir: Read-only file system
Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: mkdir: cannot create directory ‘/var/lib/dkms’: Read-only file system
Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: ln: failed to create symbolic link ‘/var/lib/dkms/nvidia/384.81/source’: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: mkdir: cannot create directory ‘/var/lib/dkms’: Read-only file system
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ln: failed to create symbolic link ‘/var/lib/dkms/nvidia/384.81/source’: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ls: cannot access /var/lib/dkms/nvidia/384.81/source: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: Error! The directory /var/lib/dkms/nvidia/384.81/source/
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: does not appear to have module source located within it.  Build halted.
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: mkdir: cannot create directory ‘/var/lib/dkms’: Read-only file system
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ln: failed to create symbolic link ‘/var/lib/dkms/nvidia/384.81/source’: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ls: cannot access /var/lib/dkms/nvidia/384.81/source: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: Error! The directory /var/lib/dkms/nvidia/384.81/source/
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: does not appear to have module source located within it.  Build halted.
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: Txn /org/projectatomic/rpmostree1/centos_atomic_host failed: Running %post for nvidia-kmod: Executing bwrap(/usr/nvidia-kmod.post): Child process exited with code 8

The scriptlets are:

-bash-4.2# rpm -qp nvidia-kmod-384.81-2.el7.x86_64.rpm --scripts
warning: nvidia-kmod-384.81-2.el7.x86_64.rpm: Header V3 RSA/SHA512 Signature, key ID 7fa2af80: NOKEY
postinstall scriptlet (using /bin/sh):
dkms add --rpm_safe_upgrade -m nvidia -v 384.81
dkms build -m nvidia -v 384.81
dkms install --force -m nvidia -v 384.81
preuninstall scriptlet (using /bin/sh):
dkms remove --rpm_safe_upgrade -m nvidia -v 384.81 --all || :
postuninstall scriptlet (using /bin/sh):
if [ "$1" -eq "0" ] ; then
    dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
fi
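As an aside on that postuninstall guard: RPM passes scriptlets the count of package instances remaining after the transaction as `$1`, so `0` means final erase and `1` or more means an upgrade. A minimal shell sketch of the same guard (the `regen_initramfs` name is mine, and the echoed command is a stand-in for the real dracut call):

```shell
# Illustration (not from the package) of RPM's scriptlet argument:
# $1 is the number of package instances remaining after the transaction,
# so 0 means final removal and >=1 means an upgrade is replacing us.
regen_initramfs() {
  if [ "$1" -eq 0 ]; then
    echo "dracut -f"   # stand-in for the real dracut invocation
  else
    echo "skip"
  fi
}
regen_initramfs 0   # final erase -> regenerate initramfs
regen_initramfs 1   # upgrade -> skip
```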

@cgwalters cgwalters changed the title %post compat: installing nvidia-kmod fails Implement compatibility with DKMS (Nvidia, etc.) Nov 7, 2017

@cgwalters (Member) commented Nov 7, 2017

Yeah, there's a huge world of stuff here. Supporting dkms is going to require a lot of work.

@Conan-Kudo commented Dec 4, 2017

@cgwalters Would this also cover akmods as a supported mechanism too? Or would that be easier than DKMS?

@cgwalters (Member) commented Dec 4, 2017

Would this also cover akmods as a supported mechanism too? Or would that be easier than DKMS?

I have no idea honestly without diving in a lot. I suspect they're going to be mostly equivalent but it's really just a wild guess.

@Conan-Kudo commented Dec 29, 2017

@cgwalters The Akmods mechanism makes kmod RPMs for the kernel packages installed and installs them, rather than building kmods for the running kernel and just slotting them in.

This was recently integrated into Fedora proper.


@cgwalters (Member) commented Aug 24, 2018

I said this elsewhere but to repeat here; I think we could pretty easily implement a generic hook in rpm-ostree upgrade that calls out to a user-specified external process, which would accept as input the new ostree, and could perform arbitrary modification (overlayfs), which would then be included in the final commit.

Where things bifurcate a lot here is - do you install the equivalent of dnf builddep kernel on the host? Let's call this option #1. If you do...that's a whole lot of layered stuff and is (IMO) against the larger goals. Or, option #2 - does the hook do what the atomic-wireguard stuff does and basically install the kernel+builddeps in a container? There's some nontrivial infrastructure to hook up rpm-ostree + build container in such a way - do we ship it as a package/container?

Option #3 is to have rpm-ostree itself create a container, reusing the same packages. This would be totally possible, it's a system-level variant of the rpm-ostree ex container support that we have today. But it'd be some nontrivial work in rpm-ostree, and increase our overlap with general-purpose container tools.

What would mix both #2 and #3 is a podman/containers-storage "backend" that knows how to call out to rpm-ostree to do the heavy lifting for the filesystem builds.

@alexhaydock commented Aug 24, 2018

Perhaps it's not quite the right issue to discuss this in, but I'd just like to raise the concern that whatever solution is proposed for this issue should ideally not make it too difficult for an end user to sign the resulting akmods-built modules for Secure Boot using their own keypair.

As this issue identifies, losing easy access to ZFS, VirtualBox and the NVidia drivers is a major concern for new users of Atomic/Silverblue, but I think it's also a concern if a user can only access the above at the cost of disabling Secure Boot.

@znmeb commented Sep 7, 2018

@alexhaydock Not just new users! I've been a Linux workstation / laptop user since Red Hat 6.2 and if a distro won't support my AMD GCN 1.1 card or WiFi hotspot or HP Omen laptop with an NVidia 1050Ti, I'll run a different distro. Secure Boot is over-rated; if I have to disable it to use my machine, that's exactly what I'll do.

@alexlarsson (Member) commented Nov 13, 2018

So, I took a short initial look at this from the perspective of supporting nvidia in Silverblue. There are two major suppliers of the nvidia driver in rpm form, rpmfusion and negativo17. rpmfusion seems to only support akmod, whereas negativo17 does both akmod and dkms.

I didn't know any details about dkms or akmod beyond the fact that they auto-build drivers, so I took a quick look at both:

akmod

You install akmod-nvidia, it depends on akmods (in fedora) and contains just:

/usr/src/akmods/nvidia-kmod-396.54-1.fc28.src.rpm
/usr/src/akmods/nvidia-kmod.latest

Then akmods itself has a few hooks (boot service, rpm transaction hook, optionally a shutdown service) that get called so that the bundled src.rpm can be rebuilt whenever there is a new kernel, generating a kernel-specific version of it. For example, the above srpm + kernel 4.18.10-200 generates the built rpm kmod-nvidia-4.18.10-200.fc28.x86_64-396.54-1.fc28.x86_64. This is cached in /var/cache/akmods/nvidia/ and installed. That rpm then contains the driver module:

/usr/lib/modules/4.18.10-200.fc28.x86_64/extra/nvidia/nvidia.ko.xz

This seems very nice, simple and rpm-focused, and the akmods program is a 500-line shell script.
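The rebuild naming scheme described above can be sketched in pure shell (the variable names are mine; the string layout just mirrors the example rpm name in this comment):

```shell
# Compose the kernel-specific kmod rpm name from the akmod's fields,
# matching the kmod-nvidia-... example above. Purely illustrative.
kernel="4.18.10-200.fc28.x86_64"
name="nvidia"; ver="396.54"; rel="1.fc28"; arch="x86_64"
kmod_rpm="kmod-${name}-${kernel}-${ver}-${rel}.${arch}"
echo "$kmod_rpm"
```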

dkms

dkms is a more generic framework and works on multiple distros. As such it has its own database of stuff in /var/lib/dkms, matched with sources in /usr/src which is updated with the "dkms" helper. The dkms-nvidia package contains the sources for the module extracted in /usr/src/nvidia-396.54, as well as a dkms.conf file telling dkms how to build the sources.

The %post of the dkms-nvidia rpm then does:

dkms add -m %{dkms_name} -v %{version} -q || :
dkms build -m %{dkms_name} -v %{version} -q || :
dkms install -m %{dkms_name} -v %{version} -q --force || :

The first one sets up a symlink from /var/lib/dkms/nvidia/396.54/source to /usr/src/nvidia-396.54, the second builds the module for the current kernel and puts it in /var/lib/dkms/nvidia/396.54/4.18.10-200.fc28.x86_64 and the last one then copies the result from that into /lib/modules/4.18.10-200.fc28.x86_64, where it sits unowned in the rpm db.

Additionally, dkms has hooks similar to akmods' (boot service, rpm transaction hook) that run the build and install parts for the new kernel.
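To summarize the three dkms steps and the paths they touch, here is a dry-run sketch (nothing is executed against a real dkms tree; the paths are the ones named in this comment):

```shell
# Echo the effect of each dkms step from the %post above. Dry run only.
mod="nvidia"; ver="396.54"; kver="4.18.10-200.fc28.x86_64"
echo "add:     ln -s /usr/src/${mod}-${ver} /var/lib/dkms/${mod}/${ver}/source"
echo "build:   artifacts land in /var/lib/dkms/${mod}/${ver}/${kver}"
echo "install: module copied to /lib/modules/${kver} (unowned in the rpm db)"
```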

what works for rpm-ostree

dkms is not really a great fit for rpm-ostree, with its reliance on state in /var and non-rpm-tracked module files. akmods seems pretty clean to me and fits the overall rpm-based scheme of rpm-ostree, but building on the live system or in the rpm transaction hook clearly doesn't work.

However, the way akmods works is that you create a kmod-nvidia srpm with full sources, but when built normally it just generates an akmod-nvidia rpm (containing a copy of the srpm, which is later rebuilt targeting a specific kernel). This means that the yum repo for the akmod has a .src.rpm for the driver which is easy for rpm-ostree to get at via dnf.

So, the way I propose this would work is that you can layer srpms as well as rpms:

rpm-ostree install-from-source kmod-nvidia

This would mean the same as install kmod-nvidia, except it would dnf download --source the srpm, build that in a container, and then layer the resulting rpm as it would usually be layered.

There are some special things we need to do when building the srpm. For instance, we need to set the kernels rpm macro to the kernel version in the ostree image so that the akmod srpm builds for the targeted kernel, and we need to ensure that the kernel and kernel-devel in the build container match the kernel in the ostree image. Still, this strikes me as pretty simple stuff.
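Spelled out as a dry-run sketch, the proposed flow looks like this (the `install-from-source` verb is the proposal above, not an existing command; each step is echoed rather than executed):

```shell
# Hypothetical pipeline behind `rpm-ostree install-from-source`;
# every step is a description, printed rather than run.
pkg="kmod-nvidia"
echo "1. dnf download --source ${pkg}"
echo "2. rebuild the srpm in a container whose kernel-devel matches the image kernel"
echo "3. layer the resulting kmod rpm exactly like a normal 'rpm-ostree install'"
```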

@alexlarsson (Member) commented Nov 13, 2018

I guess the question is, do we take a dependency on podman & co for the build container, or do we use rpm-ostree itself to construct the image for building the srpm, deploy it to a temporary location and spawn it via bwrap? @cgwalters ?

@cgwalters (Member) commented Nov 13, 2018

So, the way I propose this would work is that you can layer srpms as well as rpms:

Thanks for the analysis; you went a bit deeper into the details of both akmod/dkms than I had before. But some of this was already noted in #1091 (comment) right?

I like the idea, a whole lot of implications. Actually in general...I would really love to also support a more "build from source" model on top of rpm-ostree (e.g. "apply this patch to systemd", or "build systemd from this git commit"). There's a lot of prior art here; obviously libostree was split out of gnome-continuous which has such a model. Such a system could be built on top of something that built srpms, although I lean a bit towards skipping the srpm path and orienting more towards at least dist-git, as well as direct from upstream git.

But even this opens up the question of whether we would really want the build tools on the host or in a container.

or do we use rpm-ostree itself to construct the image for building the srpm,

This would probably block on #1180

The "build using container" was already prototyped out here #1091 (comment)

Big picture...I lean a bit towards the container path. But I am not likely to hack on this myself in the near future (even though my laptop has an nvidia card, I don't play games and nouveau is OK for me).

@alexlarsson (Member) commented Nov 13, 2018

So, the way I propose this would work is that you can layer srpms as well as rpms:

Thanks for the analysis; you went a bit deeper into the details of both akmod/dkms than I had before. But some of this was already noted in #1091 (comment) right?

Yeah, I just have a primary interest in the specific nvidia case, so I wanted to dump my research here.

I like the idea, a whole lot of implications. Actually in general...I would really love to also support a more "build from source" model on top of rpm-ostree (e.g. "apply this patch to systemd", or "build systemd from this git commit"). There's a lot of prior art here; obviously libostree was split out of gnome-continuous which has such a model. Such a system could be built on top of something that built srpms, although I lean a bit towards skipping the srpm path and orienting more towards at least dist-git, as well as direct from upstream git.

I agree that this would be nice. However, there are two complications to this.

First of all, rpm-ostree needs a way to specify how to build the modifications and store that in the ostree metadata, next to where the package layer is stored. Punting this to srpms means all we need to store is the srpm name. Of course, one could punt specifying this to some other high-level method, like "run the container image named foo"; then all you need to store in the metadata is the image name.

Secondly, there needs to be a way to extract the modifications of the new build into the final ostree image. With rpm, the build and the install are automatically separated, whereas in a container situation they might not be. For example, you would be building in a container that has a /usr with compilers, etc., but then you want to install into a different /usr.

I can imagine solving this. For example, you could have the newly composed ostree image checked out somewhere, use rofiles-fuse to get a safe version of it, mount that as a volume in the build container, and then set DESTDIR to the volume when installing the build. That should work, but it is a bunch of extra work that you get for free from rpmbuild.
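A stripped-down sketch of that build/install separation, using temp directories in place of the checked-out ostree and the build container (all paths and file names hypothetical):

```shell
# Build tree and destination tree are distinct; the "install" step
# copies only the artifact into the tree that would become the commit.
build=$(mktemp -d); dest=$(mktemp -d)
mkdir -p "$build/src"
printf 'stub\n' > "$build/src/module.ko"   # stand-in build artifact
# DESTDIR-style install into the separate tree (GNU install -D):
install -D "$build/src/module.ko" "$dest/usr/lib/modules/demo/module.ko"
ls "$dest/usr/lib/modules/demo"
```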

But even this opens up the question of whether we would really want the build tools on the host or in a container.

or do we use rpm-ostree itself to construct the image for building the srpm,

This would probably block on #1180

The "build using container" was already prototyped out here #1091 (comment)

Big picture...I lean a bit towards the container path. But I am not likely to hack on this myself in the near future (even though my laptop has an nvidia card, I don't play games and nouveau is OK for me).

One complexity of the container path here is that you have to somehow ensure that the build container matches the ABI of the final ostree image. For example, if we're building kernel modules we need the right kernel-devel headers. However, if you're building arbitrary userspace code you need to match the full userspace ABI: if you build against a library it needs to be the same version of the library, built in the same way, with the same C++ compiler ABI, etc. If we automatically compose the build environment from the same package set as the ostree image, this is a lot easier to guarantee.
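The kernel-module half of that constraint reduces to a version equality check; a pure-shell sketch (the versions are hard-coded for illustration, and the real check would query the rpmdb on both sides):

```shell
# The build container's kernel-devel must match the kernel shipped in
# the new deployment, or the module is built against the wrong ABI.
image_kernel="4.18.10-200.fc28.x86_64"     # kernel in the ostree image
container_devel="4.18.10-200.fc28.x86_64"  # kernel-devel in the build container
if [ "$image_kernel" = "$container_devel" ]; then
  echo "ok: build targets ${image_kernel}"
else
  echo "mismatch: ${container_devel} != ${image_kernel}" >&2
  exit 1
fi
```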

@Conan-Kudo commented Nov 14, 2018

I lean a bit towards skipping the srpm path and orienting more towards at least dist-git, as well as direct from upstream git.

This is problematic for things like akmods, which rely on being able to build from a source package. In addition, you can't guarantee git; you can, however, guarantee an srpm.

One complexity of using the container path here is that you have to somehow ensure that the build container matches the ABI of the final ostree image. For example, if we're building kernel modules we need to have the right kernel-devel header. However, if you're building arbitrary userspace code you need to match the full userspace ABI. I.e. if you build against a library it needs to be the same version of the library and built in the same way, you need same c++ compiler ABIs, etc. If we automatically compose the build environment from the same packages image as the ostree image this is a lot easier to guarantee.

Could we do something similar to the btrfs seed+sprout thing to support a transparent layer that is invoked as a container to do these things? The other, more practical issue is that we can't guarantee that the matching kernel packages will still be present in the repo when this happens. So what do we do then?

@znmeb commented Nov 14, 2018

@cgwalters "Big picture...I lean a bit towards the container path. But I am not likely to hack on this myself in the near future (even though my laptop has an nvidia card, I don't play games and nouveau is OK for me)."

I have an HP Omen with Intel graphics and an NVidia 1050Ti. nouveau black-screens on every current Linux distro I've tried; when I do an install I have to blacklist nouveau and bring the machine up with just the Intel graphics. Then I add the NVidia drivers after the install is finished.

How the drivers get built doesn't matter to me - if it takes a container that only occupies resources during an install and has to do a moderate-sized compile, that's no big deal. Not having the NVidia drivers is a show-stopper. So I like the source RPM install idea a lot. ;-)

@mskarbek commented Nov 14, 2018

An alternative solution would be to provide kmod packages. It doesn't solve the main issue, as it just shifts the responsibility to package maintainers, but it is the only thing that doesn't involve drastic changes/heavy development and basically just works™. I have switched from DKMS to kmod packages for ZFS on Linux.

@alexlarsson (Member) commented Nov 16, 2018

@mskarbek kmod packages are not special in any way and probably work already. However, the problem with them is that they need to be updated in lock-step with new kernel updates, and they stop working the second you run a non-standard kernel. In practice this means people need something like dkms to be guaranteed an up-to-date nvidia driver.

@matthiasclasen commented Mar 5, 2019

This should be working in F30 Silverblue

@znmeb commented Mar 5, 2019

@matthiasclasen Great news! How do I get F30 Silverblue to test? I have a pretty short window of availability this coming week but can squeeze this in.


@jamescassell commented Mar 6, 2019

Would be awesome! This is what forced me back to the traditional RPM setup after trying out Silverblue.


@whs-dot-hk commented Mar 29, 2019
commented Mar 29, 2019

This should be working in F30 Silverblue

Hi @matthiasclasen, could you kindly explain why?

@matthiasclasen commented Mar 29, 2019
commented Mar 29, 2019

Not sure I understand the question. It should be working because the necessary changes were merged.

@rm-happens commented Apr 1, 2019
commented Apr 1, 2019

FWIW, DKMS support is required for VirtualBox and ZFS. In addition, a ZFS root filesystem install requires the ability to run grub2-mkconfig to generate a new grub.conf, and to run dracut to generate a new initramfs. Running grub2-install to install grub is also needed on systems with legacy (non-EFI) BIOS.
