centos 7: dkms strikes again #3801

Closed
dmaziuk opened this issue Sep 18, 2015 · 52 comments
Labels
Type: Building Indicates an issue related to building binaries
Milestone

Comments

@dmaziuk

dmaziuk commented Sep 18, 2015

This is consistent on all our c7 boxen: the kernel update to kernel-3.10.0-229.14.1.el7.x86_64 does not trigger a dkms rebuild during install or on the subsequent reboot. It happens whether or not the zfs/spl rpms are updated at the same time.

A related problem is that a zfs pool failing to init doesn't throw you down to single-user login the way an fstab filesystem failing to mount does. Then init will happily try to start daemons that use your zfs filesystems.

@rgmiller

For what it's worth, this one has bitten me, too. I'll be happy to test any fixes when they're ready.

@dmaziuk
Author

dmaziuk commented Sep 21, 2015

I think I'm going to try and see if mountpoint=legacy works because the real problem here is systems coming up without zpool.

@dmaziuk
Author

dmaziuk commented Sep 21, 2015

.. and that doesn't work so we're pretty much screwed. The only known workaround at this point is to run dkms install -k kernel.version before reboot and hope for the best.

@behlendorf behlendorf added this to the 0.7.0 milestone Sep 22, 2015
@behlendorf behlendorf added Type: Building Indicates an issue related to building binaries Bug - Minor labels Sep 22, 2015
@behlendorf
Contributor

I'm not sure why this isn't being triggered for CentOS 7. I'd be happy to apply a fix if someone has the time to look into it and determine why.

That said, I may have a better solution available for CentOS 6/7 users. My suggestion would be to switch to the KMOD repository. These are binary packages for CentOS with weak-modules support which rely on the stable kABI. You can install them once and they will work with any of the stock CentOS kernels, no need to rebuild.

You can install them from the official repository but I'd suggest cleanly removing the DKMS version first. As always let me know if you hit any rough edges. ZFS still uses a few more symbols than exist in the stable kABI so it's possible things might break if one of those changes. This is the main reason I didn't mention the KMOD repository before. However, thus far I haven't observed any issues due to this.

[zfs-kmod]
name=ZFS on Linux for EL $releasever (KMOD)
baseurl=http://archive.zfsonlinux.org/epel/$releasever/kmod/$basearch/
enabled=0
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

@dmaziuk
Author

dmaziuk commented Sep 22, 2015

At a guess you might have a race condition during yum install: zfs is updated before the kernel so modules don't get built. It doesn't explain why they don't get built on boot -- that used to work at least sometimes though I don't remember seeing that on centos 7 (but definitely happened on centos 6).

I'll give the kmod repo a try once I figure out which of the systems I can break safely. ;)

@FireDrunk
Contributor

Would it be an idea to build some verification scripts and run those on system shutdown and/or reboot?
The script can verify if a ZFS kernel module is present for all installed kernels (because it might be hard to find out which kernel is going to be booted next time), and warn you if there is not. Via a configuration option we could autocompile on shutdown?
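
A minimal sketch of such a check, assuming the modules live somewhere under /lib/modules/<kernel>/ and that zfs.ko is the file to look for. The function name kernels_missing_module and its defaults are hypothetical, not part of any ZFS package:

```shell
# List kernels under a modules root that have no copy of a given module.
# Hypothetical helper: the default root and module name are assumptions.
kernels_missing_module() {
    local root="${1:-/lib/modules}" module="${2:-zfs.ko}" dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # Search the whole kernel directory, which covers extra/ and weak-updates/.
        if [ -z "$(find "$dir" -name "$module" -print -quit 2>/dev/null)" ]; then
            basename "$dir"
        fi
    done
}
```

Calling kernels_missing_module with no arguments prints one kernel version per line for every kernel that would come up without ZFS; empty output means every kernel directory has the module.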

I've had the same issue on Fedora 21/22

@ryao
Contributor

ryao commented Sep 23, 2015

As a workaround until someone writes a fix for this, dkms install could likely be run manually:

sudo dkms install spl/0.6.5.1
sudo dkms install zfs/0.6.5.1

@ebyrne242

Just wanted to note that I run the kernel-ml (mainline) package from elrepo due to lack of hfsplus module in the centos 7 kernel, so the kmod packages won't work for me. When the problem hit me, I was too lazy to investigate at the time, and basically did what ryao suggests: yum reinstall spl && yum reinstall zfs.

@dmaziuk
Author

dmaziuk commented Sep 23, 2015

dkms install (spl,zfs)/<version> -k <kernel version>
where kernel version is what you see in ls /lib/modules will build it for the kernel you've just upgraded to.
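
To see what that expands to for every kernel on a box, here is a dry-run sketch that only prints the commands. The helper name print_dkms_installs is hypothetical, and it assumes spl and zfs share one version number, as they do in the commands in this thread:

```shell
# Print (not run) the dkms commands for every kernel directory found.
# Hypothetical helper: the function name and arguments are illustrative.
print_dkms_installs() {
    local version="$1" root="${2:-/lib/modules}" dir kernel
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        kernel=$(basename "$dir")
        # spl goes first, in the same order as the commands in this thread.
        echo "dkms install spl/${version} -k ${kernel}"
        echo "dkms install zfs/${version} -k ${kernel}"
    done
}
```

For example, print_dkms_installs 0.6.5.1 lists the commands for review; the output can be piped to sh once it looks right.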

@fmikker

fmikker commented Sep 27, 2015

Could it be a workaround to upgrade the kernel packages and reboot prior to upgrading zfs/spl?

@behlendorf behlendorf modified the milestones: 0.6.5.3, 0.7.0 Sep 28, 2015
@dmaziuk
Author

dmaziuk commented Oct 1, 2015

just for posterity: on one centos 7 machine so far I

  • added the zfs-kmod to zfs.repo,
  • stopped the relevant services,
  • umounted zfs filesystems
  • yum rm'ed zfs, zfs-dkms, libzfs, libzpool2, libnvpair1, spl, spl-dkms, libuutil1
  • yum in'ed kmod-zfs
  • reboot

Everything seems to be in order so I removed dkms, gcc, and the rest of the -devel stuff.

@behlendorf
Contributor

A fix for the CentOS dkms packages was applied to both 0.6.5.2 and the master branch. It addresses the case where old DKMS builds were not being correctly cleaned up. This could have resulted in some of the reported issues.

@jonathandgough

I need help... I think this issue has shut my system down. I must have updated and not rebooted. My machine got hung, and when I rebooted it refused to boot the current kernel, 3.10.0-229.14.1.el7.x86_64.

The only kernel I can boot into is 3.10.0-123.el7.x86_64

The other issue is that I have nvidia drivers installed which won't allow me to boot into a gui...

I did:
find /lib/modules/$(uname -r)/extra -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
find /lib/modules/$(uname -r)/weak-updates/ -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
yum reinstall zfs-release
yum --enablerepo=zfs-testing reinstall $(rpm -qa | egrep "zfs|spl")

I installed:
kde-settings-ksplash.noarch 0:19-23.5.el7.centos libzfs2.x86_64 0:0.6.5.2-1.el7.centos spl.x86_64 0:0.6.5.2-1.el7.centos
spl-dkms.noarch 0:0.6.5.2-1.el7.centos zfs.x86_64 0:0.6.5.2-1.el7.centos zfs-dkms.noarch 0:0.6.5.2-1.el7.centos
zfs-release.noarch 0:1-2.el7.centos

but it still refuses to boot. and zfs refuses to work...

@ebyrne242

@jonathandgough You may want to clarify "refuses to boot". You should be able to boot without ZFS fine if you don't have any system-critical filesystems on ZFS.

With yum, you only need to reinstall spl-dkms and zfs-dkms. Note that this will install them in the running kernel. If you're trying to reinstall them on -123 and then boot into -229.14.1, it won't work. If you want to do that, you will have to use the dkms command mentioned above:

dkms install spl/ -k
dkms install zfs/ -k
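
A quick way to tell whether the running kernel is the one you will boot next is to compare it with the newest directory under /lib/modules. This is only a sketch: the function name newest_kernel is hypothetical, and a leftover directory from a removed kernel would fool it, so check the result by eye.

```shell
# Print the highest-versioned kernel directory under a modules root.
# Hypothetical helper; relies on version sort (sort -V) from GNU coreutils.
newest_kernel() {
    local root="${1:-/lib/modules}" dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] && basename "$dir"
    done | sort -V | tail -n1
}
```

If "$(newest_kernel)" differs from "$(uname -r)", a plain yum reinstall builds for the wrong kernel and the -k form of dkms install is the one to use.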

I don't know much about your nvidia problem (and it's a bit offtopic here), as there are many different ways to install nvidia drivers. However, if the nvidia kernel module isn't loaded, the nvidia X11 driver won't work. If you installed an nvidia driver package with dkms support, you should be able to do the same as above. If not, you may need a newer package for your newer kernel, or if it is version-independent, to do a reinstall of the driver after booting into the new kernel.

@dmaziuk
Author

dmaziuk commented Oct 14, 2015

if you can boot into single-user,

dkms install spl/0.6.5.2 -k 3.10.0-229.14.1.el7.x86_64
dkms install zfs/0.6.5.2 -k 3.10.0-229.14.1.el7.x86_64

should fix ya.

FWIW I'm getting nvidia drivers from elrepo and now zfs: from zfs-kmod and I'm hoping to never touch dkms ever again. (Abandon hope, I know...)

@jonathandgough

So.... I pulled my nvidia graphics card, and that (for whatever random reason) allowed me to actually see the boot screen. And indeed the startup is getting hung up at zfs. It is saying "A start job is running for Mount ZFS filesystems".

and it basically just hangs there. the only way in is to boot through the rescue mode.

In rescue mode I deleted zfs
dkms remove --all zfs/0.6.5.2
dkms remove --all spl/0.6.5.2

then booted into the latest kernel and reinstalled as suggested.

dkms install spl/0.6.5.2 -k 3.10.0-229.14.1.el7.x86_64
dkms install zfs/0.6.5.2 -k 3.10.0-229.14.1.el7.x86_64

but I got errors and my partition will not mount.

for installing spl I got:
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/splat.ko not found.
Error! Module version 0.6.5.2-1 for splat.ko
is not newer than what is already found in kernel 3.10.0-229.14.1.el7.x86_64 (0.6.5.2-1).
You may override by specifying --force.
Adding any weak-modules
modinfo: ERROR: Module /lib/modules/3.10.0-229.1.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/splat.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/splat.ko not found.
Warning: Module splat.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-229.1.2.el7.x86_64
depmod: ERROR: fstatat(4, zavl.ko): No such file or directory
depmod: ERROR: fstatat(4, znvpair.ko): No such file or directory
depmod: ERROR: fstatat(4, zunicode.ko): No such file or directory
depmod: ERROR: fstatat(4, zcommon.ko): No such file or directory
depmod: ERROR: fstatat(4, zfs.ko): No such file or directory
depmod: ERROR: fstatat(4, zpios.ko): No such file or directory
depmod: ERROR: fstatat(4, spl.ko): No such file or directory
depmod: ERROR: fstatat(4, splat.ko): No such file or directory
depmod: WARNING: could not open /lib/modules/3.10.0-229.1.2.el7.x86_64/modules.order: No such file or directory
depmod: WARNING: could not open /lib/modules/3.10.0-229.1.2.el7.x86_64/modules.builtin: No such file or directory
depmod: WARNING: could not open /var/tmp/initramfs.Hu7qnM/lib/modules/3.10.0-229.1.2.el7.x86_64/modules.order: No such file or directory
depmod: WARNING: could not open /var/tmp/initramfs.Hu7qnM/lib/modules/3.10.0-229.1.2.el7.x86_64/modules.builtin: No such file or directory

depmod...

for installing zfs I got:

zavl.ko:
Running module version sanity check.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/zavl.ko not found.

  • Original module
    • No original module exists within this kernel
  • Installation
    • Installing to /lib/modules/3.10.0-229.14.1.el7.x86_64/extra/

znvpair.ko:
Running module version sanity check.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/znvpair.ko not found.
Error! Module version 0.6.5.2-1 for znvpair.ko
is not newer than what is already found in kernel 3.10.0-229.14.1.el7.x86_64 (0.6.5.2-1).
You may override by specifying --force.

zunicode.ko:
Running module version sanity check.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/zunicode.ko not found.
Error! Module version 0.6.5.2-1 for zunicode.ko
is not newer than what is already found in kernel 3.10.0-229.14.1.el7.x86_64 (0.6.5.2-1).
You may override by specifying --force.

zcommon.ko:
Running module version sanity check.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/zcommon.ko not found.
Error! Module version 0.6.5.2-1 for zcommon.ko
is not newer than what is already found in kernel 3.10.0-229.14.1.el7.x86_64 (0.6.5.2-1).
You may override by specifying --force.

zfs.ko:
Running module version sanity check.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/zfs.ko not found.
Error! Module version 0.6.5.2-1 for zfs.ko
is not newer than what is already found in kernel 3.10.0-229.14.1.el7.x86_64 (0.6.5.2-1).
You may override by specifying --force.

zpios.ko:
Running module version sanity check.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/weak-updates/zpios.ko not found.
Error! Module version 0.6.5.2-1 for zpios.ko
is not newer than what is already found in kernel 3.10.0-229.14.1.el7.x86_64 (0.6.5.2-1).
You may override by specifying --force.
Adding any weak-modules
modinfo: ERROR: Module /lib/modules/3.10.0-229.1.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/znvpair.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/znvpair.ko not found.
Warning: Module znvpair.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-229.1.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-229.1.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zunicode.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zunicode.ko not found.
Warning: Module zunicode.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-229.1.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-229.1.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zcommon.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zcommon.ko not found.
Warning: Module zcommon.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-229.1.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-229.1.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zfs.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zfs.ko not found.
Warning: Module zfs.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-229.1.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-229.1.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zpios.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-229.14.1.el7.x86_64/zpios.ko not found.
Warning: Module zpios.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-229.1.2.el7.x86_64
depmod: ERROR: fstatat(4, zavl.ko): No such file or directory
depmod: ERROR: fstatat(4, znvpair.ko): No such file or directory
depmod: ERROR: fstatat(4, zunicode.ko): No such file or directory
depmod: ERROR: fstatat(4, zcommon.ko): No such file or directory
depmod: ERROR: fstatat(4, zfs.ko): No such file or directory
depmod: ERROR: fstatat(4, zpios.ko): No such file or directory
depmod: ERROR: fstatat(4, spl.ko): No such file or directory
depmod: ERROR: fstatat(4, splat.ko): No such file or directory
depmod: WARNING: could not open /lib/modules/3.10.0-229.1.2.el7.x86_64/modules.order: No such file or directory
depmod: WARNING: could not open /lib/modules/3.10.0-229.1.2.el7.x86_64/modules.builtin: No such file or directory
depmod: WARNING: could not open /var/tmp/initramfs.SmTEqS/lib/modules/3.10.0-229.1.2.el7.x86_64/modules.order: No such file or directory
depmod: WARNING: could not open /var/tmp/initramfs.SmTEqS/lib/modules/3.10.0-229.1.2.el7.x86_64/modules.builtin: No such file or directory

depmod...

DKMS: install completed.

@behlendorf behlendorf modified the milestones: 0.6.5.4, 0.6.5.3 Oct 15, 2015
@jonathandgough

@dmaziuk where are you getting zfs-kmod from?

@behlendorf
Contributor

To be clear, you must use either the KMOD or the DKMS repository, not both. Before switching, make sure the old packages and any stale modules are removed.

@jonathandgough

@behlendorf

I haven't done anything with a KMOD, I'm just at my wits end.

I have managed to get my machine to boot. I have tried removing everything and then re-installing, but when I check zpool status, I still keep getting:
[root@localhost-2 ~]# zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them
[root@localhost-2 ~]# /sbin/modprobe zfs
modprobe: FATAL: Module zfs not found.
[root@localhost-2 ~]#

I even removed everything zfs related and re-installed everything from the archive as explained on the webpage.

@behlendorf
Contributor

Check the output of dmesg it should show why a module couldn't be loaded.

@dmaziuk
Author

dmaziuk commented Oct 15, 2015

@jonathandgough OK, here's what worked for me so far (5 or 6 boxen, one centos 6, the rest's centos 7, no nvidia drivers):

  • boot your machine to where it has network and can run yum in,
  • run yum ls installed | grep zfs, then yum rm everything it lists except zfs-release,
  • cd /lib/modules, go through the subdirectories and delete all zfs stuff from weak-updates and extra. In your case you may have nvidia stuff in there, too, and will have to figure out what's what.
  • edit /etc/yum.repos.d/zfs.repo, copy-paste the [zfs-kmod] stanza from behlendorf's comment above. Set enabled = 1 on zfs-kmod and enabled = 0 on the regular zfs repo.
  • yum in kmod-zfs
  • reboot and enjoy
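
The repo-file edit in the list above can be scripted. This is a sketch only: set_repo_enabled is a hypothetical helper, it prints the modified file to stdout instead of editing in place, and while [zfs-kmod] is the stanza name posted earlier in this thread, [zfs] as the name of the regular stanza is an assumption to check against your own zfs.repo.

```shell
# Set "enabled=<value>" inside one [section] of a yum .repo file.
# Hypothetical helper; writes the result to stdout, leaving the file untouched.
set_repo_enabled() {
    local file="$1" section="$2" value="$3"
    awk -v sec="[$section]" -v val="$value" '
        /^\[/ { in_sec = ($0 == sec) }   # track which stanza we are in
        in_sec && /^enabled[ \t]*=/ { print "enabled=" val; next }
        { print }
    ' "$file"
}
```

For example, set_repo_enabled /etc/yum.repos.d/zfs.repo zfs-kmod 1 > /tmp/zfs.repo.new lets you diff the result against the original before copying it into place.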

@jonathandgough

The problem is it's not finding

zavl.ko
znvpair.ko
zunicode.ko
zcommon.ko
zfs.ko
zpios.ko

From what I can surmise, when I was trying to clean things up I deleted these from the old location, and when it is re-installing (and uninstalling) it's looking for them in the old location. The problem is they are gone...

I tried using

dkms install spl/0.6.5.2 -k 3.10.0-229.14.1.el7.x86_64 --force
dkms install zfs/0.6.5.2 -k 3.10.0-229.14.1.el7.x86_64 --force

but that didn't work...

now trying to force install zfs on 3.10.0-229.1.2.el7.x86_64 but i'm getting the same errors.

@behlendorf behlendorf removed this from the 0.6.5.4 milestone Dec 30, 2015
@skorgu

skorgu commented Jan 18, 2016

I threw this into /etc/cron.daily/zfs-update-dkms on CentOS 7 (7.2.1511). Obviously if you reboot immediately after updating kernels this won't help unless you remember to run it manually.

#!/bin/bash
set -eu
# Set this to something high to get modules built for all of your kernels 
# or 1 to just build for the most recent.
N_KERNELS=10

SPL_VER=$(rpm -qa --qf "%{VERSION}\n"  spl-dkms | sort -V | tail -n1)
ZFS_VER=$(rpm -qa --qf "%{VERSION}\n"  zfs-dkms | sort -V | tail -n1)
for KERNEL_VER in $(rpm -qa --qf "%{VERSION}-%{RELEASE}.%{ARCH}\n"  kernel | sort -V | tail -n${N_KERNELS})
do
  dkms install -q spl/${SPL_VER} -k ${KERNEL_VER}
  dkms install -q zfs/${ZFS_VER} -k ${KERNEL_VER}
done

@tpdownes

Using some tips above, this is how I approach a yum upgrade that includes both ZFS and kernel packages:

yum upgrade kernel*
# yes literally delete all the modules. They're in RAM don't worry.
find /lib/modules -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
yum upgrade -y

Something like it might work well in concert with your crontab. I suppose whether or not there are ZFS upgrades, you would need to manually run your script after the commands I list above.

@tlvu

tlvu commented Jan 21, 2016

On a vanilla new install of centos 7.0-1406 (3.10.0-123.el7.x86_64), sudo yum install kernel-devel zfs doesn't even work (sudo modprobe zfs => modprobe: FATAL: Module zfs not found.)

[lvu@c7zfs ~]$ rpm -qa |grep -e zfs -e dkms -e spl
libzfs2-0.6.5.4-1.el7.centos.x86_64
spl-dkms-0.6.5.4-1.el7.centos.noarch
spl-0.6.5.4-1.el7.centos.x86_64
zfs-release-1-2.el7.centos.noarch
dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch
zfs-dkms-0.6.5.4-1.el7.centos.noarch
zfs-0.6.5.4-1.el7.centos.x86_64
[lvu@c7zfs ~]$ uname -a
Linux c7zfs 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[lvu@c7zfs ~]$

I am surprised ! I've been using zfs with centos 6 for almost 2 years without any problems.

Would like to promote ZFS strongly in my company but this does not look "enterprise solid" if clean install does not even work, let alone upgrade.

@dmaziuk
Author

dmaziuk commented Jan 21, 2016

FWIW I've been using zfs-kmod since October on several centos 7 hosts. Two of them have problems with centos kernel builds and had to be locked into -229 builds -- both are old (3+ years) AMD SuperMicros. Another one got bit by #3708 when adding ~30th disk to the pool. The rest have no issues.

Since redhat's already invested a lot into btr, zfs support in fedora/rhel and derivatives will likely never get to what you'd call "enterprise solid" level.

@tlvu

tlvu commented Jan 21, 2016

On a vanilla new install of centos 7.0-1406 (3.10.0-123.el7.x86_64), sudo yum install kmod-zfs (with the kmod yum repo file posted by @behlendorf) results in a bunch of depmod: WARNING: /lib/modules/3.10.0-123.el7.x86_64/weak-updates/spl/splat/splat.ko needs unknown symbol taskq_empty_ent, so zfs is not working either (sudo modprobe zfs => modprobe: FATAL: Module zfs not found.).

However, a nice surprise: after sudo yum update to the latest kernel, zfs works (sudo modprobe zfs works, did not try to create a pool yet).

[lvu@c7zfs ~]$ rpm -qa |grep -e zfs -e kmod -e spl
libzfs2-0.6.5.3-1.el7.centos.x86_64
kmod-20-5.el7.x86_64
zfs-release-1-2.el7.centos.noarch
kmod-spl-0.6.5.3-1.el7.centos.x86_64
kmod-zfs-0.6.5.3-1.el7.centos.x86_64
spl-0.6.5.3-1.el7.centos.x86_64
zfs-0.6.5.3-1.el7.centos.x86_64
kmod-libs-20-5.el7.x86_64
[lvu@c7zfs ~]$ uname -a
Linux c7zfs 3.10.0-327.4.4.el7.x86_64 #1 SMP Tue Jan 5 16:07:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Posted here as a reference that this specific combination of packages/versions seems to work together.

@tlvu

tlvu commented Jan 21, 2016

In the previous comment I installed kmod-zfs on the old kernel, then upgraded the kernel. This time I upgraded the kernel first, then installed kmod-zfs.

Happy to report that kmod-zfs clean install with latest kernel 3.10.0-123.el7.x86_64 also works (sudo modprobe zfs works, did not try to create a pool yet).

So I am ditching DKMS for KMOD now.

@Merlin83b

Just a "me too", but after a yum upgrade and with no system-critical filesystems on ZFS, running

dkms install spl/0.6.5.4 -k 3.10.0-327.10.1.el7.x86_64
dkms install zfs/0.6.5.4 -k 3.10.0-327.10.1.el7.x86_64
systemctl restart systemd-modules-load.service
systemctl start zfs-import-cache.service zfs-mount.service zed.service

Had everything up and running.

@FireDrunk
Contributor

aaaand again.... Upgrading to kernel 4.4.5-301 in FC23 got me a failing ZFS system (it wasn't automatically upgraded).
After doing a dkms install spl/0.6.5.5 and dkms install zfs/0.6.5.5 everything worked again....

@dominic-p

dominic-p commented Oct 21, 2016

Just wanted to share my process here in case it helps anyone else. I read through this and a couple other threads and came away with the following to get DKMS working again after a kernel update.

# dkms status

You should see a list of spl and zfs kernel modules compiled for each kernel. The important info here is the version number. At the time of this writing it was: 0.6.5.8

Remove the DKMS kernel modules

# dkms remove -m zfs -v 0.6.5.8 --all
# dkms remove -m spl -v 0.6.5.8 --all

Clean out the other ZFS related modules. Note that this is dangerous if you have other kernel modules, so double check these directories. When I ran it I had the following (all ZFS related):

# ls /lib/modules/<kernel>/extra/
splat.ko  zavl.ko     zfs.ko      zpios.ko
spl.ko    zcommon.ko  znvpair.ko  zunicode.ko

# rm -fr /lib/modules/*/extra/*
# rm -fr /lib/modules/*/weak-updates/*
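
If other out-of-tree modules share those directories, a narrower cleanup is possible. The sketch below deletes only the eight SPL/ZFS module files named in this thread; remove_zfs_modules is a hypothetical helper, and it assumes the file names are uncompressed .ko files as in the listing above.

```shell
# Remove only the SPL/ZFS module files named in this thread, leaving
# anything else under extra/ and weak-updates/ alone.
# Hypothetical helper; pass a different root to try it on a copy first.
remove_zfs_modules() {
    local root="${1:-/lib/modules}"
    find "$root" \( -path '*/extra/*' -o -path '*/weak-updates/*' \) \
        \( -name spl.ko -o -name splat.ko -o -name zavl.ko -o -name zcommon.ko \
           -o -name zfs.ko -o -name znvpair.ko -o -name zpios.ko -o -name zunicode.ko \) \
        -print -delete
}
```

Each deleted path is printed, so the output doubles as a record of what was removed.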

Reinstall ZFS and SPL

# yum reinstall spl
# yum reinstall zfs

Re-add and reinstall the DKMS modules (note the version number from above)

# dkms add -m spl -v 0.6.5.8
# dkms add -m zfs -v 0.6.5.8
# dkms install -m spl -v 0.6.5.8
# dkms install -m zfs -v 0.6.5.8

After that, you will need to load the modules

# /sbin/modprobe spl
# /sbin/modprobe zfs

Now, you should be able to run:

# zpool status

@greggwon

greggwon commented Dec 4, 2016

Since all of the above commands work in an RPM, why is it that this still matters to this day? Why don't we either do these things in the RPMs after detecting dkms is present, or just stop having two different ways of managing updates because of "kernel modules"? What's the missing technology here to make this a seamless detail for users? It seems really odd that all of this manual process exists in this day and age. These utility applications can be made smarter and if needed more meta-data made available to them so that they can work seamlessly with the users environment.

Just to clarify this more, the kernel install RPM should provide some sort of trigger that DKMS and other kernel sensitive components can use. It should be possible to pre-build new modules for new kernels before having to reboot. In the ZFS world, new kernel installs could even be done with new snapshot preserved configurations so that the user can always know what goes together and have everything preserved.

There is just so much that ZFS or any other transactional filesystem can provide to the management of the operating system update mechanism. If we could just get ZFS fixed to deal with full file systems completely without locking up, that would be even better.

@FireDrunk
Contributor

We might be able to trigger these rebuilds on startup, with some kind of systemd service which verifies whether it's required to rebuild the module for the current kernel.
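
The verification step of such a service could be as small as the sketch below. It reads dkms status text on stdin; the helper name dkms_reports_installed is hypothetical, and the line format it expects (module name first, kernel version somewhere on the line, the word "installed" as the state) is an assumption that should be checked against the output of your dkms version.

```shell
# Read `dkms status` text on stdin and succeed if <module> is reported as
# installed for <kernel>. The expected line format is an assumption: the
# line starts with the module name, mentions the kernel, and says "installed".
dkms_reports_installed() {
    local module="$1" kernel="$2"
    grep -E "^${module}[,/]" | grep -F -- "$kernel" | grep -q 'installed'
}
```

A startup script could then run dkms status | dkms_reports_installed zfs "$(uname -r)" and only fall back to dkms install when that check fails.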

@bubbagump210

I upgraded from CentOS 7.2 to 7.3 which broke all of ZFS. For what it is worth, the dominic-p mention works perfectly.

@dominic-p

I'm glad to hear the notes worked for you. In case it will help anyone, I put together a simple script to automate the rebuild process after a kernel update. Use it with caution! It runs some commands (e.g. rm -fr /lib/modules/*/extra/*) that are dangerous if you have non-ZFS-related DKMS modules installed.

@jasker5183

Thanks dominic-p, I will definitely try that on the next update. Would I use this before or after rebooting? Or would it matter? It would be nice to have a solution where a zfs/spl upgrade and/or a kernel update would just rebuild the module. Wouldn't the simplest way be to kill/disable weak updates?

@dominic-p

This procedure is meant to be run after reboot (once the new kernel is running). I honestly don't know very much about DKMS, so I'm not sure if disabling weak updates would solve the problem. If you make any progress on that, please post back.

@jasker5183

Going to try SIGBUS's solution from this post:

https://www.centos.org/forums/viewtopic.php?t=56627

brianjmurrell pushed a commit to whamcloud/iml-agent that referenced this issue Nov 16, 2017
openzfs/zfs#3801 describes an issue with
the spl/zfs dkms build which means that spl/zfs may not always be
installed.

A suggested workaround is to

 dkms install spl/0.6.5.1
 dkms install zfs/0.6.5.1

to force the install and so this solution (sic) is to dkms install
whenever the agent restarts which will always include after a reboot
for a kernel update. This is generally a null event but occasionally
may cause the right thing to happen.

 - Create a method by which functions can signify they must be
   called when the agent starts. Use a decorator called
   agent_daemon_startup_function to add the function to a list which
   the daemon can call.
 - Add some tests for the new functionality.

Change-Id: Id7d189736a451292e7e2a688a8d46b4775666025
Signed-off-by: Chris Gearing <chris.gearing@intel.com>
Reviewed-on: http://review.whamcloud.com/23278
Tested-by: Jenkins
Reviewed-by: Kelsey Prantis <kelsey.prantis@intel.com>
Tested-by: Chroma Test User
Reviewed-by: Tom Nabarro <tom.nabarro@intel.com>
brianjmurrell pushed a commit to whamcloud/iml-agent that referenced this issue Nov 16, 2017
The initialise_driver method which is called at ManagedNode install time but
also every time the agent starts carries out actions even in monitored mode.
The actions are to create the hostid file for zfs (if it does not exist) and
issue dkms commands to work around
openzfs/zfs#3801
namely it does dkms install for zfs/spl packages that are installed (install
does the actual build).

These actions should only occur in the managed mode case and not in
the monitor mode case.

 - Make initialize_driver take managed_mode as a parameter
 - In BlockDeviceZFS initialize_driver do nothing if monitored mode.
 - Provide a safe way of detecting that managed mode is configured. This safe
   way should default to not managed in the case of an unknown state
 - Use this safe method to detect managed mode and for completeness make
   code that detected directly use the new provider

Change-Id: I7e30811bcb8a12d05d48cb55d95c56e6d17d10bd
Signed-off-by: Chris Gearing <chris.gearing@intel.com>
Reviewed-on: http://review.whamcloud.com/23669
Tested-by: Jenkins
Tested-by: Chroma Test User
Reviewed-by: Tom Nabarro <tom.nabarro@intel.com>
Reviewed-by: Kelsey Prantis <kelsey.prantis@intel.com>
@remyd1

remyd1 commented Nov 22, 2017

I had the same issue with 0.6.5.9

None of the solutions listed above actually work. The kmod-zfs repository is not working with my kernel. The repomd.xml file is not valid.

When running dkms install, the build step for zfs failed.

dkms install -m zfs -v 0.6.5.8 -k `uname -r`

[...]
make -j24 KERNELRELEASE=3.10.0-693.5.2.el7.x86_64...(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.10.0-693.5.2.el7.x86_64 (x86_64)
Consult /var/lib/dkms/zfs/0.6.5.9/build/make.log for more information.

My kernel is 3.10.0-693.5.2.el7.x86_64

My log output is here

Any help would be useful...

Best regards,
Rémy

edit: it finally works after removing every kernel module with dkms and then removing dkms, spl and all zfs packages. Then reboot, and go here to reinstall it as if you were doing it from scratch using kmod.

@greggwon

Again, what set of dependency checks and technology keeps this from just being implemented to work correctly, for DKMS or not? This really feels like something a freshman student at university threw together for some friends. It should be a professionally behaving installation that just works. Every variation and detail discussed on this page seems like something readily handled by an RPM or at least a shell script.

Is there really just no experience here with automation of things like this?

@behlendorf
Contributor

Closing. This issue is being resolved in 0.8 by moving the spl source into the zfs repository to eliminate the dependency.
