config/kernel: enforce kernel max version, with escape hatch #15986

Draft
wants to merge 1 commit into base: master


robn
Contributor

robn commented Mar 12, 2024

Motivation and Context

It's possible for OpenZFS to build correctly against a kernel newer than the newest one it supports, but then not work correctly. This invariably results in disappointment, confusion and/or anger. See #15930/#15931 for a recent example.

Since it's not feasible for us to match Linux's release frequency, the next best thing seems to be to warn the user that they're entering the Nightmare Realm, so they aren't surprised when the wolves get them.

Description

Check the kernel version we're configuring against and bail out if the kernel is too new.

Sometimes, however, we do actually want to compile against a newer kernel than is supported, usually when testing a pre-release kernel. Add the deliberately-verbose --disable-supported-linux-version-check option to disable this check. This is a lot to type, so it can hopefully be taken as a very explicit signal that the user knows what they're doing.

Finally, if the option is used to build against an unsupported kernel, a big warning message is displayed at the end of the configure run to really try to make the point.
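A rough model of the decision described above, written as a Python sketch rather than the autoconf/shell code the patch actually adds to configure. The `Outcome` enum and `kernel_check` function are hypothetical names, and versions are assumed to be already reduced to (major, minor) tuples:

```python
from enum import Enum
from typing import Tuple

# (major, minor), e.g. (6, 7); assumed already parsed from the version string
Version = Tuple[int, int]


class Outcome(Enum):
    OK = "ok"        # kernel is within the supported range
    ERROR = "error"  # kernel is too new and no override was given: bail out
    WARN = "warn"    # kernel is too new, but the user opted in: warn loudly


def kernel_check(kernel: Version, maximum: Version, override: bool) -> Outcome:
    """Hypothetical model of the configure-time version gate."""
    if kernel <= maximum:
        return Outcome.OK
    # Newer than the declared maximum: only the explicit override lets it through.
    return Outcome.WARN if override else Outcome.ERROR
```

Tuple comparison does the ordering, so `(5, 10) <= (6, 7)` holds while `(6, 8) <= (6, 7)` does not. The override only affects the too-new case; a supported kernel builds without the warning either way.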

How Has This Been Tested?

Configuring as normal against a kernel in the supported range does what it always has:

...
checking kernel source version... 5.10.170
checking for kernel config option compatibility... done
...
...
checking kernel source version... 6.7.9
checking for kernel config option compatibility... done
...

Configuring against a kernel version higher than the max supported throws an error:

checking kernel source version... 6.8.0-rc3
configure: error:
	*** Cannot build against kernel version 6.8.0-rc3.
	*** The maximum supported kernel version is 6.7.
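Why does 6.8.0-rc3 trip the check when 6.7.9 passed above? Presumably only the major.minor part is compared against the maximum. A Python sketch under that assumption; `major_minor` and `too_new` are hypothetical helpers, not part of the patch:

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce a kernel version string like '6.8.0-rc3' to (6, 8).

    Assumption: patch level and suffixes (-rc3, -arch1-1) are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def too_new(version: str, maximum: str) -> bool:
    """True if `version` is newer than the supported `maximum` (e.g. '6.7')."""
    return major_minor(version) > major_minor(maximum)
```

Comparing numerically matters here: a plain string comparison would sort "6.10" before "6.7".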

Overriding allows it to continue:

$ ./configure ... --disable-supported-linux-version-check
...
checking kernel source version... 6.8.0-rc3
checking for kernel config option compatibility... done
...

Should configure succeed after overriding the check (as currently happens on 6.8), it throws a big warning at the end:

...
config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing po-directories commands
configure: WARNING:

	You are building OpenZFS against Linux version 6.8.0-rc3.

	This combination IS NOT SUPPORTED by the OpenZFS project. Even if it
	appears to build and run correctly, there may be bugs that can cause
	SERIOUS DATA LOSS.

	YOU HAVE BEEN WARNED!

	If you choose to continue, we'd appreciate if you could report your
	results on the OpenZFS issue tracker at:

	   https://github.com/openzfs/zfs/issues/new

	Your feedback will help us prepare a new OpenZFS release that supports
	this version of Linux.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

It's possible for OpenZFS to build correctly against a kernel newer than
the newest one it supports, but then not work correctly. This invariably
results in disappointment, confusion and/or anger.

Sometimes however we do actually want to compile against a newer kernel
than is supported, usually when testing a pre-release kernel. Add the
deliberately-verbose `--disable-supported-linux-version-check` option to
disable this check. This is a lot to type, so it can hopefully be taken
as a very explicit signal that the user knows what they're doing.

Finally, if the option is used to build against an unsupported kernel, a
big warning message is displayed at the end of the configure run to really
try to make the point.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
@satmandu
Contributor

As the reporter for #15930, I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report.

As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error-out condition that I then have to patch around for testing.

Having said that, it would be nice to have a 2.2.x proposed branch with backported patches known to be needed for new kernel support merged in (even if a release isn't ready to be tagged), for those of us who are actively testing new kernel support.

@darkbasic

darkbasic commented Mar 12, 2024

There is zfs-2.2.4-staging, but it's a bit annoying because you will have to change the branch each and every time a new minor version gets released. It would be nice to have a zfs-2.2.x-staging or a zfs-stable-staging branch to follow.

@robn
Contributor Author

robn commented Mar 12, 2024

As the reporter for #15930, I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report.

As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error-out condition that I then have to patch around for testing.

I'm confused; that's literally what the --disable-supported-linux-version-check option is for. Were you thinking of something different?

@satmandu
Contributor

There is zfs-2.2.4-staging, but it's a bit annoying because you will have to change the branch each and every time a new point version gets released. It would be nice to have a zfs-stable-staging branch to follow.

Yes but I would note that https://github.com/openzfs/zfs/tree/zfs-2.2.4-staging does not yet have #15931 backported.

I've been using #15931 in my own 2.2.3-based PPA, and do use it with kernel 6.8.0, but I label that as experimental on purpose.

As I understand it, the current OpenZFS workflow is to add commits to the staging branch at the point when it is decided that it is probably time to tag a release.

I have no problems with that, but it does mean that someone wanting to use a newer kernel has to keep on top of the PRs (submitted and/or accepted) to figure out what patches might need to be applied to get that additional kernel support working properly.

Would it be nice to have some subset of OpenZFS tagged releases follow the kernel release cycle? Sure. But I'm not funding development of this well-honed software project, so I don't get a say in that.

@rincebrain
Contributor

I would also remark that if people expect this warning to tell them when something is a bad idea, they may be burned by assuming the inverse: that OpenZFS is known to work wherever the warning doesn't come up. Then someone cherry-picks a breaking change into a Linux LTS kernel, or a distro cherry-picks something from the future, and people are very surprised indeed.

@darkbasic

@satmandu that's because it still hasn't been merged into master either. Once it lands in master it will get backported. I don't know why it's taking so long. I usually look for commits to backport into my own branch, where I add compatibility patches for newer kernels, but I missed that one; I will start looking at open PRs as well.

@satmandu
Contributor

As the reporter for #15930, I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report.

As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error-out condition that I then have to patch around for testing.

I'm confused; that's literally what the --disable-supported-linux-version-check option is for. Were you thinking of something different?

My apologies for being obtuse. I just meant that in my PPA I would have to add a patch to apply the --disable-supported-linux-version-check flag in dkms. It's not a big deal on my end, and I agree with the intention of this PR!

@darkbasic

How is this patch supposed to work? I've asked the Arch Linux maintainer to apply it to their dkms but I've seen reports that it doesn't work.

behlendorf added the Status: Code Review Needed (Ready for review and testing) label Mar 21, 2024
@robn
Contributor Author

robn commented Mar 27, 2024

@darkbasic

How is this patch supposed to work?

If configure detects you trying to build against a kernel newer than the Linux-Maximum version declared in META, it will abort with an error.

If you set --disable-supported-linux-version-check, then it will reduce that to emitting a warning.
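To make "the Linux-Maximum version declared in META" concrete, here is a Python sketch that reads `Key: value` lines from a META-style file. The two-line sample is an illustrative excerpt (only the `Linux-Maximum: 6.7` value comes from this thread), and `read_meta` is a hypothetical helper; the patch itself does this inside configure.

```python
from typing import Dict


def read_meta(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines; lines without a colon are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Illustrative excerpt only; the real META file carries more fields.
SAMPLE_META = """\
Name:          zfs
Linux-Maximum: 6.7
"""

meta = read_meta(SAMPLE_META)
print(meta["Linux-Maximum"])  # prints 6.7
```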

I've asked the Arch Linux maintainer to apply it to their dkms but I've seen reports that it doesn't work.

"It doesn't work" needs details, preferably a build log.

@robn
Contributor Author

robn commented Mar 27, 2024

I will say, if we do land this patch and vendor distributions simply add --disable-supported-linux-version-check and move on with life, I will be far more inclined to either ignore bug reports on unsupported kernels or require a lot more work from the bug submitter (at least, reconfirming on a supported Linux version).

I definitely do want to know about breakages specifically because of changes in newer kernel versions, but I am far less interested in standard OpenZFS operational issues on unsupported configurations. If downstreams are going to actively ignore recommendations, then they need to do more to support their users directly, or assist them when they turn up here.

I get that some distributions' whole purpose is to ship bleeding-edge everything, but that must be balanced by at least making it clear that OpenZFS' traditional stability and reliability guarantees may not hold in those situations. Quietly hiding this fact is just dishonest. At the very least, I would like those distributions to inform their users of this by some other mechanism, if showing output from OpenZFS configure is not possible or appropriate.

If that's a non-starter, I'm happy to take suggestions on alternate methods. I don't want to be a dick about it, but it's already incredibly difficult to support the range of kernels we do. Distributors adding even more combinations without also getting involved in their support and upkeep just seems rude.

robn mentioned this pull request Mar 27, 2024
@rrevans
Contributor

rrevans commented Mar 27, 2024

The supported Linux version range today is the fully supported range.

Is there value in having this flag only enable a list of versions that are known to be stable enough for bleeding edge testing?

As a disincentive to shipping this to unsuspecting users, this flag could also imply debug build and/or emit warnings to kmsg on import and when invoking tools.

@darkbasic

If configure detects you trying to build against a kernel newer than the Linux-Maximum version declared in META, it will abort with an error.

Ok, just wanted to be sure that the check aborts and doesn't simply show a warning.

"It doesn't work" needs details, preferably a build log.

They didn't provide any. I've just checked myself, and it looks like it's working as expected:

(2/3) Install DKMS modules
==> dkms install --no-depmod zfs/2.2.3.r1.g58211157bf -k 6.8.1-arch1-1
configure: error: 
	*** Cannot build against kernel version 6.8.1-arch1-1.
	*** The maximum supported kernel version is 6.7.
			
Error! Bad return status for module build on kernel: 6.8.1-arch1-1 (x86_64)
Consult /var/lib/dkms/zfs/2.2.3.r1.g58211157bf/build/make.log for more information.
==> WARNING: `dkms install --no-depmod zfs/2.2.3.r1.g58211157bf -k 6.8.1-arch1-1' exited 10

They probably filed the report against the wrong package and the one they're using didn't have this patch backported.

@robn
Contributor Author

robn commented Mar 27, 2024

@darkbasic I'm confused. That build does have this patch - they've taken the patch, and then not used the option? I don't even know why you would.

@darkbasic

@darkbasic I'm confused. That build does have this patch - they've taken the patch, and then not used the option? I don't even know why you would.

That log is from my build, which does indeed have the patch. What happened is that a user commented on the AUR about being able to successfully build the dkms against 6.8. The package on the AUR has backported the patch, so I started wondering whether this PR is supposed to fail outright or just add a warning. I then tried it myself, and it does indeed do the former. My guess is that the user uses the zfs Arch repository, which, unlike the AUR package, probably didn't backport this PR. Why they commented on the AUR remains a mystery.

@robn
Contributor Author

robn commented Mar 27, 2024

Ok, I'm sensing confusion about the intent of this change. Maybe that's me confused, or maybe I'm doing fine but haven't explained it properly. So I'll try explaining again, and if we're still no good, I'll let someone point out that I'm confused, and then I'll quietly withdraw into the hedges.

We publish a "maximum supported" kernel version in META. Currently, that says 6.7. However, we do nothing to enforce this.

In 2.2.3 we shipped experimental support for 6.8. This was buggy/incomplete. People tried it, noticed problems, reported them. That's great! But, there was also a certain amount of pressure in those requests (and elsewhere) to have it fixed quickly, which I felt was unreasonable.

I wondered if maybe the problem was that it wasn't easily discoverable that kernels beyond 6.7 were unsupported/experimental, hence this patch: a message to let you know, and a way to explicitly opt in to potential carnage.

In my opinion, if a vendor distributes OpenZFS for a kernel with higher version than the maximum version listed in META, then that vendor is explicitly opting their users into an experimental/unsupported configuration, and if that breaks (incl. data loss), the responsibility is mostly on the vendor, not on the OpenZFS project itself. This patch doesn't change that theory, but would at least make it very clear that the vendor is making that decision also.

(not that I don't want to hear about such breakage; but I don't want an irate end-user blaming us for shipping broken software when we didn't, at least not knowingly).

The alternatives to this seem to be either never putting experimental patches anywhere near a release series (even with warnings), or making experimental builds available ahead of time. Builds are probably ideal but require time and infrastructure we mostly don't have, so shipping experimental support with warnings on it is at least a straightforward way to get it into people's hands.

I dunno, this felt like a light touch :)

@robn
Contributor Author

robn commented Mar 27, 2024

@rrevans

Is there value in having this flag only enable a list of versions that are known to be stable enough for bleeding edge testing?

Maybe, except most of the time OpenZFS won't even compile against a new kernel version, due to the kernel's perpetual API churn. So this was mostly intended as a gate for when we do ship early support but don't yet know if it's complete.

Maybe instead META could have Linux-Maximum-Experimental: 6.8, and you have to --enable-linux-experimental to build up to that, and beyond that is just a hard rejection.
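A rough Python sketch of the three-tier scheme proposed here, to show where each boundary falls. `Linux-Maximum-Experimental` and `--enable-linux-experimental` are only proposals in this thread; `classify` and its return strings are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor)


def classify(kernel: Version, maximum: Version,
             maximum_experimental: Version,
             enable_experimental: bool) -> str:
    """Sketch of the proposed policy.

    - up to Linux-Maximum: supported, always builds
    - up to Linux-Maximum-Experimental: builds only with the opt-in flag
    - beyond that: hard rejection, with no override at all
    """
    if kernel <= maximum:
        return "supported"
    if kernel <= maximum_experimental:
        return "experimental" if enable_experimental else "error: opt-in required"
    return "error: unsupported"
```

Unlike a single escape hatch, the opt-in flag here can never reach past the experimental ceiling.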

As a disincentive to shipping this to unsuspecting users, this flag could also imply debug build and/or emit warnings to kmsg on import and when invoking tools.

A warning to the kernel log seems reasonable and benign. I like the idea of building with debug, though I wonder if it's too heavy-handed so long as failed assertions panic the whole module. I'm pretty sure I don't want that if vendors are going to be quietly opting users into this option; that's very much a "you are definitely testing now" and I don't think that's entirely fair to spring on people unknowingly. On the other hand, maybe a vendor is going to be far less inclined to enable this option if the result is "kernel crashes" vs "mild inconvenience".

@rrevans
Contributor

rrevans commented Mar 28, 2024

Maybe instead META could have Linux-Maximum-Experimental: 6.8, and you have to --enable-linux-experimental to build up to that, and beyond that is just a hard rejection.

Thanks for explaining. Yes this is exactly the sort of idea. In that approach it is nice and clear where supported ends and experimental starts, and it's also opt-in only.

I like the idea of building with debug, though I wonder if it's too heavy-handed so long as failed assertions panic the whole module. I'm pretty sure I don't want that if vendors are going to be quietly opting users into this option; that's very much a "you are definitely testing now" and I don't think that's entirely fair to spring on people unknowingly. On the other hand, maybe a vendor is going to be far less inclined to enable this option if the result is "kernel crashes" vs "mild inconvenience".

What do you think about a middle of the road option where assertions are built and executed but print warnings instead?

The outcome is then identical for the user - corruption maybe or other unknown problems the assertions are intended to catch - but they get an actionable report to file if desired.
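A Python sketch of the "warn instead of panic" idea, purely to illustrate the control flow. Real OpenZFS assertions are C macros in the kernel module; `check`, `AssertionFailure` and the `warn_only` switch here are hypothetical.

```python
import logging

log = logging.getLogger("assert-sketch")


class AssertionFailure(Exception):
    """Stands in for a debug-build assertion taking the module down."""


def check(cond: bool, msg: str, warn_only: bool) -> bool:
    """Hypothetical assertion helper with a 'warn instead of panic' mode.

    warn_only=False: a failed check raises (the heavy-handed debug behaviour).
    warn_only=True:  a failed check is logged and execution continues, so the
                     user still gets a message they can attach to a bug report.
    Returns whether the condition held.
    """
    if cond:
        return True
    if not warn_only:
        raise AssertionFailure(msg)
    log.warning("assertion failed: %s", msg)
    return False
```

In both modes the check is evaluated; only the consequence of a failure differs, which is the trade-off discussed above.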

@robn
Contributor Author

robn commented Mar 28, 2024

Thanks for explaining. Yes this is exactly the sort of idea. In that approach it is nice and clear where supported ends and experimental starts, and it's also opt-in only.

I'm persuaded. It's not miles away from this PR in function, but it feels a lot more solid.

What do you think about a middle of the road option where assertions are built and executed but print warnings instead?

I... rather like this. I'll have a play with it.

robn marked this pull request as draft March 28, 2024 09:33
@Gendra13

What I am wondering at this point:
How much are the distribution vendors even aware of the fact that there is a maximum supported kernel version and that there might be still serious issues left, even when the building process itself completes successfully?

As a current example:
The next Ubuntu LTS release, 24.04, is set to ship with kernel 6.8 and includes zfs-2.2.2. Even though they cherry-picked an arbitrary selection of 6.8-compat patches from the early 2.2.3-staging branch to get the kernel module building, there are still a lot of problems left, including #15930.
But since the module builds correctly, it is deemed to be fine and is about to be shipped with the next LTS release in April (and that’s not a rolling release, where I might expect some hiccups, but an LTS version, where I would expect extra stability).

@tonyhutter
Contributor

tonyhutter commented Mar 28, 2024

@robn I'm fine with your overall approach. You might want to add another line to give the user a hint that --disable-supported-linux-version-check is available, like:

checking kernel source version... 6.8.0-rc3
configure: error:
	*** Cannot build against kernel version 6.8.0-rc3.
	*** The maximum supported kernel version is 6.7.
	*** Use --disable-supported-linux-version-check to bypass this check.
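The suggestion amounts to appending one line to the error text. A Python sketch that assembles the message, only to show its shape; the patch itself emits this from configure, and `too_new_error` is a hypothetical name.

```python
def too_new_error(kernel: str, maximum: str, hint: bool = True) -> str:
    """Build the tab-indented '***' error lines, optionally with the bypass hint."""
    lines = [
        f"*** Cannot build against kernel version {kernel}.",
        f"*** The maximum supported kernel version is {maximum}.",
    ]
    if hint:
        # The extra line proposed above, pointing at the escape hatch.
        lines.append("*** Use --disable-supported-linux-version-check to bypass this check.")
    return "\n".join("\t" + line for line in lines)


print(too_new_error("6.8.0-rc3", "6.7"))
```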

Also, is this still "Draft" or are you ready for it to be approved?

@robn
Contributor Author

robn commented Mar 28, 2024

@Gendra13

How much are the distribution vendors even aware of the fact that there is a maximum supported kernel version and that there might be still serious issues left, even when the building process itself completes successfully?

Yeah, I have no idea. That's part of why I want this!

But since the module builds correctly, it is deemed to be fine and is about to be shipped with the next LTS release in April (and that’s not a rolling release, where I might expect some hiccups, but an LTS version, where I would expect extra stability).

Yep, I'd hope they're paying attention.

@robn
Contributor Author

robn commented Mar 28, 2024

@tonyhutter

Also, is this still "Draft" or are you ready for it to be approved?

Hold off for the moment, I'm going to try a different approach first and see if it feels nicer.

@ahesford
Contributor

ahesford commented Apr 10, 2024

While I hate that this is necessary, I think it's a good idea. ZFS support is critical to Void Linux, and we hold off bumping our generic linux meta-package (which pulls in the [approximately] most recent stable kernel series) until ZFS and other important out-of-tree modules work with that series. We try to be conservative in backporting patches, and the risk of unexpected incompatibilities is further mitigated by offering users a choice between several active kernel series as well as a zfs-lts package. Nevertheless, having an additional sanity check on compatibility will help us avoid any major missteps.

Labels
Status: Code Review Needed Ready for review and testing

9 participants