Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zpool cache not respected by opnsense #7553

Closed
2 tasks done
nt1 opened this issue Jun 23, 2024 · 13 comments
Closed
2 tasks done

zpool cache not respected by opnsense #7553

nt1 opened this issue Jun 23, 2024 · 13 comments
Assignees
Labels
bug Production bug
Milestone

Comments

@nt1
Copy link

nt1 commented Jun 23, 2024

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

zpools other than zroot present in zpool cache (/etc/zfs/zpool.cache or /boot/zfs/zpool.cache) are not imported at boot time in OPNsense 24.1.9.

To Reproduce

Steps to reproduce the behavior:

  1. Access command line of device and get a root shell.
  2. Create a new zpool.
  3. Export the zpool (zpool export )
  4. Import the zpool (zpool import )
  5. Verify the zpool exists in zpool cache (zdb -U /etc/zfs/zpool.cache)
  6. Reboot the system.
  7. When the system is back up, access the command line and check if the new zpool was imported (zpool list). Notice only zroot exists.

Expected behavior

Upon reboot, the newly created zpool was imported.

Describe alternatives you considered

It is possible to manually import the zpool using zpool import after the reboot, but this doesn't really scale.

I was able to get the pool to import as expected by adding zfs_enable="YES" to /etc/rc.conf which allowed /etc/rc.d/zfs to start at boot per expectations. I notice opnsense sets this variable in /usr/local/etc/rc.loader.d/20-zfs, perhaps this variable isn't making it to freebsd's rc.d?

Screenshots

If applicable, add screenshots to help explain your problem.

Relevant log files

If applicable, information from log files supporting your claim.

Additional context

Add any other context about the problem here.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 24.1.9

@fichtner fichtner transferred this issue from opnsense/src Jun 23, 2024
@fichtner
Copy link
Member

fichtner commented Jun 23, 2024

The code you may want to improve is this:

core/src/etc/rc

Lines 153 to 160 in 0f73da0

if kldstat -qm zfs; then
mount -uw /
zpool import -Na
zfs mount -va
# maybe there is a mountpoint in fstab
# that requires ZFS to be fully set up
mount -a
fi

Do not confuse _load with _enable vars. We don’t use the RC subsystem that much as it tends to interfere with the boot sequence.

@nt1
Copy link
Author

nt1 commented Jun 24, 2024

Adding -c CACHE_FILE to the referenced zpool import and removing changes to /etc/rc.conf fixes my issue. However, I'm fairly certain that adding additional pools worked in a previous release of OPNSense with no changes.

The behavior of zpool import -a changed when OpenZFS was adopted in FreeBSD 13. In prior releases, -a would default to searching for pools in /dev:
https://github.com/freebsd/freebsd-src/blob/b5ad6b488d9e62d820fe90fdce4aee4f4d3d7162/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c#L2634

This changes in OpenZFS, used in 13+, the environment variable ZPOOL_IMPORT_PATH is used as the search path:
https://github.com/freebsd/freebsd-src/blob/5fe9c9de03ef3191d216964bc4d8e427d5ed5720/sys/contrib/openzfs/cmd/zpool/zpool_main.c#L3492

@fichtner
Copy link
Member

Thanks for looking into this. Feel free to provide a PR, but I won't be able to review and commit in the next two weeks.

@fichtner
Copy link
Member

(I'd say that ZPOOL_IMPORT_PATH might be the better path forward as I've avoided -c in the past for portability.)

@fichtner fichtner self-assigned this Jun 24, 2024
@fichtner fichtner added the bug Production bug label Jun 24, 2024
@fichtner fichtner added this to the 24.7 milestone Jun 24, 2024
@nt1
Copy link
Author

nt1 commented Jun 24, 2024

I'll open a PR to fix this this evening. I can work around this issue for now, so no rush on the review.

@nt1
Copy link
Author

nt1 commented Jun 26, 2024

I will propose a change that imports pools from zpool cache at boot (just like vanilla FreeBSD) rather than allowing the implementation of zpool import to dictate what's imported.

Pros of using zpool cache

  1. It's explicit, fstab like behavior.
  2. Consistency across reboots, the system boots with the same pools it had at shutdown.
  3. Prevents importing potentially problematic pools that are connected to the system, but have never been successfully imported.
  4. On appliances, may help deter/prevent tampering with root fs.

Cons of using zpool cache

  1. The existence and purpose of zpool cache does not appear to be widely known or extensively documented. This may surprise some people.
  2. Boot behavior is changing. The end user may see a change in available pools after booting this release (easily resolved by importing missing pools and exporting those that shouldn't be there).
  3. The cache file should be present in every existing install. If, for any reason, the cache file is missing, the system will not boot properly.
  4. The path of the cachefile could change in future FreeBSD/OpenZFS releases (which would require a fix to rc).

@fichtner fichtner modified the milestones: 24.7, 25.1 Jul 24, 2024
@fichtner fichtner linked a pull request Aug 1, 2024 that will close this issue
@fichtner
Copy link
Member

Experiment is 701dff4, likely in 24.7.2.

@doktornotor
Copy link
Contributor

Adding here for reference - this did not go particularly well for some users (as in - unbootable box due to kernel panic) and I'm wondering in which other corner cases this will rear its ugly head.

https://forum.opnsense.org/index.php?topic=42373.msg209269#msg209269
https://forum.opnsense.org/index.php?topic=42387.msg209391#msg209391

@fichtner
Copy link
Member

I'm not sure about the second one. I'm happy to debug this. Though the peripheral nature of leading into panics is a bit astonishing. I still don't know why zpool-import -a doesn't consider looking for devices to be part of "-a". That seems like a harmful oversight.

@doktornotor
Copy link
Contributor

doktornotor commented Aug 22, 2024

Well, guess I'm out of this race. In hunt for the most vintage HW in the closet, I found this Fujitsu server but that one has ATI ES1000 32MB RAM PCI on board and no AGP slot. 😢 😭

And yeah, I'm also absolutely puzzled as for what does zpool import have to do with AGP graphics.

@fichtner
Copy link
Member

It’s a working theory. I don’t mind reverting if necessary but we also need to reproduce this remotely without fatalities. In theory we can install the 24.7.1 debug kernel in a working 24.7.1 and then do a tainted zpool import to get a vmcore….

@doktornotor
Copy link
Contributor

doktornotor commented Aug 24, 2024

Meh, OK - the AGP theory is apparently confirmed. Even set hint.agp.0.disabled=1 in loader prompt got some people back into bootable system - 1, 2

@fichtner
Copy link
Member

Good idea with the device hints helping people to overcome this manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

Successfully merging a pull request may close this issue.

3 participants