Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two bugfixes for 2.0.6 #12346

Merged
merged 6 commits into from Aug 31, 2021

Conversation

aerusso
Copy link
Contributor

@aerusso aerusso commented Jul 10, 2021

Description

This MR cherry picks

  • a81b812 Revert Consolidate arc_buf allocation checks
  • 958826b file reference counts can get corrupted

The second was not completely clean (a very minor fixup due to a631283, which change the ZFS device allocation code, making zfsdev_minor_alloc static). @grwilson, could you glance at this and make sure nothing more serious happened since 2.0.5 that would require more changes to your MR? (Obviously, this shouldn't be merged until the aforementioned MR is actually finalized) (Both commits are now been merged.)

Motivation and Context

I suspect these two patches might resolve a bug I'm experiencing since 2.0, and I'd like feedback on the cherry pick before I run it on the production machine --- the only way I have figured out how to reproduce the bug.

How Has This Been Tested?

ZTS locally (a VM). I opened this MR partly for the CI infrastructure feedback.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

aerusso and others added 2 commits July 10, 2021 07:32
This reverts commit 13fac09.

Per the discussion in openzfs#11531, the reverted commit---which intended only
to be a cleanup commit---introduced a subtle, unintended change in
behavior.

Care was taken to partially revert and then reapply 10b3c7f
which would otherwise have caused a conflict.  These changes were
squashed in to this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Suggested-by: @chrisrd
Suggested-by: robn@despairlabs.com
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes openzfs#11531
Closes openzfs#12227
Callers of zfs_file_get and zfs_file_put can corrupt the reference
counts for the file structure resulting in a panic or a soft lockup.
When zfs send/recv runs, it will add a reference count to the
open file, and begin to send or recv the stream. If the file descriptor
is closed, then when dmu_recv_stream() or dmu_send() return we will
call zfs_file_put to remove the reference we placed on the file
structure. Unfortunately, because zfs_file_put() uses the file
descriptor to lookup the file structure, it may end up finding that
the file descriptor table no longer contains the file struct, thus
leaking the file structure. Or it might end up finding a file
descriptor for a different file and blindly updating its reference
counts. Other failure modes probably exists.

This change reworks the zfs_file_[get|put] interface to not rely
on the file descriptor but instead pass the zfs_file_t pointer around.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-76119
Closes openzfs#12299
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jul 13, 2021
@aerusso aerusso changed the title [WIP] Two bugfixes for 2.0.6 Two bugfixes for 2.0.6 Jul 15, 2021
@aerusso
Copy link
Contributor Author

aerusso commented Jul 16, 2021

I'm pinging this because I'll have some time this weekend to see if this MR resolves #11688, but I'd like some expert review before running it on my desktop.

When logging TX_SETATTR, we could otherwise fail to initialize part of
the corresponding ZIL record depending on which fields are present in
the xvattr.  Initialize the creation time and the AV scan timestamp to
zero so that uninitialized bytes are not written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#12383
When allocating a record, we round up the allocation size to a multiple
of 8.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#12383
When logging a TX_WRITE record in the case where file data has to be
copied from the DMU, we pad the log record size to a multiple of 8
bytes.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#12383
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation.  In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes openzfs#12383
@tonyhutter
Copy link
Contributor

Sorry for the late response - I got assigned to look at this while I was on vacation. Can you re-submit so buildbot fires off again? Looks like a bunch of the builders never completed.

@aerusso
Copy link
Contributor Author

aerusso commented Aug 20, 2021

I pushed these again--I haven't had a chance to look at the failures.

@tonyhutter tonyhutter merged commit d591c94 into openzfs:zfs-2.0.6-staging Aug 31, 2021
@aerusso aerusso deleted the mrs/206-proposed-2fixes branch January 7, 2023 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
2.0-release
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

5 participants