Correct refcount_add in dmu_zfetch #12602

rincebrain · 2021-10-02T14:21:38Z

Motivation and Context

Description

Exploding refcount_add_many(foo, N, ...) into for (i=0; i<N; i++) { refcount_add(foo, ...); }

I'm pretty confident that's what was intended, between the NULL identifier and the lack of any refcount_remove_many calls anywhere in the patches that added it.

(In a separate PR, I've got a patch that lets you turn the panic down to a "whoa now, that's not ideal", but it's tangled up with some hacky things...)

How Has This Been Tested?

A round of zfs-tests.sh -r sanity with reference_tracking_enable=1.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

amotin · 2021-10-02T19:17:20Z

I propose we instead ignore tracking if holder == NULL. I see no point to track NULL pointers, since it won't give us anything useful.

rincebrain · 2021-10-02T19:25:24Z

Special-casing NULL doesn't seem like it adds any benefit, to me.

If you think that the refcount interface is too heavy for this, maybe it shouldn't be a refcount, and it should just be some simple atomic_adds on an int instead of a zfs_refcount_t, or an additional set of function indirections for not tracking, if people really want to use the interface.

But "using it this way panics unless you pass a magic value" seems unfortunate.

amotin · 2021-10-02T23:08:19Z

Proposed replacement of single atomic with multiple atomics in a loop does not buy anything, only wastes time. So either pass there some real pointers useful for debugging, or as you have said replace it with just atomics. Or create the refcount as zfs_refcount_create_untracked().

rincebrain · 2021-10-02T23:44:57Z

Yes, no purpose at all, other than the minor issue of not panicking.

Personally, I think changing it to refcount_create_untracked() would be reasonable, but using refcount_{add,remove}_many() like that here is a bad idea. IMO, if you have to start remembering whether refcount_create versus _untracked was used to create something to know if using it a certain way will work or panic, you should probably use a different interface.

The only place refcount_create_untracked() is currently called is in dnode.c, and there's no _many calls to be found there.

behlendorf · 2021-10-04T18:53:01Z

It seems to me the fix proposed here is right, this is how the interface was intended to be used. The last thing we want is for callers to need to care if it the refcount was created as tracked or untracked. Switching to the atomic interfaces seems like the right way to handle this if we think the added loop will measurably impact performance. But I'm fine with the change as is since it fixes the core issue.

amotin · 2021-10-04T18:58:22Z

It increases number of atomics there from one per call if any new block was prefetched to one per block prefetched. It may be not dramatic (though somebody could test this with recordsize = 4KB) and may be "right", but it is absolutely pointless with NULL as a pointer. Either pass there some useful piointer or value for it to make sense or just axe out the tracking one way or another.

behlendorf · 2021-10-06T23:04:11Z

Either passing a pointer value or switching to the atomic interface would by fine with me.

rincebrain · 2021-10-07T06:16:14Z

If I hadn't already written the patch this way initially and decided not to include it because it wasn't necessary, I'd probably have just closed the PR at this point.

"I hate how I wrote this code 7 months ago, please go rewrite every callsite with a cosmetic change to get your one-line fix in" is a great way to discourage contributors.

amotin · 2021-10-07T15:51:24Z

Rich, I appreciate your wish to fix the problem. I am just trying to get maximum of it. Don't take it wrong.

Speaking about the last revision, I am not sure what additional benefit gives passing identical zfetch_t pointer every time? In what way it should help the tracker to do helpful things? Tracker supposed to help find lost references, references dropped more than once, and in that context passing unique pointer (like pointer specific ZIO or data block) allows to find out what is that block and possibly trace its history. Passing same pointer everywhere won't protect from that. It is only a pointer to the zfetch_t, which may already be available from other places. It may be better than nothing, but is it ehough? Unfortunately to this code zfetch operates only with block numbers, not the block themselves. We could probably pass the block numbers there instead of pointer, but there could be few tricks on that route, plus it is a bit less useful for debugging. That is why I'd prefer tracking to be disabled here, or just switch to atomics. If it is too much to ask of you, I may do it myself one day.

rincebrain · 2021-10-07T16:21:10Z

I'm glad you think this should be improved.

I would suggest that a two-line fix for a panic in the code you wrote is not the correct place to plan or request that.

@behlendorf please merge this or the prior version without unnecessary changes if you like; if you'd like all the calls reconstructed, please let me know, and I'll just close the PR and keep it in my local branch so I can use the feature without a panic.

amotin · 2021-10-07T17:18:37Z

I've already told what I think about the previous patch. I'd prefer this refcount to be marked as untracked with respective comment, until it is rewritten to atomics, unless you wish to do it now. I am going on vacation in few hours, otherwise I'd just done it myself.

rincebrain · 2021-10-07T17:34:04Z

@jwk404 Unless you chose to close this in a pretty unusual way, I think that commit had the wrong Closes...

behlendorf · 2021-10-07T18:25:43Z

@rincebrain if you can restore the original patch let's proceed with that minimal fix to resolve the ASSERT when tracking is enabled. Then when @amotin returns from vacation he can propose a PR which either tracks some pointer values (which I think we all agree would be more useful) or simply switchs it all to atomics.

I've gone ahead and reopened this issue which was close due to that commit typo.

refcount_add_many(foo,N) is not the same as for (i=0; i < N; i++) { refcount_add(foo); } Unfortunately, this is only actually true with debug kernels and reference_tracking_enable=1. Signed-off-by: Rich Ercolani <rincebrain@gmail.com>

rincebrain · 2021-10-07T18:32:41Z

@rincebrain if you can restore the original patch let's proceed with that minimal fix to resolve the ASSERT when tracking is enabled. Then when @amotin returns from vacation he can propose a PR which either tracks some pointer values (which I think we all agree would be more useful) or simply switchs it all to atomics.

I've gone ahead and reopened this issue which was close due to that commit typo.

Sure, tracking a useful value would be helpful for anyone trying to use the functionality moving forward. Reworking that just seemed much more like "future work".

Pushed.

rincebrain · 2021-10-07T19:04:16Z

...that's very strange. Both the two runs of sanity that I triggered trying to push the old version of the commit without vacuously updating it (which triggered running against 15cc4ed) and against 4b99eaf all failed on the same tests in destroy (among others).

I'm reasonably confident I did not somehow break it between the first time I pushed this before and now, so...not sure why they're failing suddenly when they didn't before?

I'll go see about reproducing this on my 20.04 VM...

behlendorf · 2021-10-07T20:48:49Z

I just started seeing these same failures in other PRs so I'm pretty sure it's unrelated. But I'm not sure what changed.

rincebrain · 2021-10-07T20:53:33Z

When it passed originally, it was on the 20210929.1 snapshot. Failing now is on 20211004.1.

Perhaps something is rotten there.

edit: This run was on 20210929.1 and was the last in the long line of passing, then the very next one where it started failing was 20211004.1...and the outlier that succeeded later was also 20210929.1.

refcount_add_many(foo,N) is not the same as for (i=0; i < N; i++) { refcount_add(foo); } Unfortunately, this is only actually true with debug kernels and reference_tracking_enable=1. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes openzfs#12589 Closes openzfs#12602

For those not already familiar with the code base it can be a challenge to understand how the libraries are laid out. This has sometimes resulted in functionality being added in the wrong place. To help avoid that in the future this commit documents the high-level dependencies for easy reference in lib/Makefile.am. It also simplifies a few things. - Switched libzpool dependency on libzfs_core to libzutil. This change makes it clear libzpool should never depend on the ioctl() functionality provided by libzfs_core. - Moved zfs_ioctl_fd() from libzutil to libzfs_core and renamed it lzc_ioctl_fd(). Normal access to the kmods should all be funneled through the libzfs_core library. The sole exception is the pool_active() which was updated to not use lzc_ioctl_fd() to remove the libzfs_core dependency. - Removed libzfs_core dependency on libzutil. - Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c source file which was all dead code. - Removed libzfs_core dependency from mkbusy and ctime test utilities. It was only needed for some trivial wrapper functions and that code is easy to replicate to shed the unneeded dependency. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#12602

behlendorf approved these changes Oct 4, 2021

View reviewed changes

behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 5, 2021

ahrens assigned mmaybee Oct 6, 2021

rincebrain force-pushed the tracking branch from 306d01c to c77f362 Compare October 7, 2021 06:11

jwk404 closed this in 514498f Oct 7, 2021

behlendorf reopened this Oct 7, 2021

rincebrain force-pushed the tracking branch from c77f362 to 15cc4ed Compare October 7, 2021 18:29

Correct refcount_add in dmu_zfetch

4b99eaf

refcount_add_many(foo,N) is not the same as for (i=0; i < N; i++) { refcount_add(foo); } Unfortunately, this is only actually true with debug kernels and reference_tracking_enable=1. Signed-off-by: Rich Ercolani <rincebrain@gmail.com>

rincebrain force-pushed the tracking branch from 15cc4ed to 4b99eaf Compare October 7, 2021 18:30

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 7, 2021

behlendorf merged commit 9d1407e into openzfs:master Oct 8, 2021

This was referenced Sep 1, 2022

Simplify and document OpenZFS library dependencies iomesh/zfs#1

Closed

Simplify and document OpenZFS library dependencies iomesh/zfs#2

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correct refcount_add in dmu_zfetch #12602

Correct refcount_add in dmu_zfetch #12602

rincebrain commented Oct 2, 2021

amotin commented Oct 2, 2021

rincebrain commented Oct 2, 2021

amotin commented Oct 2, 2021

rincebrain commented Oct 2, 2021

behlendorf commented Oct 4, 2021

amotin commented Oct 4, 2021 •

edited

behlendorf commented Oct 6, 2021

rincebrain commented Oct 7, 2021

amotin commented Oct 7, 2021

rincebrain commented Oct 7, 2021

amotin commented Oct 7, 2021

rincebrain commented Oct 7, 2021

behlendorf commented Oct 7, 2021

rincebrain commented Oct 7, 2021

rincebrain commented Oct 7, 2021

behlendorf commented Oct 7, 2021 •

edited

rincebrain commented Oct 7, 2021 •

edited

Correct refcount_add in dmu_zfetch #12602

Correct refcount_add in dmu_zfetch #12602

Conversation

rincebrain commented Oct 2, 2021

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

amotin commented Oct 2, 2021

rincebrain commented Oct 2, 2021

amotin commented Oct 2, 2021

rincebrain commented Oct 2, 2021

behlendorf commented Oct 4, 2021

amotin commented Oct 4, 2021 • edited

behlendorf commented Oct 6, 2021

rincebrain commented Oct 7, 2021

amotin commented Oct 7, 2021

rincebrain commented Oct 7, 2021

amotin commented Oct 7, 2021

rincebrain commented Oct 7, 2021

behlendorf commented Oct 7, 2021

rincebrain commented Oct 7, 2021

rincebrain commented Oct 7, 2021

behlendorf commented Oct 7, 2021 • edited

rincebrain commented Oct 7, 2021 • edited

amotin commented Oct 4, 2021 •

edited

behlendorf commented Oct 7, 2021 •

edited

rincebrain commented Oct 7, 2021 •

edited