Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct refcount_add in dmu_zfetch #12602

Merged
merged 1 commit into from Oct 8, 2021
Merged

Conversation

rincebrain
Copy link
Contributor

Motivation and Context

#12589

Description

Exploding refcount_add_many(foo, N, ...) into for (i=0; i<N; i++) { refcount_add(foo, ...); }

I'm pretty confident that's what was intended, between the NULL identifier and the lack of any refcount_remove_many calls anywhere in the patches that added it.

(In a separate PR, I've got a patch that lets you turn the panic down to a "whoa now, that's not ideal", but it's tangled up with some hacky things...)

How Has This Been Tested?

A round of zfs-tests.sh -r sanity with reference_tracking_enable=1.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin
Copy link
Member

amotin commented Oct 2, 2021

I propose we instead ignore tracking if holder == NULL. I see no point to track NULL pointers, since it won't give us anything useful.

@rincebrain
Copy link
Contributor Author

Special-casing NULL doesn't seem like it adds any benefit, to me.

If you think that the refcount interface is too heavy for this, maybe it shouldn't be a refcount, and it should just be some simple atomic_adds on an int instead of a zfs_refcount_t, or an additional set of function indirections for not tracking, if people really want to use the interface.

But "using it this way panics unless you pass a magic value" seems unfortunate.

@amotin
Copy link
Member

amotin commented Oct 2, 2021

Proposed replacement of single atomic with multiple atomics in a loop does not buy anything, only wastes time. So either pass there some real pointers useful for debugging, or as you have said replace it with just atomics. Or create the refcount as zfs_refcount_create_untracked().

@rincebrain
Copy link
Contributor Author

Yes, no purpose at all, other than the minor issue of not panicking.

Personally, I think changing it to refcount_create_untracked() would be reasonable, but using refcount_{add,remove}_many() like that here is a bad idea. IMO, if you have to start remembering whether refcount_create versus _untracked was used to create something to know if using it a certain way will work or panic, you should probably use a different interface.

The only place refcount_create_untracked() is currently called is in dnode.c, and there's no _many calls to be found there.

@behlendorf
Copy link
Contributor

It seems to me the fix proposed here is right, this is how the interface was intended to be used. The last thing we want is for callers to need to care if it the refcount was created as tracked or untracked. Switching to the atomic interfaces seems like the right way to handle this if we think the added loop will measurably impact performance. But I'm fine with the change as is since it fixes the core issue.

@amotin
Copy link
Member

amotin commented Oct 4, 2021

It increases number of atomics there from one per call if any new block was prefetched to one per block prefetched. It may be not dramatic (though somebody could test this with recordsize = 4KB) and may be "right", but it is absolutely pointless with NULL as a pointer. Either pass there some useful piointer or value for it to make sense or just axe out the tracking one way or another.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 5, 2021
@behlendorf
Copy link
Contributor

Either passing a pointer value or switching to the atomic interface would by fine with me.

@rincebrain
Copy link
Contributor Author

If I hadn't already written the patch this way initially and decided not to include it because it wasn't necessary, I'd probably have just closed the PR at this point.

"I hate how I wrote this code 7 months ago, please go rewrite every callsite with a cosmetic change to get your one-line fix in" is a great way to discourage contributors.

@amotin
Copy link
Member

amotin commented Oct 7, 2021

Rich, I appreciate your wish to fix the problem. I am just trying to get maximum of it. Don't take it wrong.

Speaking about the last revision, I am not sure what additional benefit gives passing identical zfetch_t pointer every time? In what way it should help the tracker to do helpful things? Tracker supposed to help find lost references, references dropped more than once, and in that context passing unique pointer (like pointer specific ZIO or data block) allows to find out what is that block and possibly trace its history. Passing same pointer everywhere won't protect from that. It is only a pointer to the zfetch_t, which may already be available from other places. It may be better than nothing, but is it ehough? Unfortunately to this code zfetch operates only with block numbers, not the block themselves. We could probably pass the block numbers there instead of pointer, but there could be few tricks on that route, plus it is a bit less useful for debugging. That is why I'd prefer tracking to be disabled here, or just switch to atomics. If it is too much to ask of you, I may do it myself one day.

@rincebrain
Copy link
Contributor Author

I'm glad you think this should be improved.

I would suggest that a two-line fix for a panic in the code you wrote is not the correct place to plan or request that.

@behlendorf please merge this or the prior version without unnecessary changes if you like; if you'd like all the calls reconstructed, please let me know, and I'll just close the PR and keep it in my local branch so I can use the feature without a panic.

@amotin
Copy link
Member

amotin commented Oct 7, 2021

I've already told what I think about the previous patch. I'd prefer this refcount to be marked as untracked with respective comment, until it is rewritten to atomics, unless you wish to do it now. I am going on vacation in few hours, otherwise I'd just done it myself.

@jwk404 jwk404 closed this in 514498f Oct 7, 2021
@rincebrain
Copy link
Contributor Author

@jwk404 Unless you chose to close this in a pretty unusual way, I think that commit had the wrong Closes...

@behlendorf
Copy link
Contributor

@rincebrain if you can restore the original patch let's proceed with that minimal fix to resolve the ASSERT when tracking is enabled. Then when @amotin returns from vacation he can propose a PR which either tracks some pointer values (which I think we all agree would be more useful) or simply switchs it all to atomics.

I've gone ahead and reopened this issue which was close due to that commit typo.

refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and
reference_tracking_enable=1.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@rincebrain
Copy link
Contributor Author

@rincebrain if you can restore the original patch let's proceed with that minimal fix to resolve the ASSERT when tracking is enabled. Then when @amotin returns from vacation he can propose a PR which either tracks some pointer values (which I think we all agree would be more useful) or simply switchs it all to atomics.

I've gone ahead and reopened this issue which was close due to that commit typo.

Sure, tracking a useful value would be helpful for anyone trying to use the functionality moving forward. Reworking that just seemed much more like "future work".

Pushed.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 7, 2021
@rincebrain
Copy link
Contributor Author

...that's very strange. Both the two runs of sanity that I triggered trying to push the old version of the commit without vacuously updating it (which triggered running against 15cc4ed) and against 4b99eaf all failed on the same tests in destroy (among others).

I'm reasonably confident I did not somehow break it between the first time I pushed this before and now, so...not sure why they're failing suddenly when they didn't before?

I'll go see about reproducing this on my 20.04 VM...

@behlendorf
Copy link
Contributor

behlendorf commented Oct 7, 2021

I just started seeing these same failures in other PRs so I'm pretty sure it's unrelated. But I'm not sure what changed.

@rincebrain
Copy link
Contributor Author

rincebrain commented Oct 7, 2021

When it passed originally, it was on the 20210929.1 snapshot. Failing now is on 20211004.1.

Perhaps something is rotten there.

edit: This run was on 20210929.1 and was the last in the long line of passing, then the very next one where it started failing was 20211004.1...and the outlier that succeeded later was also 20210929.1.

@behlendorf behlendorf merged commit 9d1407e into openzfs:master Oct 8, 2021
rincebrain added a commit to rincebrain/zfs that referenced this pull request Dec 21, 2021
refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and
reference_tracking_enable=1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12589 
Closes openzfs#12602
hpingfs pushed a commit to hpingfs/zfs that referenced this pull request Aug 26, 2022
For those not already familiar with the code base it can be a
challenge to understand how the libraries are laid out.  This
has sometimes resulted in functionality being added in the
wrong place.  To help avoid that in the future this commit
documents the high-level dependencies for easy reference in
lib/Makefile.am.  It also simplifies a few things.

- Switched libzpool dependency on libzfs_core to libzutil.
  This change makes it clear libzpool should never depend
  on the ioctl() functionality provided by libzfs_core.

- Moved zfs_ioctl_fd() from libzutil to libzfs_core and
  renamed it lzc_ioctl_fd().  Normal access to the kmods
  should all be funneled through the libzfs_core library.
  The sole exception is the pool_active() which was updated
  to not use lzc_ioctl_fd() to remove the libzfs_core
  dependency.

- Removed libzfs_core dependency on libzutil.

- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
  source file which was all dead code.

- Removed libzfs_core dependency from mkbusy and ctime
  test utilities.  It was only needed for some trivial
  wrapper functions and that code is easy to replicate
  to shed the unneeded dependency.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#12602
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants