
zfs_rmdir() can use zap_count() instead of cached z_size #6987

Closed

wants to merge 1 commit
Conversation

@bzzz77 (Contributor) commented Dec 20, 2017

[RFC] A prototype to discuss with the developers.
Allows Lustre to skip z_size updates when non-directory entries
are being added or removed. This in turn improves performance, as an
SA update is quite expensive.


/*
 * We want to check ZAP's internal counter as some backends
 * (Lustre) do not maintain z_size properly.
 */
error = zap_count(zfsvfs->z_os, dzp->z_id, &count);
@behlendorf (Contributor) commented:

This isn't quite right, because the directory ZAPs don't include entries for "." and "..". These entries get added as needed. For example, here's what a directory with 3 files is expected to look like:

$ zdb -dddd tank/fs 2
Dataset tank/fs [ZPL], ID 166, cr_txg 3883, 28K, 18 objects, rootbp DVA[0]=<0:ab617000:200> DVA[1]=<0:4104ea00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=800L/200P birth=3898L/3898P fill=18 cksum=c964b60f5:4a9ec5d2a2d:e4884d68ed8a:1e0cbd346c3dfe

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         2    1   128K    512      0     512    512  100.00  ZFS directory
                                               176   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	uid     0
	gid     0
	atime	Thu Dec 21 17:28:06 2017
	mtime	Thu Dec 21 17:28:13 2017
	ctime	Thu Dec 21 17:28:13 2017
	crtime	Thu Dec 21 17:28:06 2017
	gen	3885
	mode	40777
	size	5
	parent	34
	links	2
	pflags	40800000144
	xattr	3
	microzap: 512 bytes, 3 entries

		f1 = 5 (type: Regular File)
		f2 = 128 (type: Regular File)
		f3 = 8 (type: Regular File)

Leaving that issue aside (note the zdb output above: the size is 5 for a microzap holding only 3 entries, since z_size counts "." and ".." while the ZAP does not store them), I have some reservations about calling zap_count() here, which can potentially be quite expensive. What if instead we did a similar check when the directory was first accessed and automatically reconciled the link count with the ZAP size?
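
For illustration only, here is a rough sketch of that reconciliation idea, assuming it runs once when the directory znode is set up. The helper name, hook point, and error handling are hypothetical, not existing ZFS interfaces; zap_count() and sa_update() are real calls.

/*
 * Hypothetical sketch (not ZFS code): reconcile a stale cached size
 * with the ZAP's internal entry count.  zap_count() and sa_update()
 * are real interfaces; the helper name and hook point are
 * illustrative only.
 */
static int
zfs_dir_reconcile_size(znode_t *dzp, dmu_tx_t *tx)
{
        zfsvfs_t *zfsvfs = ZTOZSB(dzp);
        uint64_t count;
        int error;

        error = zap_count(zfsvfs->z_os, dzp->z_id, &count);
        if (error != 0)
                return (error);

        /* z_size counts "." and "..", which the ZAP does not store. */
        if (dzp->z_size != count + 2) {
                dzp->z_size = count + 2;
                error = sa_update(dzp->z_sa_hdl, SA_ZPL_SIZE(zfsvfs),
                    &dzp->z_size, sizeof (dzp->z_size), tx);
        }

        return (error);
}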

@bzzz77 (Author) commented:

@behlendorf first accessed with a given dzp?

@bzzz77 (Author) commented:
As for the performance impact: my thought was that rmdir isn't a very frequent operation (compared to unlink, for example). Though ideally the whole thing should be using the _bydnode variants, I guess.

@behlendorf (Contributor) commented:

I spent some time investigating whether there was a better place for this check and came around to the same conclusion. Since rmdir is such a relatively infrequent operation, I'm OK with putting it here.

I did a little benchmarking and determined the overhead isn't bad at all. Here's my performance data with and without the (slightly modified) version of this PR. The table shows total rmdir time averaged over 3 runs; it appears this won't have any significant performance impact.

Files      Master    Issue-6987
100         0.146       0.143
1,000       1.536       1.633
10,000     16.410      14.750
100,000   176.993     177.606

Can you review this slightly modified version of your original patch, and if you're OK with it, refresh this PR with it?

/*
 * Indicate whether the directory is empty.  Works with or without z_lock
 * held, but can only be considered a hint in the latter case.  Returns true
 * if only "." and ".." remain and there's no work in progress.
 *
 * The internal ZAP size, rather than zp->z_size, needs to be checked since
 * some consumers (Lustre) do not strictly maintain an accurate SA_ZPL_SIZE.
 */
boolean_t
zfs_dirempty(znode_t *dzp)
{
        zfsvfs_t *zfsvfs = ZTOZSB(dzp);
        uint64_t count;
        int error;

        if (dzp->z_dirlocks != NULL)
                return (B_FALSE);

        error = zap_count(zfsvfs->z_os, dzp->z_id, &count);
        if (error || count != 0)
                return (B_FALSE);

        return (B_TRUE);
}
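
For context, a condensed sketch of how zfs_rmdir() would consume this helper; the surrounding locking and error paths are omitted here, and the condensation is ours, but SET_ERROR() and ENOTEMPTY are the real mechanism.

        /* In zfs_rmdir(): refuse to remove a directory that isn't empty. */
        if (!zfs_dirempty(zp)) {
                error = SET_ERROR(ENOTEMPTY);
                goto out;
        }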

@bzzz77 (Author) commented:

@behlendorf looks fine to me. thanks a lot!

@codecov (bot) commented Dec 27, 2017:

Codecov Report

Merging #6987 into master will increase coverage by 0.2%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #6987     +/-   ##
=========================================
+ Coverage   75.33%   75.54%   +0.2%     
=========================================
  Files         296      296             
  Lines       95500    95457     -43     
=========================================
+ Hits        71945    72109    +164     
+ Misses      23555    23348    -207
Flag      Coverage Δ
#kernel   74.94% <100%> (+0.39%) ⬆️
#user     68.01% <ø> (+0.32%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@behlendorf (Contributor) commented:

Replaced by #7019. @bzzz77, can you please give the updated patch a final review?

@behlendorf closed this Jan 8, 2018.