New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zfs_rmdir() can use zap_count() instead of cached z_size. this #6987
Conversation
| * we want to check ZAP's internal counter as some backends | ||
| * (Lustre) do not maintain z_size properly | ||
| */ | ||
| error = zap_count(zfsvfs->z_os, dzp->z_id, &count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right because the directory zaps don't include entries for . and ... There entries get added as needed. For example, here's what a directory with 3 files is expected to look like.
$ zdb -dddd tank/fs 2
Dataset tank/fs [ZPL], ID 166, cr_txg 3883, 28K, 18 objects, rootbp DVA[0]=<0:ab617000:200> DVA[1]=<0:4104ea00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=800L/200P birth=3898L/3898P fill=18 cksum=c964b60f5:4a9ec5d2a2d:e4884d68ed8a:1e0cbd346c3dfe
Object lvl iblk dblk dsize dnsize lsize %full type
2 1 128K 512 0 512 512 100.00 ZFS directory
176 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
uid 0
gid 0
atime Thu Dec 21 17:28:06 2017
mtime Thu Dec 21 17:28:13 2017
ctime Thu Dec 21 17:28:13 2017
crtime Thu Dec 21 17:28:06 2017
gen 3885
mode 40777
size 5
parent 34
links 2
pflags 40800000144
xattr 3
microzap: 512 bytes, 3 entries
f1 = 5 (type: Regular File)
f2 = 128 (type: Regular File)
f3 = 8 (type: Regular File)
Leaving that issue aside, I have some reservations about calling zap_count() here which can be potentially quite expensive. What if instead we did a similar check when the directory was first accessed and automatically reconciled the link count with the zap size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf first accessed with a given dzp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as for performance impact.. my thought was that rmdir isn't very frequent operation (compared to unlink, for example). though ideally the whole thing should be using _bydnode, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time investigating if there was a better place for this check and came around to the same conclusion. Since rmdir is such a relatively infrequent operation I'm OK with putting it here.
I did a little benchmarking and determined the overhead isn't as bad at all. Here's my performance data with and without the (slightly modifed) version of this PR. The chart shows total rmdir time averaged over 3 runs, it appears this won't have any significant performance impact.
| Files | Master | Issue-6987 |
|---|---|---|
| 100 | 0.146 | 0.143 |
| 1000 | 1.536 | 1.633 |
| 10,000 | 16.410 | 14.750 |
| 100,000 | 176.993 | 177.606 |
Can you review this slightly modified version of your original patch, and if you're OK with it refresh this PR with it.
/*
* Indicate whether the directory is empty. Works with or without z_lock
* held, but can only be consider a hint in the latter case. Returns true
* if only "." and ".." remain and there's no work in progress.
*
* The internal ZAP size, rather than zp->z_size, needs to be checked since
* some consumers (Lustre) do not strictly maintain an accurate SA_ZPL_SIZE.
*/
boolean_t
zfs_dirempty(znode_t *dzp)
{
zfsvfs_t *zfsvfs = ZTOZSB(dzp);
uint64_t count;
int error;
if (dzp->z_dirlocks != NULL)
return (B_FALSE);
error = zap_count(zfsvfs->z_os, dzp->z_id, &count);
if (error || count != 0)
return (B_FALSE);
return (B_TRUE);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf looks fine to me. thanks a lot!
allows Lustre to skip z_size updates when a non-dir entriees being added/removed. this in turn improves performance as SA update is quite expensive.
0eb0f52
to
3e19b4e
Compare
Codecov Report
@@ Coverage Diff @@
## master #6987 +/- ##
=========================================
+ Coverage 75.33% 75.54% +0.2%
=========================================
Files 296 296
Lines 95500 95457 -43
=========================================
+ Hits 71945 72109 +164
+ Misses 23555 23348 -207
Continue to review full report at Codecov.
|
[RFC] a proto to discuss with the developers
allows Lustre to skip z_size updates when a non-dir entriees
being added/removed. this in turn improves performance as SA
update is quite expensive.
Description
Motivation and Context
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by.