
Remove limit on number of async zio_frees of non-dedup blocks #10000

Merged
merged 1 commit on Feb 14, 2020

Conversation

ahrens
Member

@ahrens ahrens commented Feb 13, 2020

Motivation and Context

The module parameter zfs_async_block_max_blocks limits the number of
blocks that can be freed by the background freeing of filesystems and
snapshots (from "zfs destroy"), in one TXG. This is useful when freeing
dedup blocks, because each zio_free() of a dedup block can require an
i/o to read the relevant part of the dedup table (DDT), and will also
dirty that block.

zfs_async_block_max_blocks is set to 100,000 by default. For the more
typical case where dedup is not used, this can have a negative
performance impact on the rate of background freeing (from "zfs
destroy"). For example, with recordsize=8k and TXGs syncing once every
5 seconds, we can free only about 160MB of data per second, which may be
much less than the rate at which we can write data.

Description

This change increases zfs_async_block_max_blocks to be unlimited by
default. To address the dedup freeing issue, a new tunable is
introduced, zfs_max_async_dedup_frees, which limits the number of
zio_free() calls on dedup blocks done by background destroys, per txg.
The default is 100,000 frees (the same as the old
zfs_async_block_max_blocks default).

How Has This Been Tested?

Without this change, deleting a filesystem with lots of blocks (non-dedup in this case, but behavior is the same for dedup):

1581569878   dsl_scan.c:3366:dsl_process_async_destroys(): freed 100000 blocks in 487ms from free_bpobj/bptree txg 22103; err=85
1581569879   dsl_scan.c:3366:dsl_process_async_destroys(): freed 100000 blocks in 359ms from free_bpobj/bptree txg 22104; err=85
1581569880   dsl_scan.c:3366:dsl_process_async_destroys(): freed 100000 blocks in 535ms from free_bpobj/bptree txg 22105; err=85
...
(takes ~20 txgs and 14 seconds to free 2,000,000 blocks; system is otherwise idle)

With this change, deleting a filesystem with lots of dedup blocks (same behavior as without this change):

1581540878   dsl_scan.c:3375:dsl_process_async_destroys(): freed 100097 blocks in 3565ms from free_bpobj/bptree txg 16765; err=85
1581540967   bptree.c:233:bptree_iterate(): bptree index 0: traversing from min_txg=1 bookmark -1/2/0/1833840
1581540970   dsl_scan.c:3375:dsl_process_async_destroys(): freed 100098 blocks in 2934ms from free_bpobj/bptree txg 16766; err=85
1581541056   bptree.c:233:bptree_iterate(): bptree index 0: traversing from min_txg=1 bookmark -1/2/0/1933840
...
(takes ~10,000 seconds to free 2,000,000 blocks)
(note that the long time between calls to dsl_process_async_destroys() is spent writing the DDT changes)

With this change, deleting a filesystem without dedup:

1581549877   dsl_scan.c:3375:dsl_process_async_destroys(): freed 187576 blocks in 1001ms from free_bpobj/bptree txg 18353; err=85
1581549885   dsl_scan.c:3375:dsl_process_async_destroys(): freed 1137010 blocks in 6000ms from free_bpobj/bptree txg 18354; err=85
1581549889   dsl_scan.c:3375:dsl_process_async_destroys(): freed 725435 blocks in 4204ms from free_bpobj/bptree txg 18355; err=0
(takes 3 txg's and 12 seconds to free 2,000,000 blocks)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
@ikozhukhov
Contributor

@ahrens congrats on PR #10000 :)
I hope this update will fix my issue with destroying big datasets (recordsize=1M, about 400T and larger, with files of about 4G).

@scineram

Is there any point keeping zfs_async_block_max_blocks around now that there is a nice freeing throttle?

@codecov

codecov bot commented Feb 13, 2020

Codecov Report

Merging #10000 into master will increase coverage by 0.57%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #10000      +/-   ##
==========================================
+ Coverage   79.10%   79.67%   +0.57%     
==========================================
  Files         385      385              
  Lines      121934   121939       +5     
==========================================
+ Hits        96451    97160     +709     
+ Misses      25483    24779     -704     
Flag Coverage Δ
#kernel 79.78% <100.00%> (+0.25%) ⬆️
#user 67.53% <100.00%> (+1.29%) ⬆️
Impacted Files Coverage Δ
module/zcommon/zfs_uio.c 91.57% <0.00%> (-1.06%) ⬇️
module/zfs/dsl_destroy.c 93.57% <0.00%> (-0.19%) ⬇️
module/zfs/vdev_removal.c 97.58% <0.00%> (-0.12%) ⬇️
lib/libzfs/libzfs_sendrecv.c 75.93% <0.00%> (-0.09%) ⬇️
module/zfs/dsl_dataset.c 92.91% <0.00%> (+0.04%) ⬆️
cmd/zfs/zfs_main.c 82.75% <0.00%> (+0.05%) ⬆️
cmd/zdb/zdb.c 81.79% <0.00%> (+0.05%) ⬆️
module/zfs/dmu_send.c 84.57% <0.00%> (+0.07%) ⬆️
module/zfs/dmu.c 86.59% <0.00%> (+0.11%) ⬆️
module/os/linux/zfs/zio_crypt.c 81.20% <0.00%> (+0.12%) ⬆️
... and 64 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81acb1e...b5b09d4. Read the comment docs.

@behlendorf behlendorf added Status: Code Review Needed Ready for review and testing Type: Performance Performance improvement or performance problem labels Feb 13, 2020
@ahrens
Member Author

ahrens commented Feb 13, 2020

@scineram I think that zfs_async_block_max_blocks is still useful for last-ditch performance tuning, as well as being used by the test suite so that we can be sure to hit the code paths where we pause deletion and come back the next txg.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Feb 14, 2020
@behlendorf behlendorf merged commit 4fe3a84 into openzfs:master Feb 14, 2020
@ahrens ahrens deleted the dedup-free branch February 20, 2020 17:38
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#10000

5 participants