Commit fcb2618

fdmanana authored and gregkh committed
btrfs: fix reclaimed bytes accounting after automatic block group reclaim
[ Upstream commit 6207687 ]

We are considering the used bytes counter of a block group as the amount
to update the space info's reclaim bytes counter after relocating the
block group, but this value alone is often not enough. This is because
we may have a reserved extent (or more) and in that case its size is
reflected in the reserved counter of the block group - the size of the
extent is only transferred from the reserved counter to the used counter
of the block group when the delayed ref for the extent is run -
typically when committing the transaction (or when flushing delayed refs
due to ENOSPC on space reservation).

Such call chain for data extents is:

   btrfs_run_delayed_refs_for_head()
     run_one_delayed_ref()
       run_delayed_data_ref()
         alloc_reserved_file_extent()
           alloc_reserved_extent()
             btrfs_update_block_group()
               -> transfers the extent size from the reserved counter
                  to the used counter

For metadata extents:

   btrfs_run_delayed_refs_for_head()
     run_one_delayed_ref()
       run_delayed_tree_ref()
         alloc_reserved_tree_block()
           alloc_reserved_extent()
             btrfs_update_block_group()
               -> transfers the extent size from the reserved counter
                  to the used counter

Since relocation flushes delalloc, waits for ordered extent completion
and commits the current transaction before doing the actual relocation
work, the correct amount of reclaimed space is therefore the sum of the
"used" and "reserved" counters of the block group before we call
btrfs_relocate_chunk() at btrfs_reclaim_bgs_work().

So fix this by taking the "reserved" counter into consideration.

Fixes: 243192b ("btrfs: report reclaim stats in sysfs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 19eff93 ("btrfs: fix periodic reclaim condition")
Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent c41742d commit fcb2618

File tree

1 file changed: +21 -7 lines changed


fs/btrfs/block-group.c

Lines changed: 21 additions & 7 deletions
@@ -1878,6 +1878,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	while (!list_empty(&fs_info->reclaim_bgs)) {
 		u64 zone_unusable;
 		u64 used;
+		u64 reserved;
 		int ret = 0;
 
 		bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1974,21 +1975,32 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			goto next;
 
 		/*
-		 * Grab the used bytes counter while holding the block group's
-		 * spinlock to prevent races with tasks concurrently updating it
-		 * due to extent allocation and deallocation (running
-		 * btrfs_update_block_group()) - we have set the block group to
-		 * RO but that only prevents extent reservation, allocation
-		 * happens after reservation.
+		 * The amount of bytes reclaimed corresponds to the sum of the
+		 * "used" and "reserved" counters. We have set the block group
+		 * to RO above, which prevents reservations from happening but
+		 * we may have existing reservations for which allocation has
+		 * not yet been done - btrfs_update_block_group() was not yet
+		 * called, which is where we will transfer a reserved extent's
+		 * size from the "reserved" counter to the "used" counter - this
+		 * happens when running delayed references. When we relocate the
+		 * chunk below, relocation first flushes delalloc, waits for
+		 * ordered extent completion (which is where we create delayed
+		 * references for data extents) and commits the current
+		 * transaction (which runs delayed references), and only after
+		 * it does the actual work to move extents out of the block
+		 * group. So the reported amount of reclaimed bytes is
+		 * effectively the sum of the "used" and "reserved" counters.
 		 */
 		spin_lock(&bg->lock);
 		used = bg->used;
+		reserved = bg->reserved;
 		spin_unlock(&bg->lock);
 
 		btrfs_info(fs_info,
-	"reclaiming chunk %llu with %llu%% used %llu%% unusable",
+	"reclaiming chunk %llu with %llu%% used %llu%% reserved %llu%% unusable",
 				bg->start,
 				div64_u64(used * 100, bg->length),
+				div64_u64(reserved * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
@@ -1997,6 +2009,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
 			used = 0;
+			reserved = 0;
 			spin_lock(&space_info->lock);
 			space_info->reclaim_errors++;
 			if (READ_ONCE(space_info->periodic_reclaim))
@@ -2006,6 +2019,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		spin_lock(&space_info->lock);
 		space_info->reclaim_count++;
 		space_info->reclaim_bytes += used;
+		space_info->reclaim_bytes += reserved;
 		spin_unlock(&space_info->lock);
 
 next:
