bandwidth degradation on sequential write on a file #62

Open
naota opened this issue Mar 8, 2022 · 3 comments

Comments

naota commented Mar 8, 2022

The bandwidth decreases while running the following fio command.

fio --filename=${MNT}/testfile --direct=1 \
        --rw=write --bs=256k \
        --ioengine=libaio --iodepth=1 \
        --fallocate=none \
        --write_bw_log=bw --write_lat_log=lat --write_iops_log=iops \
        --log_avg_msec=1000 \
        --numjobs=1 --group_reporting --name=fio-seq-write \
        --size=400GiB

Bandwidth starts at around 910 MiB/s but drops to about 420 MiB/s by the end of the run.

naota commented Mar 8, 2022

The patch below improves the final bandwidth to 860 MiB/s.
However, it modifies find_free_extent() itself, which is too intrusive for the regular (non-zoned) allocator. The check needs to be confined to do_allocation_zoned().

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6aa92f84f465..a49196fc755a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4319,7 +4319,8 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		struct btrfs_block_group *bg_ret;
 
 		/* If the block group is read-only, we can skip it entirely. */
-		if (unlikely(block_group->ro)) {
+		if (unlikely(block_group->ro) ||
+		    block_group->alloc_offset == block_group->zone_capacity) {
 			if (ffe_ctl->for_treelog)
 				btrfs_clear_treelog_bg(block_group);
 			if (ffe_ctl->for_data_reloc)

naota commented Mar 8, 2022

So a potential fix looks like this.

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6aa92f84f465..1c566f31ff89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3774,6 +3774,14 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 
 	ASSERT(btrfs_is_zoned(block_group->fs_info));
 
+	if (block_group->alloc_offset == block_group->zone_capacity) {
+		if (ffe_ctl->for_treelog)
+			btrfs_clear_treelog_bg(block_group);
+		if (ffe_ctl->for_data_reloc)
+			btrfs_clear_data_reloc_bg(block_group);
+		return 1;
+	}
+
 	/*
 	 * Do not allow non-tree-log blocks in the dedicated tree-log block
 	 * group, and vice versa.

However, the fact that this patch helps at all means that the given hint_bytes is not leading us to a usable block group.
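
To make the early-return check concrete, here is a minimal userspace sketch of the idea in the patch above. It is not kernel code: `struct bg_sketch`, `bg_is_full()` and `do_allocation_sketch()` are hypothetical names for illustration, and only the two fields compared in the patch (`alloc_offset` and `zone_capacity`) are modeled. It assumes, as in the patch, that a positive return value tells the caller to move on to the next block group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct btrfs_block_group. */
struct bg_sketch {
	uint64_t alloc_offset;  /* next sequential write position in the zone */
	uint64_t zone_capacity; /* usable bytes in the zone */
};

/* A zoned block group is full once the write position reaches capacity. */
static bool bg_is_full(const struct bg_sketch *bg)
{
	assert(bg->alloc_offset <= bg->zone_capacity);
	return bg->alloc_offset == bg->zone_capacity;
}

/*
 * Sketch of a sequential-only allocation attempt.
 * Returns 0 and stores the allocated offset on success,
 * or 1 to signal "try the next block group".
 */
static int do_allocation_sketch(struct bg_sketch *bg, uint64_t num_bytes,
				uint64_t *found_offset)
{
	/* Cheap bail-out first: a full block group can never satisfy us. */
	if (bg_is_full(bg))
		return 1;

	/* Not enough room left for this request. */
	if (bg->zone_capacity - bg->alloc_offset < num_bytes)
		return 1;

	*found_offset = bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return 0;
}
```

The point of the sketch is only that the full check is a single comparison, so doing it before any other work keeps the cost of visiting a filled block group small.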

naota commented Mar 8, 2022

The hint for a file extent is set from here.

https://github.com/kdave/btrfs-devel/blob/master/fs/btrfs/inode.c#L1077-L1088

When writing to a file that is not preallocated, the hint is set to the logical address of the beginning of the file. When the file is huge, that hint points to a block group that is far away from any non-full block group.

As a result, find_free_extent() needs to iterate over all the filled BGs before it reaches a non-full BG where it can allocate an extent. That also causes the performance degradation.
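
The cost of a stale hint can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the real allocator: block groups are modeled as a plain array ordered by logical address, the scan is a simple forward walk with wrap-around starting at the hinted index, and `struct zoned_bg` and `full_bgs_skipped()` are hypothetical names. The real list walk in find_free_extent() is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a zoned block group (illustration only). */
struct zoned_bg {
	uint64_t alloc_offset;  /* bytes already written sequentially */
	uint64_t zone_capacity; /* usable bytes in the zone */
};

/*
 * Count how many full block groups are visited before the first
 * non-full one, scanning forward from hint_idx and wrapping around.
 * Returns n when every block group is full.
 */
static size_t full_bgs_skipped(const struct zoned_bg *bgs, size_t n,
			       size_t hint_idx)
{
	size_t skipped = 0;

	assert(n == 0 || hint_idx < n);

	for (size_t i = 0; i < n; i++) {
		const struct zoned_bg *bg = &bgs[(hint_idx + i) % n];

		if (bg->alloc_offset < bg->zone_capacity)
			break; /* found a block group with room */
		skipped++;
	}
	return skipped;
}
```

In this model, a hint that stays at the block group holding the start of the file (index 0) skips one more full block group for every block group the file has already filled, so the per-allocation cost grows with the file size. A hint that points at the current non-full block group skips none.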
