discourage seeding remount workflow #462

boryas · 2022-04-14T17:30:41Z

Don't want to erase it entirely from the doc, since it is the ideal way to use it ultimately. However, the bug is significant, so I think it's our responsibility to encourage users that can use umount/mount (e.g. in an initrd, not for root fs) to use it.

…tree block read We used to use read_whole_eb() to read super block, but it's no longer the case (so long that I can not even find out which commit did the conversion). Thus there is no need for read_whole_eb() to handle super block read anymore. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

For metadata restore, our logical address is mapped to a single device with logical address 1:1 mapped to device physical address. Move this part of code into a helper, this will make later extent buffer read path refactoer much easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

There are two call sites using write_extent_to_disk() directly: - debug_corrupt_block() in btrfs-corrupt-block.c - corrupt_keys() in btrfs-corrupt-block.c The problem of write_extent_to_disk() is, it can only handle plain profiles (All profiles except P/Q stripes of RAID56). Calling it directly can corrupted RAID56 P/Q, and in the future we're going to remove eb::fd/eb::dev_bytes, so remove such call sites with write_and_map_eb(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Function write_extent_to_disk() is just writing the content of a tree block to disk. It can not handle RAID56, and its work is the same as write_data_to_disk(). Thus we can replace write_extent_to_disk() with write_data_to_disk() easily. There is only one special call site in write_raid56_with_parity(), which can easily be replace with btrfs_pwrite() directly. This reduce the write entrance, and make later eb::fd removal easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

…sk() and replace read_extent_data() The function read_extent_from_disk() is only a wrapper to read tree block. And read_extent_data() is just a while loop to eliminate short read caused by stripe boundary. In fact, a lot of call sites of read_extent_data() are either reading metadata (thus no possible short read) or doing extra loop by themselves. This patch will replace those two functions with read_data_from_disk(), making it the only entrance for data/metadata read. And update read_data_from_disk() to return the read bytes, so caller can do a simple while loop. For the few callers of read_extent_data(), open-code a small while loop for them. This will allow later RAID56 read repair using P/Q much easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Those two members are a shortcut for non-RAID56 profiles. But we should not use such shortcut, and move all our logical address read/write to the unified read_data_from_disk()/write_data_to_disk(). With previous refactors, now we're safe to remove them. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

This new ability is added by: - Allow btrfs_map_block() to return the chunk type This makes later work much easier - Only reset stripe offset inside btrfs_map_block() when needed Currently if @raid_map is not NULL, btrfs_map_block() will consider this call is for WRITE and will reset stripe offset. This is no longer the case, as for RAID56 read with mirror_num 1/0, we will still call btrfs_map_block() with non-NULL raid_map. Add a small check to make sure we won't reset stripe offset for mirror 1/0 read. - Add new helper read_raid56() to handle rebuild We will read the full stripe (including all data and P/Q stripes) do the rebuild, then only copy the refered part to the caller. There is a catch for RAID6, we have no way to exhaust all combination, so the current repair will assume the mirror = 0 data is corrupted, then try to find a missing device. But if no missing device can be found, it will assume P is corrupted. This is just a guess, and can to totally wrong, but we have no better idea. Now btrfs-progs have full read ability for RAID56. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Previously 'btrfs check --check-data-csum' will report tons of false alerts for RAID56. Add a test case to make sure with the new RAID56 rebuild ability, there should be no false alerts. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

[BUG] When running btrfs/011 with subpage case, even with RAID56 support, it still fails with the following error: QA output created by 011 *** test btrfs replace mkfs failed (see /home/adam/xfstests-dev/results//btrfs/011.full for details) The full log shows: ---------workout "-m single -d single -M" 1 no 64----------- ERROR: illegal nodesize 65536 (not equal to 4096 for mixed block group) mkfs failed This is a critical error, making test case to be aborted, without checking the rest profiles. [CAUSE] Mkfs.btrfs always uses the maximum value between sectorsize and page size for its mixed profile nodesize. For subpage case, it means we always go PAGE_SIZE, no matter whatever the sectorsize is passed in. [FIX] Just get rid of the direct PAGE_SIZE usage when determining nodesize for mixed profiles. And use sectorsize directly (either passed in by the user, or determined from page size). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Currently, if user specifies value 'no' or 'none' on the command line, it gets translated to an empty value that is passed to kernel. There was a change in kernel 5.14 done by commit 5548c8c6f55b ("btrfs: props: change how empty value is interpreted") that changes the behaviour in that case. The empty value is supposed to mean 'the default value' for any property. For compression there is a need to distinguish resetting the value and also setting the NOCOMPRESS property. The translation to empty value makes that impossible. The explanation and behaviour copied from the kernel patch: Old behaviour: $ lsattr file ---------------------- file # the NOCOMPRESS bit is set $ btrfs prop set file compression '' $ lsattr file ---------------------m file This is equivalent to 'btrfs prop set file compression no' in current btrfs-progs as the 'no' or 'none' values are translated to an empty string. This is where the new behaviour is different: empty string drops the compression flag (-c) and nocompress (-m): $ lsattr file ---------------------- file # No change $ btrfs prop set file compression '' $ lsattr file ---------------------- file $ btrfs prop set file compression lzo $ lsattr file --------c------------- file $ btrfs prop get file compression compression=lzo $ btrfs prop set file compression '' # Reset to the initial state $ lsattr file ---------------------- file # Set NOCOMPRESS bit $ btrfs prop set file compression no $ lsattr file ---------------------m file This obviously brings problems with backward compatibility, so this patch should not be backported without making sure the updated btrfs-progs are also used and that scripts have been updated to use the new semantics. Summary: - old kernel: no, none, "" - set NOCOMPRESS bit - new kernel: no, none - set NOCOMPRESS bit "" - drop all compression flags, ie. COMPRESS and NOCOMPRESS Signed-off-by: Li Zhang <zhanglikernel@gmail.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>

Don't want to erase it entirely from the doc, since it is the ideal way to use it ultimately. However, the bug is significant, so I think it's our responsibility to encourage users that can use umount/mount (e.g. in an initrd, not for root fs) to use it.

The remount workflow could cause some problems so make a note about it. Recommend the umount/mount step by default. A seeding device used for e.g. a root filesystem that gets updated and has snapshots is a real world example where the space consumed by unreclaimed deleted snapshots would hurt. Pull-request: #462 Author: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>

kdave · 2023-06-09T11:09:31Z

Merged to devel with some minor edits. The description of the workflows can be still improved, separating clearly the use case without snapshots that the remount is fine.

adam900710 and others added 11 commits April 8, 2022 23:33

kdave force-pushed the devel branch from cbcc00b to bfe6402 Compare April 25, 2022 16:34

kdave force-pushed the devel branch from dbe63eb to 8195c05 Compare June 7, 2022 12:04

kdave force-pushed the devel branch from 685bdbf to 0353c83 Compare June 21, 2022 13:04

kdave force-pushed the devel branch from 76bbb96 to 160beda Compare August 3, 2022 19:39

kdave force-pushed the devel branch 2 times, most recently from e3c0fe2 to 588429c Compare August 16, 2022 13:23

kdave force-pushed the devel branch 8 times, most recently from 54326e2 to ad2a1bf Compare September 21, 2022 13:42

kdave force-pushed the devel branch 3 times, most recently from 1e791f0 to 8f51194 Compare September 23, 2022 11:18

kdave force-pushed the devel branch 2 times, most recently from f230ae6 to d2fa26b Compare October 4, 2022 21:50

kdave force-pushed the devel branch 5 times, most recently from 7bfedf8 to cda1f14 Compare February 22, 2023 02:06

kdave force-pushed the devel branch 4 times, most recently from 63a7417 to 4fc291a Compare March 1, 2023 14:20

kdave force-pushed the devel branch from 2d65f6d to eec7613 Compare March 22, 2023 14:27

kdave force-pushed the devel branch 4 times, most recently from e53f194 to 405f4c7 Compare April 26, 2023 22:51

kdave force-pushed the devel branch 3 times, most recently from af681f4 to 1628b2f Compare May 10, 2023 19:47

kdave force-pushed the devel branch 2 times, most recently from a8c6f08 to 34c5de3 Compare May 22, 2023 12:32

kdave force-pushed the devel branch 5 times, most recently from 5e02de0 to adfc8a9 Compare May 26, 2023 17:53

kdave force-pushed the devel branch from f328c5e to d8172c2 Compare June 1, 2023 18:50

kdave force-pushed the devel branch from 310e8f8 to d8172c2 Compare June 9, 2023 07:53

kdave added this to the v6.3.2 milestone Jun 9, 2023

kdave closed this Jun 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

discourage seeding remount workflow #462

discourage seeding remount workflow #462

Uh oh!

boryas commented Apr 14, 2022

Uh oh!

kdave commented Jun 9, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

discourage seeding remount workflow #462

discourage seeding remount workflow #462

Uh oh!

Conversation

boryas commented Apr 14, 2022

Uh oh!

kdave commented Jun 9, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants