-
Notifications
You must be signed in to change notification settings - Fork 266
discourage seeding remount workflow #462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…tree block read We used to use read_whole_eb() to read super block, but it's no longer the case (so long that I can not even find out which commit did the conversion). Thus there is no need for read_whole_eb() to handle super block read anymore. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
For metadata restore, our logical address is mapped to a single device with logical address 1:1 mapped to device physical address. Move this part of code into a helper, this will make later extent buffer read path refactoer much easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
There are two call sites using write_extent_to_disk() directly: - debug_corrupt_block() in btrfs-corrupt-block.c - corrupt_keys() in btrfs-corrupt-block.c The problem of write_extent_to_disk() is, it can only handle plain profiles (All profiles except P/Q stripes of RAID56). Calling it directly can corrupted RAID56 P/Q, and in the future we're going to remove eb::fd/eb::dev_bytes, so remove such call sites with write_and_map_eb(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Function write_extent_to_disk() is just writing the content of a tree block to disk. It can not handle RAID56, and its work is the same as write_data_to_disk(). Thus we can replace write_extent_to_disk() with write_data_to_disk() easily. There is only one special call site in write_raid56_with_parity(), which can easily be replace with btrfs_pwrite() directly. This reduce the write entrance, and make later eb::fd removal easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…sk() and replace read_extent_data() The function read_extent_from_disk() is only a wrapper to read tree block. And read_extent_data() is just a while loop to eliminate short read caused by stripe boundary. In fact, a lot of call sites of read_extent_data() are either reading metadata (thus no possible short read) or doing extra loop by themselves. This patch will replace those two functions with read_data_from_disk(), making it the only entrance for data/metadata read. And update read_data_from_disk() to return the read bytes, so caller can do a simple while loop. For the few callers of read_extent_data(), open-code a small while loop for them. This will allow later RAID56 read repair using P/Q much easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Those two members are a shortcut for non-RAID56 profiles. But we should not use such shortcut, and move all our logical address read/write to the unified read_data_from_disk()/write_data_to_disk(). With previous refactors, now we're safe to remove them. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This new ability is added by: - Allow btrfs_map_block() to return the chunk type This makes later work much easier - Only reset stripe offset inside btrfs_map_block() when needed Currently if @raid_map is not NULL, btrfs_map_block() will consider this call is for WRITE and will reset stripe offset. This is no longer the case, as for RAID56 read with mirror_num 1/0, we will still call btrfs_map_block() with non-NULL raid_map. Add a small check to make sure we won't reset stripe offset for mirror 1/0 read. - Add new helper read_raid56() to handle rebuild We will read the full stripe (including all data and P/Q stripes) do the rebuild, then only copy the refered part to the caller. There is a catch for RAID6, we have no way to exhaust all combination, so the current repair will assume the mirror = 0 data is corrupted, then try to find a missing device. But if no missing device can be found, it will assume P is corrupted. This is just a guess, and can to totally wrong, but we have no better idea. Now btrfs-progs have full read ability for RAID56. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Previously 'btrfs check --check-data-csum' will report tons of false alerts for RAID56. Add a test case to make sure with the new RAID56 rebuild ability, there should be no false alerts. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
[BUG] When running btrfs/011 with subpage case, even with RAID56 support, it still fails with the following error: QA output created by 011 *** test btrfs replace mkfs failed (see /home/adam/xfstests-dev/results//btrfs/011.full for details) The full log shows: ---------workout "-m single -d single -M" 1 no 64----------- ERROR: illegal nodesize 65536 (not equal to 4096 for mixed block group) mkfs failed This is a critical error, making test case to be aborted, without checking the rest profiles. [CAUSE] Mkfs.btrfs always uses the maximum value between sectorsize and page size for its mixed profile nodesize. For subpage case, it means we always go PAGE_SIZE, no matter whatever the sectorsize is passed in. [FIX] Just get rid of the direct PAGE_SIZE usage when determining nodesize for mixed profiles. And use sectorsize directly (either passed in by the user, or determined from page size). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Currently, if user specifies value 'no' or 'none' on the command line,
it gets translated to an empty value that is passed to kernel. There was
a change in kernel 5.14 done by commit 5548c8c6f55b ("btrfs: props:
change how empty value is interpreted") that changes the behaviour
in that case.
The empty value is supposed to mean 'the default value' for any
property. For compression there is a need to distinguish resetting the
value and also setting the NOCOMPRESS property. The translation to empty
value makes that impossible.
The explanation and behaviour copied from the kernel patch:
Old behaviour:
$ lsattr file
---------------------- file
# the NOCOMPRESS bit is set
$ btrfs prop set file compression ''
$ lsattr file
---------------------m file
This is equivalent to 'btrfs prop set file compression no' in current
btrfs-progs as the 'no' or 'none' values are translated to an empty
string.
This is where the new behaviour is different: empty string drops the
compression flag (-c) and nocompress (-m):
$ lsattr file
---------------------- file
# No change
$ btrfs prop set file compression ''
$ lsattr file
---------------------- file
$ btrfs prop set file compression lzo
$ lsattr file
--------c------------- file
$ btrfs prop get file compression
compression=lzo
$ btrfs prop set file compression ''
# Reset to the initial state
$ lsattr file
---------------------- file
# Set NOCOMPRESS bit
$ btrfs prop set file compression no
$ lsattr file
---------------------m file
This obviously brings problems with backward compatibility, so this
patch should not be backported without making sure the updated
btrfs-progs are also used and that scripts have been updated to use the
new semantics.
Summary:
- old kernel:
no, none, "" - set NOCOMPRESS bit
- new kernel:
no, none - set NOCOMPRESS bit
"" - drop all compression flags, ie. COMPRESS and NOCOMPRESS
Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Don't want to erase it entirely from the doc, since it is the ideal way to use it ultimately. However, the bug is significant, so I think it's our responsibility to encourage users that can use umount/mount (e.g. in an initrd, not for root fs) to use it.
e3c0fe2 to
588429c
Compare
54326e2 to
ad2a1bf
Compare
1e791f0 to
8f51194
Compare
f230ae6 to
d2fa26b
Compare
7bfedf8 to
cda1f14
Compare
63a7417 to
4fc291a
Compare
e53f194 to
405f4c7
Compare
af681f4 to
1628b2f
Compare
a8c6f08 to
34c5de3
Compare
5e02de0 to
adfc8a9
Compare
kdave
pushed a commit
that referenced
this pull request
Jun 9, 2023
The remount workflow could cause some problems so make a note about it. Recommend the umount/mount step by default. A seeding device used for e.g. a root filesystem that gets updated and has snapshots is a real world example where the space consumed by unreclaimed deleted snapshots would hurt. Pull-request: #462 Author: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
Owner
|
Merged to devel with some minor edits. The description of the workflows can be still improved, separating clearly the use case without snapshots that the remount is fine. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Don't want to erase it entirely from the doc, since it is the ideal way to use it ultimately. However, the bug is significant, so I think it's our responsibility to encourage users that can use umount/mount (e.g. in an initrd, not for root fs) to use it.