Failed to mkfs/btrfstune with both block-group-tree and zoned though they are said to be supported #765
Comments
Looks like a bug in btrfs-progs' support for zoned devices. I'll take a look and fix it soon.
[BUG]
There is a bug report that mkfs.btrfs can not specify block-group-tree feature along with zoned devices:

  # mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
  btrfs-progs v6.7.1
  See https://btrfs.readthedocs.io for more information.

  Resetting device zones /dev/nullb0 (40 zones) ...
  NOTE: several default settings have changed in version 5.15, please make sure
        this does not affect your deployments:
        - DUP for metadata (-m dup)
        - enabled no-holes (-O no-holes)
        - enabled free-space-tree (-R free-space-tree)

  ERROR: error during mkfs: Invalid argument

[CAUSE]
During mkfs, we need to write all the 7 or 8 tree blocks into the metadata zone, and since it's zoned device, we need to fulfill all the requirement for zoned writes, including:

- All writes must be in sequential bytenr
- Buffer must be aligned to sector size

The sequential bytenr requirement is already met by the mkfs design, but the second requirement on memory alignment is normally handled by btrfs_pwrite() helper.

However in create_block_group_tree() we didn't use btrfs_pwrite(), but plain pwrite() call directly, which would lead to -EINVAL error due to memory alignment problem.

[FIX]
Just call btrfs_pwrite() instead of the plain pwrite() in create_block_group_tree().

Issue: kdave#765
Signed-off-by: Qu Wenruo <wqu@suse.com>
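The alignment problem described in the commit message above can be illustrated with a small sketch. It is not the btrfs-progs implementation of btrfs_pwrite(); the names is_aligned() and aligned_pwrite() are hypothetical, and the sketch only assumes the general idea that an unaligned source buffer is copied into a sector-aligned bounce buffer before the plain pwrite() call. Offset and length alignment are left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if ptr is a multiple of align (a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
	return ((uintptr_t)ptr & (align - 1)) == 0;
}

/*
 * Hypothetical sketch of an alignment-fixing write helper.
 * If buf is already sector-aligned it is written directly; otherwise the
 * data is copied into an aligned bounce buffer first. Only the memory
 * address is fixed up here, count and offset are assumed to be suitable.
 */
static ssize_t aligned_pwrite(int fd, const void *buf, size_t count,
			      off_t offset, size_t sectorsize)
{
	void *bounce = NULL;
	ssize_t ret;
	int err;

	/* The sector size must be a non-zero power of two. */
	assert(sectorsize && (sectorsize & (sectorsize - 1)) == 0);

	if (is_aligned(buf, sectorsize))
		return pwrite(fd, buf, count, offset);

	/* posix_memalign() returns an error number instead of setting errno. */
	err = posix_memalign(&bounce, sectorsize, count);
	if (err) {
		errno = err;
		return -1;
	}
	memcpy(bounce, buf, count);
	ret = pwrite(fd, bounce, count, offset);
	free(bounce);
	return ret;
}
```

On a regular file opened without O_DIRECT both paths behave the same; the difference only matters when the kernel enforces buffer alignment, as the commit message describes for zoned devices.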
For mkfs.btrfs failure to create block group tree, it's a plain [...]

For btrfstune failure, it's related to the [...]

Both small fixes, would add test cases for both.
[BUG]
There is a report that, for zoned devices btrfstune is unable to convert it to block group tree.

  # btrfstune /dev/nullb0 --convert-to-block-group-tree
  Error reading 1342193664, -1
  Error reading 1342193664, -1
  ERROR: cannot read chunk root
  ERROR: open ctree failed

[CAUSE]
For read-write opened zoned devices, all the read/write has to be aligned to its sector size.

However btrfs stores its metadata by extent_buffer::data[], which has all the structures before it, thus never aligned to zoned device sector size.

Normally we would require btrfs_pread() and btrfs_pwrite() to do the extra alignment, but during open_ctree(), we are not aware if a device is zoned or not.

Thus we rely on if the fd is opened with O_DIRECT flag, if the fd has O_DIRECT, then we would temporarily set fs_info->zoned for chunk tree read.

Unfortunately not all open_ctree_fd() callers have the flags set properly, and btrfstune is one of the missing call sites.

This makes all the read not properly aligned and cause read failure.

[FIX]
Just manually check if the target device is a zoned one, and set O_DIRECT accordingly.

Issue: kdave#765
Signed-off-by: Qu Wenruo <wqu@suse.com>
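The two points in the commit message above (data[] sitting behind other members, and choosing O_DIRECT based on whether the device is zoned) can be sketched as follows. This is an illustration under stated assumptions, not the btrfs-progs code: struct fake_extent_buffer is a made-up stand-in for the real extent_buffer layout, both function names are hypothetical, and the model string is assumed to come from the kernel's /sys/block/<dev>/queue/zoned attribute ("none", "host-aware" or "host-managed"). Treating only "host-managed" as needing O_DIRECT is also an assumption of this sketch.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up stand-in: a few header members followed by the payload. */
struct fake_extent_buffer {
	uint64_t start;
	uint32_t len;
	int refs;
	char data[];
};

/* The payload never starts at offset 0 of the allocation. */
static_assert(offsetof(struct fake_extent_buffer, data) != 0,
	      "data[] follows the header members");

/*
 * Hypothetical check: even if the structure itself were allocated on a
 * sector boundary, would data[] also land on one?
 */
static int data_is_sector_aligned(size_t sectorsize)
{
	return offsetof(struct fake_extent_buffer, data) % sectorsize == 0;
}

/*
 * Hypothetical flag selection: open read-write, and add O_DIRECT when the
 * model string (optionally newline-terminated, as sysfs prints it) says
 * the device is host-managed.
 */
static int open_flags_for_zoned_model(const char *model)
{
	int flags = O_RDWR;
	size_t n = strcspn(model, "\n");

	if (n == strlen("host-managed") &&
	    strncmp(model, "host-managed", n) == 0)
		flags |= O_DIRECT;
	return flags;
}
```

A caller would read the sysfs attribute for the target device, pass its contents to open_flags_for_zoned_model(), and use the result for open() before handing the fd on.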
[BUG]
There is a bug report that mkfs.btrfs can not specify block-group-tree feature along with zoned devices:

  # mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
  btrfs-progs v6.7.1
  See https://btrfs.readthedocs.io for more information.

  Resetting device zones /dev/nullb0 (40 zones) ...
  NOTE: several default settings have changed in version 5.15, please make sure
        this does not affect your deployments:
        - DUP for metadata (-m dup)
        - enabled no-holes (-O no-holes)
        - enabled free-space-tree (-R free-space-tree)

  ERROR: error during mkfs: Invalid argument

[CAUSE]
During mkfs, we need to write all the 7 or 8 tree blocks into the metadata zone, and since it's zoned device, we need to fulfill all the requirement for zoned writes, including:

- All writes must be in sequential bytenr
- Buffer must be aligned to sector size

The sequential bytenr requirement is already met by the mkfs design, but the second requirement on memory alignment is never met for metadata, as we put the contents of a leaf in extent_buffer::data[], which is after a lot of small members.

Thus metadata IO buffer would never be aligned to sector size (normally 4K). And we require btrfs_pwrite() and btrfs_pread() to handle the memory alignment for us.

However in create_block_group_tree() we didn't use btrfs_pwrite(), but plain pwrite() call directly, which would lead to -EINVAL error due to memory alignment problem.

[FIX]
Just call btrfs_pwrite() instead of the plain pwrite() in create_block_group_tree().

Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that, for zoned devices btrfstune is unable to convert it to block group tree.

  # btrfstune /dev/nullb0 --convert-to-block-group-tree
  Error reading 1342193664, -1
  Error reading 1342193664, -1
  ERROR: cannot read chunk root
  ERROR: open ctree failed

[CAUSE]
For read-write opened zoned devices, all the read/write has to be aligned to its sector size.

However btrfs stores its metadata by extent_buffer::data[], which has all the structures before it, thus never aligned to zoned device sector size.

Normally we would require btrfs_pread() and btrfs_pwrite() to do the extra alignment, but during open_ctree(), we are not aware if a device is zoned or not.

Thus we rely on if the fd is opened with O_DIRECT flag, if the fd has O_DIRECT, then we would temporarily set fs_info->zoned for chunk tree read.

Unfortunately not all open_ctree_fd() callers have the flags set properly, and btrfstune is one of the missing call sites.

This makes all the read not properly aligned and cause read failure.

[FIX]
Just manually check if the target device is a zoned one, and set O_DIRECT accordingly.

Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Closing since it's fixed in v6.8.1 already.
In the btrfs documentation's "Zoned mode" section, block group tree is listed as "supported". I read that as "compatible". But currently mkfs.btrfs or btrfstune cannot enable/convert block-group-tree on zoned devices. Is it that btrfs-progs does not implement this feature yet, or are there some hidden issues preventing this operation?

My environment:
To reproduce, first set up a nullb emulated block device with 256MiB zones, 10GiB size and 4KiB block size (this script is copied from https://lwn.net/Articles/836726/):
Then

  sudo mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned

will return errors:

Running mkfs and then btrfstune also fails:

In either case, dmesg shows nothing except null_blk module loading and device creation.