Failed to mkfs/btrfstune with both block-group-tree and zoned though they are said to be supported #765

Closed
oxalica opened this issue Mar 25, 2024 · 3 comments

oxalica commented Mar 25, 2024

In the "Zoned mode" section of the btrfs documentation, block group tree is listed as "supported", which I read as "compatible". But currently neither mkfs.btrfs nor btrfstune can enable or convert to block-group-tree on zoned devices. Is it that btrfs-progs does not implement this yet, or are there hidden issues preventing the operation?

My environment:

$ uname -a
Linux invar 6.8.1 #1-NixOS SMP PREEMPT_DYNAMIC Fri Mar 15 18:19:29 UTC 2024 x86_64 GNU/Linux
$ mkfs.btrfs --version
mkfs.btrfs, part of btrfs-progs v6.7.1

To reproduce, first set up an emulated nullb block device with 256MiB zones, 10GiB size, and 4KiB block size (this script is copied from https://lwn.net/Articles/836726/):

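#!/usr/bin/env bash
# Create a memory-backed, zoned null_blk device at /dev/nullb0 (run as root).
set -eo pipefail
sysfs=/sys/kernel/config/nullb/nullb0
# Tear down any previous instance of the device.
if [[ -d $sysfs ]]; then
    echo 0 > "${sysfs}"/power
    rmdir "${sysfs}"
fi
# Reload the module with no default devices so nullb0 can be set up via configfs.
lsmod | grep -q null_blk && rmmod null_blk
modprobe null_blk nr_devices=0
mkdir "${sysfs}"
echo 10240 > "${sysfs}"/size # MiB
echo 1 > "${sysfs}"/zoned
echo 0 > "${sysfs}"/zone_nr_conv # no conventional zones
echo 256 > "${sysfs}"/zone_size # MiB
echo 1 > "${sysfs}"/memory_backed
echo 4096 > "${sysfs}"/blocksize # bytes
echo 1 > "${sysfs}"/power # bring the device online
udevadm settle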

Then running sudo mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned fails with an error:

# mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
btrfs-progs v6.7.1
See https://btrfs.readthedocs.io for more information.

Resetting device zones /dev/nullb0 (40 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

ERROR: error during mkfs: Invalid argument

Running mkfs.btrfs without block-group-tree and then converting with btrfstune also fails:

# mkfs.btrfs /dev/nullb0
btrfs-progs v6.7.1
See https://btrfs.readthedocs.io for more information.

Zoned: /dev/nullb0: host-managed device detected, setting zoned feature
Resetting device zones /dev/nullb0 (40 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

Label:              (null)
UUID:               c451698c-6ca0-4d96-ab0e-8d0d9000bc79
Node size:          16384
Sector size:        4096        (CPU page size: 4096)
Filesystem size:    10.00GiB
Block group profiles:
  Data:             single          256.00MiB
  Metadata:         DUP             256.00MiB
  System:           DUP             256.00MiB
SSD detected:       yes
Zoned device:       yes
  Zone size:        256.00MiB
Features:           extref, skinny-metadata, no-holes, free-space-tree, zoned
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  ZONES  PATH
    1    10.00GiB     40  /dev/nullb0

# btrfstune /dev/nullb0 --convert-to-block-group-tree
Error reading 1342193664, -1
Error reading 1342193664, -1
ERROR: cannot read chunk root
ERROR: open ctree failed

In either case, dmesg shows nothing except null_blk module loading and device creation.

kdave added the bug and mkfs (Changes in mkfs.btrfs) labels Mar 25, 2024
@adam900710
Collaborator

Looks like a bug in btrfs-progs' support for zoned devices.

I'll take a look and fix it soon.

adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Mar 25, 2024
[BUG]
There is a bug report that mkfs.btrfs cannot enable the
block-group-tree feature on zoned devices:

  # mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
  btrfs-progs v6.7.1
  See https://btrfs.readthedocs.io for more information.

  Resetting device zones /dev/nullb0 (40 zones) ...
  NOTE: several default settings have changed in version 5.15, please make sure
        this does not affect your deployments:
        - DUP for metadata (-m dup)
        - enabled no-holes (-O no-holes)
        - enabled free-space-tree (-R free-space-tree)

  ERROR: error during mkfs: Invalid argument

[CAUSE]
During mkfs, we need to write all 7 or 8 tree blocks into the
metadata zone, and since it's a zoned device, we need to fulfill all the
requirements for zoned writes, including:

- All writes must be in sequential bytenr
- Buffer must be aligned to sector size

The sequential bytenr requirement is already met by the mkfs design, but
the second requirement on memory alignment is normally handled by the
btrfs_pwrite() helper.

However, in create_block_group_tree() we didn't use btrfs_pwrite() but
called plain pwrite() directly, which leads to an -EINVAL error due to
the memory alignment problem.

[FIX]
Just call btrfs_pwrite() instead of the plain pwrite() in
create_block_group_tree().

Issue: kdave#765
Signed-off-by: Qu Wenruo <wqu@suse.com>
@adam900710
Collaborator

The mkfs.btrfs failure to create the block group tree comes from a plain pwrite() call, which is not zoned-compatible because of memory alignment. (In fact, btrfs metadata buffers are never aligned to the sector size of the zoned device.)

The btrfstune failure is related to the open() flags: we need O_DIRECT to signal that we're doing zoned operations, so that the chunk tree is read using the zoned-compatible helpers.

Both are small fixes; I will add test cases for both.
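
As a rough illustration of the alignment handling mentioned above, here is a minimal C sketch of a write helper that falls back to a sector-aligned bounce buffer. It is not the btrfs-progs implementation of btrfs_pwrite(): the names is_aligned() and aligned_pwrite() are hypothetical, and the sketch assumes the length and file offset are already multiples of the sector size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if ptr is a multiple of align (a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)ptr & (align - 1)) == 0;
}

/*
 * Hypothetical helper: write len bytes from buf at offset. If buf is not
 * sector aligned, copy it into an aligned bounce buffer first.
 * Assumes len and offset are already multiples of sector_size.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t aligned_pwrite(int fd, const void *buf, size_t len,
			      off_t offset, size_t sector_size)
{
	void *bounce = NULL;
	ssize_t ret;
	int err;

	/* Already aligned: no copy needed. */
	if (is_aligned(buf, sector_size))
		return pwrite(fd, buf, len, offset);

	/* posix_memalign() returns an error number instead of setting errno. */
	err = posix_memalign(&bounce, sector_size, len);
	if (err) {
		errno = err;
		return -1;
	}
	memcpy(bounce, buf, len);
	ret = pwrite(fd, bounce, len, offset);

	/* Preserve the errno of pwrite() across free(). */
	err = errno;
	free(bounce);
	errno = err;
	return ret;
}
```

With a wrapper of this shape, a caller holding an unaligned metadata buffer can issue the write without caring whether the file descriptor was opened with O_DIRECT.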

adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Mar 26, 2024
adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Mar 26, 2024
[BUG]
There is a report that btrfstune is unable to convert a zoned device
to block group tree:

 # btrfstune /dev/nullb0 --convert-to-block-group-tree
 Error reading 1342193664, -1
 Error reading 1342193664, -1
 ERROR: cannot read chunk root
 ERROR: open ctree failed

[CAUSE]
For zoned devices opened read-write, all reads and writes have to be
aligned to the sector size.

However, btrfs stores its metadata in extent_buffer::data[], which is
preceded by all the other structure members and is thus never aligned
to the zoned device sector size.

Normally we rely on btrfs_pread() and btrfs_pwrite() to do the
extra alignment, but during open_ctree() we do not yet know whether a
device is zoned.

Thus we rely on whether the fd is opened with the O_DIRECT flag: if the
fd has O_DIRECT, we temporarily set fs_info->zoned for the chunk tree
read.

Unfortunately, not all open_ctree_fd() callers set the flags
properly, and btrfstune is one of the missing call sites.

This leaves the reads improperly aligned and causes the read failure.

[FIX]
Just manually check if the target device is a zoned one, and set
O_DIRECT accordingly.

Issue: kdave#765
Signed-off-by: Qu Wenruo <wqu@suse.com>
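
The idea behind this fix (detect a zoned device, then add O_DIRECT to the open flags) can be sketched in C as below. This is only an illustration under stated assumptions, not the code from the patch: the helper names are hypothetical, and the sketch detects the zoned model by reading the sysfs attribute /sys/block/<dev>/queue/zoned, whose documented values are "none", "host-aware", and "host-managed".

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the contents of the sysfs attribute
 * /sys/block/<dev>/queue/zoned. Anything other than the two zoned
 * models (for example "none") is treated as not zoned.
 */
static int zoned_model_is_zoned(const char *model)
{
	return strncmp(model, "host-managed", 12) == 0 ||
	       strncmp(model, "host-aware", 10) == 0;
}

/*
 * Hypothetical helper: returns 1 if the block device (for example
 * "nullb0") reports a zoned model, 0 if it does not or if the
 * attribute cannot be read.
 */
static int device_is_zoned(const char *devname)
{
	char path[256];
	char model[32] = "";
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", devname);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (!fgets(model, sizeof(model), f))
		model[0] = '\0';
	fclose(f);
	return zoned_model_is_zoned(model);
}

/* Hypothetical helper: pick open(2) flags, adding O_DIRECT for zoned devices. */
static int open_flags_for(const char *devname)
{
	int flags = O_RDWR;

	if (device_is_zoned(devname))
		flags |= O_DIRECT;
	return flags;
}
```

A caller would pass the result of open_flags_for() to open(2) before handing the descriptor to the code that reads the chunk tree.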
adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Mar 26, 2024
[BUG]
There is a bug report that mkfs.btrfs cannot enable the
block-group-tree feature on zoned devices:

  # mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
  btrfs-progs v6.7.1
  See https://btrfs.readthedocs.io for more information.

  Resetting device zones /dev/nullb0 (40 zones) ...
  NOTE: several default settings have changed in version 5.15, please make sure
        this does not affect your deployments:
        - DUP for metadata (-m dup)
        - enabled no-holes (-O no-holes)
        - enabled free-space-tree (-R free-space-tree)

  ERROR: error during mkfs: Invalid argument

[CAUSE]
During mkfs, we need to write all 7 or 8 tree blocks into the
metadata zone, and since it's a zoned device, we need to fulfill all the
requirements for zoned writes, including:

- All writes must be in sequential bytenr
- Buffer must be aligned to sector size

The sequential bytenr requirement is already met by the mkfs design, but
the second requirement on memory alignment is never met for metadata, as
we put the contents of a leaf in extent_buffer::data[], which comes
after a lot of small members.

Thus the metadata IO buffer is never aligned to the sector size
(normally 4K), and we require btrfs_pwrite() and btrfs_pread() to handle
the memory alignment for us.

However, in create_block_group_tree() we didn't use btrfs_pwrite() but
called plain pwrite() directly, which leads to an -EINVAL error due to
the memory alignment problem.

[FIX]
Just call btrfs_pwrite() instead of the plain pwrite() in
create_block_group_tree().

Issue: kdave#765
Signed-off-by: Qu Wenruo <wqu@suse.com>
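
To make the alignment argument concrete, the following C sketch uses a toy structure with a few bookkeeping members followed by a flexible array. The structure and its field names are hypothetical stand-ins, not the real struct extent_buffer layout; the point is only that a payload placed after other members starts at a small nonzero offset and so cannot be sector aligned, even if the structure itself were.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 4096u

/*
 * Hypothetical stand-in for a metadata buffer: bookkeeping fields
 * first, payload last. Not the real struct extent_buffer.
 */
struct toy_extent_buffer {
	uint64_t start;
	uint32_t len;
	int refs;
	uint32_t flags;
	char data[];
};

/* Byte offset of the payload from the start of the structure. */
static size_t toy_data_offset(void)
{
	return offsetof(struct toy_extent_buffer, data);
}

/*
 * Misalignment of data[] relative to a sector boundary, assuming the
 * structure itself starts on one (the best case for alignment).
 */
static size_t toy_data_misalignment(void)
{
	return toy_data_offset() % SECTOR_SIZE;
}
```

Because the misalignment is nonzero even in the best case, passing data[] directly to a write on an O_DIRECT descriptor is expected to fail, which is why an aligning helper is needed.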
adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Mar 26, 2024
kdave pushed a commit that referenced this issue Mar 28, 2024
kdave pushed a commit that referenced this issue Mar 28, 2024
kdave pushed a commit that referenced this issue Mar 28, 2024
kdave pushed a commit that referenced this issue Mar 28, 2024
kdave pushed a commit that referenced this issue Apr 2, 2024
kdave pushed a commit that referenced this issue Apr 2, 2024
adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Apr 2, 2024
adam900710 added a commit to adam900710/btrfs-progs that referenced this issue Apr 2, 2024
kdave pushed a commit that referenced this issue Apr 18, 2024
[BUG]
There is a bug report that mkfs.btrfs cannot enable the
block-group-tree feature on zoned devices:

  # mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
  btrfs-progs v6.7.1
  See https://btrfs.readthedocs.io for more information.

  Resetting device zones /dev/nullb0 (40 zones) ...
  NOTE: several default settings have changed in version 5.15, please make sure
        this does not affect your deployments:
        - DUP for metadata (-m dup)
        - enabled no-holes (-O no-holes)
        - enabled free-space-tree (-R free-space-tree)

  ERROR: error during mkfs: Invalid argument

[CAUSE]
During mkfs, we need to write all 7 or 8 tree blocks into the
metadata zone, and since it's a zoned device, we need to fulfill all the
requirements for zoned writes, including:

- All writes must be in sequential bytenr
- Buffer must be aligned to sector size

The sequential bytenr requirement is already met by the mkfs design, but
the second requirement on memory alignment is never met for metadata, as
we put the contents of a leaf in extent_buffer::data[], which comes
after a lot of small members.

Thus the metadata IO buffer is never aligned to the sector size
(normally 4K), and we require btrfs_pwrite() and btrfs_pread() to handle
the memory alignment for us.

However, in create_block_group_tree() we didn't use btrfs_pwrite() but
called plain pwrite() directly, which leads to an -EINVAL error due to
the memory alignment problem.

[FIX]
Just call btrfs_pwrite() instead of the plain pwrite() in
create_block_group_tree().

Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
kdave pushed a commit that referenced this issue Apr 18, 2024
[BUG]
There is a report that btrfstune is unable to convert a zoned device
to block group tree:

 # btrfstune /dev/nullb0 --convert-to-block-group-tree
 Error reading 1342193664, -1
 Error reading 1342193664, -1
 ERROR: cannot read chunk root
 ERROR: open ctree failed

[CAUSE]
For zoned devices opened read-write, all reads and writes have to be
aligned to the sector size.

However, btrfs stores its metadata in extent_buffer::data[], which is
preceded by all the other structure members and is thus never aligned
to the zoned device sector size.

Normally we rely on btrfs_pread() and btrfs_pwrite() to do the
extra alignment, but during open_ctree() we do not yet know whether a
device is zoned.

Thus we rely on whether the fd is opened with the O_DIRECT flag: if the
fd has O_DIRECT, we temporarily set fs_info->zoned for the chunk tree
read.

Unfortunately, not all open_ctree_fd() callers set the flags
properly, and btrfstune is one of the missing call sites.

This leaves the reads improperly aligned and causes the read failure.

[FIX]
Just manually check if the target device is a zoned one, and set
O_DIRECT accordingly.

Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
@adam900710
Collaborator

Closing since it's fixed in v6.8.1 already.

kdave added this to the v6.8.1 milestone Jun 4, 2024