btrfs allocation issue #1473

Open
tormath1 opened this issue Jun 19, 2024 · 10 comments
Labels
kind/bug Something isn't working


@tormath1
Contributor

tormath1 commented Jun 19, 2024

Description

I recently noticed this and I'm not really sure how long it has been around, but the btrfs allocation seems to vary from one build to the next (at least on the current Alpha and Beta):

Example on Beta-3941.1.0 (good behavior):

$ sudo btrfs fi usage /usr
Overall:
    Device size:		1015.99MiB
    Device allocated:		 572.00MiB
    Device unallocated:		 443.99MiB
    Device missing:		     0.00B
    Device slack:		     0.00B
    Used:			 462.76MiB
    Free (estimated):		 546.67MiB	(min: 546.67MiB)
    Free (statfs, df):		 442.94MiB
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		   2.57MiB	(used: 0.00B)
    Multiple profiles:		        no

Data+Metadata,single: Size:568.00MiB, Used:462.75MiB (81.47%)
   /dev/dm-0	 568.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/dm-0	   4.00MiB

Unallocated:
   /dev/dm-0	 443.99MiB

While on a main build:

$  sudo btrfs fi usage /usr
Overall:
    Device size:		1015.99MiB
    Device allocated:		 684.00MiB
    Device unallocated:		 331.99MiB
    Device missing:		     0.00B
    Device slack:		     0.00B
    Used:			 462.88MiB
    Free (estimated):		 546.61MiB	(min: 546.61MiB)
    Free (statfs, df):		 330.94MiB
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		   2.51MiB	(used: 0.00B)
    Multiple profiles:		        no

Data+Metadata,single: Size:680.00MiB, Used:462.88MiB (68.07%)
   /dev/dm-0	 680.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/dm-0	   4.00MiB

Unallocated:
   /dev/dm-0	 331.99MiB

The allocated space differs (572.00MiB vs. 684.00MiB), although the used space is almost identical (462.76MiB vs. 462.88MiB).
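To compare builds without eyeballing two dumps, the `Overall:` section can be parsed and the allocated-but-unused space computed. This is a minimal Python sketch that assumes the human-readable layout shown above. `parse_overall` and `allocation_report` are hypothetical helpers and are not part of btrfs-progs or the Flatcar scripts:

```python
import re

# Binary units printed by `btrfs filesystem usage`, converted to MiB.
_TO_MIB = {"B": 1 / 2**20, "KiB": 1 / 2**10, "MiB": 1.0, "GiB": 2.0**10, "TiB": 2.0**20}

# Matches "Overall:" lines such as "    Device allocated:\t\t 572.00MiB".
# Chunk lines ("Data+Metadata,single: Size:...") and per-device lines
# ("   /dev/dm-0\t 568.00MiB") deliberately do not match.
_FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z ,()]*?):\s+([\d.]+)(B|KiB|MiB|GiB|TiB)\b")


def parse_overall(text: str) -> dict:
    """Map each size field of `btrfs fi usage` output to its value in MiB."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            name, number, unit = match.groups()
            # Keep the first occurrence of each field name.
            fields.setdefault(name, float(number) * _TO_MIB[unit])
    return fields


def allocation_report(text: str) -> dict:
    """Summarize unused allocated space and the df vs. btrfs estimate gap."""
    f = parse_overall(text)
    return {
        # Space reserved in chunks but not holding data or metadata.
        "allocated_unused": f["Device allocated"] - f["Used"],
        # How far df's free space lags behind btrfs's own estimate.
        "estimate_vs_df": f["Free (estimated)"] - f["Free (statfs, df)"],
    }
```

For the two outputs above, it reports about 109 MiB of allocated-but-unused space for the good build and 221 MiB for the main build. The gap between `Free (estimated)` and `Free (statfs, df)` is about 104 MiB and 216 MiB respectively.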

Impact

The impact is that the filesystem appears more used than it really is:

14:15:07   File    Size  Used Avail Use% Type
14:15:07  -/usr   1016M  465M  443M  52% btrfs
14:15:07  +/usr   1016M  465M  331M  59% btrfs
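`Use%` jumps from 52% to 59% even though `Used` does not change. This happens because GNU df computes it as used / (used + avail), rounded up, and not against `Size`. A smaller `Avail` therefore raises the percentage by itself. Below is a minimal sketch of that arithmetic using the rounded MiB values from the table. df itself works on exact block counts, so the input values here are an approximation:

```python
import math


def df_use_percent(used: float, avail: float) -> int:
    """Use% as GNU df reports it: used / (used + avail), rounded up.

    The filesystem size is not involved, so a drop in `avail`
    raises Use% even when `used` stays the same.
    """
    total = used + avail
    if total <= 0:
        raise ValueError("used + avail must be positive")
    return math.ceil(used * 100 / total)


# Values (in MiB) from the df comparison above.
print(df_use_percent(465, 443))  # 52 (good build)
print(df_use_percent(465, 331))  # 59 (bad build)
```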

Here is an example of the random behavior with the latest Alpha (4012.0.0) release:

 --- a/tmp/4011.0.0+nightly-20240624-2100-o4CAju
 +++ b/tmp/4012.0.0-0AUpWi
 @@ -1,5 +1,5 @@
  File    Size  Used Avail Use% Type
  /boot   127M   61M   66M  48% vfat
 -/usr   1016M  468M  331M  59% btrfs
 +/usr   1016M  468M  443M  52% btrfs

The same thing can be observed after rerunning a Beta build.

@ader1990

Hello @tormath1, I will also try to reproduce the issue in my environment to take a closer look.

@ader1990

ader1990 commented Jun 28, 2024

Hello,

I have reproduced the behaviour in my environment by building a Flatcar image with the Flatcar SDK. It happens roughly 50% of the time after running build_image and image_to_vm.sh.
However, I cannot reproduce the issue manually. I have tried with this simple script:

#!/bin/bash

set -xe

umount /mnt || true
losetup -d /dev/loop6 || true

# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup /dev/loop6 test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A /dev/loop6

# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ /dev/loop6 /mnt
btrfs fi usage /mnt

# set the zstd compression
btrfs property set /mnt compression zstd

# write a ~682MB file (682490 x 1KB blocks)
dd if=/dev/zero of=/mnt/test_file bs=1KB count=682490 && sync
# replace the ~682MB file with a ~459MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=459490 && sync

# df / usage shows correctly
btrfs fi usage /mnt

# try to rebalance and remove the unused btrfs space
btrfs balance start -v -dusage=5 -musage=5 /mnt

# df / usage shows correctly again: no disparity between Free (estimated) and Free (statfs, df)
btrfs fi usage /mnt

I think this issue is practically a non-issue. From what I understand, for btrfs the syscall that df relies on (statfs) does not always report the actual free space correctly under some conditions.

I will try to reproduce the disparity, but wanted to share this starting point if anyone else is also investigating.
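`df` gets its numbers from the statfs/statvfs system call, which the btrfs kernel driver fills in. `btrfs fi usage` computes `Free (estimated)` in userspace from its own chunk accounting. This minimal Python sketch covers the df side using `os.statvfs`. It can help check what the kernel reports for a given mount. The helper name `df_like` is hypothetical:

```python
import os


def df_like(path: str) -> tuple:
    """Return (size, used, avail) in bytes for the filesystem holding `path`.

    These are the statvfs(3) fields df uses: Size = f_blocks,
    Used = f_blocks - f_bfree, Avail = f_bavail (all in f_frsize units).
    """
    st = os.statvfs(path)
    size = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    return size, used, avail
```

On an affected image, `df_like("/usr")` would be expected to show `avail` tracking `Free (statfs, df)` and not `Free (estimated)`.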

@ader1990

I have tried creating the image a few times with this small fix, and the sizes converge:

diff --git a/build_library/disk_util b/build_library/disk_util
index f94317e3c1..32893c87c4 100755
--- a/build_library/disk_util
+++ b/build_library/disk_util
@@ -660,6 +660,7 @@ def ReadWriteSubvol(options, partition, disable_rw):
   with PartitionLoop(options, partition) as loop_dev:
     btrfs_mount = tempfile.mkdtemp()
     Sudo(['mount', '-t', 'btrfs', loop_dev, btrfs_mount])
+    Sudo(['btrfs', 'balance', 'start', '-dusage=0', '-musage=0', btrfs_mount])
     try:
       Sudo(['btrfs', 'property', 'set', '-ts', btrfs_mount, 'ro', 'true' if disable_rw else 'false'])
     finally:

@tormath1 I have not yet found the actual cause of this issue or reproduced it in isolation. This patch should not do any harm, though, because the balance runs right before the partition is made read-only and before the verity signing.
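The patch's `-dusage=0 -musage=0` filters only reclaim block groups that are completely empty, so partly used chunks stay allocated. The hypothetical helper below builds such a command and also guards the mixed-mode constraint. On a filesystem created with `--mixed`, data and metadata share block groups, and the kernel refuses a balance whose data and metadata filters differ:

```python
def balance_argv(mountpoint: str, data_usage: int, meta_usage=None, mixed: bool = True) -> list:
    """Build a `btrfs balance start` command with usage filters.

    With mixed block groups (as created by `mkfs.btrfs --mixed`), the
    -d and -m filters must be identical, so the metadata filter
    defaults to the data filter.
    """
    if meta_usage is None:
        meta_usage = data_usage
    for value in (data_usage, meta_usage):
        if not 0 <= value <= 100:
            raise ValueError(f"usage filter must be a percentage, got {value}")
    if mixed and data_usage != meta_usage:
        raise ValueError("mixed block groups need identical data and metadata filters")
    return ["btrfs", "balance", "start",
            f"-dusage={data_usage}", f"-musage={meta_usage}", mountpoint]
```

A higher value such as `data_usage=50` would also compact half-empty chunks, but it moves more data. Whether that avoids the inflated allocation here is untested.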

ader1990 added a commit to flatcar/scripts that referenced this issue Jul 1, 2024
Rebalance the /usr btrfs allocation as far as possible, to increase
the chance that btrfs reports a `Free (statfs, df)` value close to
`Free (estimated)`.

Note that /usr is also a zstd-compressed btrfs partition. The free
size reported by `df` and the space actually left after, for example,
a file write will therefore differ considerably, because the
compression ratio of the written data is only known after the file is
synced.

Unfortunately, btrfs offers no determinism here. Even if you could, in
theory, pre-compress the file with zstd to estimate the space it will
use, you still cannot predict the metadata size for that file write.

See: flatcar/Flatcar#1473

Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
@ader1990

ader1990 commented Jul 1, 2024

Adding the notes from commit flatcar/scripts@95d8361 here for visibility:

Note that /usr is also a zstd-compressed btrfs partition. The free
size reported by `df` and the space actually left after, for example,
a file write will therefore differ considerably, because the
compression ratio of the written data is only known after the file is
synced.

Unfortunately, btrfs offers no determinism here. Even if you could, in
theory, pre-compress the file with zstd to estimate the space it will
use, you still cannot predict the metadata size for that file write.

@ader1990

ader1990 commented Jul 1, 2024

While checking the journalctl output on the latest main, I noticed this warning: 'nologreplay' is deprecated, use 'rescue=nologreplay' instead. However, as far as I know, no such mount option is used in the flatcar/scripts repo, and the deprecated values were recently removed by flatcar/scripts@18265de.

@jepio do you have an idea where the warning might come from? I checked the flatcar init and bootengine repos, but those also look fine.

/usr mount log:

Jul 01 16:38:11 localhost systemd[1]: Found device dev-mapper-usr.device - /dev/mapper/usr.
Jul 01 16:38:11 localhost systemd[1]: Mounting sysusr-usr.mount - /sysusr/usr...o
Jul 01 16:38:11 localhost systemd[1]: Finished verity-setup.service - Verity Setup for /dev/mapper/usr.
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): first mount of filesystem 60877fc8-37bb-4e8a-ae4f-aaea0a123cfa
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): using crc32c (crc32cc-intel) checksum algorithm
Jul 01 16:38:11 localhost kernel: BTRFS warning (device dm-0): 'nologreplay' is deprecated, use 'rescue=nologreplay' instead
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): disabling log replay at mount time
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): using free space treee
Jul 01 16:38:11 localhost systemd[1]: Mounted sysusr-usr.mount - /sysusr/usr.
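One way to hunt for the source of `nologreplay` is to scan candidate mount option strings for the bare, deprecated spellings. Candidates could come from mount units, fstab entries, or the kernel command line. The sketch below is hypothetical. Its mapping covers the two btrfs options that newer kernels ask to be spelled under `rescue=` (`nologreplay` and `usebackuproot`), and the example strings in the test are made up:

```python
# Bare btrfs mount options that newer kernels warn about, mapped to the
# rescue= spelling the kernel warning suggests.
DEPRECATED_BTRFS_OPTIONS = {
    "nologreplay": "rescue=nologreplay",
    "usebackuproot": "rescue=usebackuproot",
}


def find_deprecated(options: str) -> list:
    """Return (option, replacement) pairs for deprecated entries in `options`.

    `options` is a comma-separated mount option string such as
    "ro,nologreplay,seclabel".
    """
    found = []
    for option in options.split(","):
        option = option.strip()
        if option in DEPRECATED_BTRFS_OPTIONS:
            found.append((option, DEPRECATED_BTRFS_OPTIONS[option]))
    return found
```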

@jepio
Member

jepio commented Jul 1, 2024

ader1990 added a commit to flatcar/scripts that referenced this issue Jul 5, 2024
@ader1990

ader1990 commented Aug 9, 2024

I obtained some really weird results during my experiments:

root@localhost ~ # btrfs fi usage /usr
Overall:
    Device size:                1015.99MiB
    Device allocated:           1014.94MiB
    Device unallocated:            1.05MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        465.39MiB
    Free (estimated):            542.93MiB      (min: 542.93MiB)
    Free (statfs, df):               0.00B
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                2.63MiB      (used: 0.00B)
    Multiple profiles:                  no

Data+Metadata,single: Size:1010.94MiB, Used:465.38MiB (46.03%)
   /dev/mapper/usr      1010.94MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/mapper/usr         4.00MiB

Unallocated:
   /dev/mapper/usr         1.05MiB
root@localhost ~ # df -h /usr
Filesystem       Size  Used Avail Use% Mounted on
/dev/mapper/usr 1016M  469M     0 100% /usr


root@localhost ~ # uname -a
Linux localhost 6.6.43-flatcar #1 SMP PREEMPT_DYNAMIC Wed Aug  7 13:29:34 -00 2024 x86_64 Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz GenuineIntel GNU/Linux
root@localhost ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=4054.0.0+nightly-20240806-2100
VERSION_ID=4054.0.0
BUILD_ID=nightly-20240806-2100
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 4054.0.0+nightly-20240806-2100 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:4054.0.0+nightly-20240806-2100:*:*:*:*:*:*:*"

How I obtained those results: I added a `btrfs fi defrag` step to the workflow. I am still puzzled about what is happening, and about whether it is an issue in the Linux kernel or in btrfs-progs.

The actual command used in disk_util: Sudo(['btrfs', 'fi', 'defrag', '-r', '-v', options.disk_image]).

@ader1990

ader1990 commented Aug 9, 2024

I made some progress, and there might be a way to solve the problem. I will open a PR with it.
It seems that the only way to release the allocated space is to shrink the filesystem and then grow it back:

btrfs filesystem resize -500m /tmp/btrfs-mount
btrfs filesystem resize +500m /tmp/btrfs-mount
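The two commands above can be wrapped in a sketch like the following, together with the observation further down that even a failed shrink resets the allocation. The function names are hypothetical. Doing anything real requires root and a mounted btrfs, so the command runner is injectable, which lets the control flow be checked without touching a filesystem. The grow step reuses the shrink amount and not `max`, because `max` could extend the filesystem to the end of the partition, past the size chosen with `--byte-count`:

```python
import subprocess


def _run(argv) -> int:
    """Run a command and return its exit status without raising on failure."""
    return subprocess.run(argv, check=False).returncode


def reset_allocation(mountpoint: str, shrink: str = "500m", run=_run) -> tuple:
    """Try the shrink-then-grow workaround discussed in this thread.

    On a nearly full /usr, the shrink is expected to fail with ENOSPC.
    Per the observations here, the failed attempt alone resets the
    allocation. If the shrink does succeed, grow back by the same amount.
    Returns (shrink_status, grow_status or None).
    """
    shrink_status = run(["btrfs", "filesystem", "resize", f"-{shrink}", mountpoint])
    grow_status = None
    if shrink_status == 0:
        # Restore exactly what was removed to keep the original size.
        grow_status = run(["btrfs", "filesystem", "resize", f"+{shrink}", mountpoint])
    return shrink_status, grow_status
```

As the later commit message says, this is a hack that relies on a side effect of a failing resize.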

Results on Flatcar:

root@localhost ~ # df -h /usr
Filesystem       Size  Used Avail Use% Mounted on
/dev/mapper/usr 1016M  468M  443M  52% /usr
root@localhost ~ # btrfs fi usage /usr
Overall:
    Device size:                1015.99MiB
    Device allocated:            572.00MiB
    Device unallocated:          443.99MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        465.46MiB
    Free (estimated):            544.02MiB      (min: 544.02MiB)
    Free (statfs, df):           442.94MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                2.52MiB      (used: 0.00B)
    Multiple profiles:                  no

Data+Metadata,single: Size:568.00MiB, Used:465.46MiB (81.95%)
   /dev/mapper/usr       568.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/mapper/usr         4.00MiB

Unallocated:
   /dev/mapper/usr       443.99MiB

ader1990 added a commit to flatcar/scripts that referenced this issue Aug 14, 2024
@ader1990

I came up with a script that gives the closest reproduction so far:

#!/bin/bash

set -xe

LOOP=/dev/loop15
mkdir /tmp/btrfs-mount || true
umount /tmp/btrfs-mount  || true
losetup -d $LOOP || true

# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup $LOOP test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A $LOOP

# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ $LOOP /tmp/btrfs-mount

btrfs fi usage /tmp/btrfs-mount

# set the zstd compression
btrfs property set /tmp/btrfs-mount compression zstd

# write a ~682MB file (682490 x 1KB blocks of incompressible data)
dd if=/dev/random of=/tmp/btrfs-mount/test_file bs=1KB count=682490 && sync
# replace the ~682MB file with a ~459MB file
dd if=/dev/random of=/tmp/btrfs-mount/test_file bs=1KB count=459490 && sync

# the allocated value is now really high
btrfs fi usage /tmp/btrfs-mount

# try to shrink the filesystem by more than it can actually shrink;
# this is expected to fail with ENOSPC, so ignore the error under `set -e`
btrfs filesystem resize -500m /tmp/btrfs-mount || true

# the allocated value got reset to a low value
btrfs fi usage /tmp/btrfs-mount

Output:

# Initial clean fs


Overall:
    Device size:                1015.99MiB
    Device allocated:             12.00MiB
    Device unallocated:         1003.99MiB
    Device missing:                  0.00B
    Used:                         36.00KiB
    Free (estimated):           1010.59MiB      (min: 1010.59MiB)
    Free (statfs, df):          1010.91MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                1.38MiB      (used: 0.00B)
    Multiple profiles:                  no


# Before resize

Overall:
    Device size:                1015.99MiB
    Device allocated:           1014.94MiB
    Device unallocated:            1.05MiB
    Device missing:                  0.00B
    Used:                        438.72MiB
    Free (estimated):            570.84MiB      (min: 570.84MiB)
    Free (statfs, df):           572.22MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                1.38MiB      (used: 0.00B)
    Multiple profiles:                  no


# After failed resize
ERROR: unable to resize '/tmp/btrfs-mount': No space left on device
Overall:
    Device size:                1015.99MiB
    Device allocated:            572.00MiB
    Device unallocated:          443.99MiB
    Device missing:                  0.00B
    Used:                        438.73MiB
    Free (estimated):            571.89MiB      (min: 571.89MiB)
    Free (statfs, df):           572.21MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                1.38MiB      (used: 0.00B)
    Multiple profiles:                  no

@ader1990

Opened an issue upstream: https://bugzilla.kernel.org/show_bug.cgi?id=219167

ader1990 added a commit to flatcar/scripts that referenced this issue Aug 20, 2024
ader1990 added a commit to flatcar/scripts that referenced this issue Sep 5, 2024
We use a read-only, compressed btrfs filesystem for the /usr partition,
and sometimes the free space shown by `btrfs fi usage` and by `df`
diverges, with `df` reporting roughly the `Device unallocated` value
from `btrfs fi usage` as the free space.

I have found a way to fix the allocation: shrink the filesystem by more
than btrfs allows. Curiously, the allocation gets reset during the
failed attempt, and the issue is fixed.

This is, of course, a _HACK_, and it could produce unwanted behaviour
at any time.

Note that the issue cannot be reliably reproduced.

See Flatcar issue: flatcar/Flatcar#1473

See issue upstream:
https://bugzilla.kernel.org/show_bug.cgi?id=219167

Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
ader1990 added a commit to flatcar/scripts that referenced this issue Sep 5, 2024
We use a read-only, compressed btrfs filesystem for the /usr partition,
and sometimes the free space shown by `btrfs fi usage` and by `df`
diverges, with `df` reporting roughly the `Device unallocated` value
from `btrfs fi usage` as the free space.

I have found a way to fix the allocation: shrink the filesystem by more
than btrfs allows. Curiously, the allocation gets reset during the
failed attempt, and the issue is fixed.

This is, of course, a _HACK_, and it could produce unwanted behaviour
at any time.

Note that the issue cannot be reliably reproduced.

See Flatcar issue: flatcar/Flatcar#1473

See issue upstream:
https://bugzilla.kernel.org/show_bug.cgi?id=219167

Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Projects
Status: 📝 Needs Triage
Development

No branches or pull requests

3 participants