Sometimes a delay of up to zfs_txg_timeout seconds when doing ftruncate #14290

Open
Safari77 opened this issue Dec 15, 2022 · 10 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@Safari77

System information

Type Version/Name
Distribution Name Fedora
Distribution Version 36
Kernel Version 5.15.82
Architecture x86_64
OpenZFS Version 2.1.7

Describe the problem you're observing

Usually rpm commands execute quickly, but fairly often (in about 10% of cases) there is an extra delay of about 4 seconds:

16:22:55.694445 openat(AT_FDCWD, "/usr/lib/sysimage/rpm/rpmdb.sqlite-shm", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 6 <0.000086>
16:22:55.694563 newfstatat(6, "", {st_dev=makedev(0, 0x1e), st_ino=27231358, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=131072, st_blocks=18, st_size=32768, st_atime=1671114175 /* 2022-12-15T16:22:55.105109260+0200 */, st_atime_nsec=105109260, st_mtime=1671114175 /* 2022-12-15T16:22:55.106109257+0200 */, st_mtime_nsec=106109257, st_ctime=1671114175 /* 2022-12-15T16:22:55.106109257+0200 */, st_ctime_nsec=106109257}, AT_EMPTY_PATH) = 0 <0.000011>
16:22:55.694692 geteuid()       = 0 <0.000010>
16:22:55.694759 fchown(6, 0, 0) = 0 <0.000047>
16:22:55.694855 fcntl(6, F_GETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=128, l_len=1, l_pid=0}) = 0 <0.000010>
16:22:55.694909 fcntl(6, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=128, l_len=1}) = 0 <0.000009>
16:22:55.694953 ftruncate(6, 3) = 0 <4.372489>

I have ZFS as the root filesystem, running on top of LUKS2 on an NVMe Corsair MP510 (max random write, QD32 IOMeter: up to 440K IOPS), with practically zero other I/O when I run rpm.

Describe how to reproduce the problem

Just try rpm again.
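
For a standalone test outside rpm, here is a minimal sketch (not part of the original report) that mimics the syscall pattern in the strace above and times the ftruncate; the file path is hypothetical and should point at a file on the affected dataset:

/* Hypothetical reproducer: mirrors the strace above (open, byte-range lock,
 * ftruncate to 3 bytes) and reports how long the ftruncate() takes.
 * Build with: cc -o trunc-test trunc-test.c */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Hypothetical path; use any file on the affected dataset. */
    const char *path = argc > 1 ? argv[1] : "/tank/ftruncate-test.shm";
    char buf[32768];
    memset(buf, 0, sizeof(buf));

    int fd = open(path, O_RDWR | O_CREAT | O_NOFOLLOW | O_CLOEXEC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Pre-fill to 32 KiB, roughly the size of rpmdb.sqlite-shm above. */
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) { perror("pwrite"); return 1; }

    /* Take the same byte-range write lock seen in the strace. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 128, .l_len = 1 };
    if (fcntl(fd, F_SETLK, &fl) < 0) { perror("fcntl"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (ftruncate(fd, 3) < 0) { perror("ftruncate"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("ftruncate(fd, 3) took %.6f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

    close(fd);
    unlink(path);
    return 0;
}

Since the delay is intermittent, running this in a loop (or under strace -T, as above) should occasionally show a multi-second ftruncate.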

Include any warning/errors/backtraces from the system logs

zpool reports no errors, and the dataset properties are:

NAME  PROPERTY              VALUE                  SOURCE
tank  type                  filesystem             -
tank  creation              Wed Nov 10 15:40 2021  -
tank  used                  1.39T                  -
tank  available             296G                   -
tank  referenced            96K                    -
tank  compressratio         1.29x                  -
tank  mounted               no                     -
tank  quota                 none                   default
tank  reservation           none                   default
tank  recordsize            128K                   default
tank  mountpoint            /tank                  default
tank  sharenfs              off                    default
tank  checksum              on                     default
tank  compression           zstd-3                 local
tank  atime                 on                     default
tank  devices               on                     default
tank  exec                  on                     default
tank  setuid                on                     default
tank  readonly              off                    default
tank  zoned                 off                    default
tank  snapdir               hidden                 default
tank  aclmode               discard                default
tank  aclinherit            restricted             default
tank  createtxg             1                      -
tank  canmount              off                    local
tank  xattr                 sa                     local
tank  copies                1                      default
tank  version               5                      -
tank  utf8only              on                     -
tank  normalization         formD                  -
tank  casesensitivity       sensitive              -
tank  vscan                 off                    default
tank  nbmand                off                    default
tank  sharesmb              off                    default
tank  refquota              none                   default
tank  refreservation        none                   default
tank  guid                  8575689710526589949    -
tank  primarycache          all                    default
tank  secondarycache        all                    default
tank  usedbysnapshots       0B                     -
tank  usedbydataset         96K                    -
tank  usedbychildren        1.39T                  -
tank  usedbyrefreservation  0B                     -
tank  logbias               latency                default
tank  objsetid              54                     -
tank  dedup                 off                    default
tank  mlslabel              none                   default
tank  sync                  standard               default
tank  dnodesize             auto                   local
tank  refcompressratio      1.00x                  -
tank  written               0                      -
tank  logicalused           1.77T                  -
tank  logicalreferenced     42K                    -
tank  volmode               default                default
tank  filesystem_limit      none                   default
tank  snapshot_limit        none                   default
tank  filesystem_count      none                   default
tank  snapshot_count        none                   default
tank  snapdev               hidden                 default
tank  acltype               posix                  local
tank  context               none                   default
tank  fscontext             none                   default
tank  defcontext            none                   default
tank  rootcontext           none                   default
tank  relatime              on                     local
tank  redundant_metadata    all                    default
tank  overlay               on                     default
tank  encryption            off                    default
tank  keylocation           none                   default
tank  keyformat             none                   default
tank  pbkdf2iters           0                      default
tank  special_small_blocks  0                      default
@amarshall

May I suggest the title and description of this issue be altered to make it more generic? It appears to not be specific to RPM, and is at least a more general issue with sqlite, and likely even more generally just ftruncate. There are a few linked issues above where other uses of sqlite on ZFS see this problem.

@farcaller

I can trivially reproduce this in my setup (ZFS over LUKS2 as well). Setting sync=disabled helps, but that's barely a workaround. OpenZFS 2.2.3 on kernel 6.6.28, x86_64.

@Safari77
Author

May I suggest the title and description of this issue be altered to make it more generic? It appears to not be specific to RPM, and is at least a more general issue with sqlite, and likely even more generally just ftruncate. There are a few linked issues above where other uses of sqlite on ZFS see this problem.

Would you say it is just ftruncate or also fsync?

@kbknapp

kbknapp commented May 10, 2024

I think it's only ftruncate, but I'm not 100% certain. The reason I believe that is that, while investigating a similar issue in atuin (atuinsh/atuin#952), the problem did not happen with fsync alone. For context, the issue only triggers when sqlite is using WAL mode and is set to "truncate on commit", which is when it emits the ftruncate.

Quoting myself from that issue:

[..] looking at the sqlite source it appears the WAL can be set to "truncate on commit", which issues ftruncate followed by osFcntl(walfd, F_FULLSYNC). I'm assuming it means "write transaction commit" which is what atuin history start is doing.
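
In syscall terms, that truncate-on-commit path can be approximated outside sqlite with a rough sketch like the one below (not sqlite's actual code; fsync() stands in for the sync step, and the path and loop counts are made up):

/* Rough approximation of sqlite's WAL "truncate on commit": append a few
 * "frames" to a scratch file, then on each simulated commit truncate it back
 * to zero and sync, timing the truncate. Loop so the intermittent stall has a
 * chance to appear. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tank/wal-test"; /* hypothetical: any file on the affected dataset */
    char page[4096];
    memset(page, 0xab, sizeof(page));

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < 300; i++) {
        /* Simulate WAL frames written during a transaction. */
        for (int j = 0; j < 8; j++)
            pwrite(fd, page, sizeof(page), (off_t)j * sizeof(page));

        /* "Commit": truncate the WAL-like file back to empty, then sync. */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ftruncate(fd, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        fsync(fd);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (secs > 0.1)
            printf("iteration %d: ftruncate took %.3f s\n", i, secs);
        sleep(1); /* space commits out so they land at different points in the txg cycle */
    }

    close(fd);
    unlink(path);
    return 0;
}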

@farcaller

FWIW, I was trying (albeit unsuccessfully) to repro this with sqlite alone, which is funny, given how easy it is to reproduce with atuin:
[screenshot: reproduction of the delay with atuin]

@kbknapp

kbknapp commented May 10, 2024

Not sure if it makes a difference, but atuin also sets PRAGMA synchronous

@kbknapp

kbknapp commented May 10, 2024

Oh also, atuin is doing a full transaction for that insert - just doing a single bare insert doesn't trigger the issue from what I can tell.

@farcaller

I wasn't yet able to trigger it with PRAGMA synchronous=FULL; PRAGMA journal_mode=WAL; PRAGMA wal_checkpoint(TRUNCATE); begin transaction; insert into a (i) values (1); select count(*) from a; commit;. I've spent enough time pressing up-arrow and enter that doing this with a script would be more efficient by now.

It's interesting that it comes in waves for atuin. Sometimes there are minutes without the issue, and then it triggers several times in a row. The load average of this machine is negligible (0.44 0.73 0.75 on a 12-core box), and RAM is mostly free. There's a VM slowly spinning (but mostly doing nothing) in the background. iotop shows mostly journald and zvol activity.
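
To replace the manual up-arrow loop, a sketch using the sqlite3 C API (hypothetical paths and thresholds; it simply repeats the statements quoted above and reports slow commits):

/* Hypothetical automation of the manual loop above: same pragmas, repeated
 * insert/commit cycle, printing any commit that takes suspiciously long.
 * Build with: cc -o sqlite-repro sqlite-repro.c -lsqlite3 */
#define _GNU_SOURCE
#include <sqlite3.h>
#include <stdio.h>
#include <time.h>

static void run(sqlite3 *db, const char *sql)
{
    char *err = NULL;
    if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "%s: %s\n", sql, err);
        sqlite3_free(err);
    }
}

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("/tank/repro.db", &db) != SQLITE_OK) /* hypothetical path */
        return 1;

    run(db, "PRAGMA synchronous=FULL;");
    run(db, "PRAGMA journal_mode=WAL;");
    run(db, "CREATE TABLE IF NOT EXISTS a (i INTEGER);");

    for (int n = 0; n < 1000; n++) {
        run(db, "PRAGMA wal_checkpoint(TRUNCATE);");

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        run(db, "BEGIN; INSERT INTO a (i) VALUES (1); SELECT count(*) FROM a; COMMIT;");
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (secs > 0.5)
            printf("iteration %d: commit took %.3f s\n", n, secs);
    }

    sqlite3_close(db);
    return 0;
}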

Safari77 changed the title from "Sometimes excessive delay truncating rpmdb.sqlite-shm" to "Sometimes a delay of several seconds when doing ftruncate" on May 10, 2024
@kbknapp

kbknapp commented May 10, 2024

I'm going to say some things I don't really understand in the hopes that it triggers something in someone who knows what they're doing...

I hear the ZIL has a default "flush" interval of 5s, set by the module param zfs_txg_timeout. It seems suspicious to me that the atuin issue can induce delays of up to around 5s. This also seems to track with other RPM/flatpak issues I've seen on ZFS that use sqlite internally (although I haven't investigated whether they're using WAL or doing any truncating... so this may just be anecdotal).

I'm wondering if maybe the ZIL must flush txgs before the call returns, so sometimes it completes in the low milliseconds if that timer was already about to fire, while other times, if the ftruncate happens just after a flush, it's nearly another 5s before it returns?

Looking at /proc/spl/kstat/zfs/<zpool>/txgs gives info about the txgs, but I'm not familiar enough with ZFS to know what I'm looking at.
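
One way to line a slow call up against that kstat (a sketch, not a known-good tool; the pool name "tank" and the file path are assumed) is to dump the file immediately before and after a timed ftruncate:

/* Hypothetical helper: print /proc/spl/kstat/zfs/<pool>/txgs right before and
 * after a timed ftruncate(), so a slow call can be correlated with txg
 * open/quiesce/sync activity. May truncate very long txg histories. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void dump_txgs(const char *tag)
{
    char buf[8192];
    int fd = open("/proc/spl/kstat/zfs/tank/txgs", O_RDONLY); /* pool name assumed */
    if (fd < 0) { perror("open txgs"); return; }
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n > 0) {
        buf[n] = '\0';
        printf("--- txgs %s ---\n%s", tag, buf);
    }
}

int main(void)
{
    int fd = open("/tank/ftruncate-test", O_RDWR | O_CREAT, 0644); /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    dump_txgs("before");

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ftruncate(fd, 3);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    dump_txgs("after");
    printf("ftruncate took %.6f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

    close(fd);
    return 0;
}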

@Nemo157

Nemo157 commented May 10, 2024

I hear the ZIL has a default "flush" interval of 5s, set by the module param zfs_txg_timeout. It seems suspicious to me that the atuin issue can induce delays of up to around 5s.

I previously tested this: changing that timeout to 20s made atuin's delays start ranging from 0 to 20s, so I definitely think it is related.
