Support fallocate(2) #326

Closed
behlendorf opened this issue Jul 18, 2011 · 50 comments

Labels
Type: Feature Feature request or new feature

Comments

@behlendorf
Contributor

As observed by xfstests 075, fallocate(2) is not yet supported:

"fallocate is used to preallocate blocks to a file. For filesystems which support the fallocate system call, this is done quickly by allocating blocks and marking them as uninitialized, requiring no IO to the data blocks. This is much faster than creating a file by filling it with zeros."

QA output created by 075
brevity is wit...

-----------------------------------------------
fsx.0 : -d -N numops -S 0
-----------------------------------------------
fsx: main: filesystem does not support fallocate, disabling
: Operation not supported

-----------------------------------------------
fsx.1 : -d -N numops -S 0 -x
-----------------------------------------------

-----------------------------------------------
fsx.2 : -d -N numops -l filelen -S 0
-----------------------------------------------
fsx: main: filesystem does not support fallocate, disabling
: Operation not supported

-----------------------------------------------
fsx.3 : -d -N numops -l filelen -S 0 -x
-----------------------------------------------                                             
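
For context, a minimal standalone sketch (not from xfstests; the file name and size are illustrative) of the kind of fallocate(2) preallocation call being discussed, which currently fails with EOPNOTSUPP on ZFS:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("prealloc.dat", O_RDWR | O_CREAT, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* mode 0: reserve 16 MiB of blocks without writing any data. */
    if (fallocate(fd, 0, 0, 16 * 1024 * 1024) == -1)
        perror("fallocate");   /* EOPNOTSUPP on filesystems without support */
    close(fd);
    return 0;
}
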
@adilger
Contributor

adilger commented Jul 29, 2011

It is potentially difficult to meaningfully implement fallocate() for ZFS, or any true COW filesystem. The intent of fallocate is to pre-allocate/reserve space for later use, but with a COW filesystem the pre-allocated blocks cannot be overwritten without allocating new blocks, writing into the new blocks, and releasing the old blocks (if not pinned by snapshots). In all cases, having fallocated blocks (with some new flag that marks them as zeroed) cannot be any better than simply reserving some blocks out of those available for the pool, and somehow crediting a dnode with the ability to allocate from those reserved blocks.

@behlendorf
Contributor Author

Exactly. Implementing this correctly would be tricky and perhaps not that valuable since fallocate(2) is Linux-specific. I would expect most developers to use the more portable posix_fallocate(), which presumably falls back to an alternate approach when fallocate(2) isn't available. I'm not aware of any code that will be too inconvenienced by not having fallocate(2) available... other than xfstests, apparently.

@RedBeard0531

Well, you could in theory do something tricky like just creating a sparse file of the correct size. This would avoid the wasted space of storing the zeroed-out data that wouldn't be reusable anyway due to COW. It would unfortunately break the contract that you won't get ENOSPC, but you can't give that guarantee with COW anyway, and you would be less likely to hit it after using an enhanced posix_fallocate() since it wouldn't be wasting space on the zeroed pages. Out of curiosity, would there be any difference in the final on-disk layout of a sparse file that is filled in vs. a file that is first allocated by zero-filling?

I work on mongodb and we use posix_fallocate to quickly allocate large files that we can then mmap. It seems to be the quickest way to preallocate files and have a high probability of contiguous allocations (which again isn't possible due to COW). While I doubt anyone will try to run mongodb on zfs-linux anytime soon (my interest in the project is for a home server), I just wanted to give feedback from a user-space developer's point of view.
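
For illustration, a rough sketch of the preallocate-then-mmap pattern described above (the file name and size are hypothetical, not MongoDB's actual code):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const off_t len = 64 * 1024 * 1024;           /* 64 MiB data file */
    int fd = open("datafile.0", O_RDWR | O_CREAT, 0644);
    if (fd == -1) { perror("open"); return 1; }
    int err = posix_fallocate(fd, 0, len);        /* reserve space up front */
    if (err != 0)
        fprintf(stderr, "posix_fallocate: error %d\n", err);
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    /* ... memory-mapped I/O on map ... */
    munmap(map, len);
    close(fd);
    return 0;
}
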

@ryao
Contributor

ryao commented Apr 23, 2012

Commit cb2d190 should have closed this.

@behlendorf
Contributor Author

I was leaving this issue open because the referenced commit only added support for FALLOC_FL_PUNCH_HOLE. There are still other fallocate flags which are not yet handled.

@cwedgwood
Contributor

@dechamps this doesn't seem to be working for 3.6.x. Looking at your patch for this it looks like this is expected. Is there an update for recent kernels?

11570 open("holes", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0644) = 3
11570 write(3, "\252\252"..., 4194304) = 4194304
11570 fallocate(3, 03, 65536, 196608)   = -1 EOPNOTSUPP (Operation not supported)

3 = fd
03 = mode, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE
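
For reference, a minimal sketch (not from this thread; it assumes the "holes" file already exists) of issuing the same call as the strace above, i.e. mode = FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("holes", O_RDWR);
    if (fd == -1) { perror("open"); return 1; }
    /* Deallocate 192 KiB starting at 64 KiB, keeping the file size fixed. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  65536, 196608) == -1)
        perror("fallocate(PUNCH_HOLE|KEEP_SIZE)");
    close(fd);
    return 0;
}
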

@dechamps
Contributor

My patch only implements FALLOC_FL_PUNCH_HOLE alone, which is not a valid call to fallocate(). It never worked on any kernel, and will never work until someone implements FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE. Right now it's just a placeholder, basically.

@cwedgwood
Contributor

@dechamps thanks for that clarification

even with FALLOC_FL_PUNCH_HOLE (only):

12260 open("holes", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0644) = 4
12260 write(4, "\252\252"..., 4194304) = 4194304
12260 fallocate(4, 02, 65536, 196608)   = -1 EOPNOTSUPP (Operation not supported)

02 = mode, FALLOC_FL_PUNCH_HOLE

@RJVB

RJVB commented Jan 14, 2014

Apparently fallocate is still not supported on ZFS?
Could that be the reason that I cannot seem to use fallocate at all on my systems that have a ZFS root, not even on the /boot partition, which is on a good ole ext3 slice?

@behlendorf
Contributor Author

@RJVB fallocate() for ZFS filesystems has not yet been implemented; however, that won't have any impact on an ext3 filesystem.

@morsik

morsik commented Apr 4, 2014

You can always use:
dd if=/dev/zero of=bigfile bs=1 count=0 seek=100G
Works immediately.
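
The same effect can be had from C with ftruncate(2); a small sketch (the file name and size are illustrative), which likewise creates a sparse file without reserving any blocks:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("bigfile", O_WRONLY | O_CREAT, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* Extend the file to 100 GiB; no data blocks are allocated until written. */
    if (ftruncate(fd, 100LL * 1024 * 1024 * 1024) == -1)
        perror("ftruncate");
    close(fd);
    return 0;
}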

@behlendorf
Contributor Author

As of 0.6.4 the FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE behavior of fallocate(2) is supported. But as noted above, for a variety of reasons implementing a meaningful fallocate(2) to reserve space is problematic for a COW filesystem.
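
Since the punch-hole path is now supported, one way to check whether a punched range actually became a hole is lseek(2) with SEEK_HOLE; a small sketch (assuming the "holes" file from the strace examples above):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("holes", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }
    off_t hole = lseek(fd, 0, SEEK_HOLE);   /* offset of the first hole, or EOF */
    if (hole == (off_t)-1)
        perror("lseek(SEEK_HOLE)");
    else
        printf("first hole at offset %lld\n", (long long)hole);
    close(fd);
    return 0;
}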

@CMCDragonkai

I'm using 0.7.2-1, and I noticed that if you run posix_fallocate on a file with the same size as the length specified, it returns with EBADF. This doesn't happen when I do it on tmpfs.

//usr/bin/env make -s "${0%.*}" && ./"${0%.*}" "$@"; s=$?; rm ./"${0%.*}"; exit $s

#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR */
#include <unistd.h>     /* close() */

int main () {
  int fd = open("./randomfile", O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
  if (fd == -1) {
    perror("open()");
    return 1;
  }
  /* Request 100 bytes starting at offset 0. */
  int status = posix_fallocate(fd, 0, 100);
  if (status != 0) {
    printf("%s\n", strerror(status));
  }
  close(fd);
  return 0;
}

Running the above on an empty or non-existent file works fine; as soon as you run it again, it fails with EBADF. This is a bit strange behaviour.

@behlendorf
Contributor Author

@CMCDragonkai that does seem odd. Can you please open a new issue with the above comment so we can track and fix it?

@pandada8

Is allocating disk space (mode = 0) supported now? I notice fallocate still returns EOPNOTSUPP.

BTW, will fallocate generate less fragmentation than just truncate in random-write scenarios?

@DeHackEd
Contributor

No, because ZFS's copy-on-write semantics just plain don't allow that.

@CMCDragonkai

@behlendorf #6860

@shodanshok
Contributor

@behlendorf While it is not possible (due to CoW) to have a fully working fallocate, it would be preferable to have at least a partially-working implementation: some applications[1] use fallocate to create very big files, and on filesystems without fallocate support this is a very slow operation. Granted, ZFS and its CoW defeat one of the main fallocate features (i.e., really reserving space in advance), but paying for the slow (and SSD-wearing) "fill the entire file with zeros" behavior is also quite bad.

Would you consider implementing a "fake" fallocate, where fallocate returns success but no real allocation is done? After all, even after a "real" fallocate, reserved space is not guaranteed, as any snapshot can eat into the really available disk space.

[1] One such application is virt-manager: RAW disk images are, by default, fully fallocated. Depending on disk size, this means GBs or TBs of null data (zeroes) written to HDDs/SSDs.

@RJVB

RJVB commented Jun 1, 2018 via email

@shodanshok
Contributor

shodanshok commented Jun 1, 2018

@RJVB On filesystems supporting fallocate, the filesystem reserves len/blocksize blocks and marks them as uninitialized. This has the following consequences:

  • as blocks are marked as reserved/allocated, the user-space application which called fallocate is sure that sufficient space is available to write to all such blocks;

  • as no user data are written (and only some very terse metadata are flushed to disk), fallocate returns almost immediately, enabling very fast file allocations.

Point 1 (space reservation) is at odds with ZFS because, as a CoW filesystem, it by its very nature continuously allocates new data blocks while keeping track of past ones via snapshots. This means that you can't really count on fallocate to guarantee sufficient disk space to write all blocks, unless you tap into the reservation and/or quota properties. However, if I remember correctly, these properties only apply to an entire dataset, rather than to a single file.

And here comes point 2 - fast file allocation. On platforms where fallocate is not natively supported, both the user-space application and the libc function can force a full file allocation by writing zeroes over its entire length. This is very slow, causes unnecessary wear on SSDs, and is basically useless on ZFS. Hence my suggestion to always return success for fallocate, even when doing nothing (i.e., faking a successful fallocate).
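
To make the cost concrete, a rough sketch of that zero-filling fallback (a simplification, not glibc's actual posix_fallocate emulation, which takes care not to clobber existing data):

#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Emulate preallocation by writing zeroes across the whole range.
 * Every byte hits the disk: slow, SSD-wearing, and (with compression
 * enabled) turned right back into a sparse/compressed file by ZFS. */
static int fallback_prealloc(int fd, off_t offset, off_t len)
{
    char buf[64 * 1024];
    memset(buf, 0, sizeof(buf));
    off_t written = 0;
    while (written < len) {
        size_t chunk = sizeof(buf);
        if ((off_t)chunk > len - written)
            chunk = (size_t)(len - written);
        ssize_t n = pwrite(fd, buf, chunk, offset + written);
        if (n < 0)
            return -1;
        written += n;
    }
    return 0;
}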

Opinions are welcome!

@RJVB

RJVB commented Jun 2, 2018 via email

@shodanshok
Contributor

@RJVB

And as to COW systems being at odds with space reservation: btrfs supports it and AFAIK that's a COW filesystem too

fallocate on BTRFS behaves differently than on non-CoW filesystems: while it really allocates blocks for the selected file, any rewrite (after the first write) triggers a new block allocation. This means that file fragmentation is only slightly reduced, and it can potentially expose some rough corners on a near-full filesystem.

Why is it basically useless? You still get the space reservation, no?

If you tap into the existing quota/reservation system (which, anyway, operates on datasets rather than single files), yes, you'll end up with a working space reservation. But if you only count on the fallocated reserved blocks, any snapshot pinning old data can effectively cause an out-of-space condition even when writing to fallocated files. Something like this:

  • fallocate a file, writing all zeroes to it
  • create a snapshot
  • rewrite your fallocated file
  • new space is allocated, which can result in an ENOSPC condition.

My point is that doing the write in the driver might be more efficient. I don't disagree with your suggestion but software that nowadays falls back to the brute-force method because fallocate() fails might start behaving unexpectedly. Maybe a driver parameter that can be controlled at runtime could activate a low-level actual write-zeroes-to-disk implementation?

I really fail to see why a user-space application should fail when presented with a sparse file rather than a preallocated one. However, as you suggest, simply let the option be user-selectable. In short, while a BTRFS-like fallocate would be the ideal solution, even a fake (user-selectable) implementation would be desirable.

@rlaager
Member

rlaager commented Nov 13, 2019

@von-copec We are only talking about fallocate() in the ZFS POSIX layer (ZPL) filesystems. If someone puts ext4 on top of a zvol or a file, then ext4 still behaves exactly as it always has.

@ghost

ghost commented Nov 13, 2019

@von-copec We are only talking about fallocate() in the ZFS POSIX layer (ZPL) filesystems. If someone puts ext4 on top of a zvol or a file, then ext4 still behaves exactly as it always has.

I understand; I was attempting to emphasize that the behavior of another filesystem layer (versus the ZPL) would be considered correct when it sits on top of a (sparse) zvol, and so the ZPL doing the same thing would be the "same amount of correctness".

@adilger
Contributor

adilger commented Jun 5, 2020

An update on this topic. In the course of implementing fallocate(mode=0) for Lustre-on-ZFS (https://review.whamcloud.com/36506), the dmu_prealloc() function was being used to implement the preallocation. While this patch isn't working yet, it is informative on this topic. The dmu_prealloc() function has been in ZFS for a long time, but it is only used on Illumos to preallocate space in a ZVOL for a core dump, so that the core can later be written directly into the ZVOL blocks without invoking ZFS, in case ZFS is itself broken by the crash:

zvol_dump_init->zvol_prealloc->dmu_prealloc->dmu_buf_will_not_fill()

This sets DB_NOFILL on every dbuf. The interesting thing is that dmu_prealloc() actually preallocates the blocks on disk, and the DB_NOFILL->WP_NOFILL results in the leaf (data) blocks being marked in dmu_write_policy() with ZIO_COMPRESS_OFF and ZIO_CHECKSUM_OFF. This is essentially what fallocate(mode=0) wants, namely to have reserved space that is not compressed and can be overwritten (at least once, anyway) without running out of space.

Several open questions exist, since there is absolutely no documentation anywhere about this code:

  • what does dmu_prealloc() do to blocks that were previously allocated? fallocate(mode=0) must not modify existing blocks, only allocate new blocks.
  • the DB_NOFILL appears to make reads of these buffers return zero, which seems correct, so long as it is cleared when the blocks are overwritten. Otherwise, it would not be good if normally-written data could not be read back.
  • can the DB_NOFILL buffers be overwritten by normal DMU writes, clearing the DB_NOFILL state?
  • does the ZIO_CHECKSUM_OFF flag persist if the block is overwritten via normal DMU IO? That would be unfortunate, as it means dmu_prealloc() blocks would not be safe for user data, but could be fixed with a new WP_* flag.

This seems to be a path toward implementing fully-featured ZFS fallocate(mode=0), possibly with some digging in the guts of the code if the semantics are not quite as needed.

If this doesn't work out, it still seems practical to go the easy route, for which I've made a simple patch that implements what was previously described here and could hopefully be landed with a minimum of fuss. I don't have any idea how long it would take the dmu_prealloc() approach to finish, but it would need the changes in my patch anyway.

@RJVB

RJVB commented Jun 5, 2020 via email

@adilger
Contributor

adilger commented Jun 5, 2020

Several open questions exist, since there is absolutely no documentation anywhere about this code:
Probably an open door, but have you tried to answer your questions by poking around under an Illumos implementation?

Yes, the Illumos implementation references this function exactly once, in the code path referenced above, but no actual comments exist in the code that describe these functions.

@RJVB

RJVB commented Jun 5, 2020 via email

@shodanshok
Contributor

shodanshok commented Jun 5, 2020

@adilger Am I right that this preallocation would use the preallocated blocks for the first write only? If so, this seems somewhat similar to the BTRFS approach, and I am missing why an application (Lustre, in this case) should expect fallocate to be really honored, considering that:

  • a snapshot can easily eat into the pool free space, leading to ENOSPC even if preallocation was successful

  • rewriting the file will cause ongoing fragmentation, negating (over time) any performance benefit from the previous allocation

Disabling compression and checksums seems way too high a price to pay for the very limited benefit (if any) of "true" preallocation on ZFS.

Considering how posix_fallocate simply writes zeroes on a filesystem not supporting fallocate, and that these zeroes would be converted to a sparse file if compression is enabled, I would simply suggest creating a module option/pool property/flag "faking" true fallocate (returning 0 but ignoring the operation entirely). I know this sounds bad because it breaks the "contract" of the fallocate API itself; however, no such guarantee exists for a compressing, CoW filesystem (which seems similar to what you are proposing here, right? #10408)

@adilger
Contributor

adilger commented Jun 5, 2020

@shodanshok, I understand and agree that all of those issues exist.

Lustre is a distributed parallel filesystem that layers on top of ZFS, so it isn't the thing that is generating the fallocate() request. It is merely passing on the fallocate() request from a higher-level application down to ZFS, after possibly remapping the arguments appropriately.

"I would simply suggest to create a module option/pool property/flag
"faking" true fallocate (returning 0 but ignoring the operation entirely)"

I've essentially done exactly that with my PR#10408. However, while this probably works fine for a large majority of use cases, it would fail if eg. an application is trying to fallocate multiple files in advance of writing, or in parallel, but there is not actually enough free space in the filesystem. In that case, each individual fallocate() call would verify enough space is available, but the aggregate of those calls is not available. Fixing this would need "write once" semantics for reserved blocks (similar to what dmu_prealloc() provides), or at least an in-memory grant that reserves space from statfs() for the dnode that is released as dbufs are written to the dnode. That would at least avoid obvious multi-file issues, but not prevent other writers from consuming this space. It would also get tricky with fallocate() over a non-sparse file, and whether writes are overlapping, etc. so would not be the preferred solution IMHO.

behlendorf pushed a commit that referenced this issue Jun 18, 2020
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.

Since ZFS does COW and snapshots, preallocating blocks for a file
cannot guarantee that writes to the file will not run out of space.
Even if the first overwrite was guaranteed, it would not handle any
later overwrite of blocks due to COW, so strict compliance is futile.
Instead, make a best-effort check that at least enough free space is
currently available in the pool (with a bit of margin), then create
a sparse file of the requested size and continue on with life.

This does not handle all cases (e.g. several fallocate() calls before
writing into the files when the filesystem is nearly full), which
would require a more complex mechanism to be implemented, probably
based on a modified version of dmu_prealloc(), but is usable as-is.

A new module option zfs_fallocate_reserve_percent is used to control
the reserve margin for any single fallocate call.  By default, this
is 110% of the requested preallocation size, so an additional 10% of
available space is reserved for overhead to allow the application a
good chance of finishing the write when the fallocate() succeeds.
If the heuristics of this basic fallocate implementation are not
desirable, the old non-functional behavior of returning EOPNOTSUPP
for calls can be restored by setting zfs_fallocate_reserve_percent=0.

The parameter of zfs_statvfs() is changed to take an inode instead
of a dentry, since no dentry is available in zfs_fallocate_common().

A few tests from @behlendorf cover basic fallocate functionality.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Issue #326
Closes #10408
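
To illustrate the heuristic the commit above describes, here is a user-space sketch of the check (illustrative only, not the kernel code; the mount point /tank is hypothetical): available space must cover the request times zfs_fallocate_reserve_percent/100.

#include <stdint.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* Return non-zero if the filesystem at 'path' has at least
 * len * reserve_percent / 100 bytes available, mirroring the default
 * zfs_fallocate_reserve_percent=110 (10% margin for overhead). */
static int enough_space(const char *path, uint64_t len, unsigned reserve_percent)
{
    struct statvfs st;
    if (statvfs(path, &st) != 0)
        return 0;
    uint64_t avail = (uint64_t)st.f_bavail * st.f_frsize;
    return (double)avail >= (double)len * reserve_percent / 100.0;
}

int main(void)
{
    uint64_t len = 1ULL << 30;   /* a 1 GiB fallocate request */
    printf("fallocate of %llu bytes on /tank would %s\n",
           (unsigned long long)len,
           enough_space("/tank", len, 110) ? "proceed" : "fail with ENOSPC");
    return 0;
}
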
lundman referenced this issue in openzfsonosx/openzfs Jun 19, 2020
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.
@behlendorf
Contributor Author

Closing. As discussed above, basic fallocate() support was added by #10408.

@svaroqui

svaroqui commented Jan 4, 2021

This bug is fixed in MariaDB 10.1.48, 10.2.35, 10.3.26, 10.4.16, and 10.5.7 by MariaDB Pull Request #1658, i.e., by adding fallback logic for the EOPNOTSUPP error code.

jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.
sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.
mmaybee pushed a commit to mmaybee/openzfs that referenced this issue Apr 6, 2022
* Create a separate zfs suite to test features specific to Delphix. 
* Add a new Github workflow
ryanwalder added a commit to ryanwalder/Trash-Guides that referenced this issue Jul 25, 2022
ZFS does not support fallocate[1] which means if the `pre-allocate`
option is set to true qBittorrent will error when adding torrent files

1. openzfs/zfs#326
ryanwalder added a commit to ryanwalder/Trash-Guides that referenced this issue Jul 25, 2022
EchterAgo added a commit to EchterAgo/zfs that referenced this issue Nov 11, 2023
…penzfs#326)

* spl-time: Use KeQueryPerformanceCounter instead of KeQueryTickCount

`KeQueryTickCount` seems to only have a 15.625ms resolution unless the
interrupt timer frequency is increased, which should be avoided due to
power usage.

Instead, this switches the `zfs_lbolt`, `gethrtime` and
`random_get_bytes` to use `KeQueryPerformanceCounter`.

On my system this gives a 100ns resolution.

Signed-off-by: Axel Gembe <axel@gembe.net>

* spl-time: Add assertion to gethrtime and cache NANOSEC / freq division

One less division for each call.

Signed-off-by: Axel Gembe <axel@gembe.net>

---------

Signed-off-by: Axel Gembe <axel@gembe.net>
andrewc12 pushed a commit to andrewc12/openzfs that referenced this issue May 18, 2024