zfs-2.1.6 patchset #13886

Merged
merged 83 commits into from Oct 3, 2022

Commits on Jul 14, 2022

  1. Scrub mirror children without BPs

    When scrubbing a raidz/draid pool, which contains a replacing or
    sparing mirror with multiple online children, only one child will
    be read.  This is not normally a serious concern because the DTL
    records are used to determine where a good copy of the data is.
    As long as the data can be read from one child the mirror vdev
    will use it to repair gaps in any of its children.  Furthermore,
    even if the data which was read is corrupt the raidz code will
    detect this and issue its own repair I/O to correct the damage
    in the mirror vdev.
    
    However, in the scenario where the DTL is wrong due to silent
    data corruption (say due to overwriting one child) and the scrub
    happens to read from a child with good data, then the other damaged
    mirror child will not be detected nor repaired.
    
    While this is possible for both raidz and draid vdevs, it's most
    pronounced when using draid.  This is because by default the zed
    will sequentially rebuild a draid pool to a distributed spare,
    and the distributed spare half of the mirror is always preferred
    since it delivers better performance.  This means the damaged
    half of the mirror will go undetected even after scrubbing.
    
    For system administrators this behavior is non-intuitive and in
    a worst case scenario could result in the only good copy of the
    data being unknowingly detached from the mirror.
    
    This change resolves the issue by reading all replacing/sparing
    mirror children when scrubbing.  When the BP isn't available for
    verification, then compare the data buffers from each child.  They
    must all be identical, if not there's silent damage and an error
    is returned to prompt the top-level vdev to issue a repair I/O to
    rewrite the data on all of the mirror children.  Since we can't
    tell which child was wrong a checksum error is logged against the
    replacing or sparing mirror vdev.
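
The buffer comparison described above can be sketched as follows. This is a minimal illustration, not the vdev mirror code: `children_identical()` and its parameters are hypothetical names, and the buffers are assumed to be plain byte arrays of equal size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when every child buffer holds the same
 * bytes as the first one.  Without a block pointer checksum there is no way
 * to tell which copy is wrong, only that the copies disagree.
 */
static bool
children_identical(const void *const bufs[], size_t nchildren, size_t size)
{
    assert(nchildren > 0);
    for (size_t i = 1; i < nchildren; i++) {
        if (memcmp(bufs[0], bufs[i], size) != 0)
            return (false);
    }
    return (true);
}
```

A caller would treat a `false` result as a checksum error against the mirror as a whole, which matches the commit's note that the damaged child cannot be identified.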
    
    Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13555
    behlendorf committed Jul 14, 2022
    3920d7f

Commits on Jul 26, 2022

  1. Remove refcount from spa_config_*()

    The only reason for spa_config_*() to use refcount instead of a simple
    non-atomic (thanks to scl_lock) variable for scl_count is tracking,
    hard disabled for the last 8 years.  Switching to a simple int scl_count
    reduces the lock hold time by avoiding atomics, and makes the structure
    fit into a single cache line, reducing lock contention.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#12287
    amotin authored and behlendorf committed Jul 26, 2022
    5b860ae
  2. Avoid small buffer copying on write

    It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
    decide whether to reallocate/copy the buffer, because the answer is
    OS-specific and depends on the buffer size.  Instead of that use
    abd_size_alloc_linear(), moved into public header.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Closes openzfs#12425
    amotin authored and behlendorf committed Jul 26, 2022
    415882d
  3. spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (openzfs#12678)

    The fnvlist versions of the functions are fatal if they fail,
    saving each call from having to include checking the result.
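
The "fatal wrapper" pattern can be illustrated with a toy example. The sketch below does not use the nvlist API; `table_set()` and `ftable_set()` are hypothetical stand-ins for a fallible call and its fatal counterpart.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fallible operation standing in for an nvlist_*() call. */
static int
table_set(int *table, size_t size, size_t idx, int value)
{
    assert(table != NULL);
    if (idx >= size)
        return (-1);
    table[idx] = value;
    return (0);
}

/* "Fatal" wrapper: aborts on failure, so callers need no result check. */
static void
ftable_set(int *table, size_t size, size_t idx, int value)
{
    if (table_set(table, size, idx, value) != 0) {
        fprintf(stderr, "ftable_set: index %zu out of range\n", idx);
        abort();
    }
}
```

Call sites shrink from a checked call wrapped in a verification macro to a plain `ftable_set(...)` statement.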
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed-by: Igor Kozhukhov <igor@dilos.org>
    Signed-off-by: Allan Jude <allan@klarasystems.com>
    allanjude authored and behlendorf committed Jul 26, 2022
    72a4709
  4. Add more control/visibility to spa_load_verify().

    Use error thresholds from policy to control whether to scrub data
    and/or metadata.  If threshold is set to UINT64_MAX, then caller
    probably does not care about result and we may skip that part.
    
    By default import neither sets the data error threshold nor reads
    the error counter, so skip the data scrub for faster import.
    Metadata are still scrubbed, and the load fails if even a single
    error is found.
    
    While there just for symmetry return number of metadata errors in
    case threshold is not set to zero and we haven't reached it.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Closes openzfs#13022
    amotin authored and behlendorf committed Jul 26, 2022
    fdb80a2
  5. Improve log spacemap load time

    Previous flushing algorithm limited only total number of log blocks to
    the minimum of 256K and 4x number of metaslabs in the pool.  As result,
    system with 1500 disks with 1000 metaslabs each, touching several new
    metaslabs each TXG could grow spacemap log to huge size without much
    benefits.  We've observed one of such systems importing pool for about
    45 minutes.
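
To make the old limit concrete, here is a small worked sketch of the arithmetic described above. It models only the previous rule (the minimum of 256K and 4x the number of metaslabs); the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Previous cap on the total number of log blocks, per the text: 256K. */
#define OLD_LOG_BLOCK_MAX (256ULL * 1024)

/* Old rule: min(256K, 4 * number of metaslabs in the pool). */
static uint64_t
old_log_block_limit(uint64_t metaslabs)
{
    assert(metaslabs <= UINT64_MAX / 4);
    uint64_t scaled = 4 * metaslabs;
    return (scaled < OLD_LOG_BLOCK_MAX ? scaled : OLD_LOG_BLOCK_MAX);
}
```

For the 1500-disk example with 1000 metaslabs per disk, 4x the metaslab count is 6,000,000, so the 256K cap (262,144 blocks) was the only effective bound.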
    
    This patch improves the situation from five sides:
     - By limiting maximum period for each metaslab to be flushed to 1000
    TXGs, that effectively limits maximum number of per-TXG spacemap logs
    to load to the same number.
     - By making flushing more smooth via accounting number of metaslabs
    that were touched after the last flush and actually need another flush,
    not just ms_unflushed_txg bump.
     - By applying zfs_unflushed_log_block_pct to the number of metaslabs
    that were touched after the last flush, not all metaslabs in the pool.
     - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
    advance, making log spacemap load process for wide HDD pool CPU-bound,
    accelerating it by many times.
     - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
    single-threaded by nature log processing time from ~10 to ~5 minutes.
    
    As further optimization we could skip bumping ms_unflushed_txg for
    metaslabs not touched since the last flush, but that would be an
    incompatible change, requiring new pool feature.
    
    Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#12789
    amotin authored and behlendorf committed Jul 26, 2022
    dd9c110
  6. Improve mg_aliquot math

    When calculating mg_aliquot, similarly to openzfs#12046, use the number
    of unique data disks in the vdev, not the total number of children
    vdevs.  Increase the default value of the tunable from 512KB to 1MB to
    compensate.
    
    Before this change each disk in striped pool was getting 512KB of
    sequential data, in 2-wide mirror -- 1MB, in 3-wide RAIDZ1 -- 768KB.
    After this change in all the cases each disk should get 1MB.
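
The per-disk figures quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions: a 2-wide mirror is counted as one unique data disk and a 3-wide RAIDZ1 as two, and the helper names are hypothetical rather than taken from the metaslab code.

```c
#include <assert.h>
#include <stdint.h>

#define KB 1024ULL

/* Before: the aliquot scaled with the total number of children (512KB each). */
static uint64_t
per_disk_old(uint64_t children, uint64_t data_disks)
{
    assert(data_disks > 0);
    return (512 * KB * children / data_disks);
}

/* After: the aliquot scales with unique data disks (1MB each). */
static uint64_t
per_disk_new(uint64_t data_disks)
{
    assert(data_disks > 0);
    return (1024 * KB * data_disks / data_disks);
}
```

With these assumptions the old formula yields 512KB for a single striped disk, 1MB for the mirror, and 768KB for the RAIDZ1, while the new formula yields 1MB in every case.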
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13388
    amotin authored and behlendorf committed Jul 26, 2022
    6e1e90d
  7. More speculative prefetcher improvements

    - Make prefetch distance adaptive: up to 4MB the prefetch distance
    doubles for every hit, same as before, but after that it grows by 1/8
    every time
    the prefetch read does not complete in time to satisfy the demand.
    My tests show that 4MB is sufficient for wide NVMe pool to saturate
    single reader thread at 2.5GB/s, while new 64MB maximum allows the
    same thread to reach 1.5GB/s on wide HDD pool.  Further distance
    increase may increase speed even more, but less dramatic and with
    higher latency.
    
     - Allow early reuse of inactive prefetch streams: streams that never
    saw hits can be reused immediately if there is a demand, while others
    can be reused after 1s of inactivity, starting with the oldest.  After
    2s of inactivity streams are deleted to free resources, same as before.
    This increases strided read performance on HDD pools by several times
    in the presence of simultaneous random reads, which previously filled
    the zfetch_max_streams limit for seconds and so blocked most prefetch.
    
     - Always issue intermediate indirect block reads with SYNC priority.
    Each of those reads if delayed for longer may delay up to 1024 other
    block prefetches, that may be not good for wide pools.
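
The adaptive distance rule from the first bullet can be sketched as a small state update. This is an illustration only: the function name, the `hit` and `late` flags, and the exact order of checks are assumptions, while the 4MB, 1/8 and 64MB figures come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB            (1024ULL * 1024)
#define FAST_LIMIT    (4 * MB)   /* doubling phase ends here */
#define MAX_DISTANCE  (64 * MB)  /* new upper bound */

/*
 * hit:  the stream saw another sequential access.
 * late: the prefetch read did not complete in time to satisfy the demand.
 */
static uint64_t
next_distance(uint64_t cur, bool hit, bool late)
{
    assert(cur > 0);
    if (cur < FAST_LIMIT) {
        if (hit)
            cur *= 2;
        if (cur > FAST_LIMIT)
            cur = FAST_LIMIT;
    } else if (late) {
        cur += cur / 8;
        if (cur > MAX_DISTANCE)
            cur = MAX_DISTANCE;
    }
    return (cur);
}
```

Growing by 1/8 only when a read arrives late keeps the distance near the smallest value that hides device latency, rather than always racing to the maximum.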
    
    Reviewed-by: Allan Jude <allan@klarasystems.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13452
    amotin authored and behlendorf committed Jul 26, 2022
    884364e
  8. AVL: Remove obsolete branching optimizations

    Modern Clang and GCC can successfully implement simple conditions
    without branching with math and flag operations.  Use of arrays for
    translation no longer helps as much as it was 14+ years ago.
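
The two styles can be compared side by side. The sketch below is illustrative and uses hypothetical names; it maps a child index (0 for left, 1 for right) to a balance value of -1 or +1, once through a lookup table and once through a plain condition that current compilers typically lower to branch-free code.

```c
#include <assert.h>

/* Older style: translate through a small lookup table. */
static const int child2balance_table[2] = { -1, 1 };

static int
child2balance_lookup(int child)
{
    assert(child == 0 || child == 1);
    return (child2balance_table[child]);
}

/* Simple condition: left to the compiler to implement without a branch. */
static int
child2balance_cond(int child)
{
    assert(child == 0 || child == 1);
    return (child == 0 ? -1 : 1);
}
```

Both functions return the same values; whether the second one compiles without a branch depends on the compiler and optimization level, which is why the commit checks the disassembly.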
    
    Disassembly of the code generated by Clang 13.0.0 on FreeBSD 13.1,
    Clang 14.0.4 on FreeBSD 14 and GCC 10.2.1 on Debian 11 with this
    change still shows no branching instructions.
    
    Profiling of CPU-bound scan stage of sorted scrub shows reproducible
    reduction of time spent inside avl_find() from 6.52% to 4.58%.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13540
    amotin authored and behlendorf committed Jul 26, 2022
    813e15f
  9. Reduce ZIO io_lock contention on sorted scrub

    During sorted scrub multiple threads (one per vdev) are issuing many
    ZIOs at the same time, all using the same scn->scn_zio_root ZIO as parent.
    It causes huge lock contention on the single global lock on that ZIO.
    Improve it by introducing per-queue null ZIOs, children to that one,
    and using them instead as proxy.
    
    For 12 SSD pool storing 1.5TB of 4KB blocks on 80-core system this
    dramatically reduces lock contention and reduces scrub time from 21
    minutes down to 12.5, while actual read stages (not scan) are about
    3x faster, reaching 100K blocks per second per vdev.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13553
    amotin authored and behlendorf committed Jul 26, 2022
    916d9de
  10. FreeBSD: Improve crypto_dispatch() handling

    Handle crypto_dispatch() return values same as crp->crp_etype errors.
    On FreeBSD 12 many drivers returned same errors both ways, and lack
    of proper handling for the first ended up in assertion panic later.
    It was changed in FreeBSD 13, but there is no reason to not be safe.
    
    While there, skip waiting for completion, including locking and
    wakeup() call, for sessions on synchronous crypto drivers, such as
    typical aesni and software.
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13563
    amotin authored and behlendorf committed Jul 26, 2022
    881249d
  11. Several sorted scrub optimizations

    - Reduce size and comparison complexity of q_exts_by_size B-tree.
    Previous code used two 64-bit divisions and many other operations to
    compare two B-tree elements.  It created enormous overhead.  This
    implementation moves the math to the upper level and stores the score
    in the B-tree elements themselves.  Since all that we need to store in
    that B-tree is the extent score and offset, those can fit into single
    8 byte value instead of 24 bytes of q_exts_by_addr element and can be
    compared with single operation.
     - Better decouple secondary tree logic from main range_tree by moving
    rt_btree_ops and related functions into dsl_scan.c as ext_size_ops.
    Those functions are too small to worry about the code duplication and
    range_tree does not need to know details such as rt_btree_compare.
     - Instead of accounting number of pending bytes per pool, that needs
    atomic on global variable per block, account the number of non-empty
    per-vdev queues, that change much more rarely.
     - When extent scan is interrupted by TXG end, continue it in the next
    TXG instead of selecting next best extent.  It allows to avoid leaving
    one truncated (and so likely not the best any more) extent each TXG.
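
The first bullet's idea of storing score and offset in a single 8-byte value can be sketched as below. The bit split (16 bits of score, 48 bits of offset) and all names are assumptions made for illustration; the commit text only states that both fit into one 8-byte value comparable with a single operation.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 48  /* assumed split, not taken from the source */
#define OFFSET_MASK ((1ULL << OFFSET_BITS) - 1)

/* Pack the score into the high bits so integer order sorts by score first. */
static uint64_t
ext_key_pack(uint64_t score, uint64_t offset)
{
    assert(score < (1ULL << (64 - OFFSET_BITS)));
    assert(offset <= OFFSET_MASK);
    return ((score << OFFSET_BITS) | offset);
}

static uint64_t
ext_key_score(uint64_t key)
{
    return (key >> OFFSET_BITS);
}

static uint64_t
ext_key_offset(uint64_t key)
{
    return (key & OFFSET_MASK);
}

/* A single integer comparison replaces the per-compare divisions. */
static int
ext_key_compare(uint64_t a, uint64_t b)
{
    return ((a > b) - (a < b));
}
```

Because the score is computed once when the key is built, the comparator itself does no arithmetic beyond the comparison.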
    
    On top of some other optimizations this saves about 1.5 minutes out of
    10 to scrub pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks.
    
    Reviewed-by: Paul Dagnelie <pcd@delphix.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13576
    amotin authored and behlendorf committed Jul 26, 2022
    a861aa2
  12. Several B-tree optimizations

    - Introduce first element offset within a leaf.  It allows to reduce
    by ~50% average memmove() size when adding/removing elements.  If the
    added/removed element is in the first half of the leaf, we may shift
    elements before it and adjust the bth_first instead of moving more
    elements after it.
     - Use memcpy() instead of memmove() when we know there is no overlap.
     - Switch from uint64_t to uint32_t.  It does not limit anything,
    but 32-bit arches should appreciate it greatly in hot paths.
     - Store leaf capacity in struct btree to avoid 64-bit divisions.
     - Adjust zfs_btree_insert_into_leaf() to always result in balanced
    leaves after splitting, no matter where the new element was inserted.
    Not that we care about it much, but it should also allow B-trees with
    as little as two elements per leaf instead of 4 previously.
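
The first-element-offset idea from the first bullet can be sketched with a toy leaf. This is not the zfs_btree code: the `leaf_t` layout, `LEAF_CAP`, and `leaf_insert()` are hypothetical, and the leaf holds plain `int` elements. When the insertion point is in the first half and there is free space before the first element, the shorter prefix is shifted instead of the longer suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LEAF_CAP 8

typedef struct leaf {
    uint32_t first;          /* index of the first used slot */
    uint32_t count;          /* number of used slots */
    int elems[LEAF_CAP];
} leaf_t;

/* Insert val at logical position idx; returns -1 if the leaf is full. */
static int
leaf_insert(leaf_t *l, uint32_t idx, int val)
{
    assert(idx <= l->count);
    if (l->count == LEAF_CAP)
        return (-1);

    int *base = l->elems + l->first;
    bool room_before = l->first > 0;
    bool room_after = l->first + l->count < LEAF_CAP;

    if (room_before && (idx < l->count / 2 || !room_after)) {
        /* Shift the elements before idx one slot toward the front. */
        memmove(base - 1, base, idx * sizeof (int));
        l->first--;
    } else {
        /* Shift the elements from idx onward one slot toward the back. */
        memmove(base + idx + 1, base + idx,
            (l->count - idx) * sizeof (int));
    }
    l->elems[l->first + idx] = val;
    l->count++;
    return (0);
}
```

Moving whichever side is shorter is what gives the roughly 50% reduction in average memmove() size mentioned above.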
    
    When scrubbing pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks this
    reduces amount of time spent in memmove() inside the scan thread from
    13.7% to 5.7% and total scrub time by ~15 seconds out of 9 minutes.
    It should also reduce spacemaps load time, but I haven't measured it.
    
    Reviewed-by: Paul Dagnelie <pcd@delphix.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13582
    amotin authored and behlendorf committed Jul 26, 2022
    dc91a6a
  13. Avoid two 64-bit divisions per scanned block

    Change math to make it like the ARC, using multiplications instead.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13591
    amotin authored and behlendorf committed Jul 26, 2022
    5e06805
  14. Fix and disable blocks statistics during scrub

    Block statistics calculation during scrub I/O issue in case of sorted
    scrub accounted ditto blocks several times.  Embedded blocks, on the
    other hand, were not accounted at all.  This change moves the accounting
    from
    issue to scan stage, that fixes both problems and also allows to avoid
    pool-wide locking and the lock contention it created.
    
    Since these statistics are quite specific and are not even exposed
    anywhere now, disable their calculation by default to not waste CPU time.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13579
    amotin authored and behlendorf committed Jul 26, 2022
    4b8f160
  15. Avoid memory copies during mirror scrub

    Issuing several scrub reads for a block we may use the parent ZIO
    buffer for one of child ZIOs.  If that read completes successfully,
    then we won't need to copy the data explicitly.  If block has only
    one copy (typical for root vdev, which is also a mirror inside),
    then we never need to copy -- succeed or fail as-is.  Previous
    code also copied data from buffer of every successfully completed
    child ZIO, but that just does not make any sense.
    
    On healthy N-wide mirror this saves all N+1 (or even more in case
    of ditto blocks) memory copies for each scrubbed block, allowing
    CPU to focus mostly on check-summing.  For other vdev types it
    should save one memory copy per block copy at root vdev.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13606
    amotin authored and behlendorf committed Jul 26, 2022
    03e33b2
  16. Avoid memory copy when verifying raidz/draid parity

    Before this change for every valid parity column raidz_parity_verify()
    allocated new buffer and copied there existing data, then recalculated
    the parity and compared the result with the copy.  This patch removes
    the memory copy, simply swapping original buffer pointers with newly
    allocated empty ones for parity recalculation and comparison. Original
    buffers with potentially incorrect parity data are then just freed,
    while new recalculated ones are used for repair.
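
The pointer-swap approach can be sketched with simple XOR parity. This is an illustration under assumptions, not raidz_parity_verify(): it uses heap byte buffers instead of ABDs, a single XOR parity column, and a hypothetical function name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Recompute XOR parity into a fresh buffer, compare it with the stored
 * parity, then free the stored buffer and keep the fresh one.  No copy of
 * the stored parity is ever made.  Returns true if the stored parity was
 * correct.  *parity must point to a malloc()ed buffer of len bytes.
 */
static bool
parity_verify_swap(uint8_t **parity, uint8_t *const data[], size_t ncols,
    size_t len)
{
    uint8_t *orig = *parity;
    uint8_t *fresh = calloc(len, 1);
    assert(fresh != NULL);

    for (size_t c = 0; c < ncols; c++) {
        for (size_t i = 0; i < len; i++)
            fresh[i] ^= data[c][i];
    }

    bool ok = memcmp(orig, fresh, len) == 0;
    free(orig);         /* possibly incorrect parity is discarded */
    *parity = fresh;    /* recalculated parity is used for repair */
    return (ok);
}
```

After the call the caller always holds correct parity, so a repair write can use the buffer directly when the function returns `false`.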
    
    On a pool of 12 4-wide raidz vdevs, storing 1.5TB of 16MB blocks, this
    change reduces memory traffic during scrub by 17% and total unhalted
    CPU time by 25%.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes openzfs#13613
    amotin authored and behlendorf committed Jul 26, 2022
    bbb50e6
  17. Fix scrub resume from newly created hole.

    It may happen that scan bookmark points to a block that was turned
    into a part of a big hole.  In such case dsl_scan_visitbp() may skip
    it and dsl_scan_check_resume() will not be called for it.  As result
    new scan suspend won't be possible until the end of the object, that
    may take hours if the object is a multi-terabyte ZVOL on a slow HDD
    pool, stretching TXG to all that time, creating all sorts of problems.
    
    This patch changes the resume condition to any greater or equal block,
    so even if we miss the bookmarked block, the next one we find will
    delete the bookmark, allowing new suspend.
    
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    amotin authored and behlendorf committed Jul 26, 2022
    15868d3
  18. Remove sha1 hashing from OpenZFS, it's not used anywhere.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Attila Fülöp <attila@fueloep.org>
    Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Closes openzfs#12895
    Closes openzfs#12902
    Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
    mcmilk authored and behlendorf committed Jul 26, 2022
    4b09770

Commits on Jul 27, 2022

  1. libtpool: -Wno-clobbered

    Also remove -Wno-unused-but-set-variable
    
    Upstream-bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61118
    Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Closes openzfs#13110
    nabijaczleweli authored and behlendorf committed Jul 27, 2022
    37430e8
  2. config: prune unused -Wno-bool-compare checks

    Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Closes openzfs#13110
    nabijaczleweli authored and behlendorf committed Jul 27, 2022
    2d235d5
  3. Silence -Winfinite-recursion warning in luaD_throw()

    This code should be kept in line with the upstream lua version as much
    as possible.  Therefore, we simply want to silence the warning.  This
    check was enabled by default as part of -Wall in gcc 12.1.
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    d7a8c57
  4. Fix -Wattribute-warning in zfs_log_xvattr()

    Restructure the code in zfs_log_xvattr() to use a lr_attr_end
    structure when accessing lr_attr_t elements located after the
    variable sized array.  This makes the code more understandable
    and resolves the accessing beyond the end of the field warnings.
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    ef0e506
  5. Fix -Wattribute-warning in edonr

    The wrong union memory was being accessed in EdonRInit resulting in
    a write beyond size of field compiler warning.  Reference the correct
    member to resolve the warning.  The warning was correct, and in this
    case the mistake was harmless.
    
        In function ‘fortify_memcpy_chk’,
        inlined from ‘EdonRInit’ at zfs/module/icp/algs/edonr/edonr.c:494:3:
        ./include/linux/fortify-string.h:344:25: error: call to
        ‘__write_overflow_field’ declared with attribute warning:
        detected write beyond size of field (1st parameter);
        maybe use struct_group()? [-Werror=attribute-warning]
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    c771583
  6. Fix -Wattribute-warning in dsl layer

    The memcpy(), memmove(), and memset() functions have been annotated
    to perform bounds checking when using FORTIFY_SOURCE.  A warning is
    now generated when writing beyond the end of the specified field.
    
    Alternately, the new struct_group() macro could be used to create
    an anonymous union member for use by memcpy().  However, since this
    is the only place the macro would be helpful it's preferable to
    restructure the code slightly to avoid the need for additional
    compatibility code when the macro does not exist.
    
    https://lore.kernel.org/lkml/20211118183807.1283332-1-keescook@chromium.org/T/
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    087f5de
  7. Fix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()

    Move the use of the private pointer to before it is freed.  It's only
    used as a tag so a dereference would never occur, but there's no
    harm in inverting the order to resolve the warning.
    
        module/zfs/dbuf.c: In function 'dbuf_issue_final_prefetch_done':
        module/zfs/dbuf.c:3204:17: error:
        pointer 'private' may be used after 'free' [-Werror=use-after-free]
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    6a81173
  8. Fix -Wuse-after-free warning in dbuf_destroy()

    Move the use of the db pointer to before it is freed.  It's only used as
    a tag so a dereference would never occur, but there's no reason we
    can't invert the order to resolve the warning.
    
        module/zfs/dbuf.c: In function 'dbuf_destroy':
        module/zfs/dbuf.c:2953:17: error:
        pointer 'db' may be used after 'free' [-Werror=use-after-free]
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    60f2cfd
  9. Fix -Wformat-truncation warning in upgrade_set_callback()

    Extend the buffer slightly to resolve the warning.
    
        cmd/zfs/zfs_main.c: In function ‘upgrade_set_callback’:
        cmd/zfs/zfs_main.c:2446:22: error: ‘%llu’ directive output
        may be truncated writing between 1 and 20 bytes into a
        region of size 16 [-Werror=format-truncation=]
        cmd/zfs/zfs_main.c:2445:24: note: ‘snprintf’ output between
        2 and 21 bytes into a destination of size 16
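
The arithmetic behind the warning can be checked directly: a 64-bit value printed with `%llu` needs up to 20 digits plus a terminating NUL, which does not fit in 16 bytes. The sketch below uses a hypothetical helper and an assumed 32-byte replacement size; the actual size chosen by the commit is not stated here. It also assumes `unsigned long long` is 64 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true when the decimal form of value fit into buf untruncated. */
static bool
format_u64(char *buf, size_t len, unsigned long long value)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%llu", value);
    return (n >= 0 && (size_t)n < len);
}
```

Calling it with a 16-byte buffer and `ULLONG_MAX` reports truncation, while a 32-byte buffer holds the full 20-digit string.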
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    d483ef3
  10. Fix -Wformat-overflow warning in zfs_project_handle_dir()

    Switch to using asprintf() to satisfy the compiler and resolve the
    potential format-overflow warning.  Note the conditional before the
    sprintf() would have prevented this regardless.
    
        cmd/zfs/zfs_project.c: In function ‘zfs_project_handle_dir’:
        cmd/zfs/zfs_project.c:241:38: error: ‘/’ directive writing
        1 byte into a region of size between 0 and 4352
        [-Werror=format-overflow=]
        cmd/zfs/zfs_project.c:241:17: note: ‘sprintf’ output between
        2 and 4609 bytes into a destination of size 4352
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    d2ff219
  11. ICP: Add missing stack frame info to SHA asm files

    Since the assembly routines calculating SHA checksums don't use
    a standard stack layout, CFI directives are needed to unwind the
    stack.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Attila Fülöp <attila@fueloep.org>
    Closes openzfs#11733
    AttilaFueloep authored and behlendorf committed Jul 27, 2022
    b9d862f
  12. Fix objtool: missing int3 after ret warning

    Resolve straight-line speculation warnings reported by objtool
    for x86_64 assembly on Linux when CONFIG_SLS is set.  See the
    following LWN article for the complete details.
    
    https://lwn.net/Articles/877845/
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13528
    Closes openzfs#13575
    behlendorf committed Jul 27, 2022
    69ad0bd
  13. ZTS: Fix io_uring support check

    Not all Linux distribution kernels enable io_uring support by
    default.  Update the run time check to verify that the booted
    kernel was built with CONFIG_IO_URING=y.
    
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
    Co-authored-by: George Melikov <mail@gmelikov.ru>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13648
    Closes openzfs#13685
    behlendorf committed Jul 27, 2022
    98315be

Commits on Jul 28, 2022

  1. module: lua: ldo: fix pragma name

    /home/nabijaczleweli/store/code/zfs/module/lua/ldo.c:175:32: warning:
    unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
      175 | #pragma GCC diagnostic ignored "-Winfinite-recursion"a
          |                                ^~~~~~~~~~~~~~~~~~~~~~
    
    Fixes: a6e8113 ("Silence -Winfinite-recursion warning in luaD_throw()")
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Closes openzfs#13348
    nabijaczleweli authored and behlendorf committed Jul 28, 2022
    17512ab

Commits on Aug 2, 2022

  1. Handle partial reads in zfs_read

    Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
    loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
    64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
    gets interpreted as "I didn't read anything", the caller tries again
    without consuming the 64k we already read, and we're stuck.
    
    This apparently works on newer kernels because the caller, which breaks
    on older Linux kernels by happily passing along a 1M read request and a
    64k iovec, requests just 64k at a time there.
    
    With this, we now won't return EFAULT if we got a partial read.
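
The rule described above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the actual zfs_read() change: `copy_chunks`, `CHUNK`, and `dst_cap` are hypothetical names, and a too-small destination stands in for the short iovec.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* hypothetical per-iteration copy size */

/*
 * Copy up to `want` bytes from src to dst in CHUNK-sized steps.
 * dst_cap models a destination that is smaller than the request.
 * On return *done holds the number of bytes copied. A fault after
 * partial progress is reported as success (a short read); EFAULT is
 * returned only when nothing at all was copied.
 */
static int
copy_chunks(const char *src, size_t want, char *dst, size_t dst_cap,
    size_t *done)
{
    int err = 0;

    *done = 0;
    while (*done < want) {
        size_t n = want - *done < CHUNK ? want - *done : CHUNK;

        if (*done + n > dst_cap) { /* simulated fault */
            err = EFAULT;
            break;
        }
        memcpy(dst + *done, src + *done, n);
        *done += n;
    }
    /* Partial progress takes precedence over the error. */
    if (err == EFAULT && *done > 0)
        err = 0;
    return (err);
}
```

Because the caller sees a successful short read, it can consume the bytes already copied and ask for the rest, instead of retrying the whole request.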
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
    Closes openzfs#12370
    Closes openzfs#12509
    Closes openzfs#12516
    rincebrain authored and behlendorf committed Aug 2, 2022
    5c56591
  2. Revert behavior of 59eab10 on not-Linux

    It turns out that short-circuiting the EFAULT behavior on a short read
    breaks things on FreeBSD. So until there's a nicer solution, let's
    just revert the behavior for not-Linux.
    
    Reference:
    https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
    Closes openzfs#12698
    rincebrain authored and behlendorf committed Aug 2, 2022
    035ee62
  3. Fix checkstyle warning: E275 missing whitespace after keyword

    Reviewed-by: George Melikov <mail@gmelikov.ru>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
    Closes openzfs#13710
    mcmilk authored and behlendorf committed Aug 2, 2022
    b06aff1

Commits on Aug 8, 2022

  1. Fix problem with zdb -d

    zdb -d <pool>/<objset ID> does not work when
    other command line arguments are included, e.g.
    zdb -U <cachefile> -d <pool>/<objset ID>
    This change fixes the command line parsing
    to handle this situation.  Also fix the issue
    where zdb -r <dataset> <file> does not handle
    the root <dataset> of the pool. Introduce the -N
    option to force <objset ID> to be interpreted
    as a numeric objsetID.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
    Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
    Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
    Closes openzfs#12845
    Closes openzfs#12944
    PaulZ-98 authored and behlendorf committed Aug 8, 2022
    fcbddc7

Commits on Aug 9, 2022

  1. Linux 5.19 compat: META

    Update the META file to reflect compatibility with the 5.19 kernel.
    
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13715
    behlendorf committed Aug 9, 2022
    57e1052
  2. Linux 5.20 compat: bdevname()

    As of the Linux 5.20 kernel bdevname() has been removed; all
    callers should use snprintf() and the "%pg" format specifier.
    
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13728
    behlendorf committed Aug 9, 2022
    58571ba
  3. Linux 5.20 compat: blk_cleanup_disk()

    As of the Linux 5.20 kernel blk_cleanup_disk() has been removed;
    all callers should use put_disk().
    
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes openzfs#13728
    behlendorf committed Aug 9, 2022
    4063d7b
  4. Linux 6.0 compat: register_shrinker() now var-arg

    The 6.0 kernel added a printf-style var-arg for args > 0 to the
    register_shrinker function, in order to add names to shrinkers, in
    commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
    shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.
    
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Coleman Kane <ckane@colemankane.org>
    Closes openzfs#13748
    ckane authored and behlendorf committed Aug 9, 2022
    e0dbab1
  5. Fix problem with zdb_objset_id test.

    Use large numbers for datasets with
    numeric names to avoid name and id
    collisions.
    
    Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
    PaulZ-98 authored and behlendorf committed Aug 9, 2022
    db5fd16

Commits on Aug 12, 2022

  1. arcstat: fix -p option

    When the -p option is used, a list of floats is passed to sep.join(),
    which expects strings. Fix this by converting each value to a string.
    
    Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Roberto Ricci <ricci@disroot.org>
    Closes openzfs#12916 
    Closes openzfs#13767
    r-ricci authored and behlendorf committed Aug 12, 2022
    533779f
  2. contrib: dracut: zfs-snapshot-bootfs: exit status fix

    When the zfs-snapshot-bootfs service attempts to create a snapshot
    that already exists, the exit status of the command is non-zero and
    the service reports failed to the systemd service manager. This is a
    common occurrence if bootfs.snapshot is left set on the kernel command
    line and it should not be considered a failure.
    
    This service was originally set to ignore this error by prefixing
    the command with - on the ExecStart line, but the leading - appears
    to have been dropped in openzfs#13359.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
    Closes openzfs#13769
    gregory-lee-bartholomew authored and behlendorf committed Aug 12, 2022
    979fd5a

Commits on Sep 13, 2022

  1. Fix use-after-free in btree code

    Coverity static analysis found these.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Neal Gompa <ngompa@datto.com>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#10989
    Closes openzfs#13861
    ryao authored and tonyhutter committed Sep 13, 2022
    8131a96

Commits on Sep 14, 2022

  1. rpm: Use the correct version-release information in dependencies

    This tightly links the subpackages together and ensures that everything
    is upgraded together.
    
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Neal Gompa <ngompa@datto.com>
    Closes openzfs#13489
    Conan-Kudo authored and tonyhutter committed Sep 14, 2022
    e1b49e3
  2. rpm: Silence "unversioned Obsoletes" warnings on EL 9

    Get rid of RPM warnings on AlmaLinux 9:
    
    "It's not recommended to have unversioned Obsoletes"
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes openzfs#13584
    Closes openzfs#13638
    tonyhutter committed Sep 14, 2022
    f48d9b4
  3. zed: Ignore false 'atari' partitions in autoreplace

    libudev will sometimes falsely identify an 'atari' partition on a
    blank disk, preventing it from being used in an autoreplace.  This
    seems to be a known issue.  The workaround is to just ignore the
    fake partition and continue with the autoreplace.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes openzfs#13497
    Closes openzfs#13632
    tonyhutter committed Sep 14, 2022
    acd7464
  4. zed: Look for NVMe DEVPATH if no ID_BUS

    We tried replacing an NVMe drive using autoreplace, only
    to see zed reject it with:
    
    zed[27955]: zed_udev_monitor: /dev/nvme5n1 no devid source
    
    This happened because ZED saw that ID_BUS was not set by udev
    for the NVMe drive, and thus didn't think it was a "real drive".
    This commit allows NVMe drives to be autoreplaced even if
    ID_BUS is not set.
    
    Reviewed-by: Don Brady <don.brady@intel.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes openzfs#13512
    Closes openzfs#13646
    tonyhutter committed Sep 14, 2022
    65f8f92
  5. ZTS: Fix zpool_expand_001_pos

    `zpool_expand_001_pos` was often failing due to not seeing autoexpand
    commands in the `zpool history`.  During testing, I found this to be
    unreliable (sometimes the "online" wouldn't appear in `zpool history`)
    and unnecessary, as we could simply check that the pool increased in
    size.
    
    This commit revamps the test to check for the expanded pool size
    and corresponding new free space.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes openzfs#13743
    tonyhutter committed Sep 14, 2022
    b1be0a5
  6. Importing from cachefile can trip assertion

    When importing from cachefile, it is possible that the builtin retry
    logic will trip an assertion because it also fails to find the pool.
    This fix addresses that case and returns the correct error message to
    the user.
    
    Reviewed-by: Richard Yao <ryao@gentoo.org>
    Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: George Wilson <gwilson@delphix.com>
    Closes openzfs#13781
    grwilson authored and tonyhutter committed Sep 14, 2022
    15b64fb
  7. Apply arc_shrink_shift to ARC above arc_c_min

    It makes sense to free memory in smaller chunks when approaching
    arc_c_min to let other kernel subsystems free more, since after
    that point we can't free anything.  This also matches behavior on
    Linux, where the shrinker reports only the size above arc_c_min.
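
The arithmetic behind this can be sketched as follows. It is an illustration only, not the code from this commit: `shrink_step` and its parameter names are hypothetical, and the shift value used in any call is just an example.

```c
#include <stdint.h>

/*
 * Bytes to free in one shrink step. Only the portion of the cache
 * above the floor (c_min) is scaled by the shift, so the steps get
 * smaller as the cache size approaches c_min.
 */
static uint64_t
shrink_step(uint64_t arc_size, uint64_t c_min, unsigned shrink_shift)
{
    if (arc_size <= c_min)
        return (0); /* nothing above the floor to free */
    return ((arc_size - c_min) >> shrink_shift);
}
```

With a floor of 512 and a shift of 7, a cache of 1024 gives a step of 4, while a cache of 640 gives a step of 1.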
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Allan Jude <allan@klarasystems.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Closes openzfs#13794
    amotin authored and tonyhutter committed Sep 14, 2022
    b6ebf27
  8. FreeBSD: Mark ZFS_MODULE_PARAM_CALL as MPSAFE

    ZFS_MODULE_PARAM_CALL handlers implement their own locking if needed
    and do not require Giant.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
    Closes openzfs#13756
    Ryan Moeller authored and tonyhutter committed Sep 14, 2022
    78206a2
  9. Fix column width in 'zpool iostat -v' and 'zpool list -v'

    This commit fixes a minor spacing issue caused when
    enumerating vdev names, which originated from openzfs#13031.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Akash B <akash-b@hpe.com>
    Signed-off-by: Samuel Wycliffe <samuelwycliffe@gmail.com>
    Closes openzfs#13811
    npc203 authored and tonyhutter committed Sep 14, 2022
    aa9e887
  10. Add xattr_handler support for Android kernels

    Some ARM BSPs run the Android kernel, which has
    a modified xattr_handler->get() function signature.
    This adds support to compile against these kernels.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Walter Huf <hufman@gmail.com>
    Closes openzfs#13824
    hufman authored and tonyhutter committed Sep 14, 2022
    2010c18
  11. zed: Fix config_sync autoexpand flood

    Users were seeing floods of `config_sync` events when autoexpand was
    enabled.  This happened because all "disk status change" udev events
    invoke the autoexpand codepath, which calls zpool_relabel_disk(),
    which in turn causes another "disk status change" event to happen,
    in a feedback loop.  Note that "disk status change" happens every time
    a user calls close() on a block device.
    
    This commit breaks the feedback loop by only allowing an autoexpand
    to happen if the disk actually changed size.
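
The idea of gating on a size change can be sketched like this. It is a simplified model, not zed's implementation: `struct disk_state` and `should_autoexpand` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-disk record kept between udev events. */
struct disk_state {
    uint64_t last_size; /* size seen at the previous event */
};

/*
 * Return true only when the reported size differs from the last one
 * recorded, and remember the new size. An event that leaves the size
 * unchanged (for example one triggered by close()) is ignored, so a
 * relabel can no longer trigger another expand.
 */
static bool
should_autoexpand(struct disk_state *ds, uint64_t new_size)
{
    if (ds->last_size == new_size)
        return (false);
    ds->last_size = new_size;
    return (true);
}
```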
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes: openzfs#7132
    Closes: openzfs#7366
    Closes openzfs#13729
    tonyhutter committed Sep 14, 2022
    7bbfac9
  12. config: check for parallel(1), use it for cstyle

    Before:
    $ time make cstyle
    real    0m23.118s
    user    0m23.002s
    sys     0m0.114s
    
    After:
    $ time make cstyle
    real    0m4.577s
    user    0m31.487s
    sys     0m0.699s
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Issue openzfs#12899
    nabijaczleweli authored and tonyhutter committed Sep 14, 2022
    c8f795b
  13. Introduce a tunable to exclude special class buffers from L2ARC

    Special allocation class or dedup vdevs may have roughly the same
    performance as L2ARC vdevs. Introduce a new tunable to exclude those
    buffers from being cacheable on L2ARC.
    
    Reviewed-by: Don Brady <don.brady@delphix.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: George Amanakis <gamanakis@gmail.com>
    Closes openzfs#11761
    Closes openzfs#12285
    gamanakis authored and tonyhutter committed Sep 14, 2022
    8bd3dca

Commits on Sep 15, 2022

  1. Add physical device size to SIZE column in 'zpool list -v'

    Add physical device size/capacity only for physical devices in
    'zpool list -v' instead of displaying "-" in the SIZE column.
    This would make it easier to see the individual device capacity and
    to determine which spares are large enough to replace which devices.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
    Signed-off-by: Akash B <akash-b@hpe.com>
    Closes openzfs#12561
    Closes openzfs#13106
    akashb-22 authored and tonyhutter committed Sep 15, 2022
    03fa3ef
  2. vdev_draid_lookup_map() should not iterate outside draid_maps

    Coverity reported this as an out-of-bounds read.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Neal Gompa <ngompa@datto.com>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13865
    ryao authored and tonyhutter committed Sep 15, 2022
    3f7c174
  3. make DMU_OT_IS_METADATA and DMU_OT_IS_ENCRYPTED return B_TRUE or B_FALSE

    Without this patch, the
    
        ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf));
    
    at the beginning of dbuf_assign_arcbuf can panic
    if the object type is a DMU_OT_NEWTYPE that has
    DMU_OT_METADATA set.
    
    While we're at it, fix DMU_OT_IS_ENCRYPTED as well.
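
Why a raw mask can trip an equality assertion is easy to show in isolation. The sketch below uses made-up macro names and assumed flag values; it is not the ZFS header, only a model of the pattern.

```c
/* Illustrative flag bits; the values here are assumptions. */
#define OT_NEWTYPE  0x80
#define OT_METADATA 0x40

/* Raw mask: yields 0 or 0x40, which is truthy but not equal to 1. */
#define OT_IS_METADATA_RAW(ot)  ((ot) & OT_METADATA)

/* Normalized: yields exactly 0 or 1, safe to compare with ==. */
#define OT_IS_METADATA_BOOL(ot) (((ot) & OT_METADATA) != 0)

/* Stand-in for a second predicate that already returns 0 or 1. */
static int
other_is_metadata(int ot)
{
    return ((ot & OT_METADATA) ? 1 : 0);
}
```

Both forms behave the same inside an `if`, but only the normalized one compares equal to another predicate that returns 0 or 1.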
    
    Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
    Closes openzfs#13842
    problame authored and tonyhutter committed Sep 15, 2022
    cde04ba

Commits on Sep 19, 2022

  1. zfs recv hangs if max recordsize is less than received recordsize

    - Some optimizations for bqueue enqueue/dequeue.
    - Added a fix to prevent deadlock when both bqueue_enqueue_impl()
    and bqueue_dequeue() wait for a signal to be triggered.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
    Closes openzfs#13855
    ixhamza authored and tonyhutter committed Sep 19, 2022
    a5b0d42

Commits on Sep 21, 2022

  1. include: move SPA_MINBLOCKSHIFT and zio_encrypt to sys/fs/zfs.h

    These are used by userspace, so should live in a public header.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Closes openzfs#12116
    nabijaczleweli authored and tonyhutter committed Sep 21, 2022
    faa1e40
  2. zfs recv hangs if max recordsize is less than received recordsize

    - Some optimizations for bqueue enqueue/dequeue.
    - Added a fix to prevent deadlock when both bqueue_enqueue_impl()
    and bqueue_dequeue() wait for a signal to be triggered.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
    Closes openzfs#13855
    ixhamza authored and tonyhutter committed Sep 21, 2022
    d5105f0
  3. Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive

    For encrypted raw receive, objset creation is delayed until a call to
    dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be
    populated when calling zpl_earlier_version(). To correctly handle the
    ZFS_PROP_SHARESMB property for encrypted raw receive, this change
    delays setting the property.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
    Closes openzfs#13878
    ixhamza authored and tonyhutter committed Sep 21, 2022
    035e52f
  4. Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c

    We pass sizeof (struct redact_record *) rather than sizeof (struct
    redact_record). Passing the pointer size is wrong.
    
    Coverity caught this in two places.
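
The difference between the two sizes can be seen in a standalone sketch. `struct record` below is a hypothetical stand-in; the real struct redact_record has a different layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type used only for this illustration. */
struct record {
    uint64_t start;
    uint64_t end;
    uint64_t flags;
};

/* Size of a pointer to the record: 4 or 8 bytes on common ABIs. */
static size_t
size_of_pointer(void)
{
    return (sizeof (struct record *));
}

/* Size of the record itself: the value a size accounting call wants. */
static size_t
size_of_record(void)
{
    return (sizeof (struct record));
}
```

A queue that tracks its fill level in bytes would undercount every element if it were handed the pointer size.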
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13885
    ryao authored and tonyhutter committed Sep 21, 2022
    5096ed3
  5. Add zfs_btree_verify_intensity kernel module parameter

    I see a few issues in the issue tracker that might be aided by being
    able to turn this on. We have no module parameter for it, so I would
    like to add one.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13874
    ryao authored and tonyhutter committed Sep 21, 2022
    b66f8d3
  6. Revert "Reduce dbuf_find() lock contention"

    This reverts commit 34dbc61.  While this
    change resolved the lock contention observed for certain workloads, it
    inadvertently reduced the maximum hash inserts/removes per second.  This
    appears to be due to the slightly higher acquisition cost of a rwlock vs
    a mutex.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    behlendorf authored and tonyhutter committed Sep 21, 2022
    91e0215
  7. Optimize txg_kick() process (openzfs#12274)

    Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total as in
    the original code. An extra parameter "txg" is added to txg_kick() so
    that it knows which txg to kick. The txg_kick() call is also moved from
    dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we
    know the txg number assigned for txg_kick().
    
    Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is
    also cleaned up.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: jxdking <lostking2008@hotmail.com>
    Closes openzfs#12274
    jxdking authored and tonyhutter committed Sep 21, 2022
    999830a
  8. Add Module Parameter Regarding Log Size Limit

    zfs_wrlog_data_max
    The upper limit of TX_WRITE log data. Once it is reached,
    write operations are blocked until the log data is cleared out
    after txg sync. It only counts TX_WRITE log records with WR_COPIED
    or WR_NEED_COPY.
    
    Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: jxdking <lostking2008@hotmail.com>
    Closes openzfs#12284
    jxdking authored and tonyhutter committed Sep 21, 2022
    d05f303
  9. Ask libtool to stop hiding some errors

    For openzfs#13083, curiously, it did not print the actual error, just
    that the compile failed with "Error 1".
    
    In theory, this flag should cause it to report errors twice sometimes.
    In practice, I'm pretty okay with reporting some twice if it avoids
    reporting some never.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
    Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
    Closes openzfs#13086
    rincebrain authored and tonyhutter committed Sep 21, 2022
    ebbbe01
  10. Improve too large physical ashift handling

    When iterating through children physical ashifts for vdev, prefer
    ones above the maximum logical ashift, that we can actually use,
    but within the administrator defined maximum.
    
    When selecting the top-level vdev ashift, do not set it to the defined
    maximum when the physical ashift is even higher; ignore the physical
    ashift instead.  Using the maximum does not prevent misaligned writes,
    but reduces space efficiency.  Since ZFS tries to write data
    sequentially and aggregates the writes, in many cases large misaligned
    writes may not be as bad as the space penalty otherwise.
    
    Allow internal physical ashifts for vdevs higher than SHIFT_MAX.
    Maybe one day the allocator or aggregation could benefit from that.
    
    Reduce zfs_vdev_max_auto_ashift default from 16 (64KB) to 14 (16KB),
    so that ZFS may still use bigger ashifts up to SHIFT_MAX (64KB),
    but only if it really has to or explicitly told to, but not as an
    "optimization".
    
    There are some read-intensive NVMe SSDs that report a Preferred Write
    Alignment of 64KB, and an attempt to build a RAIDZ2 of those leads to
    a space inefficiency that can't be justified.  Instead these changes
    make ZFS fall back to a logical ashift of 12 (4KB) by default and
    only warn the user that it may be suboptimal for performance.
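
For reference, the sizes quoted above follow from the ashift being the base-2 logarithm of the block size. The helper below is a hypothetical illustration, not a function from the ZFS sources.

```c
#include <stdint.h>

/* Block size in bytes for a given ashift (log2 of the size). */
static uint64_t
ashift_to_bytes(unsigned ashift)
{
    return (UINT64_C(1) << ashift);
}
```

So an ashift of 12 is 4096 bytes (4KB), 14 is 16384 bytes (16KB), and 16 is 65536 bytes (64KB).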
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by: iXsystems, Inc.
    Closes openzfs#13798
    amotin authored and tonyhutter committed Sep 21, 2022
    44cec45

Commits on Sep 26, 2022

  1. Refactor Log Size Limit

    The original Log Size Limit implementation blocked all writes once the
    limit was reached, until the TXG was committed and the log was freed.
    This caused huge delays followed by speed spikes in application writes.
    
    This implementation instead smoothly throttles writes, using exactly
    the same mechanism as used for dirty data.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: jxdking <lostking2008@hotmail.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Issue openzfs#12284
    Closes openzfs#13476
    amotin authored and tonyhutter committed Sep 26, 2022
    33223cb

Commits on Sep 27, 2022

  1. Linux: Fix uninitialized variable usage in zio_do_crypt_data()

    Coverity complained about this. An error from `hkdf_sha512()` before uio
    initialization will cause pointers to uninitialized memory to be passed
    to `zio_crypt_destroy_uio()`. This is a regression that was introduced
    by cf63739. Interestingly, this never
    affected FreeBSD, since the FreeBSD version never had that patch ported.
    Since moving uio initialization to the top of this function would slow
    down the qat_crypt() path, we only move the `memset()` calls to the top
    of the function. This is sufficient to fix this problem.
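
The pattern of zeroing early so that every exit path can run the same cleanup can be sketched in userspace C. Everything here is hypothetical (`struct fake_uio`, `fake_uio_destroy`, `process`); it models the shape of the fix, not zio_do_crypt_data() itself, and it assumes the usual platforms where all-zero bytes represent a null pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a uio that owns one allocation. */
struct fake_uio {
    void *iov;
};

/* Cleanup that is safe on a zeroed struct: free(NULL) is a no-op. */
static void
fake_uio_destroy(struct fake_uio *u)
{
    free(u->iov);
    u->iov = NULL;
}

/*
 * Zero the structs first, so that every exit path, including an
 * early failure, hands well-defined contents to the cleanup code.
 * Returns 0 on success and -1 on failure.
 */
static int
process(int fail_early)
{
    struct fake_uio in, out;
    int ret = 0;

    memset(&in, 0, sizeof (in));
    memset(&out, 0, sizeof (out));

    if (fail_early) {
        ret = -1; /* error before the uios are set up */
        goto error;
    }
    in.iov = malloc(16);
    out.iov = malloc(16);
    if (in.iov == NULL || out.iov == NULL)
        ret = -1;
error:
    fake_uio_destroy(&in);
    fake_uio_destroy(&out);
    return (ret);
}
```

Only the cheap `memset()` calls move to the top; the more expensive setup stays where it was, which mirrors the trade-off described in the message.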
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Neal Gompa <ngompa@datto.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13944
    ryao authored and tonyhutter committed Sep 27, 2022
    835e036
  2. LUA: Fix CVE-2014-5461

    Apply the fix from upstream.
    
    http://www.lua.org/bugs.html#5.2.2-1
    https://www.opencve.io/cve/CVE-2014-5461
    
    It should be noted that exploiting this requires the `SYS_CONFIG`
    privilege, and anyone with that privilege likely has other opportunities
    to do exploits, so it is unlikely that bad actors could exploit this
    unless system administrators are executing untrusted ZFS Channel
    Programs.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13949
    ryao authored and tonyhutter committed Sep 27, 2022
    c973929

Commits on Sep 28, 2022

  1. FreeBSD: Ignore symlink to i386 includes

    A symlink to i386 includes is created in the build dir on amd64 since
    freebsd/freebsd-src@d07600c
    
    Tell git to ignore it like the other include links.
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
    Closes openzfs#13719
    Ryan Moeller authored and tonyhutter committed Sep 28, 2022
    8dcd6af
  2. FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()

    When reviewing openzfs#13875, I noticed that our FreeBSD code has an issue
    where it converts from `int64_t` to `int` when calling
    `vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 <<
    36`, the int will be 0, since the low bits are 0. Even when some low
    bits are set, a value such as `((1 << 36) + 1)` would truncate to 1,
    which is wrong.
    
    There is protection against this on 32-bit platforms, but on 64-bit
    platforms, there is no check to protect us, so we add a check.
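
The truncation described above can be reproduced in a few lines. This is a minimal sketch, not the code from the commit: `truncate_to_int()` and `clamp_to_int()` are hypothetical helper names, and clamping to `INT_MAX` is an assumed form of the added check.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical illustration of an unchecked int64_t -> int conversion:
 * only the low 32 bits of the value survive.
 */
static int
truncate_to_int(int64_t count)
{
	return ((int)(uint32_t)count);
}

/*
 * Hypothetical guard (an assumption, not the actual OpenZFS change):
 * clamp the 64-bit count into the range of int before passing it to
 * an interface that takes an int.
 */
static int
clamp_to_int(int64_t count)
{
	if (count > INT_MAX)
		return (INT_MAX);
	if (count < INT_MIN)
		return (INT_MIN);
	return ((int)count);
}
```

With these helpers, `truncate_to_int((int64_t)1 << 36)` yields 0 and `truncate_to_int(((int64_t)1 << 36) + 1)` yields 1, while `clamp_to_int()` returns `INT_MAX` for both inputs.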
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13882
    ryao authored and tonyhutter committed Sep 28, 2022
    55816c6
  3. FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK

    There is an ongoing effort to eliminate this feature.
    
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
    Closes openzfs#13908
    mjguzik authored and tonyhutter committed Sep 28, 2022
    2c8e3e4
  4. FreeBSD: catch up to 1400068

    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
    Closes openzfs#13909
    mjguzik authored and tonyhutter committed Sep 28, 2022
    eec942c
  5. FreeBSD: handle V_PCATCH

    See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e
    
    Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
    Reviewed-by: Alexander Motin <mav@FreeBSD.org>
    Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
    Closes openzfs#13910
    mjguzik authored and tonyhutter committed Sep 28, 2022
    63d4838
  6. zpool: Don't print "repairing" on force faulted drives

    If you force fault a drive that's resilvering, its scan stats can get
    frozen in time, giving the false impression that it's being resilvered.
    This commit checks the vdev state to see if the vdev is healthy before
    reporting "resilvering" or "repairing" in zpool status.
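
The check described above might look roughly like the following. This is a simplified sketch under stated assumptions: the `sketch_*` types and function are hypothetical stand-ins, not the real `zpool` data structures or the code from this commit.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the real vdev states. */
typedef enum {
	SKETCH_VDEV_FAULTED,
	SKETCH_VDEV_DEGRADED,
	SKETCH_VDEV_HEALTHY
} sketch_vdev_state_t;

/* Hypothetical per-vdev summary as a status printer might see it. */
typedef struct {
	sketch_vdev_state_t state;
	bool scan_active;	/* scan stats; may be stale after a fault */
} sketch_vdev_t;

/*
 * Report scan activity ("resilvering" or "repairing") only when the
 * scan stats say so AND the vdev is healthy. A force-faulted vdev can
 * keep stale scan stats, so the stats alone are not trusted.
 */
static bool
sketch_should_report_scan(const sketch_vdev_t *vd)
{
	return (vd->scan_active && vd->state == SKETCH_VDEV_HEALTHY);
}
```

A caller would print the "resilvering" or "repairing" annotation only when this predicate returns true.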
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes openzfs#13927
    Closes openzfs#13930
    tonyhutter committed Sep 28, 2022
    a2705b1

Commits on Sep 29, 2022

  1. Fix bad free in skein code

    Clang's static analyzer found a bad free caused by skein_mac_atomic().
    It will allocate a context on the stack and then pass it to
    skein_final(), which attempts to free it. Upon inspection,
    skein_digest_atomic() also has the same problem.
    
    These functions were created to match the OpenSolaris ICP API, so I was
    curious how we avoided this in other providers and looked at the SHA2
    code. It appears that SHA2 has a SHA2Final() helper function that is
    called by the exported sha2_mac_final()/sha2_digest_final() as well as
    the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work
    is done in SHA2Final() while some checks and the free are done in
    sha2_mac_final()/sha2_digest_final().
    
    We fix the bad free in the skein code by taking inspiration from
    the SHA2 code. We introduce a skein_final_nofree() that does most of the
    work, and make skein_final() into a function that calls it and then
    frees the memory.
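
The split between a non-freeing worker and a freeing wrapper can be sketched as follows. All `sketch_*` names are hypothetical, and the XOR folding is a placeholder for the real hashing; this is not the Skein implementation or the ICP API, only the ownership pattern the message describes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical context; the real Skein context is far larger. */
typedef struct {
	unsigned char state[8];
} sketch_ctx_t;

/*
 * Does the real work of finalizing. It never frees ctx, so it is safe
 * to call on a context that lives on the caller's stack.
 */
static void
sketch_final_nofree(sketch_ctx_t *ctx, unsigned char *out, size_t len)
{
	if (len > sizeof (ctx->state))
		len = sizeof (ctx->state);
	memcpy(out, ctx->state, len);
	memset(ctx, 0, sizeof (*ctx));
}

/* For heap-allocated contexts only: finalize, then release the memory. */
static void
sketch_final(sketch_ctx_t *ctx, unsigned char *out, size_t len)
{
	sketch_final_nofree(ctx, out, len);
	free(ctx);
}

/*
 * One-shot ("atomic") operation. The context is on the stack, so this
 * must use the nofree variant; calling sketch_final() here would pass
 * a stack address to free(), which is the bug class being fixed.
 */
static void
sketch_digest_atomic(const unsigned char *in, size_t inlen,
    unsigned char *out, size_t outlen)
{
	sketch_ctx_t ctx;

	memset(&ctx, 0, sizeof (ctx));
	/* Placeholder "hash": XOR-fold the input into the state. */
	for (size_t i = 0; i < inlen; i++)
		ctx.state[i % sizeof (ctx.state)] ^= in[i];
	sketch_final_nofree(&ctx, out, outlen);
}
```

Keeping the free in a thin wrapper means the ownership decision sits with whichever caller allocated the context.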
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
    Closes openzfs#13954
    ryao authored and tonyhutter committed Sep 29, 2022
    566e908
  2. Tag zfs-2.1.6

    META file and changelog updated.
    
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    tonyhutter committed Sep 29, 2022
    6a6bd49