Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup: 64-bit kernel module parameters should use fixed width types #14004

Closed
wants to merge 2 commits into from

Conversation

ryao
Copy link
Contributor

@ryao ryao commented Oct 8, 2022

Motivation and Context

Various module parameters such as zfs_arc_max were originally uint64_t on OpenSolaris/Illumos, but were changed to unsigned long for Linux compatibility because Linux's kernel default module parameter implementation did not support 64-bit types on 32-bit platforms. This caused problems when porting OpenZFS to Windows because its LLP64 memory model made unsigned long a 32-bit type on 64-bit, which created the undesirable situation that parameters that should accept 64-bit values could not on 64-bit Windows.

Description

Upon inspection, it turns out that the Linux kernel module parameter interface is extensible, such that we are allowed to define our own types. Rather than maintaining the original type change via hacks to to continue shrinking module parameters on 32-bit Linux, we implement support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man page and also proposed changes by Andrew Innes), the kernel module parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

  • dbuf_cache_max_bytes
  • dbuf_metadata_cache_max_bytes
  • l2arc_feed_min_ms
  • l2arc_feed_secs
  • l2arc_headroom
  • l2arc_headroom_boost
  • l2arc_write_boost
  • l2arc_write_max
  • metaslab_aliquot
  • metaslab_force_ganging
  • zfetch_array_rd_sz
  • zfs_arc_max
  • zfs_arc_meta_limit
  • zfs_arc_meta_min
  • zfs_arc_min
  • zfs_async_block_max_blocks
  • zfs_condense_max_obsolete_bytes
  • zfs_condense_min_mapping_bytes
  • zfs_deadman_checktime_ms
  • zfs_deadman_synctime_ms
  • zfs_initialize_chunk_size
  • zfs_initialize_value
  • zfs_lua_max_instrlimit
  • zfs_lua_max_memlimit
  • zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

  • zfs_per_txg_dirty_frees_percent

Parameters that were originally ssize_t on Illumos:

  • zfs_immediate_write_sz

Note that ssize_t is int32_t on 32-bit and int64_t on 64-bit. It has been upgraded to 64-bit.

Parameters that were long/unsigned long because of Linux/FreeBSD influence:

  • l2arc_rebuild_blocks_min_l2size
  • zfs_key_max_salt_uses
  • zfs_max_log_walking
  • zfs_max_logsm_summary_length
  • zfs_metaslab_max_size_cache_sec
  • zfs_min_metaslabs_to_flush
  • zfs_multihost_interval
  • zfs_unflushed_log_block_max
  • zfs_unflushed_log_block_min
  • zfs_unflushed_log_block_pct
  • zfs_unflushed_max_mem_amt
  • zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

  • l2arc_trim_ahead
  • vdev_file_logical_ashift
  • vdev_file_physical_ashift
  • zfs_arc_dnode_limit
  • zfs_arc_dnode_limit_percent
  • zfs_arc_dnode_reduce_percent
  • zfs_arc_meta_limit_percent
  • zfs_arc_sys_free
  • zfs_deadman_ziotime_ms
  • zfs_delete_blocks
  • zfs_history_output_max
  • zfs_livelist_max_entries
  • zfs_max_async_dedup_frees
  • zfs_max_nvlist_src_size
  • zfs_rebuild_max_segment
  • zfs_rebuild_vdev_limit
  • zfs_unflushed_log_txg_max
  • zfs_vdev_max_auto_ashift
  • zfs_vdev_min_auto_ashift
  • zfs_vnops_read_chunk_size
  • zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the use of ulong/long is not an issue for portability, so we leave them alone:

  • zfs_delete_blocks
  • zfs_key_max_salt_uses
  • zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

  • zfs_deadman_checktime_ms - incorrectly documented as int
  • zfs_delete_blocks - not documented as Linux only
  • zfs_history_output_max - incorrectly documented as int
  • zfs_vnops_read_chunk_size - incorrectly documented as long
  • zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held ashift values, so being 64-bit never made sense for them. They have been downgraded to 32-bit:

  • vdev_file_logical_ashift
  • vdev_file_physical_ashift
  • zfs_arc_dnode_limit_percent
  • zfs_arc_dnode_reduce_percent
  • zfs_arc_meta_limit_percent
  • zfs_per_txg_dirty_frees_percent
  • zfs_unflushed_log_block_pct
  • zfs_vdev_max_auto_ashift
  • zfs_vdev_min_auto_ashift

Of special note are zfs_vdev_max_auto_ashift and zfs_vdev_min_auto_ashift, which were already defined as uint64_t, and passed to the kernel as ulong. This is inherently buggy on big endian 32-bit Linux, since the values would not be written to the correct locations. 32-bit FreeBSD was unaffected because its sysctl code correctly treated this as a uint64_t.

Lastly, a code comment suggests that zfs_arc_sys_free is Linux-specific, but there is nothing to indicate to me that it is Linux-specific. Nothing was done about that.

How Has This Been Tested?

A build test has been done.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

module/os/freebsd/zfs/sysctl_os.c Outdated Show resolved Hide resolved
@lundman
Copy link
Contributor

lundman commented Oct 9, 2022

Perfect, this is the dream right here. How I should have fixed it, if I weren't wary of changing Linux.

@ryao ryao force-pushed the linux-module-param branch 3 times, most recently from 866ff7b to d64cf95 Compare October 9, 2022 00:06
@ryao
Copy link
Contributor Author

ryao commented Oct 9, 2022

@lundman Unfortunately, there are a few Linux build failures from ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST not giving us the right value. Then, when I observe that the headers should no longer require that since they appear to always set const and strip it out, I get a whole new set of build failures from setting const where I should not. I just tried a compromise. Hopefully, the buildbot will be happy with it. If not, I will keep tinkering with it until I get it working. It should not take long. :)

@ryao ryao force-pushed the linux-module-param branch 2 times, most recently from 7dd56e2 to c823ed2 Compare October 9, 2022 00:42
@ryao ryao force-pushed the linux-module-param branch 5 times, most recently from d56d0ab to 738094a Compare October 9, 2022 01:41
@ryao
Copy link
Contributor Author

ryao commented Oct 9, 2022

I figured out the problem.

On older kernels, module_param_call typecasts the function pointers to (void *), which causes the autotools check to fail when it should not. This went undetected since the end result compiles to the same thing no matter whether it is const or not. However, since this patch began using struct kernel_param_ops, the defective autotools check finally caused a problem.

Then, when we constify things like we should, we trip -Werror because some older kernels define a __check_old_set_param() that checks for a non-constified ->set, which breaks the build process.

Furthermore, the autotools check is obsolete, since all kernels that we support constify struct kernel_param_ops.

I believe that I have fixed this headache. I have broken out the changes to fix it into a separate commit, since there are a logically separate change from the one that this PR was intended to do. However, the commit is included in this PR since it is a dependency.

@ryao
Copy link
Contributor Author

ryao commented Oct 9, 2022

This closes #13984.

@ryao ryao force-pushed the linux-module-param branch 3 times, most recently from 0acf067 to 8fd7af2 Compare October 9, 2022 03:21
On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
@ryao ryao force-pushed the linux-module-param branch 2 times, most recently from 1faf2c2 to f8f4f06 Compare October 9, 2022 03:34
module/zfs/metaslab.c Outdated Show resolved Hide resolved
@ryao ryao force-pushed the linux-module-param branch 2 times, most recently from 92053bd to 13e9aaf Compare October 9, 2022 17:46
@ryao ryao force-pushed the linux-module-param branch 3 times, most recently from c14efa4 to e0702e2 Compare October 9, 2022 18:34
module/zfs/vdev.c Show resolved Hide resolved
module/zfs/arc.c Outdated Show resolved Hide resolved
@ryao ryao force-pushed the linux-module-param branch 2 times, most recently from e4d5ec2 to b0eebfc Compare October 9, 2022 23:18
@ryao
Copy link
Contributor Author

ryao commented Oct 10, 2022

Of special note are zfs_vdev_max_auto_ashift and zfs_vdev_min_auto_ashift, which were already defined as uint64_t, and passed to the kernel as ulong. This is inherently buggy on big endian 32-bit Linux/FreeBSD, since the values would not be written to the correct locations.

Upon review, this was only an issue on Linux. The FreeBSD code properly handled it as uint64_t. This is a note to myself to update the commit message when I am next at my machine.

Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesireable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.

Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

 * dbuf_cache_max_bytes
 * dbuf_metadata_cache_max_bytes
 * l2arc_feed_min_ms
 * l2arc_feed_secs
 * l2arc_headroom
 * l2arc_headroom_boost
 * l2arc_write_boost
 * l2arc_write_max
 * metaslab_aliquot
 * metaslab_force_ganging
 * zfetch_array_rd_sz
 * zfs_arc_max
 * zfs_arc_meta_limit
 * zfs_arc_meta_min
 * zfs_arc_min
 * zfs_async_block_max_blocks
 * zfs_condense_max_obsolete_bytes
 * zfs_condense_min_mapping_bytes
 * zfs_deadman_checktime_ms
 * zfs_deadman_synctime_ms
 * zfs_initialize_chunk_size
 * zfs_initialize_value
 * zfs_lua_max_instrlimit
 * zfs_lua_max_memlimit
 * zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

 * zfs_per_txg_dirty_frees_percent

Parameters that were originally `ssize_t` on Illumos:

 * zfs_immediate_write_sz

Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.

Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:

 * l2arc_rebuild_blocks_min_l2size
 * zfs_key_max_salt_uses
 * zfs_max_log_walking
 * zfs_max_logsm_summary_length
 * zfs_metaslab_max_size_cache_sec
 * zfs_min_metaslabs_to_flush
 * zfs_multihost_interval
 * zfs_unflushed_log_block_max
 * zfs_unflushed_log_block_min
 * zfs_unflushed_log_block_pct
 * zfs_unflushed_max_mem_amt
 * zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

 * l2arc_trim_ahead
 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_arc_sys_free
 * zfs_deadman_ziotime_ms
 * zfs_delete_blocks
 * zfs_history_output_max
 * zfs_livelist_max_entries
 * zfs_max_async_dedup_frees
 * zfs_max_nvlist_src_size
 * zfs_rebuild_max_segment
 * zfs_rebuild_vdev_limit
 * zfs_unflushed_log_txg_max
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift
 * zfs_vnops_read_chunk_size
 * zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:

 * zfs_delete_blocks
 * zfs_key_max_salt_uses
 * zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

 * zfs_deadman_checktime_ms - incorrectly documented as int
 * zfs_delete_blocks - not documented as Linux only
 * zfs_history_output_max - incorrectly documented as int
 * zfs_vnops_read_chunk_size - incorrectly documented as long
 * zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:

 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_per_txg_dirty_frees_percent
 * zfs_unflushed_log_block_pct
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift

Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.

Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.

Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
@ryao
Copy link
Contributor Author

ryao commented Oct 10, 2022

The buildbot is now happy with this. I also did many iterations of self review, which caught plenty of tiny flaws. It is at the point where there is nothing else wrong with it that I can find.

My only lingering concern is that the buildbot did not do a FreeBSD build and ZTS run to verify my work on FreeBSD.

@ryao
Copy link
Contributor Author

ryao commented Oct 10, 2022

@lundman @amotin I have revised this patch somewhat since you reviewed it. You can see the changes here:

https://github.com/openzfs/zfs/compare/6a8f69c82fe114d17113662c1516f4d67ecf9ea4..4e71b23f961df070cebef6b4dd075fd4ab5db7f7

Most of it is self explanatory, except for an accidental change to the value of zfs_unflushed_log_block_max that the above diff shows as being changed back to the original value. That had been introduced because I had leaned on #13984 to save time changing the types, #13984 included an accidental change to a default parameter value and I missed it in my initial review prior to publishing the PR. In specific, I took the original patch, deleted the hunks that did not touch a variable type, did a few sed commands until it properly changed the types to uint64_t/int64_t and then applied it. Afterward, I then audited all of the kernel module parameters both specified in that patch and specified in the man page anyway. I missed the value change until reviewing the patch on github, which made it much more obvious thanks to its intra-line change highlighting. I had missed that in my initial review of #13984.

My reuse of that patch to save some time (probably 30 minutes) would also be why I added Original-patch-by:, as while this was a complete rewrite, I started with that one to save some time on my part. In hindsight, the time savings was negligible since I ended up spending at least 8 hours on this. 4 were spent before the PR was made and 4 were spent after it was made on an excessive amount of self-review, plus revisions based on buildbot feedback. I likely have spent more time on this PR than I have spent on any other PR in the past month.

Lastly, the revisions included a change to how I wrote the Linux SPL code. In theory, the revision makes it easy to add support on the Linux side for s32/u32, s16/u16 and s8/u8 module parameter types simply by adding a few lines to module/os/linux/spl/spl-generic.c and include/os/linux/kernel/linux/mod_compat.h:

define_spl_param(s32, "%d")
define_spl_param(u32, "%u")
define_spl_param(s16, "%hd")
define_spl_param(u16, "%hu")
define_spl_param(s8, "%hhd")
define_spl_param(u8, "%hhu")
int spl_param_set_s32(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_s32(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_s32;
#define	spl_param_ops_S32 spl_param_ops_s32

extern int spl_param_set_u32(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_u32(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_u32;
#define	spl_param_ops_U32 spl_param_ops_u32

int spl_param_set_s16(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_s16(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_s16;
#define	spl_param_ops_S16 spl_param_ops_s16

extern int spl_param_set_u16(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_u16(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_u16;
#define	spl_param_ops_U16 spl_param_ops_u16

int spl_param_set_s8(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_s8(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_s8;
#define	spl_param_ops_S8 spl_param_ops_s8

extern int spl_param_set_u8(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_u8(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_u8;
#define	spl_param_ops_U8 spl_param_ops_u8

That is left for a follow-up patch, provided that we find a reason to do it. Also, the FreeBSD side looks like it already supports this if I am not mistaken.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked down to every parameter, but generally still looks good to me.

@ryao
Copy link
Contributor Author

ryao commented Oct 10, 2022

That is expected. Much of the patch is unchanged. There were just little fixes done in various places to things that came up afterward on the buildbot and in self review. :)

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Oct 12, 2022
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built and loaded on FreeBSD stable/13 and main. I'll update if the tests fail, but things are looking good at a glance. Gotta love jumping through the hoops to get CPP to do something useful.

@behlendorf
Copy link
Contributor

I've resubmitted this PR so we can get a clean run on FreeBSD.

behlendorf pushed a commit that referenced this pull request Oct 13, 2022
Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesireable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.

Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

 * dbuf_cache_max_bytes
 * dbuf_metadata_cache_max_bytes
 * l2arc_feed_min_ms
 * l2arc_feed_secs
 * l2arc_headroom
 * l2arc_headroom_boost
 * l2arc_write_boost
 * l2arc_write_max
 * metaslab_aliquot
 * metaslab_force_ganging
 * zfetch_array_rd_sz
 * zfs_arc_max
 * zfs_arc_meta_limit
 * zfs_arc_meta_min
 * zfs_arc_min
 * zfs_async_block_max_blocks
 * zfs_condense_max_obsolete_bytes
 * zfs_condense_min_mapping_bytes
 * zfs_deadman_checktime_ms
 * zfs_deadman_synctime_ms
 * zfs_initialize_chunk_size
 * zfs_initialize_value
 * zfs_lua_max_instrlimit
 * zfs_lua_max_memlimit
 * zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

 * zfs_per_txg_dirty_frees_percent

Parameters that were originally `ssize_t` on Illumos:

 * zfs_immediate_write_sz

Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.

Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:

 * l2arc_rebuild_blocks_min_l2size
 * zfs_key_max_salt_uses
 * zfs_max_log_walking
 * zfs_max_logsm_summary_length
 * zfs_metaslab_max_size_cache_sec
 * zfs_min_metaslabs_to_flush
 * zfs_multihost_interval
 * zfs_unflushed_log_block_max
 * zfs_unflushed_log_block_min
 * zfs_unflushed_log_block_pct
 * zfs_unflushed_max_mem_amt
 * zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

 * l2arc_trim_ahead
 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_arc_sys_free
 * zfs_deadman_ziotime_ms
 * zfs_delete_blocks
 * zfs_history_output_max
 * zfs_livelist_max_entries
 * zfs_max_async_dedup_frees
 * zfs_max_nvlist_src_size
 * zfs_rebuild_max_segment
 * zfs_rebuild_vdev_limit
 * zfs_unflushed_log_txg_max
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift
 * zfs_vnops_read_chunk_size
 * zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:

 * zfs_delete_blocks
 * zfs_key_max_salt_uses
 * zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

 * zfs_deadman_checktime_ms - incorrectly documented as int
 * zfs_delete_blocks - not documented as Linux only
 * zfs_history_output_max - incorrectly documented as int
 * zfs_vnops_read_chunk_size - incorrectly documented as long
 * zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:

 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_per_txg_dirty_frees_percent
 * zfs_unflushed_log_block_pct
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift

Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.

Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Oct 13, 2022
On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13984
Closes openzfs#14004
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Oct 13, 2022
Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesireable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.

Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

 * dbuf_cache_max_bytes
 * dbuf_metadata_cache_max_bytes
 * l2arc_feed_min_ms
 * l2arc_feed_secs
 * l2arc_headroom
 * l2arc_headroom_boost
 * l2arc_write_boost
 * l2arc_write_max
 * metaslab_aliquot
 * metaslab_force_ganging
 * zfetch_array_rd_sz
 * zfs_arc_max
 * zfs_arc_meta_limit
 * zfs_arc_meta_min
 * zfs_arc_min
 * zfs_async_block_max_blocks
 * zfs_condense_max_obsolete_bytes
 * zfs_condense_min_mapping_bytes
 * zfs_deadman_checktime_ms
 * zfs_deadman_synctime_ms
 * zfs_initialize_chunk_size
 * zfs_initialize_value
 * zfs_lua_max_instrlimit
 * zfs_lua_max_memlimit
 * zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

 * zfs_per_txg_dirty_frees_percent

Parameters that were originally `ssize_t` on Illumos:

 * zfs_immediate_write_sz

Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.

Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:

 * l2arc_rebuild_blocks_min_l2size
 * zfs_key_max_salt_uses
 * zfs_max_log_walking
 * zfs_max_logsm_summary_length
 * zfs_metaslab_max_size_cache_sec
 * zfs_min_metaslabs_to_flush
 * zfs_multihost_interval
 * zfs_unflushed_log_block_max
 * zfs_unflushed_log_block_min
 * zfs_unflushed_log_block_pct
 * zfs_unflushed_max_mem_amt
 * zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

 * l2arc_trim_ahead
 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_arc_sys_free
 * zfs_deadman_ziotime_ms
 * zfs_delete_blocks
 * zfs_history_output_max
 * zfs_livelist_max_entries
 * zfs_max_async_dedup_frees
 * zfs_max_nvlist_src_size
 * zfs_rebuild_max_segment
 * zfs_rebuild_vdev_limit
 * zfs_unflushed_log_txg_max
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift
 * zfs_vnops_read_chunk_size
 * zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:

 * zfs_delete_blocks
 * zfs_key_max_salt_uses
 * zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

 * zfs_deadman_checktime_ms - incorrectly documented as int
 * zfs_delete_blocks - not documented as Linux only
 * zfs_history_output_max - incorrectly documented as int
 * zfs_vnops_read_chunk_size - incorrectly documented as long
 * zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:

 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_per_txg_dirty_frees_percent
 * zfs_unflushed_log_block_pct
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift

Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.

Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13984
Closes openzfs#14004
behlendorf pushed a commit that referenced this pull request Nov 1, 2022
On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023
* etc: mask zfs-load-key.service

Otherwise, systemd-sysv-generator will generate a service equivalent
that breaks the boot: under systemd this is covered by
zfs-mount-generator

We already do this for zfs-import.service, and other init scripts are
suppressed automatically by the "actual" .service files

Fixes: commit f04b976 ("Add init script
 to load keys")
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#14010
Closes openzfs#14019

* Linux: Remove ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST autotools check

On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13984
Closes openzfs#14004

* Cleanup: 64-bit kernel module parameters should use fixed width types

Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesireable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.

Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

 * dbuf_cache_max_bytes
 * dbuf_metadata_cache_max_bytes
 * l2arc_feed_min_ms
 * l2arc_feed_secs
 * l2arc_headroom
 * l2arc_headroom_boost
 * l2arc_write_boost
 * l2arc_write_max
 * metaslab_aliquot
 * metaslab_force_ganging
 * zfetch_array_rd_sz
 * zfs_arc_max
 * zfs_arc_meta_limit
 * zfs_arc_meta_min
 * zfs_arc_min
 * zfs_async_block_max_blocks
 * zfs_condense_max_obsolete_bytes
 * zfs_condense_min_mapping_bytes
 * zfs_deadman_checktime_ms
 * zfs_deadman_synctime_ms
 * zfs_initialize_chunk_size
 * zfs_initialize_value
 * zfs_lua_max_instrlimit
 * zfs_lua_max_memlimit
 * zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

 * zfs_per_txg_dirty_frees_percent

Parameters that were originally `ssize_t` on Illumos:

 * zfs_immediate_write_sz

Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.

Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:

 * l2arc_rebuild_blocks_min_l2size
 * zfs_key_max_salt_uses
 * zfs_max_log_walking
 * zfs_max_logsm_summary_length
 * zfs_metaslab_max_size_cache_sec
 * zfs_min_metaslabs_to_flush
 * zfs_multihost_interval
 * zfs_unflushed_log_block_max
 * zfs_unflushed_log_block_min
 * zfs_unflushed_log_block_pct
 * zfs_unflushed_max_mem_amt
 * zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

 * l2arc_trim_ahead
 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_arc_sys_free
 * zfs_deadman_ziotime_ms
 * zfs_delete_blocks
 * zfs_history_output_max
 * zfs_livelist_max_entries
 * zfs_max_async_dedup_frees
 * zfs_max_nvlist_src_size
 * zfs_rebuild_max_segment
 * zfs_rebuild_vdev_limit
 * zfs_unflushed_log_txg_max
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift
 * zfs_vnops_read_chunk_size
 * zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:

 * zfs_delete_blocks
 * zfs_key_max_salt_uses
 * zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

 * zfs_deadman_checktime_ms - incorrectly documented as int
 * zfs_delete_blocks - not documented as Linux only
 * zfs_history_output_max - incorrectly documented as int
 * zfs_vnops_read_chunk_size - incorrectly documented as long
 * zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:

 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_per_txg_dirty_frees_percent
 * zfs_unflushed_log_block_pct
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift

Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.

Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13984
Closes openzfs#14004

* cstyle: Allow URLs in C++ comments

If a C++ comment contained a URL, the `://` part of the URL would
trigger an error because there was no trailing blank, but trailing
blanks make for an invalid URL.  Modify the check to ignore text
within the C++ comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes openzfs#13987

* zfs_domount: fix double-disown of dataset / double-free of zfsvfs_t

Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we
leave zfsvfs != NULL. This will lead to execution of the error handling
`if` statement at the `out` label, and hence to a call to
dmu_objset_disown and zfsvfs_free.

However, zfs_umount, which we call upon failure of zfs_root and
d_make_root already does dmu_objset_disown and zfsvfs_free.

I suppose this patch rather adds to the brittleness of this part of the
code base, but I don't want to invest more time in this right now.
To add a regression test, we'd need some kind of fault injection
facility for zfs_root or d_make_root, which doesn't exist right now.
And even then, I think that regression test would be too closely tied
to the implementation.

To repro the double-disown / double-free, do the following:
1. patch zfs_root to always return an error
2. mount a ZFS filesystem

Here's the stack trace you would see then:

  VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000)
  PANIC at dsl_dataset.c:1003:dsl_dataset_disown()
  Showing stack for process 28332
  CPU: 2 PID: 28332 Comm: zpool Tainted: G           O      5.10.103-1.nutanix.el7.x86_64 #1
  Call Trace:
   dump_stack+0x74/0x92
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   dsl_dataset_disown+0xe9/0x150 [zfs]
   dmu_objset_disown+0xd6/0x150 [zfs]
   zfs_domount+0x17b/0x4b0 [zfs]
   zpl_mount+0x174/0x220 [zfs]
   legacy_get_tree+0x2b/0x50
   vfs_get_tree+0x2a/0xc0
   path_mount+0x2fa/0xa70
   do_mount+0x7c/0xa0
   __x64_sys_mount+0x8b/0xe0
   do_syscall_64+0x38/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes openzfs#14025

* Fix potential NULL pointer dereference in lzc_ioctl()

Users are allowed to pass NULL to resultp, but we unconditionally assume
that they never do. When an external user does pass NULL to resultp, we
dereference a NULL pointer.

Clang's static analyzer complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14008

* Cleanup: Address Clang's static analyzer's unused code complaints

These were categorized as the following:

 * Dead assignment		23
 * Dead increment		4
 * Dead initialization		6
 * Dead nested assignment	18

Most of these are harmless, but since actual issues can hide among them,
we correct them.

That said, there were a few return values that were being ignored that
appeared to merit some correction:

 * `destroy_callback()` in `cmd/zfs/zfs_main.c` ignored the error from
   `destroy_batched()`. We handle it by returning -1 if there is an
   error.

 * `zfs_do_upgrade()` in `cmd/zfs/zfs_main.c` ignored the error from
   `zfs_for_each()`. We handle it by doing a binary OR of the error
   value from the subsequent `zfs_for_each()` call to the existing
   value. This is how errors are mostly handled inside `zfs_for_each()`.
   The error value here is passed to exit from the zfs command, so doing
   a binary or on it is better than what we did previously.

 * `get_zap_prop()` in `module/zfs/zcp_get.c` ignored the error from
   `dsl_prop_get_ds()` when the property is not of type string. We
   return an error when it does. There is a small concern that the
   `zfs_get_temporary_prop()` call would handle things, but in the case
   that it does not, we would be pushing an uninitialized numval onto
   the lua stack. It is expected that `dsl_prop_get_ds()` will succeed
   anytime that `zfs_get_temporary_prop()` does, so that not giving it a
   chance to fix things is not a problem.

 * `draid_merge_impl()` in `tests/zfs-tests/cmd/draid.c` used
   `nvlist_add_nvlist()` twice in ways in which errors are expected to
   be impossible, so we switch to `fnvlist_add_nvlist()`.

A few notable ones did not merit use of the return value, so we
suppressed it with `(void)`:

 * `write_free_diffs()` in `lib/libzfs/libzfs_diff.c` ignored the error
   value from `describe_free()`. A look through the commit history
   revealed that this was intentional.

 * `arc_evict_hdr()` in `module/zfs/arc.c` did not need to use the
   returned handle from `arc_hdr_realloc()` because it is already
   referenced in lists.

 * `spa_vdev_detach()` in `module/zfs/spa.c` has a comment explicitly
   saying not to use the error from `vdev_label_init()` because whatever
   causes the error could be the reason why a detach is being done.

Unfortunately, I am not presently able to analyze the kernel modules
with Clang's static analyzer, so I could have missed some cases of this.
In cases where reports were present in code that is duplicated between
Linux and FreeBSD, I made a conscious effort to fix the FreeBSD version
too.

After this commit is merged, regressions like dee8934 should become
extremely obvious with Clang's static analyzer since a regression would
appear in the results as the only instance of unused code. That assumes
that Coverity does not catch the issue first.

My local branch with fixes from all of my outstanding non-draft pull
requests shows 118 reports from Clang's static anlayzer after this
patch. That is down by 51 from 169.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13986

* zstream: allow decompress to fix metadata for uncompressed records

If a record is uncompressed on-disk but the block pointer insists
otherwise, reading it will return EIO.  This commit adds an "off" type
to the "zstream decompress" command.  Using it will set the compression
field in a zfs stream to "off" without changing the record's data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Sponsored by:	Axcient
Closes openzfs#13997

* Fix theoretical array overflow in lua_typename()

Out of the 12 defects in lua that coverity reports, 5 of them involve
`lua_typename()` and out of the dozens of defects in ZFS that lua
reports, 3 of them involve `lua_typename()` due to the ZCP code. Given
all of the uses of `lua_typename()` in the ZCP code, I was surprised
that there were not more. It appears that only 2 were reported because
only 3 called `lua_type()`, which does a defective sanity check that
allows invalid types to be passed.

lua/lua@d4fb848 addressed this in
upstream lua 5.3. Unfortunately, we did not get that fix since we use
lua 5.2 and we do not have assertions enabled in lua, so the upstream
solution would not do anything.

While we could adopt the upstream solution and enable assertions, a
simpler solution is to fix the issue by making `lua_typename()` return
`internal_type_error` whenever it is called with an invalid type. This
avoids the array overflow and if we ever see it appear somewhere, we
will know there is a problem with the lua interpreter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13947

* Linux compat: fix DECLARE_EVENT_CLASS() test when ZFS is built-in

ZFS_LINUX_TRY_COMPILE_HEADER macro doesn't take CONFIG_ZFS=y into
account. As a result, on several latest Linux versions, configure
script marks DECLARE_EVENT_CLASS() available for non-GPL when ZFS
is being built as a module, but marks it unavailable when ZFS is
built-in.
Follow the logic of the neighbor macros and adjust
ZFS_LINUX_TRY_COMPILE_HEADER accordingly, so that it doesn't try
to look for a .ko when ZFS is built-in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes openzfs#14006

* Fix declarations of non-global variables

This patch inserts the `static` keyword to non-global variables,
which where found by the analysis tool smatch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes openzfs#13970

* Coverity model file update

Upon review, it was found that the model for malloc() was incorrect.

In addition, several general purpose memory allocation functions were
missing models:

 * kmem_vasprintf()
 * kmem_asprintf()
 * kmem_strdup()
 * kmem_strfree()
 * spl_vmem_alloc()
 * spl_vmem_zalloc()
 * spl_vmem_free()
 * calloc()

As an experiment to try to find more bugs, some less than general
purpose memory allocation functions were also given models:

 * zfsvfs_create()
 * zfsvfs_free()
 * nvlist_alloc()
 * nvlist_dup()
 * nvlist_free()
 * nvlist_pack()
 * nvlist_unpack()

Finally, the models were improved using additional coverity primitives:

 * __coverity_negative_sink__()
 * __coverity_writeall0__()
 * __coverity_mark_as_uninitialized_buffer__()
 * __coverity_mark_as_afm_allocated__()

In addition, an attempt to inform coverity that certain modelled
functions read entire buffers was used by adding the following to
certain models:

int first = buf[0];
int last = buf[buflen-1];

It was inspired by the QEMU model file.

No additional false positives were found by this, but it is believed
that the more accurate model file will help to catch false positives in
the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14048

* Linux 6.1 compat: change order of sys/mutex.h includes

After Linux 6.1-rc1 came out, the build started failing to build a
couple of the files in the linux spl code due to the mutex_init
redefinition. Moving the sys/mutex.h include to a lower position within
these two files appears to fix the problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes openzfs#14040

* ZED: Fix uninitialized value reads

Coverity complained about a couple of uninitialized value reads in ZED.

 * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg()
 * An uninitialized sev.sigev_signo is passed to timer_create()

The former would log garbage while the latter is not a real issue, but
we might as well suppress it by initializing the field to 0 for
consistency's sake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14047

* Fix NULL pointer dereference in zdb

Clang's static analyzer complained that we dereference a NULL pointer in
dump_path() if we return 0 when there is an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14044

* fm_fmri_hc_create() must call va_end() before returning

clang-tidy caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14044

* Fix NULL pointer passed to strlcpy from zap_lookup_impl()

Clang's static analyzer pointed out that whenever zap_lookup_by_dnode()
is called, we have the following stack where strlcpy() is passed a NULL
pointer for realname from zap_lookup_by_dnode():

strlcpy()
zap_lookup_impl()
zap_lookup_norm_by_dnode()
zap_lookup_by_dnode()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14044

* Fix NULL pointer dereference in spa_open_common()

Calling spa_open() will pass a NULL pointer to spa_open_common()'s
config parameter. Under the right circumstances, we will dereference the
config parameter without doing a NULL check.

Clang's static analyzer found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14044

* set_global_var() should not pass NULL pointers to dlclose()

Both Coverity and Clang's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14044

* Fix possible NULL pointer dereference in sha2_mac_init()

If mechanism->cm_param is NULL, passing mechanism to
PROV_SHA2_GET_DIGEST_LEN() will dereference a NULL pointer.

Coverity reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14044

* Cleanup: Simplify userspace abd_free_chunks()

Clang's static analyzer complained that we could use after free here if
the inner loop ever iterated. That is a false positive, but upon
inspection, the userland abd_alloc_chunks() function never will put
multiple consecutive pages into a `struct scatterlist`, so there is no
need to loop. We delete the inner loop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14042

* Cleanup: Delete unnecessary pointer check from vdev_to_nvlist_iter()

This confused Clang's static analyzer, making it think there was a
possible NULL pointer dereference. There is no NULL pointer dereference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14042

* Cleanup: metaslab_alloc_dva() should not NULL check mg->mg_next

This is a circularly linked list. mg->mg_next can never be NULL.

This caused 3 defect reports in Coverity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14042

* Cleanup: zvol_add_clones() should not NULL check dp

It is never NULL because we return early if dsl_pool_hold() fails.

This caused Coverity to complain.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14042

* Cleanup: Delete dead code from send_merge_thread()

range is always deferenced before it reaches this check, such that the
kmem_zalloc() call is never executed.

There is also no need to set `range->eos_marker = B_TRUE` because it is
already set.

Coverity incorrectly complained about a potential NULL pointer
dereference because of this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14042

* Cleanup: Remove NULL pointer check from dmu_send_impl()

The pointer is to a structure member, so it is never NULL.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14042

* Fix memory leaks in dmu_send()/dmu_send_obj()

If we encounter an EXDEV error when using the redacted snapshots
feature, the memory used by dspp.fromredactsnaps is leaked.

Clang's static analyzer caught this during an experiment in which I had
annotated various headers in an attempt to improve the results of static
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#13973

* Support idmapped mount

Adds support for idmapped mounts.  Supported as of Linux 5.12 this 
functionality allows user and group IDs to be remapped without changing 
their state on disk.  This can be useful for portable home directories
and a variety of container related use cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes openzfs#12923
Closes openzfs#13671

* Fix sequential resilver drive failure race condition

This patch handles the race condition on simultaneous failure of
2 drives, which misses the vdev_rebuild_reset_wanted signal in
vdev_rebuild_thread. We retry to catch this inside the
vdev_rebuild_complete_sync function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com>
Closes openzfs#14041
Closes openzfs#14050

* Add options to zfs redundant_metadata property

Currently, additional/extra copies are created for metadata in
addition to the redundancy provided by the pool(mirror/raidz/draid),
due to this 2 times more space is utilized per inode and this decreases
the total number of inodes that can be created in the filesystem. By
setting redundant_metadata to none, no additional copies of metadata
are created, hence can reduce the space consumed by the additional
metadata copies and increase the total number of inodes that can be
created in the filesystem.  Additionally, this can improve file create
performance due to the reduced amount of metadata which needs
to be written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes openzfs#13680

* Fix userland memory leak in zfs_do_send()

Clang 15's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14045

* Fix theoretical use of uninitialized values

Clang's static analyzer complains about this.

In get_configs(), if we have an invalid configuration that has no top
level vdevs, we can read a couple of uninitialized variables. Aborting
upon seeing this would break the userland tools for healthy pools, so we
instead initialize the two variables to 0 to allow the userland tools to
continue functioning for the pools with valid configurations.

In zfs_do_wait(), if no wait activities are enabled, we read an
uninitialized error variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14043

* Silence static analyzer warnings about spa_sync_props()

Both Coverity and Clang's static analyzer complain about reading an
uninitialized intval if the property is not passed as DATA_TYPE_UINT64
in the nvlist. This is impossible becuase spa_prop_validate() already
checked this, but they are unlikely to be the last static analyzers to
complain about this, so lets just refactor the code to suppress the
warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14043

* crypto_get_ptrs() should always write to *out_data_2

Callers will check if it has been set to NULL before trying to access
it, but never initialize it themselves. Whenever "one block spans two
iovecs", `crypto_get_ptrs()` will return, without ever setting
`*out_data_2 = NULL`. The caller will then do a NULL check against the
uninitailized pointer and if it is not zero, pass it to `memcpy()`.

The only reason this has not caused horrible runtime issues is because
`memcpy()` should be told to copy zero bytes when this happens. That
said, this is technically undefined behavior, so we should correct it so
that future changes to the code cannot trigger it.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14043

* abd_return_buf() should call zfs_refcount_remove_many() early

Calling zfs_refcount_remove_many() after freeing memory means we pass a
reference to freed memory as the holder. This is not believed to be able
to cause a problem, but there is a bit of a tradition of fixing these
issues when they appear so that they do not obscure more serious issues
in static analyzer output, so we fix this one too.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14043

* Add defensive assertion to vdev_queue_aggregate()

a6ccb36 had been intended to include
this to silence Coverity reports, but this one was missed by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes openzfs#14043

* Fix build failures

Co-authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Co-authored-by: ColMelvin <chris.lindee+github@gmail.com>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Co-authored-by: Alan Somers <asomers@FreeBSD.org>
Co-authored-by: Alexander <solbjorn@users.noreply.github.com>
Co-authored-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Coleman Kane <ckane@colemankane.org>
Co-authored-by: youzhongyang <youzhong@gmail.com>
Co-authored-by: samwyc <115969550+samwyc@users.noreply.github.com>
Co-authored-by: Akash B <akash-b@hpe.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants