Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce bloat in ereport.fs.zfs.checksum events #15052

Closed
wants to merge 2 commits into from

Conversation

asomers
Copy link
Contributor

@asomers asomers commented Jul 11, 2023

Fixes #14717

Motivation and Context

On FreeBSD, a kernel event must fit within a 1016 byte buffer. ZFS checksum events don't, so they never reach userland.

Description

Remove four not-very-useful fields from the events to reduce their size:

  1. Remove the histogram fields. They were originally intended to be used with parallel ATA and SCSI busses, which are obsolete. They probably aren't of any use whatsoever on modern hardware. This is the first commit.
  2. Remove the cksum_actual and cksum_expected fields. With modern checksum functions, they don't really convey any information other than "they don't match". This is the second commit.

How Has This Been Tested?

Tested with FreeBSD's zfsd test suite.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jul 13, 2023
@asomers
Copy link
Contributor Author

asomers commented Jul 14, 2023

cc @rincebrain

@rincebrain
Copy link
Contributor

I'm a little sad about the cksum_actual and cksum_expected fields, but needs must when the devil drives.

How does this work for backward compat? Wouldn't this cause horrible mangling just removing the fields outright and having mixed kernel/userland?

@asomers
Copy link
Contributor Author

asomers commented Jul 14, 2023

I don't think there's any backwards-compatibility problem, because AFAIK nothing in userland ever reads those fields. And they aren't crammed into fixed-size structs, so there's no serialization risk. At least, on FreeBSD they're serialized into ASCII strings.

@rincebrain
Copy link
Contributor

I absolutely know nothing about how the fmreport events get passed around, it was just my immediate question on changing the structure.

Even if it's not passing a fixed struct around, though, how does old userland handle getting events with those missing? I suppose if it didn't always have them it already had to...

Seems fine to me.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slimming these down seems reasonable to me. Can you just rebase this to resolve the conflict.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jul 20, 2023
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue openzfs#14717 for RAIDZ pools, but not for mirror pools.

Sponsored-by:	Axcient
Signed-off-by:	Alan Somers <asomers@gmail.com>
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes openzfs#14717 for mirrored pools.

Sponsored-by:   Axcient
Signed-off-by:  Alan Somers <asomers@gmail.com>
@asomers
Copy link
Contributor Author

asomers commented Jul 20, 2023

rebased.

behlendorf pushed a commit that referenced this pull request Jul 21, 2023
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes #14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #14717
Closes #15052
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 21, 2023
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue openzfs#14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes openzfs#15052
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 21, 2023
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes openzfs#14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes openzfs#14717
Closes openzfs#15052
behlendorf pushed a commit that referenced this pull request Jul 21, 2023
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue #14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15052
behlendorf pushed a commit that referenced this pull request Jul 21, 2023
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes #14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #14717
Closes #15052
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue openzfs#14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes openzfs#15052
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes openzfs#14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes openzfs#14717
Closes openzfs#15052
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Posting checksum events fails with ENOMEM on FreeBSD
3 participants