Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve memory-safety of Semigroup instances #443

Merged
merged 17 commits into from
Feb 28, 2022

Conversation

clyring
Copy link
Member

@clyring clyring commented Nov 28, 2021

After making this comment, I was a bit disappointed to discover that the existing Semigroup instances for StrictByteString and ShortByteString were also memory-unsafe in the presence of Int overflow. Here's a quick pass adding some overflow checks where they are necessary, along with a couple of other related small improvements.

@Bodigrim
Copy link
Contributor

Is it possible to add regression tests? At least for 32-bit platforms.

@clyring
Copy link
Member Author

clyring commented Nov 29, 2021

To some extent, sure. I could evaluate and catch expected-exceptions from (potentially-tricky) overflow-nonsense-values inside an IO property. That seems a bit strange, though. Is there a more natural approach for this that I'm ignorant of? Should I write some?

@Bodigrim
Copy link
Contributor

@clyring yes, via ioProperty, e. g.,

prop = ioProperty $ do
  t <- try (evaluate foo)
  pure $ isLeft t 

@Bodigrim Bodigrim added this to the 0.12.0.0 milestone Dec 4, 2021
Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delay, been busy with 0.11.2.0 release.

Data/ByteString/Builder/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Builder/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
tests/Properties.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
@clyring
Copy link
Member Author

clyring commented Dec 12, 2021

Looking at the Core, it seems the rewrite rules on checkedToInt don't yet achieve the desired result even if I add SPECIALIZE pragmas to times. And of course a few cosmetic changes are wanted. But is this direction appropriate?

@clyring
Copy link
Member Author

clyring commented Dec 14, 2021

I've reported the build failure with 9.2.1 at GHC's issue tracker.

@Bodigrim
Copy link
Contributor

@clyring any plans to resume the work here?

@clyring
Copy link
Member Author

clyring commented Dec 18, 2021

I'll try to rebase and clean up a bit more tomorrow.

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty confused about the motivation for all these changes.

I think it would be best if you'd open proper bug reports on the issue tracker that you can reference from commit messages and comments.

Data/ByteString/Builder/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
tests/Properties.hs Outdated Show resolved Hide resolved
@sjakobi
Copy link
Member

sjakobi commented Dec 20, 2021

I also think that this PR is getting a bit too large to review easily. If it gets any larger, maybe split it into one PR per instance.

@clyring
Copy link
Member Author

clyring commented Dec 20, 2021

I'm pretty confused about the motivation for all these changes.

I think it would be best if you'd open proper bug reports on the issue tracker that you can reference from commit messages and comments.

When I opened this pull request, there was one commit making a large handful of near-trivial changes, for which a separate issue would hardly be smaller than the patch, whose motivation is adequately summarized by its commit title. But inject tests and a desire to not allocate Integers unnecessarily and suddenly it becomes a bit intimidating. Funny how things happen that way... :-)

I also think that this PR is getting a bit too large to review easily. If it gets any larger, maybe split it into one PR per instance.

I agree that this PR is getting a bit large. It's already only substantially touching one instance; if the current insubstantial tweaks to the others are too distracting I can bundle them with a later PR.

@sjakobi
Copy link
Member

sjakobi commented Dec 20, 2021

I do think that a bug report would be very helpful. Memory safety isn't a topic that I'm encountering very frequently in Haskell code. I hope a bug report would make it clear what precisely the problem with the current code is. This will also be useful for communicating this change to bytestring users.

To simplify the review, I think having one PR per instance would be easiest.

@Bodigrim
Copy link
Contributor

@clyring @sjakobi what's the status of this?

@sjakobi
Copy link
Member

sjakobi commented Feb 17, 2022

Apologies for my inactivity on this PR.

Unfortunately my understanding of the problem that this PR is intended to solve is still very fuzzy. I'd like to know:

  • Which functions are memory-unsafe currently? Is this limited to specific implementations of Semigroup methods?
  • How does the memory-unsafety manifest, i.e. what happens when a size computation overflows?
  • Do all instances affect both 32-bit and 64-bit platforms?

@clyring Could you please write this up on the issue tracker? I think a clearer exposition of the problem will motivate us to find a solution and help guide us to the right trade-off between memory safety, performance and code complexity.

For now I have two ideas for reducing complexity:

  • Limiting checks to 32-bit platforms.
  • Limiting checks to recent GHCs.

Once we have a clearer understanding of the problem we can discuss whether and where these measures might make sense.

@clyring
Copy link
Member Author

clyring commented Feb 18, 2022

The merge conflicts were trivial; I have resolved them.

@clyring Could you please write this up on the issue tracker? I think a clearer exposition of the problem will motivate us to find a solution and help guide us to the right trade-off between memory safety, performance and code complexity.

This already exists an issue on the tracker for this: #456

For now I have two ideas for reducing complexity:

* Limiting checks to 32-bit platforms.

* Limiting checks to recent GHCs.
  • None of the checks are different between 32- and 64-bit platforms. Some of the testing code is already skipped on 64-bit platforms; specifically, the ones that would need to allocate and traverse multi-gigabyte data on 64-bit systems.
  • I'm not attached to the GHC-8.x compatibility CPP in the current version of this pull request. I'd personally be more inclined to just use the fallback implementation even on 9.x than to remove the checks for 8.x. It's likely only one or two CPU cycles slower and two or thee instructions larger on most architectures, which is pretty negligible. (If not for 8.x compatibility, checkedToInt might also just look at Integer constructors instead of allocating and comparing with a second Integer object.)

I'm somewhat unhappy with the current implementation's need for ugly inline hacks and the Note [stimes negativity checks], and am for that reason tempted to rewrite checkedToInt to return explicitly Overflow or Underflow instead of an uninformative Nothing. And then the checked arithmetic stuff becomes substantial enough that I'm tempted to move it into its own module.

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the perf issues, my suggestion is to wait and see whether the WIP improvements in GHC (https://gitlab.haskell.org/ghc/ghc/-/issues/20361) will take care of this. We can record a reminder on our issue tracker.

Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
Its ByteString argument is also marked as strict. This doesn't change
the optimized code produced by GHC, but allows it to be consistent
with the published imprecise exception semantics.
Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@clyring
Copy link
Member Author

clyring commented Feb 28, 2022

Amusingly, the intermediate-Integer sub-issue does not represent a regression with respect to master, because times does not seem to specialize at all unless I add an INLINABLE pragma. I'm willing to hold off on re-inserting the rewrite rules in anticipation of expected improvements to Integer-related simplification in GHC 9.4.

@Bodigrim Bodigrim merged commit 1256378 into haskell:master Feb 28, 2022
@Bodigrim
Copy link
Contributor

Thanks @clyring!

@Bodigrim Bodigrim linked an issue Feb 28, 2022 that may be closed by this pull request
@sjakobi
Copy link
Member

sjakobi commented Mar 1, 2022

Regarding the perf issues, my suggestion is to wait and see whether the WIP improvements in GHC (https://gitlab.haskell.org/ghc/ghc/-/issues/20361) will take care of this. We can record a reminder on our issue tracker.

I have opened #493 as a reminder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory-unsafety in Semigroup and Monoid instances
3 participants