Allow the result of unsafeCreate to be unboxed #580

Merged on Jun 7, 2023 (8 commits)

Conversation

clyring
Member

@clyring clyring commented Apr 3, 2023

By removing the lazy from the implementation of unsafeDupablePerformIO, this patch allows the simplifier to unbox intermediate StrictByteStrings in many more situations.

This improved freedom for the simplifier comes at a price: It creates opportunities for the simplifier to perform reads at the newly-unboxed addresses before the buffer has actually been initialized. To prevent this from causing problems, the deferForeignPtrAvailability function is introduced. (I'm not entirely sure this provides enough protection in a multi-threaded context on all architectures. Maybe @bgamari can comment? But it's at least no more unsafe than the ShortByteString stuff already is.)
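To make the deferral idea concrete, here is a minimal sketch of what such a function could look like. It is reconstructed from the description above and from the diff fragment quoted later in this thread, so it is an assumption rather than the merged code; the names `deferAvailabilitySketch` and `roundTrip` are hypothetical. The idea, as described, is to route the address through an opaque `lazy runRW#` call so that the simplifier cannot see the unboxed address before the initializing writes have run.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Word (Word8)
import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peek, poke)
import GHC.Exts (lazy, runRW#)
import GHC.ForeignPtr (ForeignPtr(..))
import GHC.IO (IO(..))

-- Hypothetical stand-in for deferForeignPtrAvailability (an assumption,
-- not the library's verbatim definition). The address is passed through
-- an opaque `lazy runRW#` call, so the returned pointer only becomes
-- "known" to the optimizer at this point in the IO sequence.
deferAvailabilitySketch :: ForeignPtr a -> IO (ForeignPtr a)
deferAvailabilitySketch (ForeignPtr addr0# guts) = IO $ \s0 ->
  case lazy runRW# (\_ -> (# s0, addr0# #)) of
    (# s1, addr1# #) -> (# s1, ForeignPtr addr1# guts #)

-- Usage: initialize the buffer first, defer, and only then read through
-- the deferred pointer.
roundTrip :: IO Word8
roundTrip = do
  fp <- mallocForeignPtrBytes 1
  withForeignPtr fp $ \p -> poke p (42 :: Word8)
  deferred <- deferAvailabilitySketch fp
  withForeignPtr deferred peek
```

At run time the deferred pointer refers to the same memory as the original one; the difference is only in what the optimizer is allowed to assume.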

Supersedes #466.

Contributor

@Bodigrim Bodigrim left a comment

Is it possible to write a test for Core using tasty-inspection-testing?

Data/ByteString/Internal/Type.hs (outdated, resolved)
@clyring
Member Author

clyring commented Apr 4, 2023

> Is it possible to write a test for Core using tasty-inspection-testing?

I don't have any experience working with it yet, but I'd guess so. The expectations we have about unboxing will depend on the ghc version used. runRW# didn't get special CPR handling until around ghc-9.0 or 9.2.

@Bodigrim
Contributor

Bodigrim commented Apr 4, 2023

https://github.com/Bodigrim/tasty-inspection-testing#readme contains some examples of how to write inspection tests. It's fine to limit them to GHC 9.0+ only.
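One simple way to limit compiler-sensitive tests to GHC 9.0+ is CPP gating on `__GLASGOW_HASKELL__`. The sketch below only shows the gating; it does not use tasty-inspection-testing itself, and the names `coreTestsEnabled` and `plannedTests` are hypothetical.

```haskell
{-# LANGUAGE CPP #-}

-- Hypothetical flag: True only when the compiler is GHC 9.0 or newer,
-- where expectations about unboxing through runRW# start to make sense.
coreTestsEnabled :: Bool
#if __GLASGOW_HASKELL__ >= 900
coreTestsEnabled = True
#else
coreTestsEnabled = False
#endif

-- Hypothetical test plan: ordinary behaviour tests always run, while the
-- Core-level unboxing check is included only on new enough compilers.
plannedTests :: [String]
plannedTests = "behaviour" : [ "core-unboxing" | coreTestsEnabled ]
```

A real test suite would replace the strings with actual test trees, but the conditional structure would stay the same.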

@clyring
Member Author

clyring commented Apr 10, 2023

Cabal isn't happy when I add tasty-inspection-testing to the testsuite dependencies, because ghc depends on its own version of bytestring.

@Bodigrim
Contributor

Ah, right, I had a good experience with tasty-inspection-testing with text, but that's because ghc does not depend on it. But it does depend on bytestring.

Can you describe what kind of manual test can check that the result of unsafeCreate is unboxed?

@Bodigrim
Contributor

@clyring could you please rebase to get a clean CI run?

(# s1, addr1# #) -> (# s1, ForeignPtr addr1# guts #)

unsafeDupablePerformIO :: IO a -> a
-- Why does this exist? As of base-4.18.0.0, the version of

Contributor

Does this change first appear in base-4.18?

Member Author

There has never been good unboxing through unsafeDupablePerformIO:

  • Before 9.0.1, runRW# is too opaque for ghc to ever unbox the result of unsafeDupablePerformIO.
  • In ghc-9.0.1/base-4.15.0.0, ghc starts pushing strict contexts into runRW# and can therefore sometimes unbox the result of an unsafeDupablePerformIO call if it gets inlined into a strict context that discards the box. But, the CPR analysis used for worker/wrapper does not know about runRW# so this unboxing happens only through inlining, and not through worker/wrapper.
  • In all existing later versions (including 9.0.2 and 9.2.1 through 9.6.2),
    • CPR analysis knows that the result of a runRW# call can be unboxed, so worker/wrapper improves in the presence of runRW#, but
    • the definition of unsafeDupablePerformIO in base gets a lazy which prevents unboxing from happening for this function.
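The difference described in the last bullet can be sketched side by side. The shapes below are assumptions based on the description in this comment, not verbatim copies of base or of this patch, and all names (`performWithLazy`, `performWithoutLazy`, `Pair`, `mkPairLazy`, `mkPairDirect`, `sumPair`) are hypothetical.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (lazy, runRW#)
import GHC.IO (IO(..))

-- Assumed shape of the definition in base: the result passes through
-- `lazy`, which hides its structure from the analyses that would
-- otherwise allow it to be unboxed.
performWithLazy :: IO a -> a
performWithLazy (IO act) = case runRW# act of (# _, res #) -> lazy res

-- Assumed shape of the variant without `lazy`: the result of the
-- runRW# call is returned directly.
performWithoutLazy :: IO a -> a
performWithoutLazy (IO act) = case runRW# act of (# _, res #) -> res

-- A small product type whose construction one might hope to see unboxed.
data Pair = Pair !Int !Int

mkPairLazy, mkPairDirect :: Int -> Pair
mkPairLazy n = performWithLazy (pure (Pair n (n + 1)))
mkPairDirect n = performWithoutLazy (pure (Pair n (n + 1)))
{-# NOINLINE mkPairLazy #-}
{-# NOINLINE mkPairDirect #-}

-- A strict consumer that discards the box.
sumPair :: Pair -> Int
sumPair (Pair a b) = a + b
```

Both variants compute the same values; any difference is in the generated Core. Compiling with `-O -ddump-simpl` and comparing the two `mkPair` functions is one manual way to look for it, though what exactly appears depends on the GHC version, as the bullets above explain.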

@clyring clyring added this to the 0.11.5.0 milestone Jun 5, 2023
@Bodigrim
Contributor

Bodigrim commented Jun 5, 2023

Benchmarks of strict scanl / scanr look surprisingly off:

$ compare_benches 6641404 clyring/unboxable-creation -t 50 -p strict.scan
<skipped>
All
  folds
    strict
      scanl
        1:     OK (1.96s)
          14.0 ns ± 806 ps,       same as baseline
        2:     OK (1.99s)
          14.5 ns ± 1.1 ns, 10% more than baseline
        4:     OK (2.05s)
          14.8 ns ± 802 ps,  8% more than baseline
        8:     OK (2.35s)
          17.1 ns ± 896 ps, 15% more than baseline
        16:    OK (1.47s)
          21.1 ns ± 1.6 ns, 19% more than baseline
        32:    OK (2.08s)
          30.2 ns ± 1.6 ns, 29% more than baseline
        64:    OK (1.86s)
          54.2 ns ± 3.4 ns, 48% more than baseline
        128:   OK (1.63s)
          94.3 ns ± 6.7 ns, 23% more than baseline
        256:   OK (1.48s)
          171  ns ±  15 ns, 12% more than baseline
        512:   OK (1.41s)
          325  ns ±  27 ns,       same as baseline
        1024:  OK (1.37s)
          634  ns ±  53 ns,       same as baseline
        2048:  OK (1.36s)
          1.26 μs ± 107 ns,       same as baseline
        4096:  OK (2.90s)
          2.76 μs ± 129 ns,       same as baseline
        8192:  OK (2.78s)
          5.25 μs ± 233 ns,       same as baseline
        16384: OK (1.34s)
          10.0 μs ± 861 ns,       same as baseline
        32768: OK (1.32s)
          19.5 μs ± 1.9 μs,       same as baseline
        65536: OK (1.31s)
          38.8 μs ± 3.5 μs,       same as baseline
      scanr
        1:     OK (1.96s)
          14.2 ns ± 808 ps, 10% more than baseline
        2:     OK (2.01s)
          14.6 ns ± 846 ps, 11% more than baseline
        4:     OK (2.24s)
          16.3 ns ± 850 ps, 18% more than baseline
        8:     OK (1.31s)
          18.8 ns ± 1.6 ns, 24% more than baseline
        16:    OK (1.76s)
          25.6 ns ± 1.7 ns, 39% more than baseline
        32:    OK (1.37s)
          39.4 ns ± 3.3 ns, 61% more than baseline
        64:    OK (1.27s)
          73.2 ns ± 6.8 ns, 94% more than baseline
        128:   OK (2.25s)
          131  ns ± 7.5 ns, 70% more than baseline
        256:   OK (2.11s)
          246  ns ±  14 ns, 61% more than baseline
        512:   OK (2.04s)
          476  ns ±  27 ns, 54% more than baseline
        1024:  OK (2.01s)
          935  ns ±  52 ns, 51% more than baseline
        2048:  OK (2.05s)
          1.93 μs ± 136 ns, 45% more than baseline
        4096:  OK (2.07s)
          3.91 μs ± 309 ns, 40% more than baseline
        8192:  OK (2.01s)
          7.54 μs ± 523 ns, 46% more than baseline
        16384: OK (1.96s)
          14.6 μs ± 857 ns, 49% more than baseline
        32768: OK (1.94s)
          29.0 μs ± 1.8 μs, 49% more than baseline
        65536: OK (1.94s)
          58.0 μs ± 3.5 μs, 49% more than baseline

compare_benches comes from https://github.com/Bodigrim/tasty-bench/blob/master/compare_benches.sh

@clyring
Member Author

clyring commented Jun 7, 2023

@Bodigrim I have tried and failed to reproduce any scanl/scanr regressions on my machine; I've tried with all of ghc 9.2.8, 9.4.5, and 9.6.2. Are these consistent for you?

@Bodigrim
Contributor

Bodigrim commented Jun 7, 2023

The measurements above are for GHC 9.6.1 + aarch64. Indeed there are no regressions for GHC 9.2.7 or GHC 9.4.5. I tried GHC 9.6.2 and it seems to fix the issue: I still see some changes for short bytestrings, but no changes for longer ones, so it's likely to be just noise.

LGTM!

@Bodigrim Bodigrim merged commit 8d296b7 into haskell:master Jun 7, 2023
clyring added a commit that referenced this pull request Jun 13, 2023
* Allow the result of unsafeCreate to be unboxed

* Fix build with old versions of ghc

* Add hackage source link for referenced Note

* Improve documentation for the new functions

* Publicly export deferForeignPtrAvailability

* Add convenience function `mkDeferredByteString`

* remove extra '
@clyring clyring mentioned this pull request Jun 16, 2023
2 participants