Allow the result of unsafeCreate to be unboxed #580

clyring · 2023-04-03T01:49:51Z

By removing the call to lazy from the implementation of unsafeDupablePerformIO, this patch allows the simplifier to unbox intermediate StrictByteStrings in many more situations.

This improved freedom for the simplifier comes at a price: It creates opportunities for the simplifier to perform reads at the newly-unboxed addresses before the buffer has actually been initialized. To prevent this from causing problems, the deferForeignPtrAvailability function is introduced. (I'm not entirely sure this provides enough protection in a multi-threaded context on all architectures. Maybe @bgamari can comment? But it's at least no more unsafe than the ShortByteString stuff already is.)
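As a rough illustration of the first point, an unsafeDupablePerformIO-style function with no lazy around its result might look like the sketch below. The name performIONoLazy is hypothetical and the code is not copied from the patch; it only shows the shape of a runRW#-based definition that leaves the result visible to the simplifier.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (runRW#)
import GHC.IO (IO (..))

-- Hypothetical name, chosen to avoid clashing with the function in base.
-- The result is returned directly, with no 'lazy' wrapped around it, so
-- GHC may unbox it when the call ends up in a strict context.
performIONoLazy :: IO a -> a
performIONoLazy (IO act) = case runRW# act of
  (# _, res #) -> res
```

As with unsafeDupablePerformIO itself, a function like this is only appropriate for actions that are harmless to duplicate or reorder, which is why the patch pairs the change with deferForeignPtrAvailability.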

Supersedes #466.

Bodigrim

Is it possible to write a test for Core using tasty-inspection-testing?

Data/ByteString/Internal/Type.hs

clyring · 2023-04-04T00:52:24Z

> Is it possible to write a test for Core using tasty-inspection-testing?

I don't have any experience working with it yet, but I'd guess so. The expectations we have about unboxing will depend on the ghc version used. runRW# didn't get special CPR handling until around ghc-9.0 or 9.2.

Bodigrim · 2023-04-04T22:21:49Z

https://github.com/Bodigrim/tasty-inspection-testing#readme contains some examples of how to write inspection tests. It's fine to limit them to GHC 9.0+ only.

clyring · 2023-04-10T11:24:59Z

Cabal isn't happy when I add tasty-inspection-testing to the testsuite dependencies, because ghc depends on its own version of bytestring.

Bodigrim · 2023-04-10T11:32:29Z

Ah, right, I had a good experience with tasty-inspection-testing with text, but that's because ghc does not depend on it. But it does depend on bytestring.

Can you describe what kind of manual test can check that the result of unsafeCreate is unboxed?

Bodigrim · 2023-05-20T09:47:32Z

@clyring could you please rebase to get a clean CI run?

Bodigrim · 2023-06-04T19:05:47Z

Data/ByteString/Internal/Type.hs

+    (# s1, addr1# #) -> (# s1, ForeignPtr addr1# guts #)
+
+unsafeDupablePerformIO :: IO a -> a
+-- Why does this exist? As of base-4.18.0.0, the version of


Does this change first appear in base-4.18?

clyring · 2023-06-05

There has never been good unboxing through unsafeDupablePerformIO:

* Before 9.0.1, runRW# is too opaque for ghc to ever unbox the result of unsafeDupablePerformIO.
* In ghc-9.0.1/base-4.15.0.0, ghc starts pushing strict contexts into runRW# and can therefore sometimes unbox the result of an unsafeDupablePerformIO call if it gets inlined into a strict context that discards the box. But the CPR analysis used for worker/wrapper does not know about runRW#, so this unboxing happens only through inlining, and not through worker/wrapper.
* In all existing later versions (including 9.0.2 and 9.2.1 through 9.6.2),
  * CPR analysis knows that the result of a runRW# call can be unboxed, so worker/wrapper improves in the presence of runRW#, but
  * the definition of unsafeDupablePerformIO in base gets a lazy which prevents unboxing from happening for this function.
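To make "a strict context that discards the box" concrete, here is a minimal sketch that could be used for a manual check. The helper names filled and sumFilled are hypothetical, and whether the intermediate ByteString is actually unboxed depends on the ghc and bytestring versions in use.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (unsafeCreate)
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

-- Hypothetical helper: allocate an n-byte buffer and fill it with one byte.
filled :: Int -> Word8 -> B.ByteString
filled n w = unsafeCreate n $ \p ->
  mapM_ (\i -> pokeByteOff p i w) [0 .. n - 1]

-- The ByteString is consumed immediately by a strict fold, so its box is
-- never needed by the caller. This is the kind of context in which the
-- simplifier may be able to avoid allocating the box at all.
sumFilled :: Int -> Word8 -> Int
sumFilled n w = B.foldl' (\acc b -> acc + fromIntegral b) 0 (filled n w)
```

Compiling a module like this with `-O2 -ddump-simpl -dsuppress-all` and looking at the Core for sumFilled shows whether a boxed ByteString constructor is still allocated between the two steps.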

Bodigrim · 2023-06-05T19:38:13Z

Benchmarks of strict scanl / scanr look surprisingly off:

$ compare_benches 6641404 clyring/unboxable-creation -t 50 -p strict.scan
<skipped>
All
  folds
    strict
      scanl
        1:     OK (1.96s)
          14.0 ns ± 806 ps,       same as baseline
        2:     OK (1.99s)
          14.5 ns ± 1.1 ns, 10% more than baseline
        4:     OK (2.05s)
          14.8 ns ± 802 ps,  8% more than baseline
        8:     OK (2.35s)
          17.1 ns ± 896 ps, 15% more than baseline
        16:    OK (1.47s)
          21.1 ns ± 1.6 ns, 19% more than baseline
        32:    OK (2.08s)
          30.2 ns ± 1.6 ns, 29% more than baseline
        64:    OK (1.86s)
          54.2 ns ± 3.4 ns, 48% more than baseline
        128:   OK (1.63s)
          94.3 ns ± 6.7 ns, 23% more than baseline
        256:   OK (1.48s)
          171  ns ±  15 ns, 12% more than baseline
        512:   OK (1.41s)
          325  ns ±  27 ns,       same as baseline
        1024:  OK (1.37s)
          634  ns ±  53 ns,       same as baseline
        2048:  OK (1.36s)
          1.26 μs ± 107 ns,       same as baseline
        4096:  OK (2.90s)
          2.76 μs ± 129 ns,       same as baseline
        8192:  OK (2.78s)
          5.25 μs ± 233 ns,       same as baseline
        16384: OK (1.34s)
          10.0 μs ± 861 ns,       same as baseline
        32768: OK (1.32s)
          19.5 μs ± 1.9 μs,       same as baseline
        65536: OK (1.31s)
          38.8 μs ± 3.5 μs,       same as baseline
      scanr
        1:     OK (1.96s)
          14.2 ns ± 808 ps, 10% more than baseline
        2:     OK (2.01s)
          14.6 ns ± 846 ps, 11% more than baseline
        4:     OK (2.24s)
          16.3 ns ± 850 ps, 18% more than baseline
        8:     OK (1.31s)
          18.8 ns ± 1.6 ns, 24% more than baseline
        16:    OK (1.76s)
          25.6 ns ± 1.7 ns, 39% more than baseline
        32:    OK (1.37s)
          39.4 ns ± 3.3 ns, 61% more than baseline
        64:    OK (1.27s)
          73.2 ns ± 6.8 ns, 94% more than baseline
        128:   OK (2.25s)
          131  ns ± 7.5 ns, 70% more than baseline
        256:   OK (2.11s)
          246  ns ±  14 ns, 61% more than baseline
        512:   OK (2.04s)
          476  ns ±  27 ns, 54% more than baseline
        1024:  OK (2.01s)
          935  ns ±  52 ns, 51% more than baseline
        2048:  OK (2.05s)
          1.93 μs ± 136 ns, 45% more than baseline
        4096:  OK (2.07s)
          3.91 μs ± 309 ns, 40% more than baseline
        8192:  OK (2.01s)
          7.54 μs ± 523 ns, 46% more than baseline
        16384: OK (1.96s)
          14.6 μs ± 857 ns, 49% more than baseline
        32768: OK (1.94s)
          29.0 μs ± 1.8 μs, 49% more than baseline
        65536: OK (1.94s)
          58.0 μs ± 3.5 μs, 49% more than baseline

compare_benches comes from https://github.com/Bodigrim/tasty-bench/blob/master/compare_benches.sh
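For reading the percentages above: "N% more than baseline" means the measured time is the baseline time multiplied by (1 + N/100), so the baseline can be recovered approximately from any reported row. A small sketch of that arithmetic, with the hypothetical name impliedBaseline:

```haskell
-- Hypothetical helper: recover the approximate baseline time from a
-- measured time and a reported "N% more than baseline" figure.
-- The reported percentage is rounded, so the result is only approximate.
impliedBaseline :: Double -> Double -> Double
impliedBaseline measured pctMore = measured / (1 + pctMore / 100)
```

For example, the scanr row for 65536 reports 58.0 μs at 49% more than baseline, which corresponds to a baseline of roughly 38.9 μs.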

clyring · 2023-06-07T15:23:43Z

@Bodigrim I have tried and failed to reproduce any scanl/scanr regressions on my machine; I've tried with all of ghc 9.2.8, 9.4.5, and 9.6.2. Are these consistent for you?

Bodigrim · 2023-06-07T19:03:38Z

The measurements above are for GHC 9.6.1 + aarch64. Indeed there are no regressions for GHC 9.2.7 or GHC 9.4.5. I tried GHC 9.6.2 and it seems to fix the issue: I still see some changes for short bytestrings, but no changes for longer ones, so it's likely to be just noise.

LGTM!

* Allow the result of unsafeCreate to be unboxed
* Fix build with old versions of ghc
* Add hackage source link for referenced Note
* Improvement documentation for the new functions
* Publicly export deferForeignPtrAvailability
* Add convenience function `mkDeferredByteString`
* remove extra '

clyring added 2 commits April 2, 2023 21:32

Allow the result of unsafeCreate to be unboxed

ad383a7

Fix build with old versions of ghc

f9e207a

Bodigrim reviewed Apr 3, 2023

clyring added 2 commits June 4, 2023 14:49

Merge branch 'master' into unboxable-creation

b788a85

Add hackage source link for referenced Note

839b807

Bodigrim reviewed Jun 4, 2023

clyring added 3 commits June 5, 2023 10:18

Improvement documentation for the new functions

e16bf7d

Publicly export deferForeignPtrAvailability

32bd127

Add convenience function mkDeferredByteString

548a522

clyring added this to the 0.11.5.0 milestone Jun 5, 2023

remove extra '

d2d173f

Bodigrim approved these changes Jun 7, 2023

Bodigrim merged commit 8d296b7 into haskell:master Jun 7, 2023

clyring mentioned this pull request Jun 13, 2023

Test that StrictByteString results can be unboxed #595

Open

clyring mentioned this pull request Jun 16, 2023

Unboxable creation #466

Closed
