Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate reverse in preparation for a pure-Haskell implementation #536

Merged
merged 1 commit into from
Oct 28, 2023

Conversation

chreekat
Copy link
Contributor

CC @hsyl20, @luite , and others whose handles I don't know (Jeff, Josh, ...)

To write programs with the JS backend, text's C bits need to be replaced, either with JavaScript bits or with pure Haskell. Sylvain suggested pure Haskell.

I'm opening this draft just to check if this is the direction you all expected a pure-Haskell implementation to go.

  1. Introduce a Cabal flag pure-haskell
  2. Introduce a CPP guard macro PURE_HASKELL that is defined if pure-haskell is true.
  3. Use the guard to decide whether to use the C bits or write a Haskell implementation.
  4. (not implemented) make pure-haskell default to True when the target platform is JS.

This method allows one to test the pure-Haskell implementation on the platform of their choice.

As a proof of concept, I have introduced a buggy version of reverse with a pure Haskell implementation. Run with cabal test -f pure-haskell to see it "in action"!

@phadej
Copy link
Contributor

phadej commented Aug 24, 2023

General comment:

isn't

(not implemented) make pure-haskell default to True when the target platform is JS.

already possible. Isn't there a way to test whether the backend is JS? wasn't it arch(js)? I'd expect to see

if arch(js) || flag(pure-haskell)

with flag existing mostly for testing purposes.

@phadej
Copy link
Contributor

phadej commented Aug 24, 2023

Similarly, you probably want arch(js) || flag(pure-haskell) to make simdutf flag a no-op. There are already conditionals for a lot of archs.

(AFAIK it's disabled for in-GHC-tree builds so you don't see it, but you can very easily see it outside - as it's enabled by default).

@phadej
Copy link
Contributor

phadej commented Aug 24, 2023

Oh dear, so cabal-install-3.10.1.0recognizes arch(wasm32) but doesn't recognize arch(js). EDIT: it recognizes arch(javascript).

Warning: Unknown architecture name 'js'

Sad. Weren't their initial version added to GHC at about the same time?

@phadej
Copy link
Contributor

phadej commented Aug 24, 2023

Scratch previous: there is arch(javascript) and it works in even older Cabals.

@chreekat
Copy link
Contributor Author

Thanks for comments @phadej .

isn't

(not implemented) make pure-haskell default to True when the target platform is JS.
already possible.

Yes indeed, I just hadn't done it yet for this PR.

I'll also look at the simdutf stuff!

@chreekat
Copy link
Contributor Author

@phadej is the new commit (fd2d9e1) ok?

@chreekat
Copy link
Contributor Author

reverse is now implemented. Looking forward to seeing the benchmarks... hm! cabal bench fails because of missing files. The directory benchmarks/text-test-data is missing.

@chreekat
Copy link
Contributor Author

Ok, on a hunch I looked in the readme and found the bit about cloning test data separately.

text.cabal Outdated Show resolved Hide resolved
Copy link
Contributor

@hsyl20 hsyl20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chreekat!
Yes that's exactly what I think we should do. Thanks!

text.cabal Show resolved Hide resolved
text.cabal Outdated Show resolved Hide resolved
text.cabal Show resolved Hide resolved
@chreekat
Copy link
Contributor Author

chreekat commented Sep 1, 2023

@phadej is fd03e49 what you had in mind when you said to make simdutf a no-op?

I have to say I don't like it because it silently ignores user input, but it's not a hill I care to die on.

Copy link
Contributor Author

@chreekat chreekat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now almost completely ready.

  • The pure-haskell flag documents itself as "not fully implemented"
  • All flags are manual
  • Nothing is ready for consumption, so no change to the changelog
  • I know the JS team is committed to the JS backend, so I don't feel bad merging a partially-implemented feature even though I have little time to continue working on this (but I do still have time!)

There's just one thing left I don't know how to do! I don't want people to accidentally find Data.Text.Internal.Reverse.reverse on Hoogle and try to use it. They should continue using Data.Text.reverse. I don't know enough about Haddocks to do that. Does someone know how to do it?

cbits/measure_off.c Show resolved Hide resolved
@hsyl20
Copy link
Contributor

hsyl20 commented Sep 8, 2023

Good work!

There's just one thing left I don't know how to do! I don't want people to accidentally find Data.Text.Internal.Reverse.reverse on Hoogle and try to use it. They should continue using Data.Text.reverse. I don't know enough about Haddocks to do that. Does someone know how to do it?

Perhaps {-# OPTIONS_HADDOCK hide, prune #-} (cf https://haskell-haddock.readthedocs.io/en/latest/markup.html#module-attributes) but I've never tried it.

@chreekat
Copy link
Contributor Author

chreekat commented Sep 8, 2023

Yes, thanks, that sounds familiar. I'll have a look.

@phadej
Copy link
Contributor

phadej commented Sep 8, 2023

{-# OPTIONS_HADDOCK not-home #-} that's what other internal modules in text use.

The {-# OPTIONS_HADDOCK hide #-} is IMHO consumer-hostile. If you want to hide a module, hide it by having it in other-modules (which also have a benefit of not changing the external API).

@chreekat chreekat changed the title Draft: Isolate reverse in preparation for a pure-Haskell implementation Isolate reverse in preparation for a pure-Haskell implementation Sep 8, 2023
@chreekat
Copy link
Contributor Author

chreekat commented Sep 8, 2023

All finished now.

text.cabal Outdated Show resolved Hide resolved
@chreekat
Copy link
Contributor Author

chreekat commented Sep 9, 2023

Will respond to other comments tomorrow.

@chreekat
Copy link
Contributor Author

cabal-docspec doesn't appear to be available on my OS, so I'm just taking a stab at fixing it blindly.

@chreekat
Copy link
Contributor Author

Just waiting on the decision to push the module into other-modules and the result of the doctest now.

@chreekat
Copy link
Contributor Author

chreekat commented Sep 10, 2023

By the way, according to the microbenchmarks, the pure Haskell version is 2-100x slower than the C version:

--- fast.result	2023-09-10 22:24:18.873802814 +0300
+++ pure.result	2023-09-10 22:24:49.305166781 +0300
@@ -1,4 +1,4 @@
-cabal -j bench --benchmark-options="-p /reverse/"
+cabal --builddir=dist-pure -fpure-haskell -j bench --benchmark-options="-p /reverse/"
 Build profile: -w ghc-9.2.8 -O1
 In order, the following will be built (use -v for more details):
  - text-2.0.2 (bench:text-benchmarks) (first run)
@@ -10,40 +10,40 @@
   Pure
     tiny
       reverse
-        Text:     OK (0.28s)
-          15.7 ns ± 890 ps
-        LazyText: OK (0.12s)
-          26.8 ns ± 2.6 ns
+        Text:     OK (0.30s)
+          34.3 ns ± 1.9 ns
+        LazyText: OK (0.36s)
+          43.5 ns ± 1.7 ns
     ascii-small
       reverse
-        Text:     OK (0.65s)
-          36.4 μs ± 3.2 μs
-        LazyText: OK (0.65s)
-          37.3 μs ± 3.4 μs
+        Text:     OK (0.18s)
+          349  μs ±  24 μs
+        LazyText: OK (0.17s)
+          349  μs ±  29 μs
     ascii
       reverse
-        Text:     OK (0.62s)
-          29.6 ms ± 732 μs
-        LazyText: OK (0.17s)
-          31.3 ms ± 3.0 ms
+        Text:     OK (0.96s)
+          289  ms ±  17 ms
+        LazyText: OK (0.93s)
+          285  ms ± 3.6 ms
     english
       reverse
-        Text:     OK (0.60s)
-          2.06 ms ± 133 μs
-        LazyText: OK (0.16s)
-          2.14 ms ± 204 μs
+        Text:     OK (0.17s)
+          20.2 ms ± 1.5 ms
+        LazyText: OK (0.14s)
+          19.1 ms ± 1.6 ms
     russian
       reverse
-        Text:     OK (0.36s)
-          5.03 μs ± 195 ns
-        LazyText: OK (0.35s)
-          5.04 μs ± 209 ns
+        Text:     OK (0.18s)
+          44.2 μs ± 2.8 μs
+        LazyText: OK (0.18s)
+          44.3 μs ± 2.7 μs
     japanese
       reverse
-        Text:     OK (0.84s)
-          5.90 μs ± 414 ns
-        LazyText: OK (1.66s)
-          6.06 μs ±  69 ns
+        Text:     OK (0.88s)
+          25.8 μs ± 1.7 μs
+        LazyText: OK (0.89s)
+          25.9 μs ± 1.6 μs
 
-All 12 tests passed (7.03s)
+All 12 tests passed (6.15s)
 Benchmark text-benchmarks: FINISH

@Bodigrim
Copy link
Contributor

(@chreekat you can use --baseline to compare benchmarks, or compare_benches.sh)

@chreekat
Copy link
Contributor Author

Nice, I realize I wasn't reading the numbers properly last night. The actual speeddowns look like:

cabal --builddir=dist-pure -fpure-haskell -j bench --benchmark-options="-p /reverse/ --baseline fast.baseline"
Build profile: -w ghc-9.2.8 -O1
In order, the following will be built (use -v for more details):
 - text-2.0.2 (bench:text-benchmarks) (first run)
Preprocessing benchmark 'text-benchmarks' for text-2.0.2..
Building benchmark 'text-benchmarks' for text-2.0.2..
Running 1 benchmarks...
Benchmark text-benchmarks: RUNNING...
All
  Pure
    tiny
      reverse
        Text:     OK (0.16s)
          33.8 ns ± 3.2 ns, 117% more than baseline
        LazyText: OK (0.20s)
          43.5 ns ± 3.6 ns, 62% more than baseline
    ascii-small
      reverse
        Text:     OK (0.18s)
          346  μs ±  22 μs, 860% more than baseline
        LazyText: OK (0.17s)
          345  μs ±  29 μs, 834% more than baseline
    ascii
      reverse
        Text:     OK (0.95s)
          288  ms ±  17 ms, 876% more than baseline
        LazyText: OK (0.93s)
          284  ms ± 2.7 ms, 802% more than baseline
    english
      reverse
        Text:     OK (0.17s)
          20.1 ms ± 1.5 ms, 878% more than baseline
        LazyText: OK (0.14s)
          18.9 ms ± 1.5 ms, 788% more than baseline
    russian
      reverse
        Text:     OK (0.18s)
          43.9 μs ± 2.9 μs, 767% more than baseline
        LazyText: OK (0.18s)
          43.8 μs ± 2.7 μs, 793% more than baseline
    japanese
      reverse
        Text:     OK (0.87s)
          25.6 μs ± 1.5 μs, 317% more than baseline
        LazyText: OK (0.87s)
          25.6 μs ± 1.5 μs, 314% more than baseline

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved
src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved
src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved
src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved
@chreekat
Copy link
Contributor Author

Thanks @hsyl20 ! I'll have more time to look at this again next week. Will follow up then.

@chreekat
Copy link
Contributor Author

I'm gonna mark this as draft again while I figure out the doctest and try out some of the speedups.

@chreekat chreekat changed the title Isolate reverse in preparation for a pure-Haskell implementation Draft: Isolate reverse in preparation for a pure-Haskell implementation Sep 16, 2023
@chreekat
Copy link
Contributor Author

Turns out moving the module into other-modules solved the docspec problem... nice.

let pointLength = utf8LengthByLeader (A.unsafeIndex ba p_in)
in do
A.copyI pointLength dest (p_out - pointLength + 1) ba p_in
reversePoints ba (p_in + pointLength) dest (p_out - pointLength)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't benchmark it, but a general rule is that passing unchanged arguments through recursive calls is suboptimal for performance. A usual pattern is

reversePoints ba x dest y = go x y 
  where 
    go pIn pOut = .... go pInt' pOut'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, and all the microbenchmarks surprisingly fared a lot worse.

diff --git a/src/Data/Text/Internal/Reverse.hs b/src/Data/Text/Internal/Reverse.hs
index 153611f..3c0f95c 100644
--- a/src/Data/Text/Internal/Reverse.hs
+++ b/src/Data/Text/Internal/Reverse.hs
@@ -46,12 +46,13 @@ reversePoints
     -> A.MArray s -- ^ Output array
     -> Int -- ^ Output index
     -> ST s ()
-reversePoints _ _ _ p_out | p_out < 0 = pure ()
-reversePoints ba p_in dest p_out =
-    let pointLength = utf8LengthByLeader (A.unsafeIndex ba p_in)
-    in do
-        A.copyI pointLength dest (p_out - pointLength + 1) ba p_in
-        reversePoints ba (p_in + pointLength) dest (p_out - pointLength)
+reversePoints src x dest y = go x y where
+    go _ pOut | pOut < 0 = pure ()
+    go pIn pOut =
+        let pLen = utf8LengthByLeader (A.unsafeIndex src pIn)
+        in do
+            A.copyI pLen dest (pOut - pLen + 1) src pIn
+            go (pIn + pLen) (pOut - pLen)
 #else
 reverse (Text (A.ByteArray ba) off len) = runST $ do
     marr@(A.MutableByteArray mba) <- A.new len
Benchmark text-benchmarks: RUNNING...
All
  Pure
    tiny
      reverse
        Text:     OK (0.28s)
          63.8 ns ± 5.3 ns, 79% more than baseline
        LazyText: OK (0.22s)
          50.2 ns ± 2.8 ns, 10% more than baseline
    ascii-small
      reverse
        Text:     OK (0.19s)
          732  μs ±  54 μs, 101% more than baseline
        LazyText: OK (0.12s)
          472  μs ±  43 μs, 29% more than baseline
    ascii
      reverse
        Text:     OK (2.02s)
          637  ms ±  19 ms, 110% more than baseline
        LazyText: OK (1.26s)
          395  ms ± 5.5 ms, 32% more than baseline
    english
      reverse
        Text:     OK (0.33s)
          41.9 ms ± 1.5 ms, 98% more than baseline
        LazyText: OK (0.19s)
          26.4 ms ± 1.5 ms, 32% more than baseline
    russian
      reverse
        Text:     OK (0.32s)
          75.4 μs ± 6.3 μs, 83% more than baseline
        LazyText: OK (0.21s)
          51.4 μs ± 3.0 μs, 25% more than baseline
    japanese
      reverse
        Text:     OK (0.21s)
          49.4 μs ± 2.8 μs, 103% more than baseline
        LazyText: OK (0.29s)
          34.6 μs ± 2.9 μs, 42% more than baseline

All 12 tests passed (6.42s)
Benchmark text-benchmarks: FINISH

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly, I can reproduce your measurements, but, looking at Core, I'm fairly certain that this is some sort of benchmarking quirk. You can compare yourself by putting

{-# OPTIONS_GHC -ddump-to-file -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-}
  • Let's go with go pIn pOut, Core output is much nicer.
  • You most certainly want go !_ pOut, not just go _ pOut, otherwise GHC believes that this function is lazy in pIn and would not unbox it.
  • Common subexpression elimination pass is not powerful enough to share pOut - pLen between two expressions, please do it manually.
  • Another microoptimisation is to pass reversePoints ba off dest len, check pOut <= 0, and then you shave off +1 in A.copyI pLen dest (pOut - pLen) src pIn.

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable, thanks for starting JS preparations.

@chreekat chreekat changed the title Draft: Isolate reverse in preparation for a pure-Haskell implementation Isolate reverse in preparation for a pure-Haskell implementation Sep 30, 2023
@chreekat
Copy link
Contributor Author

I believe I've addressed all feedback now.

reversePoints src pIn dest pOut =
let pointLength = utf8LengthByLeader (A.unsafeIndex src pIn)
pOut' = pOut - pointLength + 1
-- Repeated unsafeWrite is faster than copyI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know which is faster with JS backend?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, unless we know JS backend behaviour, I'd strive for simpler Core, which should give GHC more opportunities to do backend-dependent optimizations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check. TBH, in many cases I'd rather have simpler code even if it's a little slower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hsyl20 had suggested this change for backends other than JavaScript, by the way. tbh I'd rather use copyI. If copyI (a primop) is truly slower than repeated calls to unsafeWrite (another primop) maybe it's the primops that need to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just used copyI because it seems like the right thing to do. I've squashed all commits as well.

@Bodigrim
Copy link
Contributor

@chreekat could you please fix https://github.com/haskell/text/actions/runs/6597744008/job/17925891514?pr=536#step:6:106? Other two CI failures are spurious, but this one is genuine.

@chreekat
Copy link
Contributor Author

@chreekat could you please fix https://github.com/haskell/text/actions/runs/6597744008/job/17925891514?pr=536#step:6:106? Other two CI failures are spurious, but this one is genuine.

@Bodigrim whoops! Fixed now.

@Bodigrim Bodigrim mentioned this pull request Oct 22, 2023
@Bodigrim Bodigrim requested a review from Lysxia October 23, 2023 21:00
@Bodigrim Bodigrim merged commit 1ae86be into haskell:master Oct 28, 2023
25 of 27 checks passed
@Bodigrim
Copy link
Contributor

Thanks!

@chreekat
Copy link
Contributor Author

chreekat commented Nov 1, 2023

Thanks to everyone who participated and patiently waited for my progress!

I'll start on the next C function soon.

@chreekat chreekat deleted the b/pure-haskell branch November 10, 2023 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants