Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better performance with a left associative .|? #509

Open
kabuhr opened this issue May 3, 2024 · 2 comments
Open

Better performance with a left associative .|? #509

kabuhr opened this issue May 3, 2024 · 2 comments

Comments

@kabuhr
Copy link

kabuhr commented May 3, 2024

In a recent Stack Overflow question, Kwobny asked why .| was infixr. The potential "performance issue" raised in that question wasn't very compelling (i.e., so what if ((...((c1 .| c2) .| c3) ... .| c999999) .| c1000000) | return () is faster than the alternative), but I ended up running a few quick Criterion benchmarks on simple, pure conduits:

rassoc1 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC

rassoc2 :: ConduitT () Void Identity Integer
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC
    where d `divides` n = n `mod` d == 0

versus their left-associative equivalents (see below) and was surprised to discover that the left-associative versions were orders of magnitude faster than the right-associative versions (like 50ms to 10ms for rassoc1 versus its left-associative equivalent, and 8ms to 100us for rassoc2).

I admittedly haven't spent a lot of time studying this, but I thought I'd raise an issue here in case there's a quick explanation. Was the decision to encourage right associative conduits made early on because they "looked right", or is there some specific advantage to right associative conduits that makes the default associativity of .| the right choice (pun!). Maybe left associative conduits don't actually achieve constant memory usage? Maybe there's some advantage in the ordering of monadic effects for right associative conduits? Or maybe the performance issue above is just a missing REWRITE rule, and right associative conduits achieve better performance in more realistic workflows. Any thoughts?

The benchmark:

import Criterion
import Criterion.Main
import Conduit

(|.) ::  Monad m => ConduitT a b m () -> ConduitT b c m r -> ConduitT a c m r
(|.) = (.|)
infixl 2 |.

d `divides` n = n `mod` d == 0

rassoc1, rassoc2 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC

lassoc1, lassoc2 :: ConduitT () Void Identity Integer
lassoc1 = yieldMany [0..1000000] |. mapC (*2) |. mapC (+4) |. sumC
lassoc2 = yieldMany [1..] |. filterC (50 `divides`) |. takeC 10000 |. mapC (*2) |. sumC

main = defaultMain
  [ bench "rassoc1" $ whnf runConduit rassoc1
  , bench "lassoc1" $ whnf runConduit lassoc1
  , bench "rassoc2" $ whnf runConduit rassoc2
  , bench "lassoc2" $ whnf runConduit lassoc2
  ]

{-
   $ ghc --version
   The Glorious Glasgow Haskell Compilation System, version 9.4.8
   $ ghc -O ConduitAssoc1.hs && ./ConduitAssoc1
   Loaded package environment from /u/buhr/.ghc/x86_64-linux-9.4.8/environments/default
   benchmarking rassoc1
   time                 49.88 ms   (49.60 ms .. 50.58 ms)
   ...
   benchmarking lassoc1
   time                 11.51 ms   (11.28 ms .. 11.76 ms)
   ...
   benchmarking rassoc2
   time                 8.887 ms   (8.881 ms .. 8.893 ms)
   ...
   benchmarking lassoc2
   time                 118.8 μs   (118.0 μs .. 120.4 μs)
   ...
-}
@snoyberg
Copy link
Owner

snoyberg commented May 5, 2024

I don't remember a specific motivation. I am surprised by this, since we used the codensity transform which should avoid the worst performance issues of left vs right associativity. The history of conduit though has lots of cases where new versions of GHC have changed behavior enough that performance degraded when upgrading, especially around rewrite rules.

@kwobny
Copy link

kwobny commented May 5, 2024

Hi, I am the original poster of the stack overflow question. Here is the link to the question, which explains why I suspected the performance issue.
https://stackoverflow.com/questions/78420681/in-the-haskell-conduit-library-why-is-the-operator-infixr
Furthermore, I believe the associativity advantage snoyberg refers to regarding the codensity transform applies to monadic composition only. Fusing of components is a different kind of composition for which I believe this advantage does not apply.

In addition, I'd like to admit I'm still kind of a beginner in haskell, so what I say might not be thorough or use the proper language. I also edited the question just now to more clearly explain my reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants