Skip to content

Conversation

nikita-volkov
Copy link
Contributor

Reduce the amount of operations, avoiding redundant writes and hence reducing the chance of conflicts

Reduce the amount of operations, avoiding redundant writes and hence reducing the chance of conflicts
@Fuuzetsu
Copy link
Member

Would it be possible to add some benchmarks? It'd be extremely nice to see what kind of difference this makes, if any.

It should help just by looking at it but I'm not a fan of increasing code complexity without a metric we can point at and say "see?"... Especially in such a core library where impact of the change will transitively touch probably every non-trivial Haskell program.

@nikita-volkov
Copy link
Contributor Author

Unfortunately I don't have the time for that. Also I don't think that it needs proving that avoiding redundant writes reduces contention, and hence increases performance.

@hvr
Copy link
Member

hvr commented Sep 18, 2018

ping @mitchellwrosen

@mitchellwrosen
Copy link
Contributor

LGTM, and I agree benchmarking this is not necessary. I did a quick and dirty one anyway:

  • 2 threads each writing 2M elements
  • 4 threads each peeking 2M elements

Old:

benchmarking ./peekbench +RTS -N4
time                 2.592 s    (2.285 s .. 3.051 s)
                     0.995 R²   (0.994 R² .. 1.000 R²)
mean                 2.516 s    (2.465 s .. 2.615 s)
std dev              85.33 ms   (0.0 s .. 86.57 ms)
variance introduced by outliers: 19% (moderately inflated)

New:

benchmarking ./peekbench +RTS -N4
time                 1.425 s    (1.215 s .. 1.575 s)
                     0.998 R²   (0.992 R² .. 1.000 R²)
mean                 1.408 s    (1.360 s .. 1.432 s)
std dev              41.60 ms   (0.0 s .. 41.77 ms)
variance introduced by outliers: 19% (moderately inflated)

Code:

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad

main :: IO ()
main = do
  q <- newTQueueIO
  done <- newEmptyMVar

  forkIO $ replicateM_ 2000000 (atomically (writeTQueue q 1)) >> putMVar done ()
  forkIO $ replicateM_ 2000000 (atomically (writeTQueue q 1)) >> putMVar done ()

  forkIO $ replicateM_ 2000000 (atomically (peekTQueue q))    >> putMVar done ()
  forkIO $ replicateM_ 2000000 (atomically (peekTQueue q))    >> putMVar done ()
  forkIO $ replicateM_ 2000000 (atomically (peekTQueue q))    >> putMVar done ()
  forkIO $ replicateM_ 2000000 (atomically (peekTQueue q))    >> putMVar done ()

  replicateM_ 6 $ takeMVar done

@Fuuzetsu
Copy link
Member

That's a huge improvement. Why not add these benchmarks into the package?

@mitchellwrosen
Copy link
Contributor

mitchellwrosen commented Sep 19, 2018

It definitely could. Who should make that call? Not me :) Again I agree with @nikita-volkov that his implementation is just "obviously" better (for lack of a better word - I know it's not a good one to use when discussing code).

A peek is just a read that is not followed by writing back the tail, and now it's implemented as such. So an understanding of how STM works, or really just an understanding of this patch, is (to my mind) sufficient proof that the old way (read + write + read + write) is worse.

@mitchellwrosen
Copy link
Contributor

Bump, any plans to merge this one?

@simonmar simonmar merged commit 3860cd1 into haskell:master Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants