Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Control.Concurrent.Counter.Unlifted in CMM #3

Merged
merged 2 commits into from
Apr 15, 2023
Merged

Implement Control.Concurrent.Counter.Unlifted in CMM #3

merged 2 commits into from
Apr 15, 2023

Conversation

Bodigrim
Copy link
Contributor

@Bodigrim Bodigrim commented Apr 1, 2023

Just an experiment to avoid any overhead for MutableByteArray inspired by @TerrorJack's https://gist.github.com/TerrorJack/4a48ed790155cc79619fffe9f5844521.

Caveats:

  • Does not work in GHC JavaScript backend, which should probably retain the previous, pure Haskell implementation.

  • Needs more care on 32-bit machines, current code is 64 bit only.

  • ccall hs_atomic_add64 is suboptimal on x86_64, but at the moment hand-written CMM cannot access call MO_AtomicRMW W64 AMO_Add, see https://gitlab.haskell.org/ghc/ghc/-/issues/23206.

@Bodigrim
Copy link
Contributor Author

Bodigrim commented Apr 1, 2023

Hmm, tests are failing on GHC 9.4.4, macOS x86_64:

 All
  Correctness:            OK (9.99s)
    +++ OK, passed 500 tests.
  Correctness, no delays: FAIL (0.27s)
    *** Failed! Falsified (after 3479 tests):
    Threads {unThreads = Thread {tDelay = Delay {unDelay = 6}, tIncrement = -950, tIterations = Iterations {unIterations = 7}} :| [Thread {tDelay = Delay {unDelay = 8}, tIncrement = -441, tIterations = Iterations {unIterations = 37}},Thread {tDelay = Delay {unDelay = 6}, tIncrement = -452, tIterations = Iterations {unIterations = 0}},Thread {tDelay = Delay {unDelay = 4}, tIncrement = -389, tIterations = Iterations {unIterations = 49}},Thread {tDelay = Delay {unDelay = 0}, tIncrement = -294, tIterations = Iterations {unIterations = 40}}]}
    -110710340 /= -53788
    Use --quickcheck-replay=512982 to reproduce.
    Use -p '/Correctness, no delays/' to rerun this test only.

Whatever the sequence of updates, the result must have been below five digits, so -110710340 suggests memory corruption.

@TerrorJack, any idea where I messed up?

@alt-romes
Copy link

Great! I was just thinking about this too. Thanks @Bodigrim

@TerrorJack
Copy link
Contributor

Pong, will take a closer look next week.

@alt-romes
Copy link

I'm also interested in seeing this work, don't mind my second ping @TerrorJack

@TerrorJack
Copy link
Contributor

Would you give https://gist.github.com/TerrorJack/e3f1231fa65b3d1f5ea63065dd33a87f a try and see if it fixes the issue? Honestly I'm no less puzzled than you why the original version has this memory issue, but this diff seems to help.

@Bodigrim
Copy link
Contributor Author

Thanks @TerrorJack, this seems to work.

I think this is ready for review now.

@alt-romes
Copy link

FWIW LGTM

@Bodigrim
Copy link
Contributor Author

Benchmarks:

Name 07f4b51 3dd86a3 Ratio
10 iterations and 1 threads 626901 650901 1.03828
100 iterations and 1 threads 1057955 1065232 1.00688
1000 iterations and 1 threads 4767930 4846413 1.01646
10000 iterations and 1 threads 40603875 40717926 1.00281
10 iterations and 2 threads 6199483 6215062 1.00251
100 iterations and 2 threads 6786286 6637318 0.978049
1000 iterations and 2 threads 36046876 37277081 1.03413
10000 iterations and 2 threads 466461621 459297509 0.984642
10 iterations and 4 threads 14304393 14116542 0.986868
100 iterations and 4 threads 22378799 21837321 0.975804
1000 iterations and 4 threads 241044897 221475244 0.918813
10000 iterations and 4 threads 2411656445 2126885546 0.881919
10 iterations and 6 threads 29380970 31528617 1.0731
100 iterations and 6 threads 51806632 49229728 0.950259
1000 iterations and 6 threads 421304931 389744873 0.92509
10000 iterations and 6 threads 10803454687 10544148437 0.975998
10 iterations and 8 threads 33264868 33220318 0.998661
100 iterations and 8 threads 70489263 68710925 0.974772
1000 iterations and 8 threads 751835473 614695391 0.817593
10000 iterations and 8 threads 18112116015 17322731250 0.956417
10 iterations and 12 threads 37138810 37928328 1.02126
100 iterations and 12 threads 111093969 99659307 0.897072
1000 iterations and 12 threads 2238697705 1766967041 0.789283
10000 iterations and 12 threads 24245685546 23640137500 0.975025
10 iterations and 16 threads 43222968 43582453 1.00832
100 iterations and 16 threads 145973196 137985633 0.945281
1000 iterations and 16 threads 3376071093 2974715039 0.881117
10000 iterations and 16 threads 35231724609 34480696093 0.978683
10 iterations and 20 threads 62185274 62320446 1.00217
100 iterations and 20 threads 183011389 173122430 0.945965
1000 iterations and 20 threads 4215415478 3965037304 0.940604
10000 iterations and 20 threads 42896585937 42560037500 0.992154
10 iterations and 32 threads 97156437 99794003 1.02715
100 iterations and 32 threads 260610671 260671594 1.00023
1000 iterations and 32 threads 7016384375 6485634765 0.924356
10000 iterations and 32 threads 73145804687 69175517187 0.945721
10 iterations and 64 threads 172624707 178914562 1.03644
100 iterations and 64 threads 610634515 591822558 0.969193
1000 iterations and 64 threads 14407456250 13955856250 0.968655
10000 iterations and 64 threads 151388475000 148245268750 0.979237
10 iterations and 128 threads 294979428 300389105 1.01834
100 iterations and 128 threads 1733554345 1293563940 0.746192
1000 iterations and 128 threads 29561675000 27645350000 0.935175
10000 iterations and 128 threads 303322475000 296239175000 0.976648
Geometric mean 0.96143

Interestingly the biggest speed up, up to 25%, is observed for 1000 iterations, while measurements for 10000 iterations are consistently better, but less impressive, up to 11%.

@sergv sergv merged commit d35ffb1 into sergv:master Apr 15, 2023
6 checks passed
@sergv
Copy link
Owner

sergv commented Apr 15, 2023

Impressive, thanks a lot!

@sergv
Copy link
Owner

sergv commented Apr 15, 2023

NB for comparison of CMM against mutable prim array one can compare against Addr row in the benchmarks - it was on par with singleton array previously.

@sergv
Copy link
Owner

sergv commented Apr 15, 2023

Released as 0.1.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants