Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up shuffle #348

Merged
merged 1 commit into from
Jan 6, 2020
Merged

Speed up shuffle #348

merged 1 commit into from
Jan 6, 2020

Conversation

treeowl
Copy link
Contributor

@treeowl treeowl commented Dec 25, 2019

  • Add shuffleSeq to shuffle sequences.

  • Use shuffleSeq to implement shuffle much more efficiently. The shuffling
    algorithm we use finds and removes an arbitrary element, which sequences
    can do in logarithmic time and space rather than linear time and
    space. It also uses length, which is constant time for sequences and
    linear time for lists.

  • Bump lower bound on containers. We could avoid this, if necessary,
    by not using pattern synonyms for sequences.

@HuwCampbell
Copy link
Member

This looks good to me; some benchmarking results would be good though.

With regards to bumping containers, I don't think it should be necessary as it looks like the pattern synonym is only used for adding an element, so plain (<|) should suffice.

@treeowl
Copy link
Contributor Author

treeowl commented Dec 29, 2019

I used it for more earlier, but shifted. Yeah, I'll take that out.

@moodmosaic
Copy link
Member

@treeowl, thank you, for all the hard work you've been putting in here and on the other PRs 👍 🍻 Often we're all swamped and so there're delays. (Note to self, perhaps it's time for a CONTRIBUTING.md.)


Agreed with @HuwCampbell on trying to avoid bumping dependencies when possible.

Also, it'd be nice with some benchmarks for future reference, as a comment here, or better in the commit message itself. (Yep, we've had some fairly long commit messages in the past.)

Once we resolve those, we could look forward getting this merged /cc @jacobstanley

@treeowl
Copy link
Contributor Author

treeowl commented Dec 29, 2019

I've done everything except benchmarking. I don't really have the time right now to set up Criterion or Gauge benchmarks for this. Messing around in GHCi with moderately long lists seemed a pretty convincing demonstration, and it drops the algorithm from quadratic to linearithmic in both time and allocation.

@treeowl treeowl force-pushed the shuffle branch 2 times, most recently from 86a96c6 to 2b29019 Compare December 30, 2019 02:51
* Add `shuffleSeq` to shuffle sequences.

* Use `shuffleSeq` to implement `shuffle` much more efficiently. The shuffling
  algorithm we use finds and removes an arbitrary element, which sequences
  can do in logarithmic time and space rather than linear time and
  space. It also uses `length`, which is constant time for sequences and
  linear time for lists.
@treeowl
Copy link
Contributor Author

treeowl commented Jan 4, 2020

@moodmosaic any chance you could have another look at this?

@moodmosaic
Copy link
Member

Yep, during the weekend hopefully, as soon as I'm back at my laptop.

Copy link
Member

@moodmosaic moodmosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@treeowl, I've just setup a quick Criterion benchmark. Looking good 👍 💯 See comment below.

hedgehog/src/Hedgehog/Internal/Gen.hs Show resolved Hide resolved
@treeowl
Copy link
Contributor Author

treeowl commented Jan 5, 2020

Yeah, deleteAt is relatively new. I'm glad my guess about the relative performance of those turned out right.

@moodmosaic
Copy link
Member

moodmosaic commented Jan 5, 2020

Yep, FTR it was added in 0.5.2 https://hackage.haskell.org/package/containers-0.5.2.0/docs/doc-index-D.html

Edit: You are right, it was added in 0.5.8.

But since the other one in the #else branch is nearly as fast, perhaps we could just use that one. We could add a FIXME there with a link pointing at this discussion and update it once we bump containers.

Just to avoid the extra maintenance and the CPP is all, but absolutely open for keeping both if there's a valid reason for it.

@moodmosaic
Copy link
Member

OTOH the one in the #else branch is ugly (and a bit slower) compared to the one using deletedAt

@moodmosaic
Copy link
Member

moodmosaic commented Jan 5, 2020

containers v0.4 is 10 years old, could very well be a reason to bump the lower version to v0.5.8...

@treeowl
Copy link
Contributor Author

treeowl commented Jan 6, 2020

Only you can decide what bounds you consider acceptable. I think it would be a shame not to take the improved performance when we can get it. Do you think you could break up the benchmarks a bit? It would be useful to see scaled comparisons (or just text) for 3/30/300 and also 3000 elements. The small end in the graph above is too small to see, and at the large end you don't seem to get quite large enough for the new algorithm to blow the old one out of the water.

@moodmosaic
Copy link
Member

👍 Sounds reasonable, will do and try to upload the full benchmark HTML here.

@moodmosaic
Copy link
Member

The one taking advantage of containers 0.5.8 is nearly twice faster than the one that works with 0.4.0 🤓

image

benchmarking current/3
time                 2.186 μs   (2.084 μs .. 2.326 μs)
                     0.982 R²   (0.978 R² .. 0.988 R²)
mean                 2.301 μs   (2.209 μs .. 2.395 μs)
std dev              324.8 ns   (271.9 ns .. 399.3 ns)
variance introduced by outliers: 94% (severely inflated)

benchmarking current/30
time                 26.13 μs   (25.69 μs .. 26.59 μs)
                     0.995 R²   (0.992 R² .. 0.997 R²)
mean                 28.04 μs   (27.23 μs .. 29.02 μs)
std dev              2.849 μs   (2.200 μs .. 3.849 μs)
variance introduced by outliers: 85% (severely inflated)

benchmarking current/300
time                 2.187 ms   (2.103 ms .. 2.283 ms)
                     0.988 R²   (0.982 R² .. 0.994 R²)
mean                 2.122 ms   (2.078 ms .. 2.179 ms)
std dev              173.2 μs   (139.5 μs .. 215.5 μs)
variance introduced by outliers: 59% (severely inflated)

benchmarking current/3000
time                 184.7 ms   (177.6 ms .. 191.6 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 197.0 ms   (190.4 ms .. 215.1 ms)
std dev              14.72 ms   (435.9 μs .. 20.86 ms)
variance introduced by outliers: 15% (moderately inflated)

benchmarking faster_min_containers_0.5.8/3
time                 2.108 μs   (2.050 μs .. 2.189 μs)
                     0.992 R²   (0.986 R² .. 0.997 R²)
mean                 2.155 μs   (2.112 μs .. 2.212 μs)
std dev              170.6 ns   (135.0 ns .. 225.7 ns)
variance introduced by outliers: 82% (severely inflated)

benchmarking faster_min_containers_0.5.8/30
time                 28.31 μs   (26.20 μs .. 29.86 μs)
                     0.979 R²   (0.970 R² .. 0.989 R²)
mean                 26.60 μs   (25.68 μs .. 27.72 μs)
std dev              3.244 μs   (2.673 μs .. 3.972 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking faster_min_containers_0.5.8/300
time                 405.3 μs   (381.2 μs .. 423.8 μs)
                     0.974 R²   (0.963 R² .. 0.983 R²)
mean                 439.3 μs   (417.8 μs .. 464.5 μs)
std dev              80.82 μs   (66.11 μs .. 99.06 μs)
variance introduced by outliers: 93% (severely inflated)

benchmarking faster_min_containers_0.5.8/3000
time                 9.775 ms   (9.378 ms .. 10.33 ms)
                     0.978 R²   (0.955 R² .. 0.993 R²)
mean                 11.50 ms   (11.04 ms .. 12.18 ms)
std dev              1.599 ms   (1.160 ms .. 2.401 ms)
variance introduced by outliers: 68% (severely inflated)

benchmarking faster_min_containers_0.4.0/3
time                 2.435 μs   (2.310 μs .. 2.550 μs)
                     0.988 R²   (0.980 R² .. 0.995 R²)
mean                 2.270 μs   (2.224 μs .. 2.345 μs)
std dev              194.2 ns   (138.1 ns .. 304.8 ns)
variance introduced by outliers: 84% (severely inflated)

benchmarking faster_min_containers_0.4.0/30
time                 32.13 μs   (31.03 μs .. 33.06 μs)
                     0.991 R²   (0.985 R² .. 0.994 R²)
mean                 30.44 μs   (29.48 μs .. 31.42 μs)
std dev              3.372 μs   (2.949 μs .. 3.863 μs)
variance introduced by outliers: 87% (severely inflated)

benchmarking faster_min_containers_0.4.0/300
time                 805.6 μs   (750.3 μs .. 863.3 μs)
                     0.967 R²   (0.950 R² .. 0.983 R²)
mean                 733.4 μs   (696.7 μs .. 771.1 μs)
std dev              121.6 μs   (100.9 μs .. 152.5 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking faster_min_containers_0.4.0/3000
time                 16.59 ms   (15.38 ms .. 17.80 ms)
                     0.978 R²   (0.960 R² .. 0.991 R²)
mean                 20.03 ms   (19.02 ms .. 21.93 ms)
std dev              3.180 ms   (1.917 ms .. 5.463 ms)
variance introduced by outliers: 71% (severely inflated)

@moodmosaic moodmosaic merged commit ecad248 into hedgehogqa:master Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants