Improve filter
#281
Comments
Side note: we could do better still if rather than using |
Second side note: if (some form of) my |
Third side note: all of this means that writing a generator for even numbers is not as easy as it seems. For example, consider a generator for natural numbers that produces a tree like
which would be a correct, if somewhat inefficient, shrinker.
That would be fine for this case (the efficiency concerns noted above don't really apply), but this only works for pure generators, and so would have a more restrictive type. |
Fourth and final side note: I wasn't sure if this was a bugfix or an improvement :) Still not entirely sure, but I guess it could be argued that trying again from scratch after finding a shrunk value that doesn't satisfy the predicate may be useful; after all, next time we go round we may happen to start with a different value that will eventually lead to a smaller final value. Note however that if the tree contains many elements that don't satisfy the predicate, it is very likely we'll actually run the generator 100 times (because we're likely to hit a shrunk value not satisfying the predicate each time), and only at the very last run stop when we hit a value that doesn't satisfy the predicate. In other words, at least for cases where there are many elements being filtered out, my |
Thanks for the detailed analysis, I will at the very least be implementing some variant of your This implementation is partially a hangover from when Fwiw, the F# version implements the intention properly: https://github.com/hedgehogqa/fsharp-hedgehog/blob/master/src/Hedgehog/Gen.fs#L276-L280 Random.bind returns the tree itself and the predicate is checked only against the root. If successful the loop terminates. The tree is then filtered separately and would result in the same tree as your I would say this is a bugfix, it was intended to work like |
At the moment, `filter` is defined like

I think this is not quite right. The problem is that this is using the monadic interface to the generator; therefore the check `if p x then ... else ..` gets applied at every level of the shrink tree for `x`. That's not correct: it means that the entire generator is re-run every time a shrunk value happens not to satisfy the predicate, at which point we'd basically start over. This leads to large shrink trees and consequently bad performance (https://stackoverflow.com/questions/54412108/why-the-does-this-shrink-tree-looks-the-way-it-does-when-using-filter). Indeed, it means that we may end up "shrinking" a value to a larger one.

Instead, once we find a value that satisfies the predicate, we should remove any shrunk values that don't satisfy the predicate. Just as a proof of concept, this is easily defined in terms of the existing filter:
(Not entirely sure why `ensure` is not exported?). I'm calling this a proof of concept because what should probably happen instead is that `filter` should be changed to behave like `filter'` (or at least the existing `filter` should not be the default).

The definition of `filter'` is not quite ideal. The problem is that `ensure` will stop as soon as it finds one shrunk value that doesn't satisfy the predicate, rather than skipping to the next level in the tree. However, QuickCheck does this too; given

we get
Nonetheless, I still don't think that it makes much sense to re-run the entire generator when shrinking happens to hit on an element that doesn't satisfy the predicate.
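The difference between re-running the generator and pruning can be illustrated on a plain rose tree. This is only a sketch under stated assumptions: `Tree`, `bindTree`, `naiveFilter`, and `pruneTree` are hypothetical stand-ins written for this illustration, not Hedgehog's actual types or functions, and the `retry` tree stands in for "run the whole generator again".

```haskell
-- A pure rose tree standing in for a shrink tree (an assumption for
-- illustration; the real shrink tree lives inside a monad stack).
data Tree a = Node a [Tree a]
  deriving (Eq, Show)

-- Monadic bind on rose trees: shrinks of the outer value come first,
-- followed by the shrinks produced by the continuation.
bindTree :: Tree a -> (a -> Tree b) -> Tree b
bindTree (Node x xs) k =
  case k x of
    Node y ys -> Node y (map (`bindTree` k) xs ++ ys)

-- Hypothetical model of a filter written against the monadic interface.
-- The predicate check runs at every node, and a failing node is replaced
-- by a complete fresh tree (`retry`), which is assumed to satisfy the
-- predicate already.
naiveFilter :: (a -> Bool) -> Tree a -> Tree a -> Tree a
naiveFilter p retry t =
  bindTree t (\x -> if p x then Node x [] else retry)

-- Pruning instead: keep the root and drop every child whose value fails
-- the predicate, together with that child's whole subtree.
pruneTree :: (a -> Bool) -> Tree a -> Tree a
pruneTree p (Node x xs) =
  Node x [pruneTree p c | c@(Node y _) <- xs, p y]

-- A small shrink tree rooted at 4, and a "fresh run" rooted at 10.
original, fresh :: Tree Int
original = Node 4 [Node 3 [Node 2 []], Node 2 []]
fresh    = Node 10 [Node 8 []]
```

With these definitions, `naiveFilter even fresh original` places the subtree rooted at 10 underneath 4, so a "shrink" of 4 is larger than 4. `pruneTree even original` keeps only `Node 4 [Node 2 []]`; note that it also loses the 2 that sat below the odd 3, which is the "stops rather than skipping to the next level" limitation mentioned above.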
Variant for better shrinking
(The below is independent from the bugfix above.)
For a while I thought that we could do better in HH, given that we have the entire shrink tree available. The idea would be that we can "flatten" the tree, replacing a value that doesn't satisfy the predicate with its children; i.e., we could go from
to
However, unfortunately the monad stack gets in the way. We start with a tree whose root satisfies the predicate (this would be guaranteed by the primitive `filter` function, the one that doesn't shrink). Now we need to decide for each of the children of that root whether to include those children as is, or replace them by their children (the root's grandchildren). The trouble is that that would require us to run the effects required to evaluate those children; worse, the problem repeats the next level down and so it would mean that we'd evaluate all effects in the entire tree always, which would obviously be disastrous (we certainly don't want to evaluate the entire tree if we can help it).

However, we can do this for pure generators:
It may be worthwhile including this variant in the library.
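As a minimal sketch of the flattening idea for the pure case, assuming a plain rose tree rather than Hedgehog's own tree type: `Tree`, `flattenForest`, and `flattenTree` below are hypothetical names introduced for this illustration only.

```haskell
-- A pure rose tree standing in for a shrink tree (an assumption for
-- illustration; no monadic effects are involved here).
data Tree a = Node a [Tree a]
  deriving (Eq, Show)

-- Replace every node that fails the predicate by its own (recursively
-- flattened) children, so that satisfying descendants are not lost.
flattenForest :: (a -> Bool) -> [Tree a] -> [Tree a]
flattenForest p = concatMap go
  where
    go (Node x xs)
      | p x       = [Node x (flattenForest p xs)]
      | otherwise = flattenForest p xs

-- Flatten below a root that is assumed to satisfy the predicate already
-- (for example because a non-shrinking filter picked it).
flattenTree :: (a -> Bool) -> Tree a -> Tree a
flattenTree p (Node x xs) = Node x (flattenForest p xs)

-- A small shrink tree for 4 that contains odd intermediate values.
example :: Tree Int
example =
  Node 4
    [ Node 3 [Node 2 [Node 1 [Node 0 []]]]
    , Node 2 [Node 1 [Node 0 []]]
    ]
```

Here `flattenTree even example` yields `Node 4 [Node 2 [Node 0 []], Node 2 [Node 0 []]]`: every odd value is replaced by its even descendants, and the repeated `Node 2` subtree shows how flattening can introduce duplicates. Because the tree is pure and Haskell is lazy, only the parts of the result that are inspected are computed, which is the property that is lost once evaluating a child requires running effects.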
Just for completeness, here's an example to play with:
The first one uses the standard `filter` and produces a huge tree (restarting the generator every time that the integer shrinks to an odd number, up to a maximum depth of 100); the second has `QuickCheck`-like behaviour and produces trees such as

Finally, `example''` produces trees such as

which are near-ideal (except that the flattening may introduce some duplicates).
(For a while I thought we could do better still, and have this pattern be available when we do have monadic effects, by cleverly evaluating only as much of the tree as we need in order to find a root that satisfies the predicate, and then reattaching any nodes we skipped as children of that new root. However, that would result in trees in which nodes lower in the tree are not necessarily "smaller" than their parents, which is of course not a very good shrink tree.)