Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve filter #281

Closed
edsko opened this issue Apr 28, 2019 · 5 comments
Closed

Improve filter #281

edsko opened this issue Apr 28, 2019 · 5 comments

Comments

@edsko
Copy link
Contributor

edsko commented Apr 28, 2019

At the moment, filter is defined like

filter :: MonadGen m => (a -> Bool) -> m a -> m a
filter p gen =
  let
    try k =
      if k > 100 then
        discard
      else do
        x <- scale (2 * k +) gen
        if p x then
          pure x
        else
          try (k + 1)
  in
    try 0

I think this is not quite right. The problem is that this is using the monadic interface to the generator; therefore the check if p x then ... else .. gets applied at every level of the shrink tree for x. That's not correct: that means that the entire generator is re-run every time a shrunk value happens not to satisfy the predicate, at which point we'd basically start over. This leads to large shrink trees and consequently bad performance (https://stackoverflow.com/questions/54412108/why-the-does-this-shrink-tree-looks-the-way-it-does-when-using-filter). Indeed, it means that we may end up "shrinking" a value to a larger one.

Instead what should happen after we find a value that satisfies the predicate, we should remove any shrunk values that don't satisfy the predicate. Just as a proof of concept, this is easily defined in terms of the existing filter:

filter' :: MonadGen m => (a -> Bool) -> m a -> m a
filter' p gen = Gen.Internal.ensure p $
                  join (snd <$> Gen.filter (p . fst) (Gen.freeze gen))

(Not entirely sure why ensure is not exported?). I'm calling this a proof of concept because what should probably happen instead is that filter should be changed to behave like filter' (or at least the existing filter should not be the default).

The definition of filter' is not quite ideal. The problem is that ensure will stop as soon as it finds one shrunk value that doesn't satisfy the predicate, rather than skipping to the next level in the tree. However, QuickCheck does this too; given

newtype Uniform = Uniform Int
  deriving (Show)

instance Arbitrary Uniform where
  arbitrary = Uniform <$> choose (0, 100)
  shrink (Uniform n) = Uniform <$> shrink n

we get

> quickCheck (\(Uniform n) -> even n)
*** Failed! Falsified (after 1 test):                  
Uniform 31

Nonetheless, I still don't think that it makes much sense to re-run the entire generator when shrinking happens to hit on an element that doesn't satisfy the predicate.

Variant for better shrinking

(The below is independent from the bugfix above.)

For a while I thought that we could do better in HH, given that we have the entire shrink tree available. The idea would be that we can "flatten" the tree, replacing a value that doesn't satisfy the predicate with its children; i.e., we could go from

4 --- 1
  |
  --- 3 --- 2
        |
        --- 0

to

4 --- 2
  |
  --- 0

However, unfortunately the monad stack gets in the way. We start with a tree whose root satisfies the predicate (this would be guaranteed by the primitive filter function, the one that doesn't shrink). Now we need to decide for each of the children of that root whether to include those children as is, or replace them by their children (the root's grandchildren). The trouble is that that would require us to run the effects required to evaluate those children; worse, the problem repeats the next level down and so it would mean that we'd evaluate all effects in the entire tree always, which would obviously be disastrous (we certainly don't want to evaluate the entire tree if we can help it).

However, we can do this for pure generators:

filter'' :: forall a. (a -> Bool) -> Gen a -> Gen a
filter'' p gen = withGenT (mapGenT (go . runDiscardEffect)) $
                   join (snd <$> Gen.filter (p . fst) (Gen.freeze gen))
  where
    go :: Maybe (Tree a) -> TreeT (MaybeT Identity) a
    go mt =
        case mt of
          Nothing ->
            TreeT $ MaybeT $ Identity $ Nothing
          -- Gen.filter guarantees that @x@ must satisfy @p@
          Just (TreeT (Identity (NodeT x xs))) ->
            hoist generalize $
              TreeT (Identity (NodeT x (concatMap (flattenTree p) xs)))

flattenTree :: (a -> Bool) -> Tree a -> [Tree a]
flattenTree p (TreeT (Identity (NodeT x xs)))
    | p x       = [TreeT (Identity (NodeT x xs'))]
    | otherwise = xs'
  where
    xs' = concatMap (flattenTree p) xs

It may be worthwhile including this variant in the library.

Just for completeness, here's an example to play with:

example :: Gen Int
example = Gen.filter even (Gen.element [1..10])

example' :: Gen Int
example' = filter' even (Gen.element [1..10])

example'' :: Gen Int
example'' = filter'' even (Gen.element [1..10])

The first one uses the standard filter and produces a huge tree (restarting the generator every time that the integer shrinks to an odd number, to a maximum of a depth of 100); the second has QuickCheck-like behaviour and produces trees such as

 6
 └╼ 4

 8

10
 ├╼ 6
 │  └╼ 4
 └╼ 8

Finally, example'' produces trees such as

 8
 ├╼ 2
 ├╼ 4
 │  └╼ 2
 ├╼ 4
 │  └╼ 2
 └╼ 6
    ├╼ 4
    │  └╼ 2
    ├╼ 2
    └╼ 4
       └╼ 2

which are near-ideal (except that the flattening may introduce some duplicates).

(For a while I thought we could do better still, and have this pattern be available when when we do have monadic effects, by cleverly only evaluating as much of the tree as we need to to find a root that satisfies the predicate, and then reattaching any nodes we skipped as children of that new root. However, that would result in trees in which nodes lower in the tree are not necessarily "smaller" than their parents, which is of course not a very good shrink tree.)

@edsko
Copy link
Contributor Author

edsko commented Apr 28, 2019

Side note: we could do better still if rather than using MaybeT, to indicate that we might fail to return a tree entirely, instead use Maybe on the elements, and have the evaluation of a generator simply skip Nothings. With that setup we could run the primitive filter (without shrinking), just like I do above, then replace any Just as with Nothing for as in the subtree that don't satisfy the predicate. This would then very naturally lead to shrinking skipping elements, although it does beg the question of when to try a subtree that starts with Nothing. Anyway, this would be a larger change to HH of course, which may well have ramifications of its own. Probably not worth it.

@edsko
Copy link
Contributor Author

edsko commented Apr 28, 2019

Second side note: if (some form of) my filter'' variant above is included it should probably come with a warning that although it leads to better shrinking, this is not free; for the standard variant, a call to filter' will improve the performance of shrinking, as it will stop earlier in the tree; for filter'' that is not the case.

@edsko
Copy link
Contributor Author

edsko commented Apr 28, 2019

Third side note: all of this means that writing a generator for even numbers is not as easy as it seems. For example, consider a generator for natural numbers that produces a tree like

6
|- 5
   |- 4
      |- 3
         |- 2
            |- 1
               |- 0

which would be a correct, if somewhat inefficient, shrinker.

  • With the existing definition of filter, we might start with 6 and end up with a larger number (since we re-run the generator)
  • With my filter' above, this would not shrink at all.
  • With filter'' this would work fine, resulting in
6
|- 4
   |- 2
      |- 0

That would be fine for this case (the efficiency concerns noted above don't really apply), but this only works for pure generators, and so would have a more restrictive type.

@edsko edsko changed the title Improve filter Fix filter Apr 28, 2019
@edsko edsko changed the title Fix filter Improve filter Apr 28, 2019
@edsko
Copy link
Contributor Author

edsko commented Apr 28, 2019

Fourth and final side note: I wasn't sure if this was a bugfix or an improvement :) Still not entirely sure, but I guess it could be argued that trying again from scratch after finding a shrunk value that doesn't satisfy the predicate may be useful; after all, next time we go round we may happen to start with different value that will eventually lead to a smaller final value. Note however that if the tree contains many elements that don't satisfy the predicate, it is very likely we'll actually run the generator 100 times (because we're likely to hit a shrunk value not satisfying the predicate each time), and only at the very last run stop when we hit a value that doesn't satisfy the predicate. In other words, at least for cases where there are many elements being filtered out, my filter' above has almost the same behaviour as the existing filter, but is 100x times faster (or whatever the retry limit for filter is).

@jacobstanley
Copy link
Member

Thanks for the detailed analysis, I will at the very lease be implementing some variant of your filter'.

This implementation is partially a hangover from when Gen was pure and was split in to two data types, Random + Tree. The bug was likely introduced when moving to GenT and merging everything in to one. The looping is only supposed to check the predicate on the root node.

Fwiw, Gen.filter as it is currently is essentially identical to suchThat / suchThatMaybe from QuickCheck.

The F# version implements the intention properly: https://github.com/hedgehogqa/fsharp-hedgehog/blob/master/src/Hedgehog/Gen.fs#L276-L280

Random.bind returns the tree itself and the predicate is checked only against the root. If successful the loop terminates.

The tree is then filtered separately and would result in the same tree as your filter' function I believe.\

I would say this is a bugfix, it was intended to work like filter' but with retries on the root node rather than immediate discard. Similar to suchThat / suchThatMaybe from QuickCheck.

jacobstanley added a commit to jacobstanley/haskell-hedgehog that referenced this issue Apr 29, 2019
jacobstanley added a commit that referenced this issue Apr 29, 2019
erikd pushed a commit to erikd/haskell-hedgehog that referenced this issue Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants