Improve `filter` #281

edsko · 2019-04-28T13:07:57Z

At the moment, filter is defined like

filter :: MonadGen m => (a -> Bool) -> m a -> m a
filter p gen =
  let
    try k =
      if k > 100 then
        discard
      else do
        x <- scale (2 * k +) gen
        if p x then
          pure x
        else
          try (k + 1)
  in
    try 0

I think this is not quite right. The problem is that this is using the monadic interface to the generator; therefore the check if p x then ... else .. gets applied at every level of the shrink tree for x. That's not correct: that means that the entire generator is re-run every time a shrunk value happens not to satisfy the predicate, at which point we'd basically start over. This leads to large shrink trees and consequently bad performance (https://stackoverflow.com/questions/54412108/why-the-does-this-shrink-tree-looks-the-way-it-does-when-using-filter). Indeed, it means that we may end up "shrinking" a value to a larger one.

Instead what should happen after we find a value that satisfies the predicate, we should remove any shrunk values that don't satisfy the predicate. Just as a proof of concept, this is easily defined in terms of the existing filter:

filter' :: MonadGen m => (a -> Bool) -> m a -> m a
filter' p gen = Gen.Internal.ensure p $
                  join (snd <$> Gen.filter (p . fst) (Gen.freeze gen))

(Not entirely sure why ensure is not exported?). I'm calling this a proof of concept because what should probably happen instead is that filter should be changed to behave like filter' (or at least the existing filter should not be the default).

The definition of filter' is not quite ideal. The problem is that ensure will stop as soon as it finds one shrunk value that doesn't satisfy the predicate, rather than skipping to the next level in the tree. However, QuickCheck does this too; given

newtype Uniform = Uniform Int
  deriving (Show)

instance Arbitrary Uniform where
  arbitrary = Uniform <$> choose (0, 100)
  shrink (Uniform n) = Uniform <$> shrink n

we get

> quickCheck (\(Uniform n) -> even n)
*** Failed! Falsified (after 1 test):                  
Uniform 31

Nonetheless, I still don't think that it makes much sense to re-run the entire generator when shrinking happens to hit on an element that doesn't satisfy the predicate.

Variant for better shrinking

(The below is independent from the bugfix above.)

For a while I thought that we could do better in HH, given that we have the entire shrink tree available. The idea would be that we can "flatten" the tree, replacing a value that doesn't satisfy the predicate with its children; i.e., we could go from

4 --- 1
  |
  --- 3 --- 2
        |
        --- 0

to

4 --- 2
  |
  --- 0

However, unfortunately the monad stack gets in the way. We start with a tree whose root satisfies the predicate (this would be guaranteed by the primitive filter function, the one that doesn't shrink). Now we need to decide for each of the children of that root whether to include those children as is, or replace them by their children (the root's grandchildren). The trouble is that that would require us to run the effects required to evaluate those children; worse, the problem repeats the next level down and so it would mean that we'd evaluate all effects in the entire tree always, which would obviously be disastrous (we certainly don't want to evaluate the entire tree if we can help it).

However, we can do this for pure generators:

filter'' :: forall a. (a -> Bool) -> Gen a -> Gen a
filter'' p gen = withGenT (mapGenT (go . runDiscardEffect)) $
                   join (snd <$> Gen.filter (p . fst) (Gen.freeze gen))
  where
    go :: Maybe (Tree a) -> TreeT (MaybeT Identity) a
    go mt =
        case mt of
          Nothing ->
            TreeT $ MaybeT $ Identity $ Nothing
          -- Gen.filter guarantees that @x@ must satisfy @p@
          Just (TreeT (Identity (NodeT x xs))) ->
            hoist generalize $
              TreeT (Identity (NodeT x (concatMap (flattenTree p) xs)))

flattenTree :: (a -> Bool) -> Tree a -> [Tree a]
flattenTree p (TreeT (Identity (NodeT x xs)))
    | p x       = [TreeT (Identity (NodeT x xs'))]
    | otherwise = xs'
  where
    xs' = concatMap (flattenTree p) xs

It may be worthwhile including this variant in the library.

Just for completeness, here's an example to play with:

example :: Gen Int
example = Gen.filter even (Gen.element [1..10])

example' :: Gen Int
example' = filter' even (Gen.element [1..10])

example'' :: Gen Int
example'' = filter'' even (Gen.element [1..10])

The first one uses the standard filter and produces a huge tree (restarting the generator every time that the integer shrinks to an odd number, to a maximum of a depth of 100); the second has QuickCheck-like behaviour and produces trees such as

 6
 └╼ 4

 8

10
 ├╼ 6
 │  └╼ 4
 └╼ 8

Finally, example'' produces trees such as

 8
 ├╼ 2
 ├╼ 4
 │  └╼ 2
 ├╼ 4
 │  └╼ 2
 └╼ 6
    ├╼ 4
    │  └╼ 2
    ├╼ 2
    └╼ 4
       └╼ 2

which are near-ideal (except that the flattening may introduce some duplicates).

(For a while I thought we could do better still, and have this pattern be available when when we do have monadic effects, by cleverly only evaluating as much of the tree as we need to to find a root that satisfies the predicate, and then reattaching any nodes we skipped as children of that new root. However, that would result in trees in which nodes lower in the tree are not necessarily "smaller" than their parents, which is of course not a very good shrink tree.)

The text was updated successfully, but these errors were encountered:

edsko · 2019-04-28T13:16:26Z

Side note: we could do better still if rather than using MaybeT, to indicate that we might fail to return a tree entirely, instead use Maybe on the elements, and have the evaluation of a generator simply skip Nothings. With that setup we could run the primitive filter (without shrinking), just like I do above, then replace any Just as with Nothing for as in the subtree that don't satisfy the predicate. This would then very naturally lead to shrinking skipping elements, although it does beg the question of when to try a subtree that starts with Nothing. Anyway, this would be a larger change to HH of course, which may well have ramifications of its own. Probably not worth it.

edsko · 2019-04-28T13:20:44Z

Second side note: if (some form of) my filter'' variant above is included it should probably come with a warning that although it leads to better shrinking, this is not free; for the standard variant, a call to filter' will improve the performance of shrinking, as it will stop earlier in the tree; for filter'' that is not the case.

edsko · 2019-04-28T13:33:56Z

Third side note: all of this means that writing a generator for even numbers is not as easy as it seems. For example, consider a generator for natural numbers that produces a tree like

which would be a correct, if somewhat inefficient, shrinker.

With the existing definition of filter, we might start with 6 and end up with a larger number (since we re-run the generator)
With my filter' above, this would not shrink at all.
With filter'' this would work fine, resulting in

6
|- 4
   |- 2
      |- 0

That would be fine for this case (the efficiency concerns noted above don't really apply), but this only works for pure generators, and so would have a more restrictive type.

edsko · 2019-04-28T13:54:05Z

Fourth and final side note: I wasn't sure if this was a bugfix or an improvement :) Still not entirely sure, but I guess it could be argued that trying again from scratch after finding a shrunk value that doesn't satisfy the predicate may be useful; after all, next time we go round we may happen to start with different value that will eventually lead to a smaller final value. Note however that if the tree contains many elements that don't satisfy the predicate, it is very likely we'll actually run the generator 100 times (because we're likely to hit a shrunk value not satisfying the predicate each time), and only at the very last run stop when we hit a value that doesn't satisfy the predicate. In other words, at least for cases where there are many elements being filtered out, my filter' above has almost the same behaviour as the existing filter, but is 100x times faster (or whatever the retry limit for filter is).

jacobstanley · 2019-04-28T20:48:50Z

Thanks for the detailed analysis, I will at the very lease be implementing some variant of your filter'.

This implementation is partially a hangover from when Gen was pure and was split in to two data types, Random + Tree. The bug was likely introduced when moving to GenT and merging everything in to one. The looping is only supposed to check the predicate on the root node.

Fwiw, Gen.filter as it is currently is essentially identical to suchThat / suchThatMaybe from QuickCheck.

The F# version implements the intention properly: https://github.com/hedgehogqa/fsharp-hedgehog/blob/master/src/Hedgehog/Gen.fs#L276-L280

Random.bind returns the tree itself and the predicate is checked only against the root. If successful the loop terminates.

The tree is then filtered separately and would result in the same tree as your filter' function I believe.\

I would say this is a bugfix, it was intended to work like filter' but with retries on the root node rather than immediate discard. Similar to suchThat / suchThatMaybe from QuickCheck.

Fix filter shrinking bug #281

edsko changed the title ~~Improve filter~~ Fix filter Apr 28, 2019

edsko changed the title ~~Fix filter~~ Improve filter Apr 28, 2019

charleso mentioned this issue Apr 28, 2019

Fix Gen filter hedgehogqa/scala-hedgehog#80

Open

jacobstanley added a commit to jacobstanley/haskell-hedgehog that referenced this issue Apr 29, 2019

Fix filter shrinking bug hedgehogqa#281

28fb6c8

jacobstanley mentioned this issue Apr 29, 2019

Fix filter shrinking bug #281 #282

Merged

jacobstanley added a commit that referenced this issue Apr 29, 2019

Merge pull request #282 from jystic/topic/filter

61a3f3d

Fix filter shrinking bug #281

jacobstanley closed this as completed May 1, 2019

erikd pushed a commit to erikd/haskell-hedgehog that referenced this issue Mar 2, 2020

Fix filter shrinking bug hedgehogqa#281

6a93269

ChickenProp mentioned this issue Apr 13, 2023

Gen.filter scans entire shrink trees #484

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve `filter` #281

Improve `filter` #281

edsko commented Apr 28, 2019 •

edited

Loading

edsko commented Apr 28, 2019

edsko commented Apr 28, 2019 •

edited

Loading

edsko commented Apr 28, 2019

edsko commented Apr 28, 2019

jacobstanley commented Apr 28, 2019

Improve filter #281

Improve filter #281

Comments

edsko commented Apr 28, 2019 • edited Loading

Variant for better shrinking

edsko commented Apr 28, 2019

edsko commented Apr 28, 2019 • edited Loading

edsko commented Apr 28, 2019

edsko commented Apr 28, 2019

jacobstanley commented Apr 28, 2019

Improve `filter` #281

Improve `filter` #281

edsko commented Apr 28, 2019 •

edited

Loading

edsko commented Apr 28, 2019 •

edited

Loading