Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow skipping to a specific test number or shrink result #454

Merged
merged 18 commits into from
Aug 21, 2022

Conversation

ChickenProp
Copy link
Contributor

This won't be good to merge as-is, but if the feature is desirable, I'd be happy to help get it ready, including making supporting PRs to other repositories. (hspec-hedgehog will need one, and support for the feature in hspec would be nice.)

Using environment variables, this lets the user either

  • HEDGEHOG_SKIP_TO_TEST: Start testing at a specific test number, and then continue from there. If it fails it'll shrink, if it passes it'll move on to the next test. Or,
  • HEDGEHOG_SKIP_TO_SHRINK: Test at a specific test number and shrink state. If it fails it won't shrink further, if it passes the test will pass. The shrink state is encoded as an alphanumeric string and printed below the test and shrink counts.

Caveat: I don't think it's possible to walk the shrink tree without running tests at all? Please correct me if I'm wrong about that. So in the second case, you run the test at the desired shrink state, plus all its parents in the tree. But you don't run it at any of the shrinks that didn't cause the test to fail.

The value for me is that when retrying tests, they often spend several minutes before they reach the failing test case. This can eliminate almost all of that time. Additionally, I often use print-debugging, and that gets awkward if there's output from passing tests below the output from the failed test. By telling the runner exactly where to stop shrinking, that problem goes away.

@TysonMN
Copy link
Member

TysonMN commented Apr 1, 2022

Caveat: I don't think it's possible to walk the shrink tree without running tests at all? Please correct me if I'm wrong about that.

I think you are partly wrong about that. I want to start by saying though that I don't know Haskell that well. Instead, my experience here is with F# Hedgehog and how I recently implemented a similar feature there.

I will try to explain what I mean by "partly wrong" using Haskell notation/terminology.

In general, the shrink tree is created through user calls to >>= and fmap. Calls to >>= (typically) change the structure of the tree (by making it larger) and calls to fmap (typically) change the data stored at each node in the tree while (always) keeping the structure unchanged. Therefore, in order to walk the final shrink tree, it is necessary to evaluation all calls to >>=. However, it is not necessary to evaluate any calls to fmap after the last call to >>=.

The data stored at each node in the final shrink tree contains all the inputs needed to run the test. "The last call to fmap" maps these inputs to the result of executing the test on those inputs. It is this execution that can be avoided.

To write the best property-based tests, I think the user needs to make use of that fact by putting as much of their testing logic after all their test inputs have been generated. However, I also think it is easy enough to "accidentally" do this. We are all familiar with the test organization

  1. arrange
  2. act
  3. assert

or

  1. given
  2. when
  3. then

For property-based tests, the "correct" test organization is

  1. generate (inputs)
  2. arrange/given
  3. act/when
  4. assert/then

As an example of this, see this test from Elmish.WPF (a project that I maintain):

let! array = GenX.auto<Guid array>

let observableCollection = ObservableCollection<_> ()
let createTracker = InvokeTester2 createAsId
let updateTracker = InvokeTester3 updateNoOp

merge getIdAsId getIdAsId createTracker.Fn updateTracker.Fn observableCollection array

testObservableCollectionContainsDataInArray observableCollection array
test <@ createTracker.Count = array.Length @>
test <@ updateTracker.Count = 0 @>

There is a blank line between each of part of the test. In particular, the initial generate step contains all the calls to let!, which is the F# equivalent of Haskell's assignment within do notation.

I often use print-debugging, and that gets awkward if there's output from passing tests below the output from the failed test.

If you put all of your print statements after the last all to >>=, then you should only see output from the test on the shrunken input.


Anyway, back to the feature in question. In F# Hedgehog, I implemented the feature of only (fully) executing the test on the shrunken inputs in PR hedgehogqa/fsharp-hedgehog#336. The feature was somewhat hard to implement in F# because F# is an eager language. That is, everything is executed as soon as possible by default. Therefore, I had to add .NET's Lazy<> throughout the code in order to delay computation.

In contrast, Haskell is a lazy language. For that reason, I think it should be significantly easier to implement this feature in Haskell. Maybe you have already implemented this in the branch of this PR.

I am not good enough at Haskell to review your code, but I will do what I can to help.

@ChickenProp
Copy link
Contributor Author

Thanks for your comment! I'm glad I'm not the only person thinking this would be useful.

In general, the shrink tree is created through user calls to >>= and fmap. Calls to >>= (typically) change the structure of the tree (by making it larger) and calls to fmap (typically) change the data stored at each node in the tree while (always) keeping the structure unchanged. Therefore, in order to walk the final shrink tree, it is necessary to evaluation all calls to >>=. However, it is not necessary to evaluate any calls to fmap after the last call to >>=.

Yes, agreed on this.

The data stored at each node in the final shrink tree contains all the inputs needed to run the test. "The last call to fmap" maps these inputs to the result of executing the test on those inputs. It is this execution that can be avoided.

Hm, I don't think that's how it works in Haskell's hedgehog, at least?

That is, yes, the inputs needed to run the test are stored at the shrink tree nodes. But I think the test is executed with >>=, not fmap.

A very simple hedgehog test might look like

property $ do
  val <- forAll someGenerator
  assert $ isSorted $ sort val

which desugaring the do notation is

property $ forAll someGenerator >>= (\val -> assert $ isSorted $ sort val)

That is, it does use >>= after the forAll. And I don't think we'd be able to manually rewrite to use fmap. We have (ignoring constraints in types)

someGenerator :: Gen [Int] -- for example
forAll :: Gen a -> PropertyT m a
assert :: Bool -> PropertyT m ()
property :: PropertyT IO () -> Property
fmap :: (a -> b) -> f a -> f b

The property call constrains m in forAll and assert to be IO.

If we did fmap someFunc (forAll someGenerator), that would have type PropertyT m b where f :: [Int] -> b. But for this to be something we can pass to property, we need b ~ (). So f can only be const (), which doesn't help us.

I couldn't confidently prove that there's no way to avoid evaluating. It might be possible to do by taking advantage of Haskell's laziness somehow? But I think it would have to do it while working with >>=.

(Or maybe there are ways to do it while changing the tests somewhat? For example we could have

property2 :: PropertyT IO (TestT IO ())
property3 :: Gen a -> (a -> TestT IO ()) -> PropertyT IO ()

With either of these we could shrink everything needed to generate the TestT IO () without executing that.

But these might be less ergonomic, and perhaps the value of "don't execute intermediate shrinks" isn't that high. Certainly, I don't think I'd suggest adding them in the first rollout of this feature. If it can't be done without them I'd want to see if it's a pain point and then maybe add them.)

If you put all of your print statements after the last all to >>=, then you should only see output from the test on the shrunken input.

To have a concrete example, I've been using this test:

Hspec.it "test shrinking monadic" $ hedgehog $ do
  let genInt = Gen.integral (Range.linear 1 100)
  liftIO $ putStrLn "before"
  a :: Int <- forAll $ genInt
  liftIO $ print a
  b :: Int <- forAll $ genInt
  liftIO $ print (a, b)
  assert $ if b > 10 then a <= 25 else if b == 10 then a <= 20 else True

(This is in hspec because it's what I'm comfortable with. It can run simply in IO with the first line as check $ property $ do ... but then the printing apparently interferes with the console region stuff and gets messed up?)

When I run this with HEDGEHOG_SKIP_TO_SHRINK=43cAdBa2 I get the output

before
42
(42,35)
32
(32,35)
26
(26,35)
(26,18)
(26,14)
(26,11)
(26,10)

That is, the before prints only once, and the print a only executes each time we get a new a. But the print (a, b) executes every time we get a new b.

@TysonMN
Copy link
Member

TysonMN commented Apr 1, 2022

assert :: Bool -> PropertyT m ()

This is your problem. I had the same problem in F# Hedgehog.

In F# Hedgehog, I think that assert function is essentially the same as Property.ofBool.

In my experience, a more conventional (non-property-based) test calls some assert function where failure is indicated by throwing an exception and passing is indicated by not throwing an exception. With F# Hedgehog, the user needs to write their test in this way in order to not execute their tests when rechecking is walking the shrink tree.

In the test I added along with the implementation of this feature in PR hedgehogqa/fsharp-hedgehog#336, I used this =! operator to throw an exception if its two arguments are not equal. That line of code is desugared from within an F# computational expression (the F# (near) equivalent of Haskell's do notation) named property to a call to Property.map via this code. That code is more complicated that one would initially expect because that code also switches from the Gen<> monad to the Property<> monad. The call to != (as well as all the code after the last line starting with let!) is contained within that function f. That function f is passed to Property.map, which executes f within this try-catch block. Any thrown exception is caught and mapped to a representation of a failed test. However, that function f is executed inside of another function g, and g is passed to GenLazy.map, which calls Gen.map and Lazy.map, that latter of which of course doesn't execute the function given to it but instead delays that computation by creating a new Lazy<> instance.

To summarize,

  1. that function f is not executed until the data representing the test outcome is accessed and
  2. in order to avoid executing f while walking the final shrink tree, it is necessary for the outcome representing test failure to be created when f throws an exception (which happens within Property.map) instead of by calling Property.ofBool.

...it is necessary...

Of course though, "necessary" is a strong word. Writing this is helping me to see how one might map a bool to the representation of a failed test. Having an instance of Property<bool> is too late, because Property<bool> is just a Gen<Lazy<Journal * Outcome<bool>>>, which already represents some partial test execution. So mapping that bool to another partial test execution then requires merging them into another partial test execution via Proeprty.bind.

Instead, having an instance of Gen<bool> is soon enough. That bool can be mapped to a (partial) test execution with the appropriate outcome (which has type Journal * Outcome<bool>) at the same time that this Gen<> monad is lifted to Gen<Lazy<>>. That is exactly what happens in that Property.BindReturn method.

But there is still room between those two cases. It should be possible to map Gen<Lazy<Journal * Outcome<bool>>> to Gen<Lazy<Journal * Outcome<unit>>> via a function that maps Journal * Outcome<bool> to Journal * Outcome<unit> and involves merging the partial test executions. I don't think F# Hedgehog has such a function. This makes me feel like we are missing some abstraction. We could give an alias like PartialTestExecution<'a> = Journal * Outcome<'a> and then define bind for it. The logic is essentially the same as here. I will definitely think more about this.

@ChickenProp
Copy link
Contributor Author

Ah, for some context here, I have no existing relationship with this repository, I've opened another PR and a couple of issues but I haven't yet had any code accepted.

I say this because, it sounds like you're suggesting a pretty fundamental overhaul of how hedgehog would be implemented? I confess I'd need to read in more detail to fully understand, I don't know F# myself and I haven't looked at how TestT is implemented in this repo. But I don't have any kind of qualification or authority to make such a change; and even if I did, I'd want to do so independently of this change.

@TysonMN
Copy link
Member

TysonMN commented Apr 4, 2022

...I have no existing relationship with this repository...
...
I say this because, it sounds like you're suggesting a pretty fundamental overhaul of how hedgehog would be implemented?

Yes, I noticed that. I think the Haskell Hedgehog maintainers are open minded, and I expect this change can be made with minimal to no "other" changes. In F# Hedgehog, I had to remove the ability for an instance of Property<bool> to be implicitly converted to an instance of Property<unit>. Instead, I added the function Property.failureOnFalse to do that mapping explicitly.

I confess I'd need to read in more detail to fully understand, I don't know F# myself...

It's ok. I similarly struggle to read Haskell, so we have a programming language barrier. However, we have in common a shared understanding of the domain, which is property-based testing. I have collaborated with the Haskell Hedgehog contributors before to help us implement the same feature in both repositories. I think this is one of the best parts of the Hedgehog community.

...I'd want to do so independently of this change.

I think it might be that the feature I implement in F# Hedgehog is strictly better than what you have implemented in this PR. However, I don't fully understand what you have implemented, so I might be wrong about that.

@ChickenProp
Copy link
Contributor Author

I think it might be that the feature I implement in F# Hedgehog is strictly better than what you have implemented in this PR. However, I don't fully understand what you have implemented, so I might be wrong about that.

So my understanding is that you're proposing something that would let us generate the shrink tree of a test without evaluating the test, or at least not any parts of the test that aren't required to generate it.

What I implement in this PR is a way for the user to tell hedgehog to navigate to a desired place in the shrink tree, to speed up the test-retest cycle.

How I think the two interact is:

  • The feature implemented in this PR works mostly fine without what you propose. What you propose would make it somewhat better; but there are probably other ways we could get those benefits, too.
  • Given what you propose, we don't get the feature this PR implements for free; we still need this PR or one similar.

So it seems to me that the two are fairly independent.

@TysonMN
Copy link
Member

TysonMN commented Apr 5, 2022

So my understanding is that you're proposing something that would let us generate the shrink tree of a test without evaluating the test, or at least not any parts of the test that aren't required to generate it.

When rechecking, yes. That is what I implemented in F# Hedgehog. When checking, of course we must execute the whole test to know if it passes or fails.

What I implement in this PR is a way for the user to tell hedgehog to navigate to a desired place in the shrink tree, to speed up the test-retest cycle.

Yes, that was what I thought. I think the feature I implemented in F# is strictly better. Without user involvement, the rechecking process only fully executes the test on the shrunken input. I expect that is the input a user would direct the code in your branch to execute.

  • Given what you propose, we don't get the feature this PR implements for free; we still need this PR or one similar.

So it seems to me that the two are fairly independent.

Am I wrong though? How would a user reasonably use the code in your branch other than directing it to the shrunken input?

@ChickenProp
Copy link
Contributor Author

So stepping back a bit, I think there are two questions here:

  1. Based on the types, is it possible to get a shrink tree for a test without executing it?
  2. Is there some way to navigate to a specific point on the shrink tree for a test?

This PR is only about the second question, making the answer be "yes" instead of "no". If you can't get a tree without executing the test, then when navigating, you'll execute the test repeatedly, and there's no way around that. But I really think that's a minor flaw.

I think the change you suggested was about the first question.

It sounds like you've implemented the feature in (2) for F#. And because the answer to (1) in F# is "yes", while it's "no" for Haskell, you've got a better implementation. So in that sense what you have in F# is better than what I have here.

But I don't think there's any way around that that doesn't involve making fairly deep changes to Hedgehog. Whether those changes would be good or bad, I think they should be considered entirely separately from this PR. I don't think they're needed for this PR to be useful, and if we make them, we'd still need something like this PR to get the feature I'm interested in.

@TysonMN
Copy link
Member

TysonMN commented Apr 6, 2022

Ok, I have a better understanding now.

Yes, I agree that making it possible for the user to specify a path through the shrink tree is necessary if one also wants to implement the optimized recheck feature of only fully executing the test on the shrunken input.

In F# Hedgehog, when a call to Property.check finds a input on which the test fails, then after shrinking, part of the output includes text like this:

This failure can be reproduced by running:
Property.recheck "0_16946079921875838235_15666946451763157333_10101010101010101010101010101010101010"

That string is parsed by this code. The first number is the size the second and third numbers are the Value and Gamma data of the Seed, and the last number is the ShrinkPath. Of course this path was created by the shrinking process, but the user could call Property.recheck and provide that string after modifying the path with their own choices of 1's and 0's. That sounds like what you have implemented in your branch.

One thing that is different though is how the user provides this data. Your OP says the data is provided via environment variables. Is that idiomatic for Haskell? What are the tradeoffs with providing this type of data as function arguments like I did in F# Hedgehog?

@ChickenProp
Copy link
Contributor Author

One thing that is different though is how the user provides this data. Your OP says the data is provided via environment variables. Is that idiomatic for Haskell? What are the tradeoffs with providing this type of data as function arguments like I did in F# Hedgehog?

It's not remotely idiomatic. I should probably have said in the OP, that's one of the things about this that makes it not ready to go in.

The reason I did it that way, is because I don't use check or recheck myself. I use https://github.com/parsonsmatt/hspec-hedgehog, which in turn uses the internal function checkReport. check and recheck ultimately call out to that too.

So I think the idiomatic thing to do here would be to add an argument to checkReport, expose it to the user through recheck, and then have hspec-hedgehog figure out how to also expose it in a way that works with hspec. That could be through environment variables, or maybe hspec would accept a PR to expose it through command line arguments.

I'm happy to do that work, but first I'd like the go-ahead from someone associated with this repo. If they're happy to merge the feature when it's ready, I can try to get it ready. If that's not going to happen, then what I have right now works for me.

That string is parsed by this code. The first number is the size the second and third numbers are the Value and Gamma data of the Seed, and the last number is the ShrinkPath. Of course this path was created by the shrinking process, but the user could call Property.recheck and provide that string after modifying the path with their own choices of 1's and 0's. That sounds like what you have implemented in your branch.

Yeah, Haskell hedgehog has a similar message, though with separate parameters for size and seed, and obviously no shrink path yet. I expect I'd add the shrink path as a third parameter under a Maybe, to allow both "use this size and seed, then shrink as normal" and "use this size and seed, jump straight to this point in the shrink tree and stop there".

@jacobstanley
Copy link
Member

jacobstanley commented May 22, 2022

I'm happy to do that work, but first I'd like the go-ahead from someone associated with this repo. If they're happy to merge the feature when it's ready, I can try to get it ready. If that's not going to happen, then what I have right now works for me.

Yes we definitely want this 🎉

I haven't caught up completely on the conversation so forgive me if anything was already discussed.

Environment Variables

No matter which way we go the environment variables are probably handy, we already have some related things so it wouldn't be unidiomatic for hedgehog.

Current set of env vars read by Hedgehog.Internal.Config:

  • HEDGEHOG_VERBOSITY
  • HEDGEHOG_WORKERS
  • HEDGEHOG_COLOR
  • HEDGEHOG_SEED

We'd probably want to combine it into a single var like with HEDGEHOG_SEED I think. You have a similar comment in the code for the function argument.

One weird thing is also that it really only applies to one test and we'd be setting it for all tests. We could perhaps include the test name and use * for the currently behaviour? Idk something to ponder. Running a single test would be a useful feature in its own right for the default hedgehog runner.

Workflow

My personal workflow is to run the tests :: IO Bool function for the module in ghci after a change. I have a vim binding which makes this a key combo.

I think for that style it makes sense to add a new entry to PropertyConfig and another withXXX combinator for skipping.

We currently have the following:

  • withTests
  • withDiscards
  • withShrinks
  • withRetries

Those are all intended to be permanent options so this would be kinda new in that regard, but I think it'd be quite usable.

It would mean we can change the failure error message to include code for pasting to reproduce the failure. That's a killer feature for sure.

Happily this would make it work with no changes to hspec-hedgehog but maybe it'd still be valuable for them to support some improved ergonomics around it. I don't use hspec so I'm not sure what's appropriate here.

@jacobstanley jacobstanley force-pushed the master branch 3 times, most recently from 4139585 to c228279 Compare May 22, 2022 14:42
@ChickenProp
Copy link
Contributor Author

Awesome! I'll try to work on this over the coming week or so.

Environment - sure, I'm happy to keep this as an option. If it's in one var, perhaps we want a type

data Skip = SkipToTest TestCount | SkipToShrink TestCount ShrinkPath

and a variable HEDGEHOG_SKIP. We want to distinguish between SkipToTest n and SkipToShrink n (ShrinkPath []), but we can put a : in the shrink path for that. So

# Skip to test 83, shrink from there.
# Currently HEDGEHOG_SKIP_TO_TEST=83
HEDGEHOG_SKIP=83

# Skip to test 83, don't shrink
# Currently HEDGEHOG_SKIP_TO_SHRINK=83
HEDGEHOG_SKIP=83:

# Skip to a specific shrink of test 83, don't shrink further
# Currently HEDGEHOG_SKIP_TO_SHRINK=83a4B
HEDGEHOG_SKIP=83:a4B

Workflow - So to check, you're thinking the "this failure can be reproduced by" would change from

recheck size seed <property>

to something like

recheck size seed $ withSkip "83:a4B" <property>

? I kinda think it feels more natural as an argument to recheck, so

recheck size seed (Just "83:a4B") <property>

But either can work, and I don't feel strongly.

(Either of these is kind of assuming an IsString instance for Skip, which is a bit dubious because it would have to error on some inputs. I think it's fine?)

(Perhaps one question to ask is, supposing that the runner does get a way to only run one test, where would that go? PropertyConfig feels off here. RunnerConfig isn't used much, but it kinda fits, and already has a seed and a size.)

It's true that if we put it in PropertyConfig, then (as long as the default is "take from environment") hspec-hedgehog shouldn't need any changes. If it becomes an extra argument to checkReport then hspec-hedgehog would need changes, since that's the function it uses. They'd be easy though, presumably hedgehog would have a resolveSkip function that checks the environment and hespec-hedgehog would use that.

I guess we need some way to distinguish "take the skip from the environment" from "don't skip", which might mean a Maybe (Maybe Skip). Or perhaps Skip gets a SkipNothing constructor.

I also removed the period from the end of the "shrink path:" line. At
least in my terminal, if I double click the path, I highlight and copy
the whole thing, including the trailing period. So it's slightly more
convenient to not have it there. (Though I can also double click the
path inside quotes. Maybe this line isn't useful at all now?)
Copy link
Contributor Author

@ChickenProp ChickenProp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few specific questions, but I think this is ready for review now.

Comment on lines 385 to 392
-- | The path taken to reach a shrink state.
--
-- This is in reverse order, most recent path element first, for efficient
-- appending.
--
newtype ShrinkPath =
ShrinkPath [Int]
deriving (Eq, Ord, Show, Lift)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being in reverse order is awkward, and having to know this fact while interfacing with it is awkward too. I think there are two independent questions here:

First, what should ShrinkPath wrap? Some options:

  • Dlist Int. Requires dlist as a dependency, but offers O(1) snoc.
  • [Int] -> [Int], which is what a DList Int is under the hood.
  • [Int] as currently, but keep it in order. snoc becomes O(n), which means constructing the full list is O(n^2). Maybe n is small enough that that doesn't matter? I've seen it be in the hundreds, so my guess is this is worth some complexity to avoid, but I don't have strong intuitions about that.
  • Keep as-is.

Second, how should it be exposed and interfaced with? Some options:

  • Expose constructor as currently.
  • Hide the constructor, offer functions like shrinkPathToList, shrinkPathFromList, shrinkPathSnoc.
  • Hide the constructor, put it in its own module. Name the functions simply toList, fromList, snoc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd give it a try with the [Int] in the right order as this is the config API.

It looks like the the snoc only occurs in takeSmallest so you could always switch to a reversed list only inside that function if you think it could be an issue.

In what situations do think people will interact with ShrinkPath? That will guide what we should expose. You can see TestLimit and the like do not have their constructors exposed.

I'm always cautious to expose anything that's not needed as I'm pretty strict with maintaining backwards compatibility and making upgrades easy on users. They always have the option to go to the internal modules for advanced use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I've put it in order and gone from three uses of reverse to just two, in takeSmallest. I agree that's better than before.

I wasn't actually thinking of exposing the constructor to users, I can't think of a reason they'd use it. I was more thinking of how other internal parts would interface with it.

Comment on lines 175 to 179
detectSkip :: MonadIO m => m Skip
detectSkip =
liftIO $ do
menv <- (skipDecompress =<<) <$> lookupEnv "HEDGEHOG_SKIP"
pure $ fromMaybe SkipNothing menv
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid values here are silently ignored. That's consistent with the others, but I kind of feel like an error would be friendlier?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah all of the env vars are a bit sketchy like that! so yeah add an error message to this one 👍

This whole module probably needs a bit of an overhaul so we just read and parse all the env vars once and report warnings/errors a single time.

I'm happy for you to just error out though for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to an error here.

Comment on lines 377 to 383
instance IsString Skip where
fromString s =
case skipDecompress s of
Nothing ->
error $ "Not a valid Skip: " ++ s
Just skip ->
skip
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this instance to suggest withSkip "3:aBc" to the user for recheck. It's maybe a bit overkill? And if the user doesn't have OverloadedStrings they're out of luck.

I don't want to add anything complicated to that line. withSkip (fromJust $ skipDecompress "3:aBc") feels bad for example. But perhaps an extra withSkipStr function would work, which could either error or default to SkipNothing if decoding fails. I prefer error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's all good, I can't see it causing any problems, just make sure the compiler output makes the problem obvious if you do give an invalid string. I suspect the fromJust would be way worse! haha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's a runtime error, not compile-time. (A compile-time IsString would be great.) But I've added fromString to the error message to make it a bit clearer where it's coming from.

Comment on lines +1247 to +1251
-- | Set the target that a property will skip to before it starts to run.
--
withSkip :: Skip -> Property -> Property
withSkip s =
mapConfig $ \config -> config { propertySkip = Just s }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to set this back to Nothing. I can't immediately think why that would be useful, but do we want it anyway? If we have withSkipStr then I think withSkip could reasonably take a Maybe Skip.

Copy link
Member

@jacobstanley jacobstanley Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah I think you just wouldn't include the withSkip line

@ChickenProp ChickenProp marked this pull request as ready for review May 31, 2022 15:39
@jacobstanley
Copy link
Member

jacobstanley commented Jun 2, 2022

? I kinda think it feels more natural as an argument to recheck

Yeah I mean the size+seed+skip do belong together. If we extended this it'd need to be a new function name to keep backwards compatibility.

I think a good workflow would be to paste something (i.e. withSkip) into the test code directly so that it can be re-run over and over. Whatever gets pasted is going to need to include the size/seed as well. In that workflow you wouldn't use recheck at all.

@ChickenProp
Copy link
Contributor Author

ChickenProp commented Jun 6, 2022

To clarify - I'm not sure if you're asking for me to add a new function similar to recheck but with an additional argument for the skip; or asking for a different workflow (I'm not sure what it would look like); or just sort of thinking out loud?

What I currently have replaces

recheck (size) (seed) <property>

with

recheck (size) (seed) $ withSkip "..." <property>

Though, uh, come to think of it... I don't know if I tested that, and it seems like it probably won't work?

IIUC, recheck works by taking the size of the failed test, and the seed used for the failed test, and using those as a starting point. (And if that test passes, it'll keep going. Potentially it could test at sizes that check never would. E.g. with NoConfidenceTermination 10, check will run 10 tests from sizes 0 to 9. If you use recheck starting with size 8, and it passes, you'll do sizes 8 to 17. I think? I missed that recheck uses withTests 1 to avoid that.)

But withSkip assumes that you're replaying given a fixed initial seed, and then the seed for the test you're replaying is found by splitting the RNG the same number of times as when we first ran the test. In my workflow, we get the initial seed from hspec's failure message and then use a --seed command line argument when replaying.

So if we use recheck with withSkip, we'll use a different starting point and test at a different size and seed, and won't replay the thing that failed.

I'm not sure the best way to reconcile these, so that both the hedgehog and hspec workflows can work.

I think the way I'd do this is to make the failure report print out the initial seed, not the final seed. (I guess the simple way to do this would involve changing failureSeed in FailureReport to reportSeed in Report.) That would look the same from the user's perspective. The seeds it gives would no longer work with recheck, but we'd no longer be suggesting recheck to users, so that seems fine. (Perhaps we deprecate and later remove it? I imagine it's mostly only used temporarily, doesn't get committed much.)

(Another benefit of that: I think the user could take the Seed given in the failure report, rerun tests with HEDGEHOG_SEED set to that, and see the same failure; which I think currently won't work.)

Makes it slightly clearer where it's coming from.
The previous message would suggest

    recheck (size) (seed) $ withSkip "(skip)" <property>

but this didn't work. `recheck` takes the size and seed of the failed
test, but `withSkip` assumes you're replaying from a given initial seed
at size 0.

We now suggest

    recheckAt (seed) "(skip)" <property>

where the seed is now the initial seed, not the seed that eventually
failed.

This also means that if you see that message, you can rerun with

    HEDGEHOG_SEED='...'

or

    HEDGEHOG_SEED='...' HEDGEHOG_SHRINK='...'

to reproduce the failure, instead of having to edit the source code.
Previously this wouldn't work; `HEDGEHOG_SEED` has to take the initial
seed, which was completely lost.

Since `FailureReport` no longer has a seed, it's now impossible to
generate a `recheck` line that will work, with or without `withSkip`.
But we no longer suggest `recheck` to users.
It was no longer used for anything.
@ChickenProp
Copy link
Contributor Author

ChickenProp commented Jun 7, 2022

I think the way I'd do this is to make the failure report print out the initial seed, not the final seed.

I've done this, using a new function recheckAt :: Seed -> Skip -> Property -> m (). I checked and it does allow the "rerun with HEDGEHOG_SEED" workflow.

I currently have no more changes planned, if you want to take another look. (The CI failure looks unrelated, to me.)

@ChickenProp
Copy link
Contributor Author

@jacobstanley gentle ping - have you managed to take another look yet?

Copy link
Member

@moodmosaic moodmosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @ChickenProp 👍 💯 🥇

@moodmosaic
Copy link
Member

Also, this PR brings us one step closer to @TysonMN's hedgehogqa/fsharp-hedgehog#336, wrt shrink trees.

Nothing ->
-- It's clearer for the user if we error out here, rather than
-- silently defaulting to SkipNothing.
error "HEDGEHOG_SKIP is not a valid Skip."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

failureSize :: !Size
, failureSeed :: !Seed
, failureShrinks :: !ShrinkCount
failureShrinks :: !ShrinkCount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is replacing failureSize and failureSeed with ShrinkPath backwards compatible? Happy to try this branch on a few fairly sized hedgehog test-suites to find out, but I thought I should ask anyways. 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps hspec-hedgehog and/or tasty-hedgehog rely on FailureReport(?)

I think if they do, it'll be probably the locationXyz fields, so this change isn't going to break anything. /cc @parsonsmatt @dalaing @gwils

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're only using failureLocation in hspec-hedgehog - this shouldn't break us. Thanks for tagging me in 😄

Copy link
Member

@moodmosaic moodmosaic Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parsonsmatt, cool!

OTOH, it looks like tasty-hedgehog might be affected, though. @gwils @dalaing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this will break us. I'm happy to change to the new type and revise affected older versions. No worries :)
Since there are a couple of libraries downstream though, could we consider perhaps un-"internal"-ing this in a future release? It would be nice (from my perspective at least) if this API were covered under the PVP version number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In F# Hedgehog, I first changed recheck to only take a string: the seed and size are encoded as a single string (by separating them with an underscore). Then I added shrinkPath to this encoding, which maintains binary compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so you can't derive them from it. Those precise data points are no longer available. FailureReport.failureSeed has been replaced with Report.reportSeed, but it's a different seed.

The idea is that instead of calling

recheck (failureSize failureReport) (failureSeed failureReport) prop

you'd call one of

recheckAt (reportSeed report) (SkipToTest $ reportTests report)
recheckAt (reportSeed report) (SkipToShrink (reportTests report) (failureShrinkPath failureReport))

The first mostly reproduces the current behaviour, with the advantage that the seed shown to the user can be passed into HEDGEHOG_SEED without changing the code at all. (It's not an exact reproduction, because if the given test case passes this time, the new version will check the remaining test cases, while the old version would only have run that single case.)

The second uses the new behavior, following the shrink path.


I see tasty-hedgehog doesn't use recheck, but it'll be the same principle. I think the thing to do will be

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @ChickenProp, and @TysonMN, for the input here. 👍

@gwils, if you can change to the new type and revise affected older versions, I can merge, and do a release on Hackage.

We can add FailureReport to the public API (to be covered by PVP). But in any case, I don't expect another breaking change in this type any time soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like our bounds are conservative enough already that you should be right to release. Go ahead!
I've created an issue on tasty-hedgehog to support the functionality here qfpl/tasty-hedgehog#61

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thank you @gwils!

@moodmosaic moodmosaic merged commit f670bdf into hedgehogqa:master Aug 21, 2022
@moodmosaic
Copy link
Member

Thank you, @ChickenProp! 👍🦔🐾

@moodmosaic
Copy link
Member

Available in hedgehog 1.2.

ChickenProp added a commit to ChickenProp/haskell-hedgehog that referenced this pull request May 25, 2023
In hedgehogqa#454 we introduced test skipping, with the idea that a failed test
could report a way for you to jump back to reproduce it without all the
preceding tests.

But it didn't work if any of the preceding tests had been discarded,
because each discard also changes the seed and the size. Users could
manually add the discard count to the test count in the `Skip`, but
that's no fun. Plus, it wouldn't work if the test count plus discard
count exceeded the test limit, because that would generate a success
without running any tests.

So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as
well as a `TestCount`. It's rendered in the compressed path as
`testCount/discardCount`, or just `testCount` if `discardCount` is 0.
The exact sequence of passing tests and discards doesn't affect the
final seed or size, so the counts are all we need.

This changes an exposed type, so PVP requires a major version bump.
ChickenProp added a commit to ChickenProp/haskell-hedgehog that referenced this pull request May 25, 2023
Closes hedgehogqa#487.

In hedgehogqa#454 we introduced test skipping, with the idea that a failed test
could report a way for you to jump back to reproduce it without all the
preceding tests.

But it didn't work if any of the preceding tests had been discarded,
because each discard also changes the seed and the size. Users could
manually add the discard count to the test count in the `Skip`, but
that's no fun. Plus, it wouldn't work if the test count plus discard
count exceeded the test limit, because that would generate a success
without running any tests.

So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as
well as a `TestCount`. It's rendered in the compressed path as
`testCount/discardCount`, or just `testCount` if `discardCount` is 0.
The exact sequence of passing tests and discards doesn't affect the
final seed or size, so the counts are all we need.

This changes an exposed type, so PVP requires a major version bump.
ChickenProp added a commit to ChickenProp/haskell-hedgehog that referenced this pull request Jul 26, 2023
Closes hedgehogqa#487.

In hedgehogqa#454 we introduced test skipping, with the idea that a failed test
could report a way for you to jump back to reproduce it without all the
preceding tests.

But it didn't work if any of the preceding tests had been discarded,
because each discard also changes the seed and the size. Users could
manually add the discard count to the test count in the `Skip`, but
that's no fun. Plus, it wouldn't work if the test count plus discard
count exceeded the test limit, because that would generate a success
without running any tests.

So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as
well as a `TestCount`. It's rendered in the compressed path as
`testCount/discardCount`, or just `testCount` if `discardCount` is 0.
The exact sequence of passing tests and discards doesn't affect the
final seed or size, so the counts are all we need.

This changes an exposed type, so PVP requires a major version bump.
moodmosaic pushed a commit that referenced this pull request Aug 1, 2023
Closes #487.

In #454 we introduced test skipping, with the idea that a failed test
could report a way for you to jump back to reproduce it without all the
preceding tests.

But it didn't work if any of the preceding tests had been discarded,
because each discard also changes the seed and the size. Users could
manually add the discard count to the test count in the `Skip`, but
that's no fun. Plus, it wouldn't work if the test count plus discard
count exceeded the test limit, because that would generate a success
without running any tests.

So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as
well as a `TestCount`. It's rendered in the compressed path as
`testCount/discardCount`, or just `testCount` if `discardCount` is 0.
The exact sequence of passing tests and discards doesn't affect the
final seed or size, so the counts are all we need.

This changes an exposed type, so PVP requires a major version bump.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants