Introduce time-bounded execution #402

Open
MaximilianAlgehed opened this issue May 14, 2024 · 7 comments
Comments

@MaximilianAlgehed
Collaborator

Right now the only convenient way to control how long a test runs in QuickCheck is by modifying maxSuccess in one way or another. However, this is a bit inconvenient when setting up test suites (e.g. hspec) whose properties vary a lot in execution time.

It would be nice to have a flag (e.g. maxTestTime) that tells QuickCheck "please keep testing for this long" to avoid having to set different maxSuccess for different tests (and, more importantly, having to go through the slow and boring process of measuring execution time to figure out what it should be for each individual test!).

There are a couple of questions that need to be resolved when designing this feature, however:

  • How should it interact with withMaxSuccess?
    • The name maxSuccess aside, the current semantics of withMaxSuccess is that other mechanisms (e.g. checkCoverage) can make a property run for more tests than maxSuccess. In keeping with this, it would probably be correct to keep the same semantics here: run for the maximum of the allocated time and the given maxSuccess.
  • How should it interact with checkCoverage?
    • I think the right solution is probably to do what we do with maxSuccess and keep going even if the coverage checker is happy.
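
For concreteness, here is a rough workaround sketch of the intended behaviour using only existing APIs (the quickCheckFor helper, the batch size, and the 2-second budget are mine, not a proposed QuickCheck interface): run batches of tests until the wall-clock budget is spent or a counterexample is found.

```haskell
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import Test.QuickCheck
import Test.QuickCheck.Test (isSuccess)

-- Keep running batches of tests until the budget is spent or a test fails.
-- This lives in the test suite, not in QuickCheck, so depending on 'time'
-- here is unproblematic.
quickCheckFor :: Testable prop => NominalDiffTime -> prop -> IO ()
quickCheckFor budget prop = do
  start <- getCurrentTime
  let loop = do
        res <- quickCheckWithResult stdArgs { maxSuccess = 100, chatty = False } prop
        now <- getCurrentTime
        if isSuccess res && diffUTCTime now start < budget
          then loop
          else putStrLn (output res)
  loop

main :: IO ()
main = quickCheckFor 2 (\xs -> reverse (reverse xs) == (xs :: [Int]))
```

A built-in maxTestTime flag would make this kind of wrapper unnecessary and would let the time budget interact properly with shrinking, coverage checking and reporting.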
@Bodigrim
Contributor

As a maintainer of tasty, I would generally prefer for QuickCheck to focus on the "pure" part of property testing and leave dirty IO to a test framework. Measuring time is not that meaningful unless you know and manage the wider landscape, e.g., the concurrency level.

I wonder if it could be possible to share a mutable cell (say IORef) between a test framework and QuickCheck, so that a test framework could tell QuickCheck to stop by flipping a flag and QuickCheck could report progress (number of tests executed, etc.).
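
A minimal sketch of that shared-cell protocol, under my own assumptions about what the hook could look like (nothing below is existing QuickCheck API): the framework owns the IORef, a timer thread flips it, and the test driver polls it between test cases.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Data.IORef (newIORef, readIORef, writeIORef)

main :: IO ()
main = do
  stop <- newIORef False                                      -- owned by the test framework
  _ <- forkIO (threadDelay 2000000 >> writeIORef stop True)   -- 2 s budget
  -- Stand-in for QuickCheck's test driver: check the flag between test
  -- cases and report progress when told to stop.
  let testLoop n = do
        shouldStop <- readIORef stop
        if shouldStop
          then putStrLn ("stopped after " ++ show n ++ " tests")
          else do
            threadDelay 1000                                  -- pretend to run one test case
            testLoop (n + 1)
  testLoop (0 :: Int)
```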

@MaximilianAlgehed
Collaborator Author

@Bodigrim there is really nothing stopping us doing both!

There is at least one other issue (#399) talking about what interface we want for testing frameworks. I think it would be wise for you, @sol, and us to have a discussion (perhaps in that issue?) where we can hash out the details of how we want such an interface to be designed?

@Bodigrim
Contributor

> @Bodigrim there is really nothing stopping us doing both!

You are the boss here :) Although, mind you, it's important that QuickCheck does not gain any more dependencies, even ones such as time, because that would cause circular dependencies when using QuickCheck to test time and its upstream dependencies. The life of maintainers of boot packages is difficult enough already. You can probably get away with System.CPUTime.
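
For reference, a small sketch of budget tracking with only base (the example workload and the millisecond conversion are mine): System.CPUTime.getCPUTime reports CPU time in picoseconds, so no dependency on time is needed, with the caveat that CPU time and wall-clock time can diverge, e.g. for properties that sleep or block.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

main :: IO ()
main = do
  start <- getCPUTime
  _ <- evaluate (sum [1 .. 1000000 :: Int])   -- stand-in for running one test case
  end <- getCPUTime
  -- getCPUTime is in base and reports picoseconds; 1 ms = 10^9 ps.
  let elapsedMs = fromIntegral (end - start) / 1e9 :: Double
  putStrLn ("elapsed CPU time: " ++ show elapsedMs ++ " ms")
```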

> I think it would be wise for you, @sol, and us to have a discussion (perhaps in that issue?) where we can hash out the details of how we want such an interface to be designed?

Sorry, I don't have much bandwidth these days to do open-ended design work myself, but feel free to ping me if you'd like to hear my opinion on anything specific.

@sol
Contributor

sol commented May 23, 2024

@MaximilianAlgehed so the idea is to have a time budget for each individual test case, right? QuickCheck would then use up the whole time budget. As a consequence, if you set your time budget to 100 ms and you have 100 test cases, then your test suite would need 10 seconds to finish.

The issue I see here is that this approach may very well just increase the runtime of your test suite, instead of limiting it. Even if you reduce the time budget to just 10 ms, your test suite would still run for one whole second.

I don't see myself using this.

What I generally try to do:

  1. Keep the runtime of my test suite < 2s. I want to trigger a test run on every modification, which isn't practical if tests take too long (though admittedly, for huge code bases, you will likely need to focus on a subtree of the test suite to be able to trigger on each modification).
  2. Use a healthy mix of properties and unit tests. If three unit tests can achieve the same as one property, but in a fraction of the time, then that may be a worthwhile trade-off.
  3. Use large values for maxSuccess more as an exploratory tool, rather than as an integral part of my testing strategy.

NB: As for properties where 100 repetitions take more than a couple of milliseconds to complete, this may just indicate that there is something wrong with my code (I probably want to run this in production more than 100 times without using an exorbitant amount of resources, right?), or maybe there is something wrong with my testing strategy. To get the maximum benefit from test-driven development, my test suite needs to run fast. While in general I want full test coverage, it doesn't buy me anything if it slows the whole team down. So the challenge here is really: how can I ensure reasonable test coverage without tests that run for tens of seconds or CI jobs that run for hours?

@MaximilianAlgehed
Collaborator Author

@sol the feature you describe already exists in within. What I'm proposing is a feature that says "keep generating test cases for this property until X seconds have passed". This feature is used heavily in Erlang QuickCheck because it gives you a neat hook for spreading the actual resource, time, across tests without going through the pain of translating time into a number of tests to run.
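
For contrast, here is a small sketch of within, which is existing QuickCheck API (the property and the 100 ms figure are only an illustration): within bounds a single test case and fails the property if that one case exceeds the given number of microseconds, which is a different thing from spreading a total time budget across many test cases.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- 'within' puts a per-test-case limit (in microseconds) on one run of the
-- property; it does not distribute a total time budget over the whole run.
prop_sortWithinBudget :: [Int] -> Property
prop_sortWithinBudget xs = within 100000 (sort (sort xs) == sort xs)

main :: IO ()
main = quickCheck prop_sortWithinBudget
```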

As for your strategy of only running for a few seconds: I personally think it is misguided as a general rule, but we should support it, while also supporting users who want to run for hours (which I also think is misguided) and anything in between.

Having this feature means one can support your strategy while also supporting e.g. nightly tests that can afford to run for longer than 2 seconds.

@sol
Contributor

sol commented May 23, 2024

> @Bodigrim there is really nothing stopping us doing both!

As I said before, I don't see myself using it. There already is within, which I think can be used to guard against pathological properties.

But if it gets added anyway, that's fine with me too.

> There is at least one other issue (#399) talking about what interface we want for testing frameworks. I think it would be wise for you, @sol, and us to have a discussion (perhaps in that issue?) where we can hash out the details of how we want such an interface to be designed?

Hspec mostly achieves what it needs to and I don't have too many free cycles to burn on "making things more beautiful".

If there's something that makes the hspec code "cleaner" then that's great. I still put more value on API stability, so if something happens here, I would prefer a plan that is backwards compatible.

@MaximilianAlgehed
Collaborator Author

> As I said before, I don't see myself using it. There already is within, which I think can be used to guard against pathological properties.

I repeat: this is not about dealing with pathological properties; it's about adding a convenient way of distributing resources between tests.
