Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - feat: tactic frontend for slim_check #3114

Closed
wants to merge 5 commits into from

Conversation

semorrison
Copy link
Contributor

Adds a tactic front end for slim_check, and provides commands #test and #sample.
Updates all the mathlib3 tests, although many are broken.

There are still some missing parts of Mathlib.Testing.SlimCheck.Sampleable: in particular we can't currently sample from lists, and this explains most of the broken tests.


Open in Gitpod

@semorrison
Copy link
Contributor Author

Closes #1103.

@hrmacbeth hrmacbeth linked an issue Mar 27, 2023 that may be closed by this pull request
@semorrison semorrison added awaiting-review t-meta Tactics, attributes or user commands labels Mar 27, 2023
@hrmacbeth hrmacbeth requested review from robertylewis and removed request for hrmacbeth April 18, 2023 14:22
@hrmacbeth
Copy link
Member

I don't actually know anything about slim_check (though now I'm excited to try it) -- the request on the teaching list had been passed on from @robertylewis so I'm requesting his review instead.

Copy link
Collaborator

@hargoniX hargoniX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry the review request slipped past me :/

I think the change looks good. Leo has also told me that
the wider verification community is very interested in counter
example generation so this is certainly a good step towards it.

Regarding the teaching purposes I will write down a basic explanation
of what it does:

  1. You give it some proof goal or proposition
  2. It tries to generate random examples for inputs of that proposition
  3. Checks if they hold, if a certain amount does it will abort and be happy, if they do not it will instead try to minimize the counter example by a certain shrinking strategy

This general technique is called property testing, it was popularized by a Haskell library called quickcheck if you want to look more into it

So this might be pretty useful for students to figure out they are trying to prove False early on. Note that slim_check in its current state is pretty bare bones though. It barely has instances for the sampling values of types. I think in order to actually use it on a larger scale we would definitely be interested in a deriving handler and in general more base instances.

@eric-wieser
Copy link
Member

in particular we can't currently sample from lists, and this explains most of the broken tests.

Can't the mathlib3 instances be forward-ported here?

@hargoniX
Copy link
Collaborator

hargoniX commented May 5, 2023

It can in principle but the mathlib3 code here looks vastly different from what you would write in Lean 4. I remember trying for like an hour to get it to work before eventually giving up and just PRing what I already head.

@semorrison
Copy link
Contributor Author

I propose merging as is. Instance and derive handlers can be added as there is energy to do so, but likely won't happen if people can't see that slim_check is available in the first place.

(e.g. it might have been useful already here).

@hrmacbeth
Copy link
Member

hrmacbeth commented May 6, 2023

I just tried this out for the first time, it's very cool! Here's a counterexample it couldn't find:

example (a b c d : ℤ) (h1 : a ≤ b) (h2 : d ≤ c) (h4 : 0 ≤ a) (h3 : 0 ≤ d) : a / c ≤ b / d := by
  slim_check -- 2 3 1 0 is a counterexample

Is this to be expected? (eg is Int not sampleable yet?)

Anyway, I'm on board with merging now.

@semorrison
Copy link
Contributor Author

example (a b c d : ℤ) (h1 : a ≤ b) (h2 : d ≤ c) (h4 : 0 ≤ a) (h3 : 0 ≤ d) : a / c ≤ b / d := by
  slim_check -- 2 3 1 0 is a counterexample

I'm actually finding that slim_check finds a counterexample here about half the time, and the other half it prints a message about giving up after ~80 tries.

Using slim_check (config := {numRetries := 100}) for various values of 100 does work better. I guess it would be possible to have slim_check watch the maxHeartbeat clock, and keep retrying while it has time...? Not sure if this belongs here, however.

@hargoniX
Copy link
Collaborator

hargoniX commented May 7, 2023

example (a b c d : ℤ) (h1 : a ≤ b) (h2 : d ≤ c) (h4 : 0 ≤ a) (h3 : 0 ≤ d) : a / c ≤ b / d := by
  slim_check -- 2 3 1 0 is a counterexample

The integers are very much sampleable, the issue is just that slim_check is throwing
randomized values up to a certain (configurable) size (default value 100) at the formula.
That is, unlike something like an SMT solver there is not any intelligence behind
the way that we choose values to test here.

So it might very well be that as soon as the amount of variables rises and there are
many combinations that fulfill the property it does not always find an issue. There
are two ways to try and address this. One is if you are only looking for small counter
examples you can reduce the maximum size for which we search:

example (a b c d : ℤ) (h1 : a ≤ b) (h2 : d ≤ c) (h4 : 0 ≤ a) (h3 : 0 ≤ d) : a / c ≤ b / d := by
  slim_check (config := { maxSize := 10 })

This will force it to only explore a smaller space but with the same amount of tries so
the probability of finding a counter example is higher, in order to increase this probability
even more you can also increase the amount of tries:

example (a b c d : ℤ) (h1 : a ≤ b) (h2 : d ≤ c) (h4 : 0 ≤ a) (h3 : 0 ≤ d) : a / c ≤ b / d := by
  slim_check (config := { maxSize := 10, numInst := 1000 })

Which is now basically guaranteed to find a counter example

Copy link
Member

@jcommelin jcommelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hargoniX Thanks for the review! 🎉

bors merge

@semorrison semorrison added ready-to-merge This PR has been sent to bors. and removed awaiting-review labels May 10, 2023
bors bot pushed a commit that referenced this pull request May 10, 2023
Adds a tactic front end for `slim_check`, and provides commands `#test` and `#sample`.
Updates all the mathlib3 tests, although many are broken.

There are still some missing parts of `Mathlib.Testing.SlimCheck.Sampleable`: in particular we can't currently sample from lists, and this explains most of the broken tests.





Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
Co-authored-by: Scott Morrison <scott.morrison@anu.edu.au>
@bors
Copy link

bors bot commented May 10, 2023

Pull request successfully merged into master.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title feat: tactic frontend for slim_check [Merged by Bors] - feat: tactic frontend for slim_check May 10, 2023
@bors bors bot closed this May 10, 2023
@bors bors bot deleted the slim_check_frontend branch May 10, 2023 06:33
bors bot pushed a commit that referenced this pull request May 10, 2023
Per [zulip](https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/slim_check.20is.20broken/near/356447820).

- [x] depends on: #3114

[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/from-referrer/)


Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
Co-authored-by: Scott Morrison <scott.morrison@anu.edu.au>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has been sent to bors. t-meta Tactics, attributes or user commands
Projects
None yet
Development

Successfully merging this pull request may close these issues.

slim_check frontend
5 participants