Proof checks should happen after every tactic #7
Comments
I think having proof checks only at the end should be OK as long as we ensure that proofs can be replayed interactively, and that we filter training data to only include valid trajectories: the same data will be shown to the policy for training, regardless of whether the filtering happens eagerly or at the end.
I think your point @jesse-michael-han is that as long as good data is used for training, then these proof-check errors are really not going to come up and one doesn't need to over-engineer them. I sort of agree that this isn't a huge deal, essentially for that reason. However, there are still a few points of concern.

First, this project doesn't provide any training data. That comes from another project, and it is a bit dangerous to assume the two will stay in sync, i.e. that the other project will only ever produce valid proofs. Further, Lean proofs are not linear, and for other reasons as well they are not easily replayable in lean-gym, so it doesn't make sense to use lean-gym to filter the training data down to valid proofs. (If you doubt me, I challenge you to try validating the data without throwing away a large amount of perfectly good and helpful training data.)
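As one minimal (invented) illustration of the non-linearity point: combinators like `;` act on several goals at once, so a human-written script does not decompose into the one-tactic-per-goal-state steps that a lean-gym trajectory consists of.

```lean
-- Invented example: a single source-level step that touches two goals.
-- Replaying this as per-goal lean-gym steps requires splitting it up by hand.
example : (1 = 1) ∧ (2 = 2) :=
begin
  split; refl   -- `split` makes two goals; `refl` is then run on each of them
end
```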
While I think it doesn't make sense to filter the training data in lean-gym, I suppose one could filter the proofs a bit more upstream, in the project that generates them.
Also, let me better explain my worry. Suppose my AI comes up with the following trajectory for this (obviously contrived) example:
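A minimal sketch of the kind of trajectory meant here (the goal and tactic steps are invented for illustration, assuming Lean 3 syntax and the current check-at-the-end behaviour):

```lean
-- Invented example: an early bad step, followed by a legitimate final step.
example : (1 = 1) ∧ true :=
begin
  split,
  -- Step 1: closes the first goal. If the sorry/undefined checks only run
  -- at the end of the proof, this step is reported back as a success.
  { exact sorry },
  -- Step 2: the remaining goal is `true`, which `trivial` proves. Only now
  -- does the end-of-proof validation run, so the `sorry` from step 1
  -- surfaces as an error on this last, perfectly good step.
  { trivial }
end
```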
Everything will succeed until the final `trivial`. That line will error, even though the goal at that point is trivially provable. Now, if a user knows what they are doing (and that is a big if), they will encounter that particular error and say "I can't trust anything in the whole trajectory". The best course of action is probably to just ignore this theorem entirely. Now, that might be a bit extreme, but not by much.

If one is doing a non-heuristic search like BFS, then eventually one may come to a proof that avoids the offending step. If, like Metamath-GPTf, one uses both successful and failing trajectories for training, then this proof may lead to bad training data: since this trajectory doesn't work, one might train the AI to think that `trivial` fails on a goal it would actually close.

Obviously, my example is contrived and you can decide if I'm just being alarmist. If one can really make sure that only valid trajectories end up in the training data, then maybe this doesn't matter.
One more comment. Obviously, upstream actions affect downstream rewards; this is a common problem in RL, the credit assignment problem. Nonetheless, there is an assumption here, at least within a fixed environment, that how a tactic behaves should only depend on the (text presentation of the) current goal state. So the same tactic applied to the same goal state should succeed or fail in the same way, no matter how that state was reached.
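A rough formalization (notation invented for illustration): for any tactic $t$, goal state $s$, and histories $h, h'$ that both reach $s$,

$$\mathrm{Result}(t \mid s, h) = \mathrm{Result}(t \mid s, h') = \mathrm{Result}(t \mid s),$$

i.e. the outcome of applying a tactic is a function of the current goal state alone.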
I presume you're proposing to simply check for sorries at each step? There's not much more we can do at each step, is there? The check for `undefined` and all the other more advanced checks rely on the final proof-term validation.
Hi, are there any updates? Thanks
Right now it seems that all proof checks are at the end. This violates the Markovian assumption of lean-gym, since a bad tactic upstream can lead to a failure at the final tactic step. Most of the current checks, for `sorry`, `undefined`, and maybe some of the others, are valid checks to do after every proof step. The only one I know for sure won't work is the metavariable check; that isn't supposed to hold until a proof is complete.
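For instance, a minimal invented illustration (Lean 3 syntax; the partial proof terms in the comments are approximate):

```lean
example : true ∧ true :=
begin
  split,    -- partial proof term ≈ and.intro ?m_1 ?m_2 : the open goals are
            -- metavariables, which is expected in the middle of a proof
  trivial,  -- partial proof term ≈ and.intro trivial ?m_2
  trivial   -- final term and.intro trivial trivial : only now should a
            -- "no metavariables left" check be expected to pass
end
```

So a per-step metavariable check would reject every unfinished proof, whereas per-step `sorry`/`undefined` checks would not.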