Replies: 3 comments 6 replies
-
For Lean 3, I don't think there is an easy way to do that, as the For Lean 4, it should be a fairly straightforward change to ExtractData.lean. Currently, in |
Beta Was this translation helpful? Give feedback.
-
Hi @antonkov I made a fork of the LeanDojo example repo here with some basic examples: https://github.com/AG161/lean4-example/blob/main/Lean4Example.lean
In the first theorem, the
The second theorem returns this:
Here the tactic combinator <;> has Lean first apply
This theorem returns the following tactic tree:
The way the A final example (maybe @yangky11 can take a look at this one example, I found the result perplexing and I'm not sure if this is intended):
Here the combined tactic continuously goes through the listed tactics until there are no goals left and applies the first one which works, then starts over. For this one
The first tactic in the list looks like what I expected the entire thing to look like - the combined tactic solves the entire goal. However, there are several more tactics that appear in the list which have no goals before and after. At a certain point in the list, it looks like it is tracking which tactics were selected out of the combined tactic (where it says In principle, it might not be reasonable to get the "true" AST from any proof because someone could define their own arbitrarily complex tactics. But in practice, for most of the things in mathlib4 I imagine just decomposing the basic tactic combinators Lean provides would allow us to see the "largest" and most informative AST of the theorem. Anyway, that's what I was thinking about. |
Beta Was this translation helpful? Give feedback.
-
Thanks @AG161, the example
will be parsed correctly by https://github.com/Paper-Proof/paper-proof parser (please take a look if you're curious) but Paper Proof somehow chokes on the second example too - I'll take a look to see what's the bug Paper proof parser works more in a way you expect/describe and parses elaboration tree, (Lean InfoTree). LeanDojo parser parses InfoTree for tactic nodes too but tries to overlay InfoTree tactics over initial Syntax (therefore Regarding the The PR for rewrite's is ready - I can take a look at other tactic combinators @yangky11 let me know what you think on the format for trace.xml |
Beta Was this translation helpful? Give feedback.
-
I'm trying to use LeanDojo to extract the ASTs of mathlib theorems for LLM training data, and I have a concern about combined tactics obfuscating the full AST of a proof. E.g. the
get_single_tactic_proof
in https://github.com/lean-dojo/LeanDojo/blob/main/src/lean_dojo/data_extraction/traced_data.py can combine a whole proof into a single tactic. In this case, the before and after state of the tactic is just the goal and then no goals. That's an extreme case but tactic combinators are quite common, particularly sequences of rewrites. What I'd like to be able to do is get the before and after state for each tactic that comprises some larger combined tactic. Essentially the opposite ofget_single_tactic_proof
in that it decomposes the entire proof to its smallest tactic components, giving me the "biggest" AST of the theorem.Anyway I just wanted to ask how involved would something like that be, given the way LeanDojo works? Would it be some minor modifications of https://github.com/lean-dojo/LeanDojo/blob/main/src/lean_dojo/data_extraction/ExtractData.lean or would it require a major rework? I'm only asking because I'm not really familiar with metaprogramming in Lean. Thanks!
Beta Was this translation helpful? Give feedback.
All reactions