Execution errors #50

josojo · 2023-09-01T06:31:21Z

josojo
Sep 1, 2023

I really like this project :)

The paper states that the LeanDojo environment does not always evaluate the state correctly. In 2.1% of the cases, it reports an invalid state transition even though the transition is valid.

Why is this the case? Can we help to fix it?

Side question:
I saw the comparison to lean-gym (https://github.com/openai/lean-gym/tree/main). But OpenAI uses the following tool in its newer publications. Has anyone done a comparison to that one:
https://github.com/jesse-michael-han/lean-tpe-public/tree/master

I wanna help build a good foundation for theorem proving, but I don't wanna work on it if others have fixed it already.

yangky11 · 2023-09-02T03:45:36Z

yangky11
Sep 2, 2023
Maintainer

Hi,

The errors fall into a few categories, and we documented a few examples from each category in tests/interaction/test_unexpected_errors.py. You're welcome to try to fix them. However, the 2.1% is from our analysis on Lean 3. The community is moving to Lean 4, and it might be more valuable to perform a similar analysis with Lean 4 and fix any problems there.

I believe lean-tpe-public was developed around the same time as lean-gym, and they actually share some code.

0 replies

josojo · 2023-09-07T06:19:22Z

josojo
Sep 7, 2023
Author

Thanks for the answer. I am happy to try to do the evaluation for Lean 4.

Here is one of my scripts: josojo#1.
It works nicely for your example repo. The only issue is that it is soooo slow, even with the cached repo. I am wondering whether there are smart ways to improve the performance.

Here are the logs from my execution that took over 4 minutes on my moderate computer:

```
python scripts/evaluate_lean_4_interaction.py
2023-09-07 20:14:52.900 | DEBUG | lean_dojo.constants::59 - Using GitHub personal access token for authentication
2023-09-07 20:14:59.976 | INFO | __main__:main:61 - Namespace()
2023-09-07 20:15:02.618 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:139 - The traced repo is available in the cache.
2023-09-07 20:15:02.618 | INFO | lean_dojo.data_extraction.trace:trace:163 - Loading the traced repo from /Users/josojo/.cache/lean_dojo_manual/yangky11-lean4-example-7d711f6da4584ecb7d4f057715e1f72ba175c910/lean4-example
2023-09-07 20:15:03.372 | DEBUG | lean_dojo.data_extraction.traced_data:load_from_disk:1470 - Loading 621 traced XML files from /Users/josojo/.cache/lean_dojo_manual/yangky11-lean4-example-7d711f6da4584ecb7d4f057715e1f72ba175c910/lean4-example with 9 workers
2023-09-07 20:15:06,486 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
0%| | 0/621 [00:00:59 - Using GitHub personal access token for authentication
100%|██████████| 621/621 [00:26<00:00, 23.15it/s]
(pid=68113) 2023-09-07 20:15:08.000 | DEBUG | lean_dojo.constants::59 - Using GitHub personal access token for authentication [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
2023-09-07 20:15:35.521 | DEBUG | lean_dojo.data_extraction.lean:_get_lean4_dependencies:522 - Querying the dependencies of LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910')
2023-09-07 20:15:38.093 | DEBUG | lean_dojo.data_extraction.traced_data:check_sanity:1334 - Checking the sanity of TracedRepo(repo=LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910'), dependencies={'lean4': LeanGitRepo(url='https://github.com/leanprover/lean4', commit='13ca443f058b0e33e44a9fdba347a53330463348')}, root_dir=PosixPath('/Users/josojo/.cache/lean_dojo_manual/yangky11-lean4-example-7d711f6da4584ecb7d4f057715e1f72ba175c910/lean4-example'))
2023-09-07 20:15:38.799 | INFO | __main__:main:71 - Loading the theorems
2023-09-07 20:15:39.389 | INFO | __main__:main:72 - number of theorems in repo
0%| | 0/721 [00:00:59 - Using GitHub personal access token for authentication
(RayHelper pid=68489) 2023-09-07 20:15:49.909 | INFO | __main__:_validate_ground_truth:27 - {'theorem': Theorem(repo=LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910'), file_path=PosixPath('Lean4Example.lean'), full_name='hello_world'), 'proof': 'rw [add_assoc, add_comm b, ←add_assoc]'}
(RayHelper pid=68489) 2023-09-07 20:15:49.909 | DEBUG | lean_dojo.interaction.dojo:__enter__:193 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910'), file_path=PosixPath('Lean4Example.lean'), full_name='hello_world')
(RayHelper pid=68489) 2023-09-07 20:15:49.910 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:139 - The traced repo is available in the cache.
(pid=68489) 2023-09-07 20:15:43.785 | DEBUG | lean_dojo.constants::59 - Using GitHub personal access token for authentication [repeated 8x across cluster]
(RayHelper pid=68489) 2023-09-07 20:15:51.379 | DEBUG | lean_dojo.interaction.dojo:_modify_file:402 - Modifying Lean4Example.lean
(RayHelper pid=68489) 2023-09-07 20:15:51.380 | DEBUG | lean_dojo.interaction.dojo:__enter__:227 - Launching the proof using
(RayHelper pid=68489) 2023-09-07 20:15:51.380 | DEBUG | lean_dojo.container:run:305 - docker run --cidfile vry0w7iu.cid --rm -u 501 --mount type=bind,src="/private/var/folders/3q/y2wz8sb93q591g0ql084rngm0000gn/T/tmpbiwwad63/lean4-example",target="/workspace/lean4-example" --workdir /workspace/lean4-example yangky11/lean-dojo lake build Lean4Repl
(RayHelper pid=68489) 2023-09-07 20:18:12.910 | DEBUG | lean_dojo.container:run_interactive:341 - docker run --cidfile aljfo4fj.cid --rm -u 501 --mount type=bind,src="/private/var/folders/3q/y2wz8sb93q591g0ql084rngm0000gn/T/tmpbiwwad63/lean4-example",target="/workspace/lean4-example" --cpus 1 --memory 16g --workdir /workspace/lean4-example -i yangky11/lean-dojo lake env lean Lean4Example.lean
(RayHelper pid=68489) 2023-09-07 20:18:13.185 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: syncing channel updates for 'nightly'
(RayHelper pid=68489) 2023-09-07 20:18:20.316 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: latest update on nightly, lean version nightly-2023-09-07
(RayHelper pid=68489) 2023-09-07 20:18:20.317 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: downloading component 'lean'
(RayHelper pid=68489) 2023-09-07 20:19:53.048 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: installing component 'lean'
(RayHelper pid=68489) 2023-09-07 20:20:02.984 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - warning: improperly formatted manifest: incompatible manifest version `4`
(RayHelper pid=68489) 2023-09-07 20:20:04.516 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - REPL> {"tacticState": "a b c : Nat\n⊢ a + b + c = a + c + b", "sid": 0, "error": null}
(RayHelper pid=68489) 2023-09-07 20:20:04.517 | DEBUG | lean_dojo.interaction.dojo:_submit_request:551 - Request: {"sid": 0, "cmd": "rw [add_assoc, add_comm b, \u2190add_assoc]"}
(RayHelper pid=68489) 2023-09-07 20:20:04.598 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - REPL> {"tacticState": "no goals", "sid": 1, "error": null}
(RayHelper pid=68489) 2023-09-07 20:20:04.598 | DEBUG | lean_dojo.interaction.dojo:_submit_request:551 - Request: exit
(RayHelper pid=68489) 2023-09-07 20:20:05.298 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 -
(RayHelper pid=68489) 2023-09-07 20:20:05.298 | DEBUG | lean_dojo.interaction.dojo:_cleanup:331 - Cleaning up.
(RayHelper pid=68489) 2023-09-07 20:20:05.298 | DEBUG | lean_dojo.interaction.dojo:_cleanup_container:341 - Cleaning up the container.
(RayHelper pid=68489) 2023-09-07 20:20:05.344 | DEBUG | lean_dojo.interaction.dojo:_cleanup_tmp_dir:354 - Cleaning up the temporary directory.
2023-09-07 20:20:05.499 | INFO | __main__:main:120 - LeanDojo: 1/1
```

It looks like most of the time is lost during `lake build` of the new repo in the Docker image. If we also copied the cached build folder into the Docker image, `lake build` could use the lake cache and would need only a fraction of the time. Wdyt? Is this possible?

Have you seen other tools like this one: https://github.com/leanprover-community/repl? Maybe it provides a better gym environment for Lean 4, or do you think one would ultimately run into the same issues as with lean-gym for Lean 3?

0 replies

yangky11 · 2023-09-09T22:59:33Z

yangky11
Sep 9, 2023
Maintainer

Hi,

LeanDojo wasn't designed for speed, especially not the speed of initializing proof search. I believe there is significant room for improvement. We currently don't have the capacity to work on it, but contributions are welcome and appreciated!

> It looks like most of the time is lost during lake build of the new repo in the Docker image. If we also copied the cached build folder into the Docker image, lake build could use the lake cache and would need only a fraction of the time. Wdyt? Is this possible?

If you're talking about lake build Lean4Repl, this step itself shouldn't take a long time, since Lean4Repl.lean is just a single file w/o additional dependencies.

From your log, I also see a lot of time spent by elan in downloading the right version of Lean. This is because our Docker image (built using this file) may not have the exact version of Lean required by the Lean repo you're working with. So it tries to download and install Lean every time, which is obviously a waste.

I'm thinking we can probably make running w/o Docker the default setting, at least for Lean 4. The reason we adopted Docker was that, for Lean 3, LeanDojo needs to change Lean's C++ source code and re-compile, which is very brittle when running w/o Docker. For Lean 4, Docker may not be necessary. When running w/o Docker, the user can pre-install the correct version of Lean so that it does not need to be downloaded/installed every time.
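As a rough illustration of the "pre-install the correct version" idea, the sketch below reads the `lean-toolchain` file that Lean 4 projects keep at their root and checks it against a list of installed toolchains. The helper names are hypothetical, and the assumption that each line of `elan toolchain list` output starts with the toolchain name should be verified against your elan version.

```python
from pathlib import Path
from typing import Iterable


def required_toolchain(repo_dir: Path) -> str:
    """Return the toolchain pinned in the repo's `lean-toolchain` file."""
    return (repo_dir / "lean-toolchain").read_text().strip()


def is_installed(required: str, installed_lines: Iterable[str]) -> bool:
    """Check `required` against lines such as those printed by `elan toolchain list`.

    Assumption: each non-empty line starts with the toolchain name and may be
    followed by a suffix like "(default)".
    """
    names = {line.split()[0] for line in installed_lines if line.strip()}
    return required in names
```

If the check fails, installing the pinned toolchain once up front avoids the repeated download seen in the log.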

> Have you seen other tools like this one: https://github.com/leanprover-community/repl? Maybe it provides a better gym environment for Lean 4, or do you think one would ultimately run into the same issues as with lean-gym for Lean 3?

I don't think it suffers from lean-gym's problem. However, you need to set up the environment of a theorem by yourself (e.g., importing libraries and running everything before the theorem). Once you do that, I'm not sure if it will save you any time, since their mechanism for interacting with Lean is quite similar to ours.

Also, I see that you're trying to implement the get_single_tactic_proof function for Lean 4. We tried implementing this function before but found it non-trivial to get correct proofs from Lean 4. If you simply concatenate all tactics extracted by LeanDojo, the result may not be a correct proof, since the tactics may overlap with each other, due to compound tactics. I'd suggest you look at some examples produced by your get_single_tactic_proof to double-check if they are correct.

0 replies

josojo · 2023-09-10T06:18:17Z

josojo
Sep 10, 2023
Author

Nice. Thanks for the response!

Yes, running it "native" without Docker reduces the time significantly. Thanks for the hint. The only low-hanging optimization left is that it takes 6 seconds to run lake build Lean4Repl.
I am planning not to build it for every theorem, but to build it once and just copy the binaries from a separate folder. That should save 6 seconds per theorem tested. So I would do the following:

1. Keep Lean4Repl in an external project, like this: https://github.com/josojo/Lean4Repl
2. Fetch the project once before starting the interaction and build the library.
3. Copy the built library into the temporary directory that holds the modified proof.
4. Run lake env lean Lean4Example.lean

I tested it roughly and it should work. Would you accept such a PR to save the 6 secs per evaluation? If so, feel free to clone or recreate https://github.com/josojo/Lean4Repl on the LeanDojo repo and I will reference it.
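The build-once-then-copy idea could be sketched roughly as follows. This is not the code from the PR: the function names, the `.built` marker file, and the `build` output folder name are assumptions made for illustration, and the build step is injectable so the caching logic can be tried without Lean installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def build_with_lake(project_dir: Path) -> None:
    # The build command is taken from the log earlier in this thread.
    subprocess.run(["lake", "build", "Lean4Repl"], cwd=project_dir, check=True)


def ensure_repl_built(
    cache_dir: Path, build: Callable[[Path], None] = build_with_lake
) -> Path:
    """Build the REPL project at most once; a marker file records success."""
    marker = cache_dir / ".built"  # hypothetical marker name
    if not marker.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        build(cache_dir)
        marker.touch()
    return cache_dir


def copy_built_repl(cache_dir: Path, workdir: Path, build_subdir: str = "build") -> Path:
    """Copy the prebuilt artifacts into the temporary working directory.

    Assumption: the artifacts live in `build_subdir` under the project root.
    """
    dst = workdir / build_subdir
    shutil.copytree(cache_dir / build_subdir, dst, dirs_exist_ok=True)
    return dst
```

A marker file is only one way to detect an existing build; comparing a hash of the Lean4Repl source would additionally catch stale caches.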

> the result may not be a correct proof, since the tactics may overlap with each other, due to compound tactics.

Thank you for the warning. I will take a look.

1 reply

yangky11 Sep 10, 2023
Maintainer

Great, I'll take a look at making "native" the default when I get a chance.

Regarding Lean4Repl, when initializing the proof search, how do you check if Lean4Repl has already been built? Do you plan to use some kind of cache here? I'm good with this PR if it doesn't change the current interface. I.e., it happens automatically under the hood without the user being aware of it.

josojo · 2023-09-14T07:27:37Z

josojo
Sep 14, 2023
Author

I made a PR as discussed:
josojo#3

Currently it depends on:
https://github.com/josojo/Lean4Repl
but if you clone it or provide your own Lean4Repl in the lean_dojo GitHub repo, I will switch to that for sure.

0 replies

josojo · 2023-09-19T16:09:46Z

josojo
Sep 19, 2023
Author

I am still working on this. I can now run the tests and did quite a bit of optimization to make them run quickly.

But the test results are not that great :( There are many things that I don't understand. I am posting the most annoying one here; maybe someone can jump in:

For proving: theorem minFacHelper_0:
The lean_dojo code from this repo sends the following request:

>{"sid": 0, "cmd": " refine \u27e8by norm_num, by norm_num, ?_\u27e9 \n refine (le_minFac'.mpr \u03bb p hp hpn \u21a6 ?_).resolve_left (Nat.ne_of_gt (Nat.le_of_ble_eq_true h1)) \n rcases hp.eq_or_lt with rfl|h \n \u00b7 simp [(Nat.dvd_iff_mod_eq_zero ..).1 hpn] at h2 \n \u00b7 exact h"}
I am getting:
REPL> {"tacticState": null, "sid": null, "error": "<stdin>:3:20: expected end of input"}

During the investigation, I also ran each line on its own, and then everything works:

REPL> {"tacticState": "n : ℕ\nh1 : ble 2 n = true\nh2 : 1 = n % 2\n⊢ MinFacHelper n 3", "sid": 0, "error": null}
{"sid": 0, "cmd": " refine \u27e8by norm_num, by norm_num, ?_\u27e9 "}
REPL> {"tacticState": "n : ℕ\nh1 : ble 2 n = true\nh2 : 1 = n % 2\n⊢ 3 ≤ minFac n", "sid": 1, "error": null}
{"sid": 1, "cmd": "  refine (le_minFac'.mpr \u03bb p hp hpn \u21a6 ?_).resolve_left (Nat.ne_of_gt (Nat.le_of_ble_eq_true h1))"}
REPL> {"tacticState": "n : ℕ\nh1 : ble 2 n = true\nh2 : 1 = n % 2\np : ℕ\nhp : 2 ≤ p\nhpn : p ∣ n\n⊢ 3 ≤ p", "sid": 2, "error": null}
{"sid": 2, "cmd": "rcases hp.eq_or_lt with rfl|h"}
REPL> {"tacticState": "case inl\nn : ℕ\nh1 : ble 2 n = true\nh2 : 1 = n % 2\nhp : 2 ≤ 2\nhpn : 2 ∣ n\n⊢ 3 ≤ 2\n\ncase inr\nn : ℕ\nh1 : ble 2 n = true\nh2 : 1 = n % 2\np : ℕ\nhp : 2 ≤ p\nhpn : p ∣ n\nh : 2 < p\n⊢ 3 ≤ p", "sid": 3, "error": null}
{"sid": 3, "cmd": " \u00b7 simp [(Nat.dvd_iff_mod_eq_zero ..).1 hpn] at h2"}
REPL> {"tacticState": "case inr\nn : ℕ\nh1 : ble 2 n = true\nh2 : 1 = n % 2\np : ℕ\nhp : 2 ≤ p\nhpn : p ∣ n\nh : 2 < p\n⊢ 3 ≤ p", "sid": 4, "error": null}
{"sid": 4, "cmd": " \u00b7 exact h"}
REPL> {"tacticState": "no goals", "sid": 5, "error": null}

So I am quite clueless as to why \n does not always behave the same as starting a new command. In another language, I would start a debugger, but doing that in Lean seems like quite some work :)

Any help is welcome. :)

4 replies

josojo Sep 19, 2023
Author

Since I could not really figure out the difference, I changed the Lean4Repl script to accept an array of commands. Then everything runs through smoothly.
I.e. it accepts:

{"sid": 0, "cmd": [" refine \u27e8by norm_num, by norm_num, ?_\u27e9 ","  refine (le_minFac'.mpr \u03bb p hp hpn \u21a6 ?_).resolve_left (Nat.ne_of_gt (Nat.le_of_ble_eq_true h1)) ","rcases hp.eq_or_lt with rfl|h"," \u00b7 simp [(Nat.dvd_iff_mod_eq_zero ..).1 hpn] at h2 "," \u00b7 exact h"]}

and runs each command one after the other.
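From the Python side, the same effect can be approximated by sending the tactics one request at a time. The sketch below follows the request/response shapes visible in the logs above (`sid`, `cmd`, `tacticState`, `error`); `run_tactics` and the `send` callable are hypothetical names, with `send` standing in for whatever actually talks to the REPL process.

```python
import json
from typing import Callable, Dict, List


def run_tactics(send: Callable[[str], Dict], tactics: List[str], sid: int = 0) -> Dict:
    """Feed tactics one by one, threading the returned state id through.

    Stops at the first response whose "error" field is not None.
    """
    response: Dict = {"tacticState": None, "sid": sid, "error": None}
    for tactic in tactics:
        request = json.dumps({"sid": response["sid"], "cmd": tactic})
        response = send(request)
        if response["error"] is not None:
            break
    return response
```

Stopping at the first error also tells you which step failed, which a single multi-line command does not.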

yangky11 Sep 22, 2023
Maintainer

Sorry for the late reply. Has the issue been resolved? I don't quite understand why changing Lean4Repl to take an array would solve the problem.

josojo Sep 23, 2023
Author

No, I could not 100% understand it. From my debugging, it seems the function evalTactic from the lean community

LeanDojo/src/lean_dojo/interaction/Lean4Repl.lean

Line 218 in 99c5723

monadLift $ commitIfNoEx (evalTactic stx)

does not always treat a \n the same as a new command (";"), just most of the time.

This is often super helpful for formatting proofs: many proofs contain line breaks just for readability, not to separate proof steps the way ";" does.

To improve the overall success rate of the interaction, I am now working out in Python which line breaks act as ";" and which ones are only formatting. :)

PS: requiring an array as cmd input also helps with all proofs that contain ";" in Lean 4. I can then just break the proof into several steps and it works nicely. I will provide a PR once it's more ironed out.

yangky11 Sep 25, 2023
Maintainer

LeanDojo was not designed to take a block of tactics separated by \n, and I'm not sure what would happen in that case. Is there a reason that you want to do it this way instead of feeding the tactics one by one?

> does not always treat a \n the same as a new command (";"), just most of the time.

They are just not the same in Lean 4. Proofs in Lean are indentation-sensitive. If you use \n, you have to take care of indentation. ; does not have this problem.
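Because indentation carries meaning, one simple heuristic for turning a multi-line proof into individual steps is to start a new step only at lines on the outermost indentation level and to attach deeper-indented lines to the step before them. The sketch below is only that heuristic, not LeanDojo's behavior; `split_tactic_block` is a hypothetical name, and constructs whose parts sit at the same column (for example `|` alternatives after `cases ... with`) would be split incorrectly.

```python
import textwrap
from typing import List


def split_tactic_block(block: str) -> List[str]:
    """Split a tactic block into steps using indentation only.

    After removing the common indentation, a line that still starts with
    whitespace is treated as a continuation of the previous step.
    """
    lines = [ln.rstrip() for ln in textwrap.dedent(block).splitlines() if ln.strip()]
    steps: List[str] = []
    for line in lines:
        if line[0].isspace() and steps:
            steps[-1] += "\n" + line
        else:
            steps.append(line)
    return steps
```

Each resulting step keeps its inner line breaks and relative indentation, so a focused block such as `· simp` followed by an indented line stays together as one request.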

yangky11 · 2023-10-12T20:59:47Z

yangky11
Oct 12, 2023
Maintainer

Update: We have made "running w/o Docker" the default setting: #74. It should work out of the box for Lean 4. If you use LeanDojo with Lean 3, now you need to set the environment variable CONTAINER to docker.
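For Lean 3 users, setting the variable from Python could look like the sketch below. Only the variable name CONTAINER and the value docker come from the update above; the helper name is hypothetical, and setting the variable before importing lean_dojo is an assumption about when the library reads it.

```python
import os
from typing import MutableMapping


def select_docker_backend(env: MutableMapping[str, str] = os.environ) -> None:
    """Set CONTAINER=docker, the setting named above for Lean 3."""
    env["CONTAINER"] = "docker"


# Usage (assumption: call this before lean_dojo is imported):
# select_docker_backend()
# import lean_dojo
```

Exporting the variable in the shell before starting Python achieves the same thing.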

0 replies

josojo · 2023-10-15T14:24:58Z

josojo
Oct 15, 2023
Author

I just wanted to drop a small update here:

> Also, I see that you're trying to implement the get_single_tactic_proof function for Lean 4. We tried implementing this function before but found it non-trivial to get correct proofs from Lean 4. If you simply concatenate all tactics extracted by LeanDojo, the result may not be a correct proof, since the tactics may overlap with each other, due to compound tactics. I'd suggest you look at some examples produced by your get_single_tactic_proof to double-check if they are correct.

I provide a POC implementation in the following repo: https://github.com/josojo/lean_ai_helper
I used the tactics provided by your Lean extraction script and then filtered out all tactics that are enclosed by another tactic of the same theorem: https://github.com/josojo/lean_ai_helper/blob/main/src/trace/trace.py#L142. This works well.
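The filtering idea can be illustrated with a small sketch. It is not the code behind the link: it assumes each extracted tactic comes with start and end character offsets in the source file, and `TacticSpan` and `outermost_tactics` are hypothetical names.

```python
from typing import List, NamedTuple


class TacticSpan(NamedTuple):
    start: int  # offset of the first character (assumed representation)
    end: int    # offset one past the last character
    text: str


def outermost_tactics(spans: List[TacticSpan]) -> List[TacticSpan]:
    """Drop every tactic whose span lies inside another tactic's span."""
    # Sort by start, longest first on ties, so an enclosing span is seen first.
    ordered = sorted(spans, key=lambda s: (s.start, -s.end))
    kept: List[TacticSpan] = []
    max_end = -1
    for span in ordered:
        if span.end <= max_end:
            continue  # enclosed by a span that was already kept
        kept.append(span)
        max_end = span.end
    return kept
```

For a compound tactic such as `constructor <;> simp`, the extraction may report the whole expression as well as `constructor` and `simp` separately; only the enclosing one survives the filter.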

With this implementation and many more small improvements, I got roughly 99% of all theorems proven in the lean_dojo REPL on my test set of mathlib theorems. This is a small indication that this REPL environment is even more stable than the Lean 3 one, which had only a ~98% success rate.

Additional tweaks that I also applied and that could be ported over here:
- Avoid UTF surrogate notation during json.dump: josojo/lean_ai_helper#31
- Deal with prelude imports: josojo/lean_ai_helper#34
- Parse level parameters better: josojo/Lean4Repl@69f415c
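For context on the first tweak: with its default `ensure_ascii=True`, Python's json module escapes characters outside the Basic Multilingual Plane as a UTF-16 surrogate pair. The sketch below shows that standard-library behavior and one way around it; it does not reproduce the linked change, and `encode_request` is a hypothetical helper.

```python
import json


def encode_request(sid: int, cmd: str) -> str:
    """Serialize a request while keeping non-ASCII characters as-is."""
    return json.dumps({"sid": sid, "cmd": cmd}, ensure_ascii=False)


double_struck_f = "\U0001D53D"  # a character outside the BMP

# Default behavior: the character becomes the two escapes \ud835\udd3d.
escaped = json.dumps({"cmd": double_struck_f})

# With ensure_ascii=False the character is written out directly.
direct = encode_request(0, double_struck_f)
```

Whether the receiving side decodes surrogate escapes correctly depends on its JSON parser, so writing the characters directly sidesteps the question as long as the pipe is UTF-8.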

0 replies
