Replies: 8 comments 5 replies
-
Hi, The errors fall into a few categories, and we documented a few examples from each category in tests/interaction/test_unexpected_errors.py. You're welcome to try to fix them. However, the 2.1% is from our analysis on Lean 3. The community is moving to Lean 4, and it might be more valuable to perform a similar analysis with Lean 4 and fix any problems there. I believe lean-tpe-public was developed around the same time as lean-gym, and they share actually share some code. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the answer. I am happy to try to do the evaluation for lean4. Here is one of my scripts: josojo#1.
Here are the logs from my execution that took over 4 minutes on my moderate computer:
```
python scripts/evaluate_lean_4_interaction.py
2023-09-07 20:14:52.900 | DEBUG | lean_dojo.constants::59 - Using GitHub personal access token for authentication
2023-09-07 20:14:59.976 | INFO | __main__:main:61 - Namespace()
2023-09-07 20:15:02.618 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:139 - The traced repo is available in the cache.
2023-09-07 20:15:02.618 | INFO | lean_dojo.data_extraction.trace:trace:163 - Loading the traced repo from /Users/josojo/.cache/lean_dojo_manual/yangky11-lean4-example-7d711f6da4584ecb7d4f057715e1f72ba175c910/lean4-example
2023-09-07 20:15:03.372 | DEBUG | lean_dojo.data_extraction.traced_data:load_from_disk:1470 - Loading 621 traced XML files from /Users/josojo/.cache/lean_dojo_manual/yangky11-lean4-example-7d711f6da4584ecb7d4f057715e1f72ba175c910/lean4-example with 9 workers
2023-09-07 20:15:06,486 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
0%| | 0/621 [00:00:59 - Using GitHub personal access token for authentication
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 621/621 [00:26<00:00, 23.15it/s]
(pid=68113) 2023-09-07 20:15:08.000 | DEBUG | lean_dojo.constants::59 - Using GitHub personal access token for authentication [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
2023-09-07 20:15:35.521 | DEBUG | lean_dojo.data_extraction.lean:_get_lean4_dependencies:522 - Querying the dependencies of LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910')
2023-09-07 20:15:38.093 | DEBUG | lean_dojo.data_extraction.traced_data:check_sanity:1334 - Checking the sanity of TracedRepo(repo=LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910'), dependencies={'lean4': LeanGitRepo(url='https://github.com/leanprover/lean4', commit='13ca443f058b0e33e44a9fdba347a53330463348')}, root_dir=PosixPath('/Users/josojo/.cache/lean_dojo_manual/yangky11-lean4-example-7d711f6da4584ecb7d4f057715e1f72ba175c910/lean4-example'))
2023-09-07 20:15:38.799 | INFO | __main__:main:71 - Loading the theorems
2023-09-07 20:15:39.389 | INFO | __main__:main:72 - number of theorems in repo
0%| | 0/721 [00:00:59 - Using GitHub personal access token for authentication
(RayHelper pid=68489) 2023-09-07 20:15:49.909 | INFO | __main__:_validate_ground_truth:27 - {'theorem': Theorem(repo=LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910'), file_path=PosixPath('Lean4Example.lean'), full_name='hello_world'), 'proof': 'rw [add_assoc, add_comm b, ←add_assoc]'}
(RayHelper pid=68489) 2023-09-07 20:15:49.909 | DEBUG | lean_dojo.interaction.dojo:__enter__:193 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/yangky11/lean4-example', commit='7d711f6da4584ecb7d4f057715e1f72ba175c910'), file_path=PosixPath('Lean4Example.lean'), full_name='hello_world')
(RayHelper pid=68489) 2023-09-07 20:15:49.910 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:139 - The traced repo is available in the cache.
(pid=68489) 2023-09-07 20:15:43.785 | DEBUG | lean_dojo.constants::59 - Using GitHub personal access token for authentication [repeated 8x across cluster]
(RayHelper pid=68489) 2023-09-07 20:15:51.379 | DEBUG | lean_dojo.interaction.dojo:_modify_file:402 - Modifying Lean4Example.lean
(RayHelper pid=68489) 2023-09-07 20:15:51.380 | DEBUG | lean_dojo.interaction.dojo:__enter__:227 - Launching the proof using
(RayHelper pid=68489) 2023-09-07 20:15:51.380 | DEBUG | lean_dojo.container:run:305 - docker run --cidfile vry0w7iu.cid --rm -u 501 --mount type=bind,src="/private/var/folders/3q/y2wz8sb93q591g0ql084rngm0000gn/T/tmpbiwwad63/lean4-example",target="/workspace/lean4-example" --workdir /workspace/lean4-example yangky11/lean-dojo lake build Lean4Repl
(RayHelper pid=68489) 2023-09-07 20:18:12.910 | DEBUG | lean_dojo.container:run_interactive:341 - docker run --cidfile aljfo4fj.cid --rm -u 501 --mount type=bind,src="/private/var/folders/3q/y2wz8sb93q591g0ql084rngm0000gn/T/tmpbiwwad63/lean4-example",target="/workspace/lean4-example" --cpus 1 --memory 16g --workdir /workspace/lean4-example -i yangky11/lean-dojo lake env lean Lean4Example.lean
(RayHelper pid=68489) 2023-09-07 20:18:13.185 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: syncing channel updates for 'nightly'
(RayHelper pid=68489) 2023-09-07 20:18:20.316 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: latest update on nightly, lean version nightly-2023-09-07
(RayHelper pid=68489) 2023-09-07 20:18:20.317 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: downloading component 'lean'
(RayHelper pid=68489) 2023-09-07 20:19:53.048 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - info: installing component 'lean'
(RayHelper pid=68489) 2023-09-07 20:20:02.984 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - warning: improperly formatted manifest: incompatible manifest version `4`
(RayHelper pid=68489) 2023-09-07 20:20:04.516 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - REPL> {"tacticState": "a b c : Nat\n⊢ a + b + c = a + c + b", "sid": 0, "error": null}
(RayHelper pid=68489) 2023-09-07 20:20:04.517 | DEBUG | lean_dojo.interaction.dojo:_submit_request:551 - Request: {"sid": 0, "cmd": "rw [add_assoc, add_comm b, \u2190add_assoc]"}
(RayHelper pid=68489) 2023-09-07 20:20:04.598 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 - REPL> {"tacticState": "no goals", "sid": 1, "error": null}
(RayHelper pid=68489) 2023-09-07 20:20:04.598 | DEBUG | lean_dojo.interaction.dojo:_submit_request:551 - Request: exit
(RayHelper pid=68489) 2023-09-07 20:20:05.298 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:591 -
(RayHelper pid=68489) 2023-09-07 20:20:05.298 | DEBUG | lean_dojo.interaction.dojo:_cleanup:331 - Cleaning up.
(RayHelper pid=68489) 2023-09-07 20:20:05.298 | DEBUG | lean_dojo.interaction.dojo:_cleanup_container:341 - Cleaning up the container.
(RayHelper pid=68489) 2023-09-07 20:20:05.344 | DEBUG | lean_dojo.interaction.dojo:_cleanup_tmp_dir:354 - Cleaning up the temporary directory.
2023-09-07 20:20:05.499 | INFO | __main__:main:120 - LeanDojo: 1/1
```
It looks like most of the time is lost during `lake build` of the new repo in the docker image. If we would also copy over the cache of the build folder into the docker image, the `lake build` could use the lake-cache and only need a fraction of the time. Wdyt? is this possible?
Have you seen other tools like this one: https://github.com/leanprover-community/repl. Maybe it provides a better gym environment for lean 4, or do you think one would run ultimately into the same issues as with lean-gymp 3? |
Beta Was this translation helpful? Give feedback.
-
Hi, LeanDojo wasn't designed to optimize the speed, especially the speed for initializing proof search. I believe there is significant room for improvement. We currently don't have the capacity to work on it, but contributions are welcome and appreciated!
If you're talking about From your log, I also see a lot of time spent by elan in downloading the right version of Lean. This is because our Docker image (built using this file) may not have the exact version of Lean required by the Lean repo you're working with. So it tries to download and install Lean every time, which is obviously a waste. I'm thinking about probably we can make running w/o Docker the default setting, at least for Lean 4. The reason we adopted Docker was that, for Lean 3, LeanDojo needs to change its C++ source code and re-compile, which is very brittle when running w/o Docker. For Lean 4, Docker may not be necessary. When running w/o Docker, the user can pre-install the correct version of Lean so that it does not need to be downloaded/installed every time.
I don't think it suffers from lean-gym's problem. However, you need to set up the environment of a theorem by yourself (e.g., importing libraries and running everything before the theorem). Once you do that, I'm not sure if it will save you any time, since their mechanism for interacting with Lean is quite similar to ours. Also, I see that you're trying to implement the |
Beta Was this translation helpful? Give feedback.
-
Nice. Thanks for the response! Yes, running it "Native" without docker reduces the time significant. Thanks for the hint. The only low-hanging fruit left over optimization is that takes 6 seconds to run lake build Lean4Repl.
I tested it roughly and it should work. Would you accept such a PR to save the 6 secs per evaluation? If so, feel free to clone or recreate https://github.com/josojo/Lean4Repl on the LeanDojo repo and I will reference to it.
Thank you for warning. I will take a look |
Beta Was this translation helpful? Give feedback.
-
I made a PR as discussed: Currently it depends on: |
Beta Was this translation helpful? Give feedback.
-
I am still working on this. I can now run the tests and did quite some optimization to run it quickly. But the tests results are not that great :( There are many things that I don't understand. I am posting here the most annoying one, maybe someone can jump in: For proving: theorem minFacHelper_0:
During the investigations, I was running also each line for itself, and then everything works:
So I am quite clueless as to why the notation \n is not always doing the same as starting a new command. In another language, I would start to a debugger, but doing it in Lean seems quite some work :) Any help is welcome. :) |
Beta Was this translation helpful? Give feedback.
-
Update: We have made "running w/o Docker" the default setting: #74. It should work out of the box for Lean 4. If you use LeanDojo with Lean 3, now you need to set the environment variable |
Beta Was this translation helpful? Give feedback.
-
I just wanted to drop a small update here:
I provide a POC implementation in the following repo: https://github.com/josojo/lean_ai_helper With this implementation and many more small improvements, I got roughly ~99% percent of all theorems proven in the lean_dojo repl on my test-set of mathlib theorems. This is a small indication that this repl environment is even more stable than the lean 3 one, which had only a ~98% success rate. Additional tweaks that I also applied and that could be ported over here: |
Beta Was this translation helpful? Give feedback.
-
I really like this project :)
The paper states that the leandojo environment does not always evaluate the state correctly. In 2.1% of the cases, it produces invalid state transitions even though the state transition is valid.
Why is this the case? Can we help to fix it?
Side question:
I saw the comparison to lean -gym(https://github.com/openai/lean-gym/tree/main). But openai uses the following tool in the newer publications. Has anyone done a comparison to that one:
https://github.com/jesse-michael-han/lean-tpe-public/tree/master
I wanna help build a good foundation for theorem proving, but I don't wanna work on it if others have fixed it already.
Beta Was this translation helpful? Give feedback.
All reactions