[RFC] New parallel compilation infrastructure #1405
Conversation
Still reading through the RFC, but https://clang.llvm.org/docs/LeakSanitizer.html might be worth a try.
I tried a few experiments.
I will take a look.
I tried adding this in `module_mgr` on file invalidation: `m_modules.clear();`. When I check in the debugger, the task queue is empty, there is just a single loaded module (`init.core`), and the `log_tree` is also cleared, yet we're still using a gigabyte of memory. I really can't see where we keep references to all that memory.
It could be the thread-local caches.
I have added the following compilation flags:

- … tracks the amount of memory deallocated by the custom allocators.
- … enables a counter for the number of live expressions.
- … disables …

BTW, Lean displays the amount of deallocated memory and live expressions before/after finalization. Here is some data for an example that consumes 1Gb by making simple modifications to …

a) with … So, the custom allocators have released 608Mb at the end of the execution. Note that this amount does not include the overhead introduced by …

b) With …
The compilation flags are nice. I just had to move them around and make them write to stderr, so that I can see them from Emacs. I also added an extra output right before thread finalization: …

(This is from while the server is running.) So it is really the thread-local data that uses up the memory. However, all worker threads are stopped as soon as they are idle. The `lthread` objects are destroyed as well (there was a reference cycle there), but that doesn't help either.
I have just added … and a Lean server running with …
Hmm, apparently all the thread-local data is on the main thread: if I comment out the following line, then … That is, I get: …
@Kha LeakSan is awesomely fast, thanks for pointing me to it. However, it doesn't detect the leak above.
@gebner I tried to reproduce your last experiment. Here is the data I get (I'm using the master branch): …

After I commented out the line you referenced in your message: …
@gebner Here is a conjecture: … I will test this conjecture by putting a limit on the `memory_pool`.
@gebner Here is the same experiment, but … We are still consuming a lot of memory (1Gb), but the main thread is holding only 3Mb instead of 300Mb during finalization. Another thing to keep in mind is that there is overhead and fragmentation going on in the `memory_pool`.
See leanprover#1405

Memory consumption is still high, but I didn't manage to cross the 2Gb limit anymore with this commit, even after hundreds of modifications.

@gebner I'm not seeing a big difference between Lean without `memory_pool`, with a bounded `memory_pool`, and with an unbounded `memory_pool`. We may even consider removing it in the future after more careful benchmarking. In the benchmark (https://gist.github.com/leodemoura/b27fb4203a13a67274b388a602149303), I'm getting the following numbers:

- No `memory_pool`: runtimes between 3.532s and 3.556s
- With `memory_pool` bounded by 8192: runtimes between 3.32s and 3.44s
- With `memory_pool` (no limit): runtimes between 3.32s and 3.44s

On the other hand, the small object allocator makes a big difference. I used your list_rev.lean example:

- with: 2.62s
- without: 3.75s
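For illustration only, a bounded pool of the kind being compared here could look like the sketch below. This is not Lean's actual `memory_pool`; it is a minimal stand-in showing what "bounded by 8192" means: a per-thread free list that caps how many blocks it caches.

```cpp
#include <cstddef>
#include <vector>

// Sketch: a free list for fixed-size blocks, capped at a fixed number of
// cached blocks. Anything beyond the cap goes straight back to the system
// allocator, bounding how much memory the pool can hold on to.
class bounded_memory_pool {
    std::vector<void *> m_free_list;
    size_t const        m_block_size;
    size_t const        m_capacity;
public:
    explicit bounded_memory_pool(size_t block_size, size_t capacity = 8192)
        : m_block_size(block_size), m_capacity(capacity) {}
    ~bounded_memory_pool() {
        for (void * p : m_free_list) ::operator delete(p);
    }
    void * allocate() {
        if (m_free_list.empty())
            return ::operator new(m_block_size);
        void * p = m_free_list.back();
        m_free_list.pop_back();
        return p;
    }
    void deallocate(void * p) {
        if (m_free_list.size() < m_capacity)
            m_free_list.push_back(p);   // cache for reuse
        else
            ::operator delete(p);       // over the bound: release immediately
    }
};
```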
This is very true. We deallocate all the environments on the main thread, as well as the info managers, etc. I tried the `memory_pool` change, but it doesn't change the memory consumption for me. I found a different memory leak, though, that sometimes happens at the end of a server process; I'm looking into it right now.
Yes, the memory consumption is high, but it seems to stabilize for me. The commit also seems to address the following observation you made: …

Do you still observe this behavior after the …

Does this leak affect the …
It still keeps growing for me. I need to test master again to see the difference.
No, the main thread finalizer does not change the deallocation number anymore, so that seems to be fixed.
Okay, there were quite a few leaks all over the place. Now it stops at 1.5G for me as well.
I've added the region-of-interest support to the vscode extension as well: leanprover/vscode-lean@master...gebner:parallel2

Unfortunately, it's not possible to determine which lines are visible with the current vscode API: microsoft/vscode#14756
I investigated this one. However, I didn't find a good solution yet (only hacks).

On the other hand, …

So, we use the …
Here is one ugly hack to fix the problem above.
I think that in the long run, all error messages during tactic execution should be kept in the `tactic_state`. Then … If I recall correctly, the main motivation for … For now, the …
This would be bad. The error messages are thunks, because we want to discard them at …
This is a bug in the current support for …
I did not suggest switching from thunks to error messages. I was referring to the fact that some operations like …
This is problematic, since to put them into the …
I'm not sure I'm following yet: why would we ever need to report errors immediately or store them in the tactic state, instead of returning them in …
In very rare cases we produce warnings instead of errors, for example when adding declarations with the wrong noncomputability annotation. The elaborator can also produce more than one error (when error recovery is enabled). Maybe we want to produce more warnings in the future.
@leodemoura How do we proceed with this branch? BTW, I added your examples as a test case and everything works as expected.
@gebner, I just tried out this branch with my code. I can get the Lean server to consistently crash by opening VS Code with just monoidal_categories/monoidal_category.lean open. It parses for a few seconds and then crashes. It's fine on the master branch.
@semorrison Thanks for testing! This branch requires an updated version of the vscode extension from https://github.com/gebner/vscode-lean/tree/parallel2
@gebner I can merge it today. Is it ready?
@leodemoura Great. I'm still tracking down the bug reported by Scott. Aside from that, the branch is ready.
@leodemoura The bug is fixed, feel free to merge.
Just so that I don't lose track of it: I think this is the upstream vscode issue that would allow us to implement the "visible lines" mode: microsoft/vscode#588

Edit: here is yet another issue for it: microsoft/vscode#17362
TL;DR: This is why you want it: …
I have been using this branch exclusively for the last few months, and it should be pretty stable now.

There is one nasty memory leak though (that is also in master): open `init/core.lean` and `standard.lean`, then change something in `init/core.lean`. Every change uses an additional few hundred megabytes for me. I would really appreciate any ideas on how to debug this. So far I've tried valgrind (too slow on the whole library; small examples don't show any leaks) and heaptrack (doesn't give any clues either), without success.
New parallel compilation infrastructure
While the current parallel compilation and server mode work pretty well, there are a few issues:

- The code for `mt_task_queue` is horribly complex at the moment because it has lots of responsibilities, and there are probably lots of bugs.
- … to other programming languages, or even just multiple Lean instances inside the same process (which might reasonably happen when Lean is used as a library).
- … dependencies (in particular proof elaborations) are always executed. This is a problem if you edit `init/core.lean` and have `data/list/comb.lean` open at the same time.
- … ("hourglass" icon in Emacs), just because the command spawns a task.
This PR solves all of these issues. I think a good way to view the problem is to consider 4 very similar, but subtly different graphs:

1. The task dependency graph. (… another one. For example, type-checking must run after elaboration.)
2. The module dependency graph.
3. The message tree. (… keep the messages from the old proof elaborations around, and only replace them once they are re-elaborated as well.)
4. The cancellation graph. (… file?)

These are now separate and mostly explicitly represented. Currently tasks have metadata which we use for 3 and 4; this metadata is replaced by explicit data structures.
Additionally, this PR adds the following new features:

- … nothing, or the complete visible files.
- … matches the error reporting in Emacs. Concretely, we missed two errors in the test runs due to the current mismatch: `quote1.lean` and `exact_perf.lean`. @leodemoura `quote1` was an easy fix, but `exact_perf.lean` reports an error because `try { exact h2 }` fails and `h2` has an elaboration error. You added this error message in add5266. Any ideas?
I have removed all incremental message/task updates from the server. The server resends all running tasks and all messages every time (about every 100ms). (See #1364 for how we arrived at this approach.) However, these are only sent for the region that is checked: for example, if you choose "check visible lines", then you only see errors for the visible lines.

There are two backwards-incompatible changes in the server protocol:

- The `additional_message` command got removed.
- There is a new `roi` (region of interest) command: …

Task dependency graph (and tasks in general)
As before, the task dependency graph is specified by the dependencies of each task. Every task has a method that computes a list of its dependencies.
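The interface itself is not spelled out in this description; here is a minimal sketch under assumptions: a `gtask_imp` base class (the name appears later in this PR), with `buffer` standing in for Lean's append-only array type.

```cpp
// Minimal sketch (assumed interface): every task implementation derives
// from this base class. Dependencies are computed by a method instead of
// being stored up front, because they can change while the task is pending.
class gtask_imp {
public:
    virtual ~gtask_imp() {}
    // Append all tasks that must finish before this one may run.
    virtual void get_dependencies(buffer<gtask> &) {}
    // Produce the task's result once all dependencies have finished.
    virtual void execute(void * result) = 0;
};
```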
The dependencies are computed by a method because they can change dynamically. Consider for example a dependency on a `task<task<expr>>`: here the `get_dependencies` method would first return the outer task, and then both the outer and inner task once the outer one is finished.
Since tasks no longer require so much metadata, it is feasible to construct them with lambdas, including nice combinators for mapping, etc.
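The original snippet is not reproduced here; what follows is a rough reconstruction under assumed names. `task_builder`, `depends_on`, and `build` appear later in this PR, `has_sorry` exists in the Lean sources, while `no_error_so_far` and `report_sorry` are hypothetical helpers.

```cpp
// Rough sketch: once a declaration's proof task finishes, report `sorry`
// at most once, and only if no other error was reported. Because the check
// runs strictly after the elaboration task, its behavior is deterministic.
gtask mk_sorry_check(task<expr> const & proof) {
    return task_builder<unit>([proof] {
            if (no_error_so_far() && has_sorry(get(proof)))
                report_sorry();
            return unit{};
        })
        .depends_on(proof)
        .build();
}
```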
(The snippet above causes `sorry` to be reported at most once per declaration, and only if no other error was reported. It is also completely deterministic.)
Tasks now have the type `task<T>` (much easier to type). Tasks are typically constructed using the `task_builder<T>` helper. A common operation is to wrap the execution function, e.g. to set thread-local variables. This is accomplished using the `.wrap()` function; implementation-wise, each wrapper adds a new heap-allocated `gtask_imp` object that delegates the execution and dependency methods to the previous `gtask_imp`. (We could avoid the heap allocation with some template trickery, but I'd rather wait for C++14 for that--it has the equivalent of Rust's `impl Trait` feature.)
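A sketch of that delegation, under the same assumed interface as above (`set_thread_local_state` is a hypothetical placeholder for whatever a given wrapper sets up):

```cpp
#include <memory>

// Sketch: a wrapper gtask_imp that forwards dependency queries unchanged
// and decorates execution. Each .wrap() call layers one more of these
// around the previous implementation.
class wrapped_task_imp : public gtask_imp {
    std::unique_ptr<gtask_imp> m_inner;
public:
    explicit wrapped_task_imp(std::unique_ptr<gtask_imp> inner)
        : m_inner(std::move(inner)) {}
    void get_dependencies(buffer<gtask> & deps) override {
        m_inner->get_dependencies(deps);
    }
    void execute(void * result) override {
        set_thread_local_state(); // hypothetical per-wrapper setup
        m_inner->execute(result);
    }
};
```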
Dependencies are usually constructed with combinators, and are themselves tasks (with the generic `gtask` type). In general, tasks are no longer submitted immediately. They are stored in the `log_tree` for lazy evaluation--see the corresponding section for details.

Module dependency graph
This part remains essentially unchanged. In the `module_mgr`, every module has a list of dependencies. We might want to use the cancellation graph to invalidate modules as well, but this has the (positive?) side effect that invalidated modules are immediately cancelled. I'm postponing this refactoring until we have the parser monad.
Message tree (`log_tree`)

The `log_tree` is a mutable, concurrent data structure. It stores the produced messages and the info managers, as well as the tasks that need to be executed to produce these messages. More concretely, a node of the `log_tree` has the following fields:
```cpp
name_map<node>         m_children;
std::vector<log_entry> m_entries;   // e.g. messages
location               m_location;
std::string            m_description;
gtask                  m_producer;
```
Every node has an associated location (file name + position range). The tree is structured such that the children's locations are always subsets of the parent's location. We could enforce this invariant, as well as the requirement that messages be contained in their node's location; this would solve the problem where we run a command and report error messages in a different part of the file.
The tree is designed for lazy evaluation: we first insert nodes that contain unsubmitted tasks. The tasks are then submitted by a separate listener, and only when the editor/command line requests the error messages for that location. When a task is finished, it resets the producer field of its node to null. We only modify nodes where an ancestor has a non-null producer. Nodes that have a non-null producer are shown as in-progress in the editor.
When sending information to the editor, we simply traverse this tree. (Recall
that we only send messages for the currently visible region, so this is pretty
efficient. We don't even look at subtrees whose location is disjoint.)
The progress messages at the command line and in the server mode are obtained by traversing the `log_tree` as well, instead of looking at the state of the task queue.
Tasks are typically inserted using the `add_library_task<T>` function.
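For illustration, a hypothetical call might look as follows; the real signature of `add_library_task` is not shown in this PR, and `type_check_decl`, `env`, and `d` are made up for the example.

```cpp
// Hypothetical usage (assumed signature): register a type-checking step as
// a library task. It is recorded in the log_tree immediately but only
// submitted for execution once its region of the file is requested.
task<unit> schedule_type_check(environment const & env, declaration const & d) {
    return add_library_task(task_builder<unit>([=] {
        type_check_decl(env, d); // hypothetical helper
        return unit{};
    }));
}
```

Cancellation graph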
At the moment, we cancel tasks based on their position in the file, by storing the file name and position in every task and sending a predicate to the task queue when we want to cancel something. This PR takes an approach inspired by vscode, where every command gets an explicit `CancellationToken` object to which you can attach listeners. We have a thread-local variable for a `cancellation_token`, which can be cancelled and to which we can add children.

The ownership convention is worth describing: parents hold weak references to the children, while children hold (strong) references to the parents. This ensures that, as long as you have a `cancellation_token`, you will receive cancellation events from every parent, without creating memory leaks (children are not kept alive indefinitely).
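A minimal sketch of this convention in terms of `std::shared_ptr`/`std::weak_ptr` (the actual `cancellation_token` type is not shown here):

```cpp
#include <atomic>
#include <memory>
#include <vector>

// Sketch: children keep parents alive (strong refs up), parents only
// observe children (weak refs down). Dropping a token therefore frees it,
// while cancellation still reaches every descendant that is still alive.
struct cancellation_cell {
    std::atomic<bool> m_cancelled{false};
    std::shared_ptr<cancellation_cell> m_parent;                  // strong
    std::vector<std::weak_ptr<cancellation_cell>> m_children;     // weak

    void cancel() {
        m_cancelled = true;
        for (auto const & w : m_children)
            if (auto child = w.lock())  // skip children already freed
                child->cancel();
    }
};
```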
To do

- Different priorities for the tasks. Currently we prioritize parsing over elaboration; this has been lost in the refactoring. It does not seem to be a major issue for now.
- Do not elaborate proofs in dependent modules. With this PR, this would happen for a "stupid" reason: we check for every import whether it contains sorry, which causes the proofs to be elaborated. We just need to find a clean way to skip this sorry check, probably only when "checking visible lines".
- Once we refactor the parser to process one command at a time, I'd like to schedule one parsing task per command. This brings a few immediate advantages:
  - … automatically resumed when scrolling down.
  - … now we only get the snapshots after we finish parsing the whole file.
Performance
There is no (significant) performance change in either direction when compiling
the standard library. The editor interface seems to be more responsive now.