You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Executing the last portion of a Task.bind can run into the closure for it already having been nulled.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007ff709a56a7c in lean_ptr_tag (o=0x0) at /home/sebastian/lean/lean/build/debug/stage1/include/lean/lean.h:386
386 return o->m_tag;
[Current thread is 1 (Thread 0x7ff6feffd6c0 (LWP 1784522))]
(gdb) up
...
#5 0x00007ff709a6498c in lean::task_manager::run_task (this=0x55b73faa1380, lock=..., t=0x7ff6fdc4c098) at /home/sebastian/lean/lean/src/runtime/object.cpp:718
718 add_dep(lean_to_task(closure_arg_cptr(t->m_imp->m_closure)[0]), t);
To reproduce:
Not easily. This is a somewhat rare failure in my new snapshot types branch.
Analysis:
This appears to be a plain race condition in the following code:
We check that the task is not deleted then, in the last branch, we drop the lock and access the closure, but a concurrent thread can empty the closure field in deactivate_task_core in between! I've confirmed that retrieving the closure within the lock like is done in the same function before fixes my test failures: Kha@994a088. But it is not actually clear to me why/whether this is sufficient in either of these closure accesses: who is actually keeping the closure alive after we've dropped the lock? deactivate_task_core will dec it as soon as the task is unreferenced but I don't see or remember a counteracting inc.
@leodemoura In general I am concerned about the cost/benefit ratio of the two following task_object features that are involved in the bug:
the deactivated state that eagerly clears up specific fields (such as m_closure) even while the task is queued or running. If it merely set m_canceled and left all cleanup to when the task_manager is completely done with the task, there would be no interference with other task functions.
reusing the task object in task_bind_fn1 for a second logical task instead of allocating a new task object, which requires the special case branch in task_manager that broke above.
As far as I can see, both of these provide marginal memory use benefits while they significantly complicate the task_object invariants and thus its maintainability. Both parts had to be fixed before.
The text was updated successfully, but these errors were encountered:
As far as I can see, both of these provide marginal memory use benefits while they significantly complicate the task_object invariants and thus its maintainability. Both parts had to be fixed before.
Executing the last portion of a
Task.bind
can run into the closure for it already having been nulled.To reproduce:
Not easily. This is a somewhat rare failure in my new snapshot types branch.
Analysis:
This appears to be a plain race condition in the following code:
lean4/src/runtime/object.cpp
Lines 707 to 718 in b278172
We check that the task is not deleted then, in the last branch, we drop the lock and access the closure, but a concurrent thread can empty the closure field in
deactivate_task_core
in between! I've confirmed that retrieving the closure within the lock like is done in the same function before fixes my test failures: Kha@994a088. But it is not actually clear to me why/whether this is sufficient in either of these closure accesses: who is actually keeping the closure alive after we've dropped the lock?deactivate_task_core
willdec
it as soon as the task is unreferenced but I don't see or remember a counteractinginc
.@leodemoura In general I am concerned about the cost/benefit ratio of the two following
task_object
features that are involved in the bug:deactivated
state that eagerly clears up specific fields (such asm_closure
) even while the task is queued or running. If it merely setm_canceled
and left all cleanup to when thetask_manager
is completely done with the task, there would be no interference with other task functions.task_bind_fn1
for a second logical task instead of allocating a new task object, which requires the special case branch intask_manager
that broke above.As far as I can see, both of these provide marginal memory use benefits while they significantly complicate the
task_object
invariants and thus its maintainability. Both parts had to be fixed before.The text was updated successfully, but these errors were encountered: