
Segmentation fault in Task.bind task #2853

Closed
Kha opened this issue Nov 9, 2023 · 1 comment · Fixed by #2959
Labels
bug Something isn't working

Comments

Kha (Member) commented Nov 9, 2023

Executing the last portion of a Task.bind can crash because the closure for it has already been nulled.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff709a56a7c in lean_ptr_tag (o=0x0) at /home/sebastian/lean/lean/build/debug/stage1/include/lean/lean.h:386
386	    return o->m_tag;
[Current thread is 1 (Thread 0x7ff6feffd6c0 (LWP 1784522))]
(gdb) up
...
#5  0x00007ff709a6498c in lean::task_manager::run_task (this=0x55b73faa1380, lock=..., t=0x7ff6fdc4c098) at /home/sebastian/lean/lean/src/runtime/object.cpp:718
718	            add_dep(lean_to_task(closure_arg_cptr(t->m_imp->m_closure)[0]), t);

To reproduce:
Not easily. This is a somewhat rare failure in my new snapshot types branch.

Analysis:
This appears to be a plain race condition in the following code:

if (t->m_imp->m_deleted) {
    lock.unlock();
    if (v) lean_dec(v);
    free_task(t);
    lock.lock();
} else if (v != nullptr) {
    lean_assert(t->m_imp->m_closure == nullptr);
    resolve_core(t, v);
} else {
    // `bind` task has not finished yet, re-add as dependency of nested task
    lock.unlock();
    add_dep(lean_to_task(closure_arg_cptr(t->m_imp->m_closure)[0]), t);

We check that the task is not deleted; then, in the last branch, we drop the lock and access the closure. But a concurrent thread can empty the closure field in deactivate_task_core in between! I've confirmed that retrieving the closure while still holding the lock, as is done earlier in the same function, fixes my test failures: Kha@994a088. But it is not actually clear to me why, or whether, this is sufficient for either of these closure accesses: who is actually keeping the closure alive after we've dropped the lock? deactivate_task_core will dec it as soon as the task is unreferenced, and I don't see or remember a counteracting inc.
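For illustration, a minimal sketch of that fix, with the last branch reading the field while the lock is still held (hypothetical; it mirrors the snippet above rather than quoting the actual commit):

} else {
    // `bind` task has not finished yet, re-add as dependency of nested task.
    // Read m_closure before unlock(), so that a concurrent
    // deactivate_task_core cannot null the field between unlock and use.
    object * closure = t->m_imp->m_closure;
    lock.unlock();
    add_dep(lean_to_task(closure_arg_cptr(closure)[0]), t);

Note that this only avoids reading a nulled field; it does not by itself answer the lifetime question above, i.e. what keeps the closure object alive once the lock is dropped.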

@leodemoura In general I am concerned about the cost/benefit ratio of the following two task_object features that are involved in this bug:

  • the deactivated state that eagerly clears specific fields (such as m_closure) even while the task is queued or running. If deactivation merely set m_canceled and left all cleanup to the point where the task_manager is completely done with the task, there would be no interference with other task functions (a sketch follows below).
  • reusing the task object in task_bind_fn1 for a second logical task instead of allocating a new task object, which requires the special-case branch in task_manager that broke above.

As far as I can see, both of these provide marginal memory-use benefits while significantly complicating the task_object invariants and thus their maintainability. Both parts have needed fixes before.
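To make the first point concrete, here is a rough sketch of what a flag-only deactivation could look like (hypothetical code; the real deactivate_task_core signature and surrounding bookkeeping are assumed, not quoted):

void task_manager::deactivate_task_core(lean_task_object * t) {
    // Only request cancellation; do not eagerly free m_closure or other
    // fields, since run_task may still touch them after dropping the lock.
    t->m_imp->m_canceled = true;
    // All field cleanup is deferred to the point where the task manager
    // is completely done with the task (e.g. free_task).
}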

Kha added the bug label Nov 9, 2023
leodemoura (Member) commented:
As far as I can see, both of these provide marginal memory-use benefits while significantly complicating the task_object invariants and thus their maintainability. Both parts have needed fixes before.

Agreed. Please feel free to remove them.

github-merge-queue bot pushed a commit that referenced this issue on Dec 12, 2023 (#2959):

Fixes #2853, unblocking my work before I get to refactoring this part of the task manager.

Kha closed this as completed in #2959 on Dec 12, 2023