Closure transfer fix for `sched::clone` #920

xurtis · 2018-07-04T09:31:52Z

Fixes #919

Annotate the correct lifetime ('static) for the closure transferred to
the child process of the clone.

Properly box the closure in the parent then rebox it in the child to
ensure that the closure is dropped at the end of the invocation in the
child rather than at the end of the call to clone in the parent.

For some reason, unwrapping the boxed function itself caused all kinds
of mayhem, so it instead ends up wrapped in an additional box from which
it is then unwrapped.

asomers

Why do you need to double box cb? I don't understand why you can't just use a single Box.

xurtis · 2018-07-05T00:54:58Z

After looking at how std::thread achieves the same result, I changed the patch to mimic that instead of doing strange dances with boxes. The (much simpler) solution turns out to be just forgetting the boxed function rather than dropping it.

xurtis · 2018-07-05T01:22:22Z

It also appears that even at its lowest level, in order to implement the FnOnce semantics, libstd::thread uses a Box<Box<FnBox()>> which it then unwraps, sends, then boxes again before calling. Still not sure on the why though.

I think it would make more sense to require the callback to be a FnOnce, but I can't see a way of doing that in stable currently.

asomers · 2018-07-05T13:58:10Z

I'm skeptical of the mem::forget. That function deliberately leaks memory, and is normally used when you've made a low-level (unsafe) copy of a reference somewhere. But normally at least one copy gets destroyed, and I don't see that happening. Is it really necessary to forget the callback, when it's already static?
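
To make the concern concrete, here is a minimal sketch of what mem::forget does to a boxed closure compared with an ordinary drop. The helper name `strong_count_after` and the `Arc` used as a probe are illustrative assumptions and are not part of the patch.

```rust
use std::mem;
use std::sync::Arc;

/// Build a boxed closure that owns a clone of an `Arc`, dispose of the box
/// either by forgetting or by dropping it, and report how many owners of the
/// `Arc` remain. (Hypothetical helper, for illustration only.)
fn strong_count_after(forget: bool) -> usize {
    let shared = Arc::new(0isize);
    let captured = Arc::clone(&shared);

    // `move` transfers `captured` into the closure, so the box owns it.
    let cb: Box<dyn FnMut() -> isize> = Box::new(move || *captured);

    if forget {
        // The destructor never runs: the heap allocation and everything the
        // closure captured stay alive for the rest of the process.
        mem::forget(cb);
    } else {
        // The destructor runs and the captured `Arc` clone is released.
        drop(cb);
    }
    Arc::strong_count(&shared)
}
```

In this sketch the forgotten closure keeps its captured `Arc` clone alive, which is the leak being questioned here.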

xurtis · 2018-07-05T14:11:49Z

In this case the 'static refers to the fact that the function that the box wraps is not bound to the lifetime of any other object (i.e. all items that it refers to have been moved into it). It does not mean that the boxed closure is static.
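
A small sketch of that distinction, assuming nothing beyond the standard library (the names `call_static` and `sum_owned` are made up for the example):

```rust
/// Accepts only closures that own everything they refer to.
fn call_static<F>(mut f: F) -> isize
where
    F: FnMut() -> isize + 'static,
{
    f()
}

/// Builds such a closure and runs it. (Hypothetical helper.)
fn sum_owned() -> isize {
    let data = vec![1isize, 2, 3];
    // `move` transfers `data` into the closure, which satisfies `'static`.
    // Capturing `data` by reference instead would be rejected, because the
    // closure would then depend on this function's stack frame.
    call_static(move || data.iter().sum())
    // The closure, and `data` with it, is dropped when `call_static`
    // returns: `'static` constrains its borrows, not how long it lives.
}
```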

The existing implementation of clone as you have it will only work when the child thread does not use the CLONE_VM flag, meaning it gets its own entire copy of the VM. With CLONE_VM it fails because the boxed closure is dropped by the end of the call to clone, as it is moved into that function and that function consumes it. There is only one copy of this box and the child thread gets a reference to it, so if that box is dropped before the child thread exits, the child thread could rely entirely on invalid memory and, in most cases, won't even get to the point of executing the closure.

Either the box needs to be copied (which it can't be), have its ownership transferred to the child process, or be forgotten and never dropped at all. The only way to transfer ownership to the child process involves wrapping the boxed closure in an additional box (I'm still not entirely clear on the why there).
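
The ownership-transfer option could look roughly like the sketch below. It does not call clone(2); `trampoline` and `hand_off` are hypothetical names standing in for the child entry point and the parent side, and the callback type is an assumption based on this thread.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

/// Closure type assumed for this sketch.
type CloneCb = Box<dyn FnMut() -> isize + Send + 'static>;

/// Shape of the entry point a C-style API would call.
type Entry = extern "C" fn(*mut c_void) -> c_int;

/// Runs on the child side: takes ownership back, calls the closure, and
/// drops it when the invocation ends.
extern "C" fn trampoline(arg: *mut c_void) -> c_int {
    // Safety: `arg` must be the pointer produced by `hand_off`, used once.
    let mut cb: Box<CloneCb> = unsafe { Box::from_raw(arg as *mut CloneCb) };
    let ret = (*cb)() as c_int;
    drop(cb); // the closure is released here, on the child side
    ret
}

/// Runs on the parent side: gives up ownership of the closure and returns
/// what a C-style API would need. (Hypothetical helper, not the nix API.)
fn hand_off(cb: CloneCb) -> (Entry, *mut c_void) {
    let entry: Entry = trampoline;
    // `into_raw` releases the outer box without running its destructor.
    (entry, Box::into_raw(Box::new(cb)) as *mut c_void)
}
```

Nothing is dropped in `hand_off`; the only drop happens inside `trampoline`, after the closure has run.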

xurtis · 2018-07-05T14:19:51Z

Another alternative fix would be to raise that concern to the caller and use a type more directly analogous to what the syscall actually expects: have the wrapper take a fn(T) -> isize (a function pointer rather than a boxed closure) and a Box<T> (where T: Send + 'static) instead. This would leave more abstract implementations of executing closures to the caller, which may be preferable in this case.
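
One way that interface might be sketched, purely as an assumption about its shape: `Payload`, `run_payload` and `prepare` are invented names, and the actual clone call is left out.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

/// Shape of the entry point a C-style API would call.
type Entry = extern "C" fn(*mut c_void) -> c_int;

/// Function pointer plus the argument it will consume. Both fields are
/// sized, so a `Box<Payload<T>>` is a thin pointer.
struct Payload<T> {
    func: fn(T) -> isize,
    arg: T,
}

/// Child-side entry point: reclaims the payload and calls the function.
extern "C" fn run_payload<T: Send + 'static>(raw: *mut c_void) -> c_int {
    // Safety: `raw` must come from `prepare::<T>` and be used exactly once.
    let payload = unsafe { Box::from_raw(raw as *mut Payload<T>) };
    let Payload { func, arg } = *payload;
    func(arg) as c_int
}

/// Parent side: packs the function pointer and its boxed argument into one
/// allocation that can travel through a `*mut c_void`.
fn prepare<T: Send + 'static>(func: fn(T) -> isize, arg: Box<T>) -> (Entry, *mut c_void) {
    let entry: Entry = run_payload::<T>;
    let payload = Box::new(Payload { func, arg: *arg });
    (entry, Box::into_raw(payload) as *mut c_void)
}
```

Because no trait object is involved, no second box is needed in this variant.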

asomers · 2018-07-05T15:48:15Z

I agree. But I'm not comfortable with the double-box solution without knowing why it's necessary. Did you check the history of the file in libstd that uses that technique? That might be instructive.

xurtis · 2018-07-06T05:02:06Z

So, on your recommendation, I'm going back through the years of commits to https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs and it goes back quite some way through pre-stable Rust (back before ~ notation for boxes was a thing).

I asked around on #rust and got some help answering the question. The short version is: don't cast between pointers to statically sized types and pointers to DSTs. DST pointers are actually 'fat' pointers, a tuple of a pointer to the data and a pointer to a vtable. When one gets cast to a pointer to void, the vtable pointer is dropped. To resolve this, an additional box should be used, so that you have a pointer to a statically sized type (which contains the DST pointer) that can be safely cast to a pointer to another statically sized type, like *mut c_void.

The existing code used a mem::transmute, which masked a fatal compile error that would otherwise have caught this.
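
The fat-pointer explanation can be checked with a short sketch. The `Callback` alias and both function names are assumptions made for the example; the size relationship holds on current compilers but is an implementation detail.

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Closure type assumed for this sketch.
type Callback = Box<dyn FnMut() -> isize>;

/// Sizes in bytes of: a thin pointer, a pointer to a trait object, and a
/// pointer to a boxed trait object.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut c_void>(),
        size_of::<*mut dyn FnMut() -> isize>(),
        size_of::<*mut Callback>(),
    )
}

/// Send a callback through `*mut c_void` and back using the extra box.
fn round_trip(cb: Callback) -> Callback {
    // The outer box points at the fat pointer itself, which is sized, so
    // the raw pointer is thin and loses nothing in the cast.
    let thin = Box::into_raw(Box::new(cb)) as *mut c_void;
    // Safety: `thin` was produced just above from a `Box<Callback>` and is
    // converted back exactly once.
    unsafe { *Box::from_raw(thin as *mut Callback) }
}
```

A pointer to the trait object is twice the size of `*mut c_void`, so a thin pointer cannot be turned back into it once the vtable half is gone; the pointer to the outer box is the same size as `*mut c_void`.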

xurtis · 2018-07-06T06:28:18Z

I've fixed it up in a way that should disrupt the interface as little as possible using the double boxing. I've also noticed that the way the stack is passed into clone as a &mut [u8] is troublesome, as that could mean that the child thread outlives the memory for its own stack. Fixing that would break the interface.
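
For illustration only, one way a caller could guarantee that the stack outlives the child is to leak the allocation. This is an assumption about a possible workaround, not something this PR does, and `leak_stack` is a made-up name.

```rust
/// Allocate a zeroed buffer and leak it, so the returned slice stays valid
/// for the rest of the process. (Hypothetical helper: it trades a
/// deliberate leak for the guarantee that the memory is never freed while
/// a child may still be running on it.)
fn leak_stack(size: usize) -> &'static mut [u8] {
    Box::leak(vec![0u8; size].into_boxed_slice())
}
```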

asomers

Ok, that explanation for double-boxing makes sense. I think it'll make even more sense when expressing things using the new dyn Trait syntax introduced by the latest compiler.

Don't forget to add a CHANGELOG entry.

asomers · 2018-07-06T15:16:21Z

src/sched.rs

@@ -119,3 +119,34 @@ pub fn setns(fd: RawFd, nstype: CloneFlags) -> Result<()> {

    Errno::result(res).map(drop)
 }
+
+#[cfg(not)]


I wanted to see if the tests on travis would pass if the test was disabled. It seems that with this test in place, both this test and the sys::wait::tests::test_waitpid test would fail in various situations, but not in a way that I could replicate locally, even when building for targets that were failing in travis.

asomers · 2018-07-06T15:17:43Z

src/sched.rs

+    #[test]
+    fn simple_clone() {
+        // Stack *must* outlive the child.
+        let mut stack = Vec::new();


May as well combine these lines into let mut stack = vec![0u8; 4096]

jD91mZM2 · 2021-01-07T13:52:53Z

Anyone still working on this?

asomers reviewed Jul 4, 2018

xurtis added 11 commits July 6, 2018 21:53

Revert "Closure transfer fix for sched::clone"

45ccdfe

This reverts commit ba0a5e6.

Simply forget the boxed function.

f3fc220

Function should be safe to send.

28aa11d

Fix type annotation.

085b147

Remove masking of erroneous cast of function arguments.

0e5a6e6

Fix boxing of closure before sending.

9979947

Add tests for clone.

dd2e37f

Use dynamically allocated stack in test.

245d898

Simpler return from clone.

eaaf6e7

Fix test to work on 1.20

761c035

xurtis force-pushed the staging branch from 428fe9d to 761c035 Compare July 6, 2018 11:53

Mask clone test.

e78c659

This test appears to be incredibly flaky and causing errors in the `sys::tests::test_sigwait` test which can't be replicated locally.

asomers requested changes Jul 6, 2018

xurtis added 7 commits July 7, 2018 12:35

Add test output

f6f9a29

Revert "Mask clone test."

0e1a7da

This reverts commit e78c659.

Simplify stack allocation.

69569e2

Add changelog.

d1ae099

Add backtrace to test output.

d9588a2

Don't use child signal to wait for clone test.

7c46462

Remove backtrace flag.

d0499dd

Doesn't seem to actually work here.

bors bot force-pushed the staging branch from f1c651e to 2c42b30 Compare June 27, 2020 22:50

bors bot force-pushed the staging branch from d1ecf09 to 59e33bc Compare October 6, 2020 00:48

bors bot force-pushed the staging branch 4 times, most recently from 246bf3b to cde6e3e Compare November 16, 2020 05:13

bors bot force-pushed the staging branch 5 times, most recently from 44e21b6 to 769d664 Compare December 5, 2020 00:00

bors bot force-pushed the staging branch 3 times, most recently from eb4e312 to 199f83d Compare December 16, 2020 22:58

bors bot force-pushed the staging branch from 199f83d to b53a08e Compare December 18, 2020 15:33

bors bot force-pushed the staging branch from d8b8bbb to 5e491c8 Compare March 23, 2021 02:46

bors bot force-pushed the staging branch from 4ba9530 to d9d447d Compare July 8, 2021 22:11

yihuaf mentioned this pull request Jul 30, 2021

Fix how closure is transferred to the clone call. youki-dev/youki#173

Merged

bors bot force-pushed the staging branch 9 times, most recently from 8518bbb to 05657e2 Compare August 18, 2021 14:38

bors bot force-pushed the staging branch from 58e87ca to f9d508f Compare August 27, 2021 04:05

rtzoeller changed the base branch from staging to master December 15, 2021 17:00
