feat: Cancellable `Future` #1684

Hywan · 2023-08-02T09:50:50Z

This patch should ideally be reviewed commit-by-commit for an easier reviewer experience. All commits contain a descriptive message, so that I don't have to repeat everything here.

It fixes #1669 and closes #1667.

The most important patches are probably 9cf9f11 and e488828.

In a nutshell: this patch ensures that it's possible to cancel or drop a RustFuture along with its inner Future properly. A new FFI function is generated per async function, <name>_uniffi_drop. I'm fully aware that this patch “breaks” the black-box model of the RustFuture from the foreign language point of view. It's however absolutely necessary, otherwise it's impossible to trigger a cancellation or a proper drop of a RustFuture.

I also have hope that this patch will fix #1677 (comment), as now there is a double reference to the callbackHolder, which must ensure that the garbage collector won't collect it prematurely.

Finally, this patch focuses on Kotlin, and I'm starting to take a look at the Swift part now. I may update this PR to update Swift if that's necessary. Python will follow.

Follow up feature: We may simplify Pin<Arc<RustFuture>> to simply Arc<RustFuture> once this PR has been merged, because now RustFuture::future contains the Pin. Edit: that's actually done in db62d75.

This patch updates Kotlin codegen to use `suspendCancellableCoroutine` and `CancellableContinuation` to be able to intercept cancellation of a `Job`. This patch alone makes the code to fail as expected, even if the cancellation isn't handled yet.

Should not be necessary once https://youtrack.jetbrains.com/issue/KT-47902 is fixed.

Hywan · 2023-08-02T09:56:36Z

cc @mhammond since Ben is in holidays.

jmartinesp · 2023-08-02T09:58:26Z

I tried running the tests mentioned in #1677 (repeating those tests 100, 1000 times) which were getting stuck before at the 2nd-3rd iteration and with these changes I got until >50 iterations of repeating the same tests before I canceled them, so it seems to solve the issue!

`RustFuture` has a `future` field. It was: `UnsafeCell<F>`, now it becomes `RwLock<Option<Pin<Box<dyn Future<Output = T>>>>>`. First, `RwLock` is probably safer to use than `UnsafeCell` here. Second, `Option` is required for the next commits (being able to cancel a `Future`). Third, `Pin` is required to keep the multi-thread safety, and to simplify the next commits. Fourth, `Box<dyn Future<Output = T>>` erases the need for the `F` generic parameter The primary goal of this patch is to remove the `F` generic parameter.

This patch adds a new `FfiType` variant: `Future`, which represents a pointer to a `RustFuture`.

…ncelled. An async function in the foreign language side can be cancelled. For example, in Kotlin: ``` var job = launch { someAsyncFunction() } job.cancel() ``` The problem is: The `Future` running inside `someAsyncFunction` was not dropped/cancelled. It can easily create dead-lock if some locks are used inside `someAsyncFunction` because they are never released. This patch focuses on Kotlin only for the codegen part. However, let's start by explaining the modifications on the Rust side. The procedural macros generate a function with a C ABI, that's not new. What is new is that now this function returns a pointer to the `RustFuture`. That's why `RustFuture::into_raw` is now public. Previously, it was just calling `future.wake()` and nothing was returned from this C function. But to be able to cancel/drop the `RustFuture`, we need a pointer to it, otherwise the implementation is a complete black- box from the foreign language point of view: the `RustFuture` is consumed by `RustFuture::wake` (then leaked, then fetched again by the waker etc.) and that's it. OK, so now, it returns a pointer to the `RustFuture`. The procedural macros generates a _new_ function with a C ABI to drop a `RustFuture`. What it does is: ```rust ::uniffi::rust_call(uniffi_call_status, || { let uniffi_rust_future = ::uniffi::RustFuture::<#return_ty, crate::UniFfiTag>::from_raw( uniffi_future_ptr, ); Ok(uniffi_rust_future.cancel_or_drop_inner_future()) }); ``` First off, `RustFuture::from_raw` is now also public. Second, now it becomes obvious why the `F` generic parameter of `RustFuture` was creating a problem! How can we build a `RustFuture` if we don't know the type of the `Future` itself? This generic parameter has been removed in a previous patch (686f827). What does `RustFuture::cancel_or_drop_inner_future`? It increments the `wake_counter` so that it ensures the `RustFuture` _cannot_ be awaken anymore. It also takes the value inside `RustFuture.future`, leaving a `None`, which also means that the `RustFuture` cannot be awaken too. Double guards. By taking the `Future`, we can drop it, thus releasing all the code inside the `Future`. Our problem is now solved from the Rust point of view! Now, let's move on the foreign language land. This patch focuses on Kotlin, another patch may come for Swift and Python soon. A previous patch (0df55b2) has switched `Continuation` for `CancellableContinuation`. We can now install a `CompletionHandler` that will be called by `CancellableContinuation.invokeOnCancellation`, but also when the `CancellableContinuation.resume` or `.resumeWithException` have been called. In all cases, we are sure the `CompletionHandler` is called. What this `CompletionHandler` does is the following: * Call the `…_uniffi_drop` FFI function, * “Free” `callbackHolder` by freeing its previous value (the `callback`), hopefully to indicate the GC that it can be collected. It may solve the problem mentioned in mozilla#1677 (comment) as a bonus. The `completionHandler` is also passed to the function callback handler, so that it can be called after `continuation.resume` or `continuation.resumeWithException`. Note that `completionHandler` receives `null` if it's a completed or resumed termination, or a `CancellationException` if the continuation has been cancelled. That's why it's valid to see `continuation.resumeWithException(exception); completionHandler(null)`; we must not mix those exceptions. The rest of the patch is extra code to `FfiFunction` on the Rust side, to add “companion” functions, it's a simple way to add extra functions related to one `FfiFunction`.

This patch creates a new `use_shared_resource` async function. It uses a `Mutex` to lock a fake resource, the lock is released after `release_after` milliseconds. This patch adds 2 new tests: 1. The first test does the following. It calls `useSharedResource(100U)` within a `Job` but the `Job` is cancelled after 50ms, which doesn't let the time for `useSharedResource` to free the lock. Hopefully, with our previous patches, the `RustFuture` is cancelled properly, and the lock is released. That's new! After the `delay`, `useSharedResource` is called again. Before the previous patches, because `job.cancel()` was doing nothing, we had to wait on the first `useSharedResource` to finish, to acquire a new lock and continue. 2. The second test does almost like the first, but it doesn't call `job.cancel()`. It basically tests that the second call to `useSharedResource` waits on the first one to finish.

Since `RustFuture::future` holds a `Pin` to the `Future`, the entire `RustFuture` no longer needs to be wrapped inside a `Pin`. It simplifies the code a little bit.

Now, `RustFuture` constraints the `Future` with the `'static` lifetime bound. This patch ensures that `uniffi_self` lives long enough, because `rustc` was complaining about it. Nonetheless, a new problem arised: `RustBuffer` cannot be sent inside a `RustFuture` because it doesn't implement `Send`. And so far, it seems safe to implement `Send` for `RustBuffer`, which is what this patch does. And now, everything seems to work.

uniffi_bindgen/src/bindings/kotlin/templates/AsyncTypes.kt

uniffi_core/src/ffi/rustfuture.rs

Upstream PR: mozilla/uniffi-rs#1684

bendk · 2023-08-05T14:26:17Z

I also have hope that this patch will fix #1677 (comment), as now there is a double reference to the callbackHolder, which must ensure that the garbage collector won't collect it prematurely.

Can you explain this one more? I don't see where the extra reference gets stored.

bendk

I like a lot about this PR: the doc updates, the fix for #1677. But I wonder if we should take a different approach to fixing the cancellation issue.

It seems to me that there's currently a fundamental problem with the ForeignExecutor scheduling: some scheduled calls never happen. The main cause seems to be that once the executor (AKA the CoroutineScope in Kotlin) is canceled then future scheduled calls will be ignored. Another cause that I'm not 100% sure about is that a call is successfully scheduled, but the executor is cancelled before that call fires. Because of this, we're leaking Arc pointers because we schedule a wake callback, but the callback never happens.

If that analysis is correct, then I'd rather update the ForeignExecutor API to handle this. I think there's 2 changes that need to be made:

ForeignExecutorCallback should return a value indicating if the callback was scheduled successfully. We can then check this value in schedule_do_wake() and if we see a failure then unleak the Arc pointer.
The callbacks themselves should get an additional arg specifying if the executor was cancelled. wake_callback() should check that and if there was a cancellation, it should still call Arc::from_raw() to unleak the pointer, but not call do_wake().

I think this will require a lot less changes on the Rust side. I also think this should be do-able from the Kotlin side. I think the first bullet is just a matter of checking isCancelled. The second bullet might be a bit more tricky, but it seems to me that there has to be a way to do that.

The main reason I'm suggesting this is that I want to start using the ForeignExecutor for more features (#1675). I think we're going to run into the same issue there if we don't fix things at the ForeignExecutor level here. Also, in general, it feels like this PR is working around a broken primitive operation rather than fixing the primitive itself.

bendk · 2023-08-05T15:29:55Z

I'm not totally sure that analysis is correct, feel free to push back if I missed something.

If you do think it's correct, I'd be happy to work on the Rust ForeignExecutor side of things if that would help since I'm already working with that code for #1675.

It exists a very rare case where it's possible to cancel a `Job`, and thus a `CoroutineScope` before the FFI binding building and running the `RustFuture` can be called, and thus, no `rustFuturePtr` can be computed. In this rare case, the `completionHandler` will run `rustFuturePtr!!` and boom. This patch catches this case by checking that the `rustFuturePtr` isn't `null` before calling the `…_drop` function to drop the `RustFuture`.

uniffi_bindgen/src/bindings/kotlin/templates/ObjectTemplate.kt

Hywan · 2023-08-07T10:22:21Z

Can you explain this one more? I don't see where the extra reference gets stored.

I'm not sure. I was surprised to see that this patch fixes the problem, but it wasn't my first intent. I've zero experience with Kotlin :-p.

Hywan · 2023-08-07T10:32:33Z

@bendk

I like a lot about this PR: the doc updates, the fix for #1677. But I wonder if we should take a different approach to fixing the cancellation issue.

It seems to me that there's currently a fundamental problem with the ForeignExecutor scheduling: some scheduled calls never happen. The main cause seems to be that once the executor (AKA the CoroutineScope in Kotlin) is canceled then future scheduled calls will be ignored. Another cause that I'm not 100% sure about is that a call is successfully scheduled, but the executor is cancelled before that call fires. Because of this, we're leaking Arc pointers because we schedule a wake callback, but the callback never happens.

If that analysis is correct, then I'd rather update the ForeignExecutor API to handle this. I think there's 2 changes that need to be made:

ForeignExecutorCallback should return a value indicating if the callback was scheduled successfully. We can then check this value in schedule_do_wake() and if we see a failure then unleak the Arc pointer.

The callbacks themselves should get an additional arg specifying if the executor was cancelled. wake_callback() should check that and if there was a cancellation, it should still call Arc::from_raw() to unleak the pointer, but not call do_wake().

I think this will require a lot less changes on the Rust side. I also think this should be do-able from the Kotlin side. I think the first bullet is just a matter of checking isCancelled. The second bullet might be a bit more tricky, but it seems to me that there has to be a way to do that.

The main reason I'm suggesting this is that I want to start using the ForeignExecutor for more features (#1675). I think we're going to run into the same issue there if we don't fix things at the ForeignExecutor level here. Also, in general, it feels like this PR is working around a broken primitive operation rather than fixing the primitive itself.

To clarify, I did open the PR because we are hitting this problem hard, and it's blocking us. I know it's not the best fix possible, and I'm OK if you close this and start something from scratch :-). I'm aware this is a dressing on a deeper problem.

Hywan · 2023-08-07T10:39:30Z

@bendk Your suggestion would work well with https://developer.apple.com/documentation/swift/withtaskcancellationhandler(operation:oncancel:) for Swift too. I've a branch to work with this, and it fits pretty well with the current design of the PR. It would fit pretty well with your suggestion too.

Hywan · 2023-08-07T10:46:08Z

@bendk

The callbacks themselves should get an additional arg specifying if the executor was cancelled. wake_callback() should check that and if there was a cancellation, it should still call Arc::from_raw() to unleak the pointer, but not call do_wake().

The biggest problem with Kotlin is that you don't know when a Job has been cancelled, except if your Continuation has configured a invokeOnCancellation completion handler (CompletionHandler). There is no isCancelled property, like you have in Swift.

jmartinesp · 2023-08-07T10:51:18Z

@bendk

The callbacks themselves should get an additional arg specifying if the executor was cancelled. wake_callback() should check that and if there was a cancellation, it should still call Arc::from_raw() to unleak the pointer, but not call do_wake().

The biggest problem with Kotlin is that you don't know when a Job has been cancelled, except if your Continuation has configured a invokeOnCancellation completion handler (CompletionHandler). There is no isCancelled property, like you have in Swift.

Actually, there is: on one hand, if you already have a reference to the Job you can just check Job.isCancelled. If you're inside a coroutine (or a suspend function) you can just check the underlying coroutineContext.isActive property, it will be false if it's either completed, cancelling or cancelled it I'm not mistaken.

Also, if you change the call from suspendCoroutine to suspendCancellableCoroutine, its Continuation has those same isActive/isCancelled properties.

OtaK · 2023-08-07T12:57:26Z

Just a heads-up, this PR (in its current state) seems to introduce a regression for the following situation:

// error: future cannot be sent between threads safely
// the trait `Send` is not implemented for `*const c_void`
// note: captured value is not `Send`
// required by a bound in `RustFuture::<T, UT>::new`

#[derive(uniffi::Object)]
pub struct MyFirstObj;

#[uniffi::export]
impl MyFirstObj {
    pub async fn do_thing(&self, second_obj: Arc<MySecondObj>) {}
    //                           ^^^^^^^^^^ has type `*const c_void` which is not `Send`
}

#[derive(uniffi::Object)]
pub struct MySecondObj;

Such code used to compile without issues on 0.24.3.

Hywan · 2023-08-07T14:15:10Z

Yeah, because the Future has a Send + 'static bound. It was already the case before, but things are moved inside async move { … } and I think the compiler is complaining about some stuff now.

jplatte · 2023-08-07T14:20:09Z

@OtaK as per the discussion above, this PR is unlikely to be merged anyways. But it's interesting that we don't have a test case like that, it should probably be added to the next PR that aims to fix the cancellation problem.

…bled.

When we execute a future on Kotlin, we create a `jni.Callback` instance that Rust calls when the Future is ready. We need to hold a reference to that object or else the JVM will finalize it and when Rust tries to invoke the callback bad things will happen. In the unit tests this causes them to stall because the Kotlin coroutine continutation is never completed. The `callbackHolder` variable was intended to hold a reference, but it doesn't seem to be working (see [arg0d's analysis](mozilla#1677 (comment))). I belive the issue is that variable isn't actually used anywhere so Kotlin frees it before the coroutine has completed and the reason mozilla#1684 fixes the issue is that it uses the variable in the completion handler (https://github.com/mozilla/uniffi-rs/pull/1684/files#diff-f564b605a6ba32d19bcb47128dc9c5c2f01169e2a41fa60ffaaf0b6eea13845cR34) This commit changes things so instead of using a local variable, there's a global set that stores all active callbacks. It seems to fix the issue for me when using arg0d's `repeat(1000) { ... }` test.

bendk · 2023-08-07T18:39:30Z

@bendk

I like a lot about this PR: the doc updates, the fix for #1677. But I wonder if we should take a different approach to fixing the cancellation issue.
It seems to me that there's currently a fundamental problem with the ForeignExecutor scheduling: some scheduled calls never happen. The main cause seems to be that once the executor (AKA the CoroutineScope in Kotlin) is canceled then future scheduled calls will be ignored. Another cause that I'm not 100% sure about is that a call is successfully scheduled, but the executor is cancelled before that call fires. Because of this, we're leaking Arc pointers because we schedule a wake callback, but the callback never happens.
If that analysis is correct, then I'd rather update the ForeignExecutor API to handle this. I think there's 2 changes that need to be made:

ForeignExecutorCallback should return a value indicating if the callback was scheduled successfully. We can then check this value in schedule_do_wake() and if we see a failure then unleak the Arc pointer.

The callbacks themselves should get an additional arg specifying if the executor was cancelled. wake_callback() should check that and if there was a cancellation, it should still call Arc::from_raw() to unleak the pointer, but not call do_wake().

I think this will require a lot less changes on the Rust side. I also think this should be do-able from the Kotlin side. I think the first bullet is just a matter of checking isCancelled. The second bullet might be a bit more tricky, but it seems to me that there has to be a way to do that.
The main reason I'm suggesting this is that I want to start using the ForeignExecutor for more features (#1675). I think we're going to run into the same issue there if we don't fix things at the ForeignExecutor level here. Also, in general, it feels like this PR is working around a broken primitive operation rather than fixing the primitive itself.

To clarify, I did open the PR because we are hitting this problem hard, and it's blocking us. I know it's not the best fix possible, and I'm OK if you close this and start something from scratch :-). I'm aware this is a dressing on a deeper problem.

That makes sense. I've started some work on this and hoepfully will have a PR soon.

bendk · 2023-08-07T19:10:26Z

@Hywan are you able to reproduce this issue on swift? It seems to work for me when I try to write a similar unit test.

bendk · 2023-08-07T19:12:10Z

@Hywan are you able to reproduce this issue on swift? It seems to work for me when I try to write a similar unit test.

Nevermind about this, I just needed to adjust my timeouts.

Copied from mozilla#1684 (thanks Hywan!)

bendk · 2023-08-09T19:38:55Z

To clarify, I did open the PR because we are hitting this problem hard, and it's blocking us. I know it's not the best fix possible, and I'm OK if you close this and start something from scratch :-). I'm aware this is a dressing on a deeper problem.

I just made #1697 to address the deeper problem. How about closing this one?

This patch switches UniFFI from mozilla/uniffi-rs#1684 to mozilla/uniffi-rs#1697.

Hywan added 7 commits July 27, 2023 14:30

fix(fixtures): Update the namespace name in the UDL file.

e0bfe8f

doc: Fix and update doc of RustFuture.

adb813f

doc: Fix and update more doc of RustFuture.

c71e7a4

chore: Drop the RustFuture waker as soon as possible.

8e97d69

chore: Simplify fixtures/future a little bit.

6c254fc

fix: Add @Deprecated on Kotlin to avoid a warning.

1afc1f0

Should not be necessary once https://youtrack.jetbrains.com/issue/KT-47902 is fixed.

Hywan requested a review from a team as a code owner August 2, 2023 09:50

Hywan requested review from bendk and removed request for a team August 2, 2023 09:50

Hywan changed the title ~~feat: Cancellation Future~~ feat: Cancellable Future Aug 2, 2023

Hywan force-pushed the feat-cancellable-async-func branch 3 times, most recently from 65bf9bf to 2fbf296 Compare August 2, 2023 10:13

Hywan mentioned this pull request Aug 2, 2023

[DO NOT MERGE] Try and work out what's causing kotlin test timeouts #1677

Closed

Hywan force-pushed the feat-cancellable-async-func branch from 2fbf296 to 9ea66ca Compare August 2, 2023 10:21

Hywan added 5 commits August 2, 2023 14:08

feat: Add FfiType::Future.

c0891aa

This patch adds a new `FfiType` variant: `Future`, which represents a pointer to a `RustFuture`.

feat: RustFuture returns Arc<Self> instead of Pin<Arc<Self>>.

db62d75

Since `RustFuture::future` holds a `Pin` to the `Future`, the entire `RustFuture` no longer needs to be wrapped inside a `Pin`. It simplifies the code a little bit.

Hywan force-pushed the feat-cancellable-async-func branch from 9ea66ca to db62d75 Compare August 2, 2023 12:08

jplatte reviewed Aug 2, 2023

View reviewed changes

uniffi_bindgen/src/bindings/kotlin/templates/AsyncTypes.kt Outdated Show resolved Hide resolved

uniffi_core/src/ffi/rustfuture.rs Outdated Show resolved Hide resolved

chore: Clean up.

8a25f34

Hywan requested a review from jplatte August 2, 2023 13:46

jplatte added a commit to matrix-org/matrix-rust-sdk that referenced this pull request Aug 2, 2023

ffi: Use UniFFI fork with future cancellation support

31c14c0

Upstream PR: mozilla/uniffi-rs#1684

jplatte mentioned this pull request Aug 2, 2023

Use UniFFI fork with future cancellation support matrix-org/matrix-rust-sdk#2374

Merged

bendk reviewed Aug 5, 2023

View reviewed changes

jmartinesp reviewed Aug 7, 2023

View reviewed changes

uniffi_bindgen/src/bindings/kotlin/templates/ObjectTemplate.kt Outdated Show resolved Hide resolved

chore: Remove one useless comment.

622204d

chore: Use x?.let to remove !! in Kotlin.

931240f

feat: Drop Future with async_compat if the tokio feature is ena…

9e2ffac

…bled.

Hywan force-pushed the feat-cancellable-async-func branch from d0bc131 to 9e2ffac Compare August 7, 2023 15:47

bendk mentioned this pull request Aug 7, 2023

Fix Kotlin callback handler reference handling #1687

Merged

bendk added a commit to bendk/uniffi-rs that referenced this pull request Aug 8, 2023

Fix for Kotlin toChar deprecation (mozilla#1685)

153a09b

Copied from mozilla#1684 (thanks Hywan!)

bendk mentioned this pull request Aug 8, 2023

Fix for Kotlin toChar deprecation (#1685) #1692

Merged

Hywan closed this Aug 10, 2023

Hywan mentioned this pull request Aug 10, 2023

Avoid memory leaks with async cancellation #1697

Merged

Hywan added a commit to matrix-org/matrix-rust-sdk that referenced this pull request Aug 10, 2023

chore(uniffi): Update uniffi to another branch.

0372467

This patch switches UniFFI from mozilla/uniffi-rs#1684 to mozilla/uniffi-rs#1697.

Hywan mentioned this pull request Aug 10, 2023

chore(uniffi): Update uniffi to another branch matrix-org/matrix-rust-sdk#2404

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Cancellable `Future` #1684

feat: Cancellable `Future` #1684

Hywan commented Aug 2, 2023 •

edited

Hywan commented Aug 2, 2023

jmartinesp commented Aug 2, 2023 •

edited

bendk commented Aug 5, 2023

bendk left a comment

bendk commented Aug 5, 2023

Hywan commented Aug 7, 2023 •

edited

Hywan commented Aug 7, 2023 •

edited

Hywan commented Aug 7, 2023

Hywan commented Aug 7, 2023

jmartinesp commented Aug 7, 2023 •

edited

OtaK commented Aug 7, 2023

Hywan commented Aug 7, 2023

jplatte commented Aug 7, 2023

bendk commented Aug 7, 2023

bendk commented Aug 7, 2023

bendk commented Aug 7, 2023

bendk commented Aug 9, 2023

feat: Cancellable Future #1684

feat: Cancellable Future #1684

Conversation

Hywan commented Aug 2, 2023 • edited

Hywan commented Aug 2, 2023

jmartinesp commented Aug 2, 2023 • edited

bendk commented Aug 5, 2023

bendk left a comment

Choose a reason for hiding this comment

bendk commented Aug 5, 2023

Hywan commented Aug 7, 2023 • edited

Hywan commented Aug 7, 2023 • edited

Hywan commented Aug 7, 2023

Hywan commented Aug 7, 2023

jmartinesp commented Aug 7, 2023 • edited

OtaK commented Aug 7, 2023

Hywan commented Aug 7, 2023

jplatte commented Aug 7, 2023

bendk commented Aug 7, 2023

bendk commented Aug 7, 2023

bendk commented Aug 7, 2023

bendk commented Aug 9, 2023

feat: Cancellable `Future` #1684

feat: Cancellable `Future` #1684

Hywan commented Aug 2, 2023 •

edited

jmartinesp commented Aug 2, 2023 •

edited

Hywan commented Aug 7, 2023 •

edited

Hywan commented Aug 7, 2023 •

edited

jmartinesp commented Aug 7, 2023 •

edited