Concurrent metadata fetching #22

aochagavia · 2024-01-25T15:13:40Z

Closes #3

aochagavia · 2024-01-25T15:20:18Z

Some remarks and questions:

What do you think, on an API level, about the modifications to the DependencyProvider trait and the introduction of the OneShotSender struct?
I had to introduce RefCell everywhere, to be able to borrow &self instead of &mut self inside the future, so multiple futures could run in parallel. The result is pretty ugly and I'll be thinking of alternatives (e.g. more cleanly splitting off the piece of state that gets updated asynchronously). If you have any ideas please let me know.
The current implementation uses async, which IMO proved very useful. I picked smol as the runtime because it allows spawning futures that are non 'static (in contrast to tokio, which forces you to reference count anything you want to share, making the RefCell issue even more painful). I'm not sure smol is mature enough, but if 2 goes well we might get away using tokio after all. Additionally, I think we'll want to use a single-threaded async runtime, because there's no IO involved, we are using async purely as syntactic sugar.

tdejager · 2024-01-25T16:15:47Z

I'll have a good look at this tomorrow!

baszalmstra · 2024-01-25T19:05:28Z

I think this currently slightly misses the point.

The main goal that I would like to achieve is as follows: when solving incrementally, we know after each partial solve that there are 1 or more candidates that we are going to need more information from. Currently we fetch this information sequentially but especially when network requests are involved it makes sense to request this information concurrently.

Another approach could be to add a function to the dependency provider that provides the result for multiple solvables. Implementers can choose to make this concurrent or not.

Although this callback api is nice, the same could be achieved by passing in a closure that should be called. Internally the closure can do whatever.

I also wonder if an async runtime is needed, couldnt you achieve the same result with a normal blocking channel? Since the solver loop is still blocking anyway.

tdejager · 2024-01-25T19:13:30Z

Yeah I agree a blocking channel can also be used in the asynchronous context because 'send' is non-blocking.

Because you are using futures and smol here anyway I see no reason not to make the functions in the DependencyProvider asynchronous, you are bringing in a runtime anyways.

However, avoiding this is also great but then I think a channel would just work :)

aochagavia · 2024-01-26T08:06:34Z

I think this currently slightly misses the point.

The main goal that I would like to achieve is as follows: when solving incrementally, we know after each partial solve that there are 1 or more candidates that we are going to need more information from. Currently we fetch this information sequentially but especially when network requests are involved it makes sense to request this information concurrently.

For a bit of context, my proposed async / await approach is meant to keep the sequential structure of the code, while enabling concurrency. Await is used on the solver side whenever there is a chance of having to wait on the dependency provider. This, combined with spawn, gives us the ability to concurrently call the dependency provider's methods after each partial solve (I added a test to showcase this, where the dependency provider sleeps on a separate thread and notifies the solver when it's done).

At least from a functionality perspective, this seems to cover the use case you mention. Or am I missing something?

I also wonder if an async runtime is needed, couldnt you achieve the same result with a normal blocking channel? Since the solver loop is still blocking anyway.

A normal blocking channel was my first prototype, but my impression was that it would require significant changes to the solver code (maybe I'm wrong, though). While the solver loop is indeed blocking, in some cases it has to pause execution of Solver::add_clauses_for_solvable and resume it later (e.g. if the dependency provider does some processing in a different thread, using async or plain old blocking code). In the meantime, while it waits for that function to resume execution, it can concurrently call Solver::add_clauses_for_solvable for the next solvable in the list. As I mentioned above, using async / await allows us to keep the sequential structure of the code (i.e. add_clauses_for_solvable and friends), which seems like a huge benefit to me.

Regarding using an async runtime... It's kind of inevitable once you go down the async / await route. However, we could create our own barebones runtime with only the features we need (i.e. single-threaded and supporting async one-shot channels), instead of pulling a more heavyweight dependency. I had to create a custom runtime for my last project and found it surprisingly straightforward (once you understand that async / await is just syntactic sugar), so we should be able to pull it off if we want to go down that path.

Because you are using futures and smol here anyway I see no reason not to make the functions in the DependencyProvider asynchronous, you are bringing in a runtime anyways.

I'm not totally sure about that. I'd say the solver shouldn't care about which runtime is used by the dependency provider. If we allow the dependency provider to return futures, that means they must be compatible with the solver's runtime. I consider the fact that the solver uses async / await under the hood to be an implementation detail and would rather avoid leaking that information into its API.

Additionally, sharing a single async runtime between the solver and the dependency provider would force the solver to use a full-blown async runtime such as tokio to work well, which sounds like overkill in one of our main use cases (Conda packages) where there is no IO / waiting during solving.

tdejager · 2024-01-26T09:24:27Z

But if the solver defines the trait it defines what Futures it allows. So this means that it can choose its runtime in that sense.

If you bring in a Runtime anyway I don't see a reason not to make the DependencyProvider async.

Granted it would be best if the get_dependencies calls and get_candidates would not block the runtime it can potentially be running in but this can only be alleviated by essentially making the solve itself async, or running the solver on a seperate thread.

baszalmstra · 2024-01-26T11:08:06Z

We had a long discussion offline. I didnt properly read the code. I think this does fix the goals.

We decided to keep the (current) callback approach instead of making the DependencyProvider async.

@aochagavia will clean up the code and bring this in a mergable state!

aochagavia · 2024-01-30T14:49:29Z

src/solver/mod.rs

+            // This prevents concurrent requests to add clauses for a solvable from racing. Only the
+            // first request for that solvable will go through, and others will wait till it
+            // completes.
+            let mut clauses_added = mutex.lock().await;


Since this await (and others below) is inside a loop, we could make concurrent metadata fetching even more aggressive by writing this loop in such a way that it processes its items concurrently... I didn't do it, just to keep things simple, but let me know if that's desirable after all!

I do think that would be quite nice! But maybe we can do that in another PR?

Seems good to me!

aochagavia · 2024-01-30T14:51:00Z

src/solver/mod.rs

+                    &mut output.clauses_to_watch,
+                    dependency_name,
+                )
+                .await?;


Again, it might pay off to handle the iterations of this loop concurrently.

aochagavia · 2024-01-30T14:51:34Z

src/solver/mod.rs

+                    &mut output.clauses_to_watch,
+                    dependency_name,
+                )
+                .await?;


Again, it might pay off to handle the iterations of this loop concurrently.

aochagavia · 2024-01-30T14:53:49Z

tests/solver.rs

+        .build()
+        .unwrap();
+
+    provider.runtime = Some(executor);


This code means that all tests using solve_snapshot are now also using the async provider behind the scenes. If that's undesirable, let me know!

aochagavia · 2024-01-30T15:00:04Z

I have pushed a new wip commit. There are still a few open points, some in the form of comments, some written below:

I feel tempted to ditch smol in favor of tokio, just to be sure we are relying on solid foundations. We ended up using locks and channels, so having a production-ready async runtime seems beneficial. The downside is that we'd need to use more Rc, but it shouldn't be that much of a pain. Opinions?
We might be able to get rid of RefCell in the arena for clauses by postponing their allocation, but I'm afraid that might complicate the code more than it simplifies it, so my current stance is to keep the RefCell.

baszalmstra · 2024-01-30T15:12:13Z

I feel tempted to ditch smol in favor of tokio, just to be sure we are relying on solid foundations. We ended up using locks and channels, so having a production-ready async runtime seems beneficial. The downside is that we'd need to use more Rc, but it shouldn't be that much of a pain. Opinions?

Mm, adding tokio would add a TON of dependencies. I always thought that smol is also production ready?

We might be able to get rid of RefCell in the arena for clauses by postponing their allocation, but I'm afraid that might complicate the code more than it simplifies it, so my current stance is to keep the RefCell.

Lets do it in another PR then. :)

aochagavia · 2024-01-30T15:24:27Z

Mm, adding tokio would add a TON of dependencies. I always thought that smol is also production ready?

Tokio's dependencies can be kept reasonably low by only enabling the features you need, so I don't think there'd be much of a difference between it and smol. Regarding smol being production-ready, I have the feeling that tokio has much stronger backing and is more battle-tested, but I'm not opposed to giving smol a try. A nice benefit is that we get to keep the current code without Rcs, which is nice!

I'll do a last polishing round directly using smol's underlying crates, to make sure we don't pull in more dependencies than strictly needed.

baszalmstra · 2024-01-30T15:34:16Z

I would also be happy to use tokio if you feel that doesnt add to much weight.

tdejager · 2024-01-31T08:53:08Z

If you end up adding Tokio you
can use the oneshot channel from there instead.

aochagavia · 2024-01-31T08:54:22Z

I ended up using smol's async-executor after all, with all other concurrency primitives taken from the futures crate (including a one shot channel). Tokio was more painful to introduce than I expected 😅

baszalmstra · 2024-02-01T15:47:18Z

I tried updating rip with the changes from this branch and ran into a couple of issues.

To integrate the changes I implemented a two-stage approach. We have an object PypiResolvoBridge that implements the DependencyProvider trait from this crate. This type holds an Arc<PypiDependencyProvider>. In the modified functions I do something like this:

fn get_candidates(&self, name: NameId, sender: OneShotSender<Option<Candidates>>) {
    let provider = self.inner.clone();
    tokio::spawn(async move {
        let candidates = provider.get_candidates(name).await;
        let _ = sender.send(candidates);
    });
}

Here the inner PypiDependencyProvider has an async function get_candidates.

I think this is what you intended with your design?

The provider is an Arc because when I tokio::spawn something the future has to be 'static. I had to change a couple of fields in the inner dependency provider to make the object Sync. I simply used a couple of RwLocks.

However, I ran into trouble when I came to the Pool stored in the provider.

The return values of the Candidates contain several Ids which are allocated from the Pool. I could probably add another RwLock around Pool except that the DependencyProvider trait itself has this function:

fn pool(&self) -> &Pool<VS, N>;

This limits me from turning the internal pool into an RwLock.

The only way around this that I can think of is to make the Pool sync, however, that might come at the cost which is a shame for the non-concurrent usecase.

Another approach would be to make the DependencyProvider trait async itself which allows me to have more granular control over when I use tokio::spawn because I can opt to only "spawn" parts of my function (e.g. the heavy lifting work) and leave all interactions with the pool on the same thread as the task itself.

WDYT @aochagavia ?

aochagavia · 2024-02-02T10:54:18Z

We discussed this on Discord and came up with the following solution:

Put Pool behind Rc so it can be easily shared across async tasks.
Use single-threaded tokio in the solver instead of smol.
Let users configure the async runtime before passing it to the solver.
Document how to use different async runtimes, if desired.
Make the the async functions in DependencyProvider actually async, instead of relying on channels.

I just pushed the changes and will be checking if this update is more rip-friendly!

tdejager · 2024-02-05T16:00:15Z

So we just found out that for rattler (you can check-out this branch mamba-org/rattler#504) we make use of functions in the SolverCache that are async but the sort_candidates function isn't. This presents a problem when trying to use that function.

aochagavia · 2024-02-05T18:22:02Z

Is that something you could solve by blocking on the futures, because you know they'll always return Ready(...)? Otherwise it might make sense to make sort_candidates async.

baszalmstra · 2024-02-05T19:39:44Z

If thats the case (always ready) we can use .now_or_never()

tdejager · 2024-02-05T20:48:00Z

Right, that would work until someone changes the API but we could make that clear in the expect I suppose :)

baszalmstra · 2024-02-05T21:07:38Z

Im also not entirely sure that the assumption holds. This function may be called while the not all candidates for other packages have been loaded. But lets try to find out.

baszalmstra · 2024-02-05T21:16:17Z

Looking at the code it should be trivial to make sort_candidates async as well. I think this is the proper solution because that allows you to freely call the methods on the SolverCache and await them without issues.

tdejager · 2024-02-06T06:36:02Z

Okay let me give it a go later!

tdejager · 2024-02-08T09:53:16Z

Closing this in favor of #29 (from benchmarks that one is slightly faster)

aochagavia force-pushed the concurrent-metadata-fetching branch 3 times, most recently from 125cfdc to fe79c92 Compare January 30, 2024 14:46

aochagavia commented Jan 30, 2024

View reviewed changes

aochagavia force-pushed the concurrent-metadata-fetching branch from fe79c92 to ce12e32 Compare January 31, 2024 08:32

aochagavia marked this pull request as ready for review January 31, 2024 09:13

chore: update rust-toolchain to latest stable release

b05a137

aochagavia force-pushed the concurrent-metadata-fetching branch from ce12e32 to fe6a7fa Compare February 2, 2024 10:52

aochagavia force-pushed the concurrent-metadata-fetching branch from fe6a7fa to 6e26432 Compare February 2, 2024 11:04

feat: concurrent metadata fetching

0938551

aochagavia force-pushed the concurrent-metadata-fetching branch from 6e26432 to 0938551 Compare February 2, 2024 11:11

more concurrency

a6d2138

aochagavia force-pushed the concurrent-metadata-fetching branch from 96c8b60 to a6d2138 Compare February 2, 2024 16:06

tdejager added 2 commits February 6, 2024 10:06

feat: sort_candidates is now async

d76a865

fix: fmt and clippy

ddfba79

baszalmstra mentioned this pull request Feb 7, 2024

Improved Concurrent metadata fetching #29

Merged

tdejager closed this Feb 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrent metadata fetching #22

Concurrent metadata fetching #22

aochagavia commented Jan 25, 2024 •

edited

aochagavia commented Jan 25, 2024 •

edited

tdejager commented Jan 25, 2024

baszalmstra commented Jan 25, 2024

tdejager commented Jan 25, 2024

aochagavia commented Jan 26, 2024

tdejager commented Jan 26, 2024

baszalmstra commented Jan 26, 2024

aochagavia Jan 30, 2024 •

edited

baszalmstra Jan 30, 2024

aochagavia Jan 30, 2024

aochagavia Jan 30, 2024

aochagavia Jan 30, 2024

aochagavia Jan 30, 2024

aochagavia commented Jan 30, 2024

baszalmstra commented Jan 30, 2024

aochagavia commented Jan 30, 2024

baszalmstra commented Jan 30, 2024

tdejager commented Jan 31, 2024

aochagavia commented Jan 31, 2024 •

edited

baszalmstra commented Feb 1, 2024

aochagavia commented Feb 2, 2024 •

edited

tdejager commented Feb 5, 2024

aochagavia commented Feb 5, 2024

baszalmstra commented Feb 5, 2024

tdejager commented Feb 5, 2024

baszalmstra commented Feb 5, 2024

baszalmstra commented Feb 5, 2024

tdejager commented Feb 6, 2024

tdejager commented Feb 8, 2024

Concurrent metadata fetching #22

Concurrent metadata fetching #22

Conversation

aochagavia commented Jan 25, 2024 • edited

aochagavia commented Jan 25, 2024 • edited

tdejager commented Jan 25, 2024

baszalmstra commented Jan 25, 2024

tdejager commented Jan 25, 2024

aochagavia commented Jan 26, 2024

tdejager commented Jan 26, 2024

baszalmstra commented Jan 26, 2024

aochagavia Jan 30, 2024 • edited

Choose a reason for hiding this comment

baszalmstra Jan 30, 2024

Choose a reason for hiding this comment

aochagavia Jan 30, 2024

Choose a reason for hiding this comment

aochagavia Jan 30, 2024

Choose a reason for hiding this comment

aochagavia Jan 30, 2024

Choose a reason for hiding this comment

aochagavia Jan 30, 2024

Choose a reason for hiding this comment

aochagavia commented Jan 30, 2024

baszalmstra commented Jan 30, 2024

aochagavia commented Jan 30, 2024

baszalmstra commented Jan 30, 2024

tdejager commented Jan 31, 2024

aochagavia commented Jan 31, 2024 • edited

baszalmstra commented Feb 1, 2024

aochagavia commented Feb 2, 2024 • edited

tdejager commented Feb 5, 2024

aochagavia commented Feb 5, 2024

baszalmstra commented Feb 5, 2024

tdejager commented Feb 5, 2024

baszalmstra commented Feb 5, 2024

baszalmstra commented Feb 5, 2024

tdejager commented Feb 6, 2024

tdejager commented Feb 8, 2024

aochagavia commented Jan 25, 2024 •

edited

aochagavia commented Jan 25, 2024 •

edited

aochagavia Jan 30, 2024 •

edited

aochagavia commented Jan 31, 2024 •

edited

aochagavia commented Feb 2, 2024 •

edited