
Solving slows down dramatically when testing hundreds of versions #135

Closed
charliermarsh opened this issue Oct 17, 2023 · 24 comments
Labels: bug, enhancement

@charliermarsh (Contributor)

When testing many consecutive versions of a given package, the cost of computing compatible versions seems to increase non-linearly over time.

Concretely, running this example slows down noticeably for me at around 100 iterations, and grinds to a ~halt in the 200s:

use pubgrub::error::PubGrubError;
use pubgrub::range::Range;
use pubgrub::report::{DefaultStringReporter, Reporter};
use pubgrub::solver::{OfflineDependencyProvider, resolve};
use pubgrub::version::SemanticVersion;

fn main() {
    let mut dependency_provider = OfflineDependencyProvider::<&str, SemanticVersion>::new();

    // root depends on foo...
    dependency_provider.add_dependencies(
        "root", (1, 0, 0),
        vec![
            ("foo", Range::any()),
        ],
    );

    for i in 1..1000 {
        dbg!(i);

        // foo depends on bar...
        dependency_provider.add_dependencies(
            "foo", (i, 0, 0),
            vec![
                ("bar", Range::any()),
            ],
        );

        match resolve(&dependency_provider, "root", (1, 0, 0)) {
            Ok(sol) => println!("{:?}", sol),
            Err(PubGrubError::NoSolution(mut derivation_tree)) => {
                derivation_tree.collapse_no_versions();
                eprintln!("{}", DefaultStringReporter::report(&derivation_tree));
            }
            Err(err) => panic!("{:?}", err),
        };
    }
}

I suspect that this may have to do with the increased size of the intersected versions. For example, after testing and rejecting Django 5.0a1, we produce a term like:

[ 0a0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [

We then test Django 4.2.6, which gives us:

[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, ∞ [

We then take the intersection of these terms, which gives us:

[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [

This continues until we have a range like:

[ 0a0.dev0, 4.2rc1 [  [ 4.2rc1.post0.dev0, 4.2 [  [ 4.2.post0.dev0, 4.2.1 [  [ 4.2.1.post0.dev0, 4.2.2 [  [ 4.2.2.post0.dev0, 4.2.3 [  [ 4.2.3.post0.dev0, 4.2.4 [  [ 4.2.4.post0.dev0, 4.2.5 [  [ 4.2.5.post0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [

If you're testing hundreds of versions, the terms continue to grow, and I suspect this leads to some kind of quadratic behavior, since we're increasing the number of terms linearly with the number of tested versions.

The intersections aren't "wrong", and it's possible that we're doing something wrong with our version formulations -- but could we take advantage of the fact that we know there are no versions in these partial ranges?

For example, given [ 0a0.dev0, 5.0a1 [ [ 5.0a1.post0.dev0, ∞ [ and [ 0a0.dev0, 4.2.6 [ [ 4.2.6.post0.dev0, ∞ [, it would be preferable to reduce to [ 0a0.dev0, 4.2.6 [, if I'm understanding the syntax correctly.
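To make the suspected growth concrete, here is a toy model (plain integer segments, not pubgrub's Range type; all names below are made up), using a "dense" ordering where rejected versions are never adjacent, like the post/dev releases above. Each rejection splits one segment in two, and each later intersection must walk every existing segment, so the total work is quadratic in the number of rejected versions:

// Toy model: a version set as sorted, disjoint, half-open segments [start, end).
// This is not pubgrub's Range type, just an illustration of the growth pattern.
type Segments = Vec<(u32, u32)>;

const INF: u32 = u32::MAX;

// Intersect `set` with "everything except version v", i.e. [0, v) ∪ [v + 1, ∞).
// The segment containing v is split in two, so the set can grow by one segment.
fn exclude(set: &Segments, v: u32) -> Segments {
    let mut out = Vec::with_capacity(set.len() + 1);
    for &(lo, hi) in set {
        // O(number of segments) on every call.
        if v < lo || v >= hi {
            out.push((lo, hi));
        } else {
            if lo < v {
                out.push((lo, v));
            }
            if v + 1 < hi {
                out.push((v + 1, hi));
            }
        }
    }
    out
}

fn main() {
    let mut set: Segments = vec![(0, INF)];
    // Reject every other version, newest first, so excluded versions are never
    // adjacent and no segments collapse (mimicking the Django example above).
    for v in (1..500).rev() {
        set = exclude(&set, 2 * v);
    }
    // 499 rejections -> 500 segments; total work is quadratic in the number of
    // rejected versions, since each exclusion walks the whole segment list.
    println!("segments: {}", set.len());
}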

@mpizenberg (Member)

I'll let @Eh2406 answer the performance question, as I've been out of the loop for quite a long time. It feels weird though, since your root only depends on a single version of foo: as soon as one is found to be ok, it should be good. But since you're not adding any versions of "bar" to the provider's knowledge, there isn't any solution. One remark: I'd try to separate the performance of the solver from that of the error reporting, as no effort has been put into error reporting speed at all.

Regarding your second point, it's not quite as you think. An open bracket means the value is excluded. So [ 0, 1 [ [ 1a, 2 [ is different from [ 0, 2 [: the former does not include version 1, while the latter does. They cannot be collapsed into a single interval.
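For a concrete check of that point (assuming I'm reading the pubgrub 0.2 Range API correctly; the version numbers below are made up), "everything except 2.0.0" has to be written as a union of two intervals, and the hole is observable:

use pubgrub::range::Range;
use pubgrub::version::SemanticVersion;

fn main() {
    // "All versions except 2.0.0": a union of two intervals with a hole at 2.0.0.
    let below: Range<SemanticVersion> = Range::strictly_lower_than((2, 0, 0));
    let above: Range<SemanticVersion> = Range::higher_than((2, 0, 1));
    let except = below.union(&above);

    let hole: SemanticVersion = (2, 0, 0).into();
    let inside: SemanticVersion = (1, 9, 9).into();
    assert!(except.contains(&inside));
    assert!(!except.contains(&hole)); // the hole is real: 2.0.0 stays excluded
    println!("{}", except); // displays as two intervals, not one
}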

@charliermarsh (Contributor, Author)

Thanks @mpizenberg.

One remark: I'd try to separate the performance of the solver from that of the error reporting, as no effort has been put into error reporting speed at all.

To clarify, omitting the error handling entirely (let _ = resolve(&dependency_provider, "root", (1, 0, 0))) shows an identical slowdown.

@mpizenberg (Member)

[ 0a0.dev0, 5.0a1 [ [ 5.0a1.post0.dev0, ∞ [

This means: versions between 0a0.dev0 (included) and 5.0a1 (excluded), plus all versions >= 5.0a1.post0.dev0.

Assuming 5.0a1.post0.dev0 comes just after 5.0a1, this is the reduced form saying "all versions >= 0a0.dev0, except 5.0a1, which is excluded".

@charliermarsh (Contributor, Author)

That makes sense, and I'm still learning as I go here, but assume we know that there are no versions after 5.0a1. Could we not simplify that term to just [ 0a0.dev0, 5.0a1 [?

Similarly, given [ 0a0.dev0, 4.2.6 [ [ 4.2.6.post0.dev0, 5.0a1 [ [ 5.0a1.post0.dev0, ∞ [, assume we know that there are no versions between 4.2.6.post0.dev0 and 5.0a1, and no versions after 5.0a1.post0.dev0. Could we not simplify the terms even further, to [ 0a0.dev0, 4.2.6 [, if we were able to leverage this information?

@charliermarsh (Contributor, Author)

To be clear: I don't know for certain that the term expansion itself is the cause of the slowdown. I have just noticed, in practice, that the solver slows to a halt when it runs into cases where hundreds of versions of a given package are tested in sequence, and in profiling, significant time is spent computing term unions and intersections, which led me to notice that the terms get extremely long in these cases.

@mpizenberg (Member)

That makes sense, and I'm still learning as I go here, but assume we know that there are no versions after 5.0a1. Could we not simplify that term to just [ 0a0.dev0, 5.0a1 [?

Oh yeah, definitely. I don't remember clearly how much work we put into this kind of reduction; if I remember correctly, reduction of terms was still on our todo list. Jacob can confirm. Some of it was discussed in #19 if I recall correctly.

significant time is spent computing term unions and intersections, which led me to notice that the terms get extremely long in these cases

Thanks for reporting it! It definitely helps to have concrete examples like these to figure out performance-improvement opportunities.

@Eh2406 (Member) commented Oct 18, 2023

Thank you for reporting and for the interesting conversation!

With the current architecture, the resolver does not know about any NoVersion incompatibilities until all matching versions are enumerated. It knows that choose_package_version returned 9 the first time, when asked for "any version", and that the second time, when asked for not(9), it returned 3. That is insufficient information to conclude that there are no versions in >= 4, <= 8, because it could also be that the dependency provider simply happens to prefer those versions (they're in a lock file, or already downloaded, or whatever). Adding a get_no_version(p: P) -> VS that gets called at the beginning does not make things linear; it just inverts the direction of the triangle (it starts at O(n) and gets smaller). So we will need a method on the dependency provider like optimize_in_light_of_missing_versions(p: P, vs: VS) -> VS, but then we have to be careful to avoid comparing optimized VS's with unoptimized ones, and also careful not to call the optimize method so often as to defeat the purpose. This is going to be a hard but important problem to solve.
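For concreteness, a rough sketch of the shape such a hook could take (the trait below is hypothetical and simplified; only the method name comes from this comment):

// Hypothetical sketch of the hook discussed above; not an existing pubgrub API.
// P is the package type, VS the version-set type used by the resolver.
trait MissingVersionsOptimizer<P, VS> {
    // Return a set matching the same *available* versions as `vs`, but with a
    // simpler representation: the provider may merge intervals that only differ
    // across regions it knows contain no published versions. Care is needed not
    // to compare optimized sets with unoptimized ones, and not to call this so
    // often that the calls cost more than the intersections they save.
    fn optimize_in_light_of_missing_versions(&self, package: &P, vs: VS) -> VS;
}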

It would be nice to be able to confirm that this is what's actually causing the slowdown in this case. Let me do some experiments.

@mpizenberg (Member) commented Oct 18, 2023

On a side note, another generic way to help speed up the solver is to pre-record incompatibilities representing any kind of expressible knowledge, such as non-existing versions in a given range, or a strategic incompatibility between two packages that can save the solver many steps and dead ends (cf. #121).

It's possible that, given an ecosystem of packages (like Rust packages), some key incompatibilities could save hours of cumulative time spent by everyone.

@Eh2406 (Member) commented Oct 18, 2023

Before #112, Range<NumberVersion> does not grow, as it's smart enough to know that there cannot be versions in between consecutive integers. So I ran the following code (in release mode) with both NumberVersion and SemanticVersion:

// Imports assumed for the pubgrub dev branch at the time of this comment.
use std::time::Instant;

use pubgrub::range::Range;
use pubgrub::solver::{resolve, OfflineDependencyProvider};
use pubgrub::version::NumberVersion;

fn main() {
    let mut dependency_provider = OfflineDependencyProvider::<&str, Range<NumberVersion>>::new();

    // root depends on foo...
    dependency_provider.add_dependencies("root", 1, vec![("foo", Range::any())]);

    for i in 1..500 {
        let start = Instant::now();

        // foo depends on bar...
        dependency_provider.add_dependencies("foo", i, vec![("bar", Range::any())]);

        _ = resolve(&dependency_provider, "root", 1);
        let time = start.elapsed().as_secs_f32();

        println!("{i}, {time}");
    }
}

I used Excel to fit a trendline (note: the two lines are on different axes):
[chart: per-iteration resolve time vs. iteration count, with the NumberVersion and SemanticVersion trendlines plotted on separate axes]
It is quite clear that both of these lines are nonlinear. It also looks like SemanticVersion is growing more steeply than NumberVersion. From this I conclude that there are other nonlinear things going on, but that the growth of the VS is most of the problem.

@Eh2406 (Member) commented Oct 19, 2023

Thinking about yesterday's analysis again: even when intersection is O(1) (using NumberVersion before #112), we still see nonlinear behavior. While we work on making more kinds of VS have O(1) intersection, we may be able to make progress on the other nonlinearity. A flame graph shows that, using NumberVersion, iteration 3000 spends most of its time in https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L279-L282 and https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L451-L460, both of which clearly do O(n) work every time they're called. Judging by debug logging, they appear to be called O(n) times. (Thinking about it, I'm not sure why backtracking needs to be called more than once.) O(n) calls to satisfier, each making O(n) calls to intersection, each of which takes O(n) time (when not using the old NumberVersion), would explain the O(n^3) runtime we were seeing yesterday.

@mpizenberg (Member) commented Oct 20, 2023

Regarding the rebuilding of the assignments intersection (https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L279-L282), we could speed that up in different ways. First, intersection is fundamentally associative, meaning it's fully parallelizable (even on the GPU?), so we could compute the intersection in parallel. Second, multiple backtracks essentially recompute exactly the same intersections, up to a point. We could therefore keep logarithmically spaced snapshots of the intersections for a given package; when backtracking happens, we start at the closest snapshot before the backtracked step instead of starting from scratch.

@Eh2406 (Member) commented Oct 22, 2023

I spent a lot of last night thinking about this problem, time when a more reasonable person would have been sleeping.

Thinking about it, I'm not sure why backtracking needs to be called more than once.

To elaborate, the benchmark as written should be a pretty direct test of this optimization; however, it does not seem to be firing correctly. It is not firing because the fact that bar cannot be selected is not being stored in the partial solution. Several naïve ways of adding this information led to infinite loops, so this needs to be done with some care. I think this is a problem worth fixing, because depending on a package/version that does not exist is a common base case for real conflicts, but we do have to be careful not to over-optimize for synthetic benchmarks and slow down real benchmarks (as several of my more targeted algorithmic improvements did).

It is fairly easy to change the example to avoid that optimization even if it were working correctly, so algorithmic improvements to backtracking and finding satisfiers are probably valuable. Most importantly, a lot of these computations can be cached: either in a targeted fashion, by keeping track of exactly what's needed and pre-computing it, or in a general fashion, in which we keep track of the relations between terms in some data structure. Designs here can get pretty fanciful but can generally be explored after 0.3. (Possibly adding a + Hash bound to the definition of VS may make sense just in case, but otherwise it can wait.)

The problem of optimizing the representation of VS is also interesting to think about. I am fairly convinced that thinking about it as "optimizing a representation without changing what it matches" leads to contradictions, difficulties, or O(n) operations. The only thing that seems plausible is a callback for proactively asking a dependency provider for the null space. Something like: "if this function returns Some(VS), then anything that matches VS is known not to have any versions available." (Assuming the bugs about "no version" incompatibilities not being aggressively added to the partial solution get fixed.) The hard part about this API is deciding how often the resolver should ask the dependency provider about a package. Too often, and we spend more time describing the null space than solving the problem. Too rarely, and we spend all our time doing intersections. I think this should be possible to discuss after #104 is merged.
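A sketch of what that callback could look like (names and signature are illustrative only; nothing like this exists in pubgrub today):

// Illustrative only: one possible shape for the "null space" callback above.
// P is the package type, VS the version-set type; the trait and method names
// are invented for this sketch.
trait NullSpaceHint<P, VS> {
    // If this returns Some(vs), the resolver may assume that no version of
    // `package` matching `vs` exists, and can pre-record the corresponding
    // "no versions" incompatibility or simplify its accumulated terms.
    // Returning None means "no extra knowledge". How often the resolver should
    // call this is exactly the open question described above.
    fn null_space(&self, package: &P) -> Option<VS>;
}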

I'm considering breaking these three topics into separate issues to keep conversations focused. Thoughts?

@mpizenberg (Member)

So I just had a look at the code this morning, to get a sense of what it would mean to store pre-intersected dated derivations. As a reminder, we already store the intersection of all derivations in our assignments_intersection, since that is useful for the rest of the algorithm. Now the question is whether it would be interesting to also store the intersection at each step, and not just the latest intersection. There is an obvious memory cost, but I don't think there is much of a compute cost, since these intersections are computed anyway to reach the latest intersection. The benefit, I think, is that we can avoid recomputing all the term intersections when backtracking.

There are three locations where we recompute these term intersections:

  1. In backtrack, we truncate the dated derivations back to a certain decision level and recompute the intersection. (https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L279-L282)
  2. In satisfier, we intersect all terms until we satisfy the term of the satisfied incompatibility. (https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L451-L460)
  3. In find_previous_satisfier, we alter the starting accumulator with the term at the satisfier index instead of Term::any, and recompute all intersections until the incompatibility is satisfied again. (https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L411-L419)

Point (3) is tricky because we can't avoid recomputing the intersections: we have a different starting point. It could still be sped up with a dichotomic search, where at each probe we intersect that initial term with the precomputed accumulator and check whether the incompatibility is satisfied (see the sketch below).
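As a toy illustration of storing the intersection at each step (the Set type and every name below are invented for this sketch, standing in for pubgrub's dated derivations and VS), backtracking becomes a truncation, and the stored prefixes are exactly the accumulators a dichotomic search for point (3) could probe:

// Set stands in for a version set; here it is just a half-open interval.
#[derive(Clone, Copy, Debug)]
struct Set { lo: u32, hi: u32 }

impl Set {
    fn full() -> Self { Set { lo: 0, hi: u32::MAX } }
    fn intersection(self, other: Set) -> Set {
        Set { lo: self.lo.max(other.lo), hi: self.hi.min(other.hi) }
    }
}

struct Accumulator {
    terms: Vec<Set>,
    prefix: Vec<Set>, // prefix[i] = intersection of terms[0..=i]
}

impl Accumulator {
    fn new() -> Self { Accumulator { terms: Vec::new(), prefix: Vec::new() } }

    fn push(&mut self, term: Set) {
        let acc = self.prefix.last().copied().unwrap_or_else(Set::full);
        self.terms.push(term);
        self.prefix.push(acc.intersection(term)); // O(1) extra work per derivation
    }

    // Backtrack to keep only the first `len` derivations: nothing is recomputed,
    // because the snapshot at index `len - 1` already is the intersection we need.
    fn backtrack(&mut self, len: usize) {
        self.terms.truncate(len);
        self.prefix.truncate(len);
    }

    fn current(&self) -> Set {
        self.prefix.last().copied().unwrap_or_else(Set::full)
    }
}

fn main() {
    let mut acc = Accumulator::new();
    acc.push(Set { lo: 0, hi: 10 });
    acc.push(Set { lo: 3, hi: 8 });
    acc.push(Set { lo: 5, hi: 6 });
    acc.backtrack(2); // drop the last derivation in O(1), no recomputation
    println!("{:?}", acc.current()); // Set { lo: 3, hi: 8 }
}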

@Eh2406 (Member) commented Oct 25, 2023

I have been working on several different approaches to see what will make a difference on these benchmarks. None of them is at a state for public discussion, and each on its own did not make a significant performance difference. Today I experimented with putting them all together, and I'm seeing significant results!

The current version of the benchmark is:

// Imports and fn main added for completeness, assuming the dev branch layout at the time.
use std::time::Instant;

use pubgrub::range::Range;
use pubgrub::solver::{resolve, OfflineDependencyProvider};
use pubgrub::version::NumberVersion;

fn main() {
    let mut dependency_provider = OfflineDependencyProvider::<&str, Range<NumberVersion>>::new();

    // root depends on foo...
    dependency_provider.add_dependencies("root", 1, vec![("foo", Range::full())]);

    for i in 1..500 {
        // every version of foo depends on a package that has no versions...
        dependency_provider.add_dependencies("foo", i, vec![("bad", Range::full())]);
    }

    let start = Instant::now();
    _ = resolve(&dependency_provider, "root", 1);
    let time = start.elapsed().as_secs_f32();
    let len = dependency_provider
        .versions(&"foo")
        .into_iter()
        .flatten()
        .count();
    println!("{len}, {time}");
}

with a slower variation by changing ("bad", Range::full()) to ("bad", Range::higher_than(i)).

| Time (s) | full | higher_than |
|----------|------|-------------|
| dev      | 4.5  | 11.6        |
| hacks    | 1.2  | 2.6         |

This is without any optimizations for making VS smaller! It is also without #104! And it does not include creating an arena for package names. All of these clearly show up on the flame graph.

I will clean up some of my hacks into PRs, but probably not until after PackagingCon. I may also push for us to merge #104, as keeping two versions of each of these commits is getting difficult.

(Two follow-up comments, from @mpizenberg and @charliermarsh, were marked as outdated.)

Eh2406 added the bug and enhancement labels on Oct 27, 2023.
@Eh2406 (Member) commented Nov 3, 2023

It turns out that one of the hacks on its own was sufficient to get most of the performance benefit for the harder case. I am working on polishing it for review in the branch slow_135. The fix may be over-indexed on the synthetic benchmark. @charliermarsh, can you check whether it helps with your real-world use case? If not, we will have to figure out how to make a more realistic benchmark.

(Four follow-up comments from @charliermarsh and @Eh2406 were marked as outdated.)

@Eh2406 (Member) commented Nov 6, 2023

Here are some numbers on the synthetic benchmark:

| branch                   | full | higher_than |
|--------------------------|------|-------------|
| dev                      | 4.7  | 12.1        |
| slow_135                 | 1.4  | 1.5         |
| accumulated_intersection | 0.84 | 0.90        |
| fewer_intersections      | 0.31 | 0.37        |

All of the mentioned branches pass all tests and should be cleanable for merge, if they turn out to be as useful as they seem. All of the branches are pushed, so you can check, commit by commit, how much of that improvement translates to real-world examples. I will run our "ron" files next time I'm at my benchmarking computer. The flame graph still shows obvious opportunities for improvement, although interestingly not the same ones as last time.

@charliermarsh (Contributor, Author)

I was able to update our branch to pull in those changes, and I let this pathological resolution run for 60 seconds. dev got through 433 candidate versions, while the changed branch unfortunately only got through 212. Hopefully I didn't mislead with the above benchmark :( It's also possible I messed up somewhere while pulling in the changes... I'll double-check now. I'd also be interested to see how it affects the ron file benchmark.

@Eh2406 (Member) commented Nov 7, 2023

| branch                         | elm       | large_case | zuse      | hobo *  | solana-watchtower * |
|--------------------------------|-----------|------------|-----------|---------|---------------------|
| is_terminal_when_created / dev | 194.63 ms | 9.2129 ms  | 205.27 ms | 1224.3s | 6739.7s             |
| slow_135                       | 203.31 ms | 9.0410 ms  | 202.77 ms | 984.7s  | 6478.7s             |
| accumulated_intersection       | 204.25 ms | 8.5664 ms  | 205.46 ms | 933.9s  | 6526.9s             |
| fewer_intersections            | 202.25 ms | 8.6868 ms  | 200.92 ms | 925.2s  | 6099.0s             |

The last two (the ones marked with *) would take far too long to run a full criterion analysis, so I'm just reporting the "Collecting 100 samples in estimated" time.

To say this is not the dramatic improvement I was hoping for would be an understatement. On the other hand, I don't have an example demonstrating the dramatic slowdowns @charliermarsh observed. 🤔
