Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for more complex version constraints #86

Closed
calebzulawski opened this issue May 1, 2021 · 27 comments
Closed

Support for more complex version constraints #86

calebzulawski opened this issue May 1, 2021 · 27 comments

Comments

@calebzulawski
Copy link

Perhaps this is (understandably) outside the scope of this crate, but I figured I'd ask.

I'm trying to implement a solver for PEP 440, which unfortunately does not quite boil down to simple ranges. The versions do have a total order, but the constraints have some extra requirements. For example, >1 matches 1.1, 2.0, etc, but it does not match 2.0+foo despite that version being "greater". Is there any way to embed some additional arbitrary constraints into the version selection process?

Thanks for any help!

@mpizenberg
Copy link
Member

Hi @calebzulawski ! thanks for the feedback on this! I'd say that's exactly the same case that the issue with pre-release versions (see discussion and links in issue #80).

In the case of pre-release, we want <= 2.0-beta to include version 2.0-alpha for example, but <= 2.1 would not include it in the list of potential valid versions.

Currently, this is outside the limits of things possible with this crate. It is the major change we want to work on next, after releasing the 0.2.1 version, which will be a perf-improvement only change. As mentioned in #80, a prototype solution for this was started in the range-trait branch and consists in removing the Range type and replace it with a Range trait that would enable such behavior.

To be honest I've my hands full for at least 2 months before being able to start again work on this. But this is definitely the most requested features (3 persons now) so if any want to have a go at it, it will make a lot of people happy ^^.

@Eh2406
Copy link
Member

Eh2406 commented May 1, 2021

Technically No. But there are some workarounds and some plans for 0.3 that may work. We do want to make this kind of use case possible! And your feedback will be invaluable in getting there!

That being said, I am not familiar with the subtleties of PEP 440, so sorry if the links are not quite relevant. If you have a description or a targeted link I'd love to read it. ( also cc @MrGreenTea who has done some python experimentation. )

One form of augmented constraints that has bean discussed is "features", where the constraint tells the package to have some extra functionality and the dependencies required to support it. This can be supported now by adding a synthetic package bar+foo witch means "bar with the foo feature turned on". bar+foo @ 2.0 will have dependencies on the optional requirement of bar @ 2.0 that are enabled with foo and a dependence on bar @ =2.0 to get the normal dependencies. links: pubgrub-rs/advanced_dependency_providers#1

And in the 30 min it has taken me to wright this up @mpizenberg wright up the other thins I was going to say, so I will stop here!

@calebzulawski
Copy link
Author

Thanks for all of the information! It seems like the range trait is probably the answer to this problem. It would also be nice if that allows more tailored error messages. I may take a look at implementing this if I get a chance--is there anything specific that's incomplete about the range-trait branch?

I'll have to take a look at that advanced dependency providers link, virtual packages seem to be used by almost every package manager so I'm sure I will run into that, too.

@MrGreenTea
Copy link

@calebzulawski A friend and I started experimenting with a PEP440 implementation for poetry:
https://github.com/MrGreenTea/poetry-pubgrub-rs

Sadly real life got in the way and I haven't been able to return to it since then, but perhaps it's interesting to you or you might even want to collaborate :)

@Eh2406
Copy link
Member

Eh2406 commented May 1, 2021

is there anything specific that's incomplete about the range-trait branch?

What is done: is that it works, and tests pass. What is not done: It is not bean rebased to use all the new perf-improvements. It does not have all needed polish, like finding the best names for the methods, or like requiring Eq vs a dedicated compare method. And none of the documentation has bean updated.

But the BIG thing it needs is an example of using it for a real world use case that can not be done with the existing system! Someone saying "I can demonstrate that that will get me unstuck" would be invaluable!

cc https://rust-lang.zulipchat.com/#narrow/stream/260232-t-cargo.2FPubGrub/topic/A.20Range.20trate

@calebzulawski
Copy link
Author

@MrGreenTea I'll take a look at that! I plan on at least publishing my PEP 440 parser to crates.io soon, so feel free to work with that once it's up. I'm planning on implementing a conda client, which isn't strictly PEP 440, but based on it.

@Eh2406 I'll see what I can do about getting it ready for release. Seems like once it's all cleaned up it should be easy enough to proof-of-concept PEP 440 as an example!

@Eh2406
Copy link
Member

Eh2406 commented May 2, 2021

I just rebased and force pushed the branch. So at least the perf improvements should be available. :-P

@lpil
Copy link
Contributor

lpil commented May 14, 2021

Hello @calebzulawski ! Did you push your version in the end? I'm attempting to add some form of prelease handling so that I can use this library with the Hex package manager for Gleam/Elixir/Erlang/etc but I've not managed to figure my way through the Range trait for this.

@calebzulawski
Copy link
Author

Unfortunately I've been pretty busy and haven't gotten a chance--I'm hoping to be able to start this soon

@lpil
Copy link
Contributor

lpil commented May 15, 2021

Thank you

@calebzulawski
Copy link
Author

I've got some time now so taking a look at this

@calebzulawski
Copy link
Author

@mpizenberg @Eh2406 I'm taking a look at the RangeSet trait and one concern I have is that version constraints won't necessarily act just like sets (PEP 440 definitely doesn't have like half of the set operations required). What do you think about expanding Term to contain the entire set implementation, and constraints only need a single function that returns if the constraint is satisfied?

@Eh2406
Copy link
Member

Eh2406 commented Jun 12, 2021

They all act like sets in a mathematical sense. But, they don't have user facing syntax for all the things we need. (For example Cargos syntax has no way to say "pre releases without stable releases") So the version we use will have to have a superset of the functionality implied by the native syntax. That being said, I have not successfully implemented this for any real world input yet. So it may be impractical to implement.

I can try to make a Term trait, but I suspect it will not be significantly easier to implement. Term has a negate and a intersection. They are required to realize that foo < 2.0 and foo > 4.0 is not going to lead to a solution (in O(1) not O(versions of foo)).

@calebzulawski
Copy link
Author

calebzulawski commented Jun 12, 2021

Hmm--let me try to clarify.

Term has intersection, but that just more or less calls intersection on Range (or the RangeSet trait in the branch). What I'm suggesting is that I expand Term to contain something for handling intersection, union, etc, and all the user needs to provide is the satisfy function.

So for example, ~=1.0 and >=2 are two possible PEP440 constraints, but there's nothing in PEP 440 to represent a union of those. I'm suggesting that Term should contain the code for representing Union( ~=1.0, >=2 ) and the user-definable trait only needs to check if a version satisfies ~=1.0 or >=2 and not the union itself.

The constraint trait would only have a single function fn satisfy(&self, version: Self::Version) -> bool and the rest would be built into Term.

@calebzulawski
Copy link
Author

A simple way to implement this would be something like:

enum Term<C> {
    Constraint(C),
    Negate(Box<Term>),
    Union(Box<Term>, Box<Term>),
    Intersection(Box<Term>, Box<Term>),
}

@Eh2406
Copy link
Member

Eh2406 commented Jun 12, 2021

I would love to see something like that work! The part I have always gotten stuck on with designs like that is relation_with. Specifically how do we tell that ~=1.0 and >=2 are disjoint (in O(1) not O(versions of foo))? The current implementation is self.intersection(other) == Self::empty(), but obviously Intersection(_, _) will not equal Constraint(_). So, let's add a is_empty method that can do more careful logic. Now the check is self.intersection(other).is_empty(). But, how is is_empty implemented?

@calebzulawski
Copy link
Author

Ah I understand the problem now. I think the problem I see is that for some constraints, determining if an intersection is disjoint is either extremely difficult or maybe intractable. This is particularly true when considering things like release candidates which can have relatively complicated logic. With simple number ranges it's easy but with more arbitrary versions I can see it being a problem.

If it's an optimization maybe it can be handled that way in the trait?

trait Constraint {
    type Version;
    fn satisfies(&self, version: &Self::Version) -> bool;
    fn disjoint(&self, other: &Self) -> Option<bool> { None };
}

An implementer can attempt to determine if two constraints are disjoint, but if not it will fall back to the O(n) implementation?

@Eh2406
Copy link
Member

Eh2406 commented Jun 12, 2021

The algorithm as implemented assumes that all the set operations are exact and reasonably fast, and that every operation returns a canonical representation so Eq can be used for all the checks. Furthermore the O(n) implementation is implicit in choose_package_version returning Ok((T, None)). So figuring out which methods are required and which are optimizations is going to be a lot of work. I am getting more certain that it is work that needs to be done.

@calebzulawski
Copy link
Author

I've been thinking about this a little more and I'm not so sure it's worth supporting completely arbitrary versioning with how much work it seems to require.

For some reason I think I had overcomplicated how hard it would be to implement a "punctured" range--a range that has some versions missing (such as release candidates). I think I see a path forward for implementing PEP 440 that way, and semver should be even easier.

This of course would make it difficult to implement completely arbitrary versioning, but I'm pretty sure most standards are based on ranges in some form or another...

@mpizenberg
Copy link
Member

A simple way to implement this would be something like:

enum Term<C> {
    Constraint(C),
    Negate(Box<Term>),
    Union(Box<Term>, Box<Term>),
    Intersection(Box<Term>, Box<Term>),
}

@calebzulawski Fun fact, for my first implementation of PubGrub (in elm), I initially had something like that. But it was so problematic to handle correctly all the cases that it was the source of many bugs and the reason I switched to a normalized list of non-intersecting semi-open segments (cf this commit mpizenberg/elm-pubgrub@6da0c12)

@mpizenberg
Copy link
Member

Anyway, we have finished twiddling with perf optimizations and are about to release 0.2.1. So both @Eh2406 and me will be able to focus our attention on the limits of the current API.

I have been reading PEP 440 today and oh boy I feel sorry for you ^^. That's kind of a nightmare of trying to be compatible with all pre-existing practices. Imagining that you can face a version like 42!2021.06rc5.post9.dev3 :'(

All that being said, I have few questions after reading this spec.

  1. Do you intend to support the arbitrary versions that do not follow PEP 440 and are referenced to by the === specifier.
  2. How do you plan to handle pre-releases regarding ranges specifiers? The spec says that exclusive ordered comparisons (> and <) specifically exclude pre-releases (and dev releases) and post-releases of the specified version (not of the earlier or later releases). And pre-releases and dev releases are implicitly excluded from all versions specifiers, unless already present on the local system, explicitly required by the user, or the only available version satisfying a given specifier. That being said, the PEP 440 spec says "SHOULD" so what's your take on this?

@Eh2406
Copy link
Member

Eh2406 commented Jun 21, 2021

I have a proof of concept implementation of support for the Semver crate, with one of the draft branches for 0.3. It includes the fact that >=1.2.3-4 matches 5.0.0 but not 4.0.0-alfa, witch is one kind of common more complicated constraint.
The code is at: https://gist.github.com/Eh2406/88b8c799be3f3a6589073e8e58504a11
I think the concept can be extended to where a version needs different requirements depending on witch of a small number of categories it is in. (in my case pre-release or not).

@njsmith
Copy link
Contributor

njsmith commented Jul 5, 2021

@calebzulawski @MrGreenTea oh huh, I went and implemented a python resolver using this crate too, before seeing this thread :-) (Mine's at https://github.com/njsmith/posy/blob/main/src/resolve.rs if you're curious. The resolver code is pretty drafty but the supporting infrastructure is pretty fleshed out. Happy to collaborate if it makes sense!)

In particular, as far as this thread goes, this code might be useful as a concrete example of compiling PEP 440's complex version comparison operators down into ranges: https://github.com/njsmith/posy/blob/af665987b61348940f2e44212c23341ce2f6340e/src/vocab/specifier.rs#L76

And I'm technically not quite supporting prereleases according to the PEP, but I think I'm just going to argue that the PEP's semantics don't really make sense for a multi-package resolver -- prereleases are best handled using global context/configuration, not individual specifiers: https://github.com/njsmith/posy#interpretationsdeviations-from-standards

Also, regarding what PEP 440 calls "local versions":

For example, >1 matches 1.1, 2.0, etc, but it does not match 2.0+foo despite that version being "greater".

I think this is actually a misreading of the PEP. (I also misread it this way for a long time!) You're not allowed to write > 2.0+foo in some package's metadata. But if the resolver finds a package in its index whose version is 2.0+foo, then that does satisfy a package that declares a dependency on > 1. It even satisfies a declared dependency on == 2.0.

@mpizenberg
Copy link
Member

Hey, we just released a new section called "Advanced usage and limitations" in pubgrub's guide. I hope this can be of help to some of you.

@njsmith
Copy link
Contributor

njsmith commented Aug 5, 2021

Re: this bit of the new section: https://pubgrub-rs-guide.netlify.app/limitations/continuous_versions.html

Would it make sense to define a VersionSet trait with the operations you need (membership, intersection, etc.), and let users define their own set representation? It seems like it might end up simpler for both you and users and also more efficient a Vec of RangeBounds, because you have less hard-coded vector operations, users can pick whatever set representation is most natural for their situation, and if a simple representation is adequate (like the current Vec<(Version, Option<Version>)>), then you don't pay any cost for messing with RangeBounds.

It might also allow more efficient implementations than currently, e.g. a custom VersionSet could potentially simplify [a, b) ∪ [b, c) => [a, c) without having to call b.bump().

It might also be useful for the "multi-dimensional ranges" in the pre-release section, or for preserving more "natural" high-level representations of version ranges like ~=2.1 in order to provide better error messages when a resolution fails.

@mpizenberg
Copy link
Member

Would it make sense to define a VersionSet trait with the operations you need (membership, intersection, etc.), and let users define their own set representation?

Yeah it totally would. Actually, for the approach discussed in "multi-dimensional ranges", users would implement their own membership and intersection primitives. I am personally wary of letting users implement it on their own because of the so many pitfalls/bugs that can happen doing so (cause we fell in them). And in those cases, the failures that could happen resulting of a tiny bug in version sets implementation, would be similar to the one you experienced with the Ord and Eq issue, but even less predictable I think. And I definitely don't want to have a majority of issues related broken contracts in some traits.

But the right move is probably keeping the freedom for users, while strongly steering them to one or few implementations of those traits already provided by the crate. It's a very interesting design space, but we are making slow progress since mostly doing that on our free time, and since evaluating those design decisions takes a bit of time.

@mpizenberg
Copy link
Member

Hi all, so support for more flexible version traits have landed in the dev branch. This means that it will land in v0.3. Now we have to work on documentation and example usage before releasing v0.3. Most of the concerns in this PR will be solved in the upcoming version so I'll close this issue. If you are interested in more info about how to actually use the new version 0.3, I suggest you take a look at the dev branch, and subscribe to pubgrub releases to be reminded when it will be fully available with documentation and examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants