Add mostly universal version range spec draft #139

Open
wants to merge 19 commits into
base: master

pombredanne
Member

Originally at nexB/univers#11

This is a work in progress for "vers", a new, mostly universal version range specification to use as a companion to purl.
This is a possible solution to these issues and PRs:

Signed-off-by: Philippe Ombredanne pombredanne@nexb.com

@pombredanne
Member Author

It comes with an experimental implementation in Python at https://github.com/nexB/univers/

@ashcrow
Contributor

ashcrow commented Nov 30, 2021

Overall I like the idea! It seems to me that most of the time purl and vers would provide different functionality for different use cases. However, is there a case you see having purl be able to also incorporate vers through a field (such as in qualifiers)?

VERSION-RANGE-SPEC.rst Outdated

- The ordering of multiple ``<version-constraint>`` in a range specifier is not
significant. The canonical ordering is by sorting these by lexicographical
order applied with this two steps approach:
Member Author

This sorting ends up being a bit weird, as it sorts first on the comparator strings and second on the version. Sorting by version possibly opens a can of worms, as this would be scheme-specific.

VERSION-RANGE-SPEC.rst Outdated
Correct typo and meaning

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
VERSION-RANGE-SPEC.rst Outdated
This is the same concept as a Package URL type.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
version. And also provides a concrete enumeration of the available ranges as
a daily feed.

- The version 5 of the NVD CVE JSON data format at
Member

Also note, CycloneDX v1.4 has adopted the CVE v5.0 version range syntax.

Member Author

excellent!
vers is essentially the same just using a compact notation, and reusing Package URL types. There should be a perfect bijection between vers and the NVD 5.0 (and the OSV schema) albeit with a possible need for a minimal mapping. Let me make this clear in the doc and explain what the mapping is.

Member Author

@stevespringett based on https://github.com/CVEProject/cve-schema/releases/tag/v5.0.0-rc5 I reckon this is still a release candidate... @chandanbn @rsc do you know when this will become 5.0?


RC designation will be removed when the CVE upload/update API service goes into production sometime early next year.

At this time the format is considered frozen for new additions. Only bug fixes or minor doc changes are being made, based on feedback from the team developing the CVE services.

Member Author

@chandanbn Thank you ++ for details! I look forward to the format going live 🙇

I wonder if there would be a way to converge on a common list of names for the versionType used in your spec, in purl, and here?
And possibly also what is called a type for Package URL and package ecosystems/ aka. collectionURL in CVE 5?

I reckon the CVE 5 spec only provides examples and not definitive lists but we have many devilishly similar yet different takes on essentially the same thing:

For package type or ecosystem:

And for version ranges:

Maybe there is a way we can better control the entropy of the universe? Or at a minimum maintain some kind of unambiguous mapping between all of these?

@rsc and @oliverchang ping wrt. OSV and @tschmidtb51 and @santosomar ping wrt. CSAF

Contributor

(mostly just thinking out loud)

I think a major "pro" of combining purl and vers into one entity is that you'd get the combined benefits of having the identity of a component plus the version ranges in one "thing". Version specifiers to me seem like ideal candidates for representing e.g. vulnerable version of a component. If we take CVE-2022-22818 as an example, we can represent the list of fixed versions as a list with purl:

  • pkg:pypi/django@2.2.27
  • pkg:pypi/django@3.2.12
  • pkg:pypi/django@4.0.2

but also as a list of vulnerable versions:

  • vers:pypi/>=4.0.0|<4.0.2
  • vers:pypi/>=3.0.0|<3.2.12
  • vers:pypi/>=2.0.0|<2.2.27 (or perhaps vers:pypi/<2.2.27 to account for unsupported 0.x and 1.x versions)

The latter however is not related to the Django package in any way, so in whatever database stores this information you then have to identify a versionless Django component to go with the version ranges (perhaps just pkg:pypi/django). If this were combined into purl, you could nicely represent the vulnerable versions of Django as:

purl:pypi/django@<2.2.27|>=3.0.0|<3.2.12|>=4.0.0|<4.0.2

The unfortunate problem is that the semantics of what the range represents (e.g. vulnerable vs not affected) would have to be represented outside of purl (or perhaps as agreed upon qualifiers).

Perhaps the best compromise would be to recommend the use of qualifiers for vers strings within purl? I.e.:

pkg:pypi/django@4.0.2?vulnerable_versions=vers:pypi/>=4.0.0|<4.0.2

Wdyt?


@mprpic I think vers and purl SHOULD NOT enter the field of describing the state of a piece of software. They are built to identify a product (including product name and version, now also with version ranges), and that works well. Determining whether a product is affected or not (which is always a question related to a specific vulnerability) is a completely different question, one that is well understood in security advisories.
I like the Unix philosophy: "Do one thing and do it well."
IMHO the one thing for vers and purl is identifying a product.

Contributor

@tschmidtb51 But that's the problem I tried to highlight: vers does not identify a product, it only identifies a version range that you then have to relate to a product. I only used the relationship to vulnerability affectedness as an example where you might want to represent the identity of a product (purl) and its versions (vers) in one artifact (purlvers).


@mprpic Sorry about the misunderstanding - I wanted to convey:

  • I agree that vers is a good approach to specify a version range
  • purlvers is able to identify a product by name (purl) and a version range (vers)
  • purlvers should only specify a product (name+version), not the context (affected or not affected). Otherwise, it becomes too complex to check that the information is consistent and not contradictory. (Imagine an advisory stating that product A with the identifier pkg:pypi/django@4.0.2?vulnerable_versions=vers:pypi/>=4.0.0|<4.0.2 is fixed. What would be true in such a situation?)

Contributor

@tschmidtb51 Right, the association of a purlvers to the context should not be a part of purlvers itself, which I guess rules out the qualifier. Having purl:pypi/django@<2.2.27|>=3.0.0|<3.2.12|>=4.0.0|<4.0.2 be the "thing" to associate with some context, like "is fixed" or "is vulnerable", would be more useful in an advisory.

These are no longer used: we now use a comma.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member Author

@ashcrow re:

Overall I like the idea! It seems to me that most of the time purl and vers would provide different functionality for different use cases. However, is there a case you see having purl be able to also incorporate vers through a field (such as in qualifiers)?

yes, using a qualifier in a purl would be one way. I think (but need to double-check) that this is possible without the need to encode anything in the vers version range, as I picked the vers component separators such that they are both mostly obvious and commonly used, and such that they do not collide with the purl ones if we were to use both together... but there may still be some ugliness in URL encoding, as @coderpatros mentioned in #84 (comment)

Note that this is essentially the proposal of @mprpic in #66 (comment)
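
To make the URL-encoding concern concrete, here is a minimal Python sketch that percent-encodes a vers string for use as a purl qualifier value. The `vulnerable_versions` qualifier name is only the hypothetical one floated in this thread, and escaping every reserved character (`safe=""`) is an assumption: the purl spec may allow some of these characters unescaped.

```python
from urllib.parse import quote, unquote

# Example range taken from the discussion in this thread.
vers = "vers:pypi/>=4.0.0|<4.0.2"

# Assumption: escape every reserved character (the strictest choice).
encoded = quote(vers, safe="")

# "vulnerable_versions" is a hypothetical qualifier name, not part of purl.
purl = f"pkg:pypi/django?vulnerable_versions={encoded}"

print(purl)
# pkg:pypi/django?vulnerable_versions=vers%3Apypi%2F%3E%3D4.0.0%7C%3C4.0.2

# The encoding is lossless: decoding gives back the original vers string.
assert unquote(encoded) == vers
```

The result is valid but hard to read, which is the "ugliness" mentioned above.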

@pombredanne
Member Author

@david-a-wheeler @copernico @joshbressers @sbs2001 @Hritik14 @bwillis @coderpatros @jhutchings1 @brianf @jbmaillet ... ping... you all have been involved in the discussions that led to this. Your feedback is badly needed.

@pombredanne
Member Author

@kerberosmansour @johnmod3 @erosb you had chimed in on this topic too. Your feedback is welcomed!

This may be a problem in some cases.
Best is to keep the version as-is even in canonical form.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@joshbressers

This feels like a good idea to me

I like that we can specify explicit versions that are affected or not affected. There will be instances where trying to list ranges will be harder than just listing the specific affected versions.

Long term I imagine we will want to keep a catalog of known versioning-scheme identifiers. I assume "semver" will end up being the default if no other ecosystem fits (should it be? I currently think yes, but I've only thought about it for a few minutes). In my mind I would compare this catalog of identifiers to how SPDX has a list of known licenses. If you want to add a new one, you can submit an issue or PR and discuss it.

I don't have time right now to do this (I might at the end of the month if nobody gets to it first), but I think putting an Examples section at the bottom could make understanding this easier for casual readers.

@jhutchings1
Contributor

Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers.

cc: @KateCatlin @rschultheis @andrewbredow @reiddraper


Each ``<version-constraint>`` of this pipe-separated list can be either a
single constraint or a list of constraints separated in turn by an comma "," as
in ``1.2.3|>=2.0.0,<5.0.0``.
Contributor

May I suggest a more OSV/CVE 5.0-like approach here? OSV used to have a similar approach to this (using a disjunctive normal form), but we realised there was a much more generalisable and simpler way to do this as we were discussing it in depth with the CVE WG.

This would constrain the way these can be specified a little, leading to simpler evaluation, less potential human error and with no loss of expressiveness.

The additional restrictions would be:

  • Only >= and <, or =/no operator are allowed. (The first two operators would correspond to "introduced" and "fixed" in OSV).
  • Only a single constraint per pipe ("|") item. The meaning of pipe ("|") would also change, in that they just become a separator for the different constraints / points.

I.e. instead of 1.2.3|>=2.0.0,<5.0, we just have 1.2.3|>=2.0.0|<5.0.

To evaluate them, these essentially become sign posts / events on a version "timeline" (which could instead be a tree because of branching). The algorithm to evaluate this would be almost identical to how CVE 5.0 / OSV does it:

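# e.g. v == '2.0.0'
# constraints = ['1.2.3', '>=2.0.0', '<5.0.0']

def parse(constraint):
    # Split a constraint into (comparator, version); no operator means "=".
    for op in ('>=', '<', '='):
        if constraint.startswith(op):
            return op, constraint[len(op):]
    return '=', constraint

def key(version):
    # Assumes dotted-integer versions; each scheme needs its own comparison.
    return tuple(int(part) for part in version.split('.'))

def is_v_affected(v, constraints):
    affected = False
    # Walk the constraints in version order, like sign posts on a timeline.
    for op, version in sorted(map(parse, constraints), key=lambda c: key(c[1])):
        if op == '=' and key(v) == key(version):
            return True
        if key(v) >= key(version) and op == '>=':
            affected = True
        if key(v) >= key(version) and op == '<':
            affected = False
    return affected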

Why would we want this instead?

  1. A more restrictive way of specifying ranges means more consistency and less chance of human error when writing these ranges (e.g. there's no way to write overlapping ranges). They're also much easier to write evaluators for. This should still be just as expressive.
  2. <version entries have the side effect of pointing to versions that contain fixes. So, they are more descriptive to users who want to know what version to update to.
  3. This also generalizes well to specifying affected parts of complicated git commit trees that are not possible to express using a disjunctive normal form with the same semantics.

@pombredanne WDYT?

Member Author

@oliverchang re: WDYT?

I think this is effing brilliant and much better... I cannot find a case that would not fit for vulnerabilities or dependencies. This effectively becomes much simpler to implement in the general case. It also provides a natural "canonical" sort order, which is simply that of the versions. It may make the range conversion code of some range notation more complex in some complex edge cases: complex code for complex cases is perfectly OK in my book.

There is only one corner case where I wonder if adding a != (not equal operator) could help:

  • I can express this as an interval by adding a gap in an interval, e.g., if in >=2 | <6 I want to exclude version 5, then I can rewrite >=2|<6 as >=2 | <5 | >=5.1 | <6.
  • This may require knowing the "smallest" next version after 5, and may not be strictly equivalent to !=5.
  • I reckon this is not a use case that could happen for vulnerabilities and bugs, yet it is somewhat common for dependencies, as in: use any version 2 or later, up to and excluding version 6, except for version 5 that I know is buggy.
  • I am split, as >=2 | !=5.1 | <6 may feel a bit simpler... yet these cases may not be exceptional enough to warrant their own extra operator... i.e. less is more, and >= and <, or =/no operators may be enough?
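
A small sketch of the second bullet, assuming dotted-integer versions compared as integer tuples (real schemes differ) and a hypothetical 5.0.1 release: the gap rewrite `>=2|<5|>=5.1|<6` excludes 5, but it also excludes 5.0.1, which `!=5` would keep.

```python
def key(version):
    # Assumption: versions are dotted integers, compared as tuples.
    return tuple(int(part) for part in version.split("."))


def in_range(version, intervals):
    # Each interval is a (">=" bound, "<" bound) pair.
    v = key(version)
    return any(key(low) <= v < key(high) for low, high in intervals)


# ">=2|<5|>=5.1|<6" written as two half-open intervals.
gapped = [("2", "5"), ("5.1", "6")]

for candidate in ("4.9", "5", "5.0.1", "5.1"):
    print(candidate, in_range(candidate, gapped))
```

So the rewrite is only equivalent to `!=5` if no release falls between 5 and 5.1 in the scheme at hand.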

Contributor

@pombredanne I'm very glad you like this!

It may make the range conversion code of some range notation more complex in some complex edge cases: complex code for complex cases is perfectly OK in my book.

Cannot agree more :)

There is only one corner case where I wonder if adding a != (not equal operator) could help:

I agree that less is more! It's a corner case that can still be expressed using >=, < and = alone. I would expect most ranges to represent different release branches (e.g. >= 2 | < 2.2 | >= 3 | < 3.0.3 | >= 6 | < 6.1), where the gaps would be naturally encoded by omission.

It's also easier to add operators later if it turns out != is really needed in a lot of cases, rather than removing it.

Member Author

It's also easier to add operators later if it turns out != is really needed in a lot of cases, rather than removing it.

That's an excellent point.

Member Author

@oliverchang Your feedback is mucho welcomed on the latest push. Thank you ++ for this.

@pombredanne
Member Author

@joshbressers wrote:

This feels like a good idea to me

Thank you for the kind encouragements!

I like that we can specify explicit versions that are affected or not affected. There will be instances where trying to list ranges will be harder than just listing the specific affected versions.

Long term I imagine we will want to keep a catalog of known versioning-scheme identifiers. I assume "semver" will end up being the default if no other ecosystem fits (should it be? I currently think yes, but I've only thought about it for a few minutes). In my mind I would compare this catalog of identifiers to how SPDX has a list of known licenses. If you want to add a new one, you can submit an issue or PR and discuss it.

Exactly. Though I came to appreciate that "semver" may be more like an unreachable dream than a reality. For instance, Ruby's semver is not semver. Composer's semver is not semver, and I am not even sure that node-semver is strictly semver either. Also, semver has no notation for ranges. This spec could help there.

I don't have time right now to do this (I might at the end of the month if nobody gets to it first), but I think putting an Examples section at the bottom could make understanding this easier for casual readers.

Good point: I will add a bunch of examples!

@pombredanne
Member Author

@jhutchings1 re:

Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers.

Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax

@jbmaillet

@jhutchings1 re:

Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers.

Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax

Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time: maybe here?


@tschmidtb51 tschmidtb51 left a comment


I suggest adding a special versioning type called all to provide at least a way to convey that all versions of this package are meant. This is especially useful as tools probably won't implement all custom versioning schemes. With all there is no need to define a separate custom versioning scheme if you only want to state that all versions match.

By convention the versioning scheme **should** be the same as the ``Package URL``
package type for a given package ecosystem. It is OK to have other schemes
beyond the purl type. A scheme could be specific to a single package name.


I suggest to add an explicit versioning-scheme which only expresses "all versions". The benefit of that is, that it is easily parsable (if you come across that type, any given version is in the range) and can be used even if the product itself uses an unknown / private / undefined versioning-scheme.

Suggested change
The special ``<versioning-scheme>`` ``all`` is used to convey a universal and simple form of all versions. It can only be combined with the `*` comparator. Therefore, the only valid value is: `vers:all/*`

Member Author

I suggest to add an explicit versioning-scheme which only expresses "all versions". The benefit of that is, that it is easily parsable (if you come across that type, any given version is in the range) and can be used even if the product itself uses an unknown / private / undefined versioning-scheme.

@tschmidtb51 we could also make the * or all a special value as in vers:all or vers:* ?


As you know, I like regex for quick checks. So I would suggest a quick-check regex here: ^vers:[a-z]+/(\*|([!<=>].+))$ Therefore, my natural suggestion was vers:all/*... I would not use vers:* as this contradicts the rules given.
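
A quick way to try that check in Python, assuming the star in the suggested pattern is meant literally and is therefore escaped. As written, the pattern only accepts constraints that start with an operator, so a bare version such as `vers:pypi/1.2.3` is rejected; whether that is intended is not settled in this thread.

```python
import re

# Assumption: the "*" alternative is a literal star, hence the backslash.
QUICK_CHECK = re.compile(r"^vers:[a-z]+/(\*|([!<=>].+))$")


def looks_like_vers(text):
    # Cheap syntactic pre-check only; it does not validate the constraints.
    return QUICK_CHECK.match(text) is not None


for sample in ("vers:all/*", "vers:pypi/>=4.0.0|<4.0.2", "vers:*"):
    print(sample, looks_like_vers(sample))
```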

Member Author

I have been resisting using stars until now... but your arguments and others' arguments make sense.
In addition, I have been diving into the NuGet versioning schemes, where stars are rather special: they are not only used to select a subrange of versions, but they also change how to interpret this range, from the lowest version (the default when there is no operator) to the highest compatible version when there is a star in a version segment... We may need to accept stars in versions for some schemes that demand it after all, and if so we could also accept these for the generic case of "all versions".


See also #267


The special star "*" comparator matches any version. It must be used **alone**
exclusive of any other constraint and must not be followed by a version. For
example "vers:deb/\*" represent all the versions of a Debian package. This


To avoid confusion, I suggest to write:

Suggested change
example "vers:deb/\*" represent all the versions of a Debian package. This
example `vers:deb/*` represent all the versions of a Debian package. This

``pypi`` with PEP440.


The special star "*" comparator matches any version. It must be used **alone**


To avoid Markdown confusion, I suggest:

Suggested change
The special star "*" comparator matches any version. It must be used **alone**
The special star `*` comparator matches any version. It must be used **alone**

Comment on lines +424 to +425
- The left hand side is the <versioning-scheme> that must be lowercase.
Tools should validate that the <versioning-scheme> is a known scheme.


Suggested change
- The left hand side is the <versioning-scheme> that must be lowercase.
Tools should validate that the <versioning-scheme> is a known scheme.
- The left hand side is the <versioning-scheme> that must be lowercase.
Tools should validate that the <versioning-scheme> is a known scheme.
If the ``<versioning-scheme>`` is equal to ``all``, all versions are a match -
independent of the version scheme the product would normally rely on.
Parsing is done and no further processing is needed for this ``vers``. A tool
should report an error if the ``<version-constraint>`` is not equal to `*`.

- a versioning scheme
- a list of constraints of comparator and version, sorted by version
and where each version occurs only once in any constraint.


Suggested change
- If the versioning scheme is ``all``, then the "tested version" is IN the range. Check is finished.

``pypi`` with PEP440.


The special star "*" comparator matches any version. It must be used **alone**

A constraint with a * comparator is not a constraint at all. IMO a better way to indicate no constraints is to not have a constraint in the given version range.


It's been useful to me to use * to mean "all versions of the package", and leaving out the constraint to mean either the abstract package (i.e. the "concept" of zlib, for lack of a better term), or in cases where the version isn't known.

For example, in OSS Gadget, we can reference packages like:

  pkg:cpan/Apache-ACEProxy      The latest version of Apache::ACEProxy (via cpan.org)
  pkg:cran/ACNE@0.8.0           Version 0.8.0 of ACNE (via cran.r-project.org)
  pkg:gem/rubytree@*            All versions of RubyTree (via rubygems.org)

Of course, the convention that "no version specifier means the latest" was one that we made up for our use case, but it's been convenient.

- Gentoo https://wiki.gentoo.org/wiki/Version_specifier
- Alpine linux https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/master/src/version.c

- Arch Linux https://wiki.archlinux.org/title/PKGBUILD#Dependencies use its
Contributor

The man pages explain the version format a bit better than the PKGBUILD wiki page.

https://man.archlinux.org/man/vercmp.8#DESCRIPTION

@jhutchings1
Contributor

@pombredanne Just wanted to check in and see if this one's getting near acceptance/merging? Thanks!

@pombredanne
Member Author

@jhutchings1 yes, this is mostly ready to merge. There are a few points that I would like to clarify and I will do this by Monday! For instance, there are some specifics wrt. NuGet handling of version ranges that may require introducing a "*" in the syntax, as they may not otherwise be easily resolvable to a simplified version expression. And a few minor points that emerged from practical experimentation.

@stevespringett
Member

@pombredanne what is the status of this? The CycloneDX docs on version ranges link to this spec using the most-merged URL.

Most of these notations can be converted without loss to the ``vers`` notation.
Furthermore these notations typically assume a well defined version string
structure specific to their package ecosystem and are not reusable in another
ecosystem that would not use the exact same version conventions.


I see the argument, but pessimistic operators are widely supported, because they are useful in a large number of circumstances. Pessimistic operators would make many version statements much simpler and clearer.

I recommend adding a pessimistic operator, and just defining it in terms of the other operators. It's true that it won't be enough in some cases, but that's fine, then the user has to do something else. But let's not avoid handling a very common case.

Please add something like this:

"The pessimistic operator '~>' is a shorthand; "~>version" is shorthand for ">=version|<higher-version". The higher-version is computed by taking version, removing the last "." and all after it, then incrementing the number at the end of what remains (a number must be present). E.g., "~>2.3.0" is equivalent to ">=2.3.0|<2.4"."

Contributor

I don't believe this would fit with vers's goal of being generalisable/universal to all ecosystems. '~>' as a shorthand for ">=2.3.0|<2.4" would only make sense when the versioning scheme is well defined (e.g. strict SemVer) -- here the goal is to accurately represent ranges for every possible versioning scheme/ecosystem out there, which may have drastically different rules around what "patch version" means.
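
For reference, the '~>' expansion rule proposed in this thread could be sketched as follows. The function name is hypothetical, and the sketch assumes the segment to increment is a plain integer, which is exactly the scheme-specific limitation raised in the reply above. It illustrates the proposal and is not part of the spec.

```python
def expand_pessimistic(version):
    # Drop the last "." and everything after it.
    head, dot, _ = version.rpartition(".")
    if not dot:
        raise ValueError("need at least two dot-separated segments")
    # Increment the number at the end of what remains.
    prefix, sep, last = head.rpartition(".")
    if not last.isdecimal():
        raise ValueError("segment to increment must be a number")
    upper = f"{prefix}{sep}{int(last) + 1}"
    return f">={version}|<{upper}"


print(expand_pessimistic("2.3.0"))  # >=2.3.0|<2.4
```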


- "=" is implied when used to enumerate vulnerable versions
- ">=" (greater or equal) is for the version that introduces a vulnerability
- "<" (lesser) is for the version that fixes a vulnerability
Contributor

Note: OSV now supports the equivalent of a "<=", and this brings it to parity with CVE 5.0 (which also does not support >).

We still have strong objections against >, because it can lead to very misleading ranges (see ossf/osv-schema#31 (comment)). GitHub also needed this initially, but they were able to remove this completely from their DB and no longer need this: github/advisory-database#19 (comment)

Comment on lines +640 to +642
- **generic**: a generic version comparison algorithm (which will be specified
later, likely based on a split on any wholly alpha or wholly numeric segments
and dealing with digit and string comparisons, like is done in libversion)

I suggest to split the generic one into 3 types:

  • datetime: The value is interpreted as a timestamp according to RFC3339 and ISO 8601 (for simplicity's sake just allow YYYY-MM-DDThh:mm:ss[.ffffff[Z|[+-]hh:mm]]). This is especially useful if you want to communicate something that resides in the cloud (like SaaS), where it is hard to determine the version of the running software, or to communicate production dates.
  • string: A simple string comparison following the "usual" string sorting rules. (This currently does not seem to be covered: as soon as the string contains a digit it would be split.)
  • intdot: Split at . into integers, ignore strings. Compare those by groups. That would cover 80% of the versions currently around... It includes 4.2 as well as 8.7.190.182.919.
  • generic: Split at a set of delimiters (e.g. .,;:#+-). Compare those by groups but interpret well-known strings like alpha, beta as modifiers to the group before.
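
As an illustration of the proposed `intdot` type, a minimal sketch might look like this. The helper name is hypothetical, and treating "ignore strings" as "skip non-numeric segments" is an assumption about the intended behaviour.

```python
def intdot_key(version):
    # Split at "." and keep only the integer segments.
    return tuple(int(part) for part in version.split(".") if part.isdecimal())


versions = ["8.7.190.182.919", "4.2", "4.10"]
print(sorted(versions, key=intdot_key))
# ['4.2', '4.10', '8.7.190.182.919']
```

Comparing integer tuples rather than strings is what puts 4.10 after 4.2.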


@pombredanne What needs to be done to get these things integrated?


As discussed - we should remove generic and introduce an unknown. Reasoning: generic sounds like 'a magic default that fixes everything'. However, that is hard for implementers ('What do I need to implement exactly?') as well as consumers of tools ('What exactly was implemented? Why do I get different results with different implementations?'). The unknown clearly states that the version semantics are not known - and therefore, it can be computed by machines. Users SHOULD avoid using it.


@jhutchings1
Contributor

@pombredanne 👋🏻 Just wanted to check: how close are we to merging this change?

@jhutchings1
Contributor

@pombredanne Just checking in, are you closer to accepting this yet?

@nscuro

nscuro commented Aug 21, 2023

Are there any blockers for getting this merged? Conversely, how can we help move it forward?

@pombredanne
Member Author

@jhutchings1 @nscuro I will tackle this next week.

@dlorenc

dlorenc commented Oct 13, 2023

Is this one still planned?

@pombredanne pombredanne mentioned this pull request Nov 6, 2023
and dealing with digit and string comparisons, like is done in libversion)


TODO: add Rust, composer and archlinux, nginx, tomcat, apache.


Also add semver with the rules from https://semver.org/ => see #264


These are a few known versioning schemes for some common Package URL
`types` (aka. ``ecosystem``).


Also mention here the true and false cases: #267

- Simplify the list of constraints.


Version constraints simplification


I think we should only allow the canonical form - simplification and transformation introduce additional places for mistakes...

universal way to compare two versions, even though the concepts that exist in
most version range notations are similar.

Each package type or ecosystem may define their own ranges notation and version


It's even less consistent. In Python, there are multiple package managers, and not all package managers express versions the same way, even though they are the same ecosystem. https://python-poetry.org/docs/dependency-specification/#version-constraints

For Java, it's already noted that Gradle is similar to Maven, but there's also Ivy, which can load dependencies from Maven, but has its own version syntax. It also supports adding custom version constraints. https://ant.apache.org/ivy/history/latest-milestone/ivyfile/dependency.html

Apache TomEE 7.0.0-M1 - 7.0.7, Apache TomEE 1.0.0-beta1 - 1.7.5."

- a normalized version range spec is:
``vers:tomee/>=1.0.0-beta1|<=1.7.5|>=7.0.0-M1|<=7.0.7|>=7.1.0|<=7.1.2|>=8.0.0-M1|<=8.0.1``


If TomEE is a Maven package, shouldn't this say vers:maven?


By convention the versioning scheme **should** be the same as the ``Package URL``
package type for a given package ecosystem. It is OK to have other schemes
beyond the purl type. A scheme could be specific to a single package name.


If a scheme is specific to a single package name, how is the package name qualified to handle ambiguity if multiple packages with the same name appear in other ecosystems?

attempting to avoid the creation of empty or impossible version ranges.

- Spaces are not significant and removed in a canonical form. For example
"<1.2.3|>=2.0" and " < 1.2. 3 | > = 2 . 0" are equivalent.
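
As an illustration of the whitespace rule quoted above, a minimal Python sketch could strip every whitespace character before any further parsing. The function name `strip_spaces` is hypothetical and is not taken from the spec or from univers.

```python
def strip_spaces(constraints: str) -> str:
    """Remove all whitespace from a list of version constraints."""
    # str.split() without arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return "".join(constraints.split())


print(strip_spaces(" < 1.2. 3 | > = 2 . 0"))  # prints <1.2.3|>=2.0
```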


Aren't both of these ranges unsatisfiable? I'm worried about using '|' as a constraint separator because '|' is commonly used for "or" and here it is being used for "and".

I think the proposed scheme is too simplistic for dependency resolution. Testing whether a version is affected by a vulnerability where the vulnerable packages are specified using multiple vers: ranges is probably okay, but consider what happens if you apply boolean operations to two version ranges to exclude ranges. For example, the Maven documentation gives the range (,1.0],[1.2,) (less than or equal to 1.0 or greater than or equal to 1.2), which is unrepresentable in the current spec because it only supports "and" constraints. It's similar to the above vers:gem/>=2.2.0|!= 2.2.1|<2.3.0 example, but with 2.2.1 replaced by a range.

Comment on lines +371 to +373
- If a ``version`` in a ``<version-constraint>`` contains separator or
comparator characters (i.e. ``><=!*|``), it must be quoted using the URL
quoting rules. This should be rare in practice.


This should be more explicit about using percent encoding to escape a specific set of characters that are ambiguous. From experience with PURL:

  1. There are two different URL specs which specify slightly different sets of characters that must be encoded.
  2. Many readers don't understand that URLs encode different sets of characters in different parts of the URL, and that percent encoding is not the same as application/x-www-form-urlencoded encoding, which has special rules about spaces and plus signs.

- Split the specifier from left once on a slash "/".

- The left hand side is the <versioning-scheme> that must be lowercase.
Tools should validate that the <versioning-scheme> is a known scheme.


Maybe it should be "Tools MAY validate". Not every use case requires the same level of understanding. If you have a list of different ranges in different contexts, you may want to parse ranges you don't understand and be able to see the scheme and serialize the range back out (for example if it's part of a list containing supported and unsupported schemes and you want to preserve the unsupported ranges). It seems likely there will be libraries that have partial support, either by implementing only one of the use cases described in this spec, or by not supporting certain version expressions, in which case it seems like an easy extension to have a library that supports nothing but basic introspection and reserialization, which can be supported for all schemes.
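
A minimal Python sketch of the "basic introspection and reserialization" use case described above might look like the following. The names `RawVers`, `parse_raw` and `serialize_raw` are hypothetical. The sketch assumes the `vers:` prefix and the `|` separator seen in the examples in this PR, and it deliberately does not check the scheme against a list of known schemes or interpret the versions.

```python
from typing import NamedTuple


class RawVers(NamedTuple):
    scheme: str
    constraints: list  # constraint strings, kept verbatim


def parse_raw(specifier: str) -> RawVers:
    """Split a vers string into its scheme and raw constraints."""
    prefix, sep, rest = specifier.partition(":")
    if not sep or prefix != "vers":
        raise ValueError(f"not a vers string: {specifier!r}")
    # Split from the left, once, on the first slash.
    scheme, sep, constraints = rest.partition("/")
    if not sep or not scheme or scheme != scheme.lower():
        raise ValueError(f"missing or non-lowercase scheme: {specifier!r}")
    return RawVers(scheme, constraints.split("|"))


def serialize_raw(raw: RawVers) -> str:
    """Write the range back out without touching the constraints."""
    return f"vers:{raw.scheme}/{'|'.join(raw.constraints)}"
```

Because nothing scheme-specific happens here, a tool could round-trip ranges for schemes it does not support while still exposing the scheme name.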

Comment on lines +455 to +456
- If the version contains a percent "%" character, apply URL quoting rules
to unquote this string.


This should be just "percent decode the string". If the string doesn't contain "%", decoding it will yield the same string.

As written, based on experience with PURL, this is dangerous. If an implementation incorrectly uses something like WebUtility.UrlDecode for percent decoding, '+' will change to ' ' (even though the documentation currently says nothing about it). If it only happens when there is also a '%' sign, it will be much more difficult to detect.
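
The same pitfall can be shown in Python, where `urllib.parse.unquote` performs plain percent decoding and `urllib.parse.unquote_plus` additionally applies the form-encoding rule for `+`. The version string below is made up for illustration.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical encoded version containing both "+" and an encoded "|".
encoded = "1.0+build%7C1"

# Percent decoding only: "%7C" becomes "|" and "+" is left untouched.
percent_decoded = unquote(encoded)

# Form decoding: "+" is also turned into a space, corrupting the version.
form_decoded = unquote_plus(encoded)

# Decoding a string without "%" is a no-op, so no special case is needed.
unchanged = unquote("1.0+build")
```

Here `percent_decoded` is `1.0+build|1`, while `form_decoded` is `1.0 build|1`.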

scheme-specified version comparison and ordering.


Some of the known versioning schemes


Like the PURL spec, these schemes should be separated into their own document.

Based on experience with PURL, I think the spec should be written with the intention that not all implementations support all schemes, and implementations should not try to support schemes without understanding them unless this spec is accompanied by a comprehensive and well researched test suite. For PURL, there are a lot of cases that the test suite does not cover, and the test suite in at least one case asserts incorrect behavior because the ecosystem behavior wasn't well understood when it was added to the spec, and because the incorrect behavior is in the test suite it has been propagated into multiple implementations.

Member Author


> Like the PURL spec, these schemes should be separated into their own document.

Good points!

Comment on lines +602 to +603
- **deb**: Debian and Ubuntu https://www.debian.org/doc/debian-policy/ch-relationships.html
The comparators are <<, <=, =, >= and >>.


How will this work? These comparators are not recognized by a vers parser.

Member Author


I should write instead: Debian uses these comparators: <<, <=, =, >= and >>.

Add Java implementation reference by @nscuro

Reference: https://github.com/nscuro/versatile
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

Co-authored-by: Niklas <nscuro@protonmail.com>
`types` (aka. ``ecosystem``).

- **deb**: Debian and Ubuntu https://www.debian.org/doc/debian-policy/ch-relationships.html
The comparators are <<, <=, =, >= and >>.
Member Author


Suggested change
- The comparators are <<, <=, =, >= and >>.
+ Debian uses these comparators: <<, <=, =, >= and >>.
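
One way a tool could bridge the native Debian operators and the vers comparators is a simple translation table, sketched below in Python. The names `DEB_TO_VERS` and `deb_relation_to_vers` are hypothetical, and the mapping is an assumption based on Debian policy, where `<<` means strictly earlier and `>>` means strictly later. It is not something this PR defines.

```python
# Hypothetical mapping from Debian relationship operators to vers comparators.
DEB_TO_VERS = {"<<": "<", "<=": "<=", "=": "=", ">=": ">=", ">>": ">"}


def deb_relation_to_vers(relation: str) -> str:
    """Translate a Debian relation such as '>> 1.2-1' into '>1.2-1'."""
    relation = relation.strip()
    # Try the two-character operators before the single-character "=".
    for deb_op in ("<<", "<=", ">=", ">>", "="):
        if relation.startswith(deb_op):
            version = relation[len(deb_op):].strip()
            return DEB_TO_VERS[deb_op] + version
    raise ValueError(f"no Debian comparator found in {relation!r}")
```

The sketch keeps an explicit `=` and does not attempt any canonicalization of the resulting constraint.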
