Proposal: Add version ranges #93

Open
wants to merge 1 commit into
base: master

david-a-wheeler

Package-URL by itself isn't very good for reporting about
vulnerabilities because it cannot report version ranges.
CPE, by contrast, can report version ranges.

This commit proposes adding version range support to package_URL.
I think node-semver is pretty common, so I referenced that; other
options are possible too. Note that even if you don't use semver,
the version range still works, since hierarchical numbers are very common.

Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>

@stevespringett
Member

This PR does not address the issues identified in #66 and #84.

@david-a-wheeler
Author

Thanks for the cross-reference!

I think this does handle Debian OR. Doesn't "||" provide the same functionality? If not, please enlighten me!

Handling epochs is a bigger issue. But epochs work very much like a "hidden 0. prefix" on a typical version number, and people who follow SemVer won't have epochs. So I think this could be easily modified to support that case. Something like "Versions that begin with the form NUMBER: are presumed to have an epoch of that number; if unstated the epoch is 0. Epochs act as the topmost number in a version."
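
As a rough illustration of the epoch rule suggested above, here is a minimal Python sketch. It assumes purely numeric, dot-separated versions with an optional `NUMBER:` prefix; the function name `parse_version` is hypothetical, and real Debian or RPM versions (letters, `~`, revision suffixes) would need more than this.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'EPOCH:N.N.N' or 'N.N.N' into a tuple whose first item is the epoch."""
    epoch_part, separator, rest = text.partition(":")
    if separator:
        epoch = int(epoch_part)
    else:
        # No "NUMBER:" prefix, so the epoch is presumed to be 0.
        epoch, rest = 0, text
    # The epoch acts as the topmost number, so it goes first in the tuple.
    return (epoch, *(int(piece) for piece in rest.split(".")))


# Tuples compare element by element, so the epoch dominates the comparison.
newer_epoch_wins = parse_version("1:0.9") > parse_version("2.5.1")
```

Because the epoch is simply the first tuple element, a version without an epoch compares exactly as it would have before, just with a leading 0.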

I'm absolutely not married to the node-semver range syntax; others would be fine. However, I think it's important to have some syntax for it, and I thought it'd be easier to start with a specific concrete proposal.

I did pick node-semver intentionally. One reason is that it can express some fairly complex situations. For example, it can relatively easily handle the "multiple streams of versions" case.
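
The "multiple streams" case can be sketched with a deliberately small subset of node-semver range syntax: comparators `>=`, `<=`, `>`, `<`, `=` joined by spaces (AND), and comparator sets joined by `||` (OR). This is not the node-semver implementation; carets, tildes, hyphen ranges, x-ranges, pre-release tags, and spaces between an operator and its version are all left out, and the names `satisfies`, `_key`, and `_matches` are invented for the example.

```python
import operator
import re

_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
# One comparator: an optional operator followed by a dotted numeric version.
_COMPARATOR = re.compile(r"^(>=|<=|>|<|=)?(\d+(?:\.\d+)*)$")


def _key(version: str) -> tuple:
    """Numeric sort key; trailing zeros are dropped so 1.2 equals 1.2.0."""
    parts = [int(piece) for piece in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def _matches(target: tuple, comparator: str) -> bool:
    match = _COMPARATOR.match(comparator)
    if match is None:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    compare = _OPERATORS[match.group(1) or "="]  # a bare version means "="
    return compare(target, _key(match.group(2)))


def satisfies(version: str, range_expr: str) -> bool:
    """True if the version matches every comparator of at least one '||' branch."""
    target = _key(version)
    for branch in range_expr.split("||"):
        comparators = branch.split()
        # In this sketch an empty branch matches nothing.
        if comparators and all(_matches(target, c) for c in comparators):
            return True
    return False


# Two release streams, each affected up to (but excluding) its own fix.
AFFECTED = ">=1.1.3 <1.2.0 || >=2.0.0 <2.4.1"
```

With this range, `satisfies("1.1.5", AFFECTED)` and `satisfies("2.4.0", AFFECTED)` hold, while `1.2.0` and `2.4.1` fall outside both branches.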

@stevespringett
Member

In addition to the above, one additional aspect that will need to be addressed is the default repository url for each PURL type. Introducing a range may alter where a component is located, so we'll need to specify what that means.

E.g.

  • Does a range eliminate the default repository url entirely?
  • Does a range continue to have a default repository url, but a list of alternatives may be specified?
  • Does the range only work if the location does not change and a new range has to be used for those where the repo url is different?
  • Something else?

@david-a-wheeler
Author

@stevespringett - I think that's backwards. Instead, the version numbers necessarily only apply to the given repo URL (default URL if there's a default and no other URL is given, or a specified URL if one is specified).

For example, there are many different programs all called "zlib"; some of them have the same underlying origin. If I specify version X.Y.Z, that version number would only apply to the repo URL given. If there's a different repo URL, there's no reason to believe that version X.Y.Z refers to the same thing.

@johnmod3

johnmod3 commented Oct 1, 2020

With NIST moving away from CPE to SWID, how does this impact PURL?

@david-a-wheeler
Author

david-a-wheeler commented Oct 1, 2020

I believe SWID has two big problems: (1) it's a huge XML file, making processing it very inconvenient, and (2) it can't refer to version ranges. This makes moving from CPEs hard. A big advantage of SWID over package-URL is that SWID can refer to closed source software where no VCS or download source is publicly visible.

If pURL supports version ranges and has a mechanism for referring to software home pages (or something else that supports closed source software), then package-URL suddenly has big advantages over SWID for many purposes.

@johnmod3

johnmod3 commented Oct 1, 2020

Hi David, totally agree, but NIST seems to be hell-bent on going with SWID (even though Microsoft doesn't seem to use it any longer). JC has way more context.

@stevespringett
Member

David, I think the zlib example is incomplete. To my knowledge, the zlib project does not have a default distribution repository. However, distributions that package zlib and distribute it to Conan, Buckaroo, Anaconda, Debian, Red Hat, etc., will have a default repository.

I think a cleaner example (and one more representative of the majority of software packages) is a project that publishes its package to a repository when it is released. Typically, each ecosystem will have a default package repository: npm for Node.js, Maven Central for Java/Maven projects, etc.

My prior point was that if you introduce version ranges, we need to also account for how those package versions can be resolved.

For example, the https://github.com/everit-org/json-schema/ project previously distributed all their compiled packages to Maven Central. However, Maven Central lacks certain security controls and transparency, so the json-schema project moved the default repository for the project to use Jitpack instead. Newer versions of json-schema are not available on Maven Central. Therefore, version ranges need to account for these scenarios.

The first sentence of the PURL spec is:

> A purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages.

So location information needs to be addressed in version ranges when the default repository moves between versions.

@stevespringett
Member

Also FYI, I chatted with @pombredanne about trying to represent SWID in PURL, and there seems to be interest in that.

Also regarding SWID, the attributes of the SoftwareIdentity field can be represented in PURL and can be used for vulnerability use cases, whenever the NVD gets around to supporting it. I have a few different ideas about how that can work and will be submitting a PR; however, I've been holding off until #79 is merged.

@david-a-wheeler
Author

I agree that this scenario needs to be addressed. However, I think it would be cleaner for people who need to support ranges to also support multiple package URLs separated by whitespace. Then you could have a package URL with a version range that refers to a particular location. This also cleanly deals with the case where version numbers are different for different locations, a common challenge. It also deals with the case where many different packages share an underlying problem; for example, a vulnerability in a specification can affect multiple packages, each with many affected versions.

I think that would be much cleaner than trying to link version numbers and repository locations within a single package URL.

@kerberosmansour

I do agree that it should have ranges. I added some reasoning behind it here, on one of the OSSF issues, addressed to @pombredanne.

@david-a-wheeler
Author

As I noted earlier, I'm not married to this specific version syntax. There are several to choose from.

Here's another:
https://raw.githubusercontent.com/CVEProject/automation-working-group/master/cve_json_schema/v5.x_discuss/cve513.schema

I think the key is to be able to support a range of version numbers. The "special" case is epoch numbers, which many systems can't handle, but as I noted earlier I think those are relatively easy to add.

@pombredanne
Member

@david-a-wheeler first thank you++ for this!

I am not too much in favor of overloading the version with a range syntax, as I feel it will be a source of confusion. I would rather craft a new qualifier for that instead. What do you think?
There are also considerations wrt. the many versioning schemes and how versions can be compared, which I collected in #84 (comment).

@pombredanne
Member

@stevespringett re:

> My prior point was that if you introduce version ranges, we need to also account for how those package versions can be resolved.

It makes sense, and in particular how they are compared matters quite a bit.

> For example, the https://github.com/everit-org/json-schema/ project previously distributed all their compiled packages to Maven Central. However, Maven Central lacks certain security controls and transparency, so the json-schema project moved the default repository for the project to use Jitpack instead. Newer versions of json-schema are not available on Maven Central. Therefore, version ranges need to account for these scenarios.
>
> The first sentence of the PURL spec is:
>
> > A purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages.
>
> So location information needs to be addressed in version ranges when the default repository moves between versions.

IMHO, in this case location information would come from specifying a repository_url=https://jitpack.io qualifier for the package versions that are hosted there. Dealing with this specific package and its move from one place to another would have to be handled specially. It looks like https://github.com/everit-org/json-schema is exceptional enough that it would not deserve a generic treatment.
And actually based on their home page we really have two Package URLs so this is an even better case:

  • newer pkg:maven/com.github.everit-org.json-schema/org.everit.json.schema@1.12.1?repository_url=https://jitpack.io
  • older pkg:maven/com.github.erosb/everit-json-schema-jdk6@1.9.2 published at Maven Central
    (and BTW kudos to @erosb for doing this to avoid confusion)
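
To show how the `repository_url` qualifier separates these two Package URLs, here is a minimal Python sketch of a purl splitter. It is not a full implementation of the purl spec (no percent-decoding of the path, no per-type normalization), and the names `parse_purl`, `repository_of`, and `DEFAULT_REPOSITORIES`, as well as the default Maven URL listed there, are assumptions made for the example.

```python
from urllib.parse import parse_qsl

# Assumed default repository per purl type (illustrative, not exhaustive).
DEFAULT_REPOSITORIES = {"maven": "https://repo.maven.apache.org/maven2"}


def parse_purl(purl: str) -> dict:
    """Split pkg:type/namespace/name@version?qualifiers#subpath into parts."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    # Split on the last "@" so the version is whatever follows it.
    path, at_sign, version = rest.rpartition("@")
    if not at_sign:
        path, version = rest, ""
    segments = path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError(f"missing type or name: {purl!r}")
    return {
        "type": segments[0].lower(),
        "namespace": "/".join(segments[1:-1]),
        "name": segments[-1],
        "version": version,
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath,
    }


def repository_of(parsed: dict):
    """Explicit repository_url qualifier if present, else the assumed default."""
    default = DEFAULT_REPOSITORIES.get(parsed["type"])
    return parsed["qualifiers"].get("repository_url", default)


newer = parse_purl(
    "pkg:maven/com.github.everit-org.json-schema/org.everit.json.schema"
    "@1.12.1?repository_url=https://jitpack.io"
)
older = parse_purl("pkg:maven/com.github.erosb/everit-json-schema-jdk6@1.9.2")
```

Under these assumptions, `repository_of(newer)` yields the JitPack URL from the qualifier, while `repository_of(older)` falls back to the default for the `maven` type.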

@erosb

erosb commented Oct 14, 2020

Hello, maven coordinates of the everit-org/json-schema are quite baroque, mostly because I left everit-org (the company) soon after the first release of the json-schema project. So the history is:

  • it is released as org.everit.json:org.everit.json.schema in range 1.0.0 - 1.5.1 on maven central
  • the newer versions (1.6.0 - 1.12.1) are primarily on jitpack
  • versions 1.9.2 - 1.12.1 are available both on jitpack and maven central, but on maven central the groupId:artifactId is com.github.erosb:everit-json-schema (while on jitpack the same releases are available as org.everit.json:org.everit.json.schema)
  • a JDK6-compatible backport of version 1.9.2 was created by a contributor that I released with the coordinates com.github.erosb:everit-json-schema-jdk6:1.9.2

I don't expect any tooling to support it :)
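
The history above can be read as a small table that maps version ranges to hosting locations. The following Python sketch encodes only the repository information from the list (not the changing groupId:artifactId coordinates, and not the JDK6 backport); the names `HISTORY` and `repositories_for` are made up for the illustration, and versions are assumed to be plain dotted numbers.

```python
# (lowest version, highest version, repository), taken from the list above.
HISTORY = [
    ("1.0.0", "1.5.1", "Maven Central"),
    ("1.6.0", "1.12.1", "JitPack"),
    ("1.9.2", "1.12.1", "Maven Central"),
]


def _key(version: str) -> tuple:
    """Numeric sort key for a dotted version such as '1.12.1'."""
    return tuple(int(piece) for piece in version.split("."))


def repositories_for(version: str) -> list:
    """Sorted names of the repositories whose range includes the version."""
    target = _key(version)
    return sorted(
        {repo for low, high, repo in HISTORY if _key(low) <= target <= _key(high)}
    )
```

For instance, `repositories_for("1.7.0")` finds only JitPack, while `repositories_for("1.10.0")` finds both repositories, which is exactly the situation where a single version range cannot imply a single location.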

@kerberosmansour left a comment

I don't see any issues there.

@pombredanne
Member

pombredanne commented Nov 30, 2021

@david-a-wheeler Please see #139 for an extended take on the same topic, but drafted as a separate mini spec.

For example, in a vulnerability report it
may be important to say "versions 1.1.3 through 4.2.6 are vulnerable".
Ranges may be specified following
[node-semver](https://github.com/npm/node-semver)i syntax; note that

I'm in favor of this choice. Makes my life much easier 😂


7 participants