Initial purl draft spec #1

Merged
merged 26 commits into from Nov 22, 2017

@pombredanne
Member

pombredanne commented Nov 11, 2017

For reference, this is the result of a discussion that started here:
nexB/scancode-toolkit#805

Initial puurl draft spec
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Use :// everywhere in examples.
Link: nexB/scancode-toolkit#805

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
README.rst

pombredanne added some commits Nov 13, 2017

Improve purl spec
 * Update spec to use "purl" and package URL, not "puurl"
   based on @sschuberth feedback
 * Re-organize the document in context/problem/solution chapters
 * Refine examples, parsing and construction rules
 * Rename path part to subpath for clarity
 * Document relationship with URL based on @sschuberth feedback
 * Add encoding section
 * Add known types and qualifiers section
 * Add list of candidate types to define
 * Add section for implementation tests
 * Add NuGet and .NET details based on @kasper3 feedback
 * Fix typos

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Improve formatting
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Use proper GitHub org for package-url
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

@pombredanne pombredanne changed the title from Initial puurl draft spec to Initial purl draft spec Nov 13, 2017

@jayfk

jayfk commented Nov 13, 2017

How are package managers handled that have no unique package name?

PyPI treats - and _ as the same character and is case-insensitive.

# resolves to the same package
django-allauth
django_allauth
Django-Allauth
dJAnGo_aLLauTH
Add name normalization for PyPI packages
 * PyPI package names are case-insensitive, and - and _ are the same:
   the name must be normalized.
 * reported by @jayfk

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
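A minimal sketch of the normalization this commit describes, assuming a hypothetical helper `normalize_pypi_name` that applies only the two rules stated here: lowercase the name and treat `_` as `-`. (PEP 503 additionally folds `.` and runs of separators; this sketch deliberately does not.)

```python
def normalize_pypi_name(name: str) -> str:
    """Normalize a PyPI package name for a purl (hypothetical helper).

    Applies only the rules from this thread: names are case-insensitive,
    and "_" is equivalent to "-".
    """
    return name.lower().replace("_", "-")


# All four spellings from @jayfk's example collapse to a single name.
variants = ["django-allauth", "django_allauth", "Django-Allauth", "dJAnGo_aLLauTH"]
print({normalize_pypi_name(v) for v in variants})  # {'django-allauth'}
```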
@pombredanne

Member

pombredanne commented Nov 13, 2017

@jayfk you wrote:

How are package managers handled that have no unique package name? PyPi treats - and _ as the same character and is case insensitive.

Thanks, this is an excellent point. I should have known better! We should therefore specify this for each type. I added this for PyPI in 042f108.

@pombredanne

Member

pombredanne commented Nov 13, 2017

@kasper3 the latest version covers NuGet and .NET

@pombredanne

Member

pombredanne commented Nov 13, 2017

@R2wenD2 @sschuberth @mnonnenmacher @jpopelka @jdaguil @JonoYang @majurg @mjherzog @chinyeungli @tdruez This draft is ready for your review, which is highly valued!

@pombredanne

Member

pombredanne commented Nov 13, 2017

@andrew you wrote

Happy to implement on Libraries.io once the spec is finished 👌

This is ready for your review now, I guess!

@andrew

Member

andrew commented Nov 13, 2017

@pombredanne it's gonna be a busy week for me, don't block on me!

GitHub and Bitbucket user/repo are not case sensitive
 * therefore the name and namespace for these package types must
   be normalized to lowercase
 * reported by @jayfk

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
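A minimal sketch of this rule, assuming a hypothetical helper `normalize_components` and a type set limited to the two types named in the commit (`github`, `bitbucket`); any other type is returned unchanged by this sketch.

```python
from typing import Tuple

# Types whose user/org (namespace) and repo (name) are case-insensitive,
# per the commit above. Limited to the two types it names (an assumption).
CASE_INSENSITIVE_TYPES = {"github", "bitbucket"}


def normalize_components(ptype: str, namespace: str, name: str) -> Tuple[str, str]:
    """Lowercase namespace and name for case-insensitive types (hypothetical helper)."""
    if ptype in CASE_INSENSITIVE_TYPES:
        return namespace.lower(), name.lower()
    # Other types are left as-is in this sketch.
    return namespace, name


print(normalize_components("github", "Package-URL", "Purl-Spec"))  # ('package-url', 'purl-spec')
```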
@ashcrow

Contributor

ashcrow commented Nov 13, 2017

It would be good to have an example with the subpath as well. The way I read it this would work:

 github:package-url/purl-spec@244fd47e07d1004f0aed9c#/everybody/loves/dogs
@ashcrow

Contributor

ashcrow commented Nov 13, 2017

/cc @jasinner

Add GitHub purl example using a subpath
 * reported by @ashcrow

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@ashcrow

Contributor

ashcrow commented Nov 13, 2017

I'm happy to create a golang and/or python parser for this spec once it's finalized.

@pombredanne

Member

pombredanne commented Nov 13, 2017

@ashcrow you wrote:

It would be good to have an example with the subpath as well. The way I read it this would work:

github:package-url/purl-spec@244fd47e07d1004f0aed9c#/everybody/loves/dogs

Yes, thanks! I added it in 20f84b3 with a slight modification: the leading slash is not significant in a subpath. Note also, FWIW, that subpaths may be common for Go.
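To illustrate where the subpath sits and why its leading slash does not matter, here is a rough parsing sketch for this draft's `type:namespace/name@version?qualifiers#subpath` form. `parse_draft_purl` and `DraftPurl` are hypothetical names; the sketch does no validation and no per-type normalization, and it splits the version at the last `@` unless that `@` is leading (as in a scoped npm namespace).

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class DraftPurl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    subpath: Optional[str]


def parse_draft_purl(purl: str) -> DraftPurl:
    """Split a draft-style purl into components (hypothetical sketch, no validation)."""
    # The subpath follows the first '#'; empty segments are dropped, so
    # leading and trailing slashes are not significant.
    rest, _, raw_subpath = purl.partition("#")
    segments = [unquote(s) for s in raw_subpath.split("/") if s]
    subpath = "/".join(segments) or None

    # Qualifiers follow the first '?' as '&'-separated key=value pairs.
    rest, _, query = rest.partition("?")
    qualifiers: Dict[str, str] = {}
    for pair in query.split("&"):
        if pair:
            key, _, value = pair.partition("=")
            qualifiers[key] = unquote(value)

    # The type is everything before the first ':'.
    ptype, sep, rest = rest.partition(":")
    if not sep or not ptype:
        raise ValueError(f"no type in {purl!r}")

    # The version follows the last '@', unless that '@' is leading (an npm scope).
    head, sep, version = rest.rpartition("@")
    if not sep or not head:
        head, version = rest, None
    else:
        version = unquote(version)

    # The name is the last '/'-separated segment; anything before it is the namespace.
    namespace, _, name = head.rpartition("/")
    return DraftPurl(ptype, unquote(namespace) or None, unquote(name),
                     version, qualifiers, subpath)


p = parse_draft_purl("github:package-url/purl-spec@244fd47e07d1004f0aed9c#/everybody/loves/dogs")
print(p.namespace, p.name, p.version, p.subpath)
# package-url purl-spec 244fd47e07d1004f0aed9c everybody/loves/dogs
```

Splitting off `#` and `?` first keeps any `@` or `/` inside the subpath or qualifiers from being mistaken for the version or namespace separators.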

@pombredanne

Member

pombredanne commented Nov 13, 2017

@ashcrow you wrote:

I'm happy to create a golang and/or python parser for this spec once it's finalized.

Let me make you a co-owner of the org together with @andrew

I suggest that we use this convention for implementation repo names: purl-<language>, as in purl-python, purl-go, purl-ruby, purl-js, etc.

@pombredanne

Member

pombredanne commented Nov 13, 2017

@ashcrow @andrew org owner invite sent. @jayfk do you want it too?

Update formatting
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne

Member

pombredanne commented Nov 13, 2017

@ashcrow I have a rough, poorly tested, first-draft toy bit of Python code in scancode here:
https://github.com/nexB/scancode-toolkit/blob/275-streamline-package-manifests-models/src/packagedcode/purl.py

I will move this out to a bona fide repo as a Python starter, I guess either public domain or MIT-licensed with a "Copyright (c) the purl authors" notice. Public domain might be best?

And in terms of contributions a simple DCO https://developercertificate.org/ should be plenty enough IMHO. Do you agree?

@ashcrow

Contributor

ashcrow commented Nov 14, 2017

Cool! I'll take a look.

I'm fine with the MIT license and using the DCO.

@jasinner

jasinner commented Nov 14, 2017

Great idea, an opensource alternative to ISO 19770-2

@pombredanne

Member

pombredanne commented Nov 14, 2017

@jasinner scripsit:

Great idea, an opensource alternative to ISO 19770-2

Interesting indeed, though that is not the intent. IMHO the whole SWID tags ISO spec and tagvault are a terrible, ultra-proprietary, centralized wart to stay clear of ;)

@pombredanne

Member

pombredanne commented Nov 22, 2017

But for now, let me merge this so we can use a simpler, ticket-based review process!

@pombredanne pombredanne merged commit a21a9ec into master Nov 22, 2017

@pombredanne pombredanne deleted the initial-draft branch Nov 22, 2017

@pombredanne

Member

pombredanne commented Nov 22, 2017

So, as a next step, please review the main doc and submit tickets as needed; this will be less messy than a crowded PR. Thank you all for chiming in!

rpm:fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
rpm:opensuse/curl@7.56.1-1.1.?arch=i386&distro=opensuse-tumbleweed
(NB: some checksums are truncated for brevity)

@tgamblin

tgamblin Nov 22, 2017

I see a potential problem here in that this is going to result in a lot of new URL schemes, but URL schemes require a lot of review to get approved. See here. I suspect they're going to reject this spec on that basis alone. The obvious way around it would be to change these to look something like:

purl:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
purl:github/package-url/purl-spec@244fd47e07d1004f0aed9c
...
etc.

Although honestly if this is going to be a thing, and it's always written in a URL, the url part of purl seems redundant. Why not put the focus on the package? e.g.:

pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c
...
etc.

You'll note that pkg: isn't yet claimed 😄.
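The change suggested here is mechanical: the type moves out of the URL scheme and becomes the first path segment, so only one scheme would ever need registering. A sketch of the conversion in both directions, using hypothetical helpers `to_pkg_scheme` and `from_pkg_scheme` (this only illustrates the proposal; it is not what the draft specifies):

```python
def to_pkg_scheme(draft_purl: str) -> str:
    """Rewrite `type:rest` as `pkg:type/rest` (hypothetical helper)."""
    ptype, sep, rest = draft_purl.partition(":")
    if not sep or not ptype:
        raise ValueError(f"no type in {draft_purl!r}")
    return f"pkg:{ptype}/{rest}"


def from_pkg_scheme(purl: str) -> str:
    """Inverse of to_pkg_scheme: `pkg:type/rest` -> `type:rest` (hypothetical helper)."""
    scheme, sep, rest = purl.partition(":")
    ptype, slash, remainder = rest.partition("/")
    if scheme != "pkg" or not sep or not slash or not ptype:
        raise ValueError(f"not a pkg: URL: {purl!r}")
    return f"{ptype}:{remainder}"


for draft in ["rpm:fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25",
              "github:package-url/purl-spec@244fd47e07d1004f0aed9c"]:
    print(to_pkg_scheme(draft))
```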

@pombredanne

pombredanne Nov 27, 2017

Member

@tgamblin I moved this URL and scheme discussion to this ticket: #4

For clarity and simplicity a `purl` is always an ASCII string. To ensure that
there is no ambiguity when parsing a `purl`, separator characters and non-ASCII
characters must be UTF-encoded and then percent-encoded as defined at::

@tgamblin

tgamblin Nov 22, 2017

I'm not sure this is the way to go. URLs can have UTF-8 characters, and they've benefitted countries with different character sets, e.g. Japan and China. I think you should allow UTF-8 to be inclusive. What if there are packages whose users don't ever write their names in roman characters? ASCII seems very limiting there. See here

@pombredanne

pombredanne Nov 27, 2017

Member

I am with you there: the whole point is to avoid anything weird and to ensure that things are properly UTF-8-encoded and then percent-encoded. This means that any character may eventually be used, but non-ASCII characters require a bit more encoding and decoding.
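The two-step encoding discussed here (UTF-8 first, then percent-encoding) is exactly what Python's `urllib.parse.quote` does by default. A minimal sketch, assuming hypothetical helpers `encode_segment`/`decode_segment` and a hypothetical non-Latin package name, showing that any character survives the round trip while the encoded purl segment stays plain ASCII:

```python
from urllib.parse import quote, unquote


def encode_segment(segment: str) -> str:
    """UTF-8-encode, then percent-encode, one purl segment (hypothetical helper).

    safe="" means separators such as '/', '@', '?', '#' and ':' are escaped too.
    """
    return quote(segment, safe="", encoding="utf-8")


def decode_segment(encoded: str) -> str:
    """Reverse of encode_segment."""
    return unquote(encoded, encoding="utf-8")


name = "日本語"  # hypothetical package name written in non-Latin characters
encoded = encode_segment(name)
print(encoded)  # %E6%97%A5%E6%9C%AC%E8%AA%9E
print(decode_segment(encoded) == name)  # True
```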

@tgamblin

tgamblin commented Nov 22, 2017

@pombredanne: looks cool! Two questions and kind of one for @andrew:

  1. The original referenced issue says the goal is to have a "unique" identifier for each package, although the spec doesn't seem to dwell on that too much, which is probably good. Do you have ideas on how to reconcile the same package fetched from multiple sources? e.g., the same Python package might exist in pypi, conda, spack, and system package managers. @andrew: does libraries.io do anything to reconcile the different names?
  2. Is the idea that these URLs will one day be fetchable by curl? How do you see the translation being implemented? I guess libraries.io could provide that as a service?
=======
When tools, APIs and databases process or store multiple package types, it is
difficult to reference the same software package across tools in a uniform way.

@iarna

iarna Nov 22, 2017

Ok… so is the goal to provide a URI that:

  1. Tells you which tool consumes this
    AND
  2. Provides enough information for that tool to consume it

?

Or is the goal only to provide a way to uniquely reference a specific package, not to be able to reconstruct how it's installed?

Who would consume these? Humans? Software? Which software?

@pombredanne

pombredanne Nov 27, 2017

Member

@iarna you wrote:

Ok… so is the goal to provide a URI that:

That tells you which tool consumes this
AND
Provides enough information for that tool to consume it ?

Not exactly which tool consumes it: for instance, a maven purl does not tell you to use Maven, Gradle, SBT or Ivy; an npm purl does not tell you to use the npm tool over yarn or anything else; a pypi purl does not tell you to use pip, easy_install or buildout. What npm or pypi mean here is the whole set of packaging format, manifest, registry, protocols and APIs that a tool uses to deal with such a package type.

As for providing just enough to consume it: well, that's a possibility, though it may be just a side goal.

In #5 I toyed with the hypothetical idea of a "meta" package manager... though frankly this would be just a franken-manager IMHO.

Or is the goal only to provide a way to uniquely reference a specific package, not to be able to reconstruct how its installed?

Who would consume these? Humans? Software? Which software?

I think that the distinguished racketist @jackfirth provides a better summary in #6 than I could have ever written. I intend to steal his words and add them to the spec:

[...] here are the proposed use cases as I understand them:

  • Cross-system metadata indexing to search and monitor packages by metadata like available versions, dependencies, contributors, etc. across multiple package managers (libraries.io)
  • Vulnerability tracking to determine whether a package's set of possible transitive dependencies includes a known vulnerability and whether the version constraints of that dependency graph allow or prevent patching
  • Other kinds of package-content-agnostic analysis tools, especially tools that look at the dependency graphs of package ecosystems

So IMHO the primary beneficiaries are folks, DBs, APIs or tools that deal with several package formats and languages. Very selfishly, that might include me, @andrew @jayfk @jasinner @ashcrow @R2wenD2 @tgamblin @jpopelka and hopefully many more. This may be used in these contexts for UIs, DBs and APIs.

It may also, if this gets enough traction, positively influence some standardization in this domain as a side benefit: say tomorrow you decide to include an npm package size as a key identifying package attribute; maybe you will think about whether this makes sense purl-wise, and either reconsider or contribute to purl to adopt such a fine new standard?

- **type**: the package "type" or package "protocol" such as maven, npm, nuget,
gem, pypi, etc. Required.
- **namespace**: some name prefix such as a Maven groupid, a Docker image owner,
a GitHub user or organization. Optional and type-specific.

@iarna

iarna Nov 22, 2017

Just gonna add it here too: the npm version of this starts with @. It would be super nice to not have to escape that. 😆

@pombredanne

pombredanne Nov 27, 2017

Member

@iarna I am with you there, but FWIW there is some percent-encoding in npm's registry API URLs anyway.
For instance https://registry.yarnpkg.com/@invisionag%2feslint-config-ivx/ works but https://registry.yarnpkg.com/@invisionag/feslint-config-ivx/ comes out as {"error":"Not found"}, just so you know: the slash between the "namespace" and name must be percent-encoded. And yes, I know you might think that I have researched too many of these tiny details and quirks.

Here in a purl, I guess this is not strictly needed: a name is always required, so a leading @ in a scoped npm namespace is never ambiguous. What would be ambiguous is if we allowed an unescaped @ anywhere in the name or namespace and a purl came with no version.

For instance, say a package name is super@package for the hypothetical weirdo package type:

With the purl weirdo:super@package, I would parse package as a version even though no version was provided. weirdo:super@package@1.2.3 would of course not be ambiguous, but my hypothetical weirdo packages rarely have a version attached...

So for simplicity, I specified that the whole name and each namespace segment should be percent-encoded. We could relax this for a leading @ in the namespace/name (to make scoped npm packages look beautiful). I will update this; both forms can be acceptable in any case.
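The ambiguity described above can be shown concretely. A sketch, assuming a hypothetical `split_version` helper that splits at the last `@` and, because a name is always required, never treats an `@` at position 0 (a scoped npm namespace) as the version separator. The `weirdo` type and the `@angular/core` version are illustrative only.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def split_version(rest: str) -> Tuple[str, Optional[str]]:
    """Split `namespace/name@version` at the last '@' (hypothetical helper).

    A leading '@' is never the version separator, since a name is always required.
    """
    head, sep, version = rest.rpartition("@")
    if not sep or not head:
        return rest, None
    return head, version


# A leading '@' of a scoped npm namespace needs no escaping.
print(split_version("@angular/core"))        # ('@angular/core', None)
print(split_version("@angular/core@5.0.0"))  # ('@angular/core', '5.0.0')

# An unescaped '@' inside a "weirdo" name is misread as a version separator...
print(split_version("super@package"))        # ('super', 'package')
# ...but percent-encoding the whole name removes the ambiguity.
print(split_version(quote("super@package", safe="")))  # ('super%40package', None)
```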

::
bitbucket:birkenfeld/pygments-main@244fd47e07d1014f0aed9c

@iarna

iarna Nov 22, 2017

bitbucket isn't really a consumer though, is it? Like, what kind of package does that specifier refer to?

This is sticky because npm currently consumes specifiers that look almost like this. Specifically, for npm that would be: bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c.

The full npm specifier, assuming it was an npm package, would be: pygments-main@bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c

I'm wondering how that would encode as a purl, particularly seeing as # is already used for a subpath.

@pombredanne

pombredanne Nov 27, 2017

Member

@iarna bitbucket, github and gitlab are not, strictly speaking, package providers, but they are de facto large reservoirs of packagish things. They provide more than just a VCS repo: they have tickets, releases, some API-fetchable metadata, etc., which makes them "packagish" enough to join the fray IMHO.
In the case you mentioned, you are encoding the whole VCS address as an npm version, and this works beautifully IMHO.
The purl would be:
npm:pygments-main@bitbucket:birkenfeld/pygments-main%23244fd47e07d1014f0aed9c

Of note is the percent-encoding of the # in the version, which avoids parsing an incorrect subpath that would otherwise come out as this ugly mess:
type='npm', name='pygments-main', version='bitbucket:birkenfeld/pygments-main', subpath='244fd47e07d1014f0aed9c'
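The purl above can be reproduced with `urllib.parse.quote`. This sketch keeps `:` and `/` readable (mirroring the example, an assumption about which characters stay unescaped) and escapes only the `#` that would otherwise start a subpath; it also shows the wrong subpath a naive split produces without the escape.

```python
from urllib.parse import quote

# The npm version from @iarna's example: a Bitbucket VCS address plus commit hash.
version = "bitbucket:birkenfeld/pygments-main#244fd47e07d1014f0aed9c"

# Keep ':' and '/' readable, but escape '#' so it cannot start a subpath.
purl = f"npm:pygments-main@{quote(version, safe=':/')}"
print(purl)
# npm:pygments-main@bitbucket:birkenfeld/pygments-main%23244fd47e07d1014f0aed9c

# Without escaping, splitting at '#' yields the incorrect subpath shown above.
_, _, wrong_subpath = f"npm:pygments-main@{version}".partition("#")
print(wrong_subpath)  # 244fd47e07d1014f0aed9c
```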

@pombredanne

Member

pombredanne commented Nov 27, 2017

@tgamblin you wrote:

looks cool! Two questions and kind of one for @andrew:
Thanks!

  1. The original referenced issue says the goal is to have a "unique" identifier for each package, although the spec doesn't seem to dwell on that too much, which is probably good. Do you have ideas on how to reconcile the same package fetched from multiple sources? e.g., the same Python package might exist in pypi, conda, spack, and system package managers. @andrew: does libraries.io do anything to reconcile the different names?

I kinda like to think of these as "mostly" unique, at least unique when a package manager/type provides some uniqueness within its standard package manager and within a repo/registry. Most provide such a guarantee.

As for things being the same, I would think this is something that a DB of purls can help with. There is an amazing graph of relations among packages: one upstream package may be repackaged in a Linux distro, have its source on GH and BB, and be bundled or packaged on Conda, Spack, as RubyGems, etc.

For me, I intend to maintain such relationships in https://github.com/nexB/vulnerablecode (e.g. relate a CPE and several purls together and relate this cluster to a vulnerability; and I capture some relationships in https://github.com/nexB/scancode-toolkit/blob/275-streamline-package-manifests-models/src/packagedcode/models.py#L237 (e.g. this srpm is the sources or this rpm)

Finding that two packages are the same is not trivial matter though.
I know of two efforts in that domain, focused on Linux mostly:

In all cases, this is hard and @AMDmi3 does a rather superb job in this domain with his concept of "meta package"

  1. Is the idea that these URLs will one day be fetchable by curl? How do you see the translation being implemented? I guess libraries.io could provide that as a service?

I would not make this part of the goals .... though there is a discussion in #5 started by @jackfirth where I toyed with such an hypothetical tool that could fetch a purl #5 (comment)

@pombredanne

pombredanne Nov 27, 2017

Member

So this PR is closed, but the discussion can still go on here. Going forward, though, tickets are best!

In particular, an important one is #9: should the purl scheme/type be prefixed with purl+?
Please chime in on this, as it would be a reasonably important change.
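To make the #9 question concrete, here is how a generic URL parser (Python's standard urllib.parse.urlsplit) reads a draft-style type:// purl versus a purl+-prefixed one. The example strings and the purl_type helper are hypothetical. Without the prefix, the package type sits in the scheme slot, so a purl cannot be told apart from an ordinary URL by its scheme alone; with the prefix, the scheme itself identifies it as a purl. In both cases a generic parser also reads name@version as userinfo@host.

```python
from urllib.parse import urlsplit


def purl_type(candidate, prefix="purl+"):
    """Return the package type if `candidate` uses a prefixed scheme, else None.

    Hypothetical helper illustrating the purl+ convention proposed in #9.
    """
    scheme = urlsplit(candidate).scheme  # urlsplit lowercases the scheme
    if scheme.startswith(prefix):
        return scheme[len(prefix):]
    return None


draft = urlsplit("npm://foobar@12.3.1")
prefixed = urlsplit("purl+npm://foobar@12.3.1")

print(draft.scheme, prefixed.scheme)  # npm purl+npm
print(draft.username, draft.hostname)  # foobar 12.3.1 (name@version parsed as user@host)
print(purl_type("purl+npm://foobar@12.3.1"))  # npm
print(purl_type("https://example.com/foo"))  # None
```

RFC 3986 allows "+" in a scheme name, which is why the prefixed form still parses as a normal URL.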

@sschuberth

sschuberth Nov 29, 2017

Member

@sschuberth do you want to be on the GH org too, btw?

Only if it gets a nice icon 😆

@pombredanne

pombredanne Nov 29, 2017

Member

@sschuberth then you won the right to design a logo! Invitation sent!

But what's wrong with this fine logo? :D
https://avatars2.githubusercontent.com/u/33497028?s=60&v=4

@sschuberth

sschuberth Nov 29, 2017

Member

@pombredanne Thanks!

But what's wrong with this fine logo?

Nothing... if you like Space Invaders 😉

stevespringett added a commit to CycloneDX/specification that referenced this pull request Dec 1, 2017

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Feb 8, 2018

Add purl-python library to setup #805 #275
 * This is a first rough implementation using
   https://github.com/package-url/packageurl-python
 * Based on package-url/purl-spec#1

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Feb 16, 2018

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Feb 17, 2018

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Feb 27, 2018

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Mar 12, 2018

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Mar 23, 2018

pombredanne added a commit to nexB/scancode-toolkit that referenced this pull request Apr 11, 2018
