Concerns with type-specific component value transformations #38

Open
jdillon opened this issue Apr 6, 2018 · 5 comments
Labels
PURL core specification Format and syntax that define PURL (excludes PURL type definitions)

@jdillon

jdillon commented Apr 6, 2018

Howdy folks, I've been looking over this specification and it's pretty complete, but I have some concerns about the per-type component value transformations.

Specifically, the per-type rules that require the canonical form to be case-sensitive or case-insensitive, or that translate characters (for example "_" to "-").

For a generic spec, with implementations that can generically parse and build a package-url, these edge cases mean that any implementation will eventually become invalid: it cannot possibly encode the details of package types that are not yet specified, or of whatever new package systems are created in the future.

The docs for the pypi type state that pypi treats "-" and "_" the same, but require that "_" be translated into "-". This seems like overcomplication if the underlying system treats them the same.

The docs for the npm type state that the value must be lower-cased. While I understand the underlying npm system may require that, encoding this detail into the package-url specification seems likely to lead to sustainability issues. An implementation could encode it, but suppose a new format comes along, say a fictitious "upper" type for a fictitious package system where everything is always UPPER-CASE (and anything else is not valid). Existing package-url implementations are unlikely to know about that type, and would end up producing invalid canonical string representations.
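
To make the concern concrete, here is a minimal Python sketch of an implementation that hard-codes per-type rules. The `NORMALIZERS` table and `canonical_name` function are hypothetical names chosen for illustration, the rules are simplified versions of what is described above, and "upper" is the fictitious type from the previous paragraph.

```python
# Hypothetical per-type rule table, as an implementation written today might carry.
NORMALIZERS = {
    "npm": lambda name: name.lower(),              # npm: name must be lower-cased
    "pypi": lambda name: name.replace("_", "-"),   # pypi: "_" is translated to "-"
}


def canonical_name(purl_type: str, name: str) -> str:
    """Apply the per-type rule if this implementation happens to know one."""
    rule = NORMALIZERS.get(purl_type)
    # Unknown types pass through unchanged: the implementation cannot know
    # the rules of a type that was defined after it was written.
    return rule(name) if rule else name


print(canonical_name("npm", "FOO"))       # foo
print(canonical_name("pypi", "my_pkg"))   # my-pkg
print(canonical_name("upper", "foo"))     # foo, although the fictitious type would require FOO
```

The last call is the problem case: the output is not canonical for the "upper" type, and nothing in the implementation can detect that.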

It is almost as if the URL specification treated path/query/fragment details differently depending on the host:port part of the identifier, or as if the URI spec let the scheme dictate how to transform the rest of the components. That would make for hugely complex implementations (which would probably be wrong eventually, if not already). I feel the package-url specification is already like that with these type-specific transformation wrinkles.

I believe it would be simpler and more conventional to ignore case (except perhaps for the type itself) and to skip content transformation (except for percent-encoding). This would imply you could end up with:

npm:FOO@1

... which may not be proper with respect to the package system's expectation that the name is lower-cased. But that seems like an input problem, not something that a generic specification for identifying packages should be concerned with.

npm:foo@1

... would be more correct in terms of how the npm community has decided to normalize its identifiers, but the package-url specification really should not care. Since it's not reasonable (or even possible at present, with some formats needing lower case and some needing mixed case) to encode every such rule, the specification should be more general and support future formats without requiring any such transformations.
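
A type-agnostic implementation along these lines could look like the following sketch. It assumes the simplified `type:namespace/name@version` shape used in the examples in this comment (no qualifiers or subpath), and `build` and `parse` are hypothetical helper names, not an existing library API. Only the type is lower-cased; every other component is percent-encoded and otherwise left untouched.

```python
from urllib.parse import quote, unquote


def build(purl_type, name, version=None, namespace=None):
    """Join components with separators; percent-encode, never change case."""
    parts = [purl_type.lower(), ":"]  # the type is the only case-folded component
    if namespace:
        parts.append("/".join(quote(seg, safe="") for seg in namespace.split("/")))
        parts.append("/")
    parts.append(quote(name, safe=""))
    if version:
        parts.append("@" + quote(version, safe=""))
    return "".join(parts)


def parse(text):
    """Split on separators and percent-decode; no per-type knowledge needed."""
    purl_type, _, rest = text.partition(":")
    version = None
    if "@" in rest:
        rest, _, raw_version = rest.rpartition("@")
        version = unquote(raw_version)
    raw_namespace, _, raw_name = rest.rpartition("/")
    namespace = None
    if raw_namespace:
        namespace = "/".join(unquote(seg) for seg in raw_namespace.split("/"))
    return purl_type.lower(), namespace, unquote(raw_name), version


print(build("npm", "FOO", "1"))                               # npm:FOO@1
print(build("maven", "junit", "4.12", namespace="junit"))     # maven:junit/junit@4.12
print(parse("npm:FOO@1"))                                     # ('npm', None, 'FOO', '1')
```

Because nothing here depends on the type, the same code would handle a type that does not exist yet; whether `FOO` is a sensible npm name is left to whoever supplied the input.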

@jdillon
Author

jdillon commented Apr 7, 2018

Here is another example which may help...

If you have a purl such as:

maven:junit/JUNIT@4.12

But the existing maven component is:

maven:junit/junit@4.12

Both PURL expressions are valid, though the first would not actually resolve anything, while the second would, since in maven-land namespace and name are case-sensitive.
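
As a small illustration of why case matters here, the sketch below uses a hypothetical `PUBLISHED` set standing in for the coordinates that actually exist in a Maven repository. It compares an exact lookup with what a blanket lowercase rule would do.

```python
# Hypothetical index of coordinates that actually exist in a Maven repository.
PUBLISHED = {"maven:junit/junit@4.12"}


def resolves(purl: str) -> bool:
    """Exact, case-sensitive match, as case-sensitive Maven coordinates require."""
    return purl in PUBLISHED


def resolves_if_lowercased(purl: str) -> bool:
    """What a blanket 'lowercase everything' rule would do instead."""
    return purl.lower() in PUBLISHED


print(resolves("maven:junit/junit@4.12"))                 # True
print(resolves("maven:junit/JUNIT@4.12"))                 # False
print(resolves_if_lowercased("maven:junit/JUNIT@4.12"))   # True
```

The third call shows the risk: lowercasing silently turns one identifier into a different one, which is only harmless when the package system itself is case-insensitive.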

@stevespringett
Member

stevespringett commented Jul 26, 2018

These are my thoughts on transformations:

  • Transformation should only be necessary when not transforming characters would violate the URI specification.
  • Purl should be lowercase by convention
  • Purl should not require case transformations

If we want purl to be successful, adoptable, and implemented in a uniform way, we need the specification to be concise, simple, and not open to interpretation. Keeping it simple will help us expand the reach well beyond the standard set of 'types' currently specified.

@jdillon
Author

jdillon commented Jul 26, 2018

@stevespringett generally +1, though "Purl should be lowercase by convention" is still a bit problematic; except perhaps for the type, I would suggest it should be case-agnostic, since some package formats are case-sensitive and some are not. As I suggest above, it shouldn't be the responsibility of the purl spec (or impls) to ensure the sanity of the input per type. It should only be concerned with the separators and encoding needed to sanely encode the different fields.

@sarnesjo

This also applies to Go, at least since the introduction of Go modules. The purl spec for the golang type states:

The namespace and name must be lowercased.

But this doesn't match the actual behavior of the canonical tools. Go module names are case-sensitive. For example:

go get github.com/antlr/antlr4/runtime/go/antlr # does not work
go get github.com/antlr/antlr4/runtime/Go/antlr # works

This is also reflected by other tooling in the Go ecosystem, such as pkg.go.dev.

More information can be found in the documentation for the protocol used by the Go module proxy. It's also mentioned in this issue.
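
For context, the Go module proxy protocol keeps differently-cased paths distinct even on case-insensitive file systems by case-encoding them: each uppercase letter is replaced by "!" followed by its lowercase form. The Python sketch below mirrors that rule only; `escape_path` is a hypothetical name, and the real implementation (`EscapePath` in `golang.org/x/mod/module`) also validates the path, which this sketch does not.

```python
def escape_path(path: str) -> str:
    """Case-encode a module path: 'A'..'Z' becomes '!' plus the lowercase letter."""
    return "".join("!" + ch.lower() if "A" <= ch <= "Z" else ch for ch in path)


print(escape_path("github.com/antlr/antlr4/runtime/Go/antlr"))
# github.com/antlr/antlr4/runtime/!go/antlr
print(escape_path("github.com/antlr/antlr4/runtime/go/antlr"))
# github.com/antlr/antlr4/runtime/go/antlr
```

The two paths stay distinct after encoding, which is consistent with module paths being treated as case-sensitive.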

Are you open to changing the spec for the golang type?

@matt-phylum

This also applies to Go, at least since the introduction of Go modules. The purl spec for the golang type states:

The namespace and name must be lowercased.

#196

This was broken even before modules. Before modules, the first namespace component (or the name if there was no namespace) was the only part that could be safely assumed to be case-insensitive. The rest was dependent on the package hosting.
