Add new annex on license namespaces #681

Closed
wants to merge 3 commits into from

zvr
Member

@zvr zvr commented May 17, 2022

Resurrecting the topic on license namespaces for late addition to 2.3.

I've re-worked the text that @swinslow had proposed back in #209, changing a few points to make the integration easier.

Things to note:

  • this introduces both dash-dot and dash-dash variants of custom license names, e.g. LicenseRef-.example.com.-EULA-v3.1 and LicenseRef--ExampleCorp--EULA-v3.1
  • it presupposes a registration mechanism for namespaces in the SPDX project, to-be-defined later
  • the registration will include the license definitions, instead of pointing externally, for simplicity
  • absolutely no change in any grammar, so backwards compatible lexically and syntactically
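To make the two proposed naming variants concrete, here is a minimal Python sketch that splits such an identifier into its namespace and license name. The regular expressions and the function name `split_namespaced_ref` are assumptions made for illustration; they are not part of the PR text or of any SPDX tool.

```python
import re

# Hypothetical patterns for the two conventions proposed in this PR.
# DNS style:          LicenseRef-.<domain>.-<name>
# Organization style: LicenseRef--<org>--<name>
DNS_STYLE = re.compile(
    r"^LicenseRef-\.(?P<namespace>[A-Za-z0-9.-]+?)\.-(?P<name>[A-Za-z0-9.-]+)$"
)
ORG_STYLE = re.compile(
    r"^LicenseRef--(?P<namespace>[A-Za-z0-9.-]+?)--(?P<name>[A-Za-z0-9.-]+)$"
)


def split_namespaced_ref(license_ref):
    """Return (style, namespace, name), or None for a plain LicenseRef."""
    for style, pattern in (("dns", DNS_STYLE), ("org", ORG_STYLE)):
        match = pattern.match(license_ref)
        if match:
            return style, match.group("namespace"), match.group("name")
    return None


print(split_namespaced_ref("LicenseRef-.example.com.-EULA-v3.1"))
print(split_namespaced_ref("LicenseRef--ExampleCorp--EULA-v3.1"))
```

Because both variants only use characters already allowed in a LicenseRef, a parser that does not know about namespaces still accepts them as ordinary identifiers.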

Things left out, on purpose:

  • name syntax change, like ExampleCorp:EULA-1.0 or any other variant discussed
  • namespace specification in actual SPDX data, like LicenseNamespace or other variants
  • completely decentralized license location, where registration only points to locations outside the SPDX project

All these, and more, are essentially breaking changes and should be included in SPDX v3.x.

This change is essentially a description change to explain license namespaces and introduce an initial convention for their naming. It does not introduce much not already present in the way that licenses can be defined and used in external documents.
However, I think it's important that it is incorporated in v2.3, so that we have a way forward should v3 get delayed for whatever reason.

@zvr zvr requested review from goneall and swinslow May 17, 2022 09:22
@zvr zvr added this to the 2.3 milestone May 17, 2022
@zvr
Member Author

zvr commented May 17, 2022

Oh, and the PR uses Z as annex letter.
If approved for merging, all Z characters in the file need to be replaced with the actual final annex letter (no other Z is used).

@zvr zvr requested a review from kestewart May 17, 2022 16:35
@goneall
Member

goneall commented May 17, 2022

There has been some interest in having a "private" repository of external license texts.

Should we change the text to encourage registering the license texts, but not require it?

@goneall
Member

goneall commented May 18, 2022

There has been some interest in having a "private" repository of external license texts.

Should we change the text to encourage registering the license texts, but not require it?

After thinking about this a bit more, the non-DNS style namespaces need to be registered to avoid name conflicts. Perhaps, we could change the wording where non-DNS style "must" register while the DNS-style "should" register.

@zvr
Member Author

zvr commented May 18, 2022

@goneall The issue is that SPDX v2 is definitely oriented towards exchange of information. If I get a valid SPDX Document but it refers to something that is not available, it would be considered an error. We do not want to facilitate the creation of such documents.

Put in another way:
As you know, everything here can be accomplished with external documents. So people who want to have non-public custom licenses can always do it.

The point of this change is to allow some licenses not in the SPDX License List that can be used by anyone, not only their creators.

I expect that in SPDX v3 we will add the case you're talking about, with namespaces being able to point to locations outside the SPDX project. But it will probably necessitate some additional syntax.

@goneall
Member

goneall commented May 18, 2022

If I get a valid SPDX Document but it refers to something that is not available, it would be considered an error. We do not want to facilitate the creation of such documents.

@zvr Completely agree. I just recall other members of the SPDX community discussing this possibility - but I agree SPDX 2.0 documents are intended for interchange.

I didn't see any other issues with the PR - thanks for creating it!

Contributor

@kestewart kestewart left a comment

I think we should refine the mechanism and process for registering namespaces before 2.3 comes out (rather than deferring it to a later date), but I agree it doesn't need to be defined to get this merged and started. The suggested GitHub repo makes sense. What needs to be part of the "registration" also needs to be defined.

@kestewart
Contributor

@zvr Could you please go ahead and update this to use the next Annex letter, and add your sign-off? We would like to get this merged in our meeting next week, if possible.

Member

@swinslow swinslow left a comment

Hi @zvr and others, thanks for this! I'm strongly in favor of including the license namespace concept in SPDX 2.3, as I think it solves a number of pain points that folks have encountered.

That said, I have a few changes to request. There are some typo / nitpicky edits and questions inline below, but there are two significant overarching changes I would like to see addressed:

  1. Have the namespace controller register with the SPDX project a URL to the SPDX document defining the LicenseRef-'s, rather than registering the document itself
  2. Omit the organization name format

For item 1, I think it is far easier for us to manage registration of a URL rather than registering the document itself. That puts the responsibility on the namespace controller to handle the maintenance of the document, without putting anyone from the SPDX project in the middle of being responsible for e.g. vetting requested changes to namespace documents, trying to figure out whether or not it's appropriate to merge a PR, etc.

For item 2, relatedly, I don't think we want to be in the business of deciding who has rights to register a particular organizational namespace. DNS registration is an existing mechanism with well-defined processes for determining who owns a particular domain name, and we can leverage that with minimal effort. But creating our own additional "organizational name" registry puts the SPDX project in the middle of trying to decide, for example, whether someone claiming to be registering on behalf of Intel or Amazon or ExampleCorp is really authorized to do so.

Furthermore, by limiting this to both items 1 and 2, there's an easy synergy to determine whether a registration is acceptable. If it's a registration for the DNS namespace example.com, and the requested URL to register is within the https://example.com/... URL format, then it's likely fine to merge. If it's for a URL to register that's outside that domain name, then it shouldn't be merged.
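The acceptance rule described above (the registered URL must live under the domain being registered) could be checked automatically. A rough sketch, assuming HTTPS is required and that subdomains of the registered domain also count as "within" it; both are assumptions of this example, and `url_matches_namespace` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def url_matches_namespace(namespace, url):
    """Return True if `url` is an https URL hosted on `namespace` or a subdomain.

    Accepting subdomains is an assumption of this sketch, not something
    decided in the discussion.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domain = namespace.lower()
    on_domain = host == domain or host.endswith("." + domain)
    return parts.scheme == "https" and on_domain


print(url_matches_namespace("example.com", "https://example.com/licenses.spdx"))
print(url_matches_namespace("example.com", "https://evil.org/example.com/licenses.spdx"))
```

Comparing the parsed host rather than searching the URL string avoids accepting a URL that merely contains the domain name somewhere in its path.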

Finally, although I'm not sure it needs to be explicitly stated here, as a matter of SPDX project processes I do want to be really clear that the SPDX Legal Team won't be involved in any way with vetting or reviewing requests for license namespaces. I think we have an appropriate role to play in defining the structure and operation of license namespaces in this Annex; but it is not within the scope of the Legal Team to review and weigh in on whether any particular license namespace is appropriate. @zvr's writeup of this Annex does a great job of describing a clear delineation between the actual SPDX License List, with the reliability expectations that the community has; and license namespaces, which properly have no reliability for stability at all. I don't want folks to think that the SPDX Legal Team is providing any review or guarantees of stability for anything outside the SPDX License List itself.

2. the namespace maintainer creates
an SPDX document defining license texts
and corresponding identifiers
which include the namespave,
Member

typo: "namespave" => "namespace"

and corresponding identifiers
which include the namespave,
as described below;
3. the namespace and the SPDX document
Member

See my main comments about whether the SPDX document should itself be registered with the SPDX project (in the sense of having the document itself checked into a repo within the SPDX GitHub org), vs. having them register a creator-controlled URL with the SPDX project.

* `LicenseRef-.example.com.-EULA-v3.1`
* `LicenseRef-.example.com.-anything`

## Z.3.2 Organization name format
Member

See my main comments about whether we should include the organization name format. I would lean in favor of excluding it, or at least excluding it from the first iteration of the namespace setup for SPDX 2.3.

must only use characters permitted by the license expression syntax
(i.e., letters, digits, "-" and ".").
It must not contain a double hyphen,
as that denotes the end of the namespace.
Member

Does this lead to confusion if someone decides to implement a domain name in the organization name format, which I think would be permissible given the way this is drafted?

E.g. LicenseRef-.example.com.-ABC-1.0 is a DNS name format, while LicenseRef--example.com--ABC-1.0 is an organization name format -- but this may lead to confusion.

What if the organization name format were only to permit letters and digits for the namespace itself?

it must be registered with the SPDX project.
The registration request must provide
the namespace and
the SPDX document defining the licenses.
Member

See my main comments about registering the document itself, vs. a URL to the document.

However, it should be noted that the stability
will always be subject to the namespace maintainer's control.
In other words,
although maintainers should not modify a defined license text
Member

I'm not certain how the standards-world all-caps words work, but should this be "SHOULD NOT" instead of "should not"?

@goneall
Member

goneall commented May 21, 2022

@zvr @swinslow A couple of inputs on Steve's comments:

Have the namespace controller register with the SPDX project a URL to the SPDX document defining the LicenseRef-'s, rather than registering the document itself

I agree. We actually have a registration mechanism in place. It's not really being used, but it registers the URLs, not the actual documents, which is consistent with Steve's comment above.

If you want to check out the tool - here's the URL: https://tools.spdx.org/app/submit_new_license_namespace/

Registration requests go into a Github repo: https://github.com/spdx/license-namespace

It looks like the tool submits issues in the same way license requests are done.

I'm sure we'll want to implement a process and improve the tool, but at least we have a starting point.

Omit the organization name format

This was a request from @MarkAtwood from the legal team a couple of years ago. If I recall, he proposed the mechanism as a way to avoid name collisions without requiring any registration. If we have a position that registration is required, there may not be a need for this format. @MarkAtwood - feel free to correct anything I misstated.

If there is disagreement on this specific format, perhaps we should postpone it until a future release so that we can get the other format going.

@swinslow
Member

Thanks @goneall! And thanks for the reminder on the existing namespace registration tool. I may put together and host a sample document myself and go through the registration process with it, to make sure I understand how it works.

For the organization name format, yes -- I think I'm inclined to suggest that we omit it for now. We can always add it later if it turns out that the DNS format is insufficient, but I assume once we add it then it would be difficult or impossible to remove it. So I'd lean towards starting with the DNS format since I think we can more easily think through its implications.

@zvr
Member Author

zvr commented May 23, 2022

@goneall and @swinslow, many thanks for the comments.

I personally disagree with the two major changes proposed by @swinslow, but I'll obviously follow the majority opinion.

As I wrote above, I see the value of this in SPDX actually hosting the license definitions. If we only register external locations where the license definitions are, how is it different from an ExternalDoc reference?

Let me expand on this:
ExampleCorp can already use the v2.2 spec to publish their licenses themselves and refer to them in their SPDX Documents.
I believe that the main addition we want to achieve here is that others can also have a mechanism to refer to and use the ExampleCorp licenses. Having the licenses published by the SPDX project will greatly help drive adoption, rather than having opaque external links.

Of course, the SPDX project (and Legal Team in particular) will not be involved in any curation or reviewing of the licenses -- except an automated check that the SPDX Document defining the licenses is valid.

Another question / discussion point:

Is the idea that SPDX will be publishing the list of "registered license namespaces", like we do with the License List?
I mean, like actual data, instead of "look up at this repo and find out".

I assume we'll have discussion on the tech call on Tuesday.

@goneall
Member

goneall commented May 23, 2022

As I wrote above, I see the value of this in SPDX actually hosting the license definitions. If we only register external locations where the license definitions are, how is it different from an ExternalDoc reference?

The main advantage I see is that location information would be available (as well as contact information).

Is the idea that SPDX will be publishing the list of "registered license namespaces", like we do with the License List?

I was thinking we would maintain a GitHub repo with the information published in a github.io page. This would be easier to maintain compared to an spdx.org/license-namespace site and would provide a bit more distance between the legal team and the licenses (my main interest is the easier maintenance).

To do this, we would need to add a Github action and possibly modify the registration utility.

@swinslow
Member

As I wrote above, I see the value of this in SPDX actually hosting the license definitions. If we only register external locations where the license definitions are, how is it different from an ExternalDoc reference?

The main advantage I see is that location information would be available (as well as contact information).

Agreed. I see the whole "license namespace" proposal essentially as establishing a mapping from a specially-formatted LicenseRef- ID, to be able to find the corresponding document that defines that LicenseRef's text.

Someone could avoid all of this by including in their project repo an SPDX Document defining the LicenseRef, perhaps in a manner such as described by REUSE. But I gather others are wanting a way to define a stable LicenseRef-to-actual-text mapping that doesn't require an SPDX Document sitting alongside the place where the LicenseRef is being used.

So the DNS namespace format, coupled with a mapping to a URL where the namespace's SPDX Document is located, enables this while also putting a minimal amount of effort on the SPDX project. Moving more into the SPDX project's scope (e.g. hosting the Document, or hosting an org-name registry, etc.) puts more burden and responsibility on the SPDX project community to have to coordinate those matters, when frankly the DNS-format-to-URL mapping gets to the same end purpose for downstream users while minimizing what the SPDX project is responsible for hosting.

Or from a different angle: if it's DNS-format-driven and the namespace owner hosts the Document, then it properly reflects that the SPDX project makes minimal guarantees as to the stability of these namespaced LicenseRef's. If by contrast SPDX is hosting the documents, or hosting an org-name registry, then the SPDX project is more in the middle of at least implying that those things will be maintained and made stable / available.

@swinslow
Member

I was thinking we would maintain a GitHub repo with the information published in a github.io page. This would be easier to maintain compared to an spdx.org/license-namespace site and would provide a bit more distance between the legal team and the licenses (my main interest is the easier maintenance).

To do this, we would need to add a Github action and possibly modify the registration utility.

@goneall This all sounds good to me. I haven't looked at the guts of the registration utility program yet, but yes, it seems like if it's published to a simple website then it should give the downstream users an easy way to access the details for all namespaces.

If it's as simple as a DNS-namespace-to-URL mapping, then perhaps it could even be offered as a single JSON file with just those mappings. That could make it trivial for consumers to grab the JSON file, find the namespace they want and go get the Document from the corresponding URL.
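As a sketch of what consuming such a file might look like: the JSON layout, the namespaces, and the URLs below are all made up for illustration, since no registry format has been agreed on in this thread.

```python
import json

# Hypothetical registry content: one flat object mapping a DNS namespace
# to the URL of the SPDX Document that defines its LicenseRef- entries.
REGISTRY_JSON = """
{
  "example.com": "https://example.com/licenses.spdx",
  "mynamespace.com": "https://mynamespace.com/mylicenses.spdx"
}
"""


def document_url_for(namespace, registry_text=REGISTRY_JSON):
    """Look up the registered document URL for a namespace, or None."""
    registry = json.loads(registry_text)
    return registry.get(namespace.lower())


print(document_url_for("mynamespace.com"))
print(document_url_for("unknown.org"))
```

A consumer would then fetch the document at the returned URL and look up the license text for the identifier there.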

@pombredanne
Member

ScanCode and all the tools that include ScanCode and use its LicenseDB such as ORT, Tern, Fosslight, Barrista, and countless others are all using and relying on a simple "scancode" namespace using this prefix "LicenseRef-scancode-" as used for all the non-SPDX-listed licenses found in the LicenseDB https://scancode-licensedb.aboutcode.org/ . This has been used in practice for over five years (and registered with the SPDX online tools).

So please let's not introduce a spec that is not compatible with ScanCode and would create havoc for all its users for no good reason. We will not be able to rename the thousand+ SPDX Scancode license references using the "scancode" namespace that are used in the wild. And if the changes proposed here are adopted as-is they will make all ScanCode users drift away from SPDX compliance. Instead let's also accept that there are registered namespaces and that there is no need for a complex syntax when this is the case. I can provide spec text along these lines.

On a side note, when it comes to non-public, non-shared namespaces, I do not see a big need to standardize anything: private things will not be shared or else they are not private anymore. A registration is simple and good enough in all public sharing cases I can fathom. And based on the handful of registration requests that the SPDX Online tools received over the last couple of years, I feel like specifying anything else is not a pressing need: https://tools.spdx.org/app/archive_namespace_requests/

@karsten-klein

From a tool provider and integrator perspective I would like to

  1. register a namespace with a given name using my domain
  2. refer to licenses using LicenseRef--

I would regard this as not in conflict with Philippe's current use (the namespace being scancode).

I'd limit registration in a way that

  • a single domain can only register one namespace
  • all namespace names are unique
  • namespace name conforms to a set policy (lowercase/uppercase letters; numbers; at minimum two characters)
  • all namespace registrations need to be approved by the SPDX Legal Team

Beyond this general concept, I would hope that the procedure/policy would retrofit my request from the 10th of February this year: https://tools.spdx.org/app/license_requests/143/

Thanks Alexios for bringing this aspect up again...

Regards,
Karsten

@goneall
Member

goneall commented May 23, 2022

... on a simple "scancode" namespace using this prefix "LicenseRef-scancode-"

The challenge with the current format is being able to differentiate between license text we expect to find within the SPDX document and license namespaces.

Currently, the spec and the validator tools look for license text for any LicenseRef- license ID's. If this PR is accepted, we would not look for license text for any licenses matching a specific pattern.

I can think of 2 alternatives which would be compatible with the existing Scancode LicenseRef's:

  • no longer require license text to be in the SPDX document for LicenseRef's. The license refs may or may not refer to an external SPDX namespace
  • If the licenseRef is not in the SPDX document, check all registered SPDX namespaces to see if there is a match. If there isn't a match, the document would be invalid.

The first is simpler, but would allow situations where the license text could not be obtained.

The latter would be computationally intensive and possibly require access to the internet for the latest namespace registrations.

Note that per this PR, there would still be a requirement to publish an SPDX document with all known license ref's.

I still think the approach of using the patterns described in the PR is the simplest for tools providers to validate, but I can understand the concern on compatibility.

@goneall
Member

goneall commented May 23, 2022

Thanks @karsten-klein @pombredanne and @swinslow for your comments on the proposal and PR. One thing all commenters seem to agree on is we do need to include this in the spec since many of us are using this feature. We definitely should take into account compatibility with existing implementations, but I don't think we should let that stop us from getting this added to the spec in 2.3 (even if it causes some compatibility problems).

@swinslow
Member

I'd limit registration in a way that

  • a single domain can only register one namespace
  • all namespace names are unique
  • namespace name conforms to a set policy (lowercase/uppercase letters; numbers; at minimum two characters)
  • all namespace registrations need to be approved by the SPDX Legal Team

I agree with the first two points.

For the third, I think this is inherent in the existing limits for SPDX LicenseRef- IDs. This aligns to what you described, except with the permitted addition of dashes and periods. If we're talking domain names, I think this generally works as intended, and with the proposal in this PR the "namespace" portion would be prefixed with LicenseRef-. and ended with .-, for easy parsing to tell where it starts and stops.

For the fourth point, I absolutely disagree. The SPDX Legal Team maintains the SPDX License List, and has a role in defining the licensing-related aspects of the SPDX spec, including how license namespaces are defined. But the Legal Team has no role in "approving" a namespace registration. I'm not sure what such approval would mean (by the Legal Team or anyone else) or on what criteria it would be based -- which is partly why I'm wanting the "registration" aspect to be as minimal and limited as possible.

@swinslow
Member

@goneall One clarification, because I might have been unclear from my earlier comments:

I don't think that the license namespace concept has any relevance from an SPDX Document parsing perspective. At least for SPDX 2.x, I would not want to change the existing rules requiring all LicenseRef- licenses to either be (1) defined within the SPDX Document they're contained in; or (2) referenced using a DocumentRef-xxxx:LicenseRef-yyyy identifier as in the existing license expression syntax. I'm viewing these as the current requirements that are already fulfilled for SPDX 2.2, and I'd think these should stay in place for SPDX 2.3 as well.

From my perspective, the only relevance for the license namespace concept we're discussing is to make it easier for people to add LicenseRef- short-form identifiers to source code files. By using the DNS format from this current PR, someone can add a LicenseRef-.mynamespace.com.-LicenseABC short-form identifier to their source code, and register mynamespace.com => URL mynamespace.com/mylicenses.spdx with the SPDX project, and then users of the source code would be able to go look up that mylicenses.spdx Document to find the corresponding license text for LicenseABC.

This is a long-winded way of saying, I don't think this means that anything needs to change in the libraries for the SPDX tools that parse SPDX Documents. I'd assume that those libraries should still expect that LicenseRef- identifiers used within an SPDX Document are either defined in that Document, or included via a DocumentRef- and the existing ExternalDocumentRef structure. I think trying to have document parsers be responsible for going to search registered license namespaces is going to be extremely fraught for the reasons you described, and should definitely be avoided.
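The existing rule described above (every LicenseRef- used in a Document is either defined in that Document or reached through a DocumentRef-) can be sketched as a small check. This is an illustration only: `unresolved_refs` and its parameters are hypothetical names, not the API of the SPDX tools libraries, and the regular expression is a simplification of the license expression syntax.

```python
import re

# Matches "LicenseRef-<id>" with an optional "DocumentRef-<id>:" prefix.
LICENSE_REF = re.compile(
    r"(?:(DocumentRef-[A-Za-z0-9.-]+):)?(LicenseRef-[A-Za-z0-9.-]+)"
)


def unresolved_refs(expression, defined_license_refs, external_document_refs):
    """Return the references in `expression` that the document cannot resolve."""
    missing = []
    for match in LICENSE_REF.finditer(expression):
        doc_ref, license_ref = match.group(1), match.group(2)
        if doc_ref is None:
            # Plain reference: must be defined inside this document.
            resolved = license_ref in defined_license_refs
        else:
            # Qualified reference: the external document must be declared.
            resolved = doc_ref in external_document_refs
        if not resolved:
            missing.append(match.group(0))
    return missing


expression = (
    "MIT AND LicenseRef-Local AND "
    "DocumentRef-ext:LicenseRef-.example.com.-EULA-v3.1 AND LicenseRef-Missing"
)
print(unresolved_refs(expression, {"LicenseRef-Local"}, {"DocumentRef-ext"}))
```

Note that the check never consults a namespace registry: a namespaced identifier is handled like any other LicenseRef-, which matches the point that parsers need not change.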

@zvr
Member Author

zvr commented May 24, 2022

OK, it seems we have a number of topics to discuss:

  1. Use case
    • only in source code files
    • in SPDX Documents
  2. Format
    • dot-dash (domain)
    • dash-dash (organization)
    • nothing special(?) (as scancode already does)
  3. License publishing -- where is the SPDX document
    • in SPDX project
    • outside SPDX project
    • not even a document(?) (scancode?)
  4. Registration process
    • what checks will be done before "accepting"
  5. Registration publishing -- how the registered names will be published
    • list on GitHub
    • releases/versions?

@swinslow
Member

Thanks @zvr, I think this is a really good list of the open points.

I'm planning to bring this up for discussion at the Legal Team call this Thursday (along with the topic at #464). We can use your list of issues and options as the basis for the discussion.

@goneall goneall mentioned this pull request May 24, 2022
10 tasks
@zvr
Member Author

zvr commented May 25, 2022

Update after the tech call on 2022-05-24, where we spent almost 40 minutes discussing this:

  • Consensus that it should be able to be used in both source annotations and SPDX Documents
  • Rough consensus that referring to these licenses should always use External Document References, not "naked"
    • therefore instead of LicenseRef--Example-License1, use DocumentRef-Something:LicenseRef--Example-License1
    • which means we can put the namespace in the external document ref: DocumentRef--Example:LicenseRef-License1
    • which simplifies the namespace registration to registering documents (with URIs and contact people)
  • Note that all these are semantic changes, not syntactic ones, so they don't change OWL, schemas, etc.
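To illustrate the form discussed on the call, the following sketch splits an identifier such as DocumentRef--Example:LicenseRef-License1 into its document part and license part. Treating the leading dash of the document id as a namespace marker is only one reading of the examples above, and the function name is hypothetical.

```python
def split_external_license_ref(identifier):
    """Split 'DocumentRef-<doc>:LicenseRef-<id>' into (<doc>, <id>)."""
    doc_part, sep, license_part = identifier.partition(":")
    if (not sep
            or not doc_part.startswith("DocumentRef-")
            or not license_part.startswith("LicenseRef-")):
        raise ValueError(f"not an external license reference: {identifier!r}")
    return doc_part[len("DocumentRef-"):], license_part[len("LicenseRef-"):]


# The document id "-Example" carries the namespace; "License1" is the license.
print(split_external_license_ref("DocumentRef--Example:LicenseRef-License1"))
```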

Still open (among others): whether a special format is needed to denote namespace.

@jlovejoy
Member

Using @zvr's list of open issues as a basis, and having read the PR, this thread, and Steve's email to the legal mailing list (plus memory of @MarkAtwood's original description and discussion):

  1. Use case - I'm fine with use in both SPDX Documents or source code files, but I would emphasize "SPDX Documents" first, as I'm not sure we want to encourage rampant use of these in source code files? In any case, given the broad range of uses of SPDX License List identifiers, we may want to simply say, "For use in SPDX Documents and anywhere else you might use an SPDX License List identifier"

  2. Format - to the extent I follow the debate on this, @swinslow's suggestion of starting with one format, the domain format, seems the most practical.
    That does not help with the Scancode issue that @pombredanne mentions, but I'm a bit unclear as to why people are using something that hasn't been adopted yet.
    As for the SPDX Online Tool related to this - quite frankly, we should have never allowed a GSoC project to implement something that had yet to be adopted in the Spec - or at least not made the tool live. In any case, there are only two real submissions - one from @pombredanne and one from @karsten-klein (the latter of which discovered the issue is being made in the SPDX License List repo, so that needs to be fixed if we will use this tool in the future!)

  3. License publishing -- where is the SPDX document - this is where I got a bit lost. We are talking about where will the corresponding license text that "goes with" LicenseRef-.mynamespace.com.-LicenseABC, that is the actual text of LicenseABC, right? I think that is correct, but we should use some better language around this to be clearer.

  • I'm also VERY hesitant for this to be stored in the SPDX project. ANY kind of "storage" requires maintenance. Who is going to do that?
  • more importantly (and this could impact storage location) - a) what are the controls for someone using a license text that matches a license already on the SPDX License List? (this would be bad), and b) what happens if a license from the namespace later (and separately) gets added to the SPDX License List? is there some kind of check to remove the LicenseRef- as it will no longer be needed?
  4. Registration process - what checks will be done before "accepting" and
  5. Registration publishing -- how the registered names will be published
  • these go hand in hand really and I suppose 3 needs to be answered first, but again, I suspect that any implementation of this will require some amount of work from the SPDX team. I'm still not clear on who is going to do that or if we've truly identified the scope of that effort.

Lastly, I'm still concerned about this being used as a lazy way to bypass getting more licenses that should be on the SPDX License List. I know that @MarkAtwood's original proposal focused on licenses that would never nor should never be on the SPDX License List.
@pombredanne, @karsten-klein - of the licenses you intended to "register" (whatever that ends up meaning!) - were these all licenses that could/should not be on the SPDX License List?

@zvr
Member Author

zvr commented May 26, 2022

Quick couple of comments to @jlovejoy reply above:

Starting from the end, yes, the idea for these private lists is that they cover licenses not in SPDX License List. But I assume people might also want to use them for licenses not currently in the SPDX License List.

On the publishing point (3), you are correct in understanding the problem: given an identifier LicenseRef-.mynamespace.com.-LicenseABC, there has to be an SPDX Document that uses the "other license info detected" section to say "hey, for this LicenseRef-.mynamespace.com.-LicenseABC the corresponding text is this".

The two alternatives we have are:
a) people submit this document and we store it in a repo; or
b) people submit the location of this document and we store (the location) in a repo.

In both cases, the SPDX project will not be checking content like "someone using a license text that matches a license already on the SPDX License List" or anything like this. Yes, it would be "bad", but this can also happen today: someone defining their own LicenseRef-MIT.

The SPDX project registers namespaces, not what goes within them.

Related to checks during registration process (point 4), I believe everyone until now only talks about automated checks, no human decision involved. Things like:

  • checking whether the namespace is not already registered;
  • checking whether the format of the namespace is correct;
  • checking whether the URL is valid;
  • checking whether the URL resolves to a document;
  • checking whether the document is a syntactically correct SPDX document;
  • etc.
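As a rough illustration of the purely mechanical checks listed above, here is a minimal sketch. The function name `registration_problems`, the DNS-style namespace pattern, and the in-memory `registered` set are assumptions made for illustration, not anything the SPDX project has defined. Resolving the URL and validating the SPDX document itself are left out because they need network access and an SPDX parser.

```python
import re
from urllib.parse import urlparse

# Assumption: a namespace is a DNS-style name such as "mynamespace.com".
# Labels may contain single hyphens but never "--", "-." or ".-", so the
# delimiters of the proposed LicenseRef-.<namespace>.-<name> form stay unambiguous.
NAMESPACE_RE = re.compile(
    r"^[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A-Za-z0-9]+)*)+$"
)


def registration_problems(namespace, url, registered):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Check 1: the namespace must not already be registered (case-insensitive).
    if namespace.lower() in {name.lower() for name in registered}:
        problems.append("namespace already registered")
    # Check 2: the namespace must have the expected format.
    if not NAMESPACE_RE.match(namespace):
        problems.append("namespace format invalid")
    # Check 3: the URL must be syntactically valid (scheme and host present).
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("URL invalid")
    return problems
```

None of these checks looks at the licenses inside the namespace, which matches the point above that only the namespace itself is registered.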

@karsten-klein

Hi all,

Tuesday’s session left me a little bewildered and puzzled. The interesting observation is that all participants in the discussion have good arguments. However, in my view, each argues only from a particular perspective rather than a holistic one. The related fears appear quite artificial to me: identifier collisions (e.g., two parties using the same LicenseRef-scancode- referring to different licenses), or a party introducing a LicenseRef for a license already on the SPDX License List. They mix syntactic, semantic, and integrity validation aspects, which should rather be disentangled.

As I also can only contribute a perspective and would leave the holistic assessment to the group, I would like to make some observations:

SPDX Format:

  • SPDX is primarily about the exchange of software package information. That is, information on software conveying structural, relational facts associated with metadata on different aspects; primarily, but not limited to, licensing information.
  • As such, the format requires referencing licenses by id and providing (rather optionally, in my view) the license texts associated with the id; concerns below.

License Ids:

  • When exchanging software licensing information, we need to make sure that we refer unambiguously to licenses. Using consolidated ids for licenses is key.
  • The SPDX License List, due to its scope (limited to open/public licenses) and the policies set forth in the matching guidelines, does not (and never will) cover all licenses that can be used for software.
  • When producing SPDX documents, we cannot limit the scope of licenses to open (or publicly available) licenses only.
  • I’d argue that the SPDX-Legal Team is an authority managing the SPDX License List (the work being highly appreciated!!); however, it is only to be expected that there may be other authorities that manage license lists (OSI included in this consideration).
  • Scancode Toolkit, for me, is a community-managed license list authority. Scancode has not invented these licenses; Scancode organizes them according to the Scancode policies. The rules may not be the same as the SPDX matching guidelines, but Scancode is of value to the people using it and provides a most pragmatic entry into the domain. Why should they not be able to reference a Scancode license unambiguously from an SPDX document, leveraging SPDX as a format?
  • My company does SCA and license identification in customer projects. In this respect we permanently run into licenses that are not on the SPDX License List and sometimes not even in Scancode. We therefore developed an extended identification concept and published it as {metæffekt} Universe. We would like to reference licenses unambiguously using this “extended namespace” within SPDX documents; again, leveraging SPDX as a standardized exchange format. Customers with access to the license database can use the ids to resolve the license texts.
    Please take the time to have a look at the license table for letter B. You can see the SPDX and Scancode coverage there.
  • Please also be aware that license ids may refer to a unique license text OR they may refer to a license template. The instances of https://spdx.org/licenses/BSD-3-Clause.html may have different text in case the copyright holder modified the variable parts of the template. Putting the default text of the license template in an SPDX document is not appropriate and not in the sense of the license.

License Texts:

  • I regard putting license texts in the SPDX document as problematic, especially in the way it is currently done. A software package may reference a license (the id is sufficient to capture this fact) or may include a license text (which may not be 100% the same as the one stored in the SPDX License Data). The license texts may be mixed with other information or refer to a list of third-party licenses. SPDX trying to segregate this into different parts adds an artificial layer of information that often does not align with the facts. Example: https://github.com/spring-projects/spring-framework/blob/main/src/docs/dist/license.txt
  • For us it is important that
    • an SPDX document consumer is able to resolve a license by id (as an approved general representation of the license); this can be done by an internal or external link to a license database (e.g. the SPDX License List, OSI, Scancode, {metæffekt} Universe or any other party's web site that provides consolidated license ids and information).
    • the package-specific license files (copyright, LICENSE, NOTICE, …) can be accessed by the consumer and are preserved in format and content.
  • Please note that different authorities for licenses may model licenses differently to match their specific policies and guidelines. This means that the same license or license aggregate is represented differently by the different authorities. I don’t regard this as bad; I regard this as an opportunity. Such cases may trigger exchange and discussion.
  • Licenses / contracts may be confidential. While the id is not critical the license/contract content is. Therefore:
    • Licenses can be public / shared
    • Licenses can be private and shared only within parties under contract or NDA
    • Licenses can contain conditions that do not allow distributing the license text
    • We must anticipate that we are never talking only about open source (!)

License Namespaces:

  • I would argue from an SPDX Format perspective that License Authorities should be treated equivalently. Currently the SPDX License List is treated as unique and special (work highly appreciated), which makes it hard for others to contribute their work, with all the caveats listed above. This means that the SPDX License List is itself a namespace definition.
  • Registering a namespace means just registering the namespace definition. I’d still argue (as this is content) that the SPDX Legal Team could approve a namespace registration to make sure the namespace is unique and follows given guidelines with respect to naming. A namespace definition includes at least:
    • Short Id
    • Namespace domain (could also be used for owner verification)
    • URL where to find details on the namespace
    • Contact Address (legal entity or person owning the namespace)
    • Latest version
  • I do not see that SPDX needs to care about the licenses managed in the namespace. The management and rules of the licenses in the namespace are up to the namespace owner. If the owner does not play by the rules, the namespace's reputation will suffer. People will not use it.
  • Validation of LicenseRef in the SPDX document is limited; but here I see opportunities for tool providers to add further levels of integrity validation.
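To make the minimal namespace definition above concrete, here is a small sketch of how such a record and its uniqueness check could be modeled. The class names, field names, and the case-insensitive uniqueness rule are assumptions for illustration only; nothing here is an agreed SPDX data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceDefinition:
    """One registered namespace; fields mirror the list above (names are hypothetical)."""
    short_id: str        # short id used as a prefix, e.g. "scancode"
    domain: str          # namespace domain, could also be used for owner verification
    details_url: str     # where to find details on the namespace
    contact: str         # legal entity or person owning the namespace
    latest_version: str  # latest published version of the namespace


class NamespaceRegistry:
    """In-memory registry that only enforces uniqueness of the short id."""

    def __init__(self):
        self._by_short_id = {}

    def register(self, definition):
        key = definition.short_id.lower()
        if key in self._by_short_id:
            raise ValueError(f"namespace {definition.short_id!r} is already registered")
        self._by_short_id[key] = definition

    def lookup(self, short_id):
        # Returns None when the namespace is unknown.
        return self._by_short_id.get(short_id.lower())
```

The registry deliberately stores nothing about individual licenses, in line with the view that their management is up to the namespace owner.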

As indicated earlier, I will not propose any solution. Some aspects may be rather revolutionary, and I cannot currently foresee whether these thoughts resonate with the group. Just a few highlights to show the idea:

[…]
LicenseConcluded: spdx:MIT

[…]
LicenseConcluded: scancode:bittorrent-eula

[…]
LicenseConcluded: ae:BSD-3-Clause-copyright-holder-variant

[…]
LicenseConcluded: spdx:MIT AND ae:BSD-3-Clause-copyright-holder-variant

I’d also argue that you could define a default license namespace for an SPDX document. In this case you could omit the default namespace short-name prefix. The default would be spdx.

This doesn’t even break LicenseRef compatibility. LicenseRef is just a special case (no namespace available). This can address the compatibility concern raised by Philippe and enable a transition once namespaces are available.
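A minimal sketch of how a tool might read the prefixed identifiers shown above, assuming the `namespace:name` form, a default namespace of `spdx`, and plain LicenseRef- identifiers carrying no namespace. The function names are hypothetical, and only the AND and OR operators and parentheses are handled.

```python
DEFAULT_NAMESPACE = "spdx"


def split_license_id(token, default_namespace=DEFAULT_NAMESPACE):
    """Split 'namespace:name' into (namespace, name)."""
    if token.startswith("LicenseRef-"):
        # Special case from the comment above: no namespace available.
        return None, token
    namespace, separator, name = token.partition(":")
    if not separator:
        # No prefix given: fall back to the document's default namespace.
        return default_namespace, token
    if not namespace or not name:
        raise ValueError(f"malformed license id: {token!r}")
    return namespace, name


def namespaces_used(expression, default_namespace=DEFAULT_NAMESPACE):
    """Collect the namespaces referenced in a simple AND/OR license expression."""
    operators = {"AND", "OR"}
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return {
        split_license_id(token, default_namespace)[0]
        for token in tokens
        if token not in operators
    }
```

For example, `namespaces_used("spdx:MIT AND ae:BSD-3-Clause-copyright-holder-variant")` yields the two namespaces `spdx` and `ae`, which a consumer could then resolve against whatever registry exists.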

Regards,
Karsten

@jlovejoy
Member

jlovejoy commented May 26, 2022 via email


@goneall
Member

goneall commented Jun 29, 2022

I propose that we agree to use a syntax where the namespace is in the DocumentRef- portion of the license identifier.

Reasoning:

  • The DocumentRef supports a unique identifier (URI) to be used within an SPDX document
  • The DocumentRef supports a verification method (checksum) to be used within an SPDX document
  • DocumentRefs give us a convenient way to group licenses in a manner consistent with the SPDX documents (e.g. you can have a single document with all license texts associated with that DocumentRef)
  • Having the same DocumentRef value when used inside and outside an SPDX document provides a very convenient way of correlating the groupings of licenses: you can use exactly the same text
  • If you include the namespace in the LicenseRef, you would still need to use the DocumentRef within SPDX documents if you want to refer to license texts stored in a separate SPDX document
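A short sketch of what this looks like when parsing, based on the existing SPDX 2.x license-ref form `DocumentRef-<idstring>:LicenseRef-<idstring>`, where an idstring consists of letters, digits, "-" and ".". The function name and the use of `mynamespace.com` as a DocumentRef value are illustrative assumptions.

```python
import re

# Existing SPDX 2.x form: ["DocumentRef-" idstring ":"] "LicenseRef-" idstring
LICENSE_REF_RE = re.compile(
    r"^(?:DocumentRef-(?P<document>[A-Za-z0-9.-]+):)?"
    r"LicenseRef-(?P<license>[A-Za-z0-9.-]+)$"
)


def parse_license_ref(identifier):
    """Return (document_ref, license_ref); document_ref is None for a local LicenseRef."""
    match = LICENSE_REF_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a valid license reference: {identifier!r}")
    return match.group("document"), match.group("license")


# Hypothetical namespace carried in the DocumentRef portion.
document_ref, license_ref = parse_license_ref(
    "DocumentRef-mynamespace.com:LicenseRef-LicenseABC"
)
```

Because the DocumentRef part is already tied to a URI and checksum inside an SPDX document, the same string could serve as the grouping key outside the document without any grammar change.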

@kestewart
Contributor

Based on discussion with Alexios on 7/20, it doesn't look like we're going to resolve this for 2.3. Moving milestone to 3.0

@kestewart kestewart modified the milestones: 2.3, 3.0 Jul 21, 2022
@zvr
Member Author

zvr commented Jul 22, 2022

Let's close this until we're more aligned on how to deal with the issue.

@zvr zvr closed this Jul 22, 2022
@zvr zvr deleted the license-namespaces branch May 9, 2024 10:23