Consider a way to keep track of "primary" licenses #50

Open
pombredanne opened this issue Jul 6, 2020 · 15 comments

@pombredanne
Member

pombredanne commented Jul 6, 2020

Say I start from these expressions:

  • primary: bsd-new
  • initial: bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0

I would like a way to end up with this, combining the two expressions above with AND:

  • transformed: bsd-new AND (bsd-simplified AND gpl-2.0 AND mit)
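
A minimal sketch of that transformation, assuming flat, AND-only expressions. It uses plain string handling rather than the license-expression library, and the function names `split_and` and `combine_with_primary` are hypothetical:

```python
def split_and(expression):
    """Split a flat, AND-only expression into its license keys."""
    return [key.strip() for key in expression.split(" AND ") if key.strip()]


def combine_with_primary(primary, initial):
    """Keep the primary keys first, then group the remaining keys in parentheses."""
    # Deduplicate the primary keys while preserving their order.
    primary_keys = list(dict.fromkeys(split_and(primary)))
    # Secondary keys: everything in `initial` that is not already primary, sorted.
    secondary_keys = sorted(set(split_and(initial)) - set(primary_keys))
    if not secondary_keys:
        return " AND ".join(primary_keys)
    return " AND ".join(primary_keys) + " AND (" + " AND ".join(secondary_keys) + ")"


transformed = combine_with_primary(
    "bsd-new",
    "bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0",
)
print(transformed)  # bsd-new AND (bsd-simplified AND gpl-2.0 AND mit)
```

This does not handle OR, WITH or nested parentheses; a real implementation would have to work on a parsed expression.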
@mjherzog
Member

mjherzog commented Jul 6, 2020

Primarized is not a word. Just Primary would work or perhaps PRIMARY to make it prominent.

@pombredanne
Member Author

Let me explain further:
Say I start from these expressions for a package:

  • I have a primary, top-level, key license expression that is stated e.g. "declared" in a package manifest : bsd-new AND mit
  • I also have other licenses that have been collected from the package files and there is more variety there. These are combined together in this expression: bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0

I would like a way to combine the two expressions above with AND while keeping track of the fact that one expression is the primary one. For instance, the primary license could be the one provided in an RPM spec file or an npm package.json, and it may not cover all the file-level details, which include secondary licenses that are less prominent but still matter.

  • Just combining the expression without simplification would yield this:
    bsd-new AND mit AND bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0
    but this is really not helping to track what the original top-level license was.
  • I could use parentheses, as in (bsd-new AND mit) AND (bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0), since they do not change the meaning of the expression... but since these are optional they would be easy to drop when parsing and printing back an expression.

@DennisClark came up with a brilliant idea, which would be to use a new keyword in the expression syntax, PLUS, that would be strictly equivalent to AND.

If we go with that we would get bsd-new AND mit PLUS bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0 and things are very clear now: the left-hand side is the primary license and the right-hand side is the secondary license.
There is no loss of meaning, no risk of dropping the PLUS, and furthermore it reads nicely.

The rule would be that there could be zero or only one PLUS keyword in an expression and that PLUS is strictly equivalent to AND. When sorting, simplifying or minimizing an expression with a PLUS, the left hand side (LHS) and right hand side (RHS) would be processed separately. And there could be a convenience method to cast back a PLUS to a simple AND.
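
To make the rule concrete, here is a rough sketch under stated assumptions: PLUS is not part of the license-expression library today, the expressions on each side are flat and AND-only, and the helper names (`split_primary`, `dedup_and`, `simplify_with_plus`, `plus_to_and`) are made up for illustration.

```python
PLUS = " PLUS "


def split_primary(expression):
    """Return (primary, secondary); secondary is '' when there is no PLUS."""
    parts = expression.split(PLUS)
    if len(parts) > 2:
        raise ValueError("at most one PLUS keyword is allowed in an expression")
    primary = parts[0].strip()
    secondary = parts[1].strip() if len(parts) == 2 else ""
    return primary, secondary


def dedup_and(expression):
    """Remove repeated keys from a flat, AND-only expression, keeping order."""
    keys = [key.strip() for key in expression.split(" AND ") if key.strip()]
    return " AND ".join(dict.fromkeys(keys))


def simplify_with_plus(expression):
    """Simplify the LHS and the RHS separately, never mixing the two sides."""
    primary, secondary = split_primary(expression)
    if not secondary:
        return dedup_and(primary)
    return dedup_and(primary) + PLUS + dedup_and(secondary)


def plus_to_and(expression):
    """Cast PLUS back to a plain AND, e.g. before emitting an SPDX expression."""
    primary, secondary = split_primary(expression)
    if not secondary:
        return primary
    # Parentheses keep each side intact; this is a choice made for the sketch.
    return f"({primary}) AND ({secondary})"


expr = "bsd-new AND mit PLUS bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0"
print(simplify_with_plus(expr))
# bsd-new AND mit PLUS bsd-new AND bsd-simplified AND mit AND gpl-2.0
```

The cast back to AND wraps each side in parentheses so that operator precedence cannot regroup the two sides if either one ever contains an OR.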

With this simple enhancement the expressive power would be vastly enhanced.

Note that this would be entirely optional. Whenever client code that uses the license-expression library deals with SPDX expressions, it would request casting a possible PLUS back to AND so that the result is correct SPDX-wise. And if this PLUS proves as useful as I think it will, then we would submit it as an enhancement to the official SPDX syntax.

@qduanmu @mjherzog @sschuberth @mbargull @DennisClark @tdruez @pkolbus @carmenbianca @mxmehl @MaJuRG @chinyeungli @johnmhoran FYI

@pombredanne
Member Author

pombredanne commented Jul 18, 2020

Just adding further about why this is useful: it is super common to have multiple licenses for a package, but not all of these licenses have the same prominence. For instance, a package may be using the GPL for its command line utilities and the LGPL for a library as core, top-level licenses (common for Linux userland tools) and still harbor bits of code under BSD and MIT licenses, have its build script under yet another license and its documentation under GFDL. All these licenses need to be reported all right, but in doing so and combining them all in a single license expression we may lose sight of the fact that the key, core licenses are GPL and LGPL and that the other licenses are there but secondary.

@pkolbus
Contributor

pkolbus commented Jul 18, 2020

@pombredanne, thanks for the FYI.

If a license is applicable then you have to abide by the terms to use the bit of code, so understanding the entire license expression is necessary for compliance purposes. Because of this, I'm not seeing the use case that leads to prominence being a necessary concept. (Admittedly, this could well be a lack of vision on my part.)

Taking the example of a declared license vs scan results a bit further, I'm not sure it's appropriate to combine in that way, as there's a difference in the level of confidence or possibly legal interpretation applied. It's possible that:

  • the scan is covering files that are not actually used
  • the declared license-expression already takes into account the scanned files (for example, if following https://dwheeler.com/essays/floss-license-slide.html, LGPLv2.1 AND GPLv2 might have been simplified to GPLv2)
  • the declared license-expression is incomplete (in which case, it's really best to fix the declared license).

Note also that SPDX v2.2.0 (https://spdx.github.io/spdx-spec/) is consistent with this separation as it defines multiple license-expression fields: "Concluded License" (3.13), "All Licenses From Files" (3.14), and "Declared License" (3.15).

But if prominence does in fact have value: treatment of the left and right sides of the PLUS as independent leads to expressions that are unnecessarily complex. (bsd-new AND mit PLUS bsd-new AND bsd-simplified could easily be bsd-new AND mit PLUS bsd-simplified.) And while the PLUS->AND transform does enable simplification, there is an irreversible loss of prominence data. Assuming the concept that prominence is roughly prevalence, I would suggest that simplifications across the PLUS are valid, but those involving a prominent sub-expression (the left-hand side of PLUS) result in a prominent sub-expression. (For example, if GPLv2+ AND GPLv3 simplifies to GPLv3 then GPLv2+ PLUS GPLv3 AND MIT simplifies to GPLv3 PLUS MIT.)
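
The first simplification suggested above (dropping from the right-hand side any key already present on the left-hand side) could look roughly like this. It is only a sketch: it assumes flat, AND-only sides and at most one PLUS, and `drop_repeated_secondary` is a hypothetical name, not a license-expression function.

```python
def keys_of(expression):
    """Ordered, deduplicated license keys of a flat, AND-only expression."""
    keys = [key.strip() for key in expression.split(" AND ") if key.strip()]
    return list(dict.fromkeys(keys))


def drop_repeated_secondary(expression):
    """Simplify across PLUS: a key already prominent is not repeated on the RHS."""
    primary, _, secondary = expression.partition(" PLUS ")
    primary_keys = keys_of(primary)
    secondary_keys = [key for key in keys_of(secondary) if key not in primary_keys]
    result = " AND ".join(primary_keys)
    if secondary_keys:
        result += " PLUS " + " AND ".join(secondary_keys)
    return result


print(drop_repeated_secondary("bsd-new AND mit PLUS bsd-new AND bsd-simplified"))
# bsd-new AND mit PLUS bsd-simplified
```

The license-level simplification (GPLv2+ AND GPLv3 to GPLv3) is a different kind of rule, since it depends on knowledge about the licenses rather than on the structure of the expression.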

@sschuberth
Contributor

@DennisClark came up with a brilliant idea, which would be to use a new keyword in the expression syntax, PLUS, that would be strictly equivalent to AND.

I'm sorry to spoil the party here, but if I may be frank, I believe this is not a good approach. Because:

Note that this would be entirely optional. Whenever client code that uses the license-expression library deals with SPDX expressions, it would request casting a possible PLUS back to AND so that the result is correct SPDX-wise.

So strictly speaking, this breaks SPDX compatibility, which IMO is an absolute no-go. Third-party applications must be able to rely on being able to parse the expression if they adhere to the SPDX standard.

If you have a hard requirement to track primary / declared vs. other licenses you really should use different fields like e.g. we do in ORT (and SPDX itself does like @pkolbus mentioned), and only create a combined license expression on the fly on license evaluation. Or come up with a convention that does not break the standard, like using parentheses as @pombredanne suggested before (maybe extend that idea to use "dummy" double-parentheses to avoid confusion with regular parentheses).
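
As an illustration of the separate-fields approach, here is a small sketch. The class and field names (`PackageLicenses`, `declared`, `detected`) are hypothetical and are not the ORT or SPDX schema; the combined expression is built only when asked for.

```python
from dataclasses import dataclass, field


@dataclass
class PackageLicenses:
    """Keep the declared (primary) expression apart from the detected ones."""

    declared: str
    detected: list = field(default_factory=list)

    def combined(self):
        """Build a single AND-ed expression on the fly, e.g. for evaluation."""
        parts = [self.declared, *self.detected]
        # Drop empty and repeated expressions while keeping their order.
        unique = list(dict.fromkeys(part for part in parts if part))
        if len(unique) == 1:
            return unique[0]
        return " AND ".join(f"({part})" for part in unique)


licenses = PackageLicenses(
    declared="bsd-new AND mit",
    detected=["bsd-new AND bsd-simplified AND mit AND gpl-2.0"],
)
print(licenses.combined())
# (bsd-new AND mit) AND (bsd-new AND bsd-simplified AND mit AND gpl-2.0)
```

The primary license stays available as `licenses.declared` no matter how the combined string is later parsed or reprinted.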

@pombredanne
Member Author

See also nexB/scancode-toolkit#2065

@pombredanne
Member Author

@sschuberth re:

So strictly speaking, this breaks SPDX compatibility, which IMO is an absolute no-go. Third-party applications must be able to rely on being able to parse the expression if they adhere to the SPDX standard.

Actually it would not break compatibility: the PLUS would not be used for SPDX expressions, but only for scancode and aboutcode expressions using non-SPDX license keys.
Furthermore, I floated the idea of adding this to SPDX and I would submit it for consideration there too.

@sschuberth
Contributor

but only for scancode and aboutcode expressions using non-SPDX license keys

Ideally, there would be no such expressions. Non-SPDX license keys should become SPDX LicenseRefs, and everything that looks like an SPDX expression should actually be one. Just my 2 cents.

@pombredanne
Member Author

@pkolbus Thank you for the detailed feedback!

For example, if GPLv2+ AND GPLv3 simplifies to GPLv3 then GPLv2+ PLUS GPLv3 AND MIT simplifies to GPLv3 PLUS MIT.)

It is important to note that "GPLv2+ AND GPLv3 simplifies to GPLv3" is NEVER true (at least that's not a license-expression library feature, though you could implement this with substitutions). The simplification done here is a logical/boolean simplification based on symbols (e.g. license keys) and operators (AND, OR and WITH).
If we add support for PLUS, then we would treat the sides to the left and to the right of the PLUS operator as separate expressions that would be simplified separately (and would not be mixed, except possibly to ensure that they are disjoint).
So

  • bsd-new AND mit PLUS bsd-new AND bsd-simplified would be unlikely to change or, if it does, it may optionally become bsd-new AND mit PLUS bsd-simplified
  • GPLv2+ PLUS GPLv3 AND MIT would not be simplified
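
To show the difference between a purely boolean simplification and one driven by substitutions, here is a sketch in plain Python. It does not use the license-expression API; `simplify_and` and its `substitutions` parameter are hypothetical, and the mapping itself is a caller-supplied judgment, not something a library could decide.

```python
def simplify_and(expression, substitutions=None):
    """Deduplicate a flat, AND-only expression; license keys are opaque symbols.

    `substitutions` maps a key to another key that is considered to cover it.
    A key is dropped only when both keys are present in the expression.
    """
    keys = list(dict.fromkeys(k.strip() for k in expression.split(" AND ") if k.strip()))
    for covered, covering in (substitutions or {}).items():
        if covered in keys and covering in keys:
            keys.remove(covered)
    return " AND ".join(keys)


# Boolean simplification only removes the exact duplicate.
print(simplify_and("GPLv2+ AND GPLv3 AND GPLv3"))
# GPLv2+ AND GPLv3

# Only an explicit substitution collapses the two different symbols.
print(simplify_and("GPLv2+ AND GPLv3 AND GPLv3", {"GPLv2+": "GPLv3"}))
# GPLv3
```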

@pombredanne
Member Author

@sschuberth

Ideally, there would be no such expressions. Non-SPDX license keys should become SPDX LicenseRefs, and everything that looks like an SPDX expression should actually be one. Just my 2 cents.

Agreed, and we should likely move ahead in that direction since spdx/spdx-spec#113 seems to be stalled... but that would not remove the value to distinguish the "main license" vs. the rest IMHO

@sschuberth
Contributor

but that would not remove the value to distinguish the "main license" vs. the rest IMHO

Here I agree, too, but IMO the best approach to document such a main / primary license would still be a dedicated field / property.

@pombredanne
Member Author

@sschuberth re:

Here I agree, too, but IMO the best approach to document such a main / primary license would still be a dedicated field / property.

But then it is not one but eventually an array of license expressions that would be needed to be correct, eventually grouping together the files that share a "purpose" (say doc, build scripts, tests, dev tools, dead code, etc.)?

@sschuberth
Contributor

@mjherzog
Member

@sschuberth There does not seem to be any activity with CD Facets, but the concept is similar. This has also been a long-standing topic at SPDX, but no conclusion afaik. You could argue that "Relationships between SPDX Elements" capture some of this, but that is much more complex than this use case.

@pombredanne
Member Author

pombredanne commented Aug 31, 2020

@sschuberth You wrote:

but that would not remove the value to distinguish the "main license" vs. the rest IMHO

Here I agree, too, but IMO the best approach to document such a main / primary license would still be a dedicated field / property.

Well, the thing is that it would assume that anywhere we use a single license expression string we now would need two license expressions to convey this notion of primary and secondary.

I think that this would not be practical when license expressions are used outside of SPDX documents: there we have no control on the schema and adding new fields is unlikely to happen IMHO.

Working towards increased adoption of using a license expression rather than an unstructured license string in a package manager metadata field is already a significant piece of work. Asking folks to break things down in multiple fields feels like an even more difficult or impossible task to me.

You also wrote:

Yes, maybe something similar to ClearlyDefined's facets:

Yes, conceptually. But that also transforms the "license-expression-as-a-single-string" into a "license-expression-as-a-mapping-of-key-value-pairs", which would be impractical to adopt for many package manager tools and other places where a license expression string may be used.
