Ambiguity regarding multiple licenses #1108

Open
J3RN opened this issue Jan 15, 2022 · 10 comments

@J3RN

J3RN commented Jan 15, 2022

Related to, but not quite the same as, #746.
Prompted by discussion on gleam-lang/gleam#1450

Presently, the "Adding metadata" section of the Hex.pm docs describes a licenses field with the following documentation:

A list of licenses the project is licensed under. This attribute is required. It is recommended to use SPDX License identifier.

This leaves ambiguous a few items:

  • If a list of licenses is specified, should they be interpreted as being anded or ored together? If the answer is one of these, how would I convey the other?
  • How do I convey an exception, such as "Apache-2.0 WITH LLVM-exception"?
  • How do I convey "Apache-2.0 or later"?

In my opinion (which you're free to ignore 😅), the answer is to supersede the licenses field with a license_expression field containing an SPDX license expression. SPDX expressions unambiguously convey ANDs, ORs, WITHs, and "or later" (via -or-later or +, depending on whether the license is a GNU license).
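To make the ambiguity concrete, here is a minimal Python sketch of the two SPDX expressions a plain licenses list could stand for. The function list_to_expression is hypothetical (not part of Hex or any SPDX library); the point is that the caller must supply the operator because the list itself does not carry it.

```python
def list_to_expression(licenses: list[str], operator: str) -> str:
    """Join a Hex-style licenses list into a single SPDX expression.

    The list alone does not say which operator applies, so the caller
    has to supply it; that missing information is the ambiguity here.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not licenses:
        raise ValueError("at least one license is required")
    # Parenthesize entries that are already compound (e.g. "X WITH Y")
    # so the combined expression keeps its intended grouping.
    parts = [f"({lic})" if " " in lic.strip() else lic for lic in licenses]
    return f" {operator} ".join(parts)


licenses = ["Apache-2.0", "MIT"]
print(list_to_expression(licenses, "OR"))   # Apache-2.0 OR MIT
print(list_to_expression(licenses, "AND"))  # Apache-2.0 AND MIT
```

The same metadata yields two legally different readings, which is why either the list semantics must be fixed or an explicit expression must be allowed.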

@ericmj
Member

ericmj commented Jan 16, 2022

The licenses field is meant to list the licenses the project is licensed under; OR or AND does not matter to it.

Do you have examples of projects in the Hex ecosystem with licenses this complex and why they need to express the licenses with expressions in this field?

@J3RN
Author

J3RN commented Jan 16, 2022

The licenses field is meant to list the licenses the project is licensed under; OR or AND does not matter to it.

Does this mean that whether the licenses are anded or ored should be communicated elsewhere? This is fine, unless we want to be able to automatically audit whether new packages comply with our existing license.

For instance, the common Apache 2.0 license (used by Elixir, Erlang, etc.) and the GPLv2 (used by Git itself, the Linux kernel alongside other licenses, etc.) are incompatible. However, I could license a library as "Apache-2.0 or (at your option) MIT", which effectively adds GPLv2 compatibility because the MIT license is GPLv2-compatible (at the cost of the MIT terms, rather than the Apache-2.0 terms, applying in that case). The license list would look like this:

licenses: ["Apache-2.0", "MIT"]

If someone else's package is licensed under GPLv2, knowing whether my package is licensed under "Apache-2.0 or MIT" vs. "Apache-2.0 and MIT" (à la Rust) determines whether or not they may use my library. Of course, I should unambiguously state this somewhere within my project, but it would be nice if a tool could use the package metadata to perform this check automatically.
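A minimal sketch of the automated check described above, assuming the expression has already been parsed into a small tree. The tuple representation, the function usable_under_gpl2, and the GPL2_COMPATIBLE table are all hypothetical and illustrative only (not legal advice, and not an existing Hex tool):

```python
from typing import Union

# A parsed expression is either a license id (str) or a tuple of
# ("AND" | "OR", [sub-expressions]).
Expr = Union[str, tuple]

# Hypothetical, deliberately tiny table; a real tool would need a
# reviewed license-compatibility matrix.
GPL2_COMPATIBLE = {"MIT", "BSD-2-Clause", "GPL-2.0-only"}


def usable_under_gpl2(expr: Expr) -> bool:
    if isinstance(expr, str):
        return expr in GPL2_COMPATIBLE
    op, parts = expr
    if op == "OR":
        # The recipient may pick any one alternative.
        return any(usable_under_gpl2(p) for p in parts)
    if op == "AND":
        # Every license's terms apply at once (simplified check).
        return all(usable_under_gpl2(p) for p in parts)
    raise ValueError(f"unknown operator {op!r}")


print(usable_under_gpl2(("OR", ["Apache-2.0", "MIT"])))   # True
print(usable_under_gpl2(("AND", ["Apache-2.0", "MIT"])))  # False
```

The only difference between the two calls is the operator, which is exactly the information the current licenses list does not record.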

Do you have examples of projects in the Hex ecosystem with licenses this complex and why they need to express the licenses with expressions in this field?

I don't know of any packages currently on Hex with a complex license, but I could ship one today 😉 Many of the foundational technologies that the ecosystem relies on have more than a simple license, including LLVM (Apache-2.0 WITH LLVM-exception), which is used by BEAMJIT, and wxWindows/wxWidgets (LGPL-2.0-or-later WITH WxWindows-exception-3.1), which underlies the wx system used by :debugger and :observer.

@supersimple
Member

This is an interesting situation, @J3RN.
We are in the process of normalizing licenses because many packages are using identifiers that are not recognized (IMO this has to be done before we can go any further with licenses.)
I am especially curious about the "with" option, but I do think we'll have to consider how that feature would fit into our normalization process. The purpose of normalization is to make the licenses easier to read and understand. Adding AND, OR, and WITH can produce an unbounded number of license combinations (which perhaps makes normalization pointless).
I'd like to hear your thoughts on how that feature could work.

@J3RN
Author

J3RN commented Jan 19, 2022

@supersimple Fortunately for us, the good folks over at SPDX have a solution for normalizing all kinds of crazy licenses, including composite licenses, licenses with exceptions, and even user-defined licenses: SPDX License Expressions. The SPDX specification also defines an unambiguous grammar for parsing license expressions into their constituent parts (the licenses, the operators, the exceptions, etc.). FWIW, the Rust ecosystem has an spdx crate for parsing these expressions, which could serve as a reference when building our own.
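One way to see why expressions need not defeat normalization: operators and parentheses are structural, so normalization can still run per identifier. Below is a minimal Python sketch under that assumption; the CANONICAL table and the normalize_expression function are hypothetical stand-ins for whatever identifier normalization Hex already performs.

```python
import re

# Hypothetical table mapping lower-cased spellings to canonical SPDX ids;
# a real implementation would derive it from the SPDX license list.
CANONICAL = {
    "mit": "MIT",
    "apache-2.0": "Apache-2.0",
    "gpl-2.0": "GPL-2.0",
    "llvm-exception": "LLVM-exception",
    "lgpl-2.0-or-later": "LGPL-2.0-or-later",
    "wxwindows-exception-3.1": "WxWindows-exception-3.1",
}
OPERATORS = {"AND", "OR", "WITH"}
TOKEN = re.compile(r"\s*([()]|[A-Za-z0-9.:+-]+)")


def normalize_identifier(tok: str) -> str:
    # Keep a trailing "+" ("or later") and normalize only the base id.
    plus = "+" if tok.endswith("+") else ""
    base = tok[:-1] if plus else tok
    try:
        return CANONICAL[base.lower()] + plus
    except KeyError:
        raise ValueError(f"unrecognized identifier {tok!r}") from None


def normalize_expression(expr: str) -> str:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tok = m.group(1)
        pos = m.end()
        # Operators (matched case-sensitively) and parentheses pass
        # through unchanged; only identifiers are looked up.
        if tok in OPERATORS or tok in ("(", ")"):
            tokens.append(tok)
        else:
            tokens.append(normalize_identifier(tok))
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")


print(normalize_expression("apache-2.0 WITH llvm-exception"))
```

The lookup table grows with the number of identifiers, not with the number of possible combinations. Lower-case operators such as "or" are treated as unknown identifiers in this sketch and rejected.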

@J3RN
Author

J3RN commented Jan 19, 2022

Speaking of Rust, a quick Google reveals that the license field in a crate package manifest must be a valid SPDX 2.1 License Expression.

@lpil

lpil commented Jan 23, 2022

I think conforming to the SPDX standard would be sensible, seeing as it's already widely used and the rules are unambiguous and powerful enough for real-world use in larger ecosystems than ours.

Supporting SPDX expressions does make the list of multiple licences redundant, but as long as we decide whether the licences in a list are AND'd or OR'd, the two formats can be compatible.

@ericmj
Member

ericmj commented Jan 30, 2022

To be clear we are conforming to the SPDX license identifiers but we do not support expressions.

Between and and or which one is most common/useful? We can pick that one as how a list of licenses should behave.

When there are projects in the community that require expressions, we can reconsider supporting them, but right now they do not seem to be needed, so I would like to avoid the extra maintenance work for something that 10000+ packages have not needed so far.

@lpil

lpil commented Jan 30, 2022

OR is more common than AND, but I've found both to be common when auditing start-ups' dependencies during investment rounds, for languages whose package managers do surface this information in their APIs.

There's also WITH, e.g. "Apache-2.0 WITH LLVM-exception".

I would like to avoid the extra maintenance work for something that 10000+ packages have not needed so far.

Given that we don't know whether existing projects are using AND or OR, and that both are relatively common, I would speculate that both are currently in use, but that some proportion of those users are using the config format incorrectly.

RE maintenance, the SPDX ABNF grammar is thankfully very simple. We'd only need to parse a string in this format to validate an expression.

idstring = 1*(ALPHA / DIGIT / "-" / "." )
license-id = <short form license identifier in Annex A.1>
license-exception-id = <short form license exception identifier in Annex A.2>
license-ref = ["DocumentRef-"1*(idstring)":"]"LicenseRef-"1*(idstring)
simple-expression = license-id / license-id"+" / license-ref
compound-expression = simple-expression /
  simple-expression "WITH" license-exception-id /
  compound-expression "AND" compound-expression /
  compound-expression "OR" compound-expression /
  "(" compound-expression ")"
license-expression = simple-expression / compound-expression

If there were a Hex package that validated SPDX licences, would that ease the development cost concern for you?

-spec is_spdx_expression(binary()) -> boolean().
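For a sense of how small such a validator could be, here is a Python analogue of the proposed is_spdx_expression. It is a hypothetical sketch, not an existing Hex or spdx package API. It rewrites the left-recursive compound-expression rule as recursive descent using SPDX's operator precedence (WITH binds tightest, then AND, then OR), and it assumes tiny stand-in lists in place of the full Annex A license and exception lists.

```python
import re

# Stand-ins for the SPDX license and exception lists (Annex A.1 / A.2);
# a real validator would load the full published lists.
LICENSE_IDS = {"mit", "apache-2.0", "gpl-2.0", "lgpl-2.0-or-later", "bsd-2-clause"}
EXCEPTION_IDS = {"llvm-exception", "wxwindows-exception-3.1"}

LICENSE_REF = re.compile(r"(DocumentRef-[A-Za-z0-9.-]+:)?LicenseRef-[A-Za-z0-9.-]+")
TOKEN = re.compile(r"\s*([()]|[A-Za-z0-9.:+-]+)")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_simple(tok):
    # simple-expression = license-id / license-id"+" / license-ref
    if LICENSE_REF.fullmatch(tok):
        return True
    base = tok[:-1] if tok.endswith("+") else tok
    return base.lower() in LICENSE_IDS  # ids match case-insensitively


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.i += 1
        return tok

    def or_expr(self):  # loosest: OR
        self.and_expr()
        while self.peek() == "OR":
            self.take()
            self.and_expr()

    def and_expr(self):
        self.atom()
        while self.peek() == "AND":
            self.take()
            self.atom()

    def atom(self):  # "(" expr ")" or simple-expression [WITH exception]
        tok = self.take()
        if tok == "(":
            self.or_expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return
        if not is_simple(tok):
            raise ValueError(f"unknown license {tok!r}")
        if self.peek() == "WITH":
            self.take()
            if self.take().lower() not in EXCEPTION_IDS:
                raise ValueError("unknown exception")


def is_spdx_expression(text: str) -> bool:
    try:
        parser = _Parser(tokenize(text))
        parser.or_expr()
        return parser.peek() is None  # every token must be consumed
    except ValueError:
        return False
```

Usage: is_spdx_expression("MIT OR (Apache-2.0 AND GPL-2.0+)") is accepted, while "(MIT) WITH LLVM-exception" is rejected because the grammar only allows WITH after a simple-expression. Operators are matched case-sensitively, so a lower-case "with" is also rejected.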

@ericmj
Member

ericmj commented Jan 30, 2022

I am not so worried about the parsing itself; we already have spdx, which validates license identifiers, and we could add the parsing to it. But updating the Hex package metadata specification involves changes to (I think) at least 7 repositories. We also just introduced enforcement of SPDX identifiers, with deprecation warnings and eventually error messages that users will have to deal with.

Introducing more changes for something that doesn't seem to be actively used does not make sense to me when there are so many other things that need development time.

@lpil
Copy link

lpil commented Jan 30, 2022

That's very reasonable.

What would a migration path be for people who are currently using the list as AND and people who are currently using licences that cannot be represented with OR'd licences? Could there be some value which means "see the licence file for details"?
