Clarify case sensitivity of Short Form licenses - for list and tools. #63

Closed
kestewart opened this issue Jan 2, 2018 · 24 comments · Fixed by #228
Comments

@kestewart
Contributor

kestewart commented Jan 2, 2018

This has been moved from bugzilla: https://bugs.linuxfoundation.org/show_bug.cgi?id=1327

Kate Stewart 2015-11-19 19:00:15 UTC
Jilayne wrote:
in http://lists.spdx.org/pipermail/spdx-tech/2015-November/002905.html

  • in http://wiki.spdx.org/view/Technical_Team/Minutes/2014-09-16#Case_sensitivity_for_license_information - the tech team discussed this on 16 Sept 2014, note saying “License ID’s case sensitive”

  • and then the legal team discussed it - http://wiki.spdx.org/view/Legal_Team/Minutes/2014-09-18 - and concluded:
    • Mark raised issue of whether SPDX License List short identifiers and (new) license expression operators should be case sensitive with the Tech Team and discussed further here: decided that for purposes of spec, in terms of a legitimate value, both could be case insensitive (but best practice would be to display with precise capitalization). Mark to go back to tech team with this decision.

So… looks like maybe we didn’t really capture this elsewhere? In any case, I don’t see a reason to have them be case sensitive in terms of matching (for tools), but have them display with the upper/lower case as they are shown in the SPDX License List - it’s easier for humans to read/spot :)

Kate Stewart 2015-11-19 19:01:50 UTC
I'll add it to the 2.1 version of the spec. Also consider adding this as an addendum/erratum for 2.0.

Kate Stewart 2015-12-22 18:13:49 UTC
Discussed on 12/22 - no concerns, going forward with documenting.

Bill Schineller 2016-05-10 17:53:56 UTC
didn't jump out at me where / if we made edit yet to SPDX 2.2
todo

Kate Stewart 2016-05-17 17:01:29 UTC
Have proposed an edit to 6.1 and Appendix I. Let's review.

Kate Stewart 2016-05-17 17:14:40 UTC
In discussion, some concern about other tools and matching in future.

Circling back this discussion to include Mark Gisi.

Bill Schineller 2016-05-17 17:15:33 UTC
fwiw:
from http://lists.w3.org/Archives/Public/www-rdf-interest/2003Aug/0002.html
RDF is case-sensitive. From the last call Concepts working draft:

 Two RDF URI references are equal if and only if they compare as
 equal, character by character, as Unicode strings.

 -- http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

An upper-case 'A' and a lower-case 'a' are different Unicode characters.

Bill Schineller 2016-05-24 17:13:32 UTC
Kate / Jilayne agreed to leave the Spec language as-is for 2.1

as-is means 'it IS case-sensitive'

leaving ticket open with Version 'unspecified' in case we want to revisit in the future.

We were reluctant to make case-insensitive now for 2.1 without understanding the impacts case might have on URIs (website, other tools, RDF graphs, ...)

@kestewart kestewart added the bug label Jan 2, 2018
@kestewart kestewart added this to the 2.1.1 milestone Jan 2, 2018
@wking
Contributor

wking commented Jan 2, 2018 via email

wking added a commit to wking/choosealicense.com that referenced this issue Jan 3, 2018
The previous case-insensitive matching was removed in e5f46fa (test
required spdx-ids against data from spdx, 2016-05-25, github#418).  That
commit was designed [1] to allow case-sensitive matching as discussed
in [2].  But while I'm in favor of case-sensitive keys in spdx_list,
the case-sensitive match breaks script/check-approval which downcases
its argument since it was added in 8e56bb8 (add
script/check-approval, 2016-01-18, github#318).

There are more notes on SPDX's plans for case sensitivity in [3], so
we should see a clearer policy there soon.  I'm arguing for
case-sensitive *display* with optional case-insensitive matching.  I
am optimistic that the SPDX will at least agree not to register short
IDs that only differ by case, which is all we need to make this
case-insensitive match safe here.

[1]: github#418 (comment)
[2]: licensee/licensee#72
[3]: spdx/spdx-spec#63
@tsteenbe tsteenbe modified the milestones: 2.1.1, 2.2 Jan 23, 2018
@pombredanne
Member

In practice case does not matter and every id is unique case-wise. Mandating a certain case is a hindrance to adoption IMHO and serves absolutely no good purpose.

@wking
Contributor

wking commented Feb 9, 2018

Mandating a certain case is a hindrance to adoption…

If we explicitly allow tools to accept case-insensitive matches, then tool maintainers who feel that case is not important can ignore it.

…and serves absolutely no good purpose.

I think preserving case is useful for readability, and we should at least SHOULD authors to use the right case even if we don't MUST them. For example, I think RHeCos is a better hint for “Red Hat eCos” than rhecos or RHECOS would be.

If we MUST case preservation for authors, then tools can choose not to support case-insensitive matching as well. That simplifies the parsing logic. Downcasing the whole license expression before parsing it would be a straightforward way to do case-insensitive parsing, but then it's a bit tedious to get back to the original case if you want to warn about an unrecognized identifier.
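
The trade-off above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KNOWN_IDS` is a hypothetical subset of the license list, `unrecognized_tokens` is a made-up helper name, and exception identifiers after `WITH` are not handled. Folding each token separately, instead of downcasing the whole expression up front, keeps the original spelling available for warnings.

```python
# Hypothetical subset of the SPDX License List, for illustration only.
KNOWN_IDS = {"MIT", "ISC", "GPL-2.0", "RHeCos"}
OPERATORS = {"and", "or", "with"}


def unrecognized_tokens(expression):
    """Return unknown identifiers exactly as the author typed them."""
    known_lower = {license_id.lower() for license_id in KNOWN_IDS}
    unknown = []
    # Pad parentheses so that split() separates them from identifiers.
    padded = expression.replace("(", " ( ").replace(")", " ) ")
    for token in padded.split():
        if token in ("(", ")"):
            continue
        folded = token.lower()
        if folded in OPERATORS:
            continue
        # A trailing "+" is the or-later operator, not part of the identifier.
        base = folded[:-1] if folded.endswith("+") else folded
        if base not in known_lower:
            unknown.append(token)  # keep the original casing for the warning
    return unknown
```

For example, `unrecognized_tokens("mit AND (Foo-1.0 or rhecos)")` reports `Foo-1.0` with the author's casing rather than `foo-1.0`.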

@pombredanne
Member

I am not sure that the mostly all uppercase approach we have today helps with readability.
Instead we should define a canonical form and allow case-insensitive IDs and operators in license expressions.

@jeffmcaffer ping

@wking
Contributor

wking commented Feb 15, 2018 via email

@jeffmcaffer

as a user and relatively new to expressions I can definitely say that allowing people to do whatever case they want is a win. Consider

MIT AND GPL-2.0 AND GPL-1.0+ AND EPL-1.0 AND ISC

MIT and GPL-2.0 and GPL-1.0+ and EPL-1.0 and ISC

The latter is more approachable. More approachable == simpler. Simpler == more interest in using.

my preference is to allow, but not require, tools to support noncanonical casing

Seems like that would be a problem as you might use non-canonical casing and then my tool may not understand it. So the interchange format is not all that interchangeable.

I likely just missed it but what is the argument for dictating casing?

@kestewart
Contributor Author

kestewart commented Feb 16, 2018

The case sensitivity concerns were more with the actual short form identifiers. If there is quorum to permit the license expression operators to be case insensitive, that is less impact. Note this would probably mean that AND, and, And, aNd, etc. variants would all be recognized (impact is AND, OR, WITH keywords).

@goneall
Member

goneall commented Feb 16, 2018

As one example of a tool implementation of license expression, it would be quite easy to make the entire expression case insensitive, quite easy to make the entire expression case sensitive and just a little bit of work to have different case sensitivity for the operators and license ID's.

The most important thing is to specify and document case sensitivity so that all tools behave consistently. You wouldn't want one tool to accept a license expression and a different tool to reject it due to different decisions on how to treat case sensitivity. Allowing tools to treat the case differently would make the interchange less reliable and defeat one of the goals of SPDX IMHO.

I would support whatever is easiest for the user.

@wking
Contributor

wking commented Feb 16, 2018

Consider

MIT AND GPL-2.0 AND GPL-1.0+ AND EPL-1.0 AND ISC
MIT and GPL-2.0 and GPL-1.0+ and EPL-1.0 and ISC

The latter is more approachable.

“more approachable” probably depends on your past experience. For example, and is a Python operator (and AND is not supported), so folks with a Python background may prefer and. SQL operators are case insensitive, but the convention is to use uppercase operators (as seen here). No single case convention will feel native to all authors. Allowing authors to pick their own case convention makes things friendlier (but not necessarily easier) for those authors. It makes life slightly less friendly (but not necessarily harder) for other human readers. And it makes life slightly harder for tool authors who want to generate warnings that match the original casing.

Seems like that would be a problem as you might use non-canonical casing and then my tool may not understand it.

If the spec says that canonical casing is required (my preference), then yeah, that's a risk you'd take by using non-canonical casing. I don't think that's a compatibility issue, because you'll have the same issue if you break any of the other SPDX rules.

@jeffmcaffer

"more approachable", yes, it is subjective. However, I was speaking as a human reader, not a programmer or technical person. Simply put, many people find reading a long string of all uppercase text hard. If these expressions are to show up in a human context (e.g., SPDX identifier tags in readmes) then the more human-readable, the better.

For tool compatibility, the spec should not IMHO bail on taking a position. Saying that "tools SHOULD tolerate different casing" (for example) is not really helpful as users still don't know with confidence what they can do. So anyone who cares about using different tools (which presumably is the point of an interchange format standard) will then read that as they MUST use the canonical casing if they want interchange.

@zvr
Member

zvr commented Feb 16, 2018

Consider

MIT AND GPL-2.0 AND GPL-1.0+ AND EPL-1.0 AND ISC
MIT and GPL-2.0 and GPL-1.0+ and EPL-1.0 and ISC

@jeffmcaffer, we are also looking at

Mit and gpl-2.0 And Gpl-1.0+ aNd ePl-1.0 aND isc

and any other combination. At least we are still using ASCII for now :)
I personally prefer having a single way of specifying these things.

I assume all this discussion is only about tag-value representation? In RDF, we will always have case-sensitive URIs:

    <spdx:licenseConcluded>
        <spdx:member rdf:resource="http://spdx.org/licenses/GPL-3.0"/>
    </spdx:licenseConcluded>

@wking
Contributor

wking commented Feb 16, 2018 via email

@wking
Contributor

wking commented Feb 16, 2018

In RDF, we will always have case-sensitive URIs…

This is a good reason to require case-sensitive IDs. With that, a tool can easily construct the URI for a given license ID. Without a case requirement, tools would need to build in a complete (for a given release?) copy of the license list if they wanted to be able to canonicalize the case to get the license URI. To mitigate that problem if we decide to allow case-insensitive IDs, I think we would want to add resources for downcased identifiers that redirect to the canonical URI.
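
A minimal sketch of the URI point, assuming the `http://spdx.org/licenses/` prefix seen in the RDF snippet earlier in the thread; the `license_uri` helper is hypothetical. With case-sensitive IDs the URI is plain concatenation, while a downcased ID yields a different URI under character-by-character comparison.

```python
LICENSE_BASE_URI = "http://spdx.org/licenses/"  # prefix taken from the RDF example above


def license_uri(license_id):
    """Build the license URI by appending the identifier unchanged."""
    return LICENSE_BASE_URI + license_id


canonical = license_uri("GPL-3.0")
downcased = license_uri("gpl-3.0")

# RDF compares URI references character by character, so these differ.
uris_equal = canonical == downcased
```

Here `uris_equal` is `False`, which is why a tool accepting `gpl-3.0` would need either a copy of the license list or server-side redirects to reach the canonical URI.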

@pombredanne
Member

I always receive complaints about this case sensitivity. From Fedora people, from others. Case does not matter at all since every ID is unique ignoring case and the keyword case does not matter either.

We should have a canonical representation of an expression (which can be specific case-wise) but mandating using a certain case for something that does not need it is just a barrier to use and adoption IMHO.

@Conan-Kudo

Conan-Kudo commented Feb 17, 2018

The current style for case-sensitivity is a major annoyance for identifying tags and expressions.

From my point of view, I want expressions to be clearly distinct from license identifiers (many of which are initialisms, so are in all capitals, or at least begin with a capital letter). Thus, my preference is that expression terms (such as and, or, with, or without, etc.) should be lowercase while license tags are either title case (if they are words) or all caps (if they are initialisms/acronyms).

@wking
Contributor

wking commented Feb 18, 2018 via email

@goneall
Member

goneall commented Feb 21, 2018

An additional data point for tools developer impact. I created a pull request for the SPDX tools to make the entire license expression parsing case insensitive. See spdx/tools#153

Bottom line from the work: completely ignoring case would be a moderate amount of work for any tool that would like to preserve the proper case for license ID's for human readable purposes or to comply with the RDF spec.

Details:

The tools already allow all uppercase and all lower case operators (e.g. and and AND are both allowed). They do not currently allow mixed case (e.g. aNd is not allowed).

It was easy to update the operators to completely ignore case.

It was a moderate amount of work to ignore case on listed license ID's. I had to maintain a map of lowercase to SPDX license ID's and translate back and forth when displaying or interpreting licenses. Not a big deal, but a couple dozen lines of code which make the code a bit more complex.

Similar to the listed licenses, local document license-ref's needed a hashmap from the lowercase to proper (or original) cased ID's. In the case of the SPDX tools, there was already a map of ID's to the extracted license objects, so it was a bit easier to update.
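
The lowercase-to-canonical mapping described above might look roughly like the following Python sketch. It is not the SPDX tools' code (see spdx/tools#153 for that); the names `build_case_map` and `canonicalize`, the sample identifiers, and the case-insensitive handling of the `LicenseRef-` prefix are all assumptions made for illustration.

```python
# Hypothetical subset of listed license IDs, for illustration only.
LISTED_IDS = ["MIT", "GPL-2.0", "RHeCos"]


def build_case_map(identifiers):
    """Map each lowercased identifier to its properly cased form."""
    return {identifier.lower(): identifier for identifier in identifiers}


def canonicalize(identifier, listed_map, local_map):
    """Return the properly cased ID, or None if it is unknown."""
    folded = identifier.lower()
    # Document-local references are looked up in their own map.
    if folded.startswith("licenseref-"):
        return local_map.get(folded)
    return listed_map.get(folded)


listed_map = build_case_map(LISTED_IDS)
local_map = build_case_map(["LicenseRef-MyLicense"])  # hypothetical local reference
```

With these maps, `canonicalize("rhecos", listed_map, local_map)` returns `RHeCos`, so the original casing can be restored for display or for building the RDF URI.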

@salicodes
Contributor

@goneall @kestewart has this issue been resolved? If no, can I work on it ?

@goneall
Member

goneall commented Mar 12, 2018

@salicodes We should wait until we have consensus on the specification before working on the solution. There is probably enough discussion to add this as an agenda topic to an upcoming SPDX technical meeting. Once resolved, we would welcome the help in the spec and also the tools.

@goneall
Member

goneall commented Jun 5, 2018

Discussed on tech call on 6/5/2018: Need to respect the case for the license ID's since they translate to URI's in RDF. There are also other use cases that may break other parsers.

Note: license identifiers must be unique ignoring case.

Spec can be strict on operator case sensitivity, but tool implementations are encouraged to allow case insensitivity.

Operators will be case sensitive in spec.

TODO: Create a pull request to update the spec. - just adding a sentence (ABNF already is case sensitive)
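
The "unique ignoring case" rule noted above is easy to check mechanically. A minimal Python sketch, assuming the identifiers are available as a plain list of strings; `case_collisions` is a hypothetical helper name, not part of any SPDX tool.

```python
from collections import defaultdict


def case_collisions(identifiers):
    """Group identifiers that differ only by case.

    Returns a dict from the lowercased form to the list of clashing
    identifiers; an empty dict means the list is unique ignoring case.
    """
    groups = defaultdict(list)
    for identifier in identifiers:
        groups[identifier.lower()].append(identifier)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

A check like this could run whenever a new identifier is proposed, so that case-insensitive matching stays safe for tools that choose to offer it.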

@goneall goneall assigned goneall and unassigned goneall Jun 5, 2018
wking added a commit to wking/license-list-XML that referenced this issue Jun 5, 2018
Encouraging case preservation while allowing for case-insensitive
comparison matches the spec position discussed in [1].  Note that
this is just the list commitment.  It *allows* the spec, tools, and
other list consumers to decide to be case-insensitive, but does not
require them to be either case-sensitive or case-insensitive.

[1]: spdx/spdx-spec#63
wking added a commit to wking/license-list-XML that referenced this issue Jun 5, 2018
Encouraging case preservation while allowing for case-insensitive
comparison matches the spec position discussed in [1].  Note that
this is just the list commitment.  It *allows* the spec, tools, and
other list consumers to decide to be case-insensitive, but does not
require them to be either case-sensitive or case-insensitive.

[1]: spdx/spdx-spec#63
wking added a commit to wking/spdx-spec that referenced this issue Jun 5, 2018
This is a large diff, but I aimed for restructuring/polishing without
changing the end result (much).  I did make a few intentional changes:

* Extended license-id to include appendix I.3 (deprecated licenses).
  We don't want folks using these in license expressions (because
  they're deprecated), but they are valid (or we would have removed
  them instead of just deprecating them).  That means that in some
  cases the nature of a string is unclear.  For example 'GPL-1.0+'
  could be the deprecated license-id, or it could be a
  simple-expression using the more-recently-deprecated GPL-1.0
  license-id and the + operator.  I don't think that's a problem
  though, because I can't think of a case where the ambiguity would
  matter.

* I've allowed + for license-ref (it used to be only for license-id).
  There could be external licenses which offer a choice between
  only-this-version and or-later grants, and allowing + for
  license-ref makes it easier to support those licenses as they
  transition into the SPDX License List.  This isn't a big deal, but
  it avoids needing separate license-refs for the only-this-version
  and or-later grants if you need both.

* I've added explicit whitespace handling, vs. the previous version
  which just discussed it in the text.  That way the ABNF is the sole
  source of normative syntax information.

* I've added a paragraph addressing casing, based on discussion in
  [1].

* I've added enclosed-license-expression, so consumers like the
  tag:value format can suggest/require it.  This allows for more
  precision in consumers (e.g. appendix V should be updated to require
  enclosed-license-expression), but I've left those other sections
  alone for this commit.  Ideally the tag:value line would be moved to
  a separate section that defined the tag-value format, but we don't
  have such a section yet [2].

* I've added Gary's documentation for spdx:OrLaterOperator [3];
  previously there was no way to represent the + operator in RDF/XML.

* I've added Gary's documentation for spdx:WithExceptionOperator [4].
  I think it's a bit odd that the XML operator representations are using
  URLs instead of the SPDX IDs that the license expression syntax
  calls for.  That means you cannot convert between the two
  representations without an ID <-> URL map.  But we can address that
  later.

* I've removed spdx:LicenseException, because we currently provide no
  other way for authors to define license exceptions.  We do define a
  way for them to define their own licenses [5], and currently authors
  have to use that to give a LicenseRef to a license+exception pair if
  their exception is not in our list.  Gary feels like we may return
  to this later (and I'd be happy giving users a way to define their
  own exceptions), but we're removing it for now [6].

* I've fleshed out the documentation for the + operator to explain how
  it works with the AGPL-1.0.  Without this explanation, I think
  there's a risk that folks misinterpret ${ROOT}-${BASE_VERSION}+ as
  "allows ${ROOT}-${VERSION} for any ${VERSION} >= ${BASE_VERSION}",
  but that's not true.  Instead the proper interpretation is "allows
  ${ROOT}-${VERSION} and any other licenses allowed by ${ROOT-VERSION}
  which are based on 'any later version' grant".  For example, if the
  AGPL-2.0 had not been released, you could distribute
  AGPL-1.0-or-later code under the GPL-3.0-or-later, but *not* under
  the AGPL-3.0-or-later.

The HTML comment avoids the ambiguous four-space indent after the
list.  Without the comment, it could be parsed as a code block (which
is what we want) [7] or a second paragraph of the final list entry [8]
(which is not what we want).  The HTML comment closes the list to
resolve the ambiguity.

[1]: spdx#63
[2]: spdx#22 (comment)
[3]: spdx#37 (comment)
[4]: spdx#37 (comment)
[5]: https://github.com/spdx/spdx-spec/blob/cfa1b9d08903befdf03e669da6472707b7b60cb9/chapters/6-other-licensing-information-detected.md#6.1
[6]: spdx#37 (comment)
[7]: https://daringfireball.net/projects/markdown/syntax#precode
     "To produce a code block in Markdown, simply indent every line of
     the block by at least 4 spaces or 1 tab"
[8]: https://daringfireball.net/projects/markdown/syntax#list
     "Each subsequent paragraph in a list item must be indented by
     either 4 spaces or one tab"
wking added a commit to wking/license-list-XML that referenced this issue Jun 18, 2018
Encouraging case preservation while allowing for case-insensitive
comparison matches the spec position discussed in [1].  Note that
this is just the list commitment.  It *allows* the spec, tools, and
other list consumers to decide to be case-insensitive, but does not
require them to be either case-sensitive or case-insensitive.

[1]: spdx/spdx-spec#63
@phadej

phadej commented Jul 13, 2018

We (Haskell's Cabal) got a pull request related to this, and I found others using https://spdx.org/spdx-license-list/matching-guidelines as the justification for parsing SPDX license expressions case-insensitively. Could the matching guidelines document be updated to clearly state that it doesn't apply to matching SPDX license expressions (or that it does, if that is so)?

@phadej

phadej commented Jul 13, 2018

Reading more carefully, the matching guidelines say

1.1 Purpose: To ensure consistent results by different SPDX document creators when matching license information that will be included in the License Information in File field. SPDX document creators or tools may match on the license or exception text itself, the official license header, or the SPDX License List short identifier.

4.1.1 Guideline: All upper case and lower case letters should be treated as lower case letters. Templates do not include markup for this guideline.


I'm slightly confused. I guess always producing identifiers as they are written in the License List, but being lenient in the parser, is the safer approach for tooling. I.e.

  • parsers of the SPDX license expressions MAY / SHOULD parse identifiers case-insensitively but
  • (pretty-)printers of the SPDX license expressions MUST produce identifiers as listed in the license list?
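
The lenient-parser, strict-printer split proposed above can be sketched as a small Python round trip. This is an illustration only, not Cabal's implementation: `CANONICAL_IDS` is a hypothetical subset of the license list, `pretty_print` is a made-up name, and parentheses, `+`, and `WITH` are left out to keep it short.

```python
# Hypothetical subset of the license list: lowercased form -> listed form.
CANONICAL_IDS = {"mit": "MIT", "isc": "ISC", "gpl-2.0": "GPL-2.0"}
OPERATORS = {"and", "or"}


def pretty_print(expression):
    """Accept any casing on input; emit listed casing on output."""
    output = []
    for token in expression.split():
        folded = token.lower()
        if folded in OPERATORS:
            output.append(folded.upper())  # operators are printed in uppercase
        elif folded in CANONICAL_IDS:
            output.append(CANONICAL_IDS[folded])  # identifier as listed
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return " ".join(output)
```

Under these assumptions, `pretty_print("Mit and gpl-2.0 aNd isc")` produces `MIT AND GPL-2.0 AND ISC`, so whatever a tool writes out stays valid for a strictly case-sensitive consumer.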

@zvr
Member

zvr commented Jul 13, 2018

@phadej I think you were correct on your first assumption that the matching guidelines should not be used when parsing license expressions or short identifiers.

@swinslow
Member

I'll submit a PR to add the sentence clarifying that it needs to be case-sensitive, per @goneall's comment at #63 (comment)

swinslow added a commit to swinslow/spdx-spec that referenced this issue Mar 24, 2020
This commit attempts to reflect the outcome of the discussion at
spdx#63 regarding whether
license expression operators and identifiers should be matched in
a case-sensitive manner.

Specifically it attempts to reflect the comment at
spdx#63 (comment)
regarding the outcome of the tech team discussion on 2018-06-05.

Signed-off-by: Steve Winslow <steve@swinslow.net>