Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the use of newlines in license expressions in Appendix IV #44

Closed
goneall opened this issue Oct 12, 2017 · 10 comments · Fixed by #224
Closed

Clarify the use of newlines in license expressions in Appendix IV #44

goneall opened this issue Oct 12, 2017 · 10 comments · Fixed by #224
Assignees
Milestone

Comments

@goneall
Copy link
Member

goneall commented Oct 12, 2017

The current license expression syntax states that whitespace must be used between elements of a compound expression. However, it does not explicitly state if a newline, CR, or LF is included in the definition of whitespace.

Suggest adding a definition of white space which include new line.

@goneall goneall changed the title Clarify the use of newlines in license expressions in Appendix V Clarify the use of newlines in license expressions in Appendix IV Oct 12, 2017
@goneall
Copy link
Member Author

goneall commented Oct 12, 2017

Proposal from David W. on legal team email list: For the Tag:value format, any license expression that consists of more than one license identifier and/or LicenseRef, should be encapsulated by parentheses: "( )".
I don't think this is a change in meaning, since it's clear that a "license expression" is a "license-expression", but it might help the reader go back and look at the actual syntax rules.

@wking
Copy link
Contributor

wking commented Oct 12, 2017

There's some explicit whitespace handling in #37, although it doesn't currently allow for CR or LF. If the approach looks promising, I can make them legal. I'd rather restrict the legal newlines (more in #45), but I'm happy to update #37 to reflect whatever the consensus opinion on legal whitespace is.

@goneall goneall self-assigned this Oct 13, 2017
@goneall
Copy link
Member Author

goneall commented Oct 13, 2017

@wking Thanks for #37 Updating it to reflect this issue and #45 seems to make sense.
There is enough changes in #37 I think we should have a broader review before accepting the changes into the spec. @kestewart - let me know if you agree. Perhaps we could schedule a walk-through on one of the tech calls?

@wking
Copy link
Contributor

wking commented Oct 13, 2017 via email

@goneall
Copy link
Member Author

goneall commented Oct 13, 2017

To stay asynchronous, maybe I can refactor #37 into a series of smaller pivots.

I like that idea - it would make the decision making easier.

Until then, maybe folks can provide feedback on the other intentional changes listed in the PR topic message?

Would be a good topic to raise on the email distribution list (the issues are not as widely subscribed to). I'm still inclined to make it a topic for the Tuesday meeting, but perhaps we just discuss it at a high level to align on a direction then finish the details in the PR and issues list.

@kestewart
Copy link
Contributor

Steve Winslow volunteers to investigate and see if we can get a PR together or recommend moving to 3.0

@swinslow
Copy link
Member

Attempting to summarize current state and next-step options for issues #44 (this one), as well as #45, #52 and #80, since these are interrelated. Apologies on the length.

I may be wrong on any of the technical details so folks should feel free to weigh in, and I'll correct this as needed!

Problem:

  • complexities in parsing are introduced by enabling license expressions with multiple elements (e.g. AND, OR, WITH) to span multiple lines:
    • for tag-value files, if license expressions are permitted to span multiple lines, parsing is non-trivial to determine where the license expression ends
    • for identifiers in source files, if license expressions are permitted to span multiple lines, it defeats the efficiencies of being able to grep for SPDX-License-Identifier and gather all such information.
  • currently state under the spec is:
    • for tag-value files, the spec (requires? encourages?) that ALL expressions surround multiple elements with parentheses
      • see app. IV: "For the tag:value format, any license expression that consists of more than one license identifier and/or LicenseRef, should be encapsulated by parentheses: "( )". This has been specified to facilitate expression parsing."
      • "should" rather than "must" => is this actually a current requirement, or just encouraged?
    • for identifiers in source files, the spec requires that ALL expressions surround multiple elements with parentheses
      • see app. V: "A set of licenses must be enclosed in parentheses (this is a convention for SPDX expressions)."
      • "must" rather than "should" => compare to above
  • anecdotally, many tools and use of short-form IDs in the wild don't comply with this requirement for parentheses

Potential solutions:

  • postpone a decision: kick it down the road to 3.0 without further discussion now
  • prohibit multi-line license expressions: require that all license expressions may only be on a single line. This would make surrounding parentheses permissible but not mandatory.
    • For tag-value files, this would be a breaking change (e.g. v2.1 docs with multi-line expressions would become invalid) so probably would need to be kicked to 3.0.
    • For source code identifiers, could arguably implement now.
  • permit multi-line expressions, but don't require parentheses for single-line expressions: permit license expressions to go across multiple lines, but only make parentheses mandatory if they are multi-line. Single-line expressions do not require parentheses.
    • I think this is not a breaking change from v2.1, because it is not invalidating anything that is currently permitted under v2.1.
  • permit multi-line expressions, using whitespace unfolding: permit license expressions to go across multiple lines, but using the "whitespace unfolding" process that was documented at Tag-value format definition (including continuation lines) #80
    • I'm not certain whether or not this would be a breaking change from v2.1...

These could be separate outcomes for tag-value vs. source code identifiers.

@swinslow
Copy link
Member

Posting as a separate comment, to share my personal thoughts now :)

My preference would be:

  • for tag-value files: permit multi-line expressions, but don't require parentheses for single-line expressions
  • for source code identifiers: prohibit multi-line license expressions

For tag-value files, I think this keeps maximum backwards compatibility while fitting with what I'm anecdotally seeing happening in practice. Also (as best I can tell) minimizing the tooling changes required.

For source code identifiers, I think any reason to permit multi-line expressions is outweighed by the negative impact it would have on usability / greppability of identifiers.

@goneall
Copy link
Member Author

goneall commented Mar 24, 2020

@swinslow Thanks for writing this up and summarizing the issues.

FWIW - I checked the current implementation of the tag/value parser for the SPDX tools and parenthesis are not required for multiple terms. It does not handle multiple lines for license expressions in the tag/value file and no bug has been filed. This tells me it is not very common to span lines.

I'm OK prohibiting multi-line expressions in the source code identifiers, and even OK prohibiting multiline in tag/value.

@swinslow
Copy link
Member

Thanks @goneall! Good to hear -- if the Java SPDX tools aren't handling multiple lines at all, then agreed, it's doubtful this is really being followed in a significant way.

I'll prep a PR to prohibit multiline in both, and to explain that parentheses are permitted but optional / not required. Unless anyone objects, we can then merge it and be done with this :)

swinslow added a commit to swinslow/spdx-spec that referenced this issue Mar 31, 2020
This changes appendices IV and V to prohibit multiple lines (e.g.,
line breaks) in a single license expression. This is applicable
for both tag-value SPDX documents as well as short-form license
identifier expressions in source code.

Because multiple lines are no longer permitted, this also makes
the outer set of parentheses optional, rather than mandatory, for
license expressions with more than one license identifier.

Fixes spdx#44
Fixes spdx#45
Fixes spdx#52

Signed-off-by: Steve Winslow <steve@swinslow.net>
kestewart pushed a commit that referenced this issue Apr 7, 2020
This changes appendices IV and V to prohibit multiple lines (e.g.,
line breaks) in a single license expression. This is applicable
for both tag-value SPDX documents as well as short-form license
identifier expressions in source code.

Because multiple lines are no longer permitted, this also makes
the outer set of parentheses optional, rather than mandatory, for
license expressions with more than one license identifier.

Fixes #44
Fixes #45
Fixes #52

Signed-off-by: Steve Winslow <steve@swinslow.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants