-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src/GPL-2.0-with-*: Remove internal byte-order-mark #411
Conversation
780bc54
to
3326482
Compare
Generated with: $ sed -i 's/\xef\xbb\xbf//' $(git ls-tree -r --name-only HEAD) because we don't a BOM for UTF-8 [1]. And even if we did need a BOM, it would be at the beginning of the file, and not in the middle after a <p> tag. [1]: http://unicode.org/faq/utf_bom.html#bom5
As a policy, we won't change deprecated licenses after the point at which it was deprecated. Can you please close this? |
On Thu, Sep 14, 2017 at 10:42:02AM -0700, Jilayne Lovejoy wrote:
As a policy, we won't change deprecated licenses after the point at
which it was deprecated…
That seems like a strange policy. For example, see “This one may
require a bit more review since I did quite a bit of manual
formatting” for the GPL-3.0-with-autoconf-exception [1]. It's not
clear to me where “we're translating this license definition to our
new XML format” (which we all agree we want) ends and “we're changing
a deprecated license” (which we may not want) begins. As it stands,
we have an internal BOM which the Unicode spec cautions against [2]:
Carelessly appending files together, for example, can result in a
signature code point in the middle of text. Unfortunately, U+FEFF
also has significance as a character. As a zero width no-break
space, it indicates that line breaks are not allowed between the
adjoining characters. Thus U+FEFF affects the interpretation of
text and cannot be freely deleted. The overloading of semantics for
this code point has caused problems for programs and protocols. The
new character U+2060 word joiner has the same semantics in all cases
as U+FEFF, except that it cannot be used as a signature.
Implementers are strongly encouraged to use word joiner in those
circumstances whenever word joining semantics are intended.
I'd rather have us follow the Unicode spec's strong encouragement and
either drop the BOM (as here) or replace it with U+2060.
Can you please close this?
While I'm still in favor of this change (as I explain above), I'm not
a maintainer. Any maintainer with write access to this repo is free
to close this PR if they aren't buying my argument ;).
[1]: #344 (comment)
[2]: http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf#page=23
|
The concern is just about breaking literal matches for things we've marked
deprecated, but I think we could have used your expertise on the legal call
just now.
That said, I confess I don't fully understand. Will these files not be UTF8
compliant if we don't do this? cc: @goneall
|
On Thu, Sep 14, 2017 at 11:10:37AM -0700, Bradlee H. Edmndson wrote:
The concern is just about breaking literal matches for things we've marked
deprecated…
I don't think this will break complaint matchers, which are supposed
to [1] ignore whitespace [2] (which is what the Unicode spec says a
non-leading BOM counts as).
That said, I confess I don't fully understand. Will these files not
be UTF8 compliant if we don't do this?
They're complaint (the Unicode spec says you are “strongly encouraged”
to do something other than use an in-file BOM, they don't say that you
*have to* do something else.
[1]: https://spdx.org/spdx-specification-21-web-version#h.2mjng0vqrghe
[2]: https://spdx.org/spdx-license-list/matching-guidelines
Whitespace rules in §3.1.1
|
Agree with @wking comments on the BOM - It shouldn't affect the matching and is probably a safe merge. On the legal call, we just didn't spend much time on this PR since it was a deprecated license. @wking if you feel strongly that this should be accepted, we can raise it again to the legal group with the additional technical information / background. If you don't feel strongly, we can just close the PR. |
On Thu, Sep 14, 2017 at 06:33:03PM +0000, goneall wrote:
@wking if you feel strongly that this should be accepted, we can
raise it again to the legal group with the additional technical
information / background.
I can do that. But does the legal team really care? I think the
legal team established a “we want to make sure deprecated license
instances continue to be matched” criterion. After that, I think it's
up to the tech team to maintain clean backing code that implements the
legal team's requirements. Which would make merging this PR (or not)
a tech-team decision, based on how convinced we are that:
a) It cleans up the current backing code, and
b) It does not effect matching.
I'm pretty convinced on both counts, and think the Unicode spec I
quote above backs up (a). I'm happy to file test cases (in
spdx/tools? With other matchers?) if folks need something more
concrete for (b).
|
as per @goneall comment that we didn't spend much time on this on the call and I don't think it's worth spending much time on it beyond that - shall I just merge this pull request? |
On Thu, Sep 14, 2017 at 01:03:50PM -0700, Jilayne Lovejoy wrote:
as per @goneall comment that we didn't spend much time on this on
the call and I don't think it's worth spending much time on it
beyond that…
Regardless of whether it matters for this particular pull request, I
think it would be useful to have a corpus of examples and a test suite
to confirm continued matching [1] as the spec, license source format,
and tooling evolve. Do we already do this somewhere? I haven't been
able to turn it up in spdx/tools.
[1]: #411 (comment)
|
I believe @goneall does this for releases of the license list and has done this as we touch all this XML (offline IIRC), but AFAIK it's not in any repo yet. Others have asked this as too, and I've called for this in the form of continuous integration, but I don't believe it's automated yet. But that could be a good target after completing the XML conversion and adding the new licenses for the next release. |
On Thu, Sep 14, 2017 at 10:28:21PM +0000, Bradlee H. Edmndson wrote:
But that could be a good target after completing the XML conversion
and adding the new licenses for the next release.
This order helps us get through XML conversion faster, but means we
don't get refactoring protection for the conversion. Although perhaps
@goneall's offline checks are sufficient protection for that.
|
The code that generates the website and license data for the license data output is located here: https://github.com/spdx/tools/blob/master/src/org/spdx/tools/LicenseRDFAGenerator.java Feel free to suggest other checks. |
Generated with:
because we don't a BOM for UTF-8. And even if we did need a BOM, it would be at the beginning of the file, and not in the middle after a
<p>
tag.