
Textual licence formatting is broken #1076

Closed
uiopaubo opened this issue Jul 24, 2020 · 8 comments

@uiopaubo

While double-checking the output of the REUSE tool (https://reuse.software/), I noticed that the licence texts it downloads in order to populate a licensing manifest did not match those traditionally distributed with software. At least the GPL3 licence text differs in formatting from the more established form:

https://github.com/spdx/license-list-data/blob/master/text/GPL-3.0-or-later.txt

In this downloaded document, indentation appears to have been stripped, and the text appears to have been reformatted. Such reformatting actually introduces errors such as in the final URL of the above document:

<https://www.gnu.org/
licenses /why-not-lgpl.html>.

The URL is broken across two lines and includes an erroneous space. Neither of these defects is present in the established, canonical version.

Although this issue may seem trivial, discovering superficial differences required me to investigate whether the actual content of the downloaded document was correct. Reformatting makes any comparison with existing documents difficult. To be certain that no mistakes or tampering have occurred in the published documents, one must now spend more effort, rather than the reduced effort one would expect from using such a centralised licence repository in the first place.

May I suggest that the canonical textual forms of these licences be used instead? I have read about the use of XML forms of licences as the origin of published licence data, but this is arguably the wrong decision or one that needed to be taken with the fidelity of the textual output in mind.
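
To illustrate the comparison problem described above, here is a minimal Python sketch. The sample strings and the function names `normalize_whitespace` and `same_modulo_reflow` are hypothetical and are not part of REUSE or any SPDX tool. The sketch shows that a byte-for-byte comparison fails as soon as text is reflowed, and that even a whitespace-tolerant comparison still flags the broken URL, because the erroneous space falls inside the URL.

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


def same_modulo_reflow(canonical, downloaded):
    """Return True if the two texts differ only in line breaks and spacing."""
    return normalize_whitespace(canonical) == normalize_whitespace(downloaded)


# Illustrative stand-ins for the end of the licence text (assumed, shortened).
canonical = "please read\n<https://www.gnu.org/licenses/why-not-lgpl.html>."
reflowed = "please read <https://www.gnu.org/licenses/why-not-lgpl.html>."
broken = "please read <https://www.gnu.org/\nlicenses /why-not-lgpl.html>."

print(canonical == reflowed)                    # byte-for-byte comparison
print(same_modulo_reflow(canonical, reflowed))  # tolerant of harmless reflow
print(same_modulo_reflow(canonical, broken))    # space inside the URL remains
```

A tolerant comparison of this kind is only a workaround: it cannot prove that no word was changed, which is why shipping the canonical text unchanged keeps verification to a simple byte comparison.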

@goneall
Member

goneall commented Jul 24, 2020

Moving this issue to the License-List-XML repo.

@uiopaubo changing from XML to a different format is a rather large effort. The SPDX legal team would be the group to decide if we change format.

Note the related enhancement request spdx/license-list-data#44 which is currently being worked on.

@goneall goneall transferred this issue from spdx/license-list-data Jul 24, 2020
@goneall
Member

goneall commented Jul 24, 2020

I took a look at the source XML for the GPL license. The space and line break are caused by the <alt> matching text at:

&lt;http<optional spacing="none">s</optional>://www.gnu.org/<alt match="philosophy|licenses" name="philicenses">licenses</alt>/why-not-lgpl.html&gt;.

I can think of 2 solutions: we could remove the <alt> tag from the URL, or we could add a spacing attribute similar to what we did for the <optional> tag.

If someone is willing to volunteer to update the license XML to use the spacing attribute (e.g. <alt spacing="none" ...>), I could update the license list publisher tools to interpret the spacing attribute and generate the correct text.
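
The second option can be sketched in Python as follows. This is a toy renderer written for illustration only, not the LicenseListPublisher implementation: the `render` function, the wrapping `<p>` element, and the rule "pad a child element with spaces unless it carries spacing="none"" are all assumptions. The `spacing` attribute on `<alt>` is the proposal from this thread, not an existing feature.

```python
import xml.etree.ElementTree as ET

# The URL fragment quoted above, wrapped in a <p> element so that it parses.
CURRENT_XML = (
    '<p>&lt;http<optional spacing="none">s</optional>://www.gnu.org/'
    '<alt match="philosophy|licenses" name="philicenses">licenses</alt>'
    '/why-not-lgpl.html&gt;.</p>'
)

# Same fragment with the proposed (hypothetical) spacing attribute on <alt>.
PROPOSED_XML = CURRENT_XML.replace(
    'name="philicenses">', 'name="philicenses" spacing="none">'
)


def render(elem):
    """Flatten an element to text, padding children unless spacing="none"."""
    parts = [elem.text or ""]
    for child in elem:
        inner = render(child)
        if child.get("spacing") == "none":
            parts.append(inner)          # join directly to the neighbouring text
        else:
            parts.append(f" {inner} ")   # assumed default: surround with spaces
        parts.append(child.tail or "")
    return "".join(parts)


print(render(ET.fromstring(CURRENT_XML)))
print(render(ET.fromstring(PROPOSED_XML)))
```

Under these assumptions the first call yields the URL with stray spaces around "licenses", and the second yields the intact URL, which is the behaviour the spacing attribute is meant to provide.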

@uiopaubo
Author

I'm not advocating anyone changing the format of anything. As the referenced issue points out, the upstream licence files should be the canonical form of the licence in those formats. Otherwise, one has to verify that the content has not been modified semantically, which is far more work than just comparing data byte for byte. Indeed, I can imagine these reformatted text versions causing all sorts of problems by appearing superficially different to the widely distributed forms of these licences.

One thing I would also be worried about is the authenticity of marked up versions of licences which have not originated from the authors of those licences. Although one might argue that one has the freedom to make such content available in any form, I seem to recall that some organisations (the FSF, for example) retain strict control over the use of licence texts in order to avoid the proliferation of lookalike licences.

@swinslow
Member

> I can think of 2 solutions: we could remove the <alt> tag from the URL, or we could add a spacing attribute similar to what we did for the <optional> tag.
>
> If someone is willing to volunteer to update the license XML to use the spacing attribute (e.g. <alt spacing="none" ...>), I could update the license list publisher tools to interpret the spacing attribute and generate the correct text.

Hi @goneall, I'm happy to update the <alt> tags to include the spacing attribute. Did the latest updates to the license list publisher tools include functionality to handle this? If so, I'll make the change before 3.11 goes out; if not, I can push this issue out.

@goneall
Member

goneall commented Nov 19, 2020

@swinslow @uiopaubo The license text file issue may already be resolved by spdx/LicenseListPublisher#83

This can be confirmed by reviewing the text files in the license-list-data repo (e.g. GPL-3.0-or-later.txt).

There is still an issue with the HTML formatting, however: the end of https://spdx.org/licenses/GPL-3.0-or-later.html contains <https://www.gnu.org/ licenses/why-not-lgpl.html> (note the space between gnu.org/ and licenses/).
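
A check of this kind can be automated. The Python sketch below scans rendered licence text for angle-bracketed URLs that contain whitespace. The pattern `BROKEN_URL` and the function `find_broken_urls` are hypothetical helpers and are not part of any SPDX tool. The samples are the strings quoted in this thread.

```python
import re

# An angle-bracketed http(s) URL with whitespace somewhere before the ">".
BROKEN_URL = re.compile(r"<https?://[^<>]*\s[^<>]*>")


def find_broken_urls(text):
    """Return every angle-bracketed URL in text that contains whitespace."""
    return BROKEN_URL.findall(text)


intact = "please read <https://www.gnu.org/licenses/why-not-lgpl.html>."
text_sample = "please read <https://www.gnu.org/\nlicenses /why-not-lgpl.html>."
html_sample = "please read <https://www.gnu.org/ licenses/why-not-lgpl.html>."

for sample in (intact, text_sample, html_sample):
    print(find_broken_urls(sample))
```

The sketch assumes the input is rendered text. Raw HTML source would normally carry the angle brackets as entities and would need decoding first.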

I'm thinking we should open a separate issue to track the HTML formatting and close this one as resolved.

I have not yet implemented the spacing attribute for <alt> tags. I probably won't get to it before this release, but I can do it for the next release, updating the software sometime in December.

@swinslow
Member

@goneall, is this sufficiently resolved by #1147 and #1148, or should I keep this one open?

@goneall
Member

goneall commented Nov 23, 2020

@swinslow I think we should close this issue since there are now more specific pull requests and issues logged to track the solution implementation.

@swinslow
Member

Thanks @goneall!
