Add license text from the test files to resolve text formatting issues #83

Merged (1 commit) on Nov 14, 2020

goneall commented Nov 14, 2020

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

goneall merged commit 3f9f2ef into master on Nov 14, 2020
goneall deleted the testlictext branch on November 14, 2020 04:38
kdesysadmin pushed a commit to KDE/licensedigger that referenced this pull request on Dec 15, 2020
Due to recent changes in the SPDX license parser

spdx/LicenseListPublisher#83

the format of several canonical licenses changed. This patch
updates to the latest version from the SPDX registry:

for f in `ls *.txt`; do reuse download ${f::-4} -o new/$f; done

sschuberth commented Dec 18, 2020

BTW, I finally gave this a look, and IMO it does not fix the issue properly. Just compare e.g. https://github.com/spdx/license-list-data/blob/master/text/Apache-2.0.txt to https://www.apache.org/licenses/LICENSE-2.0.txt. For example, all the leading indentation is stripped, so the formatting is still broken compared to upstream.
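One way to make the "leading indentation is stripped" observation concrete is a line-by-line check. The following is only a minimal sketch: the function name `lost_indentation` and the two sample excerpts are hypothetical, and it assumes both copies have the same line breaks, which is not guaranteed for the real files.

```python
def lost_indentation(upstream: str, published: str) -> list[tuple[int, str]]:
    """Return (line number, upstream line) pairs whose leading whitespace
    is shorter in the published copy although the words are identical."""
    findings = []
    pairs = zip(upstream.splitlines(), published.splitlines())
    for number, (up, pub) in enumerate(pairs, start=1):
        same_words = up.strip() == pub.strip()
        up_indent = len(up) - len(up.lstrip())
        pub_indent = len(pub) - len(pub.lstrip())
        # Skip blank lines; only report lines that lost indentation.
        if same_words and up.strip() and pub_indent < up_indent:
            findings.append((number, up))
    return findings


# Hypothetical excerpts for illustration, not the real license files.
UPSTREAM = '   1. Definitions.\n\n      "License" shall mean the terms.\n'
PUBLISHED = '1. Definitions.\n\n"License" shall mean the terms.\n'

for number, line in lost_indentation(UPSTREAM, PUBLISHED):
    print(f"line {number}: indentation lost: {line!r}")
```

If the two copies also differ in line breaks, a line-aligned check like this will miss differences, and a general diff is the better tool.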

goneall commented Dec 18, 2020

I went back through the code and found 2 issues:

  1. There was still some formatting being done to word-wrap the text files.
  2. The Apache-2.0 test file is not the canonical license text. See https://raw.githubusercontent.com/spdx/license-list-XML/master/test/simpleTestForGenerator/Apache-2.0.txt

I can easily fix 1 above - I'll create a separate PR.
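To illustrate why leftover word-wrapping (issue 1) breaks the formatting, here is a minimal sketch. It does not reproduce the LicenseListPublisher code; it uses Python's standard `textwrap` module as a stand-in for any re-wrapping step, and the `rewrap` helper and the sample excerpt are hypothetical.

```python
import textwrap


def rewrap(text: str, width: int = 60) -> str:
    """Re-flow each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


# Hypothetical excerpt with upstream-style indentation and line breaks.
ORIGINAL = (
    "      Licensed under the Example License;\n"
    "      you may not use this file except\n"
    "      in compliance with the License."
)

wrapped = rewrap(ORIGINAL)

# The words survive, but the original line breaks and the indentation
# of every line after the first do not.
same_words = wrapped.split() == ORIGINAL.split()
same_layout = wrapped == ORIGINAL
```

Copying the text verbatim avoids this class of difference, since no layout decision is made by the tool at all.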

For 2, the license-list-XML repo will need to be updated with the correct text. I looked at other licenses and most of them have had the line breaks for word-wrapping removed from the original text. Fixing these would require someone (or someones) to go through and replace the test text with the canonical text - a rather large effort.

sschuberth commented

> Fixing these would require someone (or someones) to go through and replace the test text with the canonical text - a rather large effort.

I could probably help with this. But speaking of this, I've always wondered why the files in https://github.com/spdx/license-list-data/blob/master/text/ aren't simply copies of the canonical upstream texts, and why the test in https://raw.githubusercontent.com/spdx/license-list-XML/master/test/simpleTestForGenerator/ doesn't simply use those files (e.g. included as a Git submodule). It seems odd to me that currently the only place where plain copies of the canonical upstream texts are used is a repository called "license-list-XML".

goneall commented Dec 19, 2020

> I could probably help with this.

That would be great 👍

> I've always wondered why the files in https://github.com/spdx/license-list-data/blob/master/text/ aren't simply copies of the canonical upstream texts, and why the test in https://raw.githubusercontent.com/spdx/license-list-XML/master/test/simpleTestForGenerator/ doesn't simply use those files (e.g. included as a Git submodule).

The reason is there isn't a repository of upstream texts to reference. It would take quite a bit of effort to create such a repository for the hundreds of files.

In many cases, the files stored in the license-list-data test directory are copies of the upstream text.

The proposal is that we just use the license-list-data test directory files as the upstream representation.

The tools that generate the license-list-data have already been updated to just copy the text from https://raw.githubusercontent.com/spdx/license-list-XML/master/test/simpleTestForGenerator/ to the license text.

This PR just removes the word-wrapping being done against the copies. Once we merge this PR, the generated license text "should" just be an exact copy of the files in the license-list-data test directory.
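The "exact copy" expectation can be checked mechanically. The sketch below is an assumption about how one might do that, not part of the SPDX tooling: `mismatched_files` is a hypothetical helper that compares two local directories byte for byte, where `test_dir` stands for a checkout of the test files and `text_dir` for the generated text files.

```python
from pathlib import Path


def mismatched_files(test_dir: Path, text_dir: Path) -> list[str]:
    """Names of .txt files in test_dir that are missing from text_dir
    or whose bytes differ from the copy in text_dir."""
    mismatches = []
    for source in sorted(test_dir.glob("*.txt")):
        published = text_dir / source.name
        # Compare raw bytes so whitespace and line-ending changes are caught.
        if not published.is_file() or published.read_bytes() != source.read_bytes():
            mismatches.append(source.name)
    return mismatches
```

An empty result would mean every test file was copied without any reformatting; any name in the list points at a file worth diffing by hand.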

goneall commented Dec 19, 2020

I created a PR in the license-list-XML repo to recommend that plain text test files should match the text and formatting of the original license: spdx/license-list-XML#1160

Feel free to comment on any process-related suggestions in the PR.
