Formatting of plain license text in JSON data is broken #1924

Open
goneall opened this issue Jun 11, 2018 · 10 comments

@goneall
Member

goneall commented Jun 11, 2018

Moving issue from SPDX tools. Originally submitted by @sschuberth

Taking Apache-2.0 as an example: when extracting the licenseText string to a file, I'd expect that file to be formatted exactly like the original plain-text license, including leading spaces and blank lines. However, the JSON string is formatted like this:

Apache License

Version 2.0, January 2004

http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      

      "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

      

      "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

      

(note the missing leading spaces and the added trailing spaces), which not only does not match the original text but is also quite ugly.
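Until the upstream formatting is preserved, a consumer can at least remove the trailing whitespace. The following is a minimal Python sketch (the function name `tidy_extracted_text` is hypothetical, not part of any SPDX tool) that strips trailing spaces from each line of an extracted `licenseText` string and collapses runs of blank or whitespace-only lines into a single blank line. It cannot restore leading indentation that has already been lost.

```python
import re


def tidy_extracted_text(text: str) -> str:
    """Strip trailing whitespace from every line and collapse runs of
    blank (or whitespace-only) lines into a single blank line.

    Hypothetical helper: it does NOT restore lost leading indentation.
    """
    # Remove trailing spaces/tabs; leading indentation is left untouched.
    lines = [line.rstrip() for line in text.splitlines()]
    joined = "\n".join(lines)
    # Three or more newlines in a row means two or more blank lines: keep one.
    joined = re.sub(r"\n{3,}", "\n\n", joined)
    # Drop leading/trailing blank lines and end with exactly one newline.
    return joined.strip("\n") + "\n"


if __name__ == "__main__":
    extracted = '1. Definitions.\n\n      \n\n      "License" shall mean ...\n'
    print(tidy_extracted_text(extracted))
```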

@goneall
Member Author

goneall commented Jun 11, 2018

The way we are maintaining the license information in the license-list-XML github repository it is not feasible to retain the formatting of the original text since XML removes the white space and we do not have enough tags to retain all of the formatting.

That being said, we could do a better job of formatting the text and making it look prettier.

The code for this has actually moved to a different project: LicenseListPublisher

@sschuberth
Member

The way we are maintaining the license information in the license-list-XML github repository it is not feasible to retain the formatting of the original text

I believe that's exactly the problem then, and XML shouldn't be used as the primary format to store the original text. It could still be used as a format to apply SPDX-specific formatting, however.

@jeffmcaffer

+1 to maintaining the original format of the licenses. Currently, for example, these two license texts differ by newlines. https://github.com/spdx/tools/blob/master/resources/stdlicenses/MIT.jsonld
https://github.com/OpenSourceOrg/licenses/blob/master/texts/plain/MIT

While it is not a huge deal for consumers to find a word-wrapping implementation and run the text through it before, say, generating a NOTICE file, it is an extra hassle and will lead to apparent differences. It would be great to bring clarity and simplicity to licenses by using the same canonical form everywhere.
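The word-wrapping step described above might look like the following minimal Python sketch, built on the standard library's `textwrap` module. The function name `rewrap_paragraphs` and the 78-column default are assumptions for illustration. It collapses all whitespace inside each paragraph, so indentation and numbered-list layout are lost, which is part of why a single canonical form would be preferable to every consumer re-wrapping text on their own.

```python
import re
import textwrap


def rewrap_paragraphs(text: str, width: int = 78) -> str:
    """Re-wrap each paragraph to at most `width` columns.

    Paragraphs are separated by one or more blank or whitespace-only lines.
    All whitespace inside a paragraph (including indentation) is collapsed
    to single spaces before wrapping.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    return "\n\n".join(wrapped) + "\n"


if __name__ == "__main__":
    mit_excerpt = (
        "Permission is hereby granted, free of charge, to any person obtaining a copy\n"
        "of this software.\n      \n"
        'THE SOFTWARE IS PROVIDED "AS IS".\n'
    )
    print(rewrap_paragraphs(mit_excerpt, width=40))
```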

goneall self-assigned this Nov 12, 2020
@goneall
Member Author

goneall commented Nov 14, 2020

Resolved in PR spdx/LicenseListPublisher#83

@sschuberth
Member

I'm reopening this to remind myself that the issue hasn't really been fixed yet. While PR spdx/LicenseListPublisher#83 laid the foundation for getting it fixed, https://raw.githubusercontent.com/spdx/license-list-data/b8d6af45ad2fcfed61bb85a8ad068aa4a77eadf9/text/Apache-2.0.txt still does not match https://www.apache.org/licenses/LICENSE-2.0.txt formatting-wise.
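To check whether two copies of a license differ only in formatting (as with the SPDX and Apache copies of Apache-2.0 linked above), one option is to compare them both exactly and with all whitespace ignored. Below is a minimal Python sketch using the standard `difflib` module; the function name `compare_license_texts` is hypothetical, and fetching the two files (for example with `urllib.request.urlopen`) is left out.

```python
import difflib


def compare_license_texts(upstream_text: str, spdx_text: str) -> dict:
    """Compare two license texts exactly and with whitespace ignored."""
    return {
        # Character-for-character identical, including indentation and blank lines.
        "identical": upstream_text == spdx_text,
        # Same sequence of words once all runs of whitespace are ignored.
        "same_wording": upstream_text.split() == spdx_text.split(),
        # Line-level differences, showing where the layout diverges.
        "diff": list(
            difflib.unified_diff(
                upstream_text.splitlines(),
                spdx_text.splitlines(),
                fromfile="upstream",
                tofile="spdx",
                lineterm="",
            )
        ),
    }


if __name__ == "__main__":
    upstream = '   1. Definitions.\n\n      "License" shall mean ...\n'
    spdx = '1. Definitions.\n      \n"License" shall mean ...\n'
    result = compare_license_texts(upstream, spdx)
    print(result["identical"], result["same_wording"])
    print("\n".join(result["diff"]))
```

A result where `same_wording` is true but `identical` is false is exactly the "matches in wording but not formatting-wise" case described here.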

If I understand @goneall correctly, the remaining thing to do is to commit the original / upstream plain text licenses to https://github.com/spdx/license-list-XML/tree/master/test/simpleTestForGenerator and then rerun this publisher to make the correct licenses show up at https://github.com/spdx/license-list-data/tree/master/text. I'll try to write a script for that to finally resolve this long-standing issue.

@goneall
Member Author

goneall commented Apr 10, 2023

@sschuberth: Just going through the older issues. Any thoughts or progress on updating the text in the license-list-XML repo?

@sschuberth
Member

Sorry @goneall, this issue has slipped my mind. But would you agree that the mentioned approach is the way to go:

the remaining thing to do is to commit the original / upstream plain text licenses to https://github.com/spdx/license-list-XML/tree/master/test/simpleTestForGenerator and then rerun this publisher to make the correct licenses show up at https://github.com/spdx/license-list-data/tree/master/text.

@goneall
Member Author

goneall commented Apr 10, 2023

@sschuberth I agree with the above approach.

I'll move this issue over to the license-list-XML repo since this is where the work will be done.

@swinslow @jlovejoy FYI: if you disagree with updating the test text to fix the formatting in JSON, please comment on this issue and cc @sschuberth.

goneall transferred this issue from spdx/LicenseListPublisher Apr 10, 2023
@jlovejoy
Member

@sschuberth @goneall: I'm not sure I'm following the implementation details here, but I think the goal is to get to a point where the text files at https://github.com/spdx/license-list-XML/tree/main/test/simpleTestForGenerator are "formatted" to look like, or reflect, the original text file for a given license (e.g., https://www.apache.org/licenses/LICENSE-2.0.txt), or at least have some form of line length limit to avoid horizontal scrolling?

if we do that, then the formatting will show up better at https://github.com/spdx/license-list-data/tree/master/text.

Is that right-ish?

I'm all in favor of better formatting such that people can "reuse" text files. I think we need to document which text file directory is the best to use as well.

Also, keep in mind that the text files in https://github.com/spdx/license-list-XML/tree/main/test/simpleTestForGenerator are created as part of the PR when the license is accepted to the SPDX License List. We have a GSoC project that would add functionality to create this text file automatically via the online submission tool, instead of people having to create it manually. So any formatting parameters should be included in that project.
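If a line length limit is adopted for the files in `test/simpleTestForGenerator`, a check like the following could run on submitted text files, for instance as part of the submission-tool project mentioned above. This is a minimal Python sketch; the 80-column limit and the function name `lint_license_text` are assumptions, since no limit has been agreed on in this thread.

```python
def lint_license_text(text: str, limit: int = 80) -> list[tuple[int, str]]:
    """Report formatting problems as (line_number, message) pairs.

    Checks two assumed house rules (not an SPDX requirement):
    lines longer than `limit` characters, and trailing whitespace.
    """
    problems: list[tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            problems.append((number, f"{len(line)} characters exceeds limit of {limit}"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


if __name__ == "__main__":
    sample = "Apache License\n" + "x" * 85 + "\nVersion 2.0, January 2004 \n"
    for number, message in lint_license_text(sample):
        print(f"line {number}: {message}")
```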

@goneall
Member Author

goneall commented Apr 12, 2023

@jlovejoy

if we do that, then the formatting will show up better at https://github.com/spdx/license-list-data/tree/master/text.

Close: the specific issue is about the JSON files, but the JSON and text files are generated from the same source.

Sounds like you're in general agreement.
