License details should include a link to the plain text, if known #78
Comments
@sschuberth Your solution raises another possible approach to fix the problem.

Approach 1: Currently, when a license is added to the license-list-XML repository, a test file containing the text must also be added. Most of the time, the text file is a copy/paste of the original text. We could update the tool to copy the license text verbatim, including the linefeeds and spaces. You can review what these test files look like at https://github.com/spdx/license-list-XML/tree/master/test/simpleTestForGenerator

Approach 2: There is also an HTML format of the license text used to generate the web page. We could store this in the JSON files, which would include the HTML tags for paragraphs etc. You can review what this would look like at https://github.com/spdx/license-list-data/tree/master/html

Adding the link to the schema and JSON is reasonably straightforward, but the legal team would need to add the data and maintain this information. Going back and doing this for all of the licenses would be a very time-consuming process, and we would need volunteers to do the work. This would need to be discussed on the legal call. @jlovejoy let me know if you have any additional thoughts.
Add example file to convert tv to rdf Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
In any case, I believe the directory at https://github.com/spdx/license-list-data/tree/master/text should contain the original text files from upstream, if upstream has a plain text version. Likewise, other formats such as HTML should be taken as-is from upstream, and only formats that do not exist upstream should be generated to fill the "gaps".
Note that @tjasmith is working on a project to identify any of the seeAlso URLs that have matching license text. This project may provide a partial solution that would not require manually reviewing and updating all of the licenses.
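To illustrate what "matching license text" could mean in the simplest case, here is a minimal sketch. It is not the method used by the project mentioned above; it only assumes that two texts count as matching when they are equal after collapsing whitespace and ignoring case. The function names `normalize` and `matches_license_text` are hypothetical, and fetching the content behind a seeAlso URL is left out.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text.

    Assumption: formatting differences (line breaks, indentation, case)
    should not prevent two license texts from matching.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def matches_license_text(candidate: str, license_text: str) -> bool:
    """Return True if the candidate text equals the license text after normalization."""
    return normalize(candidate) == normalize(license_text)
```

A real comparison would likely need to be more tolerant, for example regarding copyright lines or optional text, so this should be read as a starting point only.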
Transferring this issue to the LicenseListPublisher where it would most likely get fixed.
PR #83 implements approach 1 above.
I'm reopening this to remind myself that the issue hasn't really been fixed yet. While PR #83 laid the foundation for getting issue spdx/license-list-XML#1924 fixed, this specific issue is about tracking the original URL to the original plain text licenses as part of license metadata. That is, at the example of
I'm closing this as a duplicate of spdx/license-list-XML#1924.
Unfortunately, the formatting in the `licenseText` JSON value is broken for a lot of licenses (e.g. regarding indentation and paragraphs). For any processing, the plain text version of the license text, if provided by upstream, should be the source of truth. To capture that, I propose to include a link to the plain text version, if any, in the license details. Preferably this would go into a new `plainTextUrl` field, but better than nothing would be a convention that the first link in `seeAlso` refers to the plain text version, if any.

Also, if a link to an upstream plain text version exists, that plain text should be used as-is for the `licenseText` field, instead of creating its value by stripping formatting from some rich text version of the license text, as seems to be done now.
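As a rough sketch of how a consumer might apply this proposal, the lookup below prefers the proposed `plainTextUrl` field and falls back to the proposed convention that the first `seeAlso` entry is the plain text version. Both the field and the convention are only proposals from this issue, not part of the published license details; the function name `plain_text_url`, the license ID, and the URLs in the example data are hypothetical.

```python
from typing import Optional


def plain_text_url(details: dict) -> Optional[str]:
    """Return the URL of the upstream plain text license, if one can be determined."""
    # `plainTextUrl` is the field proposed in this issue (an assumption here,
    # not an existing field in the published JSON).
    url = details.get("plainTextUrl")
    if url:
        return url
    # Fallback: the proposed convention that the first seeAlso link
    # refers to the plain text version.
    see_also = details.get("seeAlso") or []
    return see_also[0] if see_also else None


# Hypothetical license details, for illustration only.
example = {
    "licenseId": "Example-1.0",
    "seeAlso": [
        "https://example.org/licenses/example-1.0.txt",
        "https://example.org/licenses/example-1.0.html",
    ],
}

print(plain_text_url(example))
```

Note that the fallback is only as reliable as the convention itself: without a dedicated field, a consumer cannot tell whether the first `seeAlso` link really points to plain text.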