Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPL-3.0-or-later template has extra spaces #1146

Closed
alandtse opened this issue Nov 22, 2020 · 12 comments
Closed

GPL-3.0-or-later template has extra spaces #1146

alandtse opened this issue Nov 22, 2020 · 12 comments
Labels
XML markup change potential change or addition to XML markup in license
Milestone

Comments

@alandtse
Copy link
Collaborator

alandtse commented Nov 22, 2020

This appears to be an issue converting the xml to the template text.

The template for GPL-3.0-or-later includes extra spaces around the alternatives (philosophy/licenses) for the url:
https://www.gnu.org/licenses/why-not-lgpl.html.

The source file looks fine:
https://github.com/spdx/license-list-XML/blob/master/src/GPL-3.0-or-later.xml#L838

But the generated template has the extra spaces:
https://github.com/spdx/license-list-data/blob/master/template/GPL-3.0-or-later.template.txt#L251

You can also see it on the main html page.
https://spdx.org/licenses/GPL-3.0-or-later.html

@goneall
Copy link
Member

goneall commented Nov 22, 2020

For optional, there is a spacing attribute. Adding spacing="none" attribute value to the <optional> element (e.g. http<optional spacing="none">s</optional>) removes the spaces.

Unfortunately, this isn't currently implemented for the variable/alt text.

@goneall
Copy link
Member

goneall commented Nov 22, 2020

Issue #1076 is also requesting this feature.

@alandtse
Copy link
Collaborator Author

I'm not sure what the attribute does for downstream users of the generated template. They don't see that tag at all so they can't manipulate it. It also makes the template incorrect as it won't match the canonical license. The issue is whatever is parsing the xml file and adding in spaces to the generated files.

@goneall
Copy link
Member

goneall commented Nov 23, 2020

I'm not sure what the attribute does for downstream users of the generated template.

It is a hint to the licenseListPublisher which generates the template. The downstream will have the correct spacing based on the hints.

The problem is that when an element is added in the XML, you can not reliably determine what the spacing should be. I fear removing the space for all occurrences would re-introduce a previous formatting bug.

For reference, here is the code that interprets the XML and generates the template: https://github.com/spdx/LicenseListPublisher/blob/23ff8ef1a8bf5aff20944b2e2571c28f44056f14/src/org/spdx/licensexml/LicenseXmlHelper.java#L502

There is a BUNCH of code in there to handle spaces and newlines which was developed somewhat through trial and error. If you have any suggestions on a better approach, please add suggestions to the LicenseListPublisher issues list.

I'll add an issue soon to the LicenseListPublisher to track resolving the spacing issue described here.

@goneall
Copy link
Member

goneall commented Nov 23, 2020

Added issue spdx/LicenseListPublisher#87 to track the tool changes.

@alandtse
Copy link
Collaborator Author

Given we now have the Python license matcher, is there a unit test to make sure that all templates will actually match their text? This is basically how I found the GPL-3.0 issue, but it feels like it should be part of the CI.

@goneall
Copy link
Member

goneall commented Nov 23, 2020

Given we now have the Python license matcher, is there a unit test to make sure that all templates will actually match their text? This is basically how I found the GPL-3.0 issue, but it feels like it should be part of the CI.

It is already part of the CI - The LicenseListPublisher does a number of checks including if there are any duplicate licenses in the licenseXML and verify the license text matches test text submitted.

For reference, the code which does the checking can be found here: https://github.com/spdx/LicenseListPublisher/blob/23ff8ef1a8bf5aff20944b2e2571c28f44056f14/src/org/spdx/licenselistpublisher/LicenseRDFAGenerator.java#L503

@goneall
Copy link
Member

goneall commented Nov 23, 2020

Note that there are some implementation differences between the Python and Java validation. A difference may indicate an issue in one (or both) of the libraries.

@alandtse
Copy link
Collaborator Author

I'm surprised that this wasn't caught earlier if we are doing the test. Even if there are different implementations, the space in a url would stop a match which is how I found it.

@swinslow
Copy link
Member

Thanks @goneall, I've merged #1147 and #1148. Between that and the open issue at spdx/LicenseListPublisher#87, is it okay to close this one or is there anything here for which we should keep this issue open?

@goneall
Copy link
Member

goneall commented Nov 23, 2020

I'm surprised that this wasn't caught earlier if we are doing the test. Even if there are different implementations, the space in a url would stop a match which is how I found it.

Me too. I suspect this may be an issue with the Java license matching code. It needs more investigation to confirm, but I added an issue to track.

@swinslow I think we can now close this one as we have more specific issues opened to track the identified problems.

@swinslow
Copy link
Member

Thanks @alandtse and @goneall!

@swinslow swinslow added this to the 3.11 release milestone Nov 23, 2020
@swinslow swinslow added the XML markup change potential change or addition to XML markup in license label Nov 24, 2020
goneall added a commit that referenced this issue Apr 14, 2022
This addresses a concern raised in issue #1146

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
XML markup change potential change or addition to XML markup in license
Projects
None yet
Development

No branches or pull requests

3 participants