Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce SPDX compatible shortnames #855

Open
maxhbr opened this issue Jun 20, 2017 · 6 comments
Open

Enforce SPDX compatible shortnames #855

maxhbr opened this issue Jun 20, 2017 · 6 comments
Milestone

Comments

@maxhbr
Copy link
Member

maxhbr commented Jun 20, 2017

It might be good to enforce SPDX compatible shortnames, i.e which satisfy that they are made up of the characters from the set 'a'-'z', 'A'-'Z', '0'-'9', '+', '_', '.', and '-'.

Otherwise are the generated SPDX Documents possibly not valid. The broken files then can not be parsed with the spdx-tools.

The enforcing can be done by disallowing the creation of licenses with invalid shortnames.

Additionally it would be good if we fail the SPDX generation if it would contain invalid shortnames.

@timurphy
Copy link
Contributor

This would fix #748 and help with fixing #747

@mcjaeger
Copy link
Member

Makes absolutely sense from my opinion. needs to be placed at various locations (candidate licenses, add license, license import by CSV) though, and also we need a policy how to deal with importing licenses and a non-SPDX name is found. Simple error in the import log (which is returned on the page) would be enough imho.
@MaximilianHuber is there a regex in the SPDX tools code ensuring this there? could be good to use exactly the same.

@maxhbr
Copy link
Member Author

maxhbr commented Jun 26, 2017

Maybe something like [A-z\d\.\-]{3,20}\+ but I have not yet found the correct regex under https://github.com/spdx/tools.

@timurphy
Copy link
Contributor

timurphy commented Jun 27, 2017

@MaximilianHuber https://spdx.org/spdx-specification-21-web-version#h.jxpfx0ykyb60 has the official specification, which doesn't mention length restrictions. Is the {3,20} restriction a fossology limitation? The spec lists LicenseRef-23 as a valid identifier, which would be listed as 23 in the fossology database. This is too short for the regex above. BSD-3-Clause-Attribution is also a valid identifier, which is too long (24 chars).

@timurphy
Copy link
Contributor

timurphy commented Jun 27, 2017

I think spdx-tools is using the regex defined here: https://github.com/spdx/tools/blob/02657c72bec23a0f3a2578f6f63e2f0a89235309/src/org/spdx/rdfparser/SpdxRdfConstants.java#L254

// license ID format
public static String NON_STD_LICENSE_ID_PRENUM = "LicenseRef-";
public static Pattern LICENSE_ID_PATTERN_NUMERIC = 
		Pattern.compile(NON_STD_LICENSE_ID_PRENUM+"(\\d+)$");	// Pattern for numeric only license IDs
static Pattern LICENSE_ID_PATTERN = Pattern.compile(NON_STD_LICENSE_ID_PRENUM+"([0-9a-zA-Z\\.\\-\\_]+)\\+?$");

Interestingly, I don't think the regex above meets the SPDX requirements, as it allows the + character in the middle of the license id. The spec only allows it as the last character.

EDIT: Regex was updated in this commit. I've updated the code block above with the change.

@maxhbr
Copy link
Member Author

maxhbr commented Jun 28, 2017

@timurphy You are absolutely right. I have found the restriction somewhere in the correct context and have copied it without thinking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants