http://www.microsoft.com/opensource/licenses.mspx redirects to irrelevant page #780

Closed
ppalaga opened this issue Mar 1, 2019 · 7 comments


ppalaga commented Mar 1, 2019

http://www.microsoft.com/opensource/licenses.mspx currently redirects to https://cloudblogs.microsoft.com/opensource/ , which contains no license texts.

https://web.archive.org/web/20150619132250id_/http://www.microsoft.com/en-us/openness/licenses.aspx is the newest relevant archive.org entry I found.

Author

ppalaga commented Mar 4, 2019

Related #433

Member

swinslow commented May 2, 2019

Hi @ppalaga, which license on the license list are you looking at for this URL?

Per @jlovejoy's note at #433 (comment) and the SPDX license list overview, we don't remove old URLs as they may still be useful for matching references to the license in old code. But we may add new or best available URLs.

Author

ppalaga commented May 2, 2019

> Hi @ppalaga, which license on the license list are you looking at for this URL?

I am not looking at any particular license. It is a problem for every license entry that refers to http://www.microsoft.com/opensource/licenses.mspx.

We try to leverage SPDX license data in license-maven-plugin [1] for creating license reports. Among other things, this includes downloading license texts from the URLs, assigning license names to those documents, and grouping URLs that deliver the same content.

Clearly, having documents in the report that actually do not contain any license text is a problem. It would be nice if the URLs known not to return valid content could be annotated accordingly.

[1] https://github.com/mojohaus/license-maven-plugin
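The grouping step mentioned above (URLs that deliver the same content) could be sketched roughly as follows. This is only an illustration in Python, not code from license-maven-plugin: the function names `normalize` and `group_urls_by_content` are hypothetical, and the input is assumed to be a mapping from URL to an already downloaded document body.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so trivial formatting differences are ignored."""
    return " ".join(text.split()).lower()


def group_urls_by_content(documents: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies are identical after normalization.

    `documents` maps URL -> downloaded body (assumed to be fetched elsewhere).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in documents.items():
        digest = hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values()]
```

A URL that redirects to an unrelated page would end up in its own group here, but the sketch cannot tell that the page is not a license text; that still needs an annotation or a content check.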

Member

goneall commented May 2, 2019

@ppalaga Can you use the license text in the license data (JSON or RDF format)?

I would be cautious about using the URL for the purpose of license text. The URLs are references and are not reviewed as closely as the license text. License URLs also have a nasty habit of going stale.

BTW - Agree we should add an annotation indicating whether they are known to be stale. I thought we had already logged an issue for that, but I couldn't find it. For the schema file, we could add an optional attribute to the crossRef element. I'm not quite sure how we would represent this in the JSON files, since the URLs are just an array of strings. Feel free to propose any solutions as a new issue.
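To make the JSON question concrete, here is one possible shape, offered purely as a discussion sketch and not as an agreed format. It assumes each entry of the `seeAlso` array may be either a plain string (the current form) or an object with hypothetical `url` and `stale` fields; the license ID and the second URL are made up for illustration.

```python
import json


def live_urls(license_entry: dict) -> list[str]:
    """Return URLs not flagged as stale; plain strings count as not flagged."""
    urls = []
    for ref in license_entry.get("seeAlso", []):
        if isinstance(ref, str):
            # Current format: a bare URL string with no annotation.
            urls.append(ref)
        elif not ref.get("stale", False):
            # Hypothetical annotated format: {"url": ..., "stale": ...}.
            urls.append(ref["url"])
    return urls


# Hypothetical entry; "Example-1.0" is not a real SPDX license ID.
entry = json.loads("""
{
  "licenseId": "Example-1.0",
  "seeAlso": [
    {"url": "http://www.microsoft.com/opensource/licenses.mspx", "stale": true},
    "https://example.org/licenses/example-1.0"
  ]
}
""")
print(live_urls(entry))
```

Mixing strings and objects keeps old entries readable by the sketch above, but existing consumers that expect only strings would break, so a separate parallel field may be the safer choice.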

Author

ppalaga commented May 3, 2019

> @ppalaga Can you use the license text in the license data (JSON or RDF format)?

I am considering it, but the mapping from a URL to an SPDX license entry is not 1:1 in all cases. Several SPDX license entries may refer to one URL. The license name, if present in the Maven metadata, may help to choose the right license, but as you may know, the license names found in Maven metadata rarely match the SPDX names.

It would be nice to have a library that would be able to decide reliably for a (URL, SPDX license ID) pair whether the document returned by the URL contains the license text of the given SPDX license. Do you happen to know if something like that exists (preferably in Java)?

> I would be cautious about using the URL for the purpose of license text. The URLs are references and are not reviewed as closely as the license text. License URLs also have a nasty habit of going stale.

Yes, you are right, but the URLs are still the most reliable part of Maven license metadata.

> BTW - Agree we should add an annotation indicating whether they are known to be stale. I thought we had already logged an issue for that, but I couldn't find it. For the schema file, we could add an optional attribute to the crossRef element. I'm not quite sure how we would represent this in the JSON files, since the URLs are just an array of strings. Feel free to propose any solutions as a new issue.

Where should I propose that? Here under license-list-XML project?

Member

goneall commented May 3, 2019

> It would be nice to have a library that would be able to decide reliably for a (URL, SPDX license ID) pair whether the document returned by the URL contains the license text of the given SPDX license. Do you happen to know if something like that exists (preferably in Java)?

Here's an attempt at doing this type of mapping: https://github.com/spdx/spdx-maven-plugin/blob/master/src/main/java/org/spdx/maven/MavenToSpdxLicenseMapper.java

Note that it doesn't really try to validate that the URL points to the license text. It just ignores any URLs that are ambiguous. The code is licensed under Apache-2.0, so feel free to use it. If you have any ideas on improving the algorithm, please post an issue or PR in the plugin repository.
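The "ignore ambiguous URLs" idea can be illustrated with a short Python sketch. This is not the code of MavenToSpdxLicenseMapper, only a guess at the general approach: it assumes license entries shaped like the SPDX JSON data, with `licenseId` and `seeAlso` fields, and the function name and the license IDs in the usage example are hypothetical.

```python
from collections import defaultdict


def unambiguous_url_map(licenses: list[dict]) -> dict[str, str]:
    """Map each URL to a license ID, dropping URLs referenced by more than one license."""
    ids_by_url: dict[str, set[str]] = defaultdict(set)
    for lic in licenses:
        for url in lic.get("seeAlso", []):
            ids_by_url[url].add(lic["licenseId"])
    # Keep only URLs that point to exactly one license ID.
    return {url: next(iter(ids)) for url, ids in ids_by_url.items() if len(ids) == 1}


# Hypothetical data: two licenses share one URL, so that URL is dropped.
licenses = [
    {"licenseId": "Example-A", "seeAlso": ["http://shared.example/licenses", "http://a.example/license"]},
    {"licenseId": "Example-B", "seeAlso": ["http://shared.example/licenses"]},
]
print(unambiguous_url_map(licenses))
```

As noted above, a mapping like this says nothing about whether the URL still returns the license text; it only avoids guessing when several licenses share a URL.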

> Where should I propose that? Here under license-list-XML project?

I would suggest posting here in the license-list-XML repo since it impacts the XML schema and is currently the only repo specifying the full set of license terms. Once the schema and JSON formats have been figured out, I can add an issue to the LicenseListPublisher to implement the changes for the JSON format.

Author

ppalaga commented May 13, 2019


3 participants