SPDX identifiers for licenses? #251

Open
stain opened this issue Nov 3, 2020 · 3 comments

Comments

@stain

stain commented Nov 3, 2020

This thread aims to gather existing best practice (or to establish one if none exists yet) and to hear other views.

license property vs SPDX identifier

https://schema.org/license takes a CreativeWork or a URL as its value, and is useful on all kinds of https://schema.org/CreativeWork beyond documents, e.g. https://schema.org/SoftwareSourceCode and https://schema.org/ImageObject

It is now common best practice in open source software to use SPDX IDs (https://spdx.dev/ids/) to identify the license of source code; you may have come across code comments like:

# SPDX-License-Identifier: GPL-2.0-or-later
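As a rough illustration of how tooling can pick such a tag up, here is a minimal Python sketch that scans a file's text for the first `SPDX-License-Identifier:` line. The function name `read_spdx_identifier` is hypothetical; the regex assumes the tag sits on a single line (after whatever comment prefix, such as `#`, `//` or `/*`) and it does not validate the expression itself.

```python
import re
from typing import Optional

# Matches "SPDX-License-Identifier: <expression>" anywhere on a line.
# The expression runs to the end of the line; a trailing "*/" is dropped.
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/)?\s*$")


def read_spdx_identifier(source: str) -> Optional[str]:
    """Return the SPDX expression from the first tagged line, or None."""
    for line in source.splitlines():
        match = _SPDX_TAG.search(line)
        if match:
            return match.group("expr")
    return None


example = "#!/usr/bin/env python\n# SPDX-License-Identifier: GPL-2.0-or-later\nprint('hi')\n"
print(read_spdx_identifier(example))  # GPL-2.0-or-later
```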

But http://schema.org/license expects a URL or a CreativeWork, so which should we use? Can we classify licenses with SPDX identifiers even when the link points to a specialized license file (with its own copyright notice)? And how do we deal with dual licensing?

SPDX intro

https://spdx.org/licenses/ lists known open source licenses. These identifiers are great because they avoid confusion such as "What do you mean by 'BSD license': 2-clause, 3-clause or 4-clause?"; the unambiguous BSD-3-Clause can be looked up at https://spdx.org/licenses/BSD-3-Clause

SPDX expresses its known licenses as RDF, like this (simplified):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix spdx: <http://spdx.org/rdf/terms#> .

<http://spdx.org/licenses/GPL-2.0-or-later>
        a                             spdx:License ;
        rdfs:comment                  "This license was released: June 1991. This license identifier refers to the choice to use code under GPL-2.0-or-later (i.e., GPL-2.0 or some later version), as distinguished from use of code under GPL-2.0-only. The license notice (as seen in the Standard License Header field below) states which of these applies the code in the file. The example in the exhibit to the license shows the license notice for the \"or later\" approach." ;
        rdfs:seeAlso                  "https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html" , "https://opensource.org/licenses/GPL-2.0" ;
        spdx:isFsfLibre               "true" ;
        spdx:isOsiApproved            "true" ;
        spdx:licenseId                "GPL-2.0-or-later" ;
        spdx:name                     "GNU General Public License v2.0 or later" .

(This RDF seems to exist only on GitHub; the HTML pages do embed some microdata, but it gets the subject wrong.)

Using SPDX URIs as @id

The simple approach, shown in schemaorg/schemaorg#1928, is to use these URIs (such as http://spdx.org/licenses/GPL-2.0-or-later) directly. @njh, in https://www.arduinolibraries.info/libraries/arduino-json.json, has opted for the https variant instead of http:

{
  "@context": "http://schema.org/",
  "@type": "SoftwareApplication",
  "name": "ArduinoJson",
  "url": "https://arduinojson.org/?utm_source=meta&utm_medium=library.properties",
  "author": {
    "@type": "Person",
    "name": "Benoit Blanchon"
  },
  "license": "https://spdx.org/licenses/MIT"
}

Many URIs

Many of the licenses also have their own URIs, on top of the usual http vs https variations, so there is plenty of room for inconsistency.

For listing and mapping these, https://opendefinition.org/licenses/api/ has a nice list, but it is in a custom JSON format.

Challenges

The SPDX website is inconsistent with its own RDF: https://spdx.org/licenses/ links to https://spdx.org/licenses/MIT.html (note the https and the .html), so I expect many will end up with these alternative URIs. That said, the variant @njh uses is the one I have seen most often; for example, we refer to it from https://www.commonwl.org/user_guide/17-metadata/index.html
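Since the same license can turn up as http or https, with or without `.html`, a consumer will probably want to normalize before comparing. Below is a minimal Python sketch; it assumes the @njh-style `https://spdx.org/licenses/<id>` (no extension) as the canonical form, which is only one of the options discussed here, and it recognises spdx.org URLs only, not the licenses' own URIs (e.g. on opensource.org). The function names are hypothetical.

```python
import re
from typing import Optional

# A spdx.org license URL in any of the variants seen in the wild: http or https,
# with or without a file extension such as .html or .json, optional trailing slash.
_SPDX_URL = re.compile(
    r"^https?://spdx\.org/licenses/(?P<id>[A-Za-z0-9.+-]+?)"
    r"(?:\.html|\.json|\.jsonld|\.rdf|\.ttl)?/?$"
)


def spdx_id_from_url(url: str) -> Optional[str]:
    """Extract the SPDX short identifier from a spdx.org URL, or None."""
    match = _SPDX_URL.match(url.strip())
    return match.group("id") if match else None


def canonical_spdx_url(url: str, scheme: str = "https") -> Optional[str]:
    """Rewrite any spdx.org license URL variant to one canonical form."""
    license_id = spdx_id_from_url(url)
    if license_id is None:
        return None  # not a spdx.org license URL; leave it to the caller
    return f"{scheme}://spdx.org/licenses/{license_id}"


for variant in [
    "http://spdx.org/licenses/MIT",
    "https://spdx.org/licenses/MIT.html",
    "https://spdx.org/licenses/MIT",
]:
    print(canonical_spdx_url(variant))  # https://spdx.org/licenses/MIT each time
```

Passing `scheme="http"` instead would match the subjects used in SPDX's own RDF.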

SPDX identifiers are also not limited to a single license: SPDX license expressions can cover dual licensing, like MIT OR Apache-2.0, as well as exceptions (e.g. GPL-2.0-or-later WITH Classpath-exception-2.0). Some licenses, like https://spdx.org/licenses/BSD-3-Clause, are templates requiring a copyright year and copyright holder, so the actual license URL would be a specialized file, say https://github.com/seek4science/seek/blob/master/BSD-LICENSE, which would then not be immediately recognizable as the BSD 3-Clause license.
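To make the expression side concrete: an SPDX license expression combines identifiers with `OR`, `AND` and `WITH` (for exceptions), with `WITH` binding tightest, then `AND`, then `OR`, plus parentheses for grouping. The Python sketch below is a small recursive-descent parser for that subset. It is not the official SPDX tooling: it accepts only upper-case operators, is lenient about what `WITH` may follow, and does not check identifiers against the license list. The names `parse_spdx` and `license_ids` are hypothetical.

```python
import re
from typing import List, Optional, Union

# A parsed expression is either a license id (str) or a tuple
# (operator, left, right) where operator is "OR", "AND" or "WITH".
Node = Union[str, tuple]

_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.:+-]+)")
_RESERVED = {"AND", "OR", "WITH", "(", ")"}


def tokenize(expression: str) -> List[str]:
    text = expression.strip()
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.i += 1
        return token

    def parse_or(self) -> Node:  # lowest precedence
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self) -> Node:
        node = self.parse_with()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.parse_with())
        return node

    def parse_with(self) -> Node:  # highest precedence operator
        node = self.parse_atom()
        if self.peek() == "WITH":
            self.take()
            exception = self.take()
            if exception in _RESERVED:
                raise ValueError(f"expected an exception id, got {exception!r}")
            node = ("WITH", node, exception)
        return node

    def parse_atom(self) -> Node:
        token = self.take()
        if token == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token in _RESERVED:
            raise ValueError(f"unexpected token {token!r}")
        return token  # a license id such as "MIT" or "AGPL-3.0+"


def parse_spdx(expression: str) -> Node:
    parser = _Parser(tokenize(expression))
    node = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"trailing token {parser.peek()!r}")
    return node


def license_ids(node: Node) -> List[str]:
    """All license ids in a parsed expression (exceptions are skipped)."""
    if isinstance(node, str):
        return [node]
    operator, left, right = node
    if operator == "WITH":
        return license_ids(left)
    return license_ids(left) + license_ids(right)


print(parse_spdx("MIT OR Apache-2.0 AND BSD-3-Clause"))
# ('OR', 'MIT', ('AND', 'Apache-2.0', 'BSD-3-Clause'))
print(license_ids(parse_spdx("GPL-2.0-or-later WITH Classpath-exception-2.0 OR MIT")))
# ['GPL-2.0-or-later', 'MIT']
```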

Using identifier from CreativeWork

One way around this could be to use http://schema.org/identifier on an anonymous or local CreativeWork license resource. Setting the SPDX expression directly as the identifier is the easiest option, but it leaves a lot implicit:

{ "@id": "workflow.cwl",
  "@type": "SoftwareSourceCode",
  "license": {
      "@id": "https://creativecommons.org/licenses/by/4.0/",
      "@type": "CreativeWork",
      "name": "CC BY 4.0",
      "description": "Creative Commons Attribution 4.0 International License",
      "identifier": "CC-BY-4.0"
    }
}

Using PropertyValue to capture SPDX expressions

To be more explicit, we can use a http://schema.org/PropertyValue as the identifier and capture the full SPDX expression, even when there is no license file at all, or when the license file is a local specialization:

{ "@id": "dual-licensed.py",
  "@type": "SoftwareSourceCode",
  "license": {
      "@type": "CreativeWork",
      "name": "MIT or AGPL 3.0 (or later)",
      "description": "Dual-licensed as MIT or AGPL 3.0",
      "isBasedOn": [
        "https://spdx.org/licenses/MIT",
        "https://spdx.org/licenses/AGPL-3.0-or-later"
      ],
      "identifier": {
          "@type": "PropertyValue",
          "name": "SPDX-License-Identifier",
          "value": "MIT OR AGPL-3.0+",
          "propertyID": "https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/"
       }
    }
 }

Here the SPDX expression MIT OR AGPL-3.0+ is captured. I threw in http://schema.org/isBasedOn for good measure, although it duplicates the SPDX license expression without its flexibility or rigidity: a plain list of licenses cannot say whether they combine with OR or AND.

Here I used https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/ as the https://schema.org/propertyID because it explains the SPDX expressions well, and as the name I used SPDX-License-Identifier rather than just SPDX, to match what SPDX recommends for code comments. (I'm not sure whether propertyID here should instead be {"@id": "https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/"}.)

This is much more precise, but unfortunately it becomes rather nested and repetitive when applied to the base case of simply using https://spdx.org/licenses/MIT-style URIs directly:

{
  "@context": "http://schema.org/",
  "@type": "SoftwareApplication",
  "name": "ArduinoJson",
  "license": {
      "@id": "https://spdx.org/licenses/MIT",
      "@type": "CreativeWork",
      "name": "MIT",
      "identifier": {
          "@type": "PropertyValue",
          "name": "SPDX-License-Identifier",
          "value": "MIT",
          "propertyID": "https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/"
       }
    }
}
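On the consuming side, a harvester faced with all of these variants needs to get back to one SPDX expression. The Python sketch below assumes the license value is already a plain JSON-LD dict or string (no context expansion) and accepts the bare URL form, a CreativeWork whose `identifier` is a plain string (assumed to be an SPDX expression), and the PropertyValue form, with `propertyID` being either an SPDX-related URL or the plain string `SPDX`. The function name `spdx_expression` is hypothetical.

```python
import re
from typing import Optional, Union

_SPDX_URL = re.compile(
    r"^https?://spdx\.org/licenses/(?P<id>[A-Za-z0-9.+-]+?)(?:\.html|\.json)?/?$"
)
# PropertyValue markers treated as "this value is an SPDX expression" (assumption).
_SPDX_PROPERTY_NAMES = {"SPDX", "SPDX-License-Identifier"}


def _is_spdx_property(value: dict) -> bool:
    prop = value.get("propertyID", "")
    if isinstance(prop, dict):  # {"@id": "..."} form
        prop = prop.get("@id", "")
    if not isinstance(prop, str):
        prop = ""
    return (
        value.get("name") in _SPDX_PROPERTY_NAMES
        or prop in _SPDX_PROPERTY_NAMES
        or "spdx" in prop.lower()
    )


def spdx_expression(license_value: Union[str, dict]) -> Optional[str]:
    """Best-effort SPDX expression for a schema:license value, else None."""
    if isinstance(license_value, str):
        match = _SPDX_URL.match(license_value)
        return match.group("id") if match else None
    identifier = license_value.get("identifier")
    if isinstance(identifier, str):
        return identifier  # assumes a plain string identifier is an SPDX expression
    if isinstance(identifier, dict) and _is_spdx_property(identifier):
        return identifier.get("value")
    # Fall back to the @id, which may itself be an SPDX URL.
    license_id = license_value.get("@id")
    return spdx_expression(license_id) if isinstance(license_id, str) else None


print(spdx_expression("https://spdx.org/licenses/MIT"))  # MIT
```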

Discussion across GitHub

(This section was added to lure others into commenting with their views 😁)

In schemaorg/schemaorg#1928, @njh concludes that https://spdx.org/licenses/MIT should be used directly as the @id.

In seek4science/seek#456 we explored this further, as we had initially abused license as a text field holding an implied SPDX identifier, looked up using the https://opendefinition.org/ JSON; we also need to distinguish between "data license" and "software license". That issue suggests the expanded PropertyValue form shown above. Participants include @fbacall, @stuzart and @alaninmcr.

In radiantearth/stac-spec#378, @mojodna, @gkellogg and @m-mohr are using the variant https://spdx.org/licenses/MIT.html in JSON-LD.

In galaxyproject/galaxy#10408, @jmchilton and @nsoranzo are referencing SPDX from Galaxy workflows, though it is unclear which identifier form they use (custom YAML?).

In earthcubearchitecture-project418/p418Docs#6 we see @mbjones (earthcubearchitecture-project418/p418Docs#6 (comment)) suggest a PropertyValue approach as above, but less verbose, with propertyID set to the plain string SPDX, since https://schema.org/propertyID can be either Text or URL.

The Citation File Format (CFF), a custom YAML format, uses license_url: https://spdx.org/licenses/MIT together with license: "MIT"; see for instance citation-file-format/cffconvert#25 by @jspaaks and citation-file-format/citation-file-format#105 with @thomaskrause.

@mbjones

mbjones commented Nov 3, 2020

In the https://science-on-schema.org guidelines for Dataset metadata, we recommend using SPDX URIs from the RDF files: https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#license

In CodeMeta, which is a schema.org extension for software metadata, we also recommend using SPDX (codemeta/codemeta#67), although the guidelines are not prescriptive.

@m-mohr

m-mohr commented Nov 6, 2020

Some quick thoughts:

  • We can certainly adopt a different variant of the SPDX URIs. We only generate JSON-LD on the fly from the STAC metadata files, which are plain JSON. We are the only ones appending ".html" to the URL, but we can remove that to align with others.
  • We found that SPDX is often not well suited to data. A couple of data-related licenses are missing, and many licenses are actually custom/proprietary (even though some of the datasets are free), so we added an extra allowed value, "proprietary" (also not ideal), which is then accompanied by a link to the actual license.
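The approach described in these bullets could be mapped to schema.org roughly as in the following Python sketch. It is hypothetical: the STAC-like input (a `license` string plus `links` entries with `rel: "license"`) is simplified, the function name is made up, and this is not the actual STAC JSON-LD generator. A single SPDX identifier becomes a spdx.org URL without `.html`, while `"proprietary"` (or a multi-license expression) falls back to the linked license document.

```python
from typing import Dict, List, Optional


def stac_license_to_schema(license_field: str, links: List[Dict]) -> Optional[str]:
    """Map a simplified STAC-style license field to a schema:license URL (sketch)."""
    license_links = [link["href"] for link in links if link.get("rel") == "license"]
    if license_field == "proprietary":
        # No SPDX id available: point at the actual license document, if any.
        return license_links[0] if license_links else None
    if " " in license_field:
        # An SPDX expression such as "MIT OR Apache-2.0" has no single URL;
        # prefer an explicit license link when one is given.
        return license_links[0] if license_links else None
    return f"https://spdx.org/licenses/{license_field}"


links = [{"rel": "license", "href": "https://example.com/data-terms"}]  # hypothetical URL
print(stac_license_to_schema("proprietary", links))  # https://example.com/data-terms
print(stac_license_to_schema("CC-BY-4.0", links))  # https://spdx.org/licenses/CC-BY-4.0
```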

@bact

bact commented Jan 25, 2024

Just a note from Codemetapy (https://github.com/proycon/codemetapy):

"For schema:license, full SPDX URIs are used where possible."


4 participants