Proposal: expand supported license data #7

Closed
migurski opened this Issue Sep 19, 2015 · 32 comments

@migurski
Member

migurski commented Sep 19, 2015

Based on the license principles discussion, I’d like to recommend a formal change to the contribution guidelines and our parsing code to support an optional extended description.

Currently, the license is documented as “a URL or string”, and supports both explicit links and implicit short strings:

"license": "http://geonb.snb.ca/downloads/documents/geonb_license_e.pdf"

Also valid:

"license": "CC-BY-SA"

We should support an additional expanded version of the license data, with true/false flags for license properties such as required attribution or share-alike:

"license": {
    "url": "http://geonb.snb.ca/downloads/documents/geonb_license_e.pdf",
    "attribution-string": "GeoNB – www.snb.ca/geonb",
    "attribution": true,
    "share-alike": false
}

The old forms will still be acceptable. In cases where attribution or share-alike are not explicitly defined, we would assume both are required.
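
A minimal sketch of how the three accepted forms could be normalized under the defaults proposed above. The function name `normalize_license` and the `name` key used for short strings are assumptions made for illustration; this is not the actual Machine parsing code.

```python
def normalize_license(value):
    """Return a dict with explicit flags for any accepted license form.

    Hypothetical helper: accepts a URL string, a short string such as
    "CC-BY-SA", or the proposed expanded object.
    """
    if isinstance(value, str):
        # Assumption: URLs go under "url", other strings under "name".
        key = "url" if value.startswith(("http://", "https://")) else "name"
        value = {key: value}
    elif not isinstance(value, dict):
        raise ValueError("license must be a string or an object")

    normalized = dict(value)
    # Proposed default: flags that are not explicitly defined are
    # assumed to be required.
    normalized.setdefault("attribution", True)
    normalized.setdefault("share-alike", True)
    return normalized
```

With this sketch, `normalize_license("CC-BY-SA")` yields a dict with both flags set to `True`, while an expanded object keeps whatever flags it states explicitly.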

If this proposal were accepted, here are the next steps:

  1. Write and test machine code to support the structure above.
  2. Deploy new machine code.
  3. Update contribution guide to reflect newly-supported structure.
  4. Research licenses of existing sources to determine their properties.
  5. Expand existing sources to use new license structure.
  6. Determine whether to deprecate older URL/string structure.
@NelsonMinar

Contributor

NelsonMinar commented Sep 19, 2015

I'm in favor of this! I don't feel strongly about the format of the JSON blob, but this looks good.

Is there some commonly accepted definition of what "Attribution" and "Share-Alike" mean that we're implying here? Perhaps the Creative Commons definitions? http://creativecommons.org/licenses/

Are there different degrees of Share-Alike? I'm wondering whether some Share-Alike licenses basically mean "your copy of our database has to be shared" but don't extend to a more viral "anything you make with this data must also be Share-Alike".

@migurski

Member

migurski commented Sep 19, 2015

Attribution seems really clear to me.

Share-alike is much more slippery—I’m still not sure if it seems safer to assume yes or no on this one.

@migurski

Member

migurski commented Sep 19, 2015

Tagging @sbma44 and @iandees for particular input on this.

@NelsonMinar

Contributor

NelsonMinar commented Sep 19, 2015

Perhaps we need a "license unknown" category in the output files.

@iandees

Member

iandees commented Sep 19, 2015

I like this idea and the format of the JSON blob. I'm a tad bit worried about us interpreting licenses and boiling them down into a couple of attributes. Maybe it's fine if we're clear in the docs that this is our interpretation and might not match yours?

@migurski

Member

migurski commented Sep 20, 2015

Yes, I agree with the idea that this is our interpretation.

@migurski

Member

migurski commented Sep 20, 2015

Would it be fair to say that anything in OA can be used for derived works? That’s really the crux of the SA flag: it governs what you can do with those works, but we assert that anything in OA should be usable for new data products.

@NelsonMinar

Contributor

NelsonMinar commented Sep 20, 2015

I don't think there's any value in us collecting data that cannot be used at all for derived works. (Is there any?) The challenge is what restrictions the license might place on derived works. Share-Alike provisions (sometimes?) require that the whole derived work be share-alike. Non-Commercial provisions forbid commercial use. I think we should include sources with SA or NC provisions but very clearly delimit them.

@NelsonMinar

Contributor

NelsonMinar commented Sep 20, 2015

Calling out Non-Commercial explicitly, that's also a common license provision in some circles. Do we need it for OA data sources?

@migurski

Member

migurski commented Sep 20, 2015

Sounds like we might, and it would map cleanly to the three flags in CC licenses. There are eight possible combinations, but CC documents just six.

@migurski

Member

migurski commented Sep 20, 2015

…and I see that two of them include No Derivatives, which I think we can exclude. We would have five possible kinds of downloads:

  • No requirements.
  • Attribution (BY)
  • Attribution-ShareAlike (BY-SA)
  • Attribution-NonCommercial (BY-NC)
  • Attribution-NonCommercial-ShareAlike (BY-NC-SA)

Three without NC:

  • No requirements.
  • Attribution (BY)
  • Attribution-ShareAlike (BY-SA)
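
One way to picture the five kinds of downloads listed above is a lookup from three boolean flags to a label. This is only an illustrative sketch: the names `DOWNLOAD_KINDS` and `download_kind` and the flag order are assumptions, and combinations that are not in the list simply return `None`.

```python
# Keys are (attribution, share_alike, noncommercial) flag triples.
# Only the five combinations listed in the comment above are mapped.
DOWNLOAD_KINDS = {
    (False, False, False): "No requirements",
    (True, False, False): "Attribution (BY)",
    (True, True, False): "Attribution-ShareAlike (BY-SA)",
    (True, False, True): "Attribution-NonCommercial (BY-NC)",
    (True, True, True): "Attribution-NonCommercial-ShareAlike (BY-NC-SA)",
}


def download_kind(attribution, share_alike, noncommercial):
    """Return the label for a flag combination, or None if it is not listed."""
    return DOWNLOAD_KINDS.get((attribution, share_alike, noncommercial))
```

Dropping the noncommercial flag leaves the three kinds in the shorter list: the entries whose third flag is `False`.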
@NelsonMinar

Contributor

NelsonMinar commented Sep 20, 2015

Yeah, three flags in the source documents (one for each feature: BY, SA, NC). Then we can present a list of collections however makes sense based on which license features are most common.

@migurski

Member

migurski commented Sep 20, 2015

What do we think about presenting them as positives in the download descriptions:

  1. Share-alike → Any License Allowed.
  2. Noncommercial → Commercial Use Allowed.
  3. Attribution → [whatever the opposite of attribution would be]
@ajturner

ajturner commented Sep 21, 2015

Big fan of this.

By the way, CreativeCommons Rights Relation & ccREL for w3c & OKFN Open Licenses.

So using CreativeCommons NS would perhaps be:

"license": {
 "cc:permits": ["cc:Reproduction", "cc:Distribution"],
 "cc:prohibits": ["cc:CommercialUse"]
}
@migurski

Member

migurski commented Sep 21, 2015

I like the “permits” vs. “prohibits” language, that’s great. The cc: namespaces might not be entirely appropriate since we’re not technically dealing with CC, but linking to them for the spirit could be enough.

@sbma44

sbma44 commented Sep 28, 2015

Super-late to this, but will say:

  • I like the idea of nudging the world toward CC taxonomies
  • I think noncommercial licenses are likely to be common enough to be worth noting, but the difficulty of defining "commercial" in a unified way across licenses makes this of limited practical value
  • ditto the thresholds under which sharealike attaches. attribution is relatively straightforward, and even when requirements vary (for instance: the Austrians are touchy about having the snapshot date specified), the plausible damages for a violation are minimal. when you ponder the limitless mutability of the ODbL "substantial" threshold you can get a sense of just how slippery this will be in practice.

With all that said I think proceeding is great, but we should make the disclaimers totally unavoidable. I'd hate for anyone to think we're taking formal positions on the usability of the data/offering legal advice.

@migurski

Member

migurski commented Sep 29, 2015

Makes sense, thank you! I’ll move forward, and I’ll make sure that disclaimers are reflected in the download page design.

migurski added a commit to openaddresses/openaddresses.io that referenced this issue Sep 29, 2015

Re-pointed download link to results page
We are going to have disclaimers there about license status (openaddresses/openaddresses-ops#7), so a simple get-everything zip file link is no longer appropriate on the front page.
@migurski

Member

migurski commented Oct 4, 2015

I’m making a series of changes here that introduce the new dictionary syntax, with backwards-compatible support for simple strings. It’s just URLs and strings so far; nothing about attribution or license properties yet. The new behavior is released in Machine 2.6.0.

@migurski

Member

migurski commented Oct 5, 2015

Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags.

@migurski

Member

migurski commented Oct 9, 2015

Here’s where we are at the moment with license tag documentation, FYI: https://github.com/openaddresses/openaddresses/blob/633cd4c/CONTRIBUTING.md#optional-tags

@geobrando

Member

geobrando commented Oct 21, 2015

A couple things:

Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags

  1. @migurski: On at least a few occasions I know that I've added an attribution tag as a simple courtesy to the data owners and not because attribution was required under the license terms. I believe others have done the same. I would recommend against blindly converting all existing sources with this tag to the new license tag structure, unless you're OK with this.
  2. Since license text is sometimes included with the source data, shouldn't the license tag structure allow for paths in the data file that would allow machine to extract this from a single download?

@migurski migurski referenced this issue Oct 31, 2015

Closed

Separate attribution collections #236 #248

1 of 2 tasks complete
@migurski

Member

migurski commented Oct 31, 2015

@geobrando: I’m treating the attribution tag as an implied requirement only in the absence of other information, and I’m not updating any of the sources to make this explicit. It should affect only collections without a clear flag, and I believe it will be safe. Does that sound okay to you?

Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?
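
The rule described in this comment (an explicit flag wins, and the attribution tag only counts when nothing else is stated) might look roughly like the sketch below. It assumes a source is a dict with an optional top-level `attribution` tag and an optional `license` value; the function name `attribution_required` is hypothetical and this is not the Machine implementation.

```python
def attribution_required(source):
    """Guess whether a source requires attribution.

    An explicit license flag always wins; a standalone attribution tag is
    treated as an implied requirement only when no flag is present.
    """
    license_info = source.get("license")
    if isinstance(license_info, dict) and "attribution" in license_info:
        return bool(license_info["attribution"])
    # No explicit flag: fall back to the presence of a non-empty tag.
    return bool(source.get("attribution"))
```

A source tagged as a courtesy can opt out by stating `"attribution": false` in its license object, which addresses the concern raised earlier in the thread.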

@migurski

Member

migurski commented Nov 8, 2015

Comments from @NelsonMinar suggest that splitting attribution downloads doesn’t make sense, but that splitting share-alike ones does. I’m going to put openaddresses/machine#236 and openaddresses/machine#248 on ice for a little while, and introduce a share-alike flag first.

@migurski

Member

migurski commented Nov 8, 2015

In openaddresses/machine#254, missing share-alike license information is assumed to mean false. Is this safe, or should it default to true to be more cautious?

@NelsonMinar

Contributor

NelsonMinar commented Nov 8, 2015

My gut reaction is to assume false, simply because share-alike is so rare in the world we're dealing in. Right now do we have any sources that require it? Six months ago I bet we were explicitly not including them at all.

Better yet would be to not assume anything, and either reject a source that doesn't specify or else have some lint tool that's reporting sources missing this info.

@migurski

Member

migurski commented Nov 9, 2015

I’m thinking false as well. There are a few sources that appear to have SA. I’ll merge the machine changes as they are, and get to work on a set of OA changes that will formally document this and modify some sources.
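
A rough sketch of the behavior settled on here, with missing share-alike information treated as false, plus the kind of lint check suggested earlier for finding sources that never state the flag. Both function names and the shape of the `sources` mapping (source name to parsed source dict) are assumptions for illustration.

```python
def has_explicit_share_alike(source):
    """True when the source's license object states a share-alike flag."""
    license_info = source.get("license")
    return isinstance(license_info, dict) and "share-alike" in license_info


def share_alike_required(source):
    """Missing share-alike information is assumed to mean False."""
    if has_explicit_share_alike(source):
        return bool(source["license"]["share-alike"])
    return False


def missing_share_alike(sources):
    """List the names of sources that do not state the flag explicitly."""
    return [name for name, source in sources.items()
            if not has_explicit_share_alike(source)]
```

Reporting the missing flags separately keeps the permissive default from hiding sources that still need their license reviewed.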

@geobrando

Member

geobrando commented Nov 10, 2015

@migurski My concern was mainly fully deprecating the standalone attribution tag and converting existing sources to license.attribution = true, but in general I worry about using the presence of an attribution name tag to imply that attribution is required. Maybe it's just a matter of the documentation making this clear.

Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?

Yeah. I think I got confused and thought there were plans to extract license text for dissemination using license.url. But that isn't really feasible or typically necessary. But I believe I have seen some sources with terms that state that a copy of the license should be included whenever the data is disseminated. Can't recall where I saw this though.

@migurski

Member

migurski commented Nov 13, 2015

Hm good point. I’ve augmented some of the data sets with explicit attribution: false where possible, based on common licenses in openaddresses/openaddresses#1408.

Right now, the only place where the attribution requirement appears is the collection license file. Should I maybe have it default to false instead? Is this dangerous? It’s a softer license term than share-alike.

@migurski

Member

migurski commented Nov 15, 2015

After a few conversations, realizing that SA is the right license requirement to split downloads on. Going to stop openaddresses/machine#236 and openaddresses/machine#248 and create new issues to reflect this.

@migurski

Member

migurski commented Nov 17, 2015

With the completion of these issues, I’d like to close this ticket:

Some remaining things that can be done separately:

@iandees

Member

iandees commented Nov 17, 2015

I agree. Thanks for all your work on this, Mike!

@migurski

Member

migurski commented Nov 17, 2015

💥
