
License for public dataset #204

Closed
EgorBu opened this issue Apr 23, 2018 · 5 comments

Comments


EgorBu commented Apr 23, 2018

Hi!

We need to decide which licenses we can (or have to) use for public datasets and add that to the guide.

Examples of licenses:
In PGA: https://github.com/src-d/datasets/blob/master/DCO
In another dataset built from GitHub data: https://www.kaggle.com/davidshinn/github-issues

Author

EgorBu commented Apr 23, 2018

@marnovo, @campoy, @mcuadros, what do you think? (If I missed somebody, please add them to the discussion.)

Contributor

smola commented Apr 23, 2018

IANAL, but if we include actual source code, we cannot use a traditional software license (e.g. Apache or GPL), since it would be incompatible with the licenses of the code we include. We can look into some database licenses instead: https://opendatacommons.org/licenses/

More info: https://opendatacommons.org/faq/licenses/#Why_Do_You_Distinguish_Between_the_8220Database8221_and_its_8220Contents8221

Contributor

campoy commented May 4, 2018

I had a quick chat with @eiso about this; I wonder if he could share his knowledge here.

Member

eiso commented May 4, 2018

@campoy using the ODbL 1.0 license suggested by @smola is a good option because we specify the individual content licenses to the best of our ability in the index file of the dataset.

> Databases can contain a wide variety of types of content (images,
> audiovisual material, and sounds all in the same database, for example),
> and so the ODbL only governs the rights over the Database, and not the
> contents of the Database individually. Licensors should use the ODbL
> together with another license for the contents, if the contents have a
> single set of rights that uniformly covers all of the contents. If the
> contents have multiple sets of different rights, Licensors should
> describe what rights govern what contents together in the individual
> record or in some other way that clarifies what rights apply.

Another option is the Community Data License Agreement by the Linux Foundation, but it hasn't picked up much steam since launch. So let's go for ODbL.

Contributor

campoy commented May 16, 2018

I created an issue to track the change in the datasets repo, so I'll close this one.

campoy closed this as completed May 16, 2018