Licensing #1

Open
1 of 2 tasks
mbojan opened this issue Feb 5, 2021 · 13 comments
Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@mbojan
Member

mbojan commented Feb 5, 2021

  • Authors
  • Other Statnet packages seem to have a custom LICENSE. Should we use it, or is the standard GPL-3 enough?
@CarterButts

We may want to think about what license makes sense for data objects, and associated documentation. GPL-3 may be fine, but we certainly want to specify that when using a data set, attribution should be given to the data set author. (GPL is a great license for code, but not always for other kinds of things, where e.g. fidelity of attribution or the like is important. Also, data itself is sometimes non-copyrightable, though associated material may be.) Perhaps worth looking to see what others have been doing in this space....

@mbojan
Member Author

mbojan commented Feb 5, 2021

Good point. In similar packages I see:

I think all the data-object-related man pages in Statnet packages have proper references. We may add a sentence that the original sources need to be cited rather than the package itself. What do you think?

@martinamorris
Copy link
Member

I think this is where we need to be careful with the Add Health public datasets. We are not in a position to decide the license for those.

mbojan added a commit that referenced this issue Feb 7, 2021
mbojan added this to the v1.0 milestone Feb 7, 2021
@mbojan
Member Author

mbojan commented Feb 8, 2021

WRT licenses and citations I have added the following template to every man page:

#' @section Licenses and Citation: When publishing results obtained using this
#'   data set, the original authors (see sections Source and References) should
#'   be cited, along with this \code{R} package.

Some of the datasets have more details about the sources and publications than that.

@mbojan
Member Author

mbojan commented Feb 8, 2021

> * Authors

I have added all members of the Statnet core group as authors. Please let me know if I should add anyone else.

> We may want to think about what license makes sense for data objects, and associated documentation. GPL-3 may be fine, but we certainly want to specify that when using a data set, attribution should be given to the data set author. (GPL is a great license for code, but not always for other kinds of things, where e.g. fidelity of attribution or the like is important. Also, data itself is sometimes non-copyrightable, though associated material may be.) Perhaps worth looking to see what others have been doing in this space....

The package is licensed under GPL-3 at the moment. Some of the datasets explicitly mention CC licenses. I think this needs to be unified. I suspect CRAN will not like one license in DESCRIPTION and some other licensing mentioned in the man pages. We could also license the whole package (and the datasets) under CC BY, which I have seen in some of the data-only packages on CRAN.

@mbojan
Member Author

mbojan commented Feb 9, 2021

I looked at the pure data packages installed on my machine; this is how they are licensed:

| package | authors | url | license |
|---|---|---|---|
| carData | John Fox [aut, cre], Sanford Weisberg [aut], Brad Price [aut] | https://r-forge.r-project.org/projects/car/, https://CRAN.R-project.org/package=carData, http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/index.html | GPL (>= 2) |
| datasets | R Core Team and contributors worldwide | | Part of R 4.0.3 |
| igraphdata | Gabor Csardi <csardi.gabor@gmail.com> | http://igraph.org | CC BY-SA 4.0 + file LICENSE |
| Lahman | Michael Friendly [aut], Chris Dalzell [cre, aut], Martin Monkman [aut], Dennis Murphy [aut], Vanessa Foot [ctb], Justeena Zaki-Azat [ctb] | https://CRAN.R-project.org/package=Lahman | GPL |
| networkdata | David Schoch [aut, cre] | https://github.com/schochastics/networkdata | MIT + file LICENSE |
| nycflights13 | Hadley Wickham [aut, cre], RStudio [cph] | http://github.com/hadley/nycflights13 | CC0 |

So there is quite a lot of variability.
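For anyone who wants to repeat this comparison on their own machine, here is a minimal sketch. It assumes only base R's `utils::packageDescription()`; the helper name `license_table()` is hypothetical, and packages that are not installed simply get `NA` in every field.

```r
# Hypothetical helper: collect the Author, URL and License fields of
# installed packages into one data frame, one row per package.
license_table <- function(pkgs) {
  rows <- lapply(pkgs, function(p) {
    # packageDescription() warns and returns NA if `p` is not installed
    desc <- suppressWarnings(utils::packageDescription(p))
    field <- function(name) {
      # Missing package or missing field -> NA
      if (!is.list(desc) || is.null(desc[[name]])) NA_character_ else desc[[name]]
    }
    data.frame(package = p,
               authors = field("Author"),
               url     = field("URL"),
               license = field("License"),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

For example, `license_table(c("carData", "datasets", "igraphdata", "Lahman", "networkdata", "nycflights13"))` would give one row per package; the exact strings depend on the installed versions.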

> I think this needs to be unified. I suspect CRAN will not like one license in DESCRIPTION and some other licensing mentioned in the man pages.

I browsed through the documentation of the datasets contained in other packages and have not encountered a single instance of a dataset licensed differently from the package itself. igraphdata is somewhat exceptional because its LICENSE file has an individual entry for each dataset describing the license for that particular dataset. That information is of roughly the same type as what we currently have in the sections "Source" and "Licensing and citing" for almost all data objects.

GPL seems rather strange for datasets but is often used (even R itself and the datasets package are licensed under GPL-2). One of the CC licenses seems more natural. I'm not an expert on licenses, so I would be grateful for suggestions. What do you think of the following setup:

  • License the whole package under CC BY or GPL-3
  • Include an additional LICENSE file saying that additional licensing information may be given in the man pages of individual data sets
  • Keep the sections mentioning CC BY or other licensing in the man pages as they are.

Thoughts?

mbojan added the labels help wanted (Extra attention is needed) and question (Further information is requested) Feb 9, 2021
mbojan changed the title from Update DESCRIPTION to Licensing (was "Update DESCRIPTION") Feb 9, 2021
mbojan changed the title from Licensing (was "Update DESCRIPTION") to Licensing Feb 9, 2021
@mbojan
Member Author

mbojan commented Apr 25, 2021

Hadley Wickham recommends CC licenses for data packages: https://r-pkgs.org/license.html?q=licens#license-data. GPL indeed seems too code-specific.

mbojan added a commit that referenced this issue Nov 23, 2021
@mbojan
Member Author

mbojan commented Nov 23, 2021

I have addressed this issue with the following:

  1. The package is licensed under CC BY
  2. All data documentation pages include the following section:

Licenses and Citation

If the section Source of this page does not specify otherwise, this data set is protected by the Creative Commons License https://creativecommons.org/licenses/by-nc-nd/2.5/.

When publishing results obtained using this data set, the original authors (see sections Source and/or References) should be cited, along with this R package. To cite this package please use the following:

Handcock M, Hunter D, Butts C, Goodreau S, Krivitsky P, Morris M, Bojanowski M (2021). statnet.data: Network Datasets for the Statnet Suite. R package version 0.1-0, <URL: https://statnet.org>.

Is that an acceptable solution, @statnet/dev?

mbojan modified the milestones: v1.0, Publish on CRAN Nov 25, 2021
@mbojan
Member Author

mbojan commented Nov 26, 2021

I'm not sure whether you get notifications, so I'm mentioning you individually: @martinamorris @CarterButts @handcock @krivit @sgoodreau @drh20drh20

@krivit
Member

krivit commented Dec 2, 2021

Do we want NC and ND terms in this license? NC means, AFAIK, that someone can't use this dataset to teach a workshop or in an example in a book, and ND means that they can't redistribute a version modified for their purposes or in a different format. Am I missing something?

@mbojan
Member Author

mbojan commented Dec 3, 2021

I put the wrong link in the man pages; it should be https://creativecommons.org/licenses/by/4.0/, i.e., only BY (attribution) with no additional restrictions. It is correct in DESCRIPTION and LICENSE. Thanks @krivit for noticing that.
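A stray link like this could be caught before release by scanning the package sources. The sketch below is a hypothetical helper (`find_cc_mismatches()` is not part of any package); it assumes the documentation lives in `.R` (roxygen) and `.Rd` files under the package directory and that CC BY 4.0 is the intended license.

```r
# Hypothetical helper: report Creative Commons URLs in package sources
# (.R roxygen files and .Rd man pages) that differ from the intended one.
find_cc_mismatches <- function(dir,
                               expected = "https://creativecommons.org/licenses/by/4.0/") {
  files <- list.files(dir, pattern = "\\.(R|Rd)$", recursive = TRUE,
                      full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # Extract every creativecommons.org/licenses/... URL on each line
    urls <- unlist(regmatches(
      lines,
      gregexpr("https?://creativecommons\\.org/licenses/[A-Za-z0-9./-]+", lines)
    ))
    # Drop sentence-final periods that the pattern also captures
    urls <- unique(sub("\\.+$", "", urls))
    bad <- setdiff(urls, expected)
    if (length(bad) == 0L) return(NULL)
    data.frame(file = basename(f), url = bad, stringsAsFactors = FALSE)
  })
  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0L) {
    # No mismatches: return an empty data frame with the same columns
    return(data.frame(file = character(), url = character(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, hits)
}
```

Run from the package root as `find_cc_mismatches(".")`; an empty result means every Creative Commons link matches the expected URL.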

@martinamorris
Copy link
Member

Unless there's a good reason to restrict, it seems like these datasets should just require attribution. And while we've made the data accessible, which should be noted by attribution, we should encourage people to cite the original sources (which should be included as metadata in the dataset package).

mbojan added a commit that referenced this issue Dec 3, 2021
@mbojan
Member Author

mbojan commented Dec 3, 2021

As of commit 6272770, the URL is fixed to CC BY.

4 participants