MPL 2.0 License and necessary codebase changes #12

Closed
carlosparadis opened this issue May 26, 2020 · 0 comments
carlosparadis commented May 26, 2020

Both @rnkazman and I agree to go with MPL 2.0 instead of GPL, for the reasons stated here.

As of this version, however, my guess is that the code either can't "legally" be redistributed because of potential conflicts between the licenses, or is implicitly GPL. This issue collects links, evidence, and sanity checks on the best way to proceed, and indexes the changes needed to adopt that license. This whole issue is of course IANAL.

1. R is GPL, does that make anything R GPL?

Maybe.

1.1 Not because of the R language per se, according to this Twitter discussion between Hadley and another person, which in turn cites this page from the GPL FAQ:

When the interpreter just interprets a language, the answer is no. The interpreted program, to the interpreter, is just data; a free software license like the GPL, based on copyright law, cannot limit what data you use the interpreter on. You can run it on any data (interpreted program), any way you like, and there are no requirements about licensing that data to anyone.

However, when the interpreter is extended to provide “bindings” to other facilities (often, but not necessarily, libraries), the interpreted program is effectively linked to the facilities it uses through these bindings.

However, maybe yes, because all functions from base R are themselves licensed under GPL, a point raised in the discussion:

That clause is very explicit re library bindings — in this case, all of base R, which is GPL.

There is a public statement from the R Foundation that seems to discourage this interpretation, but it's murky. The same argument is also raised here concerning the use of base R.

More of the confusion is summarized in this Reddit thread, in which someone even suggests that the NAMESPACE mechanism forces GPL too:

IANAL, but I think as soon as you build a package that imports function from other packages using the NAMESPACE mechanism of R that use gpl your package also needs to be distributed with gpl. This has to do with linking. As described in the same faq.

In short: it is not clear whether using base R, or simply using the R package mechanism, implies GPL. I believe data.table, an R package that recently switched from GPL to MPL 2.0, gets away with this because it does not use base R itself.

Apparently, we don't use it either for now, but it's hard to say if we won't in the future.

2. Dynamically Linking with GPL packages.

2.1 library(aGPLlibrary)

The second and more prominent problem comes from this. Specifically, using library(aGPLlibrary) seems to fall under the GPL clause that requires the entire codebase to become GPL. According to the GNU website:

It depends on how the main program invokes its plug-ins. If the main program uses fork and exec to invoke plug-ins, and they establish intimate communication by sharing complex data structures, or shipping complex data structures back and forth, that can make them one single combined program. A main program that uses simple fork and exec to invoke plug-ins and does not establish intimate communication between them results in the plug-ins being a separate program.

If the main program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single combined program, which must be treated as an extension of both the main program and the plug-ins. If the main program dynamically links plug-ins, but the communication between them is limited to invoking the ‘main’ function of the plug-in with some options and waiting for it to return, that is a borderline case.

The interpretation above leaves a margin of doubt as to whether library(aGPLlibrary) is viral, and is why data.table moved to MPL 2.0.

In short, for our purposes, this means we have to let go of 4 of the packages currently in use to avert trouble:

  • stringr (>= 1.4.0),
  • jsonlite (>= 1.6),
  • igraph (>= 1.2.5)
  • lubridate

I am not sure yet whether this is possible for stringr (the other option would be base R) or for jsonlite (no alternative identified yet). The lubridate alternative may also be just base R.

igraph can possibly be circumvented, save for the projection operation used to generate co-change (that is, file-file network projections weighted by number of modifications). Then again, DV8 provides HDSMs, which could be parsed to construct a file-file network weighted by number of modifications.

The second change is that the parse_igraph functions would be modified to not output igraph objects (as I can't load the library). Instead, they would output edgelists with type and weight, a common public format widely used to represent graphs, much like adjacency matrices, and not particular to igraph. These in turn could be used by a GPL igraph notebook in a separate repo to showcase the work.
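
To gauge how feasible the base R route is for strings, here is a minimal sketch of base R stand-ins for a few common stringr calls. The sample paths and variable names are made up for illustration, and the mapping to stringr functions is approximate rather than exact.

```r
# Base R stand-ins for a few common stringr calls (illustrative, not exhaustive).
# The file paths below are hypothetical sample data.
paths <- c(" R/parsers.R ", "R/graph.R", "README.md")

trimmed <- trimws(paths)                         # roughly stringr::str_trim()
is_r    <- grepl("\\.R$", trimmed)               # roughly stringr::str_detect()
no_ext  <- sub("\\.[^.]+$", "", trimmed)         # roughly stringr::str_replace()
parts   <- strsplit(trimmed, "/", fixed = TRUE)  # roughly stringr::str_split()

# Last path component of each entry.
file_names <- vapply(parts, function(p) p[length(p)], character(1))
```

Whether this covers every stringr call currently in the codebase would still need to be checked call by call.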

The good news is that the visualization library itself I use is MIT licensed: https://github.com/datastorm-open/visNetwork/blob/master/DESCRIPTION so maybe even the visualizations can be made available still, without relying on igraph.
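
As a rough illustration of the edgelist idea, the projection itself can be sketched in base R with a matrix cross-product. The input table, its column names, and the "file-file" type label are assumptions made for this example, not the current parser output.

```r
# Hypothetical input: one row per (commit, file) pair, as a git log parser might yield.
changes <- data.frame(
  commit = c("c1", "c1", "c2", "c2", "c2", "c3"),
  file   = c("a.R", "b.R", "a.R", "b.R", "c.R", "c.R"),
  stringsAsFactors = FALSE
)

# Commit-by-file incidence matrix (1 if the commit touched the file).
incidence <- unclass(table(changes$commit, changes$file))
incidence[incidence > 0] <- 1

# One-mode projection: entry [i, j] counts the commits that changed both files.
cochange <- crossprod(incidence)

# Keep each undirected pair once and drop pairs that never co-changed.
idx <- which(upper.tri(cochange) & cochange > 0, arr.ind = TRUE)
edgelist <- data.frame(
  from   = rownames(cochange)[idx[, 1]],
  to     = colnames(cochange)[idx[, 2]],
  weight = cochange[idx],
  type   = "file-file",
  stringsAsFactors = FALSE
)
```

The resulting data.frame is plain data, so it could be handed to visNetwork, or to igraph in a separate notebook, without this package loading either.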

2.2 Can we refer to any GPL code at all?

This was the next pertinent question: Currently Kaiaulu relies on data output by a GPL program, namely Perceval. Perceval parses the gitlog for us, so we don't need to reinvent the wheel. Is that a problem? Well, again the answer depends, but this time it seems favorable to us.

Specifically, Kaiaulu interacts with Perceval (and with many more tools in the future) via the command line. For example:

kaiaulu/R/parsers.R

Lines 27 to 31 in 77f0d66

# Use Perceval to parse gitlog_path. The --json-line flag is required so the output can be parsed by jsonlite::fromJSON.
perceval_output <- system2(perceval_path,
                           args = c('git', '--git-log', gitlog_path, git_uri, '--json-line'),
                           stdout = TRUE,
                           stderr = FALSE)

What does GPL say about this? GPL says maybe again. However, the general intuition I draw from it is that it is OK if your code does not look like a wrapper around the program itself. I am emphasizing the items by separating them into bullets below, but the original text contains none:

  1. Where's the line between two separate programs, and one program with two parts? This is a legal question, which ultimately judges will decide. We believe that a proper criterion depends both on the mechanism of communication (exec, pipes, rpc, function calls within a shared address space, etc.) and the semantics of the communication (what kinds of information are interchanged).
  2. If the modules are included in the same executable file, they are definitely combined in one program.
  3. If modules are designed to run linked together in a shared address space, that almost surely means combining them into one program.
  4. By contrast, pipes, sockets and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs.
    4.1 But if the semantics of the communication are intimate enough, exchanging complex internal data structures, that too could be a basis to consider the two parts as combined into a larger program.

In essence, we fall under item 4, and I believe items 1 to 3 are the reason why Section 2.2 of this issue is so confusing.

Does our use of Perceval fall under 4.1? I doubt it. We just get the data it downloads from elsewhere, parsed. Moreover, I think my reasoning on going forward with OO-R6, which is also a package under the MIT license, makes sense given the following discussion:

I have checked a legal books about GPL (unfortunately it s in French: Droit des logiciels - édition Puf - F Pellegrini et S. Canevet) and basically, to avoid contamination of a project with GPL2 licence, you need to wrap the GPL2 code and access to it through a generic interface.

This interface should make it possible to use code from other projects with similar functionalities than the GPL code. One interesting point is that to make it work, another project with similar function but under another licence should exist.

The implementation of the other project has not to be written, but just be easily possible. The idea is that it should be easy to replace the code under GPL with another one because of the interface (like a lego block).

The approach is inspired of how to limit GPL2 contamination because of the Linux kernel executing some non GPL code.

In general, my opinion is to limit the consequences of GPL code to the final user.
For instance, forcing the user to install another package to get some functions just for legal reasons is really an overkill. And letting the GPL licence contamining the whole project and making it impossible to use for some startup seems not Ok at all.

This means #11 is ideal. Having abstract classes defining how other interfaces can integrate clearly goes along with the description above:

  • Perceval is not the only git log parser.
  • igraph is not the only program that processes edgelists

Etc.
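
A minimal sketch of what such a swappable interface could look like, using plain base R closures rather than R6 so it runs without extra packages. Every name here (parse_gitlog, make_cli_backend, toy_backend, required_cols) and the column contract are hypothetical; the actual design belongs to #11.

```r
# Hypothetical contract: a backend is any function that takes a git log path
# and returns a data.frame containing at least these columns.
required_cols <- c("commit", "author", "file")

parse_gitlog <- function(gitlog_path, backend) {
  out <- backend(gitlog_path)
  stopifnot(is.data.frame(out), all(required_cols %in% names(out)))
  out[, required_cols, drop = FALSE]
}

# Backend factory for any external command-line tool. The tool stays a
# separate program; args_for and to_table are supplied per tool and are
# placeholders here, since they depend on the tool's flags and output format.
make_cli_backend <- function(tool_path, args_for, to_table) {
  function(gitlog_path) {
    raw <- system2(tool_path, args = args_for(gitlog_path),
                   stdout = TRUE, stderr = FALSE)
    to_table(raw)
  }
}

# Stand-in backend with no external dependency, to show the swap.
toy_backend <- function(gitlog_path) {
  data.frame(commit = c("c1", "c1"), author = "alice",
             file = c("a.R", "b.R"), extra = NA,
             stringsAsFactors = FALSE)
}

result <- parse_gitlog("path/to/gitlog", toy_backend)
```

The point of the sketch is only that the rest of the code depends on the contract, not on Perceval: any parser that can fill the same columns can be dropped in.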

3. Another use case and a few more examples: Xgboost

This library was also subject to the same concerns. The discussion covers a lot of points not covered here. But since it is a prominent library, we can piggyback on some of its practices. Namely:

  • An alternative to stringr is stringi, which is not GPL and is what xgboost switched to.
  • igraph is OK to add under Suggests, and even plotting code can be made available, provided an alternative interface exists. See here for an example where igraph code is available as an alternative.
  • The package DESCRIPTION file (https://github.com/dmlc/xgboost/blob/646def51e02d4017ac85065a10ca763e8941d62a/R-package/DESCRIPTION) is a good overall guideline for where GPL packages and permissively licensed packages should go.
  • It also shows I should move the packages from Depends to Imports as a good practice, to enforce explicit calls of the functions with package names, as I am already doing anyway.
  • This is a fantastic PR with several commits showing how to reduce GPL dependency and the risk of infection. See in particular the commit switching from stringr to stringi, and the commit creating a generic interface to ggplot.
    • Finally, note that the way to provide a generic interface is to use base R as the default. This settles our concern: base R should be OK to use, all else failing.
  • Writing vignettes with igraph should be OK too! This example, which uses jsonlite, a GPL library I must replace, is included in the vignette. From the R package standpoint that is considered OK, because the vignette showcases the problem of someone whose data is originally in JSON. XGBoost itself does not require JSON data, but a .csv file. Hence, it is OK to use. For our purposes, making the case that we don't need a JSON library could be possible, but it would be very inconvenient.
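
The "Suggests plus base R default" pattern from the list above can be sketched as follows. The function name node_degree, its use_igraph argument, and the sample edgelist are hypothetical; the igraph calls are only reached when igraph is already installed.

```r
# Node degree from an edgelist data.frame with columns `from` and `to`.
# igraph is optional (it would sit under Suggests in DESCRIPTION); base R is the default path.
node_degree <- function(edgelist,
                        use_igraph = requireNamespace("igraph", quietly = TRUE)) {
  if (use_igraph) {
    g   <- igraph::graph_from_data_frame(edgelist, directed = FALSE)
    deg <- igraph::degree(g)
  } else {
    # Base R fallback: count how often each node appears as an endpoint.
    deg <- table(c(edgelist$from, edgelist$to))
  }
  deg <- deg[order(names(deg))]
  data.frame(node = names(deg), degree = as.integer(deg),
             stringsAsFactors = FALSE)
}

# Hypothetical sample data.
el <- data.frame(from = c("a.R", "a.R"), to = c("b.R", "c.R"),
                 stringsAsFactors = FALSE)
deg_base <- node_degree(el, use_igraph = FALSE)
```

Calling igraph functions through `igraph::` behind a `requireNamespace()` check is the usual way to use a package listed under Suggests without making it a hard dependency.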

4. Remaining Challenges

Finding a JSON library that is not GPL, and any package that helps replace lubridate, should be all that remains. For lubridate, maybe ?IDateTime will do, which is from data.table, albeit experimental.

Edit: Disregard jsonlite concern. It is MIT licensed. I don't know where I got the idea it wasn't.

Edit 2: See also: https://stat.ethz.ch/pipermail/r-help/2008-July/169332.html

Edit 3: See this to handle dates without lubridate: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ColeBeck/datestimes.pdf
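
A minimal base R sketch of the kind of date handling that lubridate would otherwise cover. The timestamps are made-up examples in an ISO-like layout with a numeric UTC offset; the real git log format may differ and would need its own format string.

```r
# Hypothetical timestamps with a numeric UTC offset.
stamps <- c("2020-05-26 10:15:30 -1000", "2020-05-28 08:00:00 +0200")

# %z reads the signed offset; tz = "UTC" normalizes everything to one zone.
when <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")

# A few common lubridate-style tasks using base R only.
day      <- as.Date(when, tz = "UTC")       # calendar date in UTC
yr       <- as.integer(format(when, "%Y"))  # year component
gap_days <- as.numeric(difftime(when[2], when[1], units = "days"))
```

Formats that include weekday or month names (`%a`, `%b`) are locale-dependent in base R, which is worth keeping in mind when parsing raw git output.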

@carlosparadis carlosparadis added type:refactoring Code changes that improve maintenability, performance, etc type:milestone Issues representing milestones since they don't recognize markdown... labels May 26, 2020
carlosparadis added a commit that referenced this issue May 28, 2020
As of this commit, the code only Imports the
following packages:

 * data.table (MPL 2.0)
 * stringi (BSD-3-clause)
 * jsonlite (Apache 2.0)

Moreover, the use of Perceval (GPL 3.0) is done by command line,
and can be replaced by any other commit log or mailing list log parsers,
as will be done in future commits. This, to the best of my knowledge,
as I am not a lawyer, complies with GPL based on:

https://www.gnu.org/licenses/gpl-faq.html#MereAggregation

and other open source packages like XGBoost (i #1338), in respect
to clarify the  codebase does not depend on a particular GPL project
to function.

Author roles were also rectified.
carlosparadis added a commit that referenced this issue Jun 8, 2020
The entire package was already licensed under MPL 2.0.
This just adds a missing header.

Signed-off-by: Carlos Paradis <carlosviansi@gmail.com>