License terms for generated code #6824

Closed
hoehrmann opened this issue Jul 8, 2018 · 22 comments

@hoehrmann

Hi,

I have been unable to find information about the license terms for code generated by emcc. For instance, if I have a simple C program and compile it with

emcc -O3 -std=c11 -o example.js -s NODERAWFS=1 -s ALLOW_MEMORY_GROWTH=1 -s SINGLE_FILE=1 example.c

... there is a substantial amount of JavaScript code in the generated example.js file, and I assume the implementation for standard functions like malloc and qsort is also coming from somewhere other than the execution environment. I do get messages like

INFO:root:generating system library: dlmalloc.bc...
INFO:root: - ok
INFO:root:generating system library: libc.bc...
INFO:root: - ok
INFO:root:generating system library: wasm-libc.bc...
INFO:root: - ok
INFO:root:generating system asset: generated_struct_info.json...
INFO:root: - ok

... but have been unable to tell which licenses might apply. For instance, the libc.bc code might be under the UIUC/NCSA, LGPL, or MIT license, depending on where it comes from, which I have not been able to determine. Looking through the emsdk source code, I suspect dlmalloc.bc might have a CC0 1.0 Universal license attached to it, but that is as far as I got.

I am afraid I can't just upload the example.js file to my web site and ignore possible license restrictions or requirements; even the MIT license requires attribution for substantial portions, for instance. To begin with, I probably don't understand why, if some of the generated code comes from Emscripten and is governed by the Emscripten licenses, the generated code does not automatically comply with the terms of those licenses.

I am also interested in using third-party open source libraries in my project, but I am struggling with retaining mandatory license text as part of the generated source code. Is there a part of the Emscripten documentation that makes recommendations in that regard?

Thanks.

@kripken
Member

kripken commented Jul 8, 2018

The LICENSE file should cover this; if something is missing, we should add it there. Specifically,

  • Emscripten code - including the JS, Python, etc. - is available under both the MIT and the LLVM licenses (really just MIT matters, but we have LLVM too in case it would be useful; meanwhile LLVM is changing its license anyhow, so that's kind of moot...).
  • The musl libc code is MIT licensed.
  • Not mentioned in LICENSE file is dlmalloc, whose license says it is public domain, as you said. If it would be useful to mention that in LICENSE then that sounds good, a PR would be welcome. (Also, if there is somewhere we should document the importance of the LICENSE file, that sounds like a good improvement too.)

Overall, the MIT license is basically what matters here. It is a permissive, commercial-friendly license, and I don't think we've had any complaints thus far.

There is no technical mechanism for retaining license texts - compiling (even just to bitcode) generally removes license comments. You need to manually add them for libraries you include.
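A minimal sketch of that manual step, assuming each bundled library ships its license in a single file: the hypothetical shell helper collect_notices below concatenates those files into one notices file, with a header naming each component. The file names and component list are placeholders; deciding which components actually end up in the output (your own code, musl, anything else you link) is still up to you.

```shell
#!/bin/sh
# Sketch: gather per-library license files into a single notices file.
# collect_notices is a hypothetical helper; names and paths are placeholders.
set -eu

# collect_notices OUT NAME=PATH...
# Writes one section per component: a header naming the component,
# then the verbatim contents of its license file.
collect_notices() {
  out=$1
  shift
  : > "$out"                        # start from an empty file
  for spec in "$@"; do
    name=${spec%%=*}                # text before the first '='
    file=${spec#*=}                 # text after the first '='
    if [ ! -f "$file" ]; then
      echo "missing license file for $name: $file" >&2
      return 1
    fi
    printf '==== %s (%s) ====\n' "$name" "$file" >> "$out"
    cat "$file" >> "$out"
    printf '\n' >> "$out"
  done
}

# Demo with placeholder files standing in for real license texts.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'Copyright (c) Example Author\nMIT License text...\n' > "$work/mylib-LICENSE"
printf 'musl COPYRIGHT text...\n' > "$work/musl-COPYRIGHT"

collect_notices "$work/NOTICES.txt" \
  "mylib=$work/mylib-LICENSE" \
  "musl=$work/musl-COPYRIGHT"
cat "$work/NOTICES.txt"
```

The helper fails loudly when a listed license file is missing, so a library upgrade that moves its license file shows up as a build error rather than a silently incomplete notices file.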

@hoehrmann
Author

Let's take https://github.com/kripken/emscripten/blob/master/system/lib/libc/musl/src/stdlib/qsort.c for instance. If the example.js file in my example above ends up including a compiled version of this code, I may have to say »Copyright (C) 2011 by Valentin Ochs ...« somewhere alongside where I distribute the example.js file.

And for other parts of musl, I may have to say »Copyright © 2005-2014 Rich Felker, et al. ...«, for other bits I may have to say »Copyright (c) 2010-2014 Emscripten authors ...« and depending on what features I use, there may be many more copyright notices I may have to account for, even if my own example.c does not have any third party dependencies.

But I do not really have a way to assemble all the copyright notices and license texts. That would seem to require a deep technical understanding of Emscripten. Even if I did it once, when I upgrade the emsdk, there may then be additional license obligations affecting the code I ultimately distribute. I would rather not distribute outdated copies of the emsdk used to build example.js in an attempt, possibly foolish, to meet license requirements -- so I am really not sure what to do.

@kripken
Member

kripken commented Jul 8, 2018

The LICENSE file should contain those copyright notices, so quoting/linking it should be enough. If it doesn't contain everything it should currently, then that's a bug we should fix. I agree people shouldn't need to look through the codebase or have a deep understanding of it - that's what LICENSE is for.

@hoehrmann
Author

What would help a lot is a single file that includes all the required notices and licenses that may be relevant to Emscripten output. I could then prepend that as a comment in example.js. The current LICENSE file makes that difficult because it makes references to other files, in particular

The musl libc project is bundled in this repo, and it has the MIT license, see
system/lib/libc/musl/COPYRIGHT

and

The third_party/ subdirectory contains code with other licenses. None of it is
used by default, but certain options use it (e.g., the optional closure compiler
flag will run closure compiler from third_party/).

It would be a big help if the latter paragraph clarified whether any non-default options cause third_party/ code to be included in compiled binaries in a way that may require notices or licenses.

As for the reference to the musl COPYRIGHT file, that file also refers to other parts of the code base and does not fully reproduce all notices and licenses that may be required. The musl COPYRIGHT file could be included verbatim in the Emscripten LICENSE, though that might not be enough; maybe that could be taken up with the musl project?
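One way to turn such a single notices file into a comment at the top of example.js is sketched below. It assumes a plain-text NOTICES.txt (a placeholder name) and runs after emcc has finished, so no optimizer pass sees the comment. The hypothetical helper prepend_js_comment writes every line as a // comment rather than wrapping the text in /* ... */, so a stray */ in some license text cannot end the comment early.

```shell
#!/bin/sh
# Sketch: prepend a plain-text notices file to generated JS as // comments.
# prepend_js_comment is a hypothetical helper; run it after emcc (and any
# minifier) has finished, so nothing later strips the comment.
set -eu

# prepend_js_comment NOTICES JSFILE
prepend_js_comment() {
  notices=$1
  js=$2
  tmp=$(mktemp)
  # awk always ends each printed line with a newline, even if the notices
  # file lacks a trailing one, so the first line of JS is never commented out.
  awk '{ print "// " $0 }' "$notices" > "$tmp"
  cat "$js" >> "$tmp"
  cat "$tmp" > "$js"                # rewrite in place, keeping the file's mode
  rm -f "$tmp"
}

# Demo with placeholder files; the "*/" line shows why // is used here.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'Copyright (c) Example\nLicense text with */ inside\n' > "$work/NOTICES.txt"
printf 'console.log("hi");\n' > "$work/example.js"

prepend_js_comment "$work/NOTICES.txt" "$work/example.js"
cat "$work/example.js"
```

With the demo files, example.js ends up as the two notice lines, each prefixed with //, followed by the original console.log line.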

@dschuff
Member

dschuff commented Aug 2, 2018 via email

@GregorR

GregorR commented Jan 4, 2019

Going to dive in here because this is actually a pretty major issue. It's not really emscripten's fault, but what tends to happen is that party A compiles everything and provides it with a LICENSE file which is usually correct except for the exclusion of libc's license, then party B takes the compiled version and throws it on a web server with no headers at all, which is a violation of all the licenses involved. A big part of the problem is that the F/OSS community in general doesn't seem to be able to read this text:

«The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.»

This isn't legalese that requires a sharp legal mind to understand. Just read it.

That, or something substantially similar, is in every major F/OSS license, including MIT. This is not a primary difference between BSD and MIT; the old BSD license also had an advertising attribution clause, but the new one doesn't, rendering them largely the same for this purpose. Regardless, copyright notices are required in the distribution of the software itself, a compiled version certainly constitutes the software, and the LICENSE file doesn't make it even remotely clear how to provide them. emscripten is doing its due diligence in providing the correct attribution for itself, but doing nothing to help the end user correctly maintain attribution. Again, this isn't really emscripten's job, so there's no real blame, but it would be very nice if it helped a bit.

For myself, I'd recommend that everyone interested in staying on the legal side of copyright law prepend to the generated output the license text of whatever they're compiling, as well as the license text for musl and, if applicable, the license texts for the regular expression library and the libm it uses. As stated, dlmalloc is in the public domain, so at least we're free of that. Yes, this could be a lot of extra text, but every major web server and browser supports gzip, and license text compresses extremely well, particularly if several components share the same license.
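To check the size argument against your own notices, a quick sketch comparing raw and gzip-compressed sizes is below. The text is an excerpt of the MIT permission notice (the warranty disclaimer is left out for brevity), repeated under three placeholder component names to mimic several components sharing one license. Actual numbers depend on your notices and the gzip level, so none are claimed here.

```shell
#!/bin/sh
# Sketch: compare raw vs. gzip-compressed size of a repeated license header.
# The component names are placeholders; substitute your real notices.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Excerpt of the MIT permission notice (warranty disclaimer omitted).
notice='Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.'

# Same license for three components, as when several bundled parts share it.
for component in mylib musl emscripten; do
  printf '%s:\n%s\n\n' "$component" "$notice"
done > "$work/header.txt"

raw=$(wc -c < "$work/header.txt" | tr -d ' ')
gz=$(gzip -9 -c "$work/header.txt" | wc -c | tr -d ' ')
echo "raw: $raw bytes, gzip -9: $gz bytes"
```

Repeated copies of the same text compress especially well because gzip can refer back to the earlier copy instead of storing it again.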

It would be nice if a discussion of these issues was somewhere on the emscripten web page or documentation.

@GregorR

GregorR commented Jan 4, 2019

(And, for what it's worth, it would be nice if emcc had an option to prepend some text after all processing, so it wouldn't get stripped. Obviously cat can do this just fine, but it can be awkward to do this in Makefiles due to emcc's proclivity to burn in filenames for including .mem/.wasm files, and thus the necessity to generate files with their final name rather than some intermediate name.)
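Until such an option exists, the prepend has to be a separate post-link step that rewrites the output under its final name. Because that step edits its own output in place, re-running it would stack a second banner. The sketch below guards against that with a hypothetical helper, prepend_once, which skips the file when its first line already matches the banner's first line; banner.js and example.js are placeholder names.

```shell
#!/bin/sh
# Sketch: idempotent in-place prepend for a post-link build step.
# prepend_once is a hypothetical helper; banner.js and example.js are
# placeholders. The banner file should end with a newline.
set -eu

# prepend_once BANNER JSFILE
prepend_once() {
  banner=$1
  js=$2
  # If the output already starts with the banner's first line, a previous
  # run added it; do nothing so repeated builds don't stack banners.
  if [ "$(head -n 1 "$js")" = "$(head -n 1 "$banner")" ]; then
    return 0
  fi
  tmp=$(mktemp)
  cat "$banner" "$js" > "$tmp"
  cat "$tmp" > "$js"                # same file name, same mode
  rm -f "$tmp"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf '/*! Example banner: see LICENSE.txt */\n' > "$work/banner.js"
printf 'var Module = {};\n' > "$work/example.js"

prepend_once "$work/banner.js" "$work/example.js"
prepend_once "$work/banner.js" "$work/example.js"   # second run: no change
cat "$work/example.js"
```

Keeping the file name unchanged matters here because, as noted above, the generated JS refers to its .wasm/.mem companion by name.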

@kripken
Member

kripken commented Jan 7, 2019

@GregorR sounds good to me to add this to the docs, and to add an emcc option to prepend some text at the very end. If someone's interested to do it and needs help with either, let me know.

@stale

stale bot commented Jan 7, 2020

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Jan 7, 2020
@GregorR

GregorR commented Jan 7, 2020

I'm reminded by stalebot of this issue. I still use emscripten fairly frequently, and still resort to manually concatenating license text afterwards. Indeed, mere days ago I pushed to github a repo containing this nonsense:

	cat license.js $@ > $@.tmp
	mv $@.tmp $@

I'm woefully unfamiliar with the emscripten ecosystem, so I haven't even a clue where to start. Really all I want is --unprocessed-pre-js (I don't think it's reasonable to imagine emscripten actually divining what license text belongs, merely giving you the opportunity to put it there), but I don't know what file includes the main function, where any of that is handled, or even what language it's written in :). I'd be happy to take a look and pull request something if I knew where to start.

@stale stale bot removed the wontfix label Jan 7, 2020
@kripken
Member

kripken commented Jan 7, 2020

I think the relevant info should all be in the LICENSE file. I'm not a lawyer so I'm not sure if you're legally required to include that file, but in practice I believe most users don't, and I think that's understandable as they focus on code size, and our policy has always been to let as many people use the code in the most permissive way. But yeah, adding an option to automatically include that file seems reasonable, a PR would be welcome.

@GregorR

GregorR commented Jan 7, 2020

Re PR: I'd be happy to do so, I just don't even know where to start with this codebase :)

Re licensing: FYI, while it is doubtful that any of the relevant parties would seek to do anything about it, both your license and musl's license (and virtually all common F/OSS licenses with the exception of the so-called "0-clause BSD") require attribution (in the form of including the license) in both source and binary form, and insofar as compiled code includes both the emscripten runtime libraries and the musl C library, all code compiled by emscripten must be accompanied by the license. The "no lawyer needed" demonstration of this fact is in major proprietary applications: see, for instance, the "about->licensing information" panel in Firefox, "chrome://credits" in Chrome, "system->about->legal information->third-party licenses" in Android, etc etc. It's not mandatory that said license text be in the actual generated JavaScript file, and I'd certainly never suggest that that should be a default or even desirable behavior, for the reasons of size you've pointed out; I just find that a convenient place to put it myself since it's hard to accidentally lose it that way, and would personally find an option to prepend such text convenient for my own use. The only requirement is that the licenses be distributed with the software, and so for web software, for instance, simply having it somewhere on the site is sufficient. It's the responsibility of emscripten's users, not emscripten, to do this, and emscripten isn't doing anything wrong, and is providing all of the licensing information where it ought. I just think it'd be nice if emscripten helped a bit.

The unfortunate thing with people "I am not a lawyer"-ing away license issues is seen in the situation of the Bukkit Minecraft server mod. Without considering the implications of their choice, they distributed their code under the GPL (a ridiculous choice for code which must be linked against the proprietary Minecraft server), and one disgruntled former contributor leveraged his apparent sole ability to actually read and understand license text to effectively dismantle the project. Your policy is reasonable from a pragmatic perspective, and certainly the kindest policy, but you're not the sole contributor or copyright owner, so you're opening up anyone who blindly follows said policy to leverage by other parties. People should know that it is a violation of contract for them to distribute emscripten-compiled code without license and attribution, even if it just so happens that all of the subjects of the license, at present, don't actually care. Spiteful people have raised court cases over less, and given these extremely liberal licenses, it's so simple not to open oneself to any such vulnerability. Literally all one has to do is include the license text for whatever they've compiled (if relevant), emscripten and musl, somewhere along with the software.

@sbc100
Collaborator

sbc100 commented Jan 7, 2020

I can't see a good reason to embed the entire license in the JS rather than simply serve LICENSE.txt as a separate file alongside the JS. Perhaps a comment at the top of the JS with a link to the license could make sense?
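A sketch of that approach, assuming a flat build/ directory holding the emcc output and a NOTICES.txt with the collected license texts (both placeholders): the hypothetical stage_dist helper copies the outputs and then the notices into a dist/ directory, the notices as LICENSE.txt, and gives each .js file a one-line /*! ... */ pointer to it. Only the staged copies are modified.

```shell
#!/bin/sh
# Sketch: ship the notices next to the JS and point to them from a
# one-line comment, instead of embedding the full text.
# stage_dist is a hypothetical helper; it assumes a flat build directory.
set -eu

# stage_dist BUILD_DIR NOTICES DIST_DIR
stage_dist() {
  build=$1
  notices=$2
  dist=$3
  mkdir -p "$dist"
  for f in "$build"/*; do
    [ -f "$f" ] || continue         # skip subdirectories and empty globs
    cp "$f" "$dist/"
  done
  cp "$notices" "$dist/LICENSE.txt" # copied last so it is never overwritten
  for js in "$dist"/*.js; do
    [ -f "$js" ] || continue        # glob matched nothing
    tmp=$(mktemp)
    printf '/*! License information: see LICENSE.txt next to this file */\n' > "$tmp"
    cat "$js" >> "$tmp"
    cat "$tmp" > "$js"
    rm -f "$tmp"
  done
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/build"
printf 'var Module = {};\n' > "$work/build/example.js"
printf 'placeholder, not real wasm' > "$work/build/example.wasm"
printf 'Collected notices...\n' > "$work/NOTICES.txt"

stage_dist "$work/build" "$work/NOTICES.txt" "$work/dist"
ls "$work/dist"
head -n 1 "$work/dist/example.js"
```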

@GregorR
Copy link

GregorR commented Jan 7, 2020

Sure, that's fine, the license is there. That would still require the --unprocessed-pre-js option which is all I'm actually after :)

(Just for edification, the only reason I choose to embed the whole license is essentially paranoia. Most emscripten-compiled things are made to be used as libraries by others, and I'd rather make the conformant behavior the automatic behavior, rather than requiring users of said libraries to themselves transmit license files further. Ultimately I think the size concern is a bit moot with the sizes of software we're usually talking about; 3K of easy-compressed ASCII is nothing next to half a meg or more of wasm. But, I've been known to concatenate ZIP files with source code to compiled GPL applications so that I could be assured of source distribution; I may at times be a good guide on what is demanded by a license, but not on what is the best way to comply with those demands ;) )

@sbc100
Collaborator

sbc100 commented Jan 7, 2020

What is the process that strips the inserted license in the case of --pre-js? Is it closure? I wonder if there is a simple way to mark it so it isn't removed.

@GregorR

GregorR commented Jan 7, 2020

By my understanding, the code is passed through closure, yes.

@sbc100
Collaborator

sbc100 commented Jan 7, 2020

So --pre-js works just fine for your purposes as long as --closure is not specified on the command line?

@GregorR

GregorR commented Jan 7, 2020

Brilliant idea, @sbc100! Never occurred to me to look into the minifier instead of emscripten, so I did. An accepted PR to closure suggests that /*! comments should be preserved, but I can't seem to make it work on code output by emscripten (with the comment in my pre.js).

Obviously it's preserved with --closure 0, but it would've been anyway :). That's a pretty rough compromise.

I wonder why closure in this configuration is stripping those comments... I tried it on my pre.js directly and it kept the comment fine...

@GregorR

GregorR commented Jan 7, 2020

Wait, I lied! -Os is stripping it out, but --closure 1 or --closure 2 isn't! Now if I can just puzzle out what -Os is doing to strip out pre-js comments beyond what --closure is doing, you've saved my day 👍
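Since whether a /*! banner survives depends on the optimization settings (in the test above, -Os stripped it while --closure 1 and --closure 2 kept it), a post-build check on the final file is a cheap safeguard. The hypothetical require_banner helper below only greps for a marker string; kept.js and stripped.js stand in for real build outputs.

```shell
#!/bin/sh
# Sketch: fail the build when the license banner did not survive
# optimization. require_banner is a hypothetical helper; the marker text
# and file names are placeholders for your real banner and build output.
set -eu

# require_banner JSFILE MARKER
require_banner() {
  js=$1
  marker=$2
  if ! grep -qF "$marker" "$js"; then
    echo "error: license banner '$marker' missing from $js" >&2
    return 1
  fi
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Stand-ins for an output that kept the banner and one that lost it.
printf '/*! mylib | MIT License | see LICENSE.txt */\nvar Module={};\n' > "$work/kept.js"
printf 'var Module={};\n' > "$work/stripped.js"

require_banner "$work/kept.js" 'mylib | MIT License'
if require_banner "$work/stripped.js" 'mylib | MIT License' 2>/dev/null; then
  echo "unexpected: banner reported present" >&2
  exit 1
fi
echo "banner check behaves as expected"
```

Called right after the emcc step in a Makefile, a non-zero exit here would stop the build instead of silently shipping output without the banner.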

@kripken
Member

kripken commented Jan 7, 2020

It's possible either closure or the JS optimizer doesn't preserve those special comments. The JS optimizer uses acorn, see tools/acorn-optimizer.js, so maybe something needs to be set up there.

@stale

stale bot commented Jan 8, 2021

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Jan 8, 2021
@stale stale bot closed this as completed Feb 7, 2021
@Ciantic

Ciantic commented Jun 17, 2022

I had the same problem, and found that printf works for prepending:

printf '%s\n/*\n%s\n\n%s\n%s\n*/\n%s\n' \
    "/// <reference types=\"./freetype.d.ts\" />" \
    "Freetype WASM library: https://github.com/Ciantic/freetype-wasm" \
    "$(cat freetype2/LICENSE.TXT)" \
    "$(cat brotli/LICENSE)" \
    "$(cat example/freetype.js)" \
    >example/freetype.js

6 participants