Compiling a JVM to run Java programs with emscripten #3342

Closed
httpdigest opened this issue Apr 13, 2015 · 34 comments

Comments

@httpdigest

Hi there,

Since clang and emscripten now make it possible to run arbitrary C/C++ code via asm.js in the browser, I was curious whether anyone has thought about compiling a simple JVM with emscripten in order to run Java programs.

I thought about JamVM with GNU Classpath.
Of course, only interpretation would be possible and not JIT. Or at least the JIT compiler would need to emit JavaScript code and not native code.

Another thing would be threading. Since JavaScript does not support native threads with shared memory (only web worker threads without shared memory), the JVM would have to implement "green threads" with some cooperative mode, for example have the main "thread" invoke GC and system/IO calls at some safepoint bytecode instructions.

Is there anything in progress right now that tries to accomplish this?
Otherwise I am keen on looking into that experimentally.

@kripken
Member

kripken commented Apr 13, 2015

JIT could actually be done - see pypy.js, which can do JIT compilation.

There is experimental threading support in progress in browsers and in emscripten, see https://blog.mozilla.org/javascript/2015/02/26/the-path-to-parallel-javascript/

There is also the emterpreter option in emscripten, which can pause and resume the execution state. That could implement switching between "threads", but not the speedup of multiple cores.

@httpdigest
Author

Thank you for the blog article link and your hint to Emterpreter!

(An interpreter (JVM) within an interpreter (Emterpreter)... that's fancy! :D )

Is there an API for thread scheduling in Emterpreter? JamVM and GNU Classpath build their threading on top of pthreads, so we would need an API that replaces pthreads in order to guide Emterpreter in its decisions about when to create, pause and resume/restore execution of a particular "thread".
For example, when JamVM invokes pthread_create, Emterpreter would need to know that another "virtual" thread of execution needs to be created.

This would be needed if we talk about translating the whole JamVM and GNU Classpath to the Bytecode suited for Emterpreter and let it then handle multithreading as guided by Emterpreter's API invocations in JamVM and GNU Classpath.

Again, thanks for your suggestions!

@kripken
Member

kripken commented Apr 13, 2015

The SharedArrayBuffer project is implementing the pthreads API. In principle we could also hook up the emterpreter to those.

@httpdigest
Author

Do you know the status of that project? I googled for it but all I found was some blog discussions.
Is it already available as an API in Nightly, for example, and is there any documentation?
Is it related to this google docs spreadsheet?: https://docs.google.com/spreadsheets/d/1PFa3aDxY6mffT8uoflCaFitX9lKj_Y4_aZwtMApIRiI/edit#gid=0
And is this the status of the project?: http://compatibility.shwups-cms.ch/en/home?&property=SharedArrayBuffer

@dtc
Contributor

dtc commented Apr 14, 2015

@juj
Collaborator

juj commented Apr 14, 2015

@httpdigest : you can try out pthreads using the branch from the pull request #3266 . See also the associated branch to LLVM in the comments of that pr.

@httpdigest
Author

Thank you all for your help and hints to the documentation! Helps me a lot.

Now, I started slowly by test-compiling Classpath and JamVM using a default GCC 4.8.2 for the x86_64 target, and it worked very nicely without any manual source or config/Makefile changes or quirks. (That happens quite seldom with C projects :) )
Now, I can already run Java programs with the compiled jamvm executable!

Currently, I am compiling juj's 'pthreads' branch of the 'emscripten-fastcomp' repository from source together with kripken's 'master' branch of 'emscripten-fastcomp-clang', as I did not find a 'pthreads' branch in juj's 'emscripten-fastcomp-clang' repository, which also seems to be heavily out of date compared to kripken's 'emscripten-fastcomp-clang' repository (tens of thousands of commits behind).

Is this the correct way of doing it?

EDIT: Compiling fastcomp and fastcomp-clang worked but executing emcc now prints:
ERROR root: Emscripten, llvm and clang versions do not match, this is dangerous (1.30.5, 1.30.5, 1.31.0)
ERROR root: Make sure to use the same branch in each repo, and to be up-to-date on each. See http://kripken.github.io/emscripten-site/docs/building_from_source/LLVM-Backend.html

But I guess that's okay, since I am using the pthreads branch.

@juj
Collaborator

juj commented Apr 15, 2015

There were no changes needed to the emscripten-fastcomp-clang repository for pthreads so you can just use the latest incoming branch from kripken's emscripten-fastcomp-clang. Or since kripken merged just last night, practically incoming equals master, so no differences there. The warning message is as expected and it also appeared last night as kripken bumped up the versions. If you want to get rid of the warning, you can also try doing git checkout 1.30.5 in the emscripten-fastcomp-clang repository.

Note that these branches are in quite a flux, since there is a lot of code coming in upstream, and I frequently rebase the pthreads branches on top of the latest upstream incoming to keep them closely in sync.

@httpdigest
Author

Running a simple single-threaded printf "Hello, World!" program in latest firefox-trunk resulted in the error "TypeError: invalid object argument" in line:

HEAP8 = new SharedInt8Array(buffer);

I used firefox-trunk from the repository ppa:ubuntu-mozilla-daily/ppa via apt-get. The current version is 40.0a1 (2015-04-13).
Is SharedInt8Array view already supposed to be supported on Nightly?

@juj
Collaborator

juj commented Apr 15, 2015

Not sure about different derivative repositories... Try checking in the browser page console if it does have the object SharedInt8Array in it. My recommendation is to directly build mozilla-central, which does have the support:

hg clone https://hg.mozilla.org/mozilla-central
cd mozilla-central
./mach build

More info at https://developer.mozilla.org/en-US/docs/Simple_Firefox_build

@httpdigest
Author

The conditional

if (typeof SharedArrayBuffer != 'undefined') {

in the emscripten-generated code seems to have succeeded (evaluated to true, I mean) and then the next line

HEAP8 = new SharedInt8Array(buffer);

failed.
But I will try with the mozilla-central now.

EDIT: Just some info: Just tried on the current Firefox Nightly download for Windows 7 x64, which is 40.0a1 (2015-04-14).
There, the same error occurs with the same message and at the exact same source location.

@httpdigest
Author

The global function SharedInt8Array does exist in both browser variants (Ubuntu and Windows), but the constructor function seems to take 3 arguments (EDIT: okay, that's just due to optional byte 'offset' and 'length' parameters).
I just debugged the code and was looking in the Debug Variables inspector in Firefox.
There, the SharedInt8Array is specified as SharedInt8Array(,,) with "Length = 3"
And will the SharedInt8Array not need to be created on a SharedArrayBuffer instead of a simple ArrayBuffer?

If I do this, it works:

if (typeof SharedArrayBuffer != 'undefined') {
var shared = new SharedArrayBuffer(buffer);
HEAP8 = new SharedInt8Array(shared);

but the little-endianness check after that fails. :(

So, I first create a SharedArrayBuffer on an ArrayBuffer and then the typed views on the SharedArrayBuffer.

EDIT: Just found a hint in the documentation https://docs.google.com/document/d/1NDGA_gZJ7M7w1Bh8S0AoDyEqwDdRh4uSoTPSNn77PFk/edit#heading=h.a6o4dubw5qla (end of Page 2) that the constructor of SharedArrayBuffer can only get an int as parameter and anything else will throw a TypeError.
Likewise, the typed views on a SharedArrayBuffer can only get a SharedArrayBuffer as first argument.

Funny thing is if, according to the "specs", I do this:

var buffer;
if (typeof SharedArrayBuffer != 'undefined') {
buffer = new SharedArrayBuffer(bufferLength);
HEAP8 = new SharedInt8Array(buffer);

Nightly hangs on the last call indefinitely. :)

@httpdigest
Author

Okay, that "hanging" of Nightly was just due to the Debugger trying to get the values of all the memory locations of the big SharedArrayBuffer when mouse-hovering over the variable holding the SharedArrayBuffer.
Running it in non-debug mode does not hang.

So, the code setup with:

var buffer = new SharedArrayBuffer(bufferLength);
HEAP8 = new SharedInt8Array(buffer);

works for me now using a small test (putting and getting 8 and 32 bit ints here and there and testing for equality with expected values) on the current Windows 7 x64 Firefox Nightly.
Although a simple emscripten compile of a simple "Hello, World" does not work if I alter the generated SharedArrayBuffer creation to the above. I will investigate further. No errors but also no Hello World printing to the console.

Without altering the code, I get the aforementioned "TypeError".

BIG EDIT: I always totally forgot to specify "-s USE_PTHREADS=1" when invoking emcc. facepalm
Now with this, and additionally with the latest changes in juj's branch regarding the creation of the SharedArrayBuffer, it works on both Nightly on Windows 7 and firefox-trunk on Ubuntu 14.04.2!

There is just one thing. Now I am getting "ReferenceError: PThread is not defined" at "PThread.terminateAllThreads();" in the "exit" function.

@httpdigest
Author

I would close this issue, since in principle it seems to be possible with the pthreads support in Emscripten. Another important thing would be atomics. Currently JamVM has a small amount of architecture-specific assembler code that does cmpxchg.
I just don't know whether LLVM is able to read in .S files and convert that assembler text to LLVM bitcode which fastcomp would then translate to JavaScript using atomic operations provided by some JavaScript/Browser API.
But that could be another issue.

@juj
Collaborator

juj commented Apr 16, 2015

The pthreads branch does support atomics, both as GCC intrinsics ( https://github.com/juj/emscripten/blob/pthreads/tests/pthread/test_pthread_gcc_atomic_op_and_fetch.cpp ) and as custom API that matches 1:1 with JS function calls ( https://github.com/juj/emscripten/blob/pthreads/system/include/emscripten/threading.h ). C++11 atomics should also work, but haven't gotten around to testing that yet.

It is not worthwhile to attempt to shove .S files through Emscripten; generally it's just better to #ifdef those out. If JamVM has been written smartly, the whole codebase has one global spot where it defines those ops, so #ifdeffing them for GCC or Emscripten specific atomics to avoid the assembly should only be a few lines of change.
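
A minimal sketch of what such a single-spot replacement could look like, assuming the GCC-style `__sync` builtins (provided by GCC and Clang, and reported above as working in the pthreads branch). The names `VM_CAS32` and `vm_atomic_increment` are hypothetical and not taken from JamVM's source:

```c
#include <stdint.h>

/*
 * Hypothetical replacement for an inline-asm cmpxchg wrapper.
 * VM_CAS32 is an illustrative name, not JamVM's actual macro.
 *
 * __sync_bool_compare_and_swap is a GCC/Clang builtin: it atomically
 * writes new_val to *addr if *addr still equals old_val, and returns
 * nonzero when the swap happened.
 */
#define VM_CAS32(addr, old_val, new_val) \
    __sync_bool_compare_and_swap((addr), (old_val), (new_val))

/* Example use: atomically bump a counter with a CAS retry loop. */
static uint32_t vm_atomic_increment(volatile uint32_t *counter)
{
    uint32_t seen;
    do {
        seen = *counter;                 /* snapshot the current value */
    } while (!VM_CAS32(counter, seen, seen + 1));
    return seen + 1;                     /* value after our increment */
}
```

Because the builtin is compiler-provided, the same definition can serve both the native GCC build and the Emscripten build, which keeps the change confined to one header.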

@httpdigest
Author

Yes, luckily the codebase of JamVM is small and clean. I just saw that those are not .S files as I originally assumed, but a few inline asm statements in one specific header file, src/arch/i386.h, which is the one I would use.
There is however one asm inline in another header file whose meaning I do not quite understand. It is asm("")
Could that just be removed?

@juj
Collaborator

juj commented Apr 16, 2015

Empty asm blocks are often used as a compiler reordering barrier, see http://stackoverflow.com/questions/12183311/difference-in-mfence-and-asm-volatile-memory . Emscripten does support such empty asm blocks for the same purpose. Note though that it doesn't serve as a full memory barrier.
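
To illustrate the distinction, here is a small sketch in GCC/Clang syntax. All macro and function names are made up for this example. It uses the `"memory"` clobber form, which is the common spelling of a compiler barrier; a bare `asm("")` as found in JamVM carries no such clobber, so treat the mapping as an assumption:

```c
#include <stdint.h>

/* Compiler-only barrier: emits no instruction, but the "memory" clobber
 * stops the compiler from reordering or caching memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Full hardware memory barrier via the GCC/Clang builtin. */
#define FULL_BARRIER() __sync_synchronize()

static uint32_t payload;
static volatile int ready;

/* Writer: publish the payload, then raise the flag. */
static void publish(uint32_t value)
{
    payload = value;
    FULL_BARRIER();      /* make the payload visible before the flag */
    ready = 1;
}

/* Reader: returns 1 and fills *out if a value has been published. */
static int try_consume(uint32_t *out)
{
    if (!ready)
        return 0;
    FULL_BARRIER();      /* do not read the payload before the flag */
    *out = payload;
    return 1;
}
```

Where code relies on ordering between threads, the full barrier is the one to reach for; the compiler-only barrier just constrains what the compiler may reorder.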

@httpdigest
Author

Yay!
I managed to compile the GNU Classpath library 0.99 with Emscripten and ecj.
Classpath does not only contain the Java class library but a significant amount of C code binding the functionality exposed by the Java classes to native platform functions.
Luckily they do not use any assembler there. Only the virtual machine (JamVM) does.
The result of compiling Classpath is a set of shared library .so files (the file command says they are "LLVM IR Bitcode" files :-) ) that can hopefully be statically linked into JamVM when building JamVM later on.

The repository for all this can be found here: https://github.com/httpdigest/gnu-classpath

@httpdigest
Author

Ahh... just found out about Browser.asyncLoad and FS.createDataFile which completely answered all my current questions on how I would go about loading .jar and .class files from the user-supplied classpath over the web into the virtual file system for the JVM to read via libc/stdio.
Although I am not really comfortable with loading the 10MB bootstrap jar file into the Emscripten heap...
Maybe just once and then with IndexedDB there is a way to access data without putting it into the Emscripten heap?

@kripken
Member

kripken commented Apr 21, 2015

We are experimenting with ways to do that, but they would only work in workers (which have synchronous IO access in some ways). In general, you do just need to load that 10MB into the heap. But 10MB isn't so bad ;)

@httpdigest
Author

Thanks for the info. I am planning on having the "main" thread of the JVM just be a worker thread, which the browser thread of the site just bootstraps and kicks off via some "java.js" file. So nothing is done in the browser thread except to load this small "java.js" starter script, which then does a new Worker("libjamvm.js") and postMessage the JVM command line args (such as the -classpath) to the worker which then will start the worker thread.
This results in the browser thread (i.e. the whole site) not hanging (too long).

@httpdigest
Author

@juj Will pthread_getattr_np make it out of pthread_stub into the real runtime?
Currently, I have both GNU Classpath and more importantly JamVM compiled via Emscripten without errors! :D
And I was making sure that all runtime functions used by JamVM are supported when I found that pthread_getattr_np is not.
Also, emcc is warning me about putting a volatile unsigned int * as the first argument to emscripten_atomic_cas_u32, which ought to be a void * (according to the function's prototype, which also has its expected type uint32_t* commented out... why?). Does this matter, or should the first parameter type be actually unsigned int* (because of the u32) and also volatile?

@juj
Collaborator

juj commented Apr 27, 2015

Nice!

I added the function pthread_getattr_np now at juj@b05bdf8 . It was not part of the official pthreads standard so there were no tests for that in the test suite and it was left out in the initial implementation. Note though that there's a difference between native and Emscripten that native platforms typically have the stack growing downwards, whereas in Emscripten stack grows up, so if you have code that is fixed to assume that stack grows downwards, then it will probably fail unless adjusted for Emscripten convention. Again when updating the branch, you'll need to issue emcc --clear-cache or otherwise Emscripten won't rebuild the newly added function but will reuse the old libc from cache.
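
A rough sketch of how code that measures remaining stack space could be made direction-aware. It assumes the upward-growing convention described above for this branch, which is worth re-checking against the toolchain version in use. `STACK_GROWS_UP` and `stack_bytes_left` are hypothetical names, and the direction is passed as a parameter so both cases can be exercised:

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __EMSCRIPTEN__
#define STACK_GROWS_UP 1   /* assumption: convention described for this branch */
#else
#define STACK_GROWS_UP 0   /* typical native convention */
#endif

/*
 * lowest: lowest address of the stack region (on POSIX systems, e.g. as
 *         reported through pthread_getattr_np + pthread_attr_getstack)
 * size:   size of the region in bytes
 * sp:     current stack position inside [lowest, lowest + size]
 */
static size_t stack_bytes_left(uintptr_t sp, uintptr_t lowest, size_t size, int grows_up)
{
    if (grows_up)
        return (size_t)(lowest + size - sp);  /* room above sp */
    return (size_t)(sp - lowest);             /* room below sp */
}
```

A caller would typically pass `STACK_GROWS_UP` as the last argument, so the direction is decided once at build time.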

Regarding the signature in emscripten_atomic_cas_u32 et al., I figured that it would be best to use untyped pointers, since everything (non-volatile) is implicitly castable to void*, and the uint32_t* was left there for documentation purposes. The volatile keyword has some possibly annoying behavior, with respect to nonthreaded operation (it can generate excessive atomic ops), so that's why it is just void* instead of volatile void*. Casting whatever you shove in there to just (void*) is ok, as long as the datatype you are passing in is 32-bit and aligned.
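
A sketch of that cast in a small wrapper. It assumes `emscripten_atomic_cas_u32` is declared in `emscripten/threading.h`, as in the branch linked earlier, and that it returns the previous value. The wrapper name `vm_cas_u32` and the native fallback are additions for illustration only:

```c
#include <stdint.h>

#ifdef __EMSCRIPTEN__
#include <emscripten/threading.h>
#endif

/* Returns the value that was stored at *addr before the operation;
 * the swap succeeded if that value equals `expected`. */
static uint32_t vm_cas_u32(volatile uint32_t *addr, uint32_t expected, uint32_t desired)
{
#ifdef __EMSCRIPTEN__
    /* Explicit cast drops `volatile` to match the void* parameter.
     * addr must point to 32-bit, 4-byte-aligned storage. */
    return emscripten_atomic_cas_u32((void *)addr, expected, desired);
#else
    /* Native fallback so the same code builds outside Emscripten. */
    return __sync_val_compare_and_swap(addr, expected, desired);
#endif
}
```

Doing the cast in one wrapper keeps the warning from recurring at every call site and gives a single place to check the alignment requirement.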

@httpdigest
Author

Thank you for implementing that!
And thanks for the good hint that the stack grows upwards towards higher addresses with Emscripten, because JamVM does indeed assume that the stack grows downwards within its JNI implementation when pushing arguments on the stack and retrieving the retval.
I'll adjust that with #ifdef EMSCRIPTEN guards.

@juj
Collaborator

juj commented Apr 27, 2015

Oh, I recommend using #ifdef __EMSCRIPTEN__ instead of #ifdef EMSCRIPTEN, since the one without underscores is the old legacy name for the detection (although both do work at present).

@thegodone

Did you manage to do it? Can we think of Java to JavaScript?

@MichaelBalazs

@httpdigest What was the outcome of this work? I'm very interested in whether you have a JVM running with emscripten.

@vilie

vilie commented Jan 29, 2016

I've also tried to do what @httpdigest did (after I unsuccessfully tried to compile openjdk) and needed to make similar changes to the ones he made (to jamvm and classpath).

https://github.com/vilie/javify (work in progress, cannot run a .class file in it yet)

Other than that, I do not understand why I always get

warning: unresolved symbol: pthread_attr_getstacksize
warning: unresolved symbol: pthread_attr_setstacksize

They seem to be defined in emscripten. Any thoughts about that?

@kripken
Member

kripken commented Jan 29, 2016

We don't support those because we can't control the native stack size - it's an unobservable property of the JS engine. But perhaps the codebase works with those calls removed?

@juj
Collaborator

juj commented Feb 1, 2016

These two functions are actually only getters and setters for the "thread creation attributes" struct that controls how threads are spawned, so we could support them even in non-threaded builds (after a thread is spawned, the pthread_attr_t struct can be discarded). Although it might be a bit silly, since why would any code manipulate a structure that is used for creating threads when threads can't be created anyway?

If you build with pthreads enabled with -s USE_PTHREADS=1 or =2 (pass to both compile and link stages!), then these two functions do exist and are fully supported. Otherwise I recommend gating on the preprocessor define with #ifdef __EMSCRIPTEN_PTHREADS__ to conditionally exclude code that should not be compiled when Emscripten multithreading is not enabled at build time.
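
A minimal sketch of such gating, assuming that `pthread_attr_init` is available in both build modes and that only the stack-size setter is missing without pthreads. The helper name `vm_init_thread_attr` is hypothetical:

```c
#include <pthread.h>
#include <stddef.h>

/* Initialise attr and, where supported, request a stack size.
 * Returns 0 on success or a pthread error code. */
static int vm_init_thread_attr(pthread_attr_t *attr, size_t stack_size)
{
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;
#if !defined(__EMSCRIPTEN__) || defined(__EMSCRIPTEN_PTHREADS__)
    /* Native build, or Emscripten built with -s USE_PTHREADS=1. */
    rc = pthread_attr_setstacksize(attr, stack_size);
#else
    (void)stack_size;   /* no threads in this build, nothing to configure */
#endif
    return rc;
}
```

Native builds take the first branch unchanged, so the guard only affects Emscripten builds without multithreading.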

@calvinrsmith

@httpdigest @vilie What is the current status on this?
I followed the readme as listed on: https://github.com/vilie/javify and when I issue the link command I get:
warning: unresolved symbol: pthread_kill
warning: unresolved symbol: pthread_sigmask
warning: unresolved symbol: sigwait
warning: unresolved symbol: get_nprocs
warning: unresolved symbol: sigsetjmp
warning: unresolved symbol: sigsuspend

However, it does generate a .js file which does run! and gives me usage and version information.
Any attempt to run a class gives me:
Exception occurred while VM initialising.
java/lang/NoClassDefFoundError: jamvm/java/lang/VMClassLoaderData

I have tried various ways to get past this without luck.

@calvinrsmith

I just noticed the internal buffer is not shared
Instead of:
var buffer = new SharedArrayBuffer(bufferLength);
I see:
buffer = new ArrayBuffer(TOTAL_MEMORY);

So, perhaps I do not have pthreads correct?

@vilie

vilie commented Apr 11, 2016

Hello,

I suspect the java file and the classpath.zip need to be preloaded by using --preload-file - see ./experiments/IO

Feel free to open a bug there (I may have time to look at this issue in the following weeks).

@mwarrenus

@httpdigest DoppioJVM is "a Java virtual machine written in 100% JavaScript". The research project was published in 2014 and continues evolving.
