New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build MESS/MAME #131
Comments
If you can provide the relevant C++ source code that is compiled into the problem, that would also be very helpful. Without that, it's hard to know what to focus on here. |
My attempt at a minimal case was not successful in reproducing the bug: http://batcave.textfiles.com/ziz/staging/casttest.zip As requested, here's the .js and .bc files: http://batcave.textfiles.com/ziz/staging/messtiny-10.js.gz and the source code we're compiling is found in https://github.com/ziz/jsmess/tree/no_cothreads So far as I can tell, the infinite loop happens during execution of a dynamic_cast call which runs as part of src/emu/clifront.c cli_frontend::listmedia, line 698. Compiling fully from source requires a second, clean copy of the MESS source to build the external tools the build process depends on:
then build the MESS project:
Then link with emscripten (since the build process tries to link .a instead of .a.bc, replace all .a with .a.bc in the link command):
and then emscripten:
Run with the argument '-listmedia'. |
I'll look at this very soon, I just need to finish a few more compiler optimizations to speed up building large files (which will help with compiling and debugging code like this). |
The compressed js file linked here doesn't seem to extract to a js file properly. I built my own though. Just trying to run it, I ran into a few problems:
We should really try to reduce the size of the compiled code - hopefully we don't need everything currently built. The compiled JS is 50MB, which is too much for node and spidermonkey. d8 can run it, but d8 has other limitations like poor support for typed arrays which will greatly limit our ability to test (we will likely need typed arrays for speed). But the real blocker now is to re-implement the necessary functions mentioned before in a way that can actually be used by us. |
I have updated the jsmess makefile to use emcc and friends, and compiled with the latest emscripten. The JS is only 30M now, and compiles in less than half the time it used to - grats on the improvements! There appears to be a minor bug in parseTools.js, preventing out-of-the-box compilation here: the 64-bit basic integer ops switch (around line 1660) is missing srem (compared to the 32-bit basic integer ops). I may have recompiled spidermonkey with an increased functions limit to get around the OOM before. Pending a real fix, this is certainly an option. Here's a new .bc and .js: http://batcave.textfiles.com/ziz/staging/messtiny-11.tgz |
How did you work around the lack of srem? The problem is that 64-bit math isn't really possible in JS. We can add srem, but it might fail if values too big to fit in a double are used. Perhaps there is a way to make jsmess use 32-bit values over 64-bit ints? |
I applied the wrong-but-obvious solution: I just added srem to that case statement, which caused it to compile. There may be an obvious solution that I'm missing to force 32-bit compilation for jsmess on OS X or other 64-bit platforms (and, so far as I can tell, this won't cause any problems if we do manage to force 32-bit compilation). |
I'm having trouble building this .ll file, not sure why. What version of LLVM did you use to build it? |
LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), which were the revisions I believe you reported you were using in a previous discussion on IRC. If you're using a different version, I'm happy to switch. |
3.0 should be fine. Another question, I get warnings about different targets (os x vs linux). Can you perhaps run the automatic test suite, to see if there are any problems on os x? (Emscripten has not been heavily tested there I am afraid.) The command to run them is |
Sure, I'll run the test suite. Is this the first time the different targets problem has shown up? I don't recall it being an issue before. |
We do have people using OS X, and someone said things were working on Windows. But only Linux gets full test coverage all the time (because I run Linux, basically, no special reason). Specifically for here, I worry that clang will implement dynamic cast differently on different platforms, and that that might be what is tripping us up here. A problem like that might have gone unnoticed even though we have some people using OS X for some projects. |
Test output: https://gist.github.com/87ad7200ca6f708e6a73 Failures specifically related to closure ( |
A lot of those errors will, I suspect, be fixed by pull 154. And I am hoping some will be fixed by a headers correctness fix we landed yesterday. But if not, then those errors look very dangerous, and could possibly explain the jsmess infinite loop. Let's focus on the shortest one, pystruct. Can you please run
and gist the results? There will be an .ll and .js file in /tmp/emscripten_tmp (assuming /tmp is what TEMP_DIR is set to in ~/.emscipten). edit: and please make sure you pull the latest code, since the headers fix was just yesterday |
Pulled to f6e8383 and ran test_pystruct. Output: https://gist.github.com/e81dba0c90b5589e0f79 Resulting temp files: |
Thanks! Ok, it looks like we need to be more assertive in telling clang to generate platform-independent code. Long-term we will want to have a formal "emscripten" llvm target, but until then, let's try to use 32-bit linux as the uniform target. I pushed this to the incoming branch as 4cab9f5. Can you test it and see if it fixes pystruct? |
Success! The incoming branch passes pystruct: |
Great! :) There's a chance it will fix the other tests too (although we might still need the bitcode fix in pull #154). Can you run the rest as well? |
Some of the other tests are fixed; test_emcc is not apparently available in the incoming branch, but of the others, these now pass: test_cubescript and these continue to fail: test_freetype test/runner.py output here: |
That pull has just been merged to incoming, so hopefully all tests will now pass if you pull the latest incoming. (Note that it isn't in master yet, waiting on automatic tests.) |
test_emcc is in "other", so to run it separately you need |
freetype, poppler, openjpeg and bullet should hopefully be fixed with that pull. The others do not look like they will be fixed by it. Can you gist the output from |
I'm just running the tests that failed at the moment, to avoid the 2.5-hour test run. I'll rerun the whole suite when we've finished poking at individual tests, of course. Looks like we're still failing on the collection of tests listed in my last message. other.test_emcc is also failing; here's the EMCC_DEBUG=1 output. |
On my machine it takes more than twice that, heh, I usually leave it to run overnight ;) Based on the stack trace, on that part of the emcc test it is trying to run lli (the LLVM interpreter) on a bitcode file. Can you put up the generated bitcode files in that directory? (suffix .o and .bc) |
Sure thing, here's the output from EMCC_DEBUG=1 python tests/runner.py test_zlib: http://batcave.textfiles.com/ziz/staging/test_zlib.tar.gz (it's too large for gist) |
http://batcave.textfiles.com/ziz/staging/test_emcc-bitcode.tgz is the generated bitcode from EM_SAVE_DIR=1 python tests/runner.py other.test_emcc |
The generated bitcode files work fine here. Do they work for you when you run them manually? I am baffled by the zlib failure. The output is quite different, despite our using the same LLVM and Clang (3.0), and the same target. |
Nope, the .o fails when I run it manually with lli. I see 'Illegal instruction: 4' in the terminal, and the following crash report comes up: https://gist.github.com/e1d98bb52adae5779449 |
Do both the normal and the "cleaned" .o files crash in that way? (the cleaned version removes debug info, which used to crash lli in the past) Let's try to use exactly the same LLVM version, because I think we'll need to file an LLVM bug or post to their mailing list soon, not sure what else to do. I'm rebuilding LLVM 3.0 release from source now. |
Yep, they both crash in the same way (including extremely similar crash reports - I haven't diffed them yet, though). Just let me know what you need me to do. You're changing to make sure you're at LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), the versions I'm currently using - is that correct? Or should I switch LLVM / clang revs? |
That patch fixes the color masks, previously we had them wrong. So it isn't obvious to me what is going on here. Can you make a testcase I can debug? Then we can also add the testcase to the test suite to prevent future regressions. |
So with current emscripten the compiled js bombs out in Chrome with the following error in the error console:
I guess it is trying to init something GL-related even though WebGL is not enabled; I guess emscripten should detect that more gracefully? The actual video output is not going through GL yet. |
What function is being called, and from where? If a GL function is called, that sounds like a bug in the C++ code. Unless it only happens in Chrome and not Firefox, in which case I would suspect a browser bug. But this might become clearer if you show me the relevant code. |
Whoops the markup ate part of that which might make it clearer. |
So the emscripten-generated code contains this:
And then later on there is code like this in GLEmulation.init():
Which apparently does not exist if the canvas is set up as 2D. useWebGL is set in makeSurface:
So I am assuming what is happening is useWebGL is coming out false for whatever reason and then the C++ is calling GL's init() somewhere, even though the OpenGL output option has not been enabled on the MESS command line. I haven't actually debugged it though. I agree that ideally the C++ should not be calling GL stuff if the surface isn't set up for it (if that is indeed the problem), but this is also a pretty lame way to fail. If the createBuffer() method is not going to always exist then there should be some kind of type check or exception handling for the case where it doesn't, with an appropriate error message, and ideally allowing execution to continue (with, obviously, no GL output). |
Hmm, the question is why GLEmulation is included in the first place. There are a few functions which depend on it,
Is one of those used in MESS? It seems like they should only appear in a build that uses GL. But perhaps they are included but not used, and we need to add a workaround for this? |
Yep all three of those are in MESS. Whether MESS uses GL or not is not a build-time option, just a command-line parameter:
vs.
|
But you are building without opengl as a build-time option, I assume? then why do those commands end up in the output binary? |
You assume wrongly. However now that I check there is such an option which may be enough to get by for now. But surely MESS is not the only software on earth with run-time video output selection? |
Not the only one, but the first to be compiled with emscripten I guess ;) ok, then we need runtime checks for this, I will add that. |
Thanks. Just tried with emscripten incoming and now I can't link - I see
|
The first issue should be fixed in incoming. |
The second should be fixed on incoming as well (it was fallout from LLVM deprecating llvm-ld). |
When I try to compile now the resulting .js is missing important functions like _main() and __ZN7astring4initEv(). This is before closure has been run on it. .bc: http://interbutt.com/temp/messtiny-20120622.bc.zip Command line: post.js: https://raw.github.com/ziz/jsmess/no_cothreads/post.js Maybe they are missing from the .bc file somehow but I don't know how to check. |
LLVM's llvm-nm tool can tell you what symbols are in a .bc file. Is main there? |
It's not there. It is in obj/sdl/messtiny/osd/sdl/sdlmain.o and also obj/sdl/messtiny/libosd.a but disappears after the final link:
I am getting a lot of I'm using llvm 3.1. |
Perhaps LLVM 3.1 changed linking semantics somehow, very odd. Can you send me the relevant bitcode files (before the link that removes main) so I can try to reproduce myself? (please try to narrow it down as much as possible) |
Here's a smaller example with the Running the command line in |
The problem is the attempt to link in native x86 binaries. You need to remove those, they can't be compiled into JS. Normally this is not too much of a problem, we detect them and ignore them - it just makes your builds slower. But in this case here, you have
Note there is both Replace that line with
that is, remove all requests for system libraries - and it will work. |
With that said, emcc should still not get confused even though there are irrelevant x86 libraries. I pushed a possible fix for that, it might help here. But I still recommend removing native libraries, even if they do not make the build fail they make it slower. |
Thanks, that fixes the testkeys example and restores __ZN7astring4initEv() to MESS, _main() is still missing though so some more poking is in order. |
Well my poking time has been pretty limited so here is something reproducible anyway: http://interbutt.com/temp/mess-linkerror.zip
No main(). Actually lots(all?) stuff from libosd.a appears not to make it in (e.g. _Z13sdlinput_initR15running_machine()) so maybe something is going wrong with that file. |
Ok, looks like what happens here is that |
Link is good now (thanks!) Getting this when I try to run it in Chrome:
|
Ok, that should be fixed on incoming now, and I added a test. |
MESS works again now! Can probably close this issue at this point and file anything new that comes up separately. |
Cool. This page was getting long to scroll down on ;) |
dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."
The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).
(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)
The text was updated successfully, but these errors were encountered: