Build MESS/MAME #131

Closed
ziz opened this issue Nov 30, 2011 · 153 comments

Comments

@ziz

ziz commented Nov 30, 2011

dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."

The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).

(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)

@kripken
Member

kripken commented Dec 2, 2011

If you can provide the relevant C++ source code that is compiled into the problem, that would also be very helpful. Without that, it's hard to know what to focus on here.

@ziz
Author

ziz commented Dec 2, 2011

My attempt at a minimal case was not successful in reproducing the bug: http://batcave.textfiles.com/ziz/staging/casttest.zip

As requested, here's the .js and .bc files:

http://batcave.textfiles.com/ziz/staging/messtiny-10.js.gz
http://batcave.textfiles.com/ziz/staging/messtiny-10.bc.gz

and the source code we're compiling is found in https://github.com/ziz/jsmess/tree/no_cothreads

So far as I can tell, the infinite loop happens during execution of a dynamic_cast call which runs as part of src/emu/clifront.c cli_frontend::listmedia, line 698.

Compiling fully from source requires a second, clean copy of the MESS source to build the external tools the build process depends on:

cd mess-orig; cp -rp src/osd/osdmini src/mess/osd; make TARGET=mess SUBTARGET=tiny

then build the MESS project:

make clean; find . \( -name \*.a.bc -o -name \*.o \) -delete; make clean; make TARGET=mess SUBTARGET=tiny
# Previous command fails because it can't run the external tools, copy them in
cp ../mess-orig/obj/osdmini/messtiny64/build/* obj/osdmini/messtiny/build/
# and finish the build
make TARGET=mess SUBTARGET=tiny

Then link with emscripten (since the build process tries to link .a instead of .a.bc, replace all .a with .a.bc in the link command):

/home/ziz/Dev/llvm-3.0-release/Release/bin/llvm-ld -disable-opt obj/osdmini/messtiny/version.o obj/osdmini/messtiny/drivlist.o obj/osdmini/messtiny/emu/drivers/emudummy.o obj/osdmini/messtiny/mess/drivers/coleco.o obj/osdmini/messtiny/mess/machine/coleco.o obj/osdmini/messtiny/libosd.a.bc obj/osdmini/messtiny/libcpu.a.bc obj/osdmini/messtiny/libemu.a.bc obj/osdmini/messtiny/libdasm.a.bc obj/osdmini/messtiny/libsound.a.bc obj/osdmini/messtiny/libutil.a.bc obj/osdmini/messtiny/libexpat.a.bc obj/osdmini/messtiny/libsoftfloat.a.bc obj/osdmini/messtiny/libformats.a.bc obj/osdmini/messtiny/libz.a.bc obj/osdmini/messtiny/libocore.a.bc -o=messtiny

and then emscripten:

../emscripten/emscripten.py messtiny.bc -o messtiny.js

Run with the argument '-listmedia'.
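
The .a to .a.bc substitution described above can be scripted rather than edited by hand. This is only a sketch: `toBitcodeArchives` is a hypothetical helper name, not part of the MESS makefile or emscripten, and it assumes the link command is available as a list of arguments.

```javascript
// Hypothetical helper: rewrite static-archive arguments (*.a) to the
// bitcode archives (*.a.bc) that the jsmess build produces.
function toBitcodeArchives(args) {
  return args.map((arg) => (arg.endsWith('.a') ? arg + '.bc' : arg));
}

// A shortened version of the llvm-ld command line from this comment.
const linkArgs = [
  '-disable-opt',
  'obj/osdmini/messtiny/version.o',
  'obj/osdmini/messtiny/libosd.a',
  'obj/osdmini/messtiny/libz.a.bc', // already renamed, left untouched
  '-o=messtiny',
];
const rewritten = toBitcodeArchives(linkArgs);
console.log(rewritten.join(' '));
```

Only arguments that end in exactly `.a` are touched, so object files and already-renamed archives pass through unchanged.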

@kripken
Member

kripken commented Dec 4, 2011

I'll look at this very soon, I just need to finish a few more compiler optimizations to speed up building large files (which will help with compiling and debugging code like this).

@kripken
Member

kripken commented Dec 5, 2011

The compressed js file linked here doesn't seem to extract to a js file properly. I built my own though.

Just trying to run it, I ran into a few problems:

  1. spidermonkey and node fail on OOM.
  2. The v8 shell gets farther, but fails for lack of bsearch. bsearch was implemented in pull request 114; however, the submitter there was not willing to license the code in a way that we can use. So I guess we would need to reimplement that function, and the other ones in that pull request, from scratch.

We should really try to reduce the size of the compiled code - hopefully we don't need everything currently built. The compiled JS is 50MB, which is too much for node and spidermonkey. d8 can run it, but d8 has other limitations like poor support for typed arrays which will greatly limit our ability to test (we will likely need typed arrays for speed).

But the real blocker now is to re-implement the necessary functions mentioned before in a way that can actually be used by us.
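
For reference, C's bsearch is an ordinary binary search over a sorted array with a caller-supplied comparator. The sketch below shows that algorithm on a plain JavaScript array. It is not the code from the pull request mentioned above and not emscripten's library implementation; `bsearchIndex` is a made-up name, and it returns an index where the C function returns a pointer.

```javascript
// Binary search over a sorted array. compare(key, item) must return a
// negative number, zero, or a positive number, like a C comparator.
function bsearchIndex(key, items, compare) {
  let lo = 0;
  let hi = items.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1; // midpoint, rounded down
    const c = compare(key, items[mid]);
    if (c === 0) return mid;
    if (c < 0) hi = mid - 1; // key sorts before items[mid]
    else lo = mid + 1; // key sorts after items[mid]
  }
  return -1; // not found (C's bsearch would return NULL)
}

const sorted = [1, 3, 5, 7, 9];
const byNumber = (a, b) => a - b;
const found = bsearchIndex(7, sorted, byNumber);
const missing = bsearchIndex(4, sorted, byNumber);
console.log(found, missing);
```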

@ziz
Author

ziz commented Dec 22, 2011

I have updated the jsmess makefile to use emcc and friends, and compiled with the latest emscripten. The JS is only 30M now, and compiles in less than half the time it used to - grats on the improvements!

There appears to be a minor bug in parseTools.js, preventing out-of-the-box compilation here: the 64-bit basic integer ops switch (around line 1660) is missing srem (compared to the 32-bit basic integer ops).

I may have recompiled spidermonkey with an increased functions limit to get around the OOM before. Pending a real fix, this is certainly an option.

Here's a new .bc and .js: http://batcave.textfiles.com/ziz/staging/messtiny-11.tgz

@kripken
Member

kripken commented Dec 22, 2011

How did you work around the lack of srem?

The problem is that 64-bit math isn't really possible in JS. We can add srem, but it might fail if values too big to fit in a double are used. Perhaps there is a way to make jsmess use 32-bit values over 64-bit ints?
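
To make the precision concern concrete, here is a small sketch of what goes wrong when a 64-bit signed remainder is computed on JavaScript doubles. `sremAsDouble` is a hypothetical name standing in for a naive lowering, not what emscripten generates. The BigInt line is there only to show the exact answer; it assumes a modern JS engine, which did not exist when this was written.

```javascript
// Naive lowering of a signed remainder onto JS doubles.
function sremAsDouble(a, b) {
  return a % b; // sign follows the dividend, as in C
}

// Small values behave like C's % operator.
const small = sremAsDouble(-7, 3);

// 2^53 + 1 cannot be represented as a double; the literal rounds to 2^53.
const big = 9007199254740993;
const wrong = sremAsDouble(big, 2);

// Exact 64-bit-style arithmetic for comparison (BigInt, modern engines only).
const exact = Number(9007199254740993n % 2n);

console.log(small, wrong, exact);
```

Doubles hold integers exactly only up to 2^53, so any i64 operand beyond that range can silently lose its low bits before the remainder is taken.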

@ziz
Author

ziz commented Dec 22, 2011

I applied the wrong-but-obvious solution: I just added srem to that case statement, which caused it to compile.

There may be an obvious solution that I'm missing to force 32-bit compilation for jsmess on OS X or other 64-bit platforms (and, so far as I can tell, this won't cause any problems if we do manage to force 32-bit compilation).

@kripken
Member

kripken commented Dec 26, 2011

I'm having trouble building this .ll file, not sure why. What version of LLVM did you use to build it?

@ziz
Author

ziz commented Dec 26, 2011

LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), which were the revisions I believe you reported you were using in a previous discussion on IRC. If you're using a different version, I'm happy to switch.

@kripken
Member

kripken commented Dec 26, 2011

3.0 should be fine.

Another question: I get warnings about different targets (OS X vs. Linux). Can you perhaps run the automatic test suite, to see if there are any problems on OS X? (Emscripten has not been heavily tested there, I am afraid.) The command to run them is python tests/runner.py; it should take several hours.

@ziz
Author

ziz commented Dec 30, 2011

Sure, I'll run the test suite. Is this the first time the different targets problem has shown up? I don't recall it being an issue before.

@kripken
Member

kripken commented Dec 30, 2011

We do have people using OS X, and someone said things were working on Windows. But only Linux gets full test coverage all the time (because I run Linux, basically, no special reason).

Specifically for here, I worry that clang will implement dynamic cast differently on different platforms, and that that might be what is tripping us up here. A problem like that might have gone unnoticed even though we have some people using OS X for some projects.

@ziz
Author

ziz commented Jan 2, 2012

Test output: https://gist.github.com/87ad7200ca6f708e6a73

Failures specifically related to closure (Invalid or corrupt jarfile /usr/local/bin/closure) are because I had the config file pointed to a closure compiler wrapper .sh rather than the closure .jar. I can rerun those specific tests if needed.

@kripken
Member

kripken commented Jan 3, 2012

A lot of those errors will, I suspect, be fixed by pull 154. And I am hoping some will be fixed by a headers correctness fix we landed yesterday. But if not, then those errors look very dangerous, and could possibly explain the jsmess infinite loop. Let's focus on the shortest one, pystruct. Can you please run

EMCC_DEBUG=1 python tests/runner.py test_pystruct

and gist the results? There will be an .ll and .js file in /tmp/emscripten_tmp (assuming /tmp is what TEMP_DIR is set to in ~/.emscripten).

edit: and please make sure you pull the latest code, since the headers fix was just yesterday

@ziz
Author

ziz commented Jan 3, 2012

Pulled to f6e8383 and ran test_pystruct.

Output:

https://gist.github.com/e81dba0c90b5589e0f79

Resulting temp files:

https://gist.github.com/93358b0ae60cb27fb293

@kripken
Member

kripken commented Jan 3, 2012

Thanks!

Ok, it looks like we need to be more assertive in telling clang to generate platform-independent code. Long-term we will want to have a formal "emscripten" llvm target, but until then, let's try to use 32-bit linux as the uniform target. I pushed this to the incoming branch as 4cab9f5. Can you test it and see if it fixes pystruct?

@ziz
Author

ziz commented Jan 3, 2012

Success! The incoming branch passes pystruct:

https://gist.github.com/4e286d64ccfd24e5af82

@kripken
Member

kripken commented Jan 3, 2012

Great! :)

There's a chance it will fix the other tests too (although we might still need the bitcode fix in pull #154). Can you run the rest as well?

@ziz
Author

ziz commented Jan 3, 2012

Some of the other tests are fixed. test_emcc is apparently not available in the incoming branch, but of the others, these now pass:

test_cubescript
test_files
test_libcxx
test_pystruct

and these continue to fail:

test_freetype
test_lua
test_openjpeg
test_poppler
test_python
test_thebullet
test_zlib

tests/runner.py output here:

https://gist.github.com/eaad86c149e5c81ca26e

@kripken
Member

kripken commented Jan 3, 2012

That pull has just been merged to incoming, so hopefully all tests will now pass if you pull the latest incoming. (Note that it isn't in master yet, waiting on automatic tests.)

@kripken
Member

kripken commented Jan 3, 2012

test_emcc is in "other", so to run it separately you need python tests/runner.py other.test_emcc. But if you run the whole suite, it should be run.

@kripken
Member

kripken commented Jan 3, 2012

freetype, poppler, openjpeg and bullet should hopefully be fixed with that pull. The others do not look like they will be fixed by it.

Can you gist the output from EMCC_DEBUG=1 python tests/runner.py test_zlib?

@ziz
Author

ziz commented Jan 3, 2012

I'm just running the tests that failed at the moment, to avoid the 2.5-hour test run. I'll rerun the whole suite when we've finished poking at individual tests, of course.

Looks like we're still failing on the collection of tests listed in my last message.

other.test_emcc is also failing; here's the EMCC_DEBUG=1 output.

https://gist.github.com/22eda2059ac6d3af2b7a

@kripken
Member

kripken commented Jan 3, 2012

On my machine it takes more than twice that, heh, I usually leave it to run overnight ;)

Based on the stack trace, on that part of the emcc test it is trying to run lli (the LLVM interpreter) on a bitcode file. Can you put up the generated bitcode files in that directory? (suffix .o and .bc)

@ziz
Author

ziz commented Jan 3, 2012

Sure thing, here's the output from EMCC_DEBUG=1 python tests/runner.py test_zlib: http://batcave.textfiles.com/ziz/staging/test_zlib.tar.gz (it's too large for gist)

@ziz
Author

ziz commented Jan 3, 2012

http://batcave.textfiles.com/ziz/staging/test_emcc-bitcode.tgz is the generated bitcode from EM_SAVE_DIR=1 python tests/runner.py other.test_emcc

@kripken
Member

kripken commented Jan 4, 2012

The generated bitcode files work fine here. Do they work for you when you run them manually, e.g. with lli hello_world.o?

I am baffled by the zlib failure. The output is quite different, despite our using the same LLVM and Clang (3.0), and the same target.

@ziz
Author

ziz commented Jan 4, 2012

Nope, the .o fails when I run it manually with lli. I see 'Illegal instruction: 4' in the terminal, and the following crash report comes up: https://gist.github.com/e1d98bb52adae5779449

@kripken
Member

kripken commented Jan 4, 2012

Do both the normal and the "cleaned" .o files crash in that way? (the cleaned version removes debug info, which used to crash lli in the past)

Let's try to use exactly the same LLVM version, because I think we'll need to file an LLVM bug or post to their mailing list soon, not sure what else to do. I'm rebuilding LLVM 3.0 release from source now.

@ziz
Author

ziz commented Jan 4, 2012

Yep, they both crash in the same way (including extremely similar crash reports - I haven't diffed them yet, though).

Just let me know what you need me to do. You're changing to make sure you're at LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), the versions I'm currently using - is that correct? Or should I switch LLVM / clang revs?

@kripken
Member

kripken commented Apr 21, 2012

That patch fixes the color masks, previously we had them wrong. So it isn't obvious to me what is going on here. Can you make a testcase I can debug? Then we can also add the testcase to the test suite to prevent future regressions.

@DopefishJustin
Contributor

So with current emscripten the compiled js bombs out in Chrome with the following error in the error console:

Uncaught TypeError: Object #<CanvasRenderingContext2D> has no method 'createBuffer'

It seems to be trying to init something GL-related even though WebGL is not enabled; I guess emscripten should detect that more gracefully? The actual video output is not going through GL yet.

@kripken
Member

kripken commented May 28, 2012

What function is being called, and from where? If a GL function is called, that sounds like a bug in the C++ code. Unless it only happens in Chrome and not Firefox, in which case I would suspect a browser bug. But this might become clearer if you show me the relevant code.

@DopefishJustin
Contributor

Whoops the markup ate part of that which might make it clearer.

@DopefishJustin
Contributor

So the emscripten-generated code contains this:

createContext: function (canvas, useWebGL, setInModule) {
  try {
    var ctx = canvas.getContext(useWebGL ? 'experimental-webgl' : '2d');

And then later on there is code like this in GLEmulation.init():

this.vertexObject = Module.ctx.createBuffer();

Which apparently does not exist if the canvas is set up as 2D.

useWebGL is set in makeSurface:

// Decide if we want to use WebGL or not
var useWebGL = (flags & 0x04000000) != 0; // SDL_OPENGL

So I am assuming what is happening is useWebGL is coming out false for whatever reason and then the C++ is calling GL's init() somewhere, even though the OpenGL output option has not been enabled on the MESS command line. I haven't actually debugged it though.

I agree that ideally the C++ should not be calling GL stuff if the surface isn't set up for it (if that is indeed the problem), but this is also a pretty lame way to fail. If the createBuffer() method is not going to always exist then there should be some kind of type check or exception handling for the case where it doesn't, with an appropriate error message, and ideally allowing execution to continue (with, obviously, no GL output).
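
The kind of type check suggested here might look like the following sketch. All names (`hasGLSupport`, `initGLEmulation`, the mock contexts) are hypothetical and stand in for the generated code; this is not emscripten's actual GLEmulation.init(). The mocks exist only so the example runs outside a browser.

```javascript
// Hypothetical check: true only if ctx looks like a WebGL context.
function hasGLSupport(ctx) {
  return !!ctx && typeof ctx.createBuffer === 'function';
}

// Hypothetical stand-in for a guarded GL emulation init step.
function initGLEmulation(ctx, log) {
  if (!hasGLSupport(ctx)) {
    log('GL emulation requested, but the canvas context is not WebGL; skipping GL init');
    return null; // caller carries on without GL output
  }
  return { vertexObject: ctx.createBuffer() };
}

// Mock contexts: a 2D-like context without createBuffer, and a GL-like one.
const fake2d = { fillRect() {} };
const fakeGL = { createBuffer() { return { id: 1 }; } };

const messages = [];
const skipped = initGLEmulation(fake2d, (m) => messages.push(m));
const ready = initGLEmulation(fakeGL, (m) => messages.push(m));
console.log(skipped, ready, messages);
```

Checking for the method avoids the uncaught TypeError and lets execution continue with a readable message in the 2D case.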

@kripken
Member

kripken commented May 30, 2012

Hmm, the question is why GLEmulation is included in the first place. There are a few functions which depend on it,

glVertexPointer
glMatrixMode
SDL_GL_GetProcAddress

Is one of those used in MESS? It seems like they should only appear in a build that uses GL. But perhaps they are included but not used, and we need to add a workaround for this?

@DopefishJustin
Contributor

Yep all three of those are in MESS. Whether MESS uses GL or not is not a build-time option, just a command-line parameter:

mess -video soft (the default)

vs.

mess -video opengl

@kripken
Member

kripken commented May 31, 2012

But you are building without opengl as a build-time option, I assume? Then why do those commands end up in the output binary?

@DopefishJustin
Contributor

You assume wrongly. However now that I check there is such an option which may be enough to get by for now. But surely MESS is not the only software on earth with run-time video output selection?

@kripken
Member

kripken commented May 31, 2012

Not the only one, but the first to be compiled with emscripten I guess ;) ok, then we need runtime checks for this, I will add that.

@DopefishJustin
Contributor

Thanks.

Just tried with emscripten incoming and now I can't link - I see emld was removed so I replaced it with emcc in the makefile but now this happens:

/home/jkerk/emscripten/emcc -Wl,--warn-common -s obj/sdl/messtiny/build/file2str.o obj/sdl/messtiny/libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o obj/sdl/messtiny/build/file2str
JAVA not defined in ~/.emscripten, using "java"
Traceback (most recent call last):
  File "/home/jkerk/emscripten/emcc", line 551, in <module>
    assert '=' in newargs[i+1], 'Incorrect syntax for -s (use -s OPT=VAL): ' + newargs[i+1]
AssertionError: Incorrect syntax for -s (use -s OPT=VAL): obj/sdl/messtiny/build/file2str.o
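
The assertion fires because this emcc revision expects every -s to be followed by OPT=VAL, while the makefile passes a bare -s (conventionally the linker's strip flag). A more tolerant argument scan could tell the two apart. The sketch below uses a hypothetical `splitSettings` function and is not emcc's code, which is written in Python; the file names are shortened.

```javascript
// Split an argument list into "-s OPT=VAL" settings and everything else.
// A bare -s (not followed by OPT=VAL) is passed through unchanged.
function splitSettings(args) {
  const settings = {};
  const rest = [];
  for (let i = 0; i < args.length; i++) {
    const next = args[i + 1];
    const looksLikeSetting =
      typeof next === 'string' && /^[A-Z_][A-Z0-9_]*=/.test(next);
    if (args[i] === '-s' && looksLikeSetting) {
      const eq = next.indexOf('=');
      settings[next.slice(0, eq)] = next.slice(eq + 1);
      i++; // consume the OPT=VAL argument as well
    } else {
      rest.push(args[i]);
    }
  }
  return { settings, rest };
}

const parsed = splitSettings(['-s', 'file2str.o', '-s', 'OPT=VAL', '-o', 'file2str']);
console.log(parsed.settings, parsed.rest);
```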

@kripken
Member

kripken commented May 31, 2012

The first issue should be fixed in incoming.

@kripken
Member

kripken commented May 31, 2012

The second should be fixed on incoming as well (it was fallout from LLVM deprecating llvm-ld).

@DopefishJustin
Contributor

When I try to compile now the resulting .js is missing important functions like _main() and __ZN7astring4initEv(). This is before closure has been run on it.

.bc: http://interbutt.com/temp/messtiny-20120622.bc.zip

Command line:
emcc messtiny.bc -o messtiny.js --post-js post.js --embed-file roms/coleco.zip --embed-file cosmofighter2.zip

post.js: https://raw.github.com/ziz/jsmess/no_cothreads/post.js

Maybe they are missing from the .bc file somehow but I don't know how to check.

@kripken
Member

kripken commented Jun 24, 2012

LLVM's llvm-nm tool can tell you what symbols are in a .bc file. Is main there?

@DopefishJustin
Contributor

It's not there. It is in obj/sdl/messtiny/osd/sdl/sdlmain.o and also obj/sdl/messtiny/libosd.a but disappears after the final link:

/home/jkerk/emscripten/emcc -Wl,--warn-common -s obj/sdl/messtiny/version.o obj/sdl/messtiny/drivlist.o obj/sdl/messtiny/emu/drivers/emudummy.o obj/sdl/messtiny/mess/drivers/coleco.o obj/sdl/messtiny/mess/machine/coleco.o obj/sdl/messtiny/libosd.a obj/sdl/messtiny/libcpu.a obj/sdl/messtiny/libemu.a obj/sdl/messtiny/libdasm.a obj/sdl/messtiny/libsound.a obj/sdl/messtiny/libutil.a obj/sdl/messtiny/libexpat.a obj/sdl/messtiny/libsoftfloat.a obj/sdl/messtiny/libformats.a obj/sdl/messtiny/libz.a obj/sdl/messtiny/libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o messtiny

I am getting a lot of "/usr/bin/llvm-dis: Invalid bitcode signature" errors at link time, which may be related (llvm-dis likes sdlmain.o but not libosd.a, but llvm-nm works on both).

I'm using llvm 3.1.

@kripken
Member

kripken commented Jun 24, 2012

Perhaps LLVM 3.1 changed linking semantics somehow, very odd. Can you send me the relevant bitcode files (before the link that removes main) so I can try to reproduce myself? (please try to narrow it down as much as possible)

@DopefishJustin
Contributor

Here's a smaller example with the testkeys utility: http://interbutt.com/temp/testkeys-linkerror.zip

Running the command line in maketestkeys results in an output file which is missing _Z15utf8_from_ucharPcjj(), which appears to be present in libutil.a. (Compiling the output to a .html with emcc, then opening it in a browser and pressing a key trips the missing function.)

@kripken
Member

kripken commented Jun 26, 2012

The problem is the attempt to link in native x86 binaries. You need to remove those, they can't be compiled into JS. Normally this is not too much of a problem, we detect them and ignore them - it just makes your builds slower. But in this case here, you have

emcc  -Wl,--warn-common -s testkeys.o libutil.a libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o testkeys

Note there is both libutil.a - a bitcode file - and -lutil - a request for a system library to be linked. This is what mixes up the compiler.

Replace that line with

emcc  -Wl,--warn-common -s testkeys.o libutil.a libocore.a -o testkeys

that is, remove all requests for system libraries - and it will work.
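
If the makefile cannot easily be changed, the same cleanup can be applied to the argument list. This is a sketch under assumptions: `stripSystemLibs` is a hypothetical helper, and it only handles literal -l flags. The backtick substitutions (sdl-config, pkg-config) expand to further native flags and still have to be removed from the makefile itself.

```javascript
// Hypothetical helper: drop -l<name> requests for native system libraries,
// keeping bitcode inputs such as libutil.a.
function stripSystemLibs(args) {
  return args.filter((arg) => !/^-l\S/.test(arg));
}

const before = [
  '-Wl,--warn-common', '-s', 'testkeys.o', 'libutil.a', 'libocore.a',
  '-lm', '-lSDL_ttf', '-lutil', '-o', 'testkeys',
];
const cleaned = stripSystemLibs(before);
console.log(cleaned.join(' '));
```

Note that `libutil.a` (a file path) survives while `-lutil` (a library request) is dropped, which is the distinction the explanation above turns on.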

@kripken
Member

kripken commented Jun 26, 2012

With that said, emcc should still not get confused even though there are irrelevant x86 libraries. I pushed a possible fix for that, it might help here. But I still recommend removing native libraries, even if they do not make the build fail they make it slower.

@DopefishJustin
Contributor

Thanks, that fixes the testkeys example and restores __ZN7astring4initEv() to MESS. _main() is still missing though, so some more poking is in order.

@DopefishJustin
Contributor

Well my poking time has been pretty limited so here is something reproducible anyway:

http://interbutt.com/temp/mess-linkerror.zip

$ llvm-nm libosd.a | grep main

         T main
         d _ZL13main_threadid

$ emcc -s version.o drivlist.o emudummy.o coleco_driver.o coleco_machine.o libosd.a libcpu.a libemu.a libdasm.a libsound.a libutil.a libexpat.a libsoftfloat.a libformats.a libz.a libocore.a -o messtiny.bc

$ llvm-nm messtiny.bc | grep main

         T _Z16jsmess_main_loopv
         T _Z20jsmess_set_main_loopR16device_scheduler
         d _ZL13mnemonic_main
         t _ZL18menu_main_populateR15running_machineP7ui_menuPv
         t _ZL9menu_mainR15running_machineP7ui_menuPvS3_
         T _ZNK24device_execute_interface16cycles_remainingEv
         T _ZNK9emu_timer9remainingEv
         U emscripten_set_main_loop

No main(). Actually, lots of (all?) stuff from libosd.a appears not to make it in (e.g. _Z13sdlinput_initR15running_machine()), so maybe something is going wrong with that file.
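
The grep above matches any symbol whose name contains "main", which makes the output noisy. A stricter check over llvm-nm output could look like the sketch below. `definesSymbol` is a hypothetical helper; it assumes the last two whitespace-separated fields on each line are the type letter and the symbol name, and treats type U as undefined. The sample text is copied from the listings in this comment.

```javascript
// Returns true if the llvm-nm output lists `name` with a type other than U,
// i.e. the symbol is defined rather than merely referenced.
function definesSymbol(nmOutput, name) {
  return nmOutput.split('\n').some((line) => {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 2) return false; // blank or malformed line
    const symbol = fields[fields.length - 1];
    const type = fields[fields.length - 2];
    return symbol === name && type !== 'U';
  });
}

const archiveListing = [
  '         T main',
  '         d _ZL13main_threadid',
].join('\n');
const linkedListing = [
  '         T _Z16jsmess_main_loopv',
  '         d _ZL13mnemonic_main',
  '         U emscripten_set_main_loop',
].join('\n');

console.log(definesSymbol(archiveListing, 'main'));
console.log(definesSymbol(linkedListing, 'main'));
```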

@kripken
Member

kripken commented Jul 23, 2012

Ok, looks like what happens here is that main() is in an archive file and not a normal object. We were only looking for explicit undefined symbols in archives so we missed this. This is fixed in incoming. With this fix, I see main as well as _Z13sdlinput_initR15running_machine etc.
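
A toy model of the behavior described here: archive members are pulled in only when they define a symbol that is currently undefined, so a member holding main is skipped if nothing references main. Treating main as an implicit root is one plausible way to fix that, shown below purely as an assumption. This is not emscripten's linking code, and the object names and symbol lists are invented for illustration.

```javascript
// Sketch of archive member selection. Each object or member is described by
// the symbols it defines and the symbols it needs (all data here is made up).
function selectArchiveMembers(objects, members, implicitRoots = []) {
  const defined = new Set();
  for (const obj of objects) obj.defines.forEach((s) => defined.add(s));

  // Start from symbols the plain objects need but do not define, plus any
  // implicit roots (such as "main") that nothing references explicitly.
  const wanted = new Set(implicitRoots.filter((s) => !defined.has(s)));
  for (const obj of objects) {
    obj.needs.forEach((s) => { if (!defined.has(s)) wanted.add(s); });
  }

  const chosen = [];
  let changed = true;
  while (changed) { // repeat until no new member gets pulled in
    changed = false;
    for (const m of members) {
      if (chosen.includes(m.name)) continue;
      if (!m.defines.some((s) => wanted.has(s))) continue;
      chosen.push(m.name);
      m.defines.forEach((s) => { defined.add(s); wanted.delete(s); });
      m.needs.forEach((s) => { if (!defined.has(s)) wanted.add(s); });
      changed = true;
    }
  }
  return chosen;
}

const plainObjects = [{ name: 'version.o', defines: ['build_version'], needs: [] }];
const osdMembers = [
  { name: 'sdlmain.o', defines: ['main'], needs: ['sdlinput_init'] },
  { name: 'input.o', defines: ['sdlinput_init'], needs: [] },
];

const withoutRoots = selectArchiveMembers(plainObjects, osdMembers);
const withMain = selectArchiveMembers(plainObjects, osdMembers, ['main']);
console.log(withoutRoots, withMain);
```

Without the implicit root nothing from the archive is selected, which mirrors the missing main() and sdlinput_init symptoms reported earlier in the thread.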

@DopefishJustin
Contributor

Link is good now (thanks!)

Getting this when I try to run it in Chrome:

Uncaught TypeError: Object #<CanvasRenderingContext2D> has no method 'getExtension'

@kripken
Member

kripken commented Jul 25, 2012

Ok, that should be fixed on incoming now, and I added a test.

@DopefishJustin
Contributor

MESS works again now!

Can probably close this issue at this point and file anything new that comes up separately.

@kripken
Member

kripken commented Jul 26, 2012

Cool. This page was getting long to scroll down on ;)
