Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linker eating functions in JSMESS again #2619

Closed
DopefishJustin opened this issue Aug 3, 2014 · 53 comments
Closed

Linker eating functions in JSMESS again #2619

DopefishJustin opened this issue Aug 3, 2014 · 53 comments

Comments

@DopefishJustin
Copy link
Contributor

Compare #131 (comment)

Getting a bunch of warnings about unresolved symbols when compiling JSMESS for the Tandy 1000 with current Emscripten incoming:

/0/home/dfjustin/jsmess/third_party/emscripten/emcc  -O3 -s DISABLE_EXCEPTION_CATCHING=0 -s NO_EXIT_RUNTIME=1 -s ASSERTIONS=0 -s COMPILER_ASSERTIONS=1 -s FORCE_ALIGNED_MEMORY=1 -s TOTAL_MEMORY=67108864       -s EXCEPTION_CATCHING_WHITELIST='["__ZN15running_machine17start_all_devicesEv"]' -s EXPORTED_FUNCTIONS="['_main', '_malloc', '__Z14js_get_machinev', '__Z9js_get_uiv', '__Z12js_get_soundv', '__ZN10ui_manager12set_show_fpsEb', '__ZNK10ui_manager8show_fpsEv', '__ZN13sound_manager4muteEbh', 'SDL_PauseAudio']" -g2 -s ASSERTIONS=2 -s SAFE_HEAP=1 -s DEMANGLE_SUPPORT=1 /0/home/dfjustin/jsmess/third_party/mame/messtandy1t.bc -o /0/home/dfjustin/jsmess/build/tandy1t/messtandy1t.js --pre-js /0/home/dfjustin/jsmess/build/tandy1t/pre.js --post-js /0/home/dfjustin/jsmess/templates/default/post.js
warning: unresolved symbol: _ZN17centronics_device11write_data7Ei
warning: unresolved symbol: pthread_attr_setdetachstate
warning: unresolved symbol: _ZN17centronics_device12write_autofdEi
warning: unresolved symbol: _ZN17centronics_device11write_data4Ei
warning: unresolved symbol: _ZN17centronics_device10write_initEi
warning: unresolved symbol: _ZN17centronics_device11write_data6Ei
warning: unresolved symbol: _ZN17centronics_device15write_select_inEi
warning: unresolved symbol: _ZN17centronics_device11write_data1Ei
warning: unresolved symbol: SDL_UnlockYUVOverlay
warning: unresolved symbol: _ZN17centronics_device11write_data3Ei
warning: unresolved symbol: SDL_CreateYUVOverlay
warning: unresolved symbol: _ZN17centronics_device11write_data0Ei
warning: unresolved symbol: _ZN17centronics_device12write_strobeEi
warning: unresolved symbol: pthread_create
warning: unresolved symbol: SDL_GL_LoadLibrary
warning: unresolved symbol: SDL_LockYUVOverlay
warning: unresolved symbol: _ZN17centronics_device11write_data5Ei
warning: unresolved symbol: SDL_DisplayYUVOverlay
warning: unresolved symbol: SDL_FreeYUVOverlay
warning: unresolved symbol: _ZN17centronics_device11write_data2Ei
warning: unresolved symbol: CENTRONICS
warning: unresolved symbol: CENTRONICS_COVOX
warning: unresolved symbol: CENTRONICS_COVOX_STEREO
warning: unresolved symbol: CENTRONICS_PRINTER
warning: unresolved symbol: _ZTI17centronics_device

The pthread and SDL stuff is expected but the centronics functions seem to be getting munched by the linker - they are compiled into libbus.a but not the final binary.

To reproduce, use the following files: http://interbutt.com/temp/tandylinkerbug.zip

$ llvm-nm libbus.a | grep _ZN17centronics_device11write_data7Ei
         T _ZN17centronics_device11write_data7Ei
$ emcc -s version.o mess.o drivlist.o tandy1t.o genpc.o pc_t1t.o libosd.a libbus.a liboptional.a libemu.a libdasm.a libutil.a libexpat.a libsoftfloat.a libjpeg.a libflac.a lib7z.a libformats.a liblua.a libsqlite3.a libweb.a libz.a libocore.a libportmidi.a -o messtandy1t
$ llvm-nm messtandy1t | grep _ZN17centronics_device11write_data7Ei
         U _ZN17centronics_device11write_data7Ei
@kripken
Copy link
Member

kripken commented Aug 3, 2014

I can't seem to download that file (eta 19 hours right now), any chance you can put it up somewhere else?

@kripken
Copy link
Member

kripken commented Aug 3, 2014

Oh wait, I got it.

@kripken
Copy link
Member

kripken commented Aug 3, 2014

The issue is .a linking depends on order. libbus.a provides that symbol, but liboptional.a which requires it appears afterwards, which means the linker already decided to not link it in.

Swapping their order doesn't fix it, I assume because there is some other symbol dependency between those two, in the other direction.

This should be solvable using some combination of linker flags, (group options?), but I'm not very familiar with those. (Btw, what is the -s linker flag doing here?) A hackish way to solve it is to put libbus.a both before and after liboptional.a - this won't throw duplicate symbol errors for exactly the same reason as the problem exists in the first place, .a linking pulls in files only when needed, so the stuff already present is just ignored (and this is done on a .o file basis, of the .o files inside the .a).

A more general solution is to just not use .a files, as recommended on the FAQ. Linking to normal object files (.so or .bc) avoids all the legacy complexity of .a linking. However, in theory .a linking can be faster.

@DopefishJustin
Copy link
Contributor Author

That's the same linking order used by native builds though, is the thing....

-s is coming from https://github.com/startaq/mame/blob/master/makefile#L648

@DopefishJustin
Copy link
Contributor Author

So I tried adding --start-group and --end-group around everything and the behaviour is the same, which strikes me as a bug. Putting libbus.a before and after liboptional.a does work though.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

I can't understand how that can work natively, because as I said, one .a needs a symbol from another .a and they are in the wrong order for that. Either the native linker has a bug, or my understanding of .a linking is incorrect, I guess.

@DopefishJustin
Copy link
Contributor Author

I'm not actually sure whether this exact case works on native as the native build is normally for all systems and not just one. If groups would work that would probably be enough for now though.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

Groups has one open issue on it, #2568, perhaps you are hitting that.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

If we can confirm it's that, we should close this and prioritize that other one.

@coolwanglu
Copy link
Contributor

--start/end-group means try to find the function regardless of the order of the modules in the groups, so the compiler try to iterate each of them until all the required functions are found (or failed).

@DopefishJustin
Copy link
Contributor Author

It does sound like #2568 is the same issue, yes.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

Ok, that other issue is now resolved on incoming, please check if groups works for you on that.

@DopefishJustin
Copy link
Contributor Author

Still doesn't seem to work.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

Even with groups? Then I guess we have another groups-related bug then.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

Looks to me like it works ok with groups,

$ emcc -s -Wl,--start-group version.o mess.o drivlist.o tandy1t.o genpc.o pc_t1t.o libosd.a libbus.a liboptional.a libemu.a libdasm.a libutil.a libexpat.a libsoftfloat.a libjpeg.a libflac.a lib7z.a libformats.a liblua.a libsqlite3.a libweb.a libz.a libocore.a libportmidi.a -Wl,--end-group -o messtandy1t ; llvm-nm messtandy1t | grep _ZN17centronics_device11write_data7Ei
WARNING  root: treating -s as linker option and not as -s OPT=VALUE for js compilation
         T _ZN17centronics_device11write_data7Ei

Perhaps you had a typo in the groups command? Make sure the comma is present in -Wl,--start-group etc.

@DopefishJustin
Copy link
Contributor Author

It works if you do -Wl,--start-group but not with just --start-group etc. It would be nice if that wasn't a silent failure but I can work with this now.

@kripken
Copy link
Member

kripken commented Aug 4, 2014

It will be tricky to add a warning, as we would need to add new code to track all arguments back to their source, to let us know which are in fact user-supplied args we are passing to clang, and which are not. Then, when we don't run clang (like here, when we just link bitcode to bitcode), we could warn. This would be nice to have, but a major refactoring like that is risky for new bugs.

startaq pushed a commit to mamedev/mame that referenced this issue Aug 5, 2014
@kripken
Copy link
Member

kripken commented Aug 8, 2014

I can't find an easy way to improve that warning. Closing this as the main issue is working as intended.

@kripken kripken closed this as completed Aug 8, 2014
john-peterson pushed a commit to mirror/mame that referenced this issue Sep 6, 2014
See emscripten-core/emscripten#2619 for discussion

git-svn-id: svn://dspnet.fr/mame/trunk@31513 749742ba-7341-0410-aadc-df50b521781e
DopefishJustin added a commit to mamedev/mame that referenced this issue Oct 18, 2014
DopefishJustin added a commit to mamedev/mame that referenced this issue Oct 20, 2014
@DopefishJustin
Copy link
Contributor Author

This is happening again with current emscripten incoming, with slightly different symptoms:

Linking messjaguar...
/0/home/dfjustin/jsmess/third_party/emscripten/emcc -Wl,--warn-common  -Wl,--profiling obj/sdl/version.o -Wl,--start-group obj/sdl/mess/mess.o obj/sdl/mess/jaguar/drivlist.o obj/sdl/mame/audio/jaguar.o obj/sdl/mame/drivers/jaguar.o obj/sdl/mame/video/jaguar.o obj/sdl/libosd.a obj/sdl/mess/jaguar/libbus.a obj/sdl/mess/jaguar/liboptional.a obj/sdl/libemu.a obj/sdl/mess/jaguar/libdasm.a obj/sdl/libutil.a obj/sdl/libexpat.a obj/sdl/libsoftfloat.a obj/sdl/libjpeg.a obj/sdl/libflac.a obj/sdl/lib7z.a obj/sdl/libformats.a obj/sdl/liblua.a obj/sdl/libsqlite3.a obj/sdl/libweb.a obj/sdl/libz.a obj/sdl/libocore.a obj/sdl/libportmidi.a -Wl,--end-group  -o messjaguar
make[1]: Leaving directory `/0/home/dfjustin/jsmess/third_party/mame'
/0/home/dfjustin/jsmess/third_party/emscripten/emcc  -O3 -s DISABLE_EXCEPTION_CATCHING=2 --memory-init-file 0 -s NO_EXIT_RUNTIME=1 -s ASSERTIONS=0 -s COMPILER_ASSERTIONS=1 -s FORCE_ALIGNED_MEMORY=1 -s TOTAL_MEMORY=134217728 -s EXCEPTION_CATCHING_WHITELIST='["__ZN15running_machine17start_all_devicesEv"]' -s EXPORTED_FUNCTIONS="['_main', '_malloc', '__Z14js_get_machinev', '__Z9js_get_uiv', '__Z12js_get_soundv', '__ZN10ui_manager12set_show_fpsEb', '__ZNK10ui_manager8show_fpsEv', '__ZN13sound_manager4muteEbh', 'SDL_PauseAudio']" -g2 /0/home/dfjustin/jsmess/third_party/mame/messjaguar.bc -o /0/home/dfjustin/jsmess/build/jaguar/messjaguar.js --pre-js /0/home/dfjustin/jsmess/build/jaguar/pre.js --post-js /0/home/dfjustin/jsmess/templates/default/post.js
warning: unresolved symbol: pthread_attr_setdetachstate
warning: unresolved symbol: SDL_UnlockYUVOverlay
warning: unresolved symbol: _Z21get_dispensed_ticketsR15running_machine
warning: unresolved symbol: SDL_CreateYUVOverlay
warning: unresolved symbol: _Z22coin_counter_get_countR15running_machinei
warning: unresolved symbol: pthread_create
warning: unresolved symbol: SDL_GL_LoadLibrary
warning: unresolved symbol: _Z22coin_lockout_get_stateR15running_machinei
warning: unresolved symbol: SDL_LockYUVOverlay
warning: unresolved symbol: SDL_DisplayYUVOverlay
warning: unresolved symbol: SDL_FreeYUVOverlay
warning: unresolved symbol: _Z20generic_machine_initR15running_machine
warning: unresolved symbol: OSD_SOUND_NONE

I've uploaded the object files to http://interbutt.com/temp/linkerbug3.zip

$ llvm-nm libemu.a | grep _Z20generic_machine_initR15running_machine
         U _Z20generic_machine_initR15running_machine
         T _Z20generic_machine_initR15running_machine
$ emcc -Wl,--warn-common  -Wl,--profiling version.o -Wl,--start-group mess.o drivlist.o audio_jaguar.o drivers_jaguar.o video_jaguar.o libosd.a libbus.a liboptional.a libemu.a libdasm.a libutil.a libexpat.a libsoftfloat.a libjpeg.a libflac.a lib7z.a libformats.a liblua.a libsqlite3.a libweb.a libz.a libocore.a libportmidi.a -Wl,--end-group  -o messjaguar
$ llvm-nm messjaguar | grep _Z20generic_machine_initR15running_machine
         U _Z20generic_machine_initR15running_machine

Not sure what the significance is of the same symbol being listed as both U and T.

@kripken
Copy link
Member

kripken commented Dec 3, 2014

I get an error trying to extract that archive - can you please re-upload it?

@DopefishJustin
Copy link
Contributor Author

There's nothing wrong with it as far as I can tell, but recompressed and reuploaded anyway.

@kripken kripken reopened this Dec 8, 2014
@kripken
Copy link
Member

kripken commented Dec 8, 2014

Ok, I can download it and extract it now.

The command you quoted doesn't work out of the box - first, I had to fix up all the paths to point to the current dir. Not sure if that changes things or not. Second, 3 of the files ending in jaguar.o do not exist here, which seems like more of a problem.

@DopefishJustin
Copy link
Contributor Author

Look at the second code block in the post, I already massaged the command line to have everything in the current dir and rename the jaguar.o files to unique names.

@kripken
Copy link
Member

kripken commented Dec 8, 2014

Thanks, I see now.

@xxuejie
Copy link
Contributor

xxuejie commented Jan 25, 2015

@kripken Yes, I can confirm this is working using native ar, since this is already part of the mruby build process.

Or should this be classified as an undefined behaviour?

@kripken
Copy link
Member

kripken commented Jan 25, 2015

I honestly don't know what is defined and undefined behavior for ar. However, I tested ar on linux here, and it works identically to how llvm-ar does. So llvm-ar seems fine.

$ gcc -c tests/hello_world.c -o tests/a.o
$ file a.o
a.o: ELF 64-bit LSB  relocatable, x86-64, version 1 (SYSV), not stripped
$ ar rs a.a tests/a.o
ar: creating a.a
$ ar t a.a
a.o
$ cd waka
$ ar xo ../a.a
$ ls
a.o

So only the basename is retained, and files are dumped in the current dir, so ones with identical basenames trample previous ones.

Are you on mac? Perhaps mac ar works differently, and the build system is not cross-platform?

We could work around this issue if we could extract files by their index number, and not by name. But I don't see an option for that on ar or llvm-ar...

@DopefishJustin
Copy link
Contributor Author

Definitely the issue with MESS:

WARNING  root: loading from archive /0/home/dfjustin/jsmess/third_party/mame/obj/sdl/libemu.a, which has duplicate entries (files with identical names). this is dangerous as only the last will be taken into account, and you may see surprising undefined symbols later. run |llvm-ar t archive.a| to see the list of entries
WARNING  root:    duplicate: none.o
WARNING  root:    duplicate: generic.o

@kripken
Copy link
Member

kripken commented Jan 26, 2015

Thanks, good to know that the simpler testcase helped us to at least know what the problem is now. Although, given that ar works the same, something is still unclear.

@xxuejie
Copy link
Contributor

xxuejie commented Jan 26, 2015

For the native ar, I think the case is: if are are extracting all files at once, the latter file will override the former file with the same name. However, if we are extracting only one file, we can preserve the former file:

$ nm build/clang/src/symbol.o | grep mrb_init_symtbl  # original object file
0000000000000cf0 T _mrb_init_symtbl

$ nm build/clang/lib/libmruby.a | grep mrb_init_symtbl    # archive file
                 U _mrb_init_symtbl
0000000000000cf0 T _mrb_init_symtbl

$ ar x build/clang/lib/libmruby.a symbol.o    # extract one file
$ nm symbol.o | grep mrb_init_symtbl
0000000000000cf0 T _mrb_init_symtbl

Please ignore the clang part in the folder path here, it is still the native ar used here, not llvm-ar.

So this still looks like a bug of llvm-ar to me.

And yes, I'm on a mac, but I can confirm the same behaviour occurs on Linux(tested on a Linode VM).

In mruby's case, the problem will be gone if we avoid using .a files completely, in other words, specifying all object files in the final linking phase will work. Not sure if that applies to MESS though.

@kripken
Copy link
Member

kripken commented Jan 26, 2015

But if you extract a single file, using the basename (which is all there is at this point), then you can only extract one of the two? How can you specify the other?

@xxuejie
Copy link
Contributor

xxuejie commented Jan 26, 2015

I don't have an answer for that, sorry, Google tells us that maybe we have to use more trickier way, so it makes me wonder if this should be categorized as "undefined behavior" rather than "bug".

@kripken
Copy link
Member

kripken commented Jan 26, 2015

Based on that, it seems like ar will have the same problem as llvm-ar. In which case, how do these build systems work natively? The exact same issue should arise. The native build systems must be doing something differently, just tracing the commands should show how.

@xxuejie
Copy link
Contributor

xxuejie commented Jan 26, 2015

Well, here's the complete build steps of mruby on a native platform: https://gist.github.com/xxuejie/915269285747b66a796b

Basically what it does is the following 3 steps:

  1. Compile every .c file in to .o file.
  2. Archive a set of .o files(including multiple symbol.o files with the same name but different path) into a single .a file.
  3. Link some other .o files with the .a file to create a single executable, which will work.

I guess the case here is that ar does include symbols from every file on the system even though they might contain duplicate names. It is only the extraction phase of ar that is not working well: like the standard command interface has limitations on extraction of files with duplicate name.

@kripken
Copy link
Member

kripken commented Jan 26, 2015

Ah, ok, so it sounds like ar x, the commandline extraction, is faulty - in both ar and llvm-ar, while the native linker apparently has it's own logic to handle archives, which is better then ar x. I suppose we need to do something similar.

@xxuejie
Copy link
Contributor

xxuejie commented Jan 26, 2015

If llvm-ar has a bug handling this(from my test results, llvm-ar won't preserve symbols from the former file, even if 2 files are shown in llvm-ar t), can we do anything at the linker level?

@kripken
Copy link
Member

kripken commented Jan 26, 2015

Yes, the only option is to write our own code to extract files from archives, I suppose. Perhaps like the link you posted above, I didn't take a close look yet though.

@kripken
Copy link
Member

kripken commented Jan 27, 2015

@sunfishcode noticed the N option, which in ar is precisely how one can actually access multiple files with the same names. This seemed like a perfect solution here... except that it was removed

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130708/180847.html

Remove the 'N' modifier from llvm-ar.

* It is not present on OS X.
* It is untested.
* It is not needed for using ar in a build system.

So this was intentionally removed from llvm-ar, and I'm fairly sure there's no point in filing a bug to ask for it to be re-implemented. So LLVM is not going to help us here.

The only solution seems to be to write a new archive file parser. If someone can find a nice standalone one, that seems like a possibility. Writing one from scratch sounds like a bad idea - we'd need to debug it on multiple platforms, etc. etc.

@kripken
Copy link
Member

kripken commented Jan 30, 2015

I can't find a workaround here. Deleting entries from a .a does not let us read them all out one by one, even slowly. And there does not seem to be a good standalone ar handler.

I pushed a test to verify that we issue a proper warning on this. As the workaround is fairly simple once the problem is known - rename a few source files, as mentioned in the warning text - I think that's the best we can do here.

@xxuejie
Copy link
Contributor

xxuejie commented Jan 31, 2015

Thanks for the help! Webruby now works by ignoring the .a file and links all .o files directly. I will verify if I can reproduce the same bug with LLVM 3.5 and report it back to mruby. What's strange, it seems llvm-ar in LLVM 3.5 distribution does not play well with its own linker out of box.

@DopefishJustin
Copy link
Contributor Author

There ended up being too many affected files in MESS for renaming to be a good workaround so I used some makefile tricks to use .bc files instead of .a: mamedev/mame@31bc7f4

@bkaradzic
Copy link

I ran into this issue only with emscripten toolchain. Can't you fix it by adding random numbers at the end of filename that goes into archive? Does it really matter what name is inside archive?

@kripken
Copy link
Member

kripken commented Mar 23, 2015

Well, if the user inserts file.o and later tries to extract that entry by name, then if we added random characters that would be broken. That is definitely an interesting idea, though - perhaps it's better to have that broken than the current problem? Perhaps it's very rare for people to manually extract from archives themselves, and only the linker does?

@bkaradzic
Copy link

Another idea is that once you start extracting those files get their hash, and then use hash to identify identical files, not by name. I have case where files have identical name but their content is completely different. I assume you want to remove redundant content when linking and you're using name to identify that which causes this issue.

@kripken
Copy link
Member

kripken commented Mar 23, 2015

The problem is that llvm-ar can only extract one file out of all the files with identical basenames in the archive. The only way to fix it would be to modify llvm-ar, however it already had such an option and they removed it, so I'm not sure they would even be interested.

@bkaradzic
Copy link

Ah ok, emar maybe then needs to add some value to make names unique. I don't think it's ever needed to extract exact file-name anyway (not even possible as you can only extract one with given name).

@kripken
Copy link
Member

kripken commented Mar 26, 2015

It is possible to extract individual files by name, and on native ar's except for OS X, apparently even when there is an identical basename. It's possible no one uses this functionality, though.

Making the names unique isn't trivial though - should the same full path lead to the same unique name? or not? I'm not sure.

Regardless, if this would be a useful option for people, I'm not opposed. But it would need to be an option and not the default behavior, I think.

@bkaradzic
Copy link

I can pack foo/bar.o and bar/bar.o into the same library and not have any issues linking on any platform other than Emscripten. Solution like foo_bar.o, bar_bar.o would be acceptable too.

@kripken
Copy link
Member

kripken commented May 22, 2015

Let's move discussion to #2142 which is still open and focused on the last issue here. I'll give an update in there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants