-
-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spurious duplicate symbol error #524
Comments
What program are you trying to build? And what distro? I want to reproduce the issue on my machine. |
I run it on ubuntu 22.04 but I think it will have the same effect on 20.04. The repo to reproduce: https://github.com/romange/helio build prerequisites: to configure: to build: |
Getting something similar when I enable LTO with 1.2.1 (no issues with 1.0.3 that falls back to ld.bfd for LTO) : |
I can reproduce this. Let me take a look. |
What I know so far: this happens for TLS usage in statically linked libraries and probably has to do with some GNU-related behavior. One of the
While the lto_trans symbols look like this:
The problem should go away if we treat these |
GNU unique symbols have a weird semantics, and it's no longer used actively. So it's odd that Ubuntu 22.04 distributes an object file containing GNU unique symbols. That object file must have been built with a misconfigured compiler or with the |
I looked at the same file on Arch and it also has UNIQUE. Maybe Boost is setting such a flag?
It seems like it can link it without error. |
Possibly.
That's odd too. Does lld handle GNU unique symbols as weak symbols? |
I'll see if it has the flag in the build configuration. FWIW, the Boost I used was built directly from source (the repro required a certain version).
It does not. I do wonder if it eliminates such symbols before checking for duplicates, though. Also, not sure but should the visibility of |
The |
I'm very sorry that I made some mistakes during debugging, and ld.bfd was actually used when I said I used lld. I guess it's possibly some bfd specific behavior related to |
As for
|
If GNU unique is enabled by default and gcc emits GNU unique symbols aggressively, the symbols we are discussing about should have been consistently of type GNU unique. It's odd that one object file contains the symbols as GNU unique, and another file has them as non-GNU unique symbols. What caused that symbol's inconsistency? Also, if GNU unique is disabled, I believe gcc emits symbols as weak defined symbols instead of regular strongly defined symbols. But here we get global symbols instead of weak ones. I don't know what that means, but it looks odd. |
For the reference, this is what GCC docs say:
|
Used it(-fno-gnu-unique) to rebuild boost and now I can link succesfully with mold (1.2.1) having LTO enabled |
OK, I figured out whatever nefarious logic gold was using: Apparently the GCC linker plugin will emit most of the ltrans symbol as @rui314 If I understand correctly, it seems that mold's approach to symbol resolution is a little bit different from gold's. Is it true that mold just throws the stub symbols away before resolution so we don't have the same "replace LTO stub symbols with the ltrans file" process? If so, do you think it's a sensible approach to solve this would be to have a separate pass that copies over the st_bind values? |
It feels like a bug of the GCC linker plugin. Do you mind if I ask you to file a bug against GCC? Or you can fix it yourself and send a patch to them if you want to. As to how to fix the issue, I think it's sensible to copy st_bind values from symbols in IR files to the resolved symbols as the last step of LTO. |
Sounds good, will do.
Thanks for confirming. I'll work on a fix. |
After looking into this again, it turned out that the cause is rather different: GCC LTO plugin does not emit the object file with correct I filed this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105933. The strategy for us is largely unchanged: we just copy over the attributes we got from the plugin. I don't think it's worth constructing another symbol hash table though, so I'm just looking over the symbols and see if's currently resolved to an IR symbol. If it is, then we get the information from that; if it isn't, then we assume that the LTO symbol can't win the resolution and drop the symbol instead. WIP branch is here. There is still another issue that seems to be hitting us though: GCC LTO plugin doesn't emit COMDAT sections, and we're doing conflict resolution using the fallback path with WEAK etc. In this case, the So the question here is:
|
I cannot answer to the question regarding the WEAK symbol without experiments. The current symbol strength rules is chosen based on a large scale experiment (by compiling 10,000 Gentoo packages), and changing the rule may result in an unexpected failure in other package. It may still be worth it, but I cannot predict the outcome without actually trying it. I think there's another way to fix the issue. Currently, we redo symbol resolution from scratch after LTO, but we don't need to do that. Instead of discarding all symbols and redo symbol resolution, we can override IR symbols with compiled LTO symbols. This is actually what lld does. Since LTO can introduce a new undefined symbol, we need to handle the case that a new object file is pulled out from an archive though. I don't know if it's a good idea, but it looks like it's one way to solve this issue. |
Sorry for the delay. On second thought, I'm not sure if this actually solves the issue. In the branch I linked above, I'm already copying as much information from the IR symbols over to the LTO symbols which effectively ensures it results in the same resolution. However, even if we did not throw away the IR symbols, it will likely get the same duplicate symbol error later in the process because 1. GCC doesn't include COMDAT information in IR objects nor LTO objects and 2. our duplication resolution seems to not like cases with a WEAK from object file and UNIQUE from static library. So really here we need to teach GCC to emit COMDAT or get the duplicate checker to not complain about this case, I think. |
Sorry, this causes so much trouble 👐🏼 |
How difficult it is to make mold not to complain about symbol duplication error? We don't need to de-duplicate COMDAT groups (our UNIQUE symbol handling isn't perfect, and having redundant copies of function code in the output is acceptable). We just suppress the symbol duplication error for this case. |
It's... tricky. The duplicate checker is written in a manner that assumes COMMON symbols always overrides WEAK symbols. Under such assumption, you only need to check the top 2 matches, and if both are COMMON, then there's a duplicate. But here we're overriding a UNIQUE symbol with an object file's WEAK symbol. If we allow WEAK as the top match and UNIQUE (in staticlib) as the second match, then there might be a third UNIQUE match in staticlib that we didn't check that is an actual duplicate. That said, I need to do more research on whether this is considered an actual issue (I don't understand why staticlib symbols are demoted right now). |
was able to reproduce in another way:
this -flto asymmetricity looks weird but it can easily happen when CMake pybind11_add_module is used. interestingly, pybind11_add_module : https://github.com/pybind/pybind11/blob/v2.9.2/tools/pybind11Tools.cmake#L195 Debian 11 bullseye, stock boost 1.74 |
An update. The archive UNIQUE vs object WEAK problem I mentioned actually has a clear answer: if the archive file is reached from one of the objects (in old terms, extracted), then it's just treated as a normal object file. There's no ambiguity here; the UNIQUE one wins. More problematic is the hack we're using to deal when resolving UNIQUE symbols. We treat all UNIQUE symbols as WEAK in terms of resolution priority: Line 828 in a8559be
But this breaks the soundness of the resolution system. As a result, a WEAK symbol (with higher file priority) could become the file resolution in presence of a non-eliminated UNIQUE symbol, and that's exactly where the false-positive duplicate is coming from, after my prototype patch to copy st_bind from the IR objects. This did not happen in non-LTO, because everything was guarded with a COMDAT, but now with this edge case it becomes a trouble. I've been thinking about doing COMDAT elimination before symbol resolution. That way we can avoid this soundness-breaking hack altogether. Although it doesn't look like just reordering the passes works; in particular, mergeable sections gets initialized as (The following diff is what I have been doing as a part of proposal above. It doesn't fix this issue (524). Also it doesn't cover the case LTO yet.)
|
That idea hasn't occurred to me, but it sounds very interesting. The more I think about it, it more feels like it's the right way to handle comdats and the symbol resolution. |
Cool, I'll experiment more with this and try to get the interaction with mergable sections resolved. |
I'm also seeing this with a (sadly closed source) linking against
If I manually drop the Is there some debugging information we can supply to disambiguate this error from this bug (ie I don't know if this is a separate issue or not). Edit: adding versions:
|
We've been treating STB_GNU_UNIQUE symbols as if they were weak. The rationale of doing is this: 1. GCC sometimes creates GNU-unique symbols instead of weak ones for comdats, and 2. after COMDAT de-duplication, we used to report a duplicate symbol error for GNU-unique symbols However, (2) is no longer the case because we do not report a symbol duplication error if its section is dead. I'm not sure if this new logic will work for all programs, but I want to give it a shot. #524
Can you try with the above commit? |
@rui314 that indeed fixes the issue for me! Built ok with:
|
Did it also pass tests? |
It did, all seems functional.and working. |
I've put around a branch for early COMDAT elimination at #810. It seems to pass all tests, including the gnu-unique test that fail in case of the naive fix. |
Just FYI #810 is now merged and it's hopefully a less problematic approach to fix the same issue. If it breaks anything for you please let me know. |
@mattgodbolt If you have time, please rebuild mold and try it to see if it works for you. I want to get a feedback before making next release. I'll close this bug because I think it is finally fixed once and for all. Thank you @ishitatsuyuki for inventing this new symbol resolution scheme and implementing it! |
I'll retry on Monday and let you know! |
With:
And I confirmed everything builds, links, runs and passes tests on our side. Thanks for the quick fix! |
I built mold 1.7.0 just now (debian 11 build in https://github.com/cielavenir/salsa-mold/releases/tag/debian%2F1.7.0%2Bdfsg-1)
This sequence is confirmed to be fixed in mold 1.7.0. Thank you for fixing. |
For me the issue (duplicate symbol) persists even with mold 1.7.0. The combination of mold + Boost static libs + LTO is causing this:
Things work only if I change either one of the three:
|
I think I actually forgot that the GCC st_bind issue still exist, so this will still fail in some cases. It doesn’t look like GCC developers are going to fix their part anytime soon, so we might want to reopen this and introduce a targeted workaround for GCC. |
spurious linker error when linking with libboost_fiber.a. with plain gcc it doesn't happen.
mold 1.2.1 (c8d8f86a52084c96e2663d9f692c51e98c04cc2f; compatible with GNU ld)
The text was updated successfully, but these errors were encountered: