Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__wasm_apply_data_relocs is absolutely enormous #55608

Open
k1nder10 opened this issue May 20, 2022 · 13 comments
Open

__wasm_apply_data_relocs is absolutely enormous #55608

k1nder10 opened this issue May 20, 2022 · 13 comments

Comments

@k1nder10
Copy link

We are porting a big project (>1'000'000 lines of code) to wasm using emscripten. Everything had worked fine until we started using dynamic linking (MAIN_MODULE=2 & SIDE_MODULE=2). Since then we are running into the issue of exceeding the function body size limit. This function is __wasm_apply_data_relocs and it's added by wasm-ld I believe.
We also noticed that this function is huge even in the targets where it fits the limit. (10-100x times bigger than the functions from our codebase). It feels like it does more work than actually needed.

What does this function actually do?
What does the size of this function depend on?
Are there any flags/features we can use to reduce its size?

It must be a common problem for every big project. Is it possible to split this function from our/your side somehow?

@llvmbot
Copy link
Collaborator

llvmbot commented May 20, 2022

@llvm/issue-subscribers-backend-webassembly

@sbc100
Copy link
Collaborator

sbc100 commented May 20, 2022

This function applies data relocations. Whenever the address of a symbol is stored in linear memory, but that address is not known until runtime (this is true for all side module symbols, as well as undefined main module symbols), we need to update that location once the modules are loaded and locations are fixed.

I'm not sure what could cause the massive explosion of data relocations, but I can imagine it could be the vtables used by C++?

There are a few things we can do:

  1. Figure out if the overall number of relocations can be reduced. I'm not sure how easy it is to inspect the relocations or to have wasm-ld report them, but we should investigate why you have so many in our program.
  2. Move to a more efficient encoding.. Perhaps we can find a way to reduce the size of the function by performing the relocations with fewer instruction.
  3. Split up the function into chunks (as you suggested)

@k1nder10
Copy link
Author

What do you suggest to us to get around this problem from our side? We were even thinking of manually splitting up this function in .wasm file though it's kinda ridiculous.

Are you guys planning on fixing this issue somehow? It seems like no big project can use dynamic linking because of it.

@sbc100
Copy link
Collaborator

sbc100 commented May 24, 2022

I think (1) needs to be investigated. The number of relocations should not be that high. Is it the side module or the main module is has the very large function?

Is your program using a lot C++ virtual functions? Do the main and side modules share a lot of C++ classes over the boundary. Emscripten recently switched to visibility=default which has been causing trouble for some folks. Can you try compiling with -fvisibility=hidden?

edit: recommend visibility=hidden

@k1nder10
Copy link
Author

Sorry, have to tell you that we build with SIDE_MODULE=1 & MAIN_MODULE=1. We figured out that DCE (side_module=2) is not an option for us as for many reasons.

This is the main module where the function exceeds the limit.
Yes, our program has a lot of virtual functions. Yes, side modules and the main module share many C++ classes.
Does it make sense to test -fisibility=default for SIDE_MODULE=1?

@sbc100
Copy link
Collaborator

sbc100 commented May 31, 2022

Can you try again with the latest emsdk (emsdk install tot) ? I'm curious to see if 198815e helped in your case.

Also it is the side module all the main module that contains the huge relocation function? I would suggest building with -fisibility=hidden, yes (sorry I meant to say -fvisibility=hidden in my previous comment), although any code that you share between the main the side module would then need to be tagged in the source code as __attribute__(((visibility("default")))

@k1nder10
Copy link
Author

Yes, the commit you mentioned does help us. Though the function is anyway too big. Probably we'll hit the limit in the near future when more code is written.

@k1nder10
Copy link
Author

I can see you're discussing it here emscripten-core/emscripten#17374
Splitting this function up into chunks would be really great!

@sbc100
Copy link
Collaborator

sbc100 commented Jul 13, 2022

One reason I'm hesitant to invest in splitting up the function is that it doesn't address the overall code size issue.

@k1nder10
Copy link
Author

I see. However, we always hit the function size limit when experimenting with static/dynamic libs/-O -g flags, and many more. We have to keep in mind the limit all the time and go around it. Besides, we have to write code in such a big codebase thinking about data relocations, which is absolutely insane. So, it would at least enable us to do many common things and not think about this stuff. IMO it is absolutely worth splitting this up.

@k1nder10
Copy link
Author

k1nder10 commented Jul 14, 2022

You can leave it as a temporary solution until you find a better way to sort it out

@sbc100
Copy link
Collaborator

sbc100 commented Jul 14, 2022

I agree, in the absence of better solution, simply splitting it up into chunks seems like a reasonable solution. I'm not sure when I will get time to work on it though.

@k1nder10
Copy link
Author

k1nder10 commented Dec 15, 2022

I uploaded a patch to split this up. Can you review that? https://reviews.llvm.org/D140111
@sbc100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants