prov/opx: 1.15.1: always_inline with flatten __attribute__ causes huge gcc memory usage when LTO is used
#7916
Comments
If I understand the bug report, the problem is that the OPX provider is insane and thinks that having a 100 GB binary will somehow outperform a provider with function calls and a significantly smaller footprint. Does this seem to be the issue? Can you build libfabric, but using --disable-opx, and see if the problem is limited to that provider?
The +100 GB is not the binary size but the amount of MEMORY used by gcc during linking (I stopped the process after it had consumed 110 GB of RAM and 2.5 hours of linking, so the total CPU time and memory needed to finish the link is probably even higher).
I know trying to build opx takes about 5 minutes on my system. Can you try using --disable-opx and verify that the problem goes away? I want to confirm this is related to the opx provider.
With --disable-opx the issue does not occur. Also, for distribution binaries, should I build all providers as loadable modules? 🤔
libfabric will try to build all providers that have the required prerequisites available on the system. I don't see a strong reason to build providers as loadable modules (unless you want to update only that provider against an older build of libfabric). Even if a provider is built into libfabric, a loadable provider, if found, will be used whenever it reports a newer version number than the internal provider.
Kind of strange logic.
Suppose there is an installed version of libfabric on a system (say, packaged by RedHat or SuSE). That version may have all providers built into it. Now say vendor Intel finds a bug in their provider and wants to deploy a fix. They can do this by shipping only their provider library. This way, the system-installed version of libfabric would still be used, but it can pick up the latest Intel provider. Not everyone will necessarily be able to rebuild libfabric and re-install it on their system. This way, the other providers that might be in use (the ones built into libfabric) are not modified. This also provides a mechanism for providers that are maintained out of tree, and may not be open source, to be used with the upstream libfabric.
We are talking about providers which are part of libfabric. Please .. PS. Still ..
"Provider" is the term that we use because we needed to call these plug-ins something, and at this point it's baked into the API, so I don't see that changing. Most are not drivers in the common use of that term and are not associated with specific HW. They provide an implementation of the APIs over "something" -- shared memory, tcp sockets, udp sockets, any verbs-based device, EFA NIC, etc. -- maybe even over another provider in order to extend its functionality. Most of the API calls are function pointers that go directly into the provider. libfabric itself only exports a handful of actual APIs.
So .. in other words the opx provider code is broken because it tries to use
Looking into this |
Fixed with 779972e |
Can we close this? Tim's removal of the
If you think the issue is solved, just close the ticket.
I don't think I have the ability to close this ticket?
Describe the bug
Using always_inline together with the flatten attribute causes huge memory usage when LTO is used (I've hit 110 GB of physical RAM). All details of this case, an explanation of the issue in the libfabric code, and a patch which fixes the issue are at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106499