OSX: performance issues with Clang + duplicate symbols with g++-13 #1755
Comments
I confirm that
which takes 0.057317972 seconds on my intel linux laptop, gcc compiler, no eigen::
is internally optimised to do nothing ( This restricts the area of the problem to a very tiny number of code lines, essentially what happens in " |
CULPRIT FOUND!!! On OSX, for obscure historical reasons, and because the system defines HAVE_MALLOC_ZONE_STATISTICS and HAVE_MALLOC_MALLOC_H, the innermost code for the destruction of variables would call the obscure UpdateCurrent() function to report precise memory usage. The loss of time is tremendous, and would have shown up in a profiler as an enormous number of calls to strange functions like malloc_zone_statistics() etc. Making UpdateCurrent() just return solves the speed problem: time_test4 drops to 1 sec.
Just committed the one-liner that is supposed to do wonders.
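For readers who want to see the shape of the problem, here is a minimal sketch of the pattern described above. It is not the project's actual code: only the name UpdateCurrent() and the call to malloc_zone_statistics() come from this thread, while MemStats, Variable, count_stat_calls and the `enabled` switch are hypothetical names made up for illustration. It assumes a destructor that reports memory usage on every destruction, and shows how an early return removes the per-destruction cost.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__APPLE__)
#include <malloc/malloc.h>  // malloc_zone_statistics(), malloc_statistics_t
#endif

// Hypothetical stand-in for a memory-statistics helper.
namespace MemStats {
static bool enabled = true;           // false models "just return"
static std::size_t calls = 0;         // how often statistics were gathered
static std::size_t bytes_in_use = 0;  // last reported usage (macOS only)

inline void UpdateCurrent() {
    if (!enabled) return;  // the early return that removes the cost
    ++calls;
#if defined(__APPLE__)
    malloc_statistics_t stats;
    // A null zone asks for totals over all malloc zones; doing this on
    // every variable destruction is the expensive part.
    malloc_zone_statistics(nullptr, &stats);
    bytes_in_use = stats.size_in_use;
#endif
}
}  // namespace MemStats

// Hypothetical variable type whose destructor reports memory usage.
struct Variable {
    ~Variable() { MemStats::UpdateCurrent(); }
};

// Destroys `iterations` temporaries, as a tight loop would, and returns
// how many times the statistics were actually gathered.
inline std::size_t count_stat_calls(std::size_t iterations, bool stats_enabled) {
    MemStats::enabled = stats_enabled;
    MemStats::calls = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        Variable tmp;  // destroyed at the end of every iteration
    }
    return MemStats::calls;
}

// Small self-check of the sketch.
inline void self_check() {
    assert(count_stat_calls(1000, true) == 1000);
    assert(count_stat_calls(1000, false) == 0);
}
```

With statistics enabled, a loop of N iterations gathers statistics N times; with the early return it gathers none, which is consistent with the time being lost inside loops on OSX only.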
@GillesDuvert : brilliant! Thanks. Tested on an Intel OSX machine, using the script ...
Congrats!!!!
… On 2. Mar 2024, at 15:09, Giloo ***@***.***> wrote:
CULPRIT FOUND!!!
On OSX, for obscure historical reasons, and because the system defines HAVE_MALLOC_ZONE_STATISTICS and HAVE_MALLOC_MALLOC_H, the innermost code for the destruction of variables would call the obscure UpdateCurrent() function to report precise memory usage. The loss of time is tremendous, and would have shown up in a profiler as an enormous number of calls to strange functions like malloc_zone_statistics() etc.
Making UpdateCurrent() just return solves the speed problem: time_test4 drops to 1 sec.
OK, the performance issues within FOR loops, first detected on Mac M2/M3, are in fact also present on x86_64
time_test4 : 1.06796=Total Time
The OSX versions here were compiled with the script, and OpenMP is declared as ON
(tests 4, 5, 16 and 25 are all bad, and test 2 has also regressed since clang 17 :( )
OSX gdl-1.0.2git230313 : clang 15.0.7_1
time_test4 : 21.8063=Total Time
OSX gdl-1.0.2git230420 : clang 16.0.1
time_test4 : 21.9463=Total Time
(case 2 : 0.206096 Foreach, 6000000 elements)
OSX gdl-1.0.3git231123CMake: clang 17.0.4
time_test4 : 70.3176=Total Time
(case 2 : 49.0947 Foreach, 6000000 elements)
OSX gdl-1.0.4git240222CMake: clang 17.0.6_1
time_test4 : 69.5246=Total Time
Unfortunately I cannot finish the compilation with GCC 13 because of duplicate symbols
datatypes.cpp.o
is always involved ...