Speed up mandelbrot #742
The vector version is faster, but the parallel version is slower (from 0.96
ms to 1.17 ms). Why is this? What hardware did this run on? It says 32
threads, so I assume this is an AVX512 server? Maybe the problem size is too
small, then?
…On Tue, Sep 12, 2023 at 1:21 AM Jack Clayton ***@***.***> wrote:
@jackos <https://github.com/jackos> requested your review on: #742
<#742> Jackos/speedup mandelbrot.
We can calculate the best possible time. I guess this system has 16 physical cores, so time = 40120461 * (8.5) / frequency / 16 / 16. I don't know what the frequency is on this system, but let's assume it's about 3 GHz. Then the best time is about 0.4 ms, and the best measured time was 0.96 ms, so we are at about 42% of the best possible. We should test with larger sizes. We could increase max_iter to e.g. 1000. I usually try to get something to run for about a second. In this case it runs for less than 1 ms, and I am afraid the overhead is large. When I benchmarked I used 4000x4000 and 1000 max iterations. I think @abduld did not want to increase the default run time on the server. But
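The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the comment does not label its factors, so reading 40120461 as the total iteration count, 8.5 as cycles per iteration, and the two 16s as physical cores and SIMD lanes is an assumption, as are the function and parameter names. The 3 GHz clock is the guess made in the comment.

```python
def best_case_time_s(total_iters, cycles_per_iter, freq_hz, cores, simd_lanes):
    """Lower-bound runtime in seconds, assuming perfect scaling.

    The meaning of each factor is an assumption; the comment only gives
    the bare formula 40120461 * 8.5 / frequency / 16 / 16.
    """
    total_cycles = total_iters * cycles_per_iter
    return total_cycles / freq_hz / cores / simd_lanes


# Values taken from the comment; 3 GHz is the commenter's guess.
estimate_s = best_case_time_s(40_120_461, 8.5, 3.0e9, 16, 16)

# Best measured time reported in the thread (0.96 ms).
measured_s = 0.96e-3

# Fraction of the theoretical best that the measured run achieves.
fraction = estimate_s / measured_s

print(f"estimate: {estimate_s * 1e3:.2f} ms, fraction of best: {fraction:.0%}")
```

Without rounding, the estimate is closer to 0.44 ms than 0.4 ms, so the fraction comes out a little higher than the 42% quoted, which uses the rounded figure.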
It turns out the new version is faster on the current system. Can you please report the numbers for both the old code and the new code to show the improvement on the current system?
Addresses the improvements @zbosons pointed out.
Switched to an isolated machine with an 8-core CPU. A 960x960 image with 200 max_iters results in:
Pre change
After change