OpenMP: No memset in viewfill #6573
I'm at least surprised that a manual loop would be faster than |
Sorry, I was wrong on that one; it's the opposite, |
OpenMP: do not use memset for 0's only if execution space is OpenMP
Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
maybe it makes sense to add this to the benchmarks? |
We should probably do the same for the |
Just posting the slowdown observed when filling views with 0's with the OpenMP backend, using different numbers of threads, in the current
n = 100000, Fill test -1
Write time: 0.00s; Write Bandwidth: 2.26GB/s
Write time: 0.00s; Write Bandwidth: 34.84GB/s
Write time: 0.00s; Write Bandwidth: 37.99GB/s
Write time: 0.00s; Write Bandwidth: 36.58GB/s
Write time: 0.00s; Write Bandwidth: 37.45GB/s
n = 100000, Fill test 0
Write time: 0.00s; Write Bandwidth: 1.86GB/s
Write time: 0.00s; Write Bandwidth: 55.52GB/s
Write time: 0.00s; Write Bandwidth: 58.28GB/s
Write time: 0.00s; Write Bandwidth: 60.72GB/s
Write time: 0.00s; Write Bandwidth: 60.91GB/s
n = 1000000, Fill test -1
Write time: 0.00s; Write Bandwidth: 8.17GB/s
Write time: 0.00s; Write Bandwidth: 92.28GB/s
Write time: 0.00s; Write Bandwidth: 104.30GB/s
Write time: 0.00s; Write Bandwidth: 96.42GB/s
Write time: 0.00s; Write Bandwidth: 98.13GB/s
n = 1000000, Fill test 0
Write time: 0.00s; Write Bandwidth: 9.23GB/s
Write time: 0.00s; Write Bandwidth: 58.10GB/s
Write time: 0.00s; Write Bandwidth: 58.27GB/s
Write time: 0.00s; Write Bandwidth: 58.28GB/s
Write time: 0.00s; Write Bandwidth: 58.28GB/s
n = 10000000, Fill test -1
Write time: 0.00s; Write Bandwidth: 20.64GB/s
Write time: 0.00s; Write Bandwidth: 50.90GB/s
Write time: 0.00s; Write Bandwidth: 59.54GB/s
Write time: 0.00s; Write Bandwidth: 59.32GB/s
Write time: 0.00s; Write Bandwidth: 59.47GB/s
n = 10000000, Fill test 0
Write time: 0.01s; Write Bandwidth: 12.12GB/s
Write time: 0.00s; Write Bandwidth: 44.90GB/s
Write time: 0.00s; Write Bandwidth: 49.49GB/s
Write time: 0.00s; Write Bandwidth: 49.87GB/s
Write time: 0.00s; Write Bandwidth: 44.36GB/s
n = 100000, Fill test -1
Write time: 0.00s; Write Bandwidth: 3.60GB/s
Write time: 0.00s; Write Bandwidth: 64.18GB/s
Write time: 0.00s; Write Bandwidth: 64.08GB/s
Write time: 0.00s; Write Bandwidth: 62.62GB/s
Write time: 0.00s; Write Bandwidth: 65.28GB/s
n = 100000, Fill test 0
Write time: 0.00s; Write Bandwidth: 2.54GB/s
Write time: 0.00s; Write Bandwidth: 53.20GB/s
Write time: 0.00s; Write Bandwidth: 56.67GB/s
Write time: 0.00s; Write Bandwidth: 58.28GB/s
Write time: 0.00s; Write Bandwidth: 60.40GB/s
n = 1000000, Fill test -1
Write time: 0.00s; Write Bandwidth: 12.79GB/s
Write time: 0.00s; Write Bandwidth: 171.67GB/s
Write time: 0.00s; Write Bandwidth: 168.80GB/s
Write time: 0.00s; Write Bandwidth: 171.38GB/s
Write time: 0.00s; Write Bandwidth: 172.56GB/s
n = 1000000, Fill test 0
Write time: 0.00s; Write Bandwidth: 7.96GB/s
Write time: 0.00s; Write Bandwidth: 58.38GB/s
Write time: 0.00s; Write Bandwidth: 58.63GB/s
Write time: 0.00s; Write Bandwidth: 58.49GB/s
Write time: 0.00s; Write Bandwidth: 58.41GB/s
n = 10000000, Fill test -1
Write time: 0.00s; Write Bandwidth: 49.40GB/s
Write time: 0.00s; Write Bandwidth: 183.65GB/s
Write time: 0.00s; Write Bandwidth: 203.21GB/s
Write time: 0.00s; Write Bandwidth: 216.32GB/s
Write time: 0.00s; Write Bandwidth: 216.86GB/s
n = 10000000, Fill test 0
Write time: 0.01s; Write Bandwidth: 13.58GB/s
Write time: 0.00s; Write Bandwidth: 45.25GB/s
Write time: 0.00s; Write Bandwidth: 43.72GB/s
Write time: 0.00s; Write Bandwidth: 47.85GB/s
Write time: 0.00s; Write Bandwidth: 50.10GB/s
n = 100000, Fill test -1
Write time: 0.00s; Write Bandwidth: 4.51GB/s
Write time: 0.00s; Write Bandwidth: 81.81GB/s
Write time: 0.00s; Write Bandwidth: 80.01GB/s
Write time: 0.00s; Write Bandwidth: 87.36GB/s
Write time: 0.00s; Write Bandwidth: 93.83GB/s
n = 100000, Fill test 0
Write time: 0.00s; Write Bandwidth: 2.49GB/s
Write time: 0.00s; Write Bandwidth: 51.91GB/s
Write time: 0.00s; Write Bandwidth: 55.18GB/s
Write time: 0.00s; Write Bandwidth: 58.93GB/s
Write time: 0.00s; Write Bandwidth: 60.77GB/s
n = 1000000, Fill test -1
Write time: 0.00s; Write Bandwidth: 12.32GB/s
Write time: 0.00s; Write Bandwidth: 265.18GB/s
Write time: 0.00s; Write Bandwidth: 273.25GB/s
Write time: 0.00s; Write Bandwidth: 236.29GB/s
Write time: 0.00s; Write Bandwidth: 270.30GB/s
n = 1000000, Fill test 0
Write time: 0.00s; Write Bandwidth: 7.50GB/s
Write time: 0.00s; Write Bandwidth: 52.09GB/s
Write time: 0.00s; Write Bandwidth: 52.31GB/s
Write time: 0.00s; Write Bandwidth: 52.35GB/s
Write time: 0.00s; Write Bandwidth: 52.29GB/s
n = 10000000, Fill test -1
Write time: 0.00s; Write Bandwidth: 60.60GB/s
Write time: 0.00s; Write Bandwidth: 314.60GB/s
Write time: 0.00s; Write Bandwidth: 314.42GB/s
Write time: 0.00s; Write Bandwidth: 315.79GB/s
Write time: 0.00s; Write Bandwidth: 337.31GB/s
n = 10000000, Fill test 0
Write time: 0.01s; Write Bandwidth: 11.21GB/s
Write time: 0.00s; Write Bandwidth: 38.54GB/s
Write time: 0.00s; Write Bandwidth: 44.82GB/s
Write time: 0.00s; Write Bandwidth: 39.06GB/s
Write time: 0.00s; Write Bandwidth: 45.75GB/s
The difference is more pronounced with a higher number of threads. |
Adding a reference to the PR that introduced |
Seeing
on |
The PR attempts to fix issue #6480 where filling a view with 0's is very slow compared to filling the same with 1's.
The slowdown is reproducible with both the clang and g++ compilers, and with both the OpenMP and Serial backends.
Example to test the solution with OpenMP
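The example originally posted here did not survive on this page. As a stand-in, the following is a minimal sketch of how such a fill test could be timed in plain C++. It does not use Kokkos, and all function names (`bandwidth_gbs`, `loop_fill`, `memset_zero`, `time_fill`) are hypothetical. The GB/s convention of 1e9 bytes per GB is also an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Write bandwidth in GB/s, assuming 1 GB = 1e9 bytes.
inline double bandwidth_gbs(std::size_t bytes, double seconds) {
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Fill every element with `value` using a loop that OpenMP may parallelize.
inline void loop_fill(std::vector<double>& v, double value) {
  const long long n = static_cast<long long>(v.size());
  double* p = v.data();
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (long long i = 0; i < n; ++i) {
    p[i] = value;
  }
}

// Zero the whole buffer with one memset call.
inline void memset_zero(std::vector<double>& v) {
  std::memset(v.data(), 0, v.size() * sizeof(double));
}

// Run `fill(v)` once and return the elapsed wall-clock time in seconds.
template <class Fill>
double time_fill(std::vector<double>& v, Fill fill) {
  const auto t0 = std::chrono::steady_clock::now();
  fill(v);
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

A driver could call `time_fill(v, [](std::vector<double>& x) { loop_fill(x, -1.0); })` and `time_fill(v, memset_zero)` several times for each size, then pass `v.size() * sizeof(double)` and the measured time to `bandwidth_gbs`. Repeating each measurement helps separate first-touch cost from steady-state bandwidth.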
Results with OMP_NUM_THREADS=8 and gcc/11.2 on AMD EPYC. Similar behavior is observed with more threads too.
current develop
This PR
In a follow-up PR, I will add it as a benchmark.