Replace Marsaglia polar method with Box-Muller to generate a normally distributed random number #6556

Shihab-Shahriar · 2023-10-31T03:00:00Z

On the GPU, Box-Muller is expected to significantly outperform the current Marsaglia polar method. The reasoning can be found here:

The polar method (Press et al. 1992) is simple and relatively efficient, but the probability of looping per thread is 14 percent. This leads to an expected 1.6 iterations per generated sample turning into an expected 3.1 iterations when warp effects are taken into account.

On the CPU side, there doesn't seem to be any difference.

Before

N	CPU XorShift64 (s)	CPU XorShift1024 (s)	CUDA XorShift64 (s)	CUDA XorShift1024 (s)
1e4	0.061071	0.0757334	0.0150474	0.016794
1e5	0.600972	0.761684	0.120827	0.28777
2e5	1.19428	1.50356	0.236278	0.586949

After

N	CPU XorShift64 (s)	CPU XorShift1024 (s)	CUDA XorShift64 (s)	CUDA XorShift1024 (s)
1e4	0.0610803	0.0734002	0.0119627	0.0119717
1e5	0.60062	0.73712	0.0957707	0.0968437
2e5	1.17503	1.47986	0.195854	0.19791

(Timings were taken with one of the Kokkos examples here, which generates a 2D array with N rows and 1000 columns. All values are in seconds. CPU: AMD Ryzen 7 3700X; GPU: NVIDIA GeForce GTX 1650 Super; 16 GB RAM.)

I'm not sure why the polar method shows such a large performance gap on the GPU between the XorShift64 and XorShift1024 versions. It might be possible to tune this away with different kernel launch configurations (e.g., number of blocks and threads per block). I wanted to present the results before digging deeper.


dalg24-jenkins · 2023-10-31T03:00:03Z

Can one of the admins verify this patch?

fnrizzi · 2023-10-31T06:23:04Z

ok to test

fnrizzi · 2023-10-31T06:33:54Z

Can you please also include the code you used to generate the timings?

fnrizzi · 2023-10-31T13:12:10Z

The formatting is wrong; you need to run /scripts/apply-clang-format before pushing to the repo.

dalg24 · 2023-10-31T14:01:00Z


diff --git a/algorithms/src/Kokkos_Random.hpp b/algorithms/src/Kokkos_Random.hpp
index 0057c79cb..0ee96098c 100644
--- a/algorithms/src/Kokkos_Random.hpp
+++ b/algorithms/src/Kokkos_Random.hpp
@@ -855,9 +855,9 @@ class Random_XorShift64 {
   double normal() {
     constexpr double M_PI2 = 2.0 * Kokkos::numbers::pi_v<double>;
 
-    double u = drand();
-    double v = drand();
-    double r = Kokkos::sqrt(-2.0 * Kokkos::log(u));
+    double u     = drand();
+    double v     = drand();
+    double r     = Kokkos::sqrt(-2.0 * Kokkos::log(u));
     double theta = v * M_PI2;
     return r * Kokkos::cos(theta);
   }
@@ -1099,9 +1099,9 @@ class Random_XorShift1024 {
   double normal() {
     constexpr double M_PI2 = 2.0 * Kokkos::numbers::pi_v<double>;
 
-    double u = drand();
-    double v = drand();
-    double r = Kokkos::sqrt(-2.0 * Kokkos::log(u));
+    double u     = drand();
+    double v     = drand();
+    double r     = Kokkos::sqrt(-2.0 * Kokkos::log(u));
     double theta = v * M_PI2;
     return r * Kokkos::cos(theta);
   }

Shihab-Shahriar · 2023-10-31T16:56:06Z

@fnrizzi, here is the code I used to benchmark. Note that the random example isn't built automatically by Kokkos, even with Kokkos_ENABLE_EXAMPLES=ON, so I had to edit some CMakeLists.txt files. The CMake options I used are in the build-job.sh file at the root.

I've just pushed the formatting changes.

algorithms/src/Kokkos_Random.hpp

fnrizzi

Thank you for working on this!

Shihab-Shahriar · 2023-11-06T18:31:45Z

Thanks for your feedback and help.

Kokkos Random: Replace Marsaglia polar method with Box-muller to generate a normally distributed random number

fef22f4

Apply clang-formatting

a2fcc61

dalg24 added the Performance and Kokkos-Algorithms labels Oct 31, 2023

dalg24 approved these changes Nov 3, 2023


masterleinad approved these changes Nov 3, 2023


fnrizzi reviewed Nov 6, 2023


algorithms/src/Kokkos_Random.hpp (outdated, resolved)

fnrizzi reviewed Nov 6, 2023


algorithms/src/Kokkos_Random.hpp (outdated, resolved)

dalg24 reviewed Nov 6, 2023


algorithms/src/Kokkos_Random.hpp (outdated, resolved)

Shihab-Shahriar added 2 commits November 6, 2023 12:00

Add const qualifier to some internal variables

fbf9f25

Update Kokkos_Random.hpp

4042911

fnrizzi approved these changes Nov 6, 2023


dalg24 merged commit 0a83695 into kokkos:develop Nov 9, 2023
28 checks passed

dalg24 mentioned this pull request Nov 9, 2023

CHANGELOG: 4.3.0 #6519

Closed
