## OpenMP benchmark for Rcpp-based code

Here I test whether OpenMP multithreading speeds up some of the computations in `mvsusieR`.

In [1]:
attach(readRDS('em_optim_difference.rds'))

Here, the sample size `N` is around 800 and the number of variables `P` is around 600; 50 conditions are involved.

In [2]:
X = cbind(X,X,X)

In [3]:
dim(X)

In [4]:
dim(Y)

In [5]:
devtools::load_all('~/GIT/software/mvsusieR')
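# Fit a deep copy of model `m` to data `d` using `n_thread` OpenMP threads.
# Cloning keeps each benchmark run independent of the others.
omp_test = function(m, d, n_thread) {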
    x = m$clone(deep=TRUE)
    x$set_thread(n_thread)
    x$fit(d)
    return(0)
}

Loading mvsusieR

Loading required package: mashr

Loading required package: ashr

Loading required package: susieR



I will benchmark it on my computer, which has 40 CPU threads, using 1 to 96 threads.
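
Before choosing thread counts, it can help to confirm how many logical CPUs R sees. The following is a minimal sketch using the base `parallel` package; the names `n_logical`, `thread_grid`, and `oversubscribed` are introduced here for illustration and are not part of the benchmark itself.

```r
# Number of logical CPUs visible to R (can be NA on some platforms).
n_logical <- parallel::detectCores(logical = TRUE)

# Thread counts used in the benchmarks in this notebook.
thread_grid <- c(1, 2, 3, 4, 8, 12, 24, 40, 96)

# Flag settings that request more threads than there are logical CPUs.
oversubscribed <- if (is.na(n_logical)) {
    rep(NA, length(thread_grid))
} else {
    thread_grid > n_logical
}
data.frame(n_thread = thread_grid, oversubscribed = oversubscribed)
```

Settings flagged as oversubscribed ask for more threads than there are logical CPUs, so they are unlikely to help.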

## Center and scale the data

In [11]:
d = DenseData$new(X,Y)
d$standardize(TRUE,TRUE)  # center and scale the data
d$set_residual_variance(resid_Y)

mash_init = MashInitializer$new(list(diag(ncol(Y))), 1)
B = MashRegression$new(ncol(X), mash_init)

In [12]:
res = microbenchmark::microbenchmark(c1 = omp_test(B, d, 1),
c2 = omp_test(B, d, 2), c3 = omp_test(B, d, 3),
c4 = omp_test(B, d, 4), c8 = omp_test(B, d, 8),
c12 = omp_test(B, d, 12), c24 = omp_test(B, d, 24),
c40 = omp_test(B, d, 40), c96 = omp_test(B, d, 96),
times = 30
)

In [13]:
summary(res)[,c('expr', 'mean', 'median')]

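| expr | mean | median |
|------|------|--------|
| `<fct>` | `<dbl>` | `<dbl>` |
| c1 | 161.0818 | 136.147 |
| c2 | 170.6787 | 119.054 |
| c3 | 175.371 | 110.2931 |
| c4 | 135.8872 | 118.4377 |
| c8 | 170.4492 | 125.5141 |
| c12 | 151.2837 | 131.4356 |
| c24 | 145.8516 | 124.3913 |
| c40 | 224.2847 | 163.7604 |
| c96 | 345.9077 | 335.4519 |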


There is no advantage here, as expected: when the data are centered and scaled, the parallelization happens at the mixture prior level. Since only one mixture component is used, there is nothing to parallelize. Requesting more threads than the machine has (96) is clearly slower.

## Do not center and scale the data

This is more computationally intensive than the previous run because `sbhat` is now different for every variable. But the parallelization now happens at the variable level.
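
To see why standardizing gives a common `sbhat`, consider a univariate sketch on simulated data. It assumes the usual simple-regression standard error `sqrt(sigma2 / sum(x^2))` for a centered column `x` with known residual variance `sigma2`, and compares centered-only columns with centered-and-scaled ones. The names `X_sim`, `sigma2`, and `sbhat_of` are hypothetical, and the multivariate computation in `mvsusieR` may differ in detail.

```r
set.seed(1)
n <- 100
p <- 5
# Simulated predictors with deliberately different column scales.
X_sim <- matrix(rnorm(n * p), n, p) %*% diag(1:p)
sigma2 <- 1  # assumed known residual variance

# Standard error of the simple-regression coefficient for each column.
sbhat_of <- function(X) sqrt(sigma2 / colSums(X^2))

sbhat_raw <- sbhat_of(scale(X_sim, center = TRUE, scale = FALSE))  # centered only
sbhat_std <- sbhat_of(scale(X_sim, center = TRUE, scale = TRUE))   # centered and scaled

print(sbhat_raw)  # one value per variable, all different
print(sbhat_std)  # identical: every column has sum of squares n - 1
```

With a common `sbhat`, per-variable quantities that depend only on it can be computed once and shared; without it, they have to be recomputed for each variable, which is the work that gets spread across threads.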

In [6]:
d = DenseData$new(X,Y)
d$standardize(FALSE,FALSE)  # neither center nor scale the data
d$set_residual_variance(resid_Y)

mash_init = MashInitializer$new(list(diag(ncol(Y))), 1)
B = MashRegression$new(ncol(X), mash_init)

In [16]:
res = microbenchmark::microbenchmark(c1 = omp_test(B, d, 1),
c2 = omp_test(B, d, 2), c3 = omp_test(B, d, 3),
c4 = omp_test(B, d, 4), c8 = omp_test(B, d, 8),
c12 = omp_test(B, d, 12), c24 = omp_test(B, d, 24),
c40 = omp_test(B, d, 40), c96 = omp_test(B, d, 96),
times = 30
)

In [17]:
summary(res)[,c('expr', 'mean', 'median')]

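| expr | mean | median |
|------|------|--------|
| `<fct>` | `<dbl>` | `<dbl>` |
| c1 | 359.0996 | 320.164 |
| c2 | 229.466 | 207.2559 |
| c3 | 215.6167 | 180.4148 |
| c4 | 219.5334 | 178.681 |
| c8 | 171.594 | 146.5264 |
| c12 | 175.7622 | 152.8917 |
| c24 | 142.9345 | 125.4073 |
| c40 | 168.9303 | 150.1708 |
| c96 | 322.8361 | 305.4616 |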


We see some advantage from using multiple threads here. The median run time generally decreases as the number of threads increases, reaching its minimum at 24 threads; it is somewhat higher at 40 threads (the capacity of my computer) and much worse at 96. Four threads seem to strike a good balance, reducing the median compute time by about 44%; 8 threads reduce it by more than half.
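
The speedups can be read off the table directly. The snippet below copies the median column from the output above and divides the single-thread time by each entry; `median_time` and `speedup` are names introduced here for illustration.

```r
# Median run times copied from the table above (same units as reported).
median_time <- c(c1 = 320.164, c2 = 207.2559, c3 = 180.4148, c4 = 178.681,
                 c8 = 146.5264, c12 = 152.8917, c24 = 125.4073,
                 c40 = 150.1708, c96 = 305.4616)

# Speedup relative to the single-thread run.
speedup <- median_time[["c1"]] / median_time
print(round(speedup, 2))

# Setting with the shortest median run time.
print(names(which.min(median_time)))
```

A speedup above 2 means the median run time has been cut by more than half.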

## Center and scale data but using mixture prior

Here, since we are using a mixture prior, the advantage of parallelization should kick in: with a common `sbhat`, we parallelize over the prior mixture components.

In [10]:
d = DenseData$new(X,Y)
d$standardize(TRUE,TRUE)  # center and scale the data
d$set_residual_variance(resid_Y)

mash_init = MashInitializer$new(create_cov_canonical(ncol(Y)), 1)
B = MashRegression$new(ncol(X), mash_init)

In [11]:
res = microbenchmark::microbenchmark(c1 = omp_test(B, d, 1),
c2 = omp_test(B, d, 2), c3 = omp_test(B, d, 3),
c4 = omp_test(B, d, 4), c8 = omp_test(B, d, 8),
c12 = omp_test(B, d, 12), c24 = omp_test(B, d, 24),
c40 = omp_test(B, d, 40), c96 = omp_test(B, d, 96),
times = 30
)

In [12]:
summary(res)[,c('expr', 'mean', 'median')]

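| expr | mean | median |
|------|------|--------|
| `<fct>` | `<dbl>` | `<dbl>` |
| c1 | 489.7533 | 478.0427 |
| c2 | 344.7106 | 323.2162 |
| c3 | 300.3792 | 258.1757 |
| c4 | 269.4045 | 244.0847 |
| c8 | 242.0541 | 210.5421 |
| c12 | 232.5791 | 215.5211 |
| c24 | 246.1973 | 216.6343 |
| c40 | 273.2946 | 244.1338 |
| c96 | 533.4972 | 541.2539 |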


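The advantage of using multiple threads is obvious when the mixture prior has a large number of components (about 60 here, for the canonical prior). The median run time drops by more than half with 8 to 24 threads, but it rises again at 40 threads and, at 96 threads, exceeds the single-thread time.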
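
A rough way to quantify the diminishing returns is parallel efficiency, the speedup divided by the number of threads. This sketch uses the median column from the table above; `n_thread`, `median_time`, and `efficiency` are illustrative names.

```r
# Thread counts and median run times copied from the table above.
n_thread <- c(1, 2, 3, 4, 8, 12, 24, 40, 96)
median_time <- c(478.0427, 323.2162, 258.1757, 244.0847, 210.5421,
                 215.5211, 216.6343, 244.1338, 541.2539)

# Speedup relative to one thread, and speedup per thread used.
speedup <- median_time[1] / median_time
efficiency <- speedup / n_thread

print(data.frame(n_thread,
                 speedup = round(speedup, 2),
                 efficiency = round(efficiency, 2)))
```

Efficiency falls with every added thread, so a small thread count captures most of the gain at a fraction of the resources.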