Add OpenMP support #189
I experimented with this a bit:

```r
library(unmarked)
library(microbenchmark)

n <- 10000   # number of sites
T <- 4       # number of primary periods
J <- 3       # number of secondary periods
lam <- 3
phi <- 0.5
p <- 0.3

y <- array(NA, c(n, T, J))
M <- rpois(n, lam)     # Local population size
N <- matrix(NA, n, T)  # Individuals available for detection

for(i in 1:n) {
  N[i,] <- rbinom(T, M[i], phi)
  y[i,,1] <- rbinom(T, N[i,], p)   # Observe some
  Nleft1 <- N[i,] - y[i,,1]        # Remove them
  y[i,,2] <- rbinom(T, Nleft1, p)  # ...
  Nleft2 <- Nleft1 - y[i,,2]
  y[i,,3] <- rbinom(T, Nleft2, p)
}

y.ijt <- cbind(y[,1,], y[,2,], y[,3,], y[,4,])
umf1 <- unmarkedFrameGMM(y=y.ijt, numPrimary=T, type="removal")

microbenchmark(
  m1 = gmultmix(~1,~1,~1, data=umf1, K=30, nthreads=1),
  m2 = gmultmix(~1,~1,~1, data=umf1, K=30, nthreads=2),
  m4 = gmultmix(~1,~1,~1, data=umf1, K=30, nthreads=4),
  times=5)
```

```
Unit: seconds
 expr      min       lq     mean   median       uq      max neval cld
   m1 9.099484 9.111422 9.372926 9.418311 9.614503 9.620908     5   b
   m2 5.414858 5.548022 5.629184 5.631519 5.680126 5.871395     5  a
   m4 5.349892 5.388611 5.446901 5.482736 5.503174 5.510094     5  a
```

It looks like there is definitely a performance improvement. However, 2 threads and 4 threads performed the same for me. My laptop has 2 cores and 2 threads per core, so that might be related. My CPU usage did spike to 100% for the 4-thread test. I had to change the indexing around a bit.
Sounds promising. You may have had some other programs using up memory on the other two threads. What OS were you using? I've had great luck lately with OpenMP on Ubuntu. BTW1: https://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-basics.html BTW2: I'm thinking
I'm also on Ubuntu. I have a pretty limited understanding of cores, threads, and hyperthreads, but it seems like going beyond 2 threads on my CPU gets into hyperthreading, and you aren't expected to see much of a performance boost from that. I have access to another Linux machine with many more cores, so I'll compile it on that and see if I can get performance to scale further. It looks like you can enable OpenMP conditionally pretty easily:

```cpp
#pragma omp parallel for if(condition_holds)
for(...) {
}
```

The other issue is whether this code would compile without issues on CRAN/Windows/Mac. For example, Dirk warns about this in section 2.10.3 here: I agree this is best kept in a branch for the time being. Maybe it could be worked on together with modernizing the interface with Rcpp that I mentioned a while back.
On a machine with 4 cores and 2 threads per core, testing 1, 2, 4, and 8 threads:

```
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
   m1 637.3505 637.6977 639.8118 640.2461 640.2783 643.4865     5
   m2 367.2343 369.1187 370.4815 369.8476 370.8641 375.3429     5
   m4 231.8589 233.7292 237.0996 236.9478 239.3805 243.5815     5
   m8 218.4185 219.7414 231.8048 220.5617 221.1067 279.1956     5
```

Continued speed-up with more cores, and as before, going beyond the number of cores doesn't speed things up as much.
OK, great. I think there are some other issues that we have to be careful about too. For example, I don't think you can use Rcout or other functions for sending messages or exceptions back to R while in parallel. The bigger issue is that you aren't supposed to call the R API at all in parallel, which would mean that density functions (e.g., R::dpois) wouldn't be allowed. From https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#OpenMP-support:

> Calling any of the R API from threaded code is ‘for experts only’ and strongly discouraged. Many functions in the R API modify internal R data structures and might corrupt these data structures if called simultaneously from multiple threads. Most R API functions can signal errors, which must only happen on the R main thread. Also, external libraries (e.g. LAPACK) may not be thread-safe.

If that turns out to be a problem, we could possibly replace R:: functions with stats:: functions from this thread-safe header-only library: https://github.com/kthohr/stats
I'm working on this here. For example, code for pcount. Even after reading the R-exts guidelines, it's not totally clear to me if the [...] I looked at the source code for a few of the distribution functions and I think they're relatively thread-safe. Technically, [...] On the other hand, the random number generators in the R API are definitely NOT thread-safe from what I've seen, which makes sense.
One thing I wonder about is the importance of setting `OMP_PROC_BIND`, but I can't say for sure if this is necessary. More info here and here.
Setting `OMP_PROC_BIND="spread"` has a pretty big negative impact on performance for me when I set threads to greater than the number of cores. I'm setting it in the pragma statement with `proc_bind(spread)` (see https://www.openmp.org/spec-html/5.0/openmpse14.html).

I'm not sure how to properly set the `OMP_PLACES` variable in Rcpp; it seems like it is complicated (https://stackoverflow.com/questions/62856347/set-omp-places-options-in-rcpp-with-openmp).

Regardless of where we land on the above, I wonder if we should just force-set `threads=1` for any parallel operations in unmarked, e.g. in `parboot`. Intuitively, it doesn't seem like using more threads in the likelihood calculation is going to help if multiple models are already being run in parallel.
Good points. Let's just use default affinity for now and force `threads=1` in `parboot`.
Also worth considering: Armadillo uses OpenMP automatically for some computations when it's available (see http://arma.sourceforge.net/faq.html). I've not been able to determine exactly what conditions trigger this, but it appears to happen consistently for `occuMulti`. If you try to layer additional parallelization on top of this, either by using OpenMP in C++ code or by using `parallel` in R, you get much slower runtimes than if you avoided trying to run in parallel in the first place.
Interesting. The docs make it sound like matrix multiplication and most element-wise operations will be done in parallel by default, so perhaps we don't need to do anything on our end.
Should be easy to add OpenMP statements like `#pragma omp parallel for` before for loops computing the negative log-likelihood in C++ code. Some other clauses like `shared` might be necessary. This would require a `threads` argument for each fitting function. OpenMP should result in huge speed improvements on Linux. My understanding is that support for OpenMP is limited, but getting better, on Windows and macOS.