
Implementing future.apply::future_*apply for massively large models via remote parallelisation #136

Closed
wants to merge 7 commits

Conversation

seonghobae
Contributor

Implementing the "future_*apply" API for massively large model via remote parallelisation.

  • Purpose: speed up model calibration and evaluation for the functions that use myLapply() and mySapply() internally, especially when the model is very large (a minimal sketch of the idea follows this list). myApply() will be updated soon.
  • itemfit(), mdirt(), DIF(), DTF(), M2(), PLCI.mirt(), lagrange(), boot.LR(), and https://github.com/philchalmers/mirt/blob/master/R/03-estimation.R#L729-L740 may run faster or use memory more efficiently when working with massively large models on remote clusters.
  • If future::future() supports the MPI interface someday (see "How to implement resolved() for an MPI-based cluster?", HenrikBengtsson/future#130), mirt() may be able to run its apply()-related functions on a supercomputer cluster.
  • Speed depends on network bandwidth, but future() will run in a multiprocess manner within each remote worker, automatically detecting the number of available cores on each machine of the heterogeneous cluster.
  • This pull request may be useful to researchers who calibrate models on Virtual Private Servers (VPS) from providers such as Amazon Web Services, DigitalOcean, and Vultr.
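
A minimal sketch of the core idea, not this PR's actual code: future.apply::future_lapply() as a drop-in replacement for lapply() on a cluster of SSH-reachable workers. The worker hostnames here are hypothetical, and myLapply()/mySapply() are mirt internals that this sketch only stands in for.

library(future)
library(future.apply)

# hypothetical SSH-reachable hosts, analogous to the backend behind myLapply()
workers <- c("mpiuser@s1", "mpiuser@s2")
plan(cluster, workers = workers)

# future_lapply() has the same interface as lapply(); each element is
# evaluated on whichever remote worker becomes available
res <- future_lapply(1:8, function(i) sqrt(i))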

Demo run

# local (One core, Xeon E5-2660, using MKL)
> system.time(mod_local <- mirt::mirt(mirt::Science, 4, SE = T, method = 'MHRM'))
Stage 3 = 311, LL = -3333.5, AR(1.10) = [0.21], gam = 0.0024, Max-Change = 0.0005

Calculating information matrix...

Calculating log-likelihood...
 User  System elapsed 
 32.897   0.318  26.639 

# local (4 cores, Xeon E5-2660, using MKL)
> suppressWarnings(suppressMessages(mirt::mirtCluster()))
> system.time(mod_local_parallel <- mirt::mirt(mirt::Science, 4, SE = T, method = 'MHRM'))
Stage 3 = 311, LL = -3333.5, AR(1.10) = [0.21], gam = 0.0024, Max-Change = 0.0005

Calculating information matrix...

Calculating log-likelihood...
 User  System elapsed 
 26.404   0.310  22.438 
> suppressWarnings(mirt::mirtCluster(remove = TRUE))

# relying on heterogeneous remotes via SSH; localhost will not use memory in the log-likelihood calculation.
> getOption('kaefaServers')
 [1] "mpiuser@s1"  "mpiuser@s2"  "mpiuser@s3"  "mpiuser@s4"  "mpiuser@s5"  "mpiuser@s6"  "mpiuser@s7" 
 [8] "mpiuser@s8"  "mpiuser@s9"  "mpiuser@s10" "mpiuser@s11" "mpiuser@b1"  "mpiuser@b2"  "mpiuser@b3" 
[15] "mpiuser@b4" 
> suppressWarnings(suppressMessages(mirt::mirtCluster(getOption('kaefaServers')))) # using parallel::par*apply
> system.time(mod_remote_parallel_traditional <- mirt::mirt(mirt::Science, 4, SE = T, method = 'MHRM'))
Stage 3 = 311, LL = -3333.5, AR(1.10) = [0.21], gam = 0.0024, Max-Change = 0.0005

Calculating information matrix...

Calculating log-likelihood...
 User  System elapsed 
 26.801   0.996  51.924 
> suppressWarnings(mirt::mirtCluster(remove = TRUE))
> suppressWarnings(suppressMessages(mirt::mirtCluster(getOption('kaefaServers'), use_future = TRUE))) # using future API
> system.time(mod_remote_parallel_futureapi <- mirt::mirt(mirt::Science, 4, SE = T, method = 'MHRM'))
Stage 3 = 311, LL = -3333.5, AR(1.10) = [0.21], gam = 0.0024, Max-Change = 0.0005

Calculating information matrix...

Calculating log-likelihood...
 User  System elapsed 
 27.305   0.888  44.218 
  • The demo runs slower on the remotes than with local parallelism at 100 Mbps bandwidth, but the future API reduced calibration time by about 7.706 seconds compared with the parallel package (51.924 s vs. 44.218 s elapsed). This difference comes from the multiprocess strategy within each connection, via automatic detection of the number of cores (see the sketch below). With more bandwidth, the speed-up should grow (e.g. InfiniBand: https://en.wikipedia.org/wiki/InfiniBand).
  • Once I implement future_*apply(), remote calibration should be faster than it is now.
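
A minimal sketch of the nested topology described above, assuming the same hypothetical host list as the demo: an outer cluster level with one worker per remote host, and an inner multiprocess level so that the core count is detected automatically on each machine.

library(future)

servers <- getOption('kaefaServers')  # as in the demo above

# outer level: one cluster worker per remote host;
# inner level: multiprocess within each host, so availableCores()
# is resolved separately on every (possibly heterogeneous) machine
plan(list(
  tweak(cluster, workers = servers),
  multiprocess
))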

Please feel free to review my request.

Best,
Seongho

@philchalmers
Owner

I'll consider this, but the case for the merge doesn't seem that great, in that it's almost overkill. The parallel processing may work better under the future framework, but I don't really see the benefit over the current parallel package scheme, even with physical InfiniBand support. I think a case needs to be made that it will actually improve performance; otherwise, maintaining such a codebase isn't really in my interest.

@seonghobae
Contributor Author

Yes, this commit may seem like overkill for the speed-up. After implementing the future_*apply functions, I may commit parallelised C++ code (RcppParallel) and GPU matrix calculations (gpuR) for the parallelised cluster, i.e. a heterogeneous computing environment. This commit is just the starting point for the performance improvements. I have understood the calibration speed issue around mirt() for a long time. I'll keep updating these commits steadily. I would like to hear your opinion.

Future Plan

  • Parallelising the calculations across machines using the future API with automatic load balancing. (Current)
  • Parallelising the C++ code within each machine for the base R environment, where Intel MKL is not available.
  • Applying GPU matrix estimations in some parts if a GPU is detected (a sketch follows below).

※ I'm a user of https://www.top500.org/system/177987 now.
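
A minimal sketch (not part of this PR) of the gpuR item in the plan above: detect a GPU at run time and fall back to plain BLAS when none is found. The matrix here is arbitrary demo data.

library(gpuR)

X <- matrix(rnorm(1000 * 100), nrow = 1000)

if (detectGPUs() > 0) {
  gX <- gpuMatrix(X, type = "float")  # copy the matrix into GPU memory
  XtX <- crossprod(gX)                # X'X computed on the GPU via OpenCL
} else {
  XtX <- crossprod(X)                 # ordinary BLAS fallback
}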

@seonghobae
Contributor Author

I expect future_*apply will open a way to parallelise the E step and M step both across and within machines, with some code refactoring. That's why I committed future_*apply first, as a heads-up; a sketch of the idea follows below.

(This work may require modifying some of the calibration code, even though gpuR is a wrapper for OpenCL support in the R environment!)
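
A minimal sketch of that idea, with a stand-in computation rather than mirt's actual E-step: under the future API, the same future_lapply() call runs within one machine or across machines depending only on the chosen plan(), so refactored E/M-step code would not need separate inter- and intra-machine branches.

library(future.apply)

plan(multisession)  # intra-machine
# plan(cluster, workers = c("mpiuser@s1", "mpiuser@s2"))  # inter-machine

# hypothetical per-block computation standing in for an E-step chunk
partial <- future_lapply(1:4, function(block) {
  sum(dnorm(rnorm(1e5), log = TRUE))
})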
@seonghobae
Contributor Author

seonghobae commented Feb 21, 2018

I want to implement OpenCL support via future_*apply: less effort, but maximising efficiency and speed. So I made a placeholder for OpenCL support via mirtCluster().

Short documentation of OpenCL support in R:
https://secure.hosting.vt.edu/www.arc.vt.edu/wp-content/uploads/2017/04/GPU_R_Workshop_2017_slidy.html
https://rpubs.com/christoph_euler/gpuR_examples
http://bobby.gramacy.com/teaching/asc/gpu_tutorial.html

@philchalmers
Owner

Parallelizing the E- and M-steps would take a fairly large amount of code rewriting. As it stands, the E-step could be partially rewritten to use OpenMP, but its computations are not strictly parallel (at least not embarrassingly so) because the expectation table must be shared across nodes, causing write-permission conflicts. I experimented with this idea a while back, but gave up on it when the performance did not improve due to all the #pragma flags. The M-step in its current form also could never be run in parallel, nor could the required derivative computations, simply because R's object types are not typesafe.

These are, what some would consider, the major limitations of mirt, which I will never change. If I get a sabbatical one year, maybe a performance-inspired mirt2 could be created instead... but it couldn't have all the features that are only possible because of R's interpretive interface.

@seonghobae
Contributor Author

Yes, I saw the parts you noted, and I agree with you. At the moment it seems too hard to improve the C++ level through refactoring. I already know #pragma may not be useful in some situations and will not work well. Also, outside of the C++ parts, some functions require sequential derivative computations in for() loops, as far as I can see. However, while implementing the future API during this work, I found I may be able to parallelise some for() loops in the R code (a sketch follows below). Moreover, GPU wrappers for solve(), crossprod(), and so on may help to improve calibration speed immediately. (I know that is not at the C++ level, so it may not be helpful to Windows users.) I'm reading the code more and will then update.
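
A minimal sketch of the kind of R-level for() loop that could be parallelised this way; pars and gradient_fn are hypothetical stand-ins, not mirt internals. Only loops whose iterations are independent qualify.

library(future.apply)
plan(multisession)

pars <- replicate(8, rnorm(5), simplify = FALSE)
gradient_fn <- function(p) 2 * p  # analytic gradient of sum(p^2), as a stand-in

# sequential original:
# grads <- vector("list", length(pars))
# for (i in seq_along(pars)) grads[[i]] <- gradient_fn(pars[[i]])

# future-based replacement for independent iterations:
grads <- future_lapply(pars, gradient_fn)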

@seonghobae
Contributor Author

How about implementing this at the C++ level? http://viennacl.sourceforge.net/viennacl-about.html
An R wrapper function for ViennaCL is available.

@philchalmers
Owner

I'm going to close this for now, as I just don't see the benefit to it in this package. Something like this should be applied to a fork of mirt, though, or better yet a complete re-write of the package intended for performance (mirt2, maybe?).
