Accelerating darch using MKL #18
Hi, my group wants to use darch for our DNN. Training takes very long (about 1.5 days) for our use case, so we decided to speed it up by exploiting Intel MKL, which can automatically offload some computations to our Xeon Phi coprocessors.

I have recompiled R with MKL and linked it against MKL's BLAS and LAPACK. MKL is able to offload computations to the Xeon Phi for operations like matrix multiplication. However, MKL automatic offloading does not happen when darch is running. I was wondering whether darch uses R's default BLAS and LAPACK (in this case, MKL's BLAS and LAPACK) or its own implementations. If the latter, is there a way to exploit MKL and the Xeon Phi?

Thanks,
-- Lizhong
Comments
Hello, as detailed here, MKL support is working on my test machine when gputools is (left) disabled. darch uses R's default implementations for matrix multiplication in most cases, but some algorithms have been written in C++. This provides a speedup on single-core systems but may cause a slowdown when using MKL, though certainly not to the degree that automatic offloading does not happen at all. Maybe I should provide parameters to disable these C++ implementations. Please provide more details about the parameters and dataset used to run darch so that I can reproduce the MKL issue. What behavior do you see when using darch 0.10?
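For reference, a quick way to check which BLAS a given R build actually uses, and whether large matrix products behave as expected (a minimal sketch; the matrix size is arbitrary):

```r
# Minimal sketch (matrix size is arbitrary): %*% on numeric matrices goes
# through the BLAS that R was linked against, so under MKL it runs
# multi-threaded and is eligible for Automatic Offload (enabled outside R,
# e.g. via MKL_MIC_ENABLE=1), while darch's element-wise C++ unit functions
# bypass BLAS entirely.
sessionInfo()  # R >= 3.4 prints the BLAS/LAPACK libraries in use

n <- 5000
a <- matrix(rnorm(n * n), n, n)
b <- matrix(rnorm(n * n), n, n)
system.time(a %*% b)  # multi-threaded under MKL; single-threaded under reference BLAS
```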
@saviola777 Thanks for your quick reply. We are running darch 0.12.0 and gputools is not installed.

Session info:
attached base packages: …
other attached packages: …

darch DNN command: …
Input data: the input data size is about 1,000,000 x 50.

Is version 0.10 the version without the single-core C++ optimizations?
Thanks for the feedback. I think the C++ implementation of the unit functions (more specifically, of the ELU) is to blame for the lack of multi-threading in this case. I will have to investigate how I can make use of multi-threading from within the C++ code, but I'm afraid that this is going to be non-trivial (also considering that I'm not very experienced when it comes to writing C++ code). Version 0.10 does not include the C++ optimizations, but it lacks many of the new features (e.g., it does not support ELU) and contains a number of bugs and problems which were fixed in 0.12. You can of course add your own unit functions dynamically in 0.10 if you want. There are two possible solutions to this problem (the first is sketched below):

1. Add parameters which allow the C++ implementations to be disabled, so that darch falls back to R's matrix operations (and thereby to MKL).
2. Make the C++ implementations themselves multi-threaded.
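To give an idea of what the R fallback in the first solution could look like (a minimal sketch; the name eluR and its signature are hypothetical, and darch's actual unit-function interface also involves derivatives):

```r
# Minimal sketch of an ELU in plain, vectorized R (the name eluR and the
# signature are hypothetical, not darch's actual interface). Staying in R
# avoids the single-threaded C++ path, and the surrounding matrix products
# remain ordinary BLAS (i.e., MKL) calls.
eluR <- function(x, alpha = 1) {
  ifelse(x > 0, x, alpha * (exp(x) - 1))
}

eluR(c(-2, -0.5, 0, 1.5))
#> [1] -0.8646647 -0.3934693  0.0000000  1.5000000
```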
I can't promise you an update with a fix on CRAN for a while, and I'm not sure when I'll get around to fixing this, but I will try to implement the first solution within the next few weeks so that you can check whether it solves the problem.
Just a couple of… weeks later, this should finally be fixed: I moved most C++ functions to RcppParallel, so you should see a significant speedup.
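For anyone curious, the general pattern looks roughly like this (a minimal sketch of an RcppParallel worker for the ELU, not darch's actual code; assumes the RcppParallel package is installed):

```r
# Minimal sketch of the RcppParallel pattern (not darch's actual code):
# a Worker applies the ELU element-wise, split across threads by parallelFor().
library(Rcpp)

sourceCpp(code = '
// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>

struct EluWorker : public RcppParallel::Worker {
  const RcppParallel::RVector<double> input;
  RcppParallel::RVector<double> output;
  const double alpha;

  EluWorker(const Rcpp::NumericVector input, Rcpp::NumericVector output,
            double alpha) : input(input), output(output), alpha(alpha) {}

  // Each thread processes its assigned slice [begin, end).
  void operator()(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++) {
      double x = input[i];
      output[i] = x > 0 ? x : alpha * (std::exp(x) - 1);
    }
  }
};

// [[Rcpp::export]]
Rcpp::NumericVector eluParallel(Rcpp::NumericVector x, double alpha = 1.0) {
  Rcpp::NumericVector out(x.size());
  EluWorker worker(x, out, alpha);
  RcppParallel::parallelFor(0, x.size(), worker);
  return out;
}
')

eluParallel(c(-2, 0, 1.5))
#> [1] -0.8646647  0.0000000  1.5000000
```

If needed, RcppParallel::setThreadOptions(numThreads = ...) controls how many threads such workers use.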