New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU Matrix Algebra and Operators #916

Merged
merged 46 commits into from Jul 24, 2018

Conversation

Projects
None yet
7 participants
@SteveBronder
Contributor

SteveBronder commented Jun 24, 2018

Submission Checklist

  • Run unit tests: ./runTests.py test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary:

Adds basic matrix algrebra functions and operators for matrix_gpu

  1. add()
  2. copy_submatrix()
  3. copy_triangular()
  4. copy_triangular_transposed()
  5. identity()
  6. subtract()
  7. transpose()
  8. zeros()

How to Verify:

echo STAN_OPENCL=true>> make/local
echo OPENCL_PLATFORM_ID=0>> make/local
echo OPENCL_DEVICE_ID=0>> make/local
make test/unit/math/gpu/opencl_add_test && ./test/unit/math/gpu/opencl_add_test
make test/unit/math/gpu/opencl_subtract_test && ./test/unit/math/gpu/opencl_subtract_test
make test/unit/math/gpu/opencl_copy_submatrix_test && ./test/unit/math/gpu/opencl_copy_submatrix_test
make test/unit/math/gpu/opencl_copy_triangular_test && ./test/unit/math/gpu/opencl_copy_triangular_test
make test/unit/math/gpu/opencl_copy_triangular_transpose_test && ./test/unit/math/gpu/opencl_copy_triangular_transpose_test
make test/unit/math/gpu/opencl_identity_test && ./test/unit/math/gpu/opencl_identity_test
make test/unit/math/gpu/opencl_transpose_test && ./test/unit/math/gpu/opencl_transpose_test
make test/unit/math/gpu/opencl_zeros_test && ./test/unit/math/gpu/opencl_zeros_test

or

echo STAN_OPENCL=true>> make/local
echo OPENCL_PLATFORM_ID=0>> make/local
echo OPENCL_DEVICE_ID=0>> make/local
./runTests.py ./test/unit/math/ -f opencl

Documentation:

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

rok-cesnovar and others added some commits Jun 3, 2018

Merge branch 'gpu_checks' into gpu_basic_matrix_algebra
# Conflicts:
#	stan/math/gpu/opencl_context.hpp
@SteveBronder

This comment has been minimized.

Contributor

SteveBronder commented Jun 25, 2018

I only modified docs so this should be passing. @syclik ready for review

Not sure if we should discuss here or on discourse, but it looks like we should start breaking up the files in the gpu folder into subfolders. Like memory, prim, arr, err. Not sure if that can happen here or should happen in a separate PR

@SteveBronder SteveBronder changed the title from [WIP] GPU Matrix Algebra and Operators to GPU Matrix Algebra and Operators Jun 26, 2018

@SteveBronder SteveBronder requested a review from syclik Jun 26, 2018

@SteveBronder SteveBronder self-assigned this Jun 26, 2018

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jul 5, 2018

I am getting C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body. from the tests.

@SteveBronder

This comment has been minimized.

Contributor

SteveBronder commented Jul 5, 2018

Are you running this from the linux box, mac box, or your computer?

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jul 5, 2018

I get a bit farther with my laptop. There are a bunch of statements that say Beignet: "unable to find good values for local_work_size[i], please provide\n" " local_work_size[] explicitly, you can find good values with\n" " trial-and-error method." although they do not seem to harm anything. A bunch of tests do pass, but they eventually fail with

./test/unit/math/gpu/opencl_check_test --gtest_output="xml:./test/unit/math/gpu/opencl_check_test.xml"
Running main() from gtest_main.cc
[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from ErrorHandlingScalarGPU
[ RUN      ] ErrorHandlingScalarGPU.check_nan_Matrix
test/unit/math/gpu/opencl_check_test.cpp:22: Failure
Expected: check_nan(function, "xx", xx) doesn't throw an exception.
  Actual: it throws.
check_nan should be true with finite xx
[  FAILED  ] ErrorHandlingScalarGPU.check_nan_Matrix (75 ms)
[ RUN      ] ErrorHandlingScalarGPU.check_nan_Matrix_quit_nan
test/unit/math/gpu/opencl_check_test.cpp:34: Failure
Expected: check_nan(function, "xx", xx) throws an exception of type std::domain_error.
  Actual: it throws a different type.
check_nan should throw exception on NaN
[  FAILED  ] ErrorHandlingScalarGPU.check_nan_Matrix_quit_nan (2 ms)
[ RUN      ] ErrorHandlingScalarGPU.check_nan_positions
test/unit/math/gpu/opencl_check_test.cpp:45: Failure
Expected: check_nan(function, "xx_mat1", xx_mat1) throws an exception of type std::domain_error.
  Actual: it throws a different type.
test/unit/math/gpu/opencl_check_test.cpp:49: Failure
Expected: check_nan(function, "xx_mat2", xx_mat2) throws an exception of type std::domain_error.
  Actual: it throws a different type.
test/unit/math/gpu/opencl_check_test.cpp:53: Failure
Expected: check_nan(function, "xx_mat3", xx_mat3) throws an exception of type std::domain_error.
  Actual: it throws a different type.
[  FAILED  ] ErrorHandlingScalarGPU.check_nan_positions (6 ms)
[ RUN      ] ErrorHandlingScalarGPU.check_rv_v_symmetric_gpu
[       OK ] ErrorHandlingScalarGPU.check_rv_v_symmetric_gpu (0 ms)
[ RUN      ] ErrorHandlingScalarGPU.check_m_symmetric
test/unit/math/gpu/opencl_check_test.cpp:81: Failure
Expected: check_symmetric(function, "m_non_symm", mm_fail) throws an exception of type std::domain_error.
  Actual: it throws a different type.
test/unit/math/gpu/opencl_check_test.cpp:82: Failure
Expected: check_symmetric(function, "m_symm_mat1", mm_ok) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] ErrorHandlingScalarGPU.check_m_symmetric (3 ms)
[ RUN      ] ErrorHandlingScalarGPU.check_m_diagonal_zeros
test/unit/math/gpu/opencl_check_test.cpp:94: Failure
Expected: check_diagonal_zeros(function, "mm_ok", mm_ok) doesn't throw an exception.
  Actual: it throws.
test/unit/math/gpu/opencl_check_test.cpp:96: Failure
Expected: check_diagonal_zeros(function, "mm_fail", mm_fail) throws an exception of type std::domain_error.
  Actual: it throws a different type.
[  FAILED  ] ErrorHandlingScalarGPU.check_m_diagonal_zeros (4 ms)
[----------] 6 tests from ErrorHandlingScalarGPU (90 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test case ran. (90 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 5 tests, listed below:
[  FAILED  ] ErrorHandlingScalarGPU.check_nan_Matrix
[  FAILED  ] ErrorHandlingScalarGPU.check_nan_Matrix_quit_nan
[  FAILED  ] ErrorHandlingScalarGPU.check_nan_positions
[  FAILED  ] ErrorHandlingScalarGPU.check_m_symmetric
[  FAILED  ] ErrorHandlingScalarGPU.check_m_diagonal_zeros

 5 FAILED TESTS
@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jul 5, 2018

@SteveBronder

This comment has been minimized.

Contributor

SteveBronder commented Jul 5, 2018

Two things

  1. I've never heard of Beignet but it sounds like some sort of openCL platform. When you run clinfo how many platforms does it list?

  2. @demsarjure has been updating the OpenCL installation wiki on the fork which should help more with setup.

https://github.com/bstatcomp/math/wiki/OpenCL-GPU-support

@SteveBronder

This comment has been minimized.

Contributor

SteveBronder commented Jul 6, 2018

The one in my office that we got working for the previous PR.

I think we have to use platform ID one for your office? Can you post a gist with your clinfo?

SteveBronder added some commits Jul 6, 2018

rok-cesnovar and others added some commits Jul 12, 2018

New functions:
  - set_kernel_args is a helper function that passes in kernel arguments
   using variadic templating.
  - sub_block places a submatrix of a source matrix into another matrix.
    The subset always starts at position (0,0) in the destination matrix.

set_kernel_args added to the gpu code.
Fix arguments of sub block so any section of the source block can go …
…in any source of the destination block
@SteveBronder

This comment has been minimized.

Contributor

SteveBronder commented Jul 13, 2018

I changed copy_triangular() so that it uses the Upper, Lower Rok made in matrix_gpu.

All the functions we place into matrix_gpu directly change the values of the matrix in place. (that's also why copy_triangular was kept out of matrix_gpu.) I like this style because I think it makes it clear that a matrices values are being manipulated directly.

// Place a subset of Boo into Foo
Foo.sub_block(Boo, 1, 1, 0, 0, 3, 3)

// make a copy of Foo's lower triangular
Goo = copy_triangular<stan::math::Lower>(Foo);

Should the definitions of Lower, Upper, etc. go in a more general file than the matrix_gpu one?

@syclik

Almost there. A few minor changes.

I think the big one is: what is set_kernel_args?

* input matrices do not have matching dimensions
*
*/
inline matrix_gpu add(matrix_gpu& A, matrix_gpu& B) {

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

can both of these arguments be const? We should try to have const correctness whenever possible.

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

You are right, these can be const. Fixed.

*
*/
inline matrix_gpu add(matrix_gpu& A, matrix_gpu& B) {
check_matching_dims("add (GPU)", "A", A, "B", B);

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

Could we change the first argument to just "add"? It'll match line 39.

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

Done.

*
*/
template <int triangular_map>
inline matrix_gpu copy_triangular(matrix_gpu& src) {

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

This should also be const matrix_gpu& for the argument. I double-checked that the copy constructor does make a copy.

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

Fixed.

matrix_gpu dst(src);
return dst;
}
if (src.size() == 1) {

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

Can we consolidate this with the if statement right before? Something like if (src.size() == 0 || src.size() == 1)?

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

Fixed.

* @param[in] B The matrix to copy the triangular from.
* @param rows The number of rows of B.
* @param cols The number of cols of B.
* @param lower_upper enum to describe

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

Can this comment be updated? This isn't actually an enum -- it's just an unsigned int. If anything, perhaps describe which value of lower_upper corresponds to UPPER and which corresponds to LOWER so the reader doesn't need to dig through the code to figure it out?

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

Sort of like the other kernel -- copy_triangular_transposed_matrix_kernel.cl

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

Added values and removed the mentioned of "enum".

* input matrices do not have matching dimensions.
*
*/
inline matrix_gpu subtract(matrix_gpu& A, matrix_gpu& B) {

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

Add const to both A and B.

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

Fixed.

* @return transposed input matrix
*
*/
inline matrix_gpu transpose(matrix_gpu& src) {

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

add const

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

fixed

stan::math::matrix_gpu d11(d1);
stan::math::matrix_gpu d22(d2);
stan::math::matrix_gpu d33(3, 1);
EXPECT_NO_THROW(d33 = stan::math::add(d11, d22));

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

There are no tests making sure the addition is valid. Should we be checking for that since there are no direct tests of the kernel?

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

I added a test with vector, row vector and matrix additions.

EXPECT_NO_THROW(stan::math::copy_triangular<stan::math::Upper>(m1));
EXPECT_NO_THROW(stan::math::copy_triangular<stan::math::Lower>(m1));
}

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

Can we test the behavior of this triangular copy when it's not square?

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

I added 2 matrices with different sizes (1 with more rows, 1 with more cols) and two tests for both cases.

#include <gtest/gtest.h>
TEST(MathMatrixGPU, subtract_v_exception_pass) {
stan::math::vector_d d1, d2;

This comment has been minimized.

@syclik

syclik Jul 16, 2018

Member

check values?

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 16, 2018

Collaborator

Added.

@rok-cesnovar

This comment has been minimized.

Collaborator

rok-cesnovar commented Jul 16, 2018

Thanks @syclik!

I have responded to all the comments that are fixed. The set_kernel_args I am not really sure about and will let @SteveBronder anwser. Other than that, only open comment is the non-use of check_square.

@SteveBronder

This comment has been minimized.

Contributor

SteveBronder commented Jul 16, 2018

I think the big one is: what is set_kernel_args?

Apologies, forgot to write docs for this. set_kernel_args is a variadic template'd function that helps remove some boiler plate code.

When we pass the kernel arguments instead of writing

kernel.setArg(0, boo)
kernel.setArg(1, moo)
kernel.setArg(2, foo)
kernel.setArg(3, doo)

we can write

set_kernel_args(kernel, boo, moo, foo, doo)

SteveBronder and others added some commits Jul 16, 2018

@syclik

Minor change: please move the constants to stan:::math::gpu namespace. I don't think we want to clutter the stan::math namespace with names that are so generic. And please add the doc that the constants need to match the kernel code.

Once that's in, I can review quickly.

@@ -16,7 +17,12 @@
*/
namespace stan {
namespace math {
const int Lower = 0;

This comment has been minimized.

@syclik

syclik Jul 23, 2018

Member

Sorry, I didn’t catch this before. It seems really odd having consts at the namespace level that’s so generic.

Can we put these inside a gpu namespace for now? I think that’d help with a few things: where to find the constants, that it’s meant for gpu, and it won’t put something like Lower in the stan::math namespace.

This comment has been minimized.

@syclik

syclik Jul 23, 2018

Member

Can you also put in a comment here and in the kernel that these must match? What I'm afraid of is that someone see this and thinks "oh, I can change this" without realizing that it's actually duplicated in the kernel.

Alternatively, is there a different way to do it so the constants are shared in some way?

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 23, 2018

Collaborator

I changed this so that the the constants for the kernels are set as flags at compile time (see: 0a2df85#diff-23051186f20ae937b29d69ce3b413239R238). This way if anyone changes the const this is changed in both places.

Due to this I had to move the constants to the opencl_context.hpp file. Is that ok? If not, I can go switch back and just add the docs as you suggested.

I also left the #ifndefs in the kernels (0a2df85#diff-ce4c3c3e53a52143616423ff7eb6c82aR10) if anyone wants to just use the kernel.

This comment has been minimized.

@syclik

syclik Jul 23, 2018

Member

Awesome! I definitely wouldn't have even envisioned a fix that like! Thanks... I really think that helps from someone accidentally breaking everything.

* stan::math::Upper
*
*/
template <int triangular_view>

This comment has been minimized.

@syclik

syclik Jul 23, 2018

Member

Question:

Would it make sense to have a default for the template parameter? Perhaps stan::math::Entire?

This comment has been minimized.

@rok-cesnovar

rok-cesnovar Jul 23, 2018

Collaborator

I think that makes perfect sense. I added gpu::Entire here.

* @note Comes from:
* simpleopencl.blogspot.com/2013/04/calling-kernels-with-large-number-of.html
*/
template <typename... Args>

This comment has been minimized.

@syclik

syclik Jul 23, 2018

Member

Awesome. Thanks.

@syclik

syclik approved these changes Jul 23, 2018

@SteveBronder SteveBronder merged commit adaebaf into stan-dev:develop Jul 24, 2018

2 checks passed

continuous-integration/jenkins/pr-merge This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@rok-cesnovar rok-cesnovar referenced this pull request Aug 15, 2018

Merged

Change how users access OpenCL kernels #966

6 of 6 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment