Implement Inverse(12) for CPU and CUDA #3485
Merged · 18 commits · Apr 19, 2020
Conversation

yuslepukhin (Member)

Description:
Matrix inverse, for a single matrix or a batch of matrices.

cudaMemcpyDeviceToHost));
for (auto i = 0; i < num_batches; ++i) {
if (info_cpu[i] != 0) {
return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Matrix is singular at batch:", i);
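For context, a hedged reconstruction of the helper this hunk sits in. This is a sketch only: the cudaMemcpy arguments and the helper's exact signature are not visible in the quoted lines and are assumed here.

// Sketch: copy the per-batch LAPACK info codes back to the host and fail if
// any factorization reported a zero pivot (i.e. a singular matrix).
Status CheckForSingularity(const IAllocatorUniquePtr<int>& info,
                           const IAllocatorUniquePtr<int>& info_cpu,
                           int64_t num_batches) {
  CUDA_RETURN_IF_ERROR(cudaMemcpy(info_cpu.get(), info.get(),
                                  num_batches * sizeof(int),
                                  cudaMemcpyDeviceToHost));
  for (auto i = 0; i < num_batches; ++i) {
    if (info_cpu[i] != 0) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Matrix is singular at batch:", i);
    }
  }
  return Status::OK();
}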
hariharans29 (Member) · Apr 10, 2020

Curious: for the CPU version, does the Eigen implementation provide such a friendly message stating that the matrix is singular? #Resolved

yuslepukhin (Member, Author)

No, it does not. I would have to do a full-pivoting LU decomposition to provide such a check, and the standard allows flexibility in checking. Ke suggested I try a different approach for CUDA, so the code may still change.


In reply to: 406930453
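For illustration, a minimal sketch of the full-pivoting LU check mentioned above. Eigen::FullPivLU and its isInvertible()/inverse() methods are standard Eigen; the surrounding function is hypothetical:

#include <Eigen/Dense>

// Sketch: full-pivoting LU exposes an explicit invertibility check, which is
// what a friendly CPU-side "matrix is singular" message would require.
bool TryInverse(const Eigen::MatrixXd& input, Eigen::MatrixXd& output) {
  Eigen::FullPivLU<Eigen::MatrixXd> lu(input);
  if (!lu.isInvertible()) {
    return false;  // the caller can report "matrix is singular" here
  }
  output = lu.inverse();
  return true;
}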


int64_t num_batches = 1;
const int64_t rows = input_shape.GetDims()[num_dim - 2];
const int64_t cols = input_shape.GetDims()[num_dim - 1];
hariharans29 (Member) · Apr 10, 2020

Do we need a check to enforce rows == cols (just like in the CUDA kernel)? #Pending

yuslepukhin (Member, Author)

Yes, it is on my list.


In reply to: 406931953
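A minimal sketch of the shape handling implied by this thread: deriving the batch count from the leading dimensions and enforcing square trailing dimensions. ORT_RETURN_IF_NOT is the usual ONNX Runtime macro; the error message wording is illustrative.

// Sketch: every dimension except the trailing two contributes to the batch
// count, and the trailing two must form a square matrix.
int64_t num_batches = 1;
for (size_t i = 0; i < num_dim - 2; ++i) {
  num_batches *= input_shape.GetDims()[i];
}
const int64_t rows = input_shape.GetDims()[num_dim - 2];
const int64_t cols = input_shape.GetDims()[num_dim - 1];
ORT_RETURN_IF_NOT(rows == cols, "Inverse: expected square matrices, got ", rows, "x", cols);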

IAllocatorUniquePtr<double*> matrix_ptrs = inst->GetScratchBuffer<double*>(n_batches);
ORT_RETURN_IF_ERROR(ComputeMatrixOffsets<double>(input_workspace.get(), num_batches, rows, matrix_ptrs));
// Do LU factorization
CUBLAS_RETURN_IF_ERROR(cublasDgetrfBatched(cublas_h, dim, matrix_ptrs.get(), dim, pivots.get(), info.get(), n_batches));
hariharans29 (Member) · Apr 10, 2020

Why not use this: https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-matinvbatched ? It seems to basically do getrfBatched + getriBatched (provided rows and cols are < 32). #Resolved

yuslepukhin (Member, Author) · Apr 10, 2020

I think this is exactly what is implemented.
Ke also suggested checking https://docs.nvidia.com/cuda/cusolver/#cuds-intro


In reply to: 406942817

yuslepukhin (Member, Author)

It looks like using cuSolver would require generating an identity matrix on the device for every invocation, and that means writing a kernel, which we are trying to avoid. Also, TF and PyTorch use cuBLAS, so this implementation should be good enough.


In reply to: 406945634
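For reference, a minimal sketch of the getrfBatched + getriBatched pairing discussed in this thread. The cuBLAS entry points and their signatures are real; output_ptrs is a hypothetical device array of per-batch output pointers, and buffer setup and cleanup are elided.

// Sketch: batched LU factorization, singularity check, then batched inversion.
// getriBatched is out-of-place, so each output matrix needs its own buffer.
CUBLAS_RETURN_IF_ERROR(cublasDgetrfBatched(cublas_h, dim, matrix_ptrs.get(), dim,
                                           pivots.get(), info.get(), num_batches));
ORT_RETURN_IF_ERROR(CheckForSingularity(info, info_cpu, num_batches));
CUBLAS_RETURN_IF_ERROR(cublasDgetriBatched(cublas_h, dim, matrix_ptrs.get(), dim,
                                           pivots.get(), output_ptrs.get(), dim,
                                           info.get(), num_batches));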

ORT_RETURN_IF_ERROR(ComputeMatrixOffsets<double>(input_workspace.get(), num_batches, rows, matrix_ptrs));
// Do LU factorization
CUBLAS_RETURN_IF_ERROR(cublasDgetrfBatched(cublas_h, dim, matrix_ptrs.get(), dim, pivots.get(), info.get(), n_batches));
ORT_RETURN_IF_ERROR(CheckForSingularity(info, info_cpu, num_batches));
hariharans29 (Member) · Apr 10, 2020

There is some external discussion as to whether this approach is performant when there is a single large matrix to invert (https://stackoverflow.com/questions/37731103/cublas-matrix-inverse-much-slower-than-matlab). Basically, the approach you are taking is conducive to inverting a batch of smaller matrices. Quoting the cuBLAS documentation: "This function is intended to be used for matrices of small sizes where the launch overhead is a significant factor." Maybe there should be a plan to deal with a single large matrix. #Pending

yuslepukhin (Member, Author)

This means it is optimized for small-matrix launch overhead as well as for solving large matrices. That is what my research says.


In reply to: 406944729


Eigen::Map<const MatrixT<Eigen::half>> input_matrix(input_data, rows, cols);
Eigen::Map<MatrixT<Eigen::half>> output_matrix(output_data, rows, cols);
output_matrix = input_matrix.inverse();
hariharans29 (Member) · Apr 13, 2020

Does Eigen have a limit on the size of the matrix's rows (and cols), by any chance? #Resolved

yuslepukhin (Member, Author)

This (https://stackoverflow.com/questions/17430644/what-is-the-maximum-size-of-matrix-in-eigen) indicates that it is bounded only by the memory available on the system.


In reply to: 407768947


template <typename T>
using MatrixT = Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

Contributor

FWIW there's ConstEigenMatrixMapRowMajor and EigenMatrixMapRowMajor in math_cpuonly.h
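For comparison, a sketch of what using those helpers would look like. From what I recall, math_cpuonly.h defines them as row-major Eigen::Map aliases; check the header for the authoritative definitions.

// Sketch: replacing the local MatrixT alias with the shared helpers.
ConstEigenMatrixMapRowMajor<float> input_matrix(input_data, rows, cols);
EigenMatrixMapRowMajor<float> output_matrix(output_data, rows, cols);
output_matrix = input_matrix.inverse();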


namespace onnxruntime {
namespace test {

Contributor

Needs some tests where there are batches, preferably including cases where multiple dimensions provide the number of batches (e.g. rank-4 input); see the sketch below.
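A minimal sketch of such a batched test using OpTester. The input/output tensor names ("X", "Y") and the values are illustrative; each 2x2 matrix is diagonal, so its inverse is just the reciprocals on the diagonal.

// Sketch: rank-4 input where the two leading dimensions together form the batch.
OpTester test("Inverse", 12);
test.AddInput<float>("X", {2, 1, 2, 2},
                     {2.0f, 0.0f, 0.0f, 4.0f,    // diag(2, 4)
                      5.0f, 0.0f, 0.0f, 10.0f}); // diag(5, 10)
test.AddOutput<float>("Y", {2, 1, 2, 2},
                      {0.5f, 0.0f, 0.0f, 0.25f,  // diag(1/2, 1/4)
                       0.2f, 0.0f, 0.0f, 0.1f}); // diag(1/5, 1/10)
test.Run();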

ke1337 (Contributor) commented Apr 17, 2020

class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 6, int8_t, Neg);

Seems there is no change in this file?


Refers to: onnxruntime/core/providers/cpu/cpu_execution_provider.cc:72 in 1ed1955.

ke1337 previously approved these changes Apr 17, 2020

ke1337 (Contributor) left a comment:
:shipit:
