SVD operator #4416
Conversation
```cpp
// Copy over all dimensions but the last two
for (; dim_idx < A_dim_size - 2; ++dim_idx) {
  const auto dim = A_shape.dim(dim_idx);

  if (compute_uv) {
    *U_shape->add_dim() = dim;
    *Vh_shape->add_dim() = dim;
  }

  *S_shape->add_dim() = dim;
}
```
Confirming the dimension ordering here is correct? I.e., the dimension at index 0 is the highest (outermost) dimension and the dimension at index `A_shape.dim_size() - 1` is the lowest.
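For reference, a small NumPy sketch (my addition, not part of the PR) showing the batched-SVD convention the shape-inference loop relies on: index 0 is the outermost batch dimension, the last two dimensions are the matrix dimensions.

```python
import numpy as np

# Batched input: the leading dimensions (3, 5) are batch dims; SVD acts on
# the trailing (4, 2) matrices. The batch dims are copied to all outputs,
# then U/S/Vh each get their factorization-specific trailing dims.
A = np.random.default_rng(0).normal(size=(3, 5, 4, 2))

U, S, Vh = np.linalg.svd(A, full_matrices=False)   # "thin" factorization
print(U.shape, S.shape, Vh.shape)   # (3, 5, 4, 2) (3, 5, 2) (3, 5, 2, 2)

S_only = np.linalg.svd(A, compute_uv=False)        # singular values only
print(S_only.shape)                                # (3, 5, 2)
```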
I realize SVD is quite well-known, but it would be useful to mention example models that use SVD, if you are aware of any, as motivation.
I'm not aware of example models off the top of my head as I just picked it up by looking through the open issues, but I will do some digging :) cc @coltonpeltierSE since you had the original open issue, do you have any pointers on example models we could include in the PR description?
I found some models using SVD:

Thank you!
@williamberman - Hi, I'm actually not at Schneider Electric anymore (hence the "SE" in my old name), I've switched to Databricks 🥳 . So I didn't see you had tagged me in this. I can see @p-wysocki found some models which utilize SVD (thank you!). I'm not aware of any other public models which utilize the SVD off the top of my head, but internally to SE we were looking at using the SVD as part of the model to perform some de-noising on signals before classification.
Thank you for the context @coltonpeltier-db! Makes a ton of sense |
Bug fix changing the order of setting U's dimensions. Check for input shapes being absent. Check for dimensions being present/concrete.
@williamberman Thank you for your work! I'm having trouble exporting my model because of SVD right now. I hope this PR will be done soon. One comment on the PR though: shouldn't binary files be stored with Git LFS? For example, like that one
Of course @enuk1dze, hope to have it merged soon! Re: Git LFS, since the generated protobufs are pretty small, I would say it's ok to commit them into vanilla git, as I think they already are. However, that's probably up to the ONNX maintainers.
There are different flavors of numerical methods used in practice to compute SVD (direct vs iterative, deterministic vs stochastic) suited to matrices with different sizes and properties. Do we want to provide more info in the ONNX spec with respect to what method is used (and also the accuracy/tolerance in the case of iterative methods)?
@yuanyao-nv do you think docs on the method used to compute SVD might be more appropriate in the specific runtimes as opposed to the ONNX spec? If the spec mandates a particular accuracy/method which is not available on a particular platform, that might be an issue (I could be completely off base here). A good compromise to assist runtime implementors might be "here are a set of ways to compute SVD which have X properties" in the PR description? Regardless, happy to provide the level of documentation appropriate for the PR/spec :)
If they are going to produce different results, then the op should clarify which one is expected. What's the standard/default? I would assume that's what we want. If there is demand for more than one of these, we may need an attribute to distinguish between them. But all of this should be driven by the use-cases (motivating examples/models).
I think a good analogy here is matrix diagonalization, which also has many different methods suited to different scenarios. We don't yet have a diagonalization operator in ONNX. I agree with @gramalingam that it should be driven by use cases. Unfortunately, the PyTorch and TensorFlow doc pages also have no mention of what method they use, but the fact that they produce all singular values suggests a direct method is used. A quick search of the literature suggests the industry standard for small SVD problems is a two-phase method: first reduce to bidiagonal form, then to diagonal form, with small variations in both phases. So far, ONNX has not expanded into linear algebra computations - this would be a first. So I think it'd be worthwhile to think more carefully about how (and perhaps whether) we should handle such operations. And if SVD is included, it would seem incomplete not to include other ops, such as diagonalization and various matrix decompositions.
Here's what I could pull out of the source and docs. Happy to do more digging to go into more detail if it's helpful :)

TensorFlow - CPU: TF CPU uses Eigen and calls bidiagonal divide and conquer, which internally falls back to the Jacobi method for matrices with fewer than 16 columns.

TensorFlow - GPU: TF GPU uses cuSOLVER and calls gesvdj (Jacobi method) for batches of matrices smaller than 32x32. Additionally, the matrices must either be square or the full factorization must be computed for the Jacobi method; see the source for the full condition. Otherwise, TF GPU uses gesvd, which uses the QR algorithm.

PyTorch - CPU: PyTorch CPU uses LAPACK and calls gesdd, which uses divide and conquer.

PyTorch - GPU - MAGMA: When using MAGMA, PyTorch calls gesdd, which uses divide and conquer.

PyTorch - GPU - cuSOLVER: When using cuSOLVER, PyTorch tip actually lets you pass an option (`driver`) to pick the routine; see the doc string in the source. The behavior in the current stable release is to use gesvdj and fall back to gesvd without the `driver` option.
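One point worth noting alongside the backend survey above (a sketch I'm adding for illustration): singular vectors are only unique up to sign (and up to rotation within degenerate subspaces), so conforming implementations can legitimately disagree elementwise on U and V while agreeing on S and on the reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

U, S, Vh = np.linalg.svd(A, full_matrices=False)

# Flip the sign of one left/right singular vector pair: still a valid SVD.
U2, Vh2 = U.copy(), Vh.copy()
U2[:, 0] *= -1
Vh2[0, :] *= -1

assert np.allclose(U @ np.diag(S) @ Vh, A)
assert np.allclose(U2 @ np.diag(S) @ Vh2, A)

# The singular values themselves are unique regardless of method:
assert np.allclose(S, np.linalg.svd(A, compute_uv=False))
```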
Unfortunately, none of the example models I found gave specifics on hard requirements for the SVD implementation they used. Deep Rotation Estimation released their model in both PyTorch and TensorFlow, if that's a relevant datapoint.
Closing this PR as SVD and potentially related op types seem to be outside the current scope of the ONNX spec :)
Is this a good time to reopen this? |
Sure. |
Semantics:
The SVD operator covers PyTorch, NumPy, and TensorFlow's SVD semantics.

NumPy and TensorFlow use the same `compute_uv` flag for computing just the singular values. PyTorch uses two different operations, `svd` and `svdvals`.

PyTorch and NumPy return the same conjugate transpose, `Vh`. TensorFlow returns `V` directly.

TensorFlow returns in the order `S U Vh` because `S` is the only non-optional return value. PyTorch and NumPy return in the order of the factorization, `U S Vh`.
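A small NumPy sketch of the conventions above (the TensorFlow `V`-vs-`Vh` comparison is expressed here in NumPy terms rather than by calling TensorFlow):

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(3, 4)

# Full factorization: U, S, Vh (NumPy/PyTorch ordering and Vh convention).
U, S, Vh = np.linalg.svd(A, full_matrices=False)

# Singular values only: NumPy's compute_uv=False; PyTorch spells this as
# the separate op torch.linalg.svdvals.
S_only = np.linalg.svd(A, compute_uv=False)
assert np.allclose(S, S_only)

# TensorFlow returns V rather than Vh; the two are conjugate transposes.
V = Vh.conj().T
assert np.allclose(U @ np.diag(S) @ V.conj().T, A)
```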
.Derivative
"thin"/"partial" vs "full", computing only singular values vs the whole factorization, and real vs complex inputs all change the derivative, impacting both its value and its numerical stability.
There are different resources documenting the different derivative variants. The implementations in well-known AD codebases (PyTorch, TensorFlow, and JAX) all differ slightly.
I consolidated the documentation of the different cases and provided example implementations in this python notebook
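As a concrete instance of the simplest case above: for a real matrix with distinct singular values, the derivative of the k-th singular value with respect to A is the rank-one matrix u_k v_k^T. A hedged NumPy sketch (a textbook identity check, not any framework's actual implementation) verifying this against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
U, S, Vh = np.linalg.svd(A, full_matrices=False)

# For distinct singular values, d(sigma_k)/dA = u_k v_k^T; take k = 0.
analytic = np.outer(U[:, 0], Vh[0, :])

# Central finite differences on sigma_0, one input entry at a time.
eps, numeric = 1e-6, np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        sp = np.linalg.svd(A + E, compute_uv=False)[0]
        sm = np.linalg.svd(A - E, compute_uv=False)[0]
        numeric[i, j] = (sp - sm) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```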
Existing docs
Previous discussion on adding an SVD operator to ONNX
pytorch/pytorch#81084
#3839
Example models
An Analysis of SVD for Deep Rotation Estimation
SVD is used as a layer in a neural net for predicting rotation matrices. The layer is defined as $\mathrm{SVDO^+}(M) := U \Sigma' V^\top$ where $\Sigma' = \mathrm{diag}(1, ..., 1, \det(U V^\top))$ (see equation 2). There are two models, SVD-Train and SVD-Inference. SVD-Train uses $\mathrm{SVDO^+}$ as the final layer for both training and inference. SVD-Inference omits $\mathrm{SVDO^+}$ as the final layer in training, but it is used as the final layer during inference (see section 4, Methods).
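A minimal NumPy sketch of the $\mathrm{SVDO^+}$ layer as defined above (function and variable names are mine, not from the paper's released code):

```python
import numpy as np

def svdo_plus(M):
    """Project a square matrix M onto SO(n): U diag(1, ..., 1, det(U V^T)) V^T."""
    U, _, Vh = np.linalg.svd(M)
    d = np.linalg.det(U @ Vh)                       # +/- 1 sign correction
    D = np.diag([1.0] * (M.shape[0] - 1) + [d])
    return U @ D @ Vh

# The output is always a proper rotation matrix.
R = svdo_plus(np.random.default_rng(0).normal(size=(3, 3)))
assert np.allclose(R @ R.T, np.eye(3))              # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)            # determinant +1
```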
The full network definition can be found on GitHub. See `regress_from_features` for the pre-SVD layer definitions.

Training Deep Networks with Structured Layers by Matrix Backpropagation
The image recognition layer called second-order pooling computes $\log(F F^\top + \epsilon I)$ where $F$ is a matrix of image features. Given the SVD of $F$, the layer can be simplified so $\log$ is computed elementwise over a diagonalized matrix. Given $F = U \Sigma V^\top$, the second-order pooling layer simplifies to $V \log(\Sigma^\top \Sigma + \epsilon I) V^\top$. See section 5.2.
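A hedged NumPy sketch of the SVD shortcut (my illustration, not the paper's code); it is written for the Gram matrix $F^\top F$, which diagonalizes with $V$; the $F F^\top$ case is symmetric, with $U$ in place of $V$:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 4))          # hypothetical feature matrix
eps = 1e-3

# Direct route: matrix log via eigendecomposition of the symmetric PSD matrix.
G = F.T @ F + eps * np.eye(4)
w, Q = np.linalg.eigh(G)
direct = Q @ np.diag(np.log(w)) @ Q.T

# SVD shortcut: with F = U Sigma V^T, the log is applied elementwise to the
# shifted squared singular values, then conjugated by V.
U, S, Vh = np.linalg.svd(F, full_matrices=False)
shortcut = Vh.T @ np.diag(np.log(S**2 + eps)) @ Vh

assert np.allclose(direct, shortcut)
```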
Improving training of deep neural networks via Singular Value Bounding and Orthogonal Deep Neural Networks
Training proceeds by standard SGD, except that weight matrices are kept near orthogonal by bounding/clipping their singular values near 1. Weight matrix singular values are bounded within the range $[\frac{1}{1 + \epsilon}, 1 + \epsilon]$ every $T_{svb}$ iterations, where $\epsilon$ and $T_{svb}$ are hyperparameters. See Algorithm 1.
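A minimal NumPy sketch of the bounding step (the $\epsilon$ value and names below are illustrative, not from the paper):

```python
import numpy as np

def bound_singular_values(W, eps=0.05):
    """Clip the singular values of W into [1/(1+eps), 1+eps] and rebuild W."""
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    S = np.clip(S, 1.0 / (1.0 + eps), 1.0 + eps)
    return U @ np.diag(S) @ Vh

W = np.random.default_rng(0).normal(size=(4, 4))
Wb = bound_singular_values(W)

# All singular values of the bounded matrix now lie in the target range.
S = np.linalg.svd(Wb, compute_uv=False)
assert S.max() <= 1.05 + 1e-8 and S.min() >= 1 / 1.05 - 1e-8
```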
SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks
SVD-Softmax is a fast approximation of softmax that can be used during inference. The decomposition of the softmax weight matrix, $A = U \Sigma V^\top$, is used to create the matrix $B = U \Sigma$. A subset $W$ of the columns of $B$ is used to estimate the result of the softmax, where $W$ is a hyperparameter. The complete softmax is computed for the top $N$ approximations, where $N$ is a hyperparameter. See Algorithm 1.
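A hedged NumPy sketch of the preview-then-refine idea (all sizes and hyperparameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
V_size, d, W, N = 1000, 64, 16, 50      # vocab, hidden, preview width, top-N
A = rng.normal(size=(V_size, d))        # softmax weight matrix
h = rng.normal(size=d)                  # hidden state

U, S, Vt = np.linalg.svd(A, full_matrices=False)
B = U * S                               # B = U @ diag(S)
h_tilde = Vt @ h                        # rotated hidden state

# Preview pass: cheap approximate logits from the first W columns of B,
# which carry the largest singular values.
z = B[:, :W] @ h_tilde[:W]
top = np.argsort(z)[-N:]

# Refinement: exact logits only for the top-N candidates (B @ h_tilde == A @ h).
z[top] = B[top] @ h_tilde

p = np.exp(z - z.max())
p /= p.sum()                            # approximate softmax distribution
```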
SVD-Embedded Deep Autoencoder for MIMO Communications
This model embeds the SVD factorization of the channel matrix into the DAE.
The singular values of the channel matrix are used as inputs to create part of the feature vector, $v_\gamma$ (equation 4). $v_\gamma$ is concatenated with the bit input to create the complete input to the Transmitter DAE (section III.A.2). $v_\gamma$ is also concatenated with the output of the Receiver Pre-processor to create the input to the Receiver DAE (section III.F).
The Transmitter Precoding adds one layer of non-trainable weights composed of the right singular vectors of the channel matrix (section III.C).
The Receiver Pre-processing adds two layers of non-trainable weights. One is the left-singular vectors of the channel matrix. The other is the pseudo-inverse of the matrix containing the singular values as its diagonal (section III.E).