Implement torch.linalg.svd (#45562)
Summary:
This is related to #42666.
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd` (a short sketch comparing the two follows the list):

1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`. The two flags mean the opposite of each other, so the identical-looking defaults also produce opposite behavior (reduced vs. full SVD).

2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed).

3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` when `compute_uv=False`.

4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument.
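
A minimal sketch of differences 1–3 (assumes NumPy and the PyTorch of this era; `a` is an arbitrary example matrix):

```python
import numpy as np
import torch

a = np.random.randn(3, 2)
t = torch.from_numpy(a)

# 1. opposite flags (and opposite effective defaults):
U, S, V = torch.svd(t, some=True)                          # reduced SVD
U_np, S_np, VT_np = np.linalg.svd(a, full_matrices=False)  # reduced SVD

# 2. torch returns V, numpy returns VT; both reconstruct `a`:
print(np.allclose(U.numpy() @ np.diag(S.numpy()) @ V.numpy().T, a))  # V needs the .T
print(np.allclose(U_np @ np.diag(S_np) @ VT_np, a))                  # VT is used as-is

# 3. with compute_uv=False, numpy returns only S ...
S_only = np.linalg.svd(a, compute_uv=False)
# ... while torch.svd still returns a 3-tuple (U and V are placeholders)
U_ph, S_t, V_ph = torch.svd(t, compute_uv=False)
```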

I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following:

1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, now `linalg_svd` takes `full_matrices` and returns `VT`

2. Re-implement the old C++ interface on top of the new one (by negating `full_matrices` and transposing `VT`).

3. The C++ version of `linalg_svd` *always* returns a 3-tuple (we can't do anything else at that level). So, there is a Python wrapper which manually calls `torch._C._linalg.linalg_svd` and tweaks the return value when `compute_uv=False`; see the sketch right after this list.
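
A hedged sketch of the wrapper idea (illustrative only, not the committed code; the argument name `input` is assumed):

```python
import torch

def linalg_svd(input, full_matrices=True, compute_uv=True):
    # the C++ op always returns a 3-tuple (U and VT are placeholders when
    # compute_uv is False), so trim the result here to match numpy
    U, S, VT = torch._C._linalg.linalg_svd(input, full_matrices, compute_uv)
    if compute_uv:
        return U, S, VT
    return S
```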

Currently, `linalg_svd_backward` is broken because it has not yet been adapted to the `V ==> VT` change, but before spending more time on it I wanted to make sure that the general approach is fine (the small check below shows the relationship involved).
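
A minimal illustration of the `V ==> VT` relationship the backward formula has to absorb (assumes only the old, working `torch.svd`; names are illustrative):

```python
import torch

t = torch.randn(3, 2, dtype=torch.float64, requires_grad=True)
U, S, V = torch.svd(t)      # old op: differentiable, returns V
VT = V.transpose(-2, -1)    # what linalg.svd returns instead

# autograd handles the extra transpose: the gradient flowing into VT is just
# the transposed gradient flowing into V, which is exactly the adjustment
# the hand-written linalg_svd_backward formula needs
VT.sum().backward()
print(t.grad.shape)         # torch.Size([3, 2])
```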

Pull Request resolved: #45562

Reviewed By: H-Huang

Differential Revision: D25803557

Pulled By: mruberry

fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
antocuni authored and facebook-github-bot committed Jan 8, 2021
1 parent 006cfeb commit 5c5abd5
Showing 14 changed files with 558 additions and 255 deletions.
61 changes: 54 additions & 7 deletions aten/src/ATen/native/BatchLinearAlgebra.cpp
```diff
@@ -1411,7 +1411,7 @@ std::tuple<Tensor, Tensor, Tensor> _svd_helper_cpu(const Tensor& self, bool some
 
   if (compute_uv) {
     if (some) {
-      VT_working_copy = VT_working_copy.narrow(-1, 0, k);
+      VT_working_copy = VT_working_copy.narrow(-2, 0, k);
     }
   } else {
     VT_working_copy.zero_();
@@ -1421,24 +1421,71 @@ std::tuple<Tensor, Tensor, Tensor> _svd_helper_cpu(const Tensor& self, bool some
     U_working_copy.zero_();
     VT_working_copy.zero_();
   }
+  // so far we have computed VT, but torch.svd returns V instead. Adjust accordingly.
+  VT_working_copy.transpose_(-2, -1);
   return std::make_tuple(U_working_copy, S_working_copy, VT_working_copy);
 }
 
 std::tuple<Tensor, Tensor, Tensor> svd(const Tensor& self, bool some, bool compute_uv) {
   TORCH_CHECK(self.dim() >= 2,
-              "self should have at least 2 dimensions, but has ", self.dim(), " dimensions instead");
+              "svd input should have at least 2 dimensions, but has ", self.dim(), " dimensions instead");
   return at::_svd_helper(self, some, compute_uv);
 }
 
-std::tuple<Tensor&, Tensor&, Tensor&> svd_out(Tensor& U, Tensor& S, Tensor& VT,
+std::tuple<Tensor&, Tensor&, Tensor&> svd_out(Tensor& U, Tensor& S, Tensor& V,
                                               const Tensor& self, bool some, bool compute_uv) {
   TORCH_CHECK(self.dim() >= 2,
-              "self should have at least 2 dimensions, but has ", self.dim(), " dimensions instead");
-  Tensor U_tmp, S_tmp, VT_tmp;
-  std::tie(U_tmp, S_tmp, VT_tmp) = at::_svd_helper(self, some, compute_uv);
+              "svd input should have at least 2 dimensions, but has ", self.dim(), " dimensions instead");
+  Tensor U_tmp, S_tmp, V_tmp;
+  std::tie(U_tmp, S_tmp, V_tmp) = at::_svd_helper(self, some, compute_uv);
   U.resize_as_(U_tmp).copy_(U_tmp);
   S.resize_as_(S_tmp).copy_(S_tmp);
-  VT.resize_as_(VT_tmp).copy_(VT_tmp);
-  return std::tuple<Tensor&, Tensor&, Tensor&>(U, S, VT);
+  V.resize_as_(V_tmp).copy_(V_tmp);
+  return std::tuple<Tensor&, Tensor&, Tensor&>(U, S, V);
 }
 
+// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ linalg_svd ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+/* torch.linalg.svd, implemented in terms of torch.svd. There are two main
+   differences:
+     1. the 2nd parameter is bool some=True, which is effectively the opposite
+        of full_matrices=True
+     2. svd returns V, while linalg.svd returns VT. To accommodate the
+        difference, we transpose() V upon return
+*/
+
+std::tuple<Tensor, Tensor, Tensor> linalg_svd(const Tensor& self, bool full_matrices, bool compute_uv) {
+  TORCH_CHECK(self.dim() >= 2,
+              "svd input should have at least 2 dimensions, but has ", self.dim(), " dimensions instead");
+
+  bool some = !full_matrices;
+  Tensor U, S, V;
+  std::tie(U, S, V) = at::_svd_helper(self, some, compute_uv);
+  if (compute_uv) {
+    Tensor VT = V.transpose(-2, -1);
+    return std::make_tuple(U, S, VT);
+  } else {
+    Tensor empty_U = at::empty({0}, self.options());
+    Tensor empty_VT = at::empty({0}, self.options());
+    return std::make_tuple(empty_U, S, empty_VT);
+  }
+}
+
+static void svd_resize_and_copy(const char *name, const Tensor& src, Tensor &dst) {
+  TORCH_CHECK(src.device() == dst.device(), "svd output tensor ", name, " is on the wrong device: expected ", src.device(), " got ", dst.device());
+  at::native::resize_output(dst, src.sizes());
+  dst.copy_(src);
+}
+
+std::tuple<Tensor&, Tensor&, Tensor&> linalg_svd_out(Tensor& U, Tensor& S, Tensor& VT,
+                                                     const Tensor& self, bool full_matrices, bool compute_uv) {
+  Tensor U_tmp, S_tmp, VT_tmp;
+  std::tie(U_tmp, S_tmp, VT_tmp) = at::linalg_svd(self, full_matrices, compute_uv);
+  svd_resize_and_copy("U", U_tmp, U);
+  svd_resize_and_copy("S", S_tmp, S);
+  svd_resize_and_copy("V", VT_tmp, VT);
+  return std::tuple<Tensor&, Tensor&, Tensor&>(U, S, VT);
+}
```
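
For context, a hedged sketch of how the new function behaves from Python once bound (per the summary above; `a` is an arbitrary example matrix):

```python
import torch

a = torch.randn(3, 2)

# full_matrices=True is the default, matching numpy.linalg.svd
U, S, VT = torch.linalg.svd(a, full_matrices=False)
print(U.shape, S.shape, VT.shape)  # (3, 2), (2,), (2, 2)
print(torch.allclose(U @ torch.diag(S) @ VT, a, atol=1e-6))  # True
```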
9 changes: 6 additions & 3 deletions aten/src/ATen/native/LinearAlgebraUtils.h
```diff
@@ -261,18 +261,21 @@ static inline std::tuple<Tensor, Tensor, Tensor> _create_U_S_VT(const Tensor& in
     U_empty = at::empty_strided(sizes, strides, input.options().device(at::kCPU));
   }
 
-  // VT should be a column-major or a batch of column-major matrices
   sizes[input.dim() - 2] = n;
   sizes[input.dim() - 1] = n;
+  // VT should be a row-major or a batch of row-major matrices
+  strides = at::detail::defaultStrides(sizes);
+  strides[input.dim() - 1] = n;
+  strides[input.dim() - 2] = 1;
   Tensor VT_empty;
   if (!input.is_cuda()) {
-    VT_empty = at::empty(sizes, input.options());
+    VT_empty = at::empty_strided(sizes, strides, input.options());
   } else {
     // NB: VT_empty is an empty tensor created on the CPU intentionally, because magma_(d/s)gesdd
     // (which is the driver routine for the divide and conquer SVD operation)
     // takes in arrays on the CPU as input. This routine is a hybrid CPU-GPU routine that
     // moves the inputs between devices internally.
-    VT_empty = at::empty(sizes, input.options().device(at::kCPU));
+    VT_empty = at::empty_strided(sizes, strides, input.options().device(at::kCPU));
   }
 
   sizes.pop_back();
```
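
For readers unfamiliar with `empty_strided`, a Python analogue of the `VT_empty` allocation above (illustrative; `n` is an arbitrary size):

```python
import torch

n = 3
# defaultStrides(sizes) corresponds to contiguous strides, e.g. (3, 1); the
# diff then overrides the last two dimensions to strides 1 and n, so the
# tensor matches the memory layout the LAPACK/MAGMA driver writes into
vt = torch.empty_strided((n, n), (1, n))
print(vt.stride())              # (1, 3)
print(vt.is_contiguous())       # False
print(vt.t().is_contiguous())   # True: same bytes as a transposed contiguous tensor
```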
4 changes: 3 additions & 1 deletion aten/src/ATen/native/cuda/BatchLinearAlgebra.cu
```diff
@@ -2194,7 +2194,7 @@ std::tuple<Tensor, Tensor, Tensor> _svd_helper_cuda(const Tensor& self, bool som
 
   if (compute_uv) {
     if (some) {
-      VT_working_copy = VT_working_copy.narrow(-1, 0, k);
+      VT_working_copy = VT_working_copy.narrow(-2, 0, k);
     }
   } else {
     VT_working_copy.zero_();
@@ -2205,6 +2205,8 @@ std::tuple<Tensor, Tensor, Tensor> _svd_helper_cuda(const Tensor& self, bool som
     S_working_copy = same_stride_to(S_working_copy, S_working_copy.options().device(self.device()));
     VT_working_copy = same_stride_to(VT_working_copy, self.options()).zero_();
   }
+  // so far we have computed VT, but torch.svd returns V instead. Adjust accordingly.
+  VT_working_copy.transpose_(-2, -1);
   return std::make_tuple(U_working_copy, S_working_copy, VT_working_copy);
 }
 
```
15 changes: 12 additions & 3 deletions aten/src/ATen/native/native_functions.yaml
```diff
@@ -5820,14 +5820,14 @@
 - func: svd.U(Tensor self, bool some=True, bool compute_uv=True, *, Tensor(a!) U, Tensor(b!) S, Tensor(c!) V) -> (Tensor(a!) U, Tensor(b!) S, Tensor(c!) V)
   use_c10_dispatcher: hacky_wrapper_for_legacy_signatures
   dispatch:
-    DefaultBackend: svd_out
+    Math: svd_out
 
 - func: svd(Tensor self, bool some=True, bool compute_uv=True) -> (Tensor U, Tensor S, Tensor V)
   variants: method, function
   dispatch:
-    DefaultBackend: svd
+    Math: svd
 
-- func: _svd_helper(Tensor self, bool some, bool compute_uv) -> (Tensor, Tensor, Tensor)
+- func: _svd_helper(Tensor self, bool some, bool compute_uv) -> (Tensor U, Tensor S, Tensor V)
   variants: function
   dispatch:
     CPU: _svd_helper_cpu
@@ -8962,6 +8962,15 @@
   python_module: linalg
   variants: function
 
+- func: linalg_svd.U(Tensor self, bool full_matrices=True, bool compute_uv=True, *, Tensor(a!) U, Tensor(b!) S, Tensor(c!) V) -> (Tensor(a!) U, Tensor(b!) S, Tensor(c!) V)
+  use_c10_dispatcher: hacky_wrapper_for_legacy_signatures
+  python_module: linalg
+
+- func: linalg_svd(Tensor self, bool full_matrices=True, bool compute_uv=True) -> (Tensor U, Tensor S, Tensor V)
+  python_module: linalg
+  use_c10_dispatcher: full
+  variants: function
+
 - func: linalg_cond(Tensor self, Scalar? p=None) -> Tensor
   python_module: linalg
   variants: function
```
1 change: 1 addition & 0 deletions docs/source/linalg.rst
```diff
@@ -19,6 +19,7 @@ Functions
 .. autofunction:: eigvalsh
 .. autofunction:: matrix_rank
 .. autofunction:: norm
+.. autofunction:: svd
 .. autofunction:: solve
 .. autofunction:: tensorinv
 .. autofunction:: tensorsolve
```

```diff
@@ -37,6 +37,7 @@
   ("aten::ifft", datetime.date(2021, 1, 31)),
   ("aten::irfft", datetime.date(2021, 1, 31)),
   ("aten::rfft", datetime.date(2021, 1, 31)),
+  ("aten::_svd_helper", datetime.date(2021, 1, 31)),
   ("aten::_cudnn_rnn_flatten_weight", datetime.date(2020, 12, 31)),
   ("aten::_cudnn_rnn", datetime.date(2020, 12, 31)),
   ("aten::_cudnn_rnn_backward", datetime.date(2020, 12, 31)),
```
