Implemented torch.linalg.multi_dot #51807
Conversation
[ghstack-poisoned]
ghstack-source-id: c3e939b78950eaca6376c8675a263a6f78d1d0fd Pull Request resolved: #51807
💊 CI failures summary and remediations, as of commit 905ce1b (more details on the Dr. CI page).
This might be logically implied and seems like a reasonable limitation, but I don't think the docs actually say these must be 1D or 2D? https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html

As you mentioned offline, PyTorch's dot (https://pytorch.org/docs/master/generated/torch.dot.html?highlight=dot#torch.dot) is distinct from NumPy's dot (https://numpy.org/doc/stable/reference/generated/numpy.dot.html). In particular, PyTorch's dot wouldn't support the multi_dot use case. Let's be sure to be clear about that.
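To make the distinction concrete, here is a minimal sketch (tensor names and shapes are mine, not from the PR) of how the two `dot` functions differ on 2D inputs:

```python
import numpy as np
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

# NumPy's dot falls back to matrix multiplication on 2D inputs,
# which is what numpy.linalg.multi_dot builds on.
print(np.dot(a.numpy(), b.numpy()).shape)  # (3, 5)

# PyTorch's dot is strictly a 1D inner product of equal-length vectors.
print(torch.dot(torch.randn(4), torch.randn(4)))  # 0-d tensor

# torch.dot(a, b) would raise a RuntimeError on these 2D tensors, so
# the multi_dot implementation must use matmul/mm, not torch.dot.
```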
```
@@ -431,6 +431,45 @@
        [1, 2, 2, 2]])
""")

multi_dot = _add_docstr(_linalg.linalg_multi_dot, r"""
linalg.multi_dot(tensors, *, out=None)
```
Reviewer's note: the use of "tensors" here is consistent with the language in other functions that accept a list of tensors, like torch.cat (https://pytorch.org/docs/master/generated/torch.cat.html?highlight=cat#torch.cat)
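For reference, a minimal usage sketch of the signature shown above (tensor names and shapes are mine, not from the PR):

```python
import torch

A = torch.randn(10, 100)
B = torch.randn(100, 5)
C = torch.randn(5, 50)

# multi_dot takes a sequence of tensors and chooses the association
# order that minimizes the cost of the intermediate products.
out = torch.linalg.multi_dot((A, B, C))
print(out.shape)  # torch.Size([10, 50])

# Mathematically equivalent to chaining matmuls left to right.
assert torch.allclose(out, A @ B @ C, atol=1e-4)
```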
Mypy issues aside, this looks pretty good to me; I made a few comments. @IvanYashchuk, would you like to take another look?
I am happy with the fixes in the implementation and that there is no longer any code duplication with chain_matmul. The only thing we need to remember is to loosen the dtype restriction for the out= variant.
```cpp
TORCH_CHECK(
    dtype == out.dtype(),
    "multi_dot(): expected out tensor to have dtype ",
    dtype,
    " but got ",
    out.dtype());
```
For the follow-up PR, we should remember to change this requirement to c10::canCast instead.
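For reference, a rough sketch of what the loosened check could look like, assuming `c10::canCast(from, to)` from `c10/core/ScalarType.h` and that `dtype` here is (or converts to) a `c10::ScalarType`; this is illustrative, not the actual follow-up code:

```cpp
// Hypothetical loosened check (sketch only): instead of requiring an
// exact dtype match, accept any out dtype that the computed result
// dtype can be safely cast to.
TORCH_CHECK(
    c10::canCast(dtype, out.scalar_type()),
    "multi_dot(): result dtype ", dtype,
    " can't be cast to the out tensor's dtype ", out.scalar_type());
```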
Differential Revision: [D26375734](https://our.internmc.facebook.com/intern/diff/D26375734)

Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html). This function does not support broadcasting or batched inputs at the moment.

**NOTE**

numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions, whereas here we only allow them to be either 1D or 2D.

**BENCHMARK**

In the following benchmarks the labels are the sizes ``k0 x k1 x k2 x k3`` and the matrices being multiplied have shapes ``(k0, k1) x (k1, k2) x (k2, k3) x (k3, k0)``. Baseline is just multiplying the tensors left to right in PyTorch.

_cpu results_

```
[------------------------------- -------------------------------]
                                |  Baseline  |  PyTorch  |  NumPy
1 threads: -------------------------------------------------------
      725  x 257  x 151  x 49   |    1500    |    900    |   1040
      81   x 1772 x 83   x 37   |     430    |    320    |    390
      3    x 3022 x 78   x 1598 |     110    |    120    |    200
      1082 x 2    x 78   x 5    |     322    |    400    |    630
      7    x 35   x 4077 x 63   |     140    |    140    |    200
      190  x 56   x 8984 x 2    |    2800    |    200    |    230
      79   x 311  x 22   x 500  |     161    |     60    |    100
      2247 x 2    x 8050 x 74   |   89100    |   3190    |   2760
      11   x 123  x 2    x 201  |      11    |     10    |     50
      54   x 1057 x 38   x 3838 |     600    |    330    |    400
8 threads: -------------------------------------------------------
      725  x 257  x 151  x 49   |     280    |    190    |    500
      81   x 1772 x 83   x 37   |     100    |     90    |    200
      3    x 3022 x 78   x 1598 |     600    |    600    |    600
      1082 x 2    x 78   x 5    |     100    |     80    |    910
      7    x 35   x 4077 x 63   |     130    |    100    |    160
      190  x 56   x 8984 x 2    |     900    |     99    |    150
      79   x 311  x 22   x 500  |      88    |     68    |    110
      2247 x 2    x 8050 x 74   |   18000    |   1000    |   3000
      11   x 123  x 2    x 201  |      11    |     11    |     45
      54   x 1057 x 38   x 3838 |     200    |    110    |    150

Times are in microseconds (us).
```

_cuda results_

```
[-------------------------- --------------------------]
                                |  Baseline  |  PyTorch
1 threads: ---------------------------------------------
      725  x 257  x 151  x 49   |     62     |     40
      81   x 1772 x 83   x 37   |     59     |     61
      3    x 3022 x 78   x 1598 |     42     |     59
      1082 x 2    x 78   x 5    |     36     |    181
      7    x 35   x 4077 x 63   |    111     |    101
      190  x 56   x 8984 x 2    |    112     |     60
      79   x 311  x 22   x 500  |     34     |     35
      2247 x 2    x 8050 x 74   |    910     |    175
      11   x 123  x 2    x 201  |     34     |     38
      54   x 1057 x 38   x 3838 |    139     |    133

Times are in microseconds (us).
```

_script_

```python
from torch.utils import benchmark
from torch.utils.benchmark import Fuzzer, FuzzedParameter, FuzzedTensor

fuzzer = Fuzzer(
    parameters=[
        FuzzedParameter('k0', minval=1, maxval=10000, distribution='loguniform'),
        FuzzedParameter('k1', minval=1, maxval=10000, distribution='loguniform'),
        FuzzedParameter('k2', minval=1, maxval=10000, distribution='loguniform'),
        FuzzedParameter('k3', minval=1, maxval=10000, distribution='loguniform'),
    ],
    tensors=[
        FuzzedTensor('a', size=('k0', 'k1'), min_elements=128, max_elements=1000000, probability_contiguous=0.6),
        FuzzedTensor('b', size=('k1', 'k2'), min_elements=128, max_elements=1000000, probability_contiguous=0.6),
        FuzzedTensor('c', size=('k2', 'k3'), min_elements=128, max_elements=1000000, probability_contiguous=0.6),
        FuzzedTensor('d', size=('k3', 'k0'), min_elements=128, max_elements=1000000, probability_contiguous=0.6),
    ],
    seed=0,
)

results = []
for tensors, tensor_params, params in fuzzer.take(10):
    # description is the column label
    sub_label = f"{params['k0']:<4} x {params['k1']:<4} x {params['k2']:<4} x {params['k3']:<4}"
    for num_threads in (1, 8):
        results.append(benchmark.Timer(
            stmt='a @ b @ c @ d',
            globals=tensors,
            sub_label=sub_label,
            description='Baseline',
            num_threads=num_threads,
        ).blocked_autorange(min_run_time=1))
        results.append(benchmark.Timer(
            stmt='torch.linalg.multi_dot((a, b, c, d))',
            globals=tensors,
            sub_label=sub_label,
            description='PyTorch',
            num_threads=num_threads,
        ).blocked_autorange(min_run_time=1))
        results.append(benchmark.Timer(
            stmt='np.linalg.multi_dot((a, b, c, d))',
            setup='import numpy as np',
            globals=tensors,
            sub_label=sub_label,
            description='NumPy',
            num_threads=num_threads,
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.trim_significant_figures()
compare.print()

results = []
for tensors, tensor_params, params in fuzzer.take(10):
    sub_label = f"{params['k0']:<4} x {params['k1']:<4} x {params['k2']:<4} x {params['k3']:<4}"
    tensors = {k: tensors[k].cuda() for k in tensors.keys()}
    results.append(benchmark.Timer(
        stmt='a @ b @ c @ d',
        globals=tensors,
        sub_label=sub_label,
        description='Baseline',
    ).blocked_autorange(min_run_time=1))
    results.append(benchmark.Timer(
        stmt='torch.linalg.multi_dot((a, b, c, d))',
        globals=tensors,
        sub_label=sub_label,
        description='PyTorch',
    ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.trim_significant_figures()
compare.print()
```

**TODO**

- [x] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument

[ghstack-poisoned]
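As a side note on why the optimized order wins so decisively on rows like ``2247 x 2 x 8050 x 74``, here is a toy flop count (my own sketch, not part of the PR) comparing two association orders. Raw flops are only part of the story, since the left-to-right order also has to materialize a huge 2247 x 8050 intermediate, but the gap is already visible:

```python
# Sketch (not PR code): count scalar multiplications for a fixed
# left-to-right association; an (m, k) @ (k, n) product costs m*k*n.
def chain_cost(shapes):
    """shapes: list of (rows, cols) pairs, multiplied left to right."""
    total = 0
    m, k = shapes[0]
    for _, n in shapes[1:]:
        total += m * k * n
        k = n
    return total

# Shapes from the 2247 x 2 x 8050 x 74 benchmark row above:
# (2247, 2) @ (2, 8050) @ (8050, 74) @ (74, 2247)
naive = chain_cost([(2247, 2), (2, 8050), (8050, 74), (74, 2247)])

# Grouping the middle product first avoids ever materializing the
# 2247 x 8050 intermediate.
middle_first = (chain_cost([(2, 8050), (8050, 74)])
                + chain_cost([(2247, 2), (2, 74), (74, 2247)]))

print(f"{naive:.2e}")         # ~1.7e9 multiplications
print(f"{middle_first:.2e}")  # ~3.8e8 multiplications
```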
ghstack-source-id: eeb51368c1261412489cb828d4b0259eaeadf6a9 Pull Request resolved: #51807
@heitorschueroff merged this pull request in 0396f49.
This pull request has been reverted by 92a4ee1.
Summary:
Pull Request resolved: pytorch#51807

Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html). This function does not support broadcasting or batched inputs at the moment.

**NOTE**

numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite its docs stating they must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.

**TODO**

- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26375734

Pulled By: heitorschueroff

fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
Stack from ghstack:
Differential Revision: D26375734
Implemented torch.linalg.multi_dot similar to numpy.linalg.multi_dot.
This function does not support broadcasting or batched inputs at the moment.
NOTE
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions, whereas here we only allow them to be either 1D or 2D.
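For context, a short sketch of what the 1D allowance means in practice, mirroring the documented NumPy semantics (a 1D first tensor is treated as a row vector, a 1D last tensor as a column vector; names are mine):

```python
import torch

v = torch.randn(3)   # 1D first argument: treated as a row vector
A = torch.randn(3, 4)
w = torch.randn(4)   # 1D last argument: treated as a column vector

# With 1D tensors at both ends, the result is 0-dimensional (a scalar).
out = torch.linalg.multi_dot((v, A, w))
print(out.shape)  # torch.Size([])
```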