ENH: Adding matmul equivalent of multi_dot (Issue #8719) #10690
Conversation
Allows numpy.asscalar to pass scalars
BUG: fixes numpy#4701 related to numpy.asscalar
multi_dot is modified to allow stacks of arrays (i.e. N-D arrays rather than just 2-D arrays) internally using matmul. The optimum order of multiplication is calculated accordingly.
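As background, matmul already treats the last two axes as matrices and broadcasts any leading "stack" dimensions, which is the behavior this change extends to multi_dot. A minimal sketch of that broadcasting (shapes here are illustrative, not from the PR):

```python
import numpy as np

# matmul multiplies the trailing 2-D matrices and broadcasts the rest
a = np.random.random((5, 2, 3))  # a stack of five 2x3 matrices
b = np.random.random((5, 3, 4))  # a stack of five 3x4 matrices
print(np.matmul(a, b).shape)  # (5, 2, 4)
```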
This needs to be a new function; we cannot change the behavior of existing functions in this way and maintain backward compatibility. Proposals for new functions should also be discussed on the mailing list before heading off to implement them.
If the proposal for a new function goes through, it will also need a mention in …
Thanks for the review @charris. I will post this proposal on the mailing list.
@charris - by a change in functionality, do you mean that >2-D arrays no longer raise exceptions but actually return a result? I.e., is the worry that there is code out there that relies on getting an exception? I ask since one should weigh breaking such code against the benefit of not introducing yet another name. P.S. One might also deprecate via an argument.
numpy/linalg/linalg.py
Outdated
of the matrices [1]_ [2]_. Depending on the shapes of the matrices,
this can speed up the multiplication a lot.

If the first argument is 1-D it is treated as a row vector.
If the last argument is 1-D it is treated as a column vector.
The other arguments must be 2-D.
If one of the other arguments is N-D, N>2, it is treated as a stack of matrices and broadcast accordingly
Does this mean that the first and last arguments can't be N-D? Why not? Couldn't these also be treated like stacks of matrices?
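For reference, matmul itself already accepts an N-D first operand and broadcasts it as a stack of matrices, so the restriction in the proposed docstring is not one imposed by matmul. A small illustration (shapes are illustrative):

```python
import numpy as np

first = np.random.random((5, 2, 3))  # already a stack of 2x3 matrices
mid = np.random.random((3, 4))       # a single 2-D matrix
# matmul broadcasts the single matrix against the stack
print((first @ mid).shape)  # (5, 2, 4)
```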
numpy/linalg/linalg.py
Outdated
of the matrices [1]_ [2]_. Depending on the shapes of the matrices,
this can speed up the multiplication a lot.

If the first argument is 1-D it is treated as a row vector.
If the last argument is 1-D it is treated as a column vector.
The other arguments must be 2-D.
If one of the other arguments is N-D, N>2, it is treated as a stack of matrices and broadcast accordingly
All other arguments must be 2-D.
I find this wording a little confusing now -- what does "all other arguments" now refer to?
I might say:
All arguments other than the first and last must be at least 2-D.
I don't think this is a backwards compatibility break. Every case that would be changed here (ndim > 2) raised an error previously.
Thanks, @shoyer. I made the corrections you suggested.
This needs test coverage.
numpy/linalg/linalg.py
Outdated
# cost1 = cost((AB)C) = a0*a1b0*b1c0 + a0*b1c0*c1
cost1 = a0 * b1c0 * (a1b0 + c1)
cost1 = a0 * b1c0 * max(dim[0:1]) * (a1b0 + c1 * dim[2])
This indexing isn't right: dim[0:1] only pulls out one element.
When matrices are stacked, we want a stack of 1x1 matrices or vectors; the number of stacks should not appear as the first or second dimension.
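The slicing pitfall the comment points out can be seen directly (the dim values here are hypothetical):

```python
dim = [2, 5, 4]  # hypothetical per-matrix stack sizes
print(dim[0:1])       # [2] -- a one-element slice, not the first two entries
print(max(dim[0:1]))  # 2  -- equivalent to just dim[0]
print(max(dim[0:2]))  # 5  -- a slice over indices 0 and 1
```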
Thanks, @shoyer. I added test coverage.
D1d = np.random.random(2)  # 1-D

# the result should be a scalar
assert_equal(multi_dot([A1d, B, C, D1d]).shape, ())
# the result should be a stack of 1x1 matrices
This isn't consistent with how multi_dot currently works. We can't change existing behavior, so you will need to change this back.
I would feel more confident in this change in general if you preserved existing tests unchanged and simply added additional tests for higher dimensional inputs.
# the result should be a scalar
assert_equal(multi_dot([A1d, B, C, D1d]).shape, ())
# the result should be a stack of 1x1 matrices
assert_equal(multi_dot([A1d, B, C, D1d]).shape, (3, 2, 1, 1))
For this input in particular, I think the correct return shape should be (3, 2).
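Under matmul's promotion rules, a 1-D first argument and a 1-D last argument each gain a temporary axis that is removed again after the product, so chaining them around stacked middle matrices leaves only the stack shape. A sketch with assumed sizes n, m, k (illustrative, not from the PR's test):

```python
import numpy as np

n, m, k = 4, 5, 6
A1d = np.random.random(n)           # 1-D: promoted to a row vector
B = np.random.random((3, 2, n, m))  # stack of matrices
C = np.random.random((3, 2, m, k))  # stack of matrices
D1d = np.random.random(k)           # 1-D: promoted to a column vector

# one valid association; both promoted vector axes are removed again
res = (A1d @ (B @ C)) @ D1d
print(res.shape)  # (3, 2)
```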

cost[i, j] = min([
    cost[prefix] + cost[suffix] + cost_mult(prefix, suffix)
    for k in range(i, j)])
Please don't remove this. The algorithm is presumably still based on Cormen, but with some adaptations? The reference to the source material is valuable.
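For reference, the matrix-chain-order recurrence from Cormen et al. that the original multi_dot cites looks like this (a textbook sketch, not the PR's code):

```python
import numpy as np

def matrix_chain_order(p):
    """O(n^3) matrix-chain DP from Cormen et al., Section 15.2.

    p holds the chain dimensions: matrix i has shape (p[i], p[i+1]).
    Returns the cost table m and the split table s.
    """
    n = len(p) - 1
    m = np.zeros((n, n))
    s = np.zeros((n, n), dtype=int)
    for length in range(1, n):        # length of the sub-chain minus one
        for i in range(n - length):
            j = i + length
            m[i, j] = np.inf
            for k in range(i, j):     # try every split point
                q = m[i, k] + m[k + 1, j] + p[i] * p[k + 1] * p[j + 1]
                if q < m[i, j]:
                    m[i, j] = q
                    s[i, j] = k
    return m, s

# the classic CLRS example: minimal cost is 15125 scalar multiplications
m, s = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
print(int(m[0, 5]))  # 15125
```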
# the result should be a scalar
assert_equal(multi_dot([A1d, B, C, D1d]).shape, ())
# the result should be a stack of 1x1 matrices
assert_equal(multi_dot([A1d, B, C, D1d]).shape, (3, 2, 1, 1))

def test_dynamic_programming_logic(self):
You should add another test like this for higher dimensional inputs.
@@ -2536,7 +2528,7 @@ def _multi_dot_matrix_chain_order(arrays, return_costs=False):
    j = i + l
    m[i, j] = Inf
    for k in range(i, j):
        q = m[i, k] + m[k+1, j] + p[i]*p[k+1]*p[j+1]
        q = m[i, k] + m[k+1, j] + p[i]*p[k+1]*p[j+1]*max(dim[i:j+1])
I don't think this new expression max(dim[i:j+1]) for calculating the number of repeats in broadcasting is correct. Consider matrix multiplication between inputs of shape (N, 1, X, Y) and (1, M, X, Y). These inputs would have dim = [N, M], but the number of repeats per broadcasting rules is N*M, not max(N, M).

Something like the utility function _broadcast_shape() from stride_tricks could potentially be helpful for calculating this.
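The N*M point can be checked directly. In current NumPy the public np.broadcast_shapes (a later counterpart to the internal _broadcast_shape helper) computes the broadcast stack shape; shapes below are illustrative:

```python
import numpy as np

N, M, X, Y = 3, 4, 2, 2
a = np.random.random((N, 1, X, Y))
b = np.random.random((1, M, Y, X))
# the stack shapes (N, 1) and (1, M) broadcast to (N, M):
# N*M matrix products are performed, not max(N, M)
stack = np.broadcast_shapes(a.shape[:-2], b.shape[:-2])
print(stack)          # (3, 4)
print((a @ b).shape)  # (3, 4, 2, 2)
```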
@SpoorthyBhat Thank you for working on this! Do you think we could get to the finish line some time soon?
This PR is problematic:
I think we should close it.
Closing, for reasons mentioned above. Please comment or reopen if you disagree.
Closing. If you wish to continue work on this, you can cherry-pick the commits onto a new PR.