
ENH: Adding matmul equivalent of multi_dot (Issue #8719) #10690

Closed
wants to merge 11 commits

Conversation

SpoorthyBhat

multi_dot is modified to allow stacks of arrays (i.e. N-D arrays rather than just 2-D arrays), using matmul internally.
The optimal order of multiplication is calculated accordingly.

Allows numpy.asscalar to pass scalars
BUG: fixes numpy#4701 related to numpy.asscalar
multi_dot is modified to allow stacks of arrays (i.e. N-D arrays rather than just 2-D arrays) internally using matmul.
The optimum order of multiplication is calculated accordingly.
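For context on the proposal above, the "stacks of arrays" it targets are what matmul (the @ operator) already broadcasts over; a small illustration, not code from this PR:

import numpy as np

# A "stack" of three matrices per operand: shapes (3, 2, 4), (3, 4, 5), (3, 5, 6).
A = np.random.random((3, 2, 4))
B = np.random.random((3, 4, 5))
C = np.random.random((3, 5, 6))

# matmul broadcasts over the leading axis; the PR wants multi_dot to pick the
# cheapest parenthesization for chains like this.
print((A @ B @ C).shape)   # (3, 2, 6)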
@charris
Member

charris commented Mar 4, 2018

This needs to be a new function; we cannot change the behavior of existing functions in this way and maintain backward compatibility. Proposals for new functions should also be discussed on the mailing list before heading off to implement them.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Mar 4, 2018
@charris
Member

charris commented Mar 4, 2018

If the proposal for a new function goes through, it will also need a mention in doc/release/1.15.0-notes.rst.

@SpoorthyBhat
Author

Thanks for the review @charris. I will post this proposal on the mailing list.

@mhvk
Contributor

mhvk commented Mar 5, 2018

@charris - by a change in functionality, do you mean that >2-d arrays no longer raise exceptions but actually return a result? I.e., is the worry that there is code out there that relies on getting an exception? I ask since one should weigh breaking such code against the benefit of not introducing yet another name.

p.s. One might also deprecate via an argument, say, allow_nd, initially defaulting to False but giving a FutureWarning that the default may change to True.
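A minimal sketch of what that transition keyword could look like (hypothetical names only, not NumPy API; allow_nd and the _NoValue sentinel are illustrative):

import warnings
import numpy as np

_NoValue = object()  # sentinel to distinguish "not passed" from an explicit False

def multi_dot(arrays, allow_nd=_NoValue):
    # Hypothetical sketch of the suggested transition: the default stays False
    # for now, but N-D inputs trigger a FutureWarning that it may flip to True.
    if allow_nd is _NoValue:
        allow_nd = False
        if any(np.ndim(a) > 2 for a in arrays):
            warnings.warn(
                "the default of allow_nd may change to True in a future "
                "release, treating N-D inputs as stacks of matrices",
                FutureWarning, stacklevel=2)
    if not allow_nd and any(np.ndim(a) > 2 for a in arrays):
        raise ValueError("expected arrays of at most 2 dimensions "
                         "(pass allow_nd=True to allow stacks of matrices)")
    # ... the existing chain-ordering logic, dispatching to matmul when needed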

of the matrices [1]_ [2]_. Depending on the shapes of the matrices,
this can speed up the multiplication a lot.

If the first argument is 1-D it is treated as a row vector.
If the last argument is 1-D it is treated as a column vector.
The other arguments must be 2-D.
If one of the other arguments is N-D, N>2, it is treated as a stack of matrices and broadcast accordingly

Does this mean that the first and last arguments can't be N-D? Why not? Couldn't these also be treated like stacks of matrices?

of the matrices [1]_ [2]_. Depending on the shapes of the matrices,
this can speed up the multiplication a lot.

If the first argument is 1-D it is treated as a row vector.
If the last argument is 1-D it is treated as a column vector.
The other arguments must be 2-D.
If one of the other arguments is N-D, N>2, it is treated as a stack of matrices and broadcast accordingly
All other arguments must be 2-D.

I find this wording a little confusing now -- what does "all other arguments" now refer to?

I might say:

All arguments other than the first and last must be at least 2-D.
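For reference, the 1-D handling described above is how released multi_dot already behaves; a quick illustration with arbitrary shapes:

import numpy as np
from numpy.linalg import multi_dot

A = np.random.random(3)          # 1-D first argument: treated as a row vector
B = np.random.random((3, 4))
C = np.random.random((4, 5))
D = np.random.random(5)          # 1-D last argument: treated as a column vector

print(multi_dot([A, B, C, D]).shape)   # (): both ends 1-D, so the result is a scalar
print(multi_dot([A, B, C]).shape)      # (5,): only the first argument is 1-D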

@shoyer
Member

shoyer commented Mar 5, 2018

I don't think this is a backwards compatibility break. Every case that would be changed here (ndim > 2) raised an error previously.

@SpoorthyBhat
Author

Thanks, @shoyer. I made the corrections you suggested.

@shoyer shoyer left a comment

This needs test coverage.

# cost1 = cost((AB)C) = a0*a1b0*b1c0 + a0*b1c0*c1
cost1 = a0 * b1c0 * (a1b0 + c1)
cost1 = a0 * b1c0 * max(dim[0:1]) * (a1b0 + c1 * dim[2])

This indexing isn't right: dim[0:1] only pulls out one element.

When matrices are stacked, we want a stack of 1x1 matrices or vectors; the number of stacks should not end up as the first or second dimension of the result.
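(To make shoyer's indexing point above concrete: a slice of length one contains a single element, so taking its max is a no-op.)

>>> dim = [3, 2, 4]
>>> dim[0:1]
[3]
>>> max(dim[0:1])
3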
@SpoorthyBhat
Author

Thanks, @shoyer. I added test coverage.

D1d = np.random.random(2) # 1-D

# the result should be a scalar
assert_equal(multi_dot([A1d, B, C, D1d]).shape, ())
# the result should be a stack of 1x1 matrices
@shoyer shoyer Apr 2, 2018

This isn't consistent with how multi_dot currently works. We can't change existing behavior, so you will need to change this back.

I would feel more confident in this change in general if you preserved existing tests unchanged and simply added additional tests for higher dimensional inputs.

# the result should be a scalar
assert_equal(multi_dot([A1d, B, C, D1d]).shape, ())
# the result should be a stack of 1x1 matrices
assert_equal(multi_dot([A1d, B, C, D1d]).shape, (3, 2, 1, 1))

For this input in particular, I think the correct return shape should be (3, 2).


cost[i, j] = min([
    cost[prefix] + cost[suffix] + cost_mult(prefix, suffix)
    for k in range(i, j)])

Please don't remove this. The algorithm is presumably still based on Cormen, but with some adaptations? The reference to the source material is valuable.
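For readers following the thread, the quoted recurrence is the standard matrix-chain-order dynamic program from Cormen et al.; a self-contained sketch (mirroring, not reproducing, numpy's _multi_dot_matrix_chain_order):

import numpy as np

def matrix_chain_order(p):
    # p holds the dimensions: matrix i has shape (p[i], p[i+1]).
    # Returns the cost table m and the split table s (Cormen et al., 15.2).
    n = len(p) - 1
    m = np.zeros((n, n))             # m[i, j]: cheapest cost for the chain A_i ... A_j
    s = np.zeros((n, n), dtype=int)  # s[i, j]: index k of the optimal split
    for l in range(1, n):            # l is the chain length minus one
        for i in range(n - l):
            j = i + l
            m[i, j] = np.inf
            for k in range(i, j):
                q = m[i, k] + m[k + 1, j] + p[i] * p[k + 1] * p[j + 1]
                if q < m[i, j]:
                    m[i, j] = q
                    s[i, j] = k
    return m, s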

# the result should be a scalar
assert_equal(multi_dot([A1d, B, C, D1d]).shape, ())
# the result should be a stack of 1x1 matrices
assert_equal(multi_dot([A1d, B, C, D1d]).shape, (3, 2, 1, 1))

def test_dynamic_programming_logic(self):

You should add another test like this for higher dimensional inputs.

@@ -2536,7 +2528,7 @@ def _multi_dot_matrix_chain_order(arrays, return_costs=False):
        j = i + l
        m[i, j] = Inf
        for k in range(i, j):
            q = m[i, k] + m[k+1, j] + p[i]*p[k+1]*p[j+1]
            q = m[i, k] + m[k+1, j] + p[i]*p[k+1]*p[j+1]*max(dim[i:j+1])

I don't think this new expression max(dim[i:j+1]) for calculating the number of repeats in broadcasting is correct.

Consider matrix multiplication between inputs of shape (N, 1, X, Y) and (1, M, X, Y). These inputs would have dim = [N, M], but the number of repeats per broadcasting rules is N*M, not max(N, M).

Something like the utility function _broadcast_shape() from stride_tricks could potentially be helpful for calculating this.
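A quick illustration of the broadcasting count being described (np.broadcast_shapes is available in newer NumPy; not part of this PR):

import numpy as np

a = np.ones((4, 1, 5, 5))   # N = 4
b = np.ones((1, 3, 5, 5))   # M = 3

# The stack dimensions (4, 1) and (1, 3) broadcast to (4, 3), so matmul
# performs 4*3 = 12 matrix products, not max(4, 3) = 4.
print(np.matmul(a, b).shape)                 # (4, 3, 5, 5)
print(np.broadcast_shapes((4, 1), (1, 3)))   # (4, 3)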

Base automatically changed from master to main March 4, 2021 02:04
@InessaPawson
Member

@SpoorthyBhat Thank you for working on this! Do you think we could get to the finish line some time soon?

@mattip
Member

mattip commented Mar 28, 2023

This PR is problematic:

  • conceptually, np.dot and np.matmul handle 3d and larger arrays differently:

    >>> a = np.zeros([4, 5, 5])
    >>> b = np.zeros([1, 5, 5])
    >>> np.matmul(a, b).shape
    (4, 5, 5)
    >>> np.dot(a, b).shape
    (4, 5, 1, 5)
    

thus it would be problematic to use matmul to chain arrays in a function called multi_dot.

  • tests were modified to change behaviour

I think we should close it.

@mattip
Member

mattip commented Mar 30, 2023

Closing, for reasons mentioned above. Please comment or reopen if you disagree.

@mattip mattip closed this Mar 30, 2023
@mattip
Member

mattip commented Apr 3, 2023

Closing. If you wish to continue work on this, you can cherry-pick the commits onto a new PR.
