Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add mdot: fast dot with multiple arguments. #4977

Merged
merged 1 commit into from Nov 13, 2014

Conversation

sotte
Copy link
Contributor

@sotte sotte commented Aug 20, 2014

This is not ready to be merged!

mdot makes it easy to chain dot. Think of mdot as this:

def mdot(*args): return reduce(np.dot, args)

However, mdot uses the fact that matrix multiplication is associative and
finds the optimal order of multiplications with dynamic programing before
doing the mutiplication. This can speed up the multiplication quite a bit.

The algorithm follows:
Cormen, "Introduction to Algorithms", Chapter 15.2, p. 370-378

Note: currently mdot only works with arrays which have ndim == 2 to avoid
ambiguities of dot. Also see the discussion in #4311.

TODO:

  • Decide if ndim != 2 should be ok
  • Add unittests
  • decide on name. possible names are:
    • mdot: think "multi dot"
    • multidot: as above :)
    • multi_dot: as above :)
    • chain_dot
    • fdot: think "fast dot" (should the parameter optimize be optional then? I don't think so.)
    • ddot: think "multiple d for multiple dot"
    • cdot: think "chain dot"
    • odot: think "optimal do

@seberg
Copy link
Member

seberg commented Aug 20, 2014

About your todo's. The first thing, more then ndim == 2 is OK, if it stick to the new __matmul__ rule (not current dot outer broadcasting) in my opinion. But, I would say that it could be added later on. The only thing I am a bit unsure on is the exact name, because I am not sure if we will add (and what) other names to do with __matmul__ introduction.

@sotte
Copy link
Contributor Author

sotte commented Aug 20, 2014

@seberg I agree, we can add that later.

Possible names if you want to reserve mdot for __matmul__:

  • fdot: think "fast dot" (should the parameter optimize be optional then? I don't think so.)
  • ddot: think "multiple d for multiple dot"
  • cdot: think "chain dot"
  • odot: think "optimal dot"

I think I prefer cdot of these...

@juliantaylor
Copy link
Contributor

I think an explicit chain_dot is best, a single letter is easily confused with lowlevel blas dots, e.g. sdot, ddot, zdot etc.

@njsmith
Copy link
Member

njsmith commented Aug 20, 2014

Yeah, for ndim > 2 I think that leaving it out for now is fine, and we
certainly shouldn't implement legacy dot's behaviour.

For ndim = 1, I think this should be allowed for the first and last entries
in the chain.

On Wed, Aug 20, 2014 at 12:53 PM, Julian Taylor notifications@github.com
wrote:

I think an explicit chain_dot is best, a single letter is easily confused
with lowlevel blas dots, e.g. sdot, ddot, zdot etc.


Reply to this email directly or view it on GitHub
#4977 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


"""
for array in args:
if array.ndim != 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most numpy functions do not assume that their inputs are ndarrays, only that they are "array-like". You probably want an args = [np.asanyarray(args) for arg in args] before this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most places (and all places in linalg) I looked at np.asarray is used. Any reason why I should use np.asanyarray here?

@jaimefrio
Copy link
Member

I think this is a cool addition as is. Aside from the unittests, everything else in your TODO list can be added afterwards.

# The cost for multiplying AB is then: 10 * 100 * 5
p = [arg.shape[0] for arg in args]
p.append(args[-1].shape[1])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be premature optimization, but since the future plans will likely include taking stacks of arrays, once the @ functionality is in place, checking for shape[-2] and shape[-1], instead ofshape[0]andshape[1]` may be appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's handle this once @ is in place. I don't think there is even a prototype yet (see #4464).

@sotte
Copy link
Contributor Author

sotte commented Aug 21, 2014

@juliantaylor chain_dot is kinda long. I didn't even know about ddot and sdot. I don't think most people care about the underlying blas function. There is vdot already: http://docs.scipy.org/doc/numpy/reference/generated/numpy.vdot.html. So why not have cdot?

@sotte
Copy link
Contributor Author

sotte commented Aug 21, 2014

I wonder if there is a place for "benchmark unittest". I mean I want to make sure that mdot with optimize=True is faster than the not optimal version.

I also wonder if I should handle cases where there are less than 3 arrays? Should I just throw an Error? Currently I'm using my personal version of mdot everywhere where I used dot before. So it's useful to be able to handle 2 arguments.

@sotte
Copy link
Contributor Author

sotte commented Aug 21, 2014

I'm just realizing that reduce is not part of python 3 anymore. There is functools.reduce which is available in python 2.6 and 3.x. Since the current version of numpy wants 2.6 and above I'll use functools.reduce.

@larsmans
Copy link
Contributor

I called a similar function multidot once. I still like that name, it was easy to remember and type.

@sotte
Copy link
Contributor Author

sotte commented Aug 25, 2014

@njsmith would you broadcats a 1D vector into a row matrix (1 x n) for the first in the chain and column matrix (m x 1) for the last in the chain? In that case we don't have to modify the non optimized reduce part and the dynamic programming part also works without changes.

@njsmith
Copy link
Member

njsmith commented Aug 25, 2014

Technically this should be called "promotion" or something rather than
broadcasting, but yes, that's how dot works. (Try it!)
On 25 Aug 2014 09:39, "Stefan Otte" notifications@github.com wrote:

@njsmith https://github.com/njsmith would you broadcats a 1D vector
into a row matrix (1 x n) for the first in the chain and column matrix (m x

  1. for the last in the chain? In that case we don't have to modify the non
    optimized reduce part and the dynamic programming part also works without
    changes.


Reply to this email directly or view it on GitHub
#4977 (comment).

@argriffing
Copy link
Contributor

+1 for the mdot name and for allowing 1d first and last args

I wonder if there is a place for "benchmark unittest".

Have you checked python runtests.py --bench? The analogous script in scipy runs benchmarks that are separate from the unit tests but which are organized similarly. Setting up something like vbench could be interesting but would be more work.

elif n == 2:
return dot(arrays[0], arrays[1])

arrays = [asarray(a) for a in arrays]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just happened across this: would suggest to always use asanyarray rather than asarray, since there would seem to be no reason here not to preserve a possible subclass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, should this line be moved up? It seems illogical that the type of output array might depend on the number of arguments passed in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asarray is used throughout linalg.py and I wanted to preserve it. I'll change it.

The second argument: mdot behaves like dot if there are only one or two arguments. No need to do anything that is unique to mdot. That's why I moved arrays = [asarray(a) for a in arrays] down.

@sotte
Copy link
Contributor Author

sotte commented Aug 28, 2014

@argriffing I'll take a look at python runtests.py --bench. Thanks for the pointer!

@sotte
Copy link
Contributor Author

sotte commented Aug 31, 2014

I added tests and documentation for the ndim==1 cases.

The name of the function is not clear yet. Some for, some against mdot. If there is no crazy opposition against mdot I'd keep the name.

I think we can merge it soon.

(I'll still add some benchmark tests later. Btw, @argriffing thanks for the pointer to vbench. It's a cool tool!)

@juliantaylor
Copy link
Contributor

I still prefer chain_dot or multi_dot if its too long users can always change its name in the import, but if I'm the only one mdot is fine.

Can you still write a small section in the release notes for 1.10? this is a really nice feature which should also have a summary line in the highlights.

@njsmith
Copy link
Member

njsmith commented Sep 2, 2014

I also vote for a more obvious name. Reading is harder than writing etc.
On 2 Sep 2014 17:17, "Julian Taylor" notifications@github.com wrote:

I still prefer chain_dot or multi_dot if its too long users can always
change its name in the import, but if I'm the only one mdot is fine.

Can you still write a small section in the release notes for 1.10? this is
a really nice feature which should also have a summary line in the
highlights.


Reply to this email directly or view it on GitHub
#4977 (comment).

@argriffing
Copy link
Contributor

Those longer names are OK too, I only +1'd the name mdot because I'd coincidentally used this same name for the same function.

@sotte
Copy link
Contributor Author

sotte commented Sep 3, 2014

Ok, I can live with a longer name (I'll import it as mdot anyway :)). multi_dot it is.

@juliantaylor I edited the release notes.

I wonder if I should mention multi_dot in linalg/info.py and linalg/__init__.py. info.py and __init__.py contain almost the same docstring (also in different subpackages) but not quite the same. Is there a reason for this?

@sotte
Copy link
Contributor Author

sotte commented Sep 3, 2014

Do you normally squash the commits before you merge?

@seberg
Copy link
Member

seberg commented Sep 3, 2014

If it gets a lot then typically yes (and here it seems like quite a lot ;)).

@sotte
Copy link
Contributor Author

sotte commented Sep 3, 2014

Once I get a final OK I'll squash the commits.

of the matrices [1]_ [2]_. Depending on the shape of the matrices
this can speed up the multiplication a lot.

The first and last argument can have `ndim==1` and are treated as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking around the docstrings of other functions, see e.g. np.dot, 1-D and 2-D seem to be the preferred ways of indicating the dimensionality of the input arrays. You should try to stick to that convention.

@sotte
Copy link
Contributor Author

sotte commented Nov 3, 2014

@shoyer the documentation mentions the constraints regarding the dimensionality:

The first and last argument can be 1-D and are treated respectively as
row and column vector. The other arguments must be 2-D.

I could make it more explicit.

@abalkin
Copy link
Contributor

abalkin commented Nov 3, 2014

You do realize that solving the dynamic programming problem at all for a set of square matrices is a waste of effort?

Yes, and this is another reason in favor of multi_dot(x) signature. A set of conforming square matrices is easier to detect.

I think the signature taking *args serves as a useful nudge to signal to users that using a 3D matrix as input is probably not what they want want to be doing ...

Nudging the users is not what a well-designed library should be doing. When NumPy can select the best algorithm by itself, it should just do it. It is perfectly fine to have different code paths for multi_dot of square matrices and rectangular ones. Nudging the users into not using a convenience function in the probably most common case is not.

@shoyer
Copy link
Member

shoyer commented Nov 3, 2014

Well, I'm definitely +1 on skipping the dynamic programming if all matrices have the same shape (at least if there are enough matrices for the speed of the calculating the optimal ordering to matter). That is indeed a better idea than telling users to use reduce/np.dot.

@sotte
Copy link
Contributor Author

sotte commented Nov 3, 2014

The dynamic programming optimization does not help in case of square matrices. So we could skip it of course. Checking the shape of arrays does not really depend on how we supply arguments though.

As for the 3-D arrays: I'm not really interested in it, but of course one can add it. I don't see how this influences the choice of *args, but I'm willing to be convinced :)

@abalkin
Copy link
Contributor

abalkin commented Nov 3, 2014

Checking the shape of arrays does not really depend on how we supply arguments though.

Yes, it does. If args is a 3-d array you know that all shapes are the same, but you loose this information if you pass args in as *args. So you get an O(1) vs. O(n) difference in this case.

@sotte
Copy link
Contributor Author

sotte commented Nov 3, 2014

I agree. But if you want to circumvent the DP optimization for a list of 2-D square arrays you still have to check every element. The runtime difference (O(1) vs O(n)) is probably neglectable compared to the real multiplication.

@jaimefrio
Copy link
Member

In case I haven't made this clear, I think *args is a dreadful idea, because I really hope and wish for this function to have a more complete functionality and go on to need the ugly **kwargs (I still don't like it). And if there are ever going to be kwargs, the benefit of *args is exactly two keystrokes, a saving which hardly justifies anything.

That said, the 3D stack of arrays argument is not a very solid one: if a is the 3D array, you could always do multi_dot(*a), which turns your array into a tuple of 2D views into the original 3D array, so it should be a fast operation. And anyway, worrying about performance degradation because of this, or in shape checking, when matrix multiplication is at best an O(n^2.x) algorithm seems pointless.

If there really is a strong liking for *args, I think that we should take the opportunity to discuss what numpy's policy regarding this should be, so that we can at least spare ourselves having this discussion again.

Personally, and as a general rule, I don't think that *args, **kwargs is ok, and I don't buy the Python 3 argument either: support for Python 2.x is not going to be dropped in a very long time. If there really is a strong liking for *args, I think the better option would be to eventually have two functions in the API: one with all the bells and whistles, that took a sequence of arrays as first argument, and whatever kwargs are deemed necessary afterwards, and a second, simplified version of this, that took a variable list of arguments and had predefined options for the kwargs.

This is, by the way, what I think concatenate and the family of hstack, vstack, dstack and the like should have done.

@abalkin
Copy link
Contributor

abalkin commented Nov 3, 2014

That said, the 3D stack of arrays argument is not a very solid one: if a is the 3D array, you could always do multi_dot(*a), which turns your array into a tuple of 2D views into the original 3D array, so it should be a fast operation. And anyway, worrying about performance degradation because of this, or in shape checking, when matrix multiplication is at best an O(n^2.x) algorithm seems pointless.

Note that in the problems where you need to multiply many square matrices these matrices tend to be small. Oftentimes they are 2x2 or 3x3.

@sotte
Copy link
Contributor Author

sotte commented Nov 6, 2014

My main point was that the interface from the user's perspective would be a bit nicer if we used *args. But I don't have a very strong opinion. If we won't change it to *args we can merge the pull request.

@Nodd
Copy link
Contributor

Nodd commented Nov 7, 2014

@jaimefrio Just my 2 cents: using *args is not about saving two keystrokes but increasing readability. The more parenthesis you have, the more confusing it is to see where each argument/call/tuple begins and ends. Code is read way more often than it is written...

@jaimefrio
Copy link
Member

We could go on for ever giving qrguments and counter arguments: one could argue that, because of its very nature, this function is geared towards runtime generated lists of matrices, not with a hardcoded list of them. e.g.it would likely more often be used as something like:

mats = [get_me_a_matrix() for j in range(n)]
multi_dot(*mats)

rather than:

multi_dot(m0, m1, m2, ..., mn)

and that having that extra *, which is not one of the most prominent features of the language, similarly hurts readability.

And don't get me wrong, I find the extra parenthesis as annoying as anyone else, but I much prefer this minor annoyance than being shown an f(*args, **kwargs) signature by an IDE while coding. But if that's not how the community feels, and we can come to an agreement on what the general numpy rule should be, I'll happily give in. I'll try to find some time to articulate the options and send a mail to the discussion list so that we can have a vote there.

@sotte
Copy link
Contributor Author

sotte commented Nov 7, 2014

@jaimefrio maybe a discussion about *args (independent of this concrete pull request) would be good simply to have guidelines for the future. 👍

@argriffing
Copy link
Contributor

On one hand I've only ever used the mdot utility for hardcoded lists of arrays and I'd implemented my version using *args, but on the other hand I dislike *args because for whatever reason it tends to spread, turning the signature of every related function into f(*args, **kwargs). For example, it has metastasized throughout the scipy.optimize.minimize framework making that code less than readable to me.

@seberg
Copy link
Member

seberg commented Nov 7, 2014

Frankly, to me this is more about whether the function would be used as

a @ b @ c @ d  -->  multi_dot(a, b, c, d)

or

reduce(np.dot, arrays)  --> multi_dot(arrays)

and less about whether or not *args, **kwargs is horrible or not. There are a lot of horrible *args, **kwargs uses, but there is no passing through.
If the second one is a common use case, I think we should probably not use the *args signature, if it is rare enough, I guess then depending on how horrible you think *args, **kwargs is, you would choose differently...

@argriffing
Copy link
Contributor

@seberg if your question is purely to gather evidence about use cases, I've used your case (1) many times and never case (2). This is the opposite of @jaimefrio's intuition.

@sotte
Copy link
Contributor Author

sotte commented Nov 7, 2014

I used both cases.

@juliantaylor
Copy link
Contributor

would need a rebase, probably due to the release notes

`np.linalg.multi_dot` computes the dot product of two or more arrays in
a single function call, while automatically selecting the fastest evaluation
order.

The algorithm for selecting the fastest evaluation order uses dynamic
programming and closely follows:

    Cormen, "Introduction to Algorithms", Chapter 15.2, p. 370-378
@sotte
Copy link
Contributor Author

sotte commented Nov 10, 2014

I rebased onto the current master.

@sotte
Copy link
Contributor Author

sotte commented Nov 13, 2014

So, what are the next steps?

@juliantaylor
Copy link
Contributor

list of arrays is consistent with already existing api so its probably not worth bikesheding more about it.
Thanks, this is a very nice new feature.

juliantaylor added a commit that referenced this pull request Nov 13, 2014
ENH: add `mdot`: fast dot with multiple arguments.
@juliantaylor juliantaylor merged commit f14d5e1 into numpy:master Nov 13, 2014
@sotte
Copy link
Contributor Author

sotte commented Nov 13, 2014

Thanks everybody who was involved in this and gave feedback!

@solarjoe
Copy link
Contributor

solarjoe commented Mar 1, 2017

Though np.linalg.multi_dot was added in Numpy 1.10 a long, long time ago
it is not listed in the Numpy docs. Search engines only find other pages with examples.
Does anyone know how to fix this?

@eric-wieser
Copy link
Member

@solarjoe: One line in :/doc/source/reference/routines.linalg.rst should do it

@solarjoe
Copy link
Contributor

solarjoe commented Mar 1, 2017

Ok, line added.
#8723

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet