A new PEP for infix matrix multiplication #4351

Merged
merged 34 commits on Jul 6, 2014

Conversation

Owner

njsmith commented Feb 22, 2014

This is now PEP 465, and a properly rendered version can be found here.

A poorly rendered version of the latest draft can be found here.

Let's use this PR as a central place to discuss this draft PEP for adding an infix matrix multiplication operator to numpy. The PR format is nice in that it allows line by line comments, updating, etc.

Hopefully this goes without saying, but just in case: We especially welcome comments about how the proposal will work for non-numpy projects (or could work, with fixes); this proposal started in the numpy community but the idea here is to build a consensus about how to make Python better for all projects that care about linear algebra.

Some possible points of discussion:

  • Does this in fact seem like a good idea?
  • Do people agree with the points where I claim "there is consensus that..."? (You can search on "consensus")
  • Any suggestions on making the argument stronger and clearer?

@efiring efiring and 1 other commented on an outdated diff Feb 22, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+ operator.
+
+Add a ``.dot`` method to array types so as to allow "pseudo-infix"
+A.dot(B) syntax:
+ This has been in numpy for some years, and in many cases it's
+ better than dot(A, B). But it's still much less readable than
+ real infix notation, and in particular still suffers from an
+ extreme overabundance of parentheses. See `Motivation`_ above.
+
+Add lots of new operators / add a new generic syntax for defining
+infix operators:
+ In addition to this being generally un-Pythonic and repeatedly
+ rejected by BDFL fiat, this would be using a sledgehammer to smash
+ a fly. There is a strong consensus in the scientific python
+ community that matrix multiplication really is the only missing
+ infix operator that matters enough to bother about. (In
@efiring

efiring Feb 22, 2014

Contributor

I would really like to see && and || for the element-wise logical operations; faking them with the bitwise operators and parentheses is ugly.
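(The ugliness efiring mentions comes from operator precedence: ``&`` binds more tightly than comparisons, so the parentheses are mandatory. A minimal sketch with plain Python ints; element-wise comparisons on arrays hit the same trap:)

```python
# & binds tighter than > and <, so dropping the parentheses silently
# changes the meaning: a > 4 & b parses as a > (4 & b).
a, b = 5, 7
print((a > 4) & (b > 4))   # True: the intended "logical and"
print(a > 4 & b > 4)       # False: parsed as the chain a > (4 & b) > 4
```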

@pv

pv Feb 22, 2014

Owner

The overloadable boolean ops PEP was recently rejected: https://mail.python.org/pipermail/python-dev/2012-March/117510.html
I think it's best to not divert attention to that topic, but just focus on the matmul operation.

@jtratner jtratner and 3 others commented on an outdated diff Feb 22, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+ ``*=`` 2 19 22 5
+ ``/=`` 0 23 16 4
+ ``>>`` 4 0 0 3
+ ``^`` 3 0 0 3
+ ``~`` 2 4 5 2
+ ``|=`` 3 0 0 2
+ ``&=`` 1 0 0 1
+``//=`` 1 0 0 1
+ ``^=`` 1 0 0 0
+``**=`` 0 2 0 0
+ ``%=`` 0 0 0 0
+``<<=`` 0 0 0 0
+``>>=`` 0 0 0 0
+======= ======= ======= ======= ========
+
+We see that sklearn and nipy together contain nearly 700 uses of
@jtratner

jtratner Feb 22, 2014

This table is slightly confusing (especially the combined column) - could you label it to say something like "Instances per 10K SLOC" at the top? It gets a little buried in the previous paragraph. I might also suggest sorting it by sklearn to drive home the point about dot products.

@mforbes

mforbes Feb 23, 2014

I also found this table very confusing. When scanning a document like this, one often skips straight to tables, and without a caption, my first reaction was that there are only 80 uses of dot (which of course makes no sense). Is there a way of adding a "caption" or similar?

Since the point of this discussion is dot compared to the rest, I would put the dot row at the top. Of course it does not sort there, but it should be clear that this is what we are comparing to. (It can be left in its sorted order too.)

@pv

pv Feb 23, 2014

Owner

Should the table be restricted only to binary ops (and in-place ops)? ( and { are not binary ops, and having fewer entries in the table would make it easier to read.

@njsmith

njsmith Feb 24, 2014

Owner

Check out the table again, I think I handled everyone's concerns. (Partly by giving up on using the ReST table support, which has no way to get readable alignment.)

👍 well written and I'd be for it (for what that's worth), plus it'd be a big incentive to move to Python 3 and reduce the 2/3 fragmentation. (even if the libraries themselves couldn't use those operators because of compatibility issues)

Contributor

fperez commented Feb 22, 2014

Hey @njsmith, we had some discussions ages ago at SciPy'07 or '08, I did my best to summarize them here. That stuff is in an old bazaar repo, but you can get the reST sources here.

I never really followed this up beyond summarizing the conversations in that document. Just mentioning it here in case any of that ancient history is useful, feel free to ignore it in light of the last 5-6 years' worth of experience :)

Contributor

fperez commented Feb 22, 2014

BTW, there's a trivial makefile too that I was using to make rendered versions of the rst sources; it would be handy to have such a view somewhere, as it does make it much nicer to read.

@pv pv and 2 others commented on an outdated diff Feb 22, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+ def __rmatpow__(self, other):
+ if isinstance(other, numbers.Number):
+ return other ** self
+ else:
+ return NotImplemented
+
+And for builtin types, statements like ``a @= b`` will perform
+``a = (a @ b)`` via the usual mechanism.
+
+
+Motivation
+==========
+
+The main motivation for this PEP is the addition of a binary operator
+``@`` for matrix multiplication. No-one cares terribly much about the
+matrix power operator ``@@`` -- it's useful and well-defined, but not
@pv

pv Feb 22, 2014

Owner

I'd make the discussion of @@ much less prominent.
It can be mentioned somewhere toward the end, but I think it should not be proposed in the PEP.

@shoyer

shoyer Feb 23, 2014

Member

I agree. Including @@ in this PEP is a distraction which makes it marginally more likely to be rejected. In my experience, matrix powers with arbitrary bases are rare (usually the matrix exponential is preferred).

@njsmith

njsmith Feb 24, 2014

Owner

I moved the motivation of @@ to the end, but my feeling is that it can't hurt and might help. Remember, we're not trying to ask the BDFL for some favor -- we're helping him see the facts about how much more elegant and nice Python will become after making these changes. Suggesting that we have @, *, and ** but not @@ is actually asymmetric and inelegant.

And anyway, if he likes one half of a PEP but not the other, then he'll just tell us to change it. It's still clear from the motivation that we'd be totally happy with just @.

@pv pv and 1 other commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+more often than ``//`` or other integer operations.
+
+
+But isn't it weird to add an operator with no stdlib uses?
+----------------------------------------------------------
+
+It's certainly unusual, but the important thing is whether a change
+will benefit users, not where the software is being downloaded from.
+It's clear from the above that ``@`` will be used, and used heavily.
+And -- who knows? -- perhaps someday the stdlib will contain a matrix
+type of some sort. This PEP only moves us closer to that possibility,
+by helping the Python numerical community finally standardize on a
+single duck type for all matrix-like objects.
+
+
+Summary
@pv

pv Feb 23, 2014

Owner

I think this summary would be better placed before the detailed items --- on top of the motivation section.

@njsmith

njsmith Feb 24, 2014

Owner

You're right. Moved to the top.

@pv pv and 1 other commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+Alternatives to adding a new operator at all
+--------------------------------------------
+
+Over the past 15+ years, the Python numeric community has explored a
+variety of ways to handle the tension between matrix and elementwise
+multiplication operations. PEP 211 and PEP 225, both proposed in
+2000, were early attempts to add new operators to solve this problem,
+but suffered from serious flaws; in particular, at that time the
+Python numerical community had not yet reached consensus on the proper
+API for array objects, or on what operators might be needed or useful
+(e.g., PEP 225 proposes 6 new operators with underspecified
+semantics). Experience since then has eventually led to consensus
+among the numerical community that the best solution is to add a
+single infix operator for matrix multiply (together with any other new
+operators this implies like ``@=``).
+
@pv

pv Feb 23, 2014

Owner

I wonder if we need to counter "too late, after all these years".

One argument is that adoption of Python 3 in the numerical world is still not very widespread.
There's still a time window left to introduce this feature.

@njsmith

njsmith Feb 24, 2014

Owner

IME that's not usually an argument that people make -- otherwise it would apply to every change anyone ever considered making to Python. Why did we do this back in 1.5? :-) And it's obvious to everyone that Python's use in numerics has grown hugely over the last 5 years...

@mforbes mforbes commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+Built-ins
+'''''''''
+
+Why are the new special methods defined the way they are for Python
+builtins? The three goals are:
+
+* Define a meaningful ``@`` and ``@@`` for builtin and user-defined
+ numeric types, to maximize duck compatibility between Python scalars
+ and 1x1 matrices, single-element vectors, and zero-dimensional
+ arrays.
+* Do this in as forward-compatible a way as possible.
+* Ensure that ``scalar @ matrix`` does *not* delegate to ``scalar *
+ matrix``; ``scalar * matrix`` is well-defined, but ``scalar @
+ matrix`` should raise an error.
+
+Therefore, we implement these methods so that numbers.Number objects
@mforbes

mforbes Feb 23, 2014

Should this not be numbers.Number (as code)? It looked like strange grammar with a missing space after the "period" when I first read this. Similarly with NotImplemented below, etc.

+1 from me. Nice work. Thanks for putting this together.

This leaves out things like cross product and matrix division. What about using something like @ as a prefix for an operator to define it as a matrix operation? So @, @/, @__. You could then use @@ for the "reverse" version of that operator, so @@ would be cross product, @@/ would be back division.

Owner

pv commented Feb 23, 2014

@toddrjen: the suggestion of using a prefix for multiple operators is similar to PEP-225, and it has not made progress.

Also, I think cross product and matrix division are lower in priority than matrix multiplication in necessity.

Probably, but I would say matrix exponent is below all of those in importance.

I think the argument for including @@ is applicable to other mathematical operators as well, and I would say they are far more useful.

So I would either cut out @@, or better justify including it but not any of the other operators.

@shoyer shoyer and 1 other commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+We see that sklearn and nipy together contain nearly 700 uses of
+matrix multiplication. Within these two libraries, matrix
+multiplication is used more heavily than most comparison operators
+(``<`` ``>`` ``!=`` ``<=`` ``>=``), and more heavily even than ``{``
+and ``}``. In total across all three of the codebases examined here,
+matrix multiplication is used more often than almost all the bitwise
+operators (only ``|`` just barely edges it out), and ~2x as often as
+``//``. This is true even though the stdlib, which contains a fair
+amount of integer arithmetic and no matrix operations, is ~4x larger
+than the numeric libraries put together. While it's impossible to
+know for certain, from this data it seems plausible that on net across
+the whole Python ecosystem, matrix multiplication is currently used
+more often than ``//`` or other integer operations.
+
+
+But isn't it weird to add an operator with no stdlib uses?
@shoyer

shoyer Feb 23, 2014

Member

There is precedent for adding syntax to Python not used (to my knowledge) in the stdlib in the form of x[...] which is equivalent to x[Ellipsis].

@njsmith

njsmith Feb 24, 2014

Owner

Good point, added a mention of that.

Contributor

abalkin commented Feb 23, 2014

It will be helpful to include a survey of what other languages use. For example, there is a precedent for using $ for dot/matrix multiply. R uses %*%. APL used an equivalent of +.* (multiply compose add).

mforbes commented Feb 23, 2014

@fperez You can view a Rendered version on github through the Files Changed tab.

@larsmans larsmans and 1 other commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+
+
+So ``@`` is good for matrix formulas, but how common are those really?
+----------------------------------------------------------------------
+
+We've seen that ``@`` makes matrix formulas dramatically easier to
+work with, and that matrix formulas are extremely important in
+general. But being important doesn't necessarily mean taking up a lot
+of code: if such formulas only occur in one or two places in the
+average numerically-oriented project, then it still might not be worth
+adding a new operator.
+
+When the going gets tough, the tough get empirical. To get a rough
+estimate of how useful the ``@`` operator will be, this table shows
+the rate at which different Python operators are used in the stdlib,
+and also in two high-profile numerical projects -- the sklearn machine
@larsmans

larsmans Feb 23, 2014

Contributor

It's called scikit-learn, not sklearn (that's just the package name).

@njsmith

njsmith Feb 24, 2014

Owner

Thanks, fixed.

@larsmans larsmans and 1 other commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+When the going gets tough, the tough get empirical. To get a rough
+estimate of how useful the ``@`` operator will be, this table shows
+the rate at which different Python operators are used in the stdlib,
+and also in two high-profile numerical projects -- the sklearn machine
+learning library, and the nipy neuroimaging library. Units are
+(rounded) usages per 10,000 source lines of code (SLOC). Rows are
+sorted by the 'combined' column, which gives the usage per 10,000 SLOC
+when the three code bases are pooled together. The combined column is
+thus strongly weighted towards the stdlib, which is much larger than
+both projects put together (stdlib: 411575 SLOC, sklearn: 50924 SLOC,
+nipy: 37078 SLOC). [#sloc-details]
+
+The ``dot`` row counts matrix multiply operations, estimated by
+assuming there to be zero matrix multiplies in the stdlib, and in
+sklearn/nipy assuming -- reasonably -- that all instances of the token
+``dot`` are calls to ``np.dot``.
@larsmans

larsmans Feb 23, 2014

Contributor

This is an underestimate for scikit-learn. To get a more accurate estimate, include the ~50 calls to safe_sparse_dot, which works around the fact that matrix multiplication is spelled * in scipy.sparse :( and the dozen or so calls to fast_dot.

@njsmith

njsmith Feb 24, 2014

Owner

Awesome, thanks! That bumps dot up above |, which makes the exposition even easier :-)

@larsmans larsmans and 1 other commented on an outdated diff Feb 23, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+
+ * Diving deeper into Machine Learning with Scikit-learn
+
+ * Data Wrangling for Kaggle Data Science Competitions – An etude
+
+ * Hands-on with Pydata: how to build a minimal recommendation
+ engine.
+
+ * Python for Social Scientists
+
+ * Bayesian statistics made simple
+
+ In addition, the following tutorials could easily deal with
+ matrices:
+
+ * Introduction to game programming
@larsmans

larsmans Feb 23, 2014

Contributor

Speaking of which, has anyone been in touch with the PyOpenGL project about this PEP?

@njsmith

njsmith Feb 24, 2014

Owner

AFAICT PyOpenGL doesn't actually define a matrix type, it just has a few functions that can accept matrices as arguments (possibly represented as list-of-lists I guess). If you want matrix operators they refer you to numpy.

Contributor

fperez commented Feb 23, 2014

Thanks, @pv, @mforbes; it would be good to add that link to the PR description at the top for readability; I don't have the permissions to edit that box. @njsmith?

In addition to the APL, Kx, and R syntax given by @abalkin above, I skimmed some from Rosetta Code:

language        operator
APL             +.×
IDL             #
J               +/ .*
Julia           *
K               _mul
Mathematica     .
MATLAB/Octave   * (with .* for elementwise)
Maxima          . (with * for elementwise)
Pari/GP         *
R               %*%
Ruby            * (but only for a special matrix class)
TI-83 BASIC     *

I'd say APL's use of non-ASCII characters was historically unpopular. IDL's hash clashes with Python's comment. J uses an ASCII reworking of APL, I think, so it's got a much more elaborate idea of combining multiple infix operators together; powerful but perhaps not really Pythonic. The dot of Mathematica and Maxima is nice but clashes with Python's object attribute/method/module accessor. Julia, Octave, Pari/GP, and TI-83 BASIC's star is nice but would be backward-incompatible with NumPy's elementwise multiplication; Ruby's star relies on the subclass idea eloquently rebutted by the original poster. R's and K's multicharacter things are pretty ugly.
I don't think any of these alternatives are superior to the proposed at-sign, given that star and dot are unavailable.

@gdmcbain gdmcbain and 1 other commented on an outdated diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+--------------------------------------------------------------------
+
+Choice of operator
+''''''''''''''''''
+
+Why ``@`` instead of some other punctuation symbol? It doesn't matter
+much, but ``@`` has a few advantages:
+
+* ``@`` is a friendly character that Pythoneers are already used to
+ typing in decorators, and its use in email addresses means it is
+ more likely to be easily accessible across keyboard layouts than
+ some other characters (e.g. $).
+* The mATrices mnemonic is cute.
+* The swirly shape is reminiscent of the simultaneous sweeps over rows
+ and columns that define matrix multiplication.
+
@gdmcbain

gdmcbain Feb 24, 2014

Another mnemonic motivation for the at-sign: it slightly resembles the centred dot (LaTeX \cdot) often used in printed mathematics to represent an algebraic product.

@njsmith

njsmith Feb 24, 2014

Owner

Good point, added.

@gdmcbain gdmcbain and 1 other commented on an outdated diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+ survives is because of strong arguments from some educators who
+ find that its problems are outweighed by the need to provide a
+ simple and clear mapping between mathematical notation and code
+ for novices.
+
+Add a new ``@`` (or whatever) operator that has some other meaning in
+general Python, and then overload it in numeric code:
+ This was the approach proposed by PEP 211, which suggested
+ defining ``@`` to be the equivalent of ``itertools.product``. The
+ problem with this is that when taken on its own terms, adding an
+ infix operator for ``itertools.product`` is just silly. Matrix
+ multiplication has a uniquely strong rationale for inclusion as an
+ infix operator. There almost certainly don't exist any other
+ binary operations that will ever justify adding another infix
+ operator.
+
@gdmcbain

gdmcbain Feb 24, 2014

What about function composition? Haskell uses dot for this, as in

g . f

I got to that by wondering: were there such an operator as proposed, besides making heavy use of it for its intended purpose, how might I overload it? The first thing that springs to mind is function composition and application. The fundamental idea of a matrix is as a function on vectors; matrix-matrix multiplication is defined subsequently by associativity. If I had arbitrary compatible functions f and g and an argument x, would I like to write f @ x for f(x) and g @ f for lambda x: g(f(x))? Maybe not the former so much, but the latter definitely.
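For concreteness, a hypothetical sketch of that overload (the ``Composable`` wrapper is an illustration, not part of any proposal):

```python
class Composable:
    """Wrap a function so that @ composes it with another function,
    or applies it to a plain argument."""
    def __init__(self, fn):
        self.fn = fn
    def __matmul__(self, other):
        if isinstance(other, Composable):
            # g @ f  ->  lambda x: g(f(x))
            return Composable(lambda x: self.fn(other.fn(x)))
        # f @ x  ->  f(x)
        return self.fn(other)
    def __call__(self, x):
        return self.fn(x)

g = Composable(lambda x: x + 1)
f = Composable(lambda x: x * 2)
print((g @ f)(10))   # g(f(10)) == 21
print(g @ 10)        # application: 11
```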

@njsmith

njsmith Feb 24, 2014

Owner

It's an interesting idea, but IMO has about a snowball's chance in a sauna of actually getting added. Python has moved away from functional programming in general, and doesn't even bother to include a compose function in the functools module.

@gdmcbain gdmcbain and 2 others commented on an outdated diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+ This was the approach proposed by PEP 211, which suggested
+ defining ``@`` to be the equivalent of ``itertools.product``. The
+ problem with this is that when taken on its own terms, adding an
+ infix operator for ``itertools.product`` is just silly. Matrix
+ multiplication has a uniquely strong rationale for inclusion as an
+ infix operator. There almost certainly don't exist any other
+ binary operations that will ever justify adding another infix
+ operator.
+
+Add a ``.dot`` method to array types so as to allow "pseudo-infix"
+A.dot(B) syntax:
+ This has been in numpy for some years, and in many cases it's
+ better than dot(A, B). But it's still much less readable than
+ real infix notation, and in particular still suffers from an
+ extreme overabundance of parentheses. See `Motivation`_ above.
+
@gdmcbain

gdmcbain Feb 24, 2014

Another 'alternative to adding a new operator at all': given the interpretation of a matrix multiplication as the operation of a matrix on a vector, it's a lot like a function operating on a vector (that just happens to be linear), so what about defining numpy.ndarray.__call__ as matrix multiplication? That would allow A(B) instead of A.dot(B). It doesn't shed any parentheses but is pretty compact.
An advantage of this approach would be that it wouldn't require changing Python, only NumPy. It can nearly be done in user code, except that

>>> np.ndarray.__call__ = np.ndarray.dot
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'numpy.ndarray'
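(The same restriction applies to every built-in/extension type, which is why this route would require a change inside NumPy itself rather than a user-level monkeypatch; a quick pure-Python check:)

```python
# Built-in types reject attribute assignment, just like np.ndarray above.
try:
    int.__call__ = lambda self: self
except TypeError as exc:
    print("monkeypatching a built-in fails:", exc)
```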
@njsmith

njsmith Feb 24, 2014

Owner

I'm going to leave this out unless someone brings it up on python-dev... if you can look at

S = (H(beta) - r).T(inv(H(V)(H.T)))(H(beta) - r) 

without your eyes starting to bleed then you have higher ocular robustness than me.

@charris

charris Feb 24, 2014

Owner

If you look at the linked past conversations you will see that proposal and some criticisms of it.

@Nodd Nodd commented on an outdated diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+ ^ 3 0 0 3
+ ~ 2 4 5 2
+ |= 3 0 0 2
+ &= 1 0 0 1
+ //= 1 0 0 1
+ ^= 1 0 0 0
+ **= 0 2 0 0
+ %= 0 0 0 0
+ <<= 0 0 0 0
+ >>= 0 0 0 0
+ ==== ====== ============ ==== ========
+
+These numerical packages together contain ~780 uses of matrix
+multiplication. Within these packages, matrix multiplication is used
+more heavily than most comparison operators (``<`` ``!=`` ``<=``
+``>=``), and more heavily even than ``{`` and ``}``. When we include
@Nodd

Nodd Feb 24, 2014

Contributor

{ and } were removed from the table, so you can't refer to them here.

@Nodd Nodd and 1 other commented on an outdated diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+
+We review some of the rejected alternatives here.
+
+**Use a type that defines ``__mul__`` as matrix multiplication:**
+Numpy has had such a type for many years: ``np.matrix``. And based on
+this experience, a strong consensus has developed that it should
+essentially never be used. The problem is that the presence of two
+different duck-types for numeric data -- one where ``*`` means matrix
+multiply, and one where ``*`` means elementwise multiplication --
+makes it impossible to write generic functions that can operate on
+arbitrary data. In practice, the vast majority of the Python numeric
+ecosystem has standardized on using ``*`` for elementwise
+multiplication, and deprecated the use of ``np.matrix``. Most
+3rd-party libraries which receive a ``matrix`` as input will either
+error out, return incorrect results, or simply convert the input into
+a standard ``ndarray``, and return ``ndarray``s as well. The only
@Nodd

Nodd Feb 24, 2014

Contributor

there is one ``` too much here, it breaks formatting.

@jorisvandenbossche

jorisvandenbossche Feb 24, 2014

Contributor

I think it is the ``ndarray``s, where there is no space between the closing backticks and the ``s``, that breaks the formatting.

@Nodd

Nodd Feb 24, 2014

Contributor

Oh, you're right.

Member

seberg commented Feb 24, 2014

Here is something to chew on (I think it may be an argument for why scalar multiplication should not be defined). What if I have an array of objects? There are two possibilities (disregarding the possible mixed case for now, which gets ugly):

  1. the objects implement matrix multiplication,
  2. the objects do not implement matrix multiplication.

In case 2 things are obvious: we should do a matrix multiplication by calling the appropriate * on each element. However, in case 1 we could do * on each element, but the user may also want us to do @, and we would just stick to broadcasting.

If we do not allow scalar multiplication with @ we could check whether it is implemented or not. Though this is a little shaky if we have mixed types (ideally it would probably be an error if some objects implement @ and some do not). Just some thoughts, maybe someone has an insight. In general I quite like this now (and I was pretty sceptical first :)).

EDIT: Actually I changed my mind (already...). I don't think there is a good way to allow delegation of the matrix multiply. So you just won't be able to multiply many sparse matrices put into arrays with the operator.

@jrovegno jrovegno and 1 other commented on an outdated diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+
+
+Why should matrix multiplication be infix?
+------------------------------------------
+
+When moving from scalars -- like ordinary Python floats -- to more
+general n-dimensional arrays and matrices, there are two standard ways
+to generalize the usual multiplication operation. One is elementwise
+multiplication::
+
+ [2, 3] * [4, 5] = [2 * 4, 3 * 5] = [8, 15]
+
+and the other is the `matrix product`_. For various reasons, the
+numerical Python ecosystem has settled on the convention that ``*``
+refers to elementwise multiplication. However, this leaves us with no
+convenient notation for matrix multiplication.
@jrovegno

jrovegno Feb 24, 2014

Why is an infix operator for matrix multiplication better than using a Matrix object as sympy does? Ref. http://docs.sympy.org/latest/tutorial/matrices.html

@pv

pv Feb 24, 2014

Owner

See below. Sympy has the luxury that it does not have a second object type for which * is the elementwise operation. numpy.matrix has been around for years, and it causes more problems than it solves, as it drives towards code where * means matrix product in some places and elementwise multiplication in others. The switch between the two modes occurs via type casts that may happen far away in the code.
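A minimal sketch of the hazard pv describes (both classes are hypothetical stand-ins for ndarray-like and matrix-like duck types): the same "generic" code silently computes different things depending on which type it receives.

```python
class Elemwise:
    """* means elementwise multiplication (ndarray-style)."""
    def __init__(self, rows):
        self.rows = rows
    def __mul__(self, other):
        return Elemwise([[a * b for a, b in zip(r1, r2)]
                         for r1, r2 in zip(self.rows, other.rows)])

class MatrixLike:
    """* means matrix multiplication (np.matrix-style)."""
    def __init__(self, rows):
        self.rows = rows
    def __mul__(self, other):
        n = len(other.rows)
        return MatrixLike([[sum(r[k] * other.rows[k][j] for k in range(n))
                            for j in range(len(other.rows[0]))]
                           for r in self.rows])

def square(x):
    # "Generic" library code: which product does this mean?
    return x * x

print(square(Elemwise([[1, 2], [3, 4]])).rows)    # [[1, 4], [9, 16]]
print(square(MatrixLike([[1, 2], [3, 4]])).rows)  # [[7, 10], [15, 22]]
```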

The claim (currently line 41) that scalar multiplication conforms with matrix multiplication and should be included in the new operator seems to me a false claim, mathematically.

  • what is being asserted as first principles for @ (e.g., what mathematical understanding)?
  • to what extent do other important implementations (e.g., Mathematica and Julia) deviate from the proposed mathematical understanding?
  • were the motivations for any deviations adequate (e.g., supported by strong use cases)?

I would hope the new operator @ would initially be clearly identified with a single mathematical operation that is well defined on ndarrays and has matrix multiplication as a special case. I would hope the operator would not be defined outside the mathematical usage. To give a point of comparison, consider Mathematica's Dot command, which is very well behaved. It does not do scalar multiplication.

Contributor

argriffing commented Feb 24, 2014

As instigator of that recent numpy.matrix mailing list fest which was a motivating factor for this PEP I thought I should comment.

I'm +/-0 for a new dedicated matrix multiplication operator, as my questions are about semantics not syntax. And not just for matrix multiplication. In particular, what do indexing, reduction along an axis, and dot product with a 1d array mean? I already use .dot and .multiply instead of * in code that works with objects that act roughly like arrays to avoid the matrix product vs. elementwise product ambiguity, but even avoiding * altogether is not enough.

The github thread that motivated yet another death-to-numpy.matrix mailing list spam was scipy/scipy#3297 and I do not see how adding more syntax will address the difficulties mentioned in scipy/scipy#3297 (comment).

Contributor

abalkin commented Feb 24, 2014

I agree with @alan-isaac and I would go even further and require ndim >= 2 for @ operands.

From help(dot):

    For N dimensions it is a sum product over the last axis of `a` and
    the second-to-last of `b`::

        dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

This definition does not generalize well to ndim < 2. In practice, dot behavior when mixing matrices with ndim = 1 vectors is a common source of confusion because ndim = 1 vectors can be arbitrarily treated as rows or columns.

Another advantage of limiting to ndim >= 2 is that Python does not have ndim >= 2 objects and users will not expect [1, 2] @ [2, 1] to work.
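For the unambiguous ndim = 2 case, the rule quoted above reduces to the familiar row-times-column definition; a pure-Python sketch (function name is mine):

```python
# dot(a, b)[i, j] = sum(a[i, :] * b[:, j]) for 2-D operands.
def matmul2d(a, b):
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

print(matmul2d([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```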

Contributor

abalkin commented Feb 24, 2014

@gdmcbain - you included old K (version 2) syntax in your table, but did not mention the $ operator from K (version 3) or the mmu operator from Q (another Kx language).

I think $ is the only contender to @ for the purposes of this PEP and I personally like @ more for the reasons already spelled out. Yet, for completeness, I think PEP should mention that $ has been used as a matrix multiplication operator.

Owner

pv commented Feb 24, 2014

@argriffing: a new matrix multiplication operator does not solve the always-2D behavior of np.matrix, but it will remove the reason for its existence and allow really deprecating it.

Owner

pv commented Feb 24, 2014

@abalkin: IMHO, A @ b == A_ij b_j is mathematically well-defined and non-arbitrary.

Owner

charris commented Feb 24, 2014

@pv I think it leads to associativity problems. What is A @ v @ C?

Contributor

abalkin commented Feb 24, 2014

@pv, it is arbitrary because the alternative is

   (A @ b)_i == sum_j A_ji b_j

The familiar row x column rule breaks because 1d arrays print as rows:

>>> arange(3)
array([0, 1, 2])

but what one would expect on the r.h.s. of a matrix @ vector product is a column vector like this:

>>> arange(3)[:,newaxis]
array([[0],
       [1],
       [2]])
Owner

pv commented Feb 24, 2014

Ah, true, it's not associative, and that would be pretty bad for a binary op (as matmul is usually thought to be associative...). So yeah, you're right that the 1D case is out.

This may bring a renaissance of squeezing...
OTOH, the cases that are not associative probably have no meaning as transcriptions of traditional linalg formulas.
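The non-associativity is easy to see in a pure-Python sketch of the 1-D dispatch rule under discussion (helper names are mine): a 1-D operand acts as a column on the right of @ and as a row on the left, with the extra dimension squeezed away.

```python
def mv(A, v):
    # matrix @ vector: v as a column, result squeezed to 1-D.
    return [sum(a_ik * v[k] for k, a_ik in enumerate(row)) for row in A]

def vm(v, B):
    # vector @ matrix: v as a row, result squeezed to 1-D.
    return [sum(v[k] * B[k][j] for k in range(len(v)))
            for j in range(len(B[0]))]

A = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]
v = [1, 1]

print(vm(mv(A, v), C))   # (A @ v) @ C  ->  [7, 3]
print(mv(A, vm(v, C)))   # A @ (v @ C)  ->  [3, 7]
```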

Owner

njsmith commented Feb 24, 2014

@alan-isaac: I think you must have been looking at an out-of-date version of the PEP, the current version makes passing scalars to @ always an error, and for pretty much the exact reason you're suggesting :-). Even in the first draft it didn't occur to me that @ should mean scalar multiplication, I was thinking vaguely that the rule for 1d arrays should be extended to 0d arrays, so scalars passed to @ would be treated as 1x1 matrices. But this is silly for a lot of reasons, not least that absolutely nobody else shared my intuition :-). So it's gone.

The point about associativity is a good one! I'm glad I wrote down these details. The answer to @charris's question is that according to what's written in the PEP right now, A @ v @ C would mean (A @ v) @ C because @ (like all python operators except ** and @@) is left-associative. So it'd be equivalent to the easier-to-read C.T @ A @ v.

Lack of associativity is disturbing, but even so, I think this is one of those rare cases where practicality beats purity. Allowing @ to work on 1d vectors does allow for some weird edge cases, but it also allows a lot of very, very common operations to "just work". Who wants to explain to newbies why, to solve the simplest linear system in the obvious way, they have to type (inv(A) @ b[:, np.newaxis]).ravel(), or do OLS by typing solve(X.T @ X, X.T @ y[:, np.newaxis]).ravel()? Who wants to type (a[np.newaxis, :] @ a[:, np.newaxis])[()] every time they compute an inner product (or (a[np.newaxis, :] @ V @ a[:, np.newaxis])[()] for general quadratic forms)? Who wants to go back to the matlab world of having to stop every few minutes and check whether some stupid variable is a row or a column vector, randomly sprinkling transposes around until your code works? There's a reason why dot and solve and everything are very careful to have this handling for 1d vectors now... (Also: - and / aren't associative and they do okay.)
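The linear-system case is the clearest of these; a small sketch contrasting the 1d handling with the ceremony an always-2d rule would require (values arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# With 1d handling (what np.dot/np.linalg.solve do today, and what the
# PEP proposes for @): a vector in, a vector out.
x = np.linalg.solve(A, b)

# What an always-2d rule would force on you instead: promote to a column,
# solve, then squeeze the extra axis back off.
x2 = np.linalg.solve(A, b[:, np.newaxis]).ravel()
```

Both give the same numbers; only the second makes you carry the row/column bookkeeping by hand.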

Owner

charris commented Feb 24, 2014

@pv Associativity should probably be part of the PEP. Hmm, squeezing is nasty, it would be nice to get a scalar out of v.T @ A @ v. So I think there needs to be, again, some kind of row/column vectors. The @ is effectively function composition. I don't see how this could be part of the spec, though.

Associativity is certainly an understandable mathematical principle.
Mathematica's Dot command violates associativity, as does numpy.dot.
I suspect this was dictated by convenience in use rather than by any
mathematical principle.
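For what it's worth, the scalar-out behavior charris asks for above already falls out of the 1d rule, with no row/column vectors needed; a minimal sketch using today's np.dot (matrix and vector arbitrary):

```python
import numpy as np

A = np.eye(2)              # any symmetric 2x2 would do here
v = np.array([3.0, 4.0])

# The quadratic form v.T A v: with np.dot's 1d handling the result is
# already a scalar, no squeezing or newaxis required.
q = np.dot(v, np.dot(A, v))
```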

Contributor

sotte commented Feb 24, 2014

Hey guys,

I'm not sure how relevant this is, but I wrote a function mdot [1] which takes multiple arrays as arguments, finds an optimal parenthesization [3], and multiplies the arrays accordingly. In many cases the multiplication is significantly faster and you don't have to determine the best order manually.

[1] #4311
[2] https://github.com/sotte/numpy_mdot
[3] http://en.wikipedia.org/wiki/Matrix_chain_multiplication

Contributor

argriffing commented Feb 24, 2014

@sotte if .dot is not associative then how do you decide which parenthesization is intended? Or does it guarantee the same result as, say, left-associativity when not all arrays have ndim 2?

Contributor

abalkin commented Feb 24, 2014

Associativity can be preserved as long as we consistently interpret 1d arrays as column-vectors (or row-vectors, but for consistency with LA textbooks and @pv's formula above, I'll use columns.)

@charris

What is A @ v @ C?

  (A @ v @ C)_ij = sum_kl A_ik v_k C_lj

since we treat v as a column vector, to multiply it by a matrix on the right, you need to broadcast it in the row direction to conform to C shape. Effectively v_k ~ v_k*1_l.

Unfortunately, this is not what numpy.dot does. Instead it treats its first 1d argument as a row and its second as a column. This gives numpy.dot commutativity, but sacrifices associativity.

Owner

pv commented Feb 24, 2014

I wouldn't characterize the current behavior of np.dot as "unfortunate" at all, as it is the definition you need if you want to deal only with 1D vectors. Casting to column vectors is somewhat arbitrary, and leads to more squeezing and np.newaxis.

Contributor

argriffing commented Feb 24, 2014

I'm curious what interface will be used by the next generation of array libraries from Enthought or Continuum with their millions of DARPA and Big Data Science Foundation grants? I assume they will not use numpy.matrix but I don't know exactly what notation they use for e.g. matrix multiplication.

Contributor

abalkin commented Feb 24, 2014

@alan-isaac - while associativity is not essential in function notation, lack of it would kill most of the advantages of having a dyadic operator.

If a @ b @ c can alternatively mean (a @ b) @ c or a @ (b @ c) and those are different quantities, people will defensively put parentheses everywhere and their code will not look
much cleaner than a.dot(b.dot(c)).

I would also like to mention that numpy itself may not be the main beneficiary of the @ operator, because in imperative programming you want to control the order of evaluation and may even want to occasionally write a.dot(b, out) instead of out = a.dot(b). It is the declarative systems like numexpr or Theano that can take full advantage of @.

@abalkin I do not think that is the right description of np.dot. I would say instead that (like Mathematica's Dot) it makes some decisions based on object type. So when the arguments are both 1d, it forms an ordinary scalar product, not (!) a 1x1 matrix. We get a lot of convenience from this kind of operator overloading, at the expense of some behavioral consistency (e.g., we lose associativity). It is not NumPy alone that has made this trade-off. I would propose that the PEP emphasize either matrix multiplication only, or at least only multiplication of tensors where the last dimension of the premultiplier matches the first dimension of the postmultiplier. The behavior in all other cases pretty clearly deserves an extended discussion and is likely to involve pragmatic trade-offs.

As a background observation: however hard it is to characterize np.dot, I do not recall many complaints about its behavior. The lack of associativity does suggest, however, that the mdot function of @sotte should insist on strict conformability. I'm guessing this should similarly be true of the new operator. That is, its behavior should not simply replicate that of dot. If anyone wants to mix 1d and 2d arrays, for example, they can always return to the use of dot. I am very concerned that (what I take to be) the suggestion of @charris that 1d objects have an orientation will lead to hard-to-debug code. If it were nevertheless implemented, I think that the default orientation should "remain" horizontal.

Contributor

abalkin commented Feb 24, 2014

I wouldn't characterize the current behavior of np.dot as "unfortunate" at all, as it is the definition you need if you want to deal only with 1D vectors.

It is fine to have a "scalar product" function that is defined on 1d vectors, but it is unfortunate that matrix multiplication got the same name. Scalar product is commutative and not associative, while matrix multiplication is associative and not commutative. Yes, the two operations are computationally similar, but they are quite different mathematically.

Note that in addition to dot, numpy has inner, outer, and tensordot that mostly differ in their treatment of 1d arrays while reducing to matrix (tensor) multiplication under proper reshaping.

Contributor

abalkin commented Feb 24, 2014

@alan-isaac - what is not "the right description of np.dot"? The only description I gave was a quote from help(np.dot) for ndim >= 2 case and an observation that that definition does not extend logically to ndim < 2. A mathematically sound extension should preserve the algebraic properties such as associativity and is possible in two different ways (1d means column and 1d means row) neither of which is followed by np.dot.

I have a question looking at this from the point of view of SymPy (I am the lead developer of SymPy). @jrovegno already pointed out that SymPy has its own Matrix type that uses * for matrix multiplication and ** for matrix power. If this PEP were accepted, what would be the recommended course of action for a library like SymPy (for the sake of the argument, pretend it's in the future and SymPy no longer supports versions of Python that do not contain this operator, and ignore any backwards incompatibility issues, though those are quite valid concerns)? Should we make * and ** work element-wise, even though those are quite rare from the SymPy point of view? Should we keep both * and @ as the same thing (making the operator basically useless)? Should we create some new NumPyMatrix object that acts like a numpy array in that * does element-wise multiply? Keep in mind that SymPy can also be used to generate code for things like numpy using functions like lambdify.

@abalkin You said of numpy.dot that "it treats its first 1d argument as a row and its second as a column". That is what I objected to. It treats them both as 1d, not 2d, and produces an ordinary scalar product (not a 1x1 array). I consider this good behavior. Going much further, I believe this will be good behavior for @ as well. And it does meet the criterion of a matching last and first dimension (since there is only one dimension). However it may not belong in the PEP, since the return is a different type of object (a scalar, not a 1d array).

Contributor

abalkin commented Feb 24, 2014

And it does meet the criterion of a matching last and first dimension ...

It may, but in doing so, it violates ndim >= 2 definition:

    For N dimensions it is a sum product over the last axis of `a` and
    the **second-to-last** of `b`::

        dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
Contributor

abalkin commented Feb 24, 2014

Ultimately, the current behavior of np.dot is irrelevant for the PEP. The PEP is proposing a new operation and is not constrained by backward compatibility.

It is clear how @ should be defined for 2d-arrays. It is the familiar row x column matrix multiplication and 2d-arrays are interpreted consistently with the way they are displayed: 0-axis indexing rows and 1-axis indexing columns.

The open question is what @ should do when one or both operands have ndim < 2. One possible answer is broadcasting. In broadcasting, missing dimensions are added on the left:

>>> array([1,2]) + zeros((2,3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (2,) (2,3)
>>> array([1,2]) + zeros((3,2))
array([[ 1.,  2.],
       [ 1.,  2.],
       [ 1.,  2.]])

This is consistent with 1d-arrays being treated as row-vectors:

>>> array([[1,2]]) + zeros((2,3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (1,2) (2,3)
>>> array([[1,2]]) + zeros((3,2))
array([[ 1.,  2.],
       [ 1.,  2.],
       [ 1.,  2.]])

Another way to interpret a @ b is as sum(a * b.T), where * is numpy's element-wise multiplication and sum is over the appropriate axis. Since for 1d-vectors b.T ~ b, we will have A @ b ~ b @ A.
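The sum-of-products reading above can be checked for the matrix @ vector case with a small sketch (shapes arbitrary; for a 2d left operand and 1d right operand the "appropriate axis" is the last one):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
b = np.array([1.0, 2.0, 3.0])

# matrix @ vector as "element-wise multiply, then sum over the shared axis":
# broadcasting multiplies b into each row of A, and summing over axis 1
# collapses the shared dimension.
mv = (A * b).sum(axis=1)
```

This reproduces np.dot(A, b) exactly for this case; extending the same recipe to the other ndim combinations is where the conventions discussed here start to diverge.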

Contributor

abalkin commented Feb 24, 2014

@alan-isaac - consider matrix-vector multiplication:

>>> A = ones((2,3))
>>> dot([1,2], A)
array([ 3.,  3.,  3.])
>>> dot(A, [1,2,3])
array([ 6.,  6.])

This is what I meant by "it treats its first 1d argument as a row and its second as a column".

The problem manifests itself when you try to add dimensions. The first product behaves intuitively:

>>> dot([[1,2]], [A])
array([[[ 3.,  3.,  3.]]])

but the second does not:

>>> dot([A], [[1,2,3]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: objects are not aligned
>>> dot(A[...,newaxis], [[1,2,3]])
array([[[ 1.,  2.,  3.],
        [ 1.,  2.,  3.],
        [ 1.,  2.,  3.]],

       [[ 1.,  2.,  3.],
        [ 1.,  2.,  3.],
        [ 1.,  2.,  3.]]])

@abalkin It seems we use the terms "row" and "column" differently. To me, this terminology implies that the 1d vectors are treated as if they were augmented to 2d. That is clearly not the case: we get dimension reduction in your new example too.
On two other points ...

  1. Is the behavior of dot, tensordot, etc. irrelevant? Far from it. These are established and useful behaviors. The very first thing that should be considered is whether @ will implement one of these behaviors. Starting from scratch is likely to prove tragic.
  2. I am not sure why dot and tensordot each choose a different dimension for multiplication. I'm probably overlooking something obvious.
Owner

njsmith commented Feb 24, 2014

Guys, why are we making this so complicated? Associativity is a nice property -- particularly for an infix operator that will often be used without parentheses. But for every expression anyone ever writes down as a mathematical formula, the proposed semantics are associative. The only time associativity is violated is for expressions that have matrices on both sides of a vector, M1 @ v @ M2, or more than two vectors multiplied together, v1 @ v2 @ v3. These situations basically never happen in practice (in fact the latter doesn't even make much sense), and when they do you need some extra notation to deal with them regardless of how @ is defined.

OTOH, forcing everything into being 2d all the time causes tons of hassle all over the place trying to keep track of row-ness versus column-ness for items where this is not a natural way to think about things.

There's a reason dot works the way it does. People use the 1d behaviour constantly.

tensordot and inner are interesting, but across scikit-learn and nipy, there are: ~780 uses of dot, 8 uses of inner, and no uses of tensordot. It's clear that dot covers the vast majority of what people actually want.

Owner

njsmith commented Feb 24, 2014

Or, put this another way... above I gave 4 obvious, real-life examples where the proposed "outside-1-rule" semantics (for 1d, add a 1 on the side away from the @) just DTRT, while AFAICT every other proposal requires something really nasty in at least 2 of these examples.

Can anyone give any examples at all that demonstrate the opposite? I.e. a place where one of these alternative proposals produces cleaner code than the one in the PEP?

Owner

njsmith commented Feb 24, 2014

@asmeurer: Yes, excellent question!

I guess my feeling is that in a perfect world, SymPy and numpy would have the same notation? And for numpy it's very clear that * and @ are both different and important, while SymPy doesn't really care that much about elementwise multiplication, so the globally optimal solution would be for SymPy to follow numpy's lead? If @ were a real first class operator like * then one might as well use @ for matrix product, it's not like * is easier to type or anything.

In practice, backwards compat would make this pretty tricky, of course. Do you have elementwise multiplication at all? If not then having both * and @ with the same meaning wouldn't be too horrible -- one for backwards compat, and one for consistency with numpy and other libraries.

I'm curious why this PEP cares so much about the implementation details of this operator, since that would be up to the libraries, not Python core? Are there plans to add a reference array implementation to the standard library?

Owner

pv commented Feb 24, 2014

I guess one motivation for discussing the details is to think forward aiming at better intercompatibility between different projects that might implement matrix-like objects. Adding a reference array to the stdlib hasn't been suggested so far (and I doubt it is appearing in anyone's dreams either).

Contributor

argriffing commented Feb 24, 2014

intercompatibility between different projects that might implement matrix-like objects.

Just as another data point, the pyoperators project seems to have a completely different array interface -- if I understand correctly they use functional composition of operators as their analogue of matrix multiplication, and * of operators means entrywise products of actions of the operators (not of the matrix representations of the operators themselves).

Edit: as noticed by @pv * is synonymous with functional composition which is like matrix multiplication of matrix representations, and the 'entrywise product of actions' is something completely different -- the MultiplicationOperator.

It just seems odd to add a different operator for matrix multiplication, instead of adding a new operator for elementwise multiplication. I understand that this has been tried in a PEP in the past and was rejected, so please fill me in if there are good reasons to not do this.

Owner

pv commented Feb 24, 2014

AFAIK, in pyoperators __call__ and __mul__ are the same operation.

@asmeurer asmeurer commented on the diff Feb 24, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+much larger than both projects put together (stdlib: 411575 SLOC,
+scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]
+
+The **dot** row (marked ``******``) counts how common matrix multiply
+operations are in each codebase.
+
+::
+
+ ==== ====== ============ ==== ========
+ op stdlib scikit-learn nipy combined
+ ==== ====== ============ ==== ========
+ = 2969 5536 4932 3376 / 10,000 SLOC
+ - 218 444 496 261
+ + 224 201 348 231
+ == 177 248 334 196
+ * 156 284 465 192
@asmeurer

asmeurer Feb 24, 2014

Does this count uses of np.matrix as well?

@pv

pv Feb 24, 2014

Owner

nipy and scikit-learn do not appear to use np.matrix at all.
(Except testing that it is cast to ndarray in some cornercase testcases.)

@argriffing

argriffing Feb 24, 2014

Contributor

Does this count uses of np.matrix as well?

who knows? it could also be scipy.sparse

@pv

pv Feb 24, 2014

Owner

sklearn seems to encapsulate sparse matrix products to a safe_sparse_dot function, so probably only one of the * counted is scipy.sparse matrix product.

@larsmans

larsmans Apr 9, 2014

Contributor

I don't think any of the sklearn uses of * is a matrix multiplication.

@cjw43

cjw43 Apr 10, 2014

On 09-Apr-2014 3:38 PM, Lars Buitinck wrote:
  In doc/neps/return-of-revenge-of-matmul-pep.rst:
  > +much larger than both projects put together (stdlib: 411575 SLOC,

+scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]
+
+The dot row (marked ******) counts how common matrix multiply
+operations are in each codebase.
+
+::
+

  • ==== ====== ============ ==== ========
  •  op  stdlib  scikit-learn  nipy  combined
    
  • ==== ====== ============ ==== ========
  •   =    2969          5536  4932      3376 / 10,000 SLOC
    
  •   -     218           444   496       261
    
  •   +     224           201   348       231
    
  •  ==     177           248   334       196
    
  •   \*     156           284   465       192
    
  I don't think any of the sklearn uses of * is a
    matrix multiplication.
  —
    Reply to this email directly or view
      it on GitHub.

Perhaps this is because much of the SciPy was developed before the
the Numarray (Space Telescope) people took over the maintenance.
It seems to me that  there is merit in recognizing the Matrix, and
perhaps the Vector, as separate data types (Classes).
Colin W.
Colin W.
@njsmith

njsmith Apr 10, 2014

Owner

Hi Colin,

I honestly have no idea what you're trying to insinuate about the
connection between the Space Telescope, SciPy, and scikit-learn (??), but I
don't think it's helping anything. Nor is simply posting over and over in
different forums that you like Matrix and Vector classes. We know that now
:-). But just saying that is not going to convince anyone to like them with
you.

Realistically, my best guess is that nothing you say is likely to convince
anyone of anything, given that np.matrix has come to its present state of
poor reputation through years of experience, and it isn't clear you even
understand why most numpy developers and users are convinced that the
approach is fundamentally flawed and unfixable. But if you really want to
try, then the place to do that is the numpy-discussion list, and the way to
do that is to understand and engage with specific the problems people have
raised.

Hope that helps,
-n

On Thu, Apr 10, 2014 at 1:54 PM, Colin J. Williams <notifications@github.com

wrote:

In doc/neps/return-of-revenge-of-matmul-pep.rst:

+much larger than both projects put together (stdlib: 411575 SLOC,
+scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]
+
+The dot row (marked ******) counts how common matrix multiply
+operations are in each codebase.
+
+::
+

  • ==== ====== ============ ==== ========
  •  op  stdlib  scikit-learn  nipy  combined
    
  • ==== ====== ============ ==== ========
  •   =    2969          5536  4932      3376 / 10,000 SLOC
    
  •   -     218           444   496       261
    
  •   +     224           201   348       231
    
  •  ==     177           248   334       196
    
  •   \*     156           284   465       192
    

On 09-Apr-2014 3:38 PM, Lars Buitinck wrote: In
doc/neps/return-of-revenge-of-matmul-pep.rst: +much larger than both
projects put together (stdlib: 411575 SLOC,
+scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details] + +The
dot row (marked ******) counts how common matrix multiply
+operations are in each codebase. + +:: + + ==== ====== ============ ====
======== + op stdlib scikit-learn nipy combined + ==== ====== ============
==== ======== + = 2969 5536 4932 3376 / 10,000 SLOC + - 218 444 496 261 + +
224 201 348 231 + == 177 248 334 196 + * 156 284 465 192
I don't think any of the sklearn uses of * is a matrix multiplication. —
Reply to this email directly or view it on GitHub. Perhaps this is because
much of the SciPy was developed before the the Numarray (Space Telescope)
people took over the maintenance. It seems to me that there is merit in
recognizing the Matrix, and perhaps the Vector, as separate data types
(Classes). Colin W. Colin W.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351/files#r11482704
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@robertwb

robertwb Apr 10, 2014

On Thu, Apr 10, 2014 at 12:58 PM, njsmith notifications@github.com wrote:

Hi Colin, I honestly have no idea what you're trying to insinuate about
the connection between the Space Telescope, SciPy, and scikit-learn (??),
but I don't think it's helping anything. Nor is simply posting over and
over in different forums that you like Matrix and Vector classes. We know
that now :-). But just saying that is not going to convince anyone to like
them with you. Realistically, my best guess is that nothing you say is
likely to convince anyone of anything, given that np.matrix has come to its
present state of poor reputation through years of experience, and it isn't
clear you even understand why most numpy developers and users are convinced
that the approach is fundamentally flawed and unfixable. But if you really
want to try, then the place to do that is the numpy-discussion list, and
the way to do that is to understand and engage with specific the problems
people have raised.

This keeps coming up (and not by just one or two isolated people) because
the PEP doesn't convincingly show that a better Matrix class than np.matrix
isn't a viable alternative.

@njsmith

njsmith Apr 10, 2014

Owner

I know it can be frustrating to feel like you're being ignored, and I'm
trying to be constructive, but if you're going to reply to my post saying
"please engage with the specific points instead of just repeating
assertions over and over" by repeating vague assertions and ignoring
specific points then I've kind of run out of responses.

A PEP which has, in fact, convinced the BDFL and (apparently) the rest of
the Python core team is, by definition, convincing. Maybe we were all
wrong to be convinced, it's always possible. But asserting that it isn't
convincing and saying nothing else just makes it sound like you think we're
all idiots.

(To be clear, there's nothing wrong with thinking we're all idiots. That
could be true to. But as a practical matter it's not going to produce any
of the changes in behaviour that you're hoping for.)

On Thu, Apr 10, 2014 at 10:05 PM, Robert Bradshaw
notifications@github.comwrote:

In doc/neps/return-of-revenge-of-matmul-pep.rst:

+much larger than both projects put together (stdlib: 411575 SLOC,
+scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]
+
+The dot row (marked ******) counts how common matrix multiply
+operations are in each codebase.
+
+::
+

  • ==== ====== ============ ==== ========
  •  op  stdlib  scikit-learn  nipy  combined
    
  • ==== ====== ============ ==== ========
  •   =    2969          5536  4932      3376 / 10,000 SLOC
    
  •   -     218           444   496       261
    
  •   +     224           201   348       231
    
  •  ==     177           248   334       196
    
  •   \*     156           284   465       192
    

On Thu, Apr 10, 2014 at 12:58 PM, njsmith notifications@github.com
wrote: Hi Colin, I honestly have no idea what you're trying to insinuate
about the connection between the Space Telescope, SciPy, and scikit-learn
(??), but I don't think it's helping anything. Nor is simply posting over
and over in different forums that you like Matrix and Vector classes. We
know that now :-). But just saying that is not going to convince anyone to
like them with you. Realistically, my best guess is that nothing you say is
likely to convince anyone of anything, given that np.matrix has come to its
present state of poor reputation through years of experience, and it isn't
clear you even understand why most numpy developers and users are convinced
that the approach is fundamentally flawed and unfixable. But if you really
want to try, then the place to do that is the numpy-discussion list, and
the way to do that is to understand and engage with specific the problems
people have raised.
This keeps coming up (and not by just one or two isolated people) because
the PEP doesn't convincingly show that a better Matrix class than np.matrix
isn't a viable alternative.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351/files#r11506487
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@rlamy

rlamy Apr 10, 2014

Contributor

Le 10/04/14 22:15, njsmith a écrit :

I know it can be frustrating to feel like you're being ignored, and I'm
trying to be constructive, but if you're going to reply to my post
saying "please engage with the specific points instead of just repeating
assertions over and over" by repeating vague assertions and ignoring
specific points then I've kind of run out of responses. A PEP which has,
in fact, convinced the BDFL and (apparently) the rest of the Python core
team is, by definition, convincing. Maybe we were all wrong to be
convinced, it's always possible. But asserting that it isn't convincing
and saying nothing else just makes it sound like you think we're all
idiots. (To be clear, there's nothing wrong with thinking we're all
idiots. That could be true to. But as a practical matter it's not going
to produce any of the changes in behaviour that you're hoping for.)

That is, to put it mildly, a fallacy. It's quite obvious that Guido
trusted the PEP authors to produce a fair assessment of domain-specific
technical issues and did not scrutinise each claim.

@stefanv

stefanv Apr 10, 2014

Contributor

@robertwb Adding another class would mean that all packages that depend on NumPy would have to build in support for Matrices. Or, we could just subclass from an ndarray, but that gives us the mess we're already in.

@cjw43

cjw43 Apr 11, 2014

On 10-Apr-2014 3:58 PM, njsmith wrote:

  In doc/neps/return-of-revenge-of-matmul-pep.rst:
  > +much larger than both projects put together (stdlib: 411575 SLOC,

+scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]
+
+The dot row (marked ******) counts how common matrix multiply
+operations are in each codebase.
+
+::
+

  • ==== ====== ============ ==== ========
  •  op  stdlib  scikit-learn  nipy  combined
    
  • ==== ====== ============ ==== ========
  •   =    2969          5536  4932      3376 / 10,000 SLOC
    
  •   -     218           444   496       261
    
  •   +     224           201   348       231
    
  •  ==     177           248   334       196
    
  •   \*     156           284   465       192
    
  Hi Colin,
    I honestly have no idea what you're trying to insinuate about
    the
    connection between the Space Telescope, SciPy, and scikit-learn
    (??), but I
    don't think it's helping anything. Nor is simply posting over
    and over in
    different forums that you like Matrix and Vector classes. We
    know that now
    :-). But just saying that is not going to convince anyone to
    like them with
    you.
    Realistically, my best guess is that nothing you say is likely
    to convince
    anyone of anything, given that np.matrix has come to its present
    state of
    poor reputation through years of experience, and it isn't clear
    you even
    understand why most numpy developers and users are convinced
    that the
    approach is fundamentally flawed and unfixable. But if you
    really want to
    try, then the place to do that is the numpy-discussion list, and
    the way to
    do that is to understand and engage with specific the problems
    people have
    raised.
    Hope that helps,
    -n

Nathaniel,
I'm not trying to insinuate anything.  You don't quote the words
that you are troubled by.
What I intended to convey was that the scikit material was likely
developed prior to the Numarray period in the evolution of Numpy.  I
believe that the Matrix class was introduce during the Numarray
time.  In about 2007, I think, Travis Olliphant took over
development, under the original name - Numpy and continued the
Matrix class.
I was pleased to see that the Matrix class was introduced and
continued, as it provides for provides for a different type of data.
Unfortunately, you provide no examples of how the class failed to
meet your needs.  I hope to be be able to put together a critical
analysis of PEP 465.
Best wishes,
Colin W.
Colin W.

    On Thu, Apr 10, 2014 at 1:54 PM,
      Colin J. Williams <notifications@github.com > wrote: In
      doc/neps/return-of-revenge-of-matmul-pep.rst: > +much
      larger than both projects put together (stdlib: 411575 SLOC,
      > +scikit-learn: 50924 SLOC, nipy: 37078 SLOC).
      [#sloc-details] > + > +The **dot** row (marked
      ``******``) counts how common matrix multiply > +operations
      are in each codebase. > + > +:: > + > + ====
      ====== ============ ==== ======== > + op stdlib
      scikit-learn nipy combined > + ==== ====== ============
      ==== ======== > + = 2969 5536 4932 3376 / 10,000 SLOC >
      + - 218 444 496 261 > + + 224 201 348 231 > + == 177 248
      334 196 > + * 156 284 465 192 On 09-Apr-2014 3:38 PM, Lars
      Buitinck wrote: In
      doc/neps/return-of-revenge-of-matmul-pep.rst: +much larger
      than both projects put together (stdlib: 411575 SLOC,
      +scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]
      + +The **dot** row (marked ``******``) counts how common
      matrix multiply +operations are in each codebase. + +:: + +
      ==== ====== ============ ==== ======== + op stdlib
      scikit-learn nipy combined + ==== ====== ============ ====
      ======== + = 2969 5536 4932 3376 / 10,000 SLOC + - 218 444 496
      261 + + 224 201 348 231 + == 177 248 334 196 + * 156 284 465
      192 I don't think any of the sklearn uses of * is a matrix
      multiplication. — Reply to this email directly or view it on
      GitHub. Perhaps this is because much of the SciPy was
      developed before the the Numarray (Space Telescope) people
      took over the maintenance. It seems to me that there is merit
      in recognizing the Matrix, and perhaps the Vector, as separate
      data types (Classes). Colin W. Colin W. — Reply to this email
      directly or view it on GitHub<https://github.com/numpy/numpy/pull/4351/files#r11482704>

      .

    -- Nathaniel J. Smith
      Postdoctoral researcher - Informatics - University of
      Edinburgh
      http://vorpus.org


  —
    Reply to this email directly or view
      it on GitHub.
@rkern

rkern Apr 11, 2014

Member

The Matrix class was in Numeric and predates numpy, numarray, scikit-learn, and nipy. This is why your message was so confusing.

@rkern

rkern Apr 11, 2014

Member

@robertwb The PEP does explain the problem with a 2-type-1-operator system, any 2-type-1-operator system. It is not limited to the particular implementation choices of numpy.matrix.

@cjw43

cjw43 Apr 11, 2014

Nathaniel,

I have explained to you that I am preparing a critique of PEP 465, which, in my view, is flawed. It provides no balanced view of the status quo. You provide an illustration of some calculations using @, but have not shown how objects of the Matrix class fail to handle the similar calculations.

I suggest that balance is important, as you need to convince those outside the NumPy community that the addition of two new operators is justified for all Python users, most of whom are not users of linear algebra. It adds to the Python learning curve with no benefit to most, and very limited benefit to the users of linear algebra.

Best wishes,
Colin W.
@rkern

rkern Apr 11, 2014

Member

@cjw43 Can you please use the Github web interface for commenting? If you must use email, please make sure there are no signatures in your response or quoted content except for the smallest part that you are responding to. Your comments are unreadable, both in email and on Github.

@cjw43

cjw43 Apr 11, 2014

Robert,

Thanks for this clarification. You have been very much closer to these things than I have over the years. My conjecture, as to the reason that SciKits was said not to mention Matrix, was clearly wrong.

Colin W.
@argriffing

argriffing Apr 11, 2014

Contributor

I'm kind of reluctant to join this thread, but regarding scikit-learn and np.matrix I thought I'd just add some relevant links. Here's the scikit-learn history page http://scikit-learn.org/stable/about.html, here's a 2011 scikit-learn thread about what to do about np.matrix http://osdir.com/ml/python-scikit-learn/2011-10/msg00212.html and here's a related sklearn github issue scikit-learn/scikit-learn#392. I haven't read these links in detail and I'm not trying to use them to make any argument regarding the PEP.

@cjw43

cjw43 Apr 11, 2014


Thanks, this is helpful. The issue is: many modules currently fail when handed an np.matrix. We can resolve this by either:

CONVERTING to np.ndarray

  • matrices are obviously array-like
  • matrices may come about inadvertently, because some NumPy/SciPy routines/methods happen to return them
  • we loosely use the term "matrix" throughout the docs, so it would be strange to require a matrix not to be an np.matrix
  • we are (or should be) doing input validation anyway, with utils.{safe_asanyarray, as_float_array, atleast2d_or_csr}
  • we already need to handle the matrix API in all modules that accept sparse input
  • backwards compatibility

or REJECTING np.matrix by throwing a TypeError

  • mathematical operators may have different meanings than on array (* means dot product on matrix, Hadamard product on arrays)
  • matrices are always 2-d, so ravel, flatten and reshape don't behave as expected
  • converting to array is easy enough for the user: just call .A on every matrix
  • explicit is better than implicit
  • we'd need to test every routine against matrices (but then, we should be testing input validation anyway)

Three votes were in favour of Rejecting and six were in favour of Converting. No consideration was given to addressing various problems raised in the discussion, including improved testing routines.

One of the examples above: matrices may come about inadvertently, because some NumPy/SciPy routines/methods happen to return them. The documentation should make it clear what type(s) each function is expected to return. It is then up to the calling process to deal with the returned object appropriately.

Colin W.
@charris

charris Apr 11, 2014

Owner


But Colin, we are trying to make writing code easier, not more difficult ;)

Chuck

Owner

pv commented Feb 24, 2014

I believe the PEP just assumes that * is elementwise multiplication, as it is in NumPy.
This is in practice a more common operation than matrix products, so the choice is defensible.

@njsmith I agree that one possibility is that @ just means dot. Plus side: simplest. Downside: carries with it some odd (if useful) semantics. The downside I think is the source of any controversy about this. My core point is not about which behavior is desirable, but rather that the PEP should focus on the non-controversial part: A@B would mean matrix multiplication when A and B are 2d. That I believe is the core proposal. I do not think the PEP needs to address what happens outside of this case.
Debate over any additional behavior(s) could take place in the future, if we are lucky enough to get the addition of @ approved. The consensus might well be that this is just another way to write dot, or it might go elsewhere.

Owner

charris commented Feb 24, 2014

@alan-isaac You mean this?

Contributor

argriffing commented Feb 24, 2014

I was more afraid of this

Owner

njsmith commented Feb 24, 2014

So my reasoning for putting this section in, in the first place, is this:

a) If we were proposing adding an implementation of @ to the stdlib, then
we would absolutely, 100%, no way to even consider getting around it,
have to nail down each and every one of these details before anyone would
consider accepting the PEP, or even debating it properly.

b) If I were a Python maintainer, I would be pretty nervous from the start
being asked to sign off on a new operator (the first one since 2.2!)
whose behaviour was completely out of my hands (!!!). Asking me to do that
with less due diligence than usual would really scare me. What if in
the next version someone does want to add matrix product support to
memoryview? Whatever numpy does with @ will strongly constrain how that
could work, and you can only add @ once. Better to reject it now unless we
can be sure it's really right.

c) ...Especially when this proposal has the appearance of coming out of the
community around just one package. A maintainer's sacred duty is to take
the global view and balance the interests of different user communities. So
a good maintainer never trusts the person who proposed something to have a
fair view of its tradeoffs; this has to be re-evaluated with other users in
mind.

So this section serves four purposes: it reassures external readers that
we've thought all the details through, it gives them a chance to "check our
work" and raise any concerns they have, it frames the proposal as not "give
numpy an operator" but rather "add an operator with these semantics (oh hey
and numpy will use it)" which is much easier to evaluate, and it gives us a
flag for multiple projects to rally around which demonstrates that there's
a broad base of support.

So if we can nail down the details I really think it is valuable. We're
going to have to do this sooner or later. Even if @ gets rejected we're
still going to have the exact same argument about 'dot'. AFAICT the only
difference is that associativity is a little more important for an infix
operator.

And, I really think that the behaviour proposed in the PEP is the unique,
obvious, correct way to implement @. (It differs from current numpy.dot for
0d and >2d arrays, so saying that it's "just dot" is a little weird. But I
think those are bugs in dot, not in the proposed @. And AFAICT no-one
is arguing here with the 0d and >2d behaviour of the proposed @, all the
debate is about the 1d behaviour, even though AFAICT this is a part of
'dot', and 'solve', scipy.sparse's __mul__, etc., that is ancient,
battle-tested, and which no-one has ever objected to.)

Therefore, since this behaviour is the right behavior ;-), I think that we
ought to be able to converge on consensus that it's the right behaviour
pretty quickly. And that will make the PEP much stronger for the reasons
stated above.

(Though, the argument will also make it stronger, because this debate is
totally getting coverage in the rationale section ;-).)

-n


Contributor

argriffing commented Feb 24, 2014

It differs from current numpy.dot for 0d and >2d arrays

My mental model of @ in this PEP is like a.dot(b), so maybe add a section clarifying the distinction explicitly? I understand that the PEP probably already has enough information so that if I read it carefully then I could determine this for myself.

Member

jaimefrio commented Feb 24, 2014

Not sure if it is worth mentioning it explicitly, but one of the strengths of the proposed broadcasting behavior is that, if anyone misses the old dot, once @ is in place you can simply do:

def olddot(a, b):
    return a[(slice(None),) * (a.ndim - 2) + (None,) * (b.ndim - 2)] @ b

and get the outer-product-like behavior back. I don't think there is any comparably efficient way to get the current dot to produce the output described for @.
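A quick shape check of this claim (illustrative only, run with a numpy recent enough that @ dispatches to matrix multiplication; the data values are placeholders):

```python
import numpy as np

def olddot(a, b):
    # Insert length-1 axes so the batch dimensions of a and b broadcast
    # against each other (an outer loop over all pairs) instead of
    # being matched up pairwise.
    return a[(slice(None),) * (a.ndim - 2) + (None,) * (b.ndim - 2)] @ b

a = np.ones((10, 2, 3))
b = np.ones((10, 3, 4))

print((a @ b).shape)       # proposed broadcasting @: (10, 2, 4)
print(olddot(a, b).shape)  # outer-product style:     (10, 10, 2, 4)
```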

Owner

njsmith commented Feb 24, 2014

I don't know if this belongs in the PEP (which is aimed at more of a
general audience who presumably doesn't care about bugs in numpy-specific
functions) but the differences from current np.dot are:

  • Currently, if np.dot sees a scalar, then it does scalar multiplication.
Why? Who knows. Probably to make implementing matrix.__mul__ simpler. I
    can't imagine people are using np.dot to do scalar multiplication when they
    could be using * instead.
  • in the example with two arrays with shapes (10, 2, 3) and (10, 3, 4),
    then as the PEP says, @ will return an array with shape (10, 2, 4). But
    currently np.dot will return an array with shape (10, 10, 2, 4). Basically
    what we have here is two lists, each containing 10 matrices, right? The
    proposed @ will walk down these two lists in parallel, taking the product
    between corresponding pairs of matrices. The current np.dot does a
    nested loop over the two lists, taking the product between all possible
    pairs of matrices (so it does 100 matrix products, not 10). How np.dot
    currently works is inconsistent with how vectorized operations work in
    literally every other place in numpy, including all the matrix operations
    in np.linalg.

And as Jaime notes, using broadcasting gets us a lot of power -- if you
really want the nested loop version, all you have to do is reshape your
input arrays to (10, 1, 2, 3) and (1, 10, 3, 4), and bam! Now @ will give
you the (10, 10, 2, 4) array with 100 different matrix products in it.
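That reshape trick can be sketched concretely (shapes only, with ones as placeholder data, on a numpy where @ is available):

```python
import numpy as np

a = np.ones((10, 2, 3))
b = np.ones((10, 3, 4))

# Pairwise products: walk the two stacks of matrices in parallel.
print((a @ b).shape)  # (10, 2, 4)

# Nested-loop products: line the stacks up on different axes so that
# broadcasting produces all 100 combinations.
print((a.reshape(10, 1, 2, 3) @ b.reshape(1, 10, 3, 4)).shape)  # (10, 10, 2, 4)
```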


Contributor

argriffing commented Feb 24, 2014

@njsmith Thanks, that is helpful. I guess it would be like a dot2d gufunc? I see why this corner of the discussion may not be of interest to general Python development.

Member

jaimefrio commented Feb 24, 2014

@argriffing There is an undocumented matrix_multiply gufunc hidden in numpy.core.umath_tests that does just that:

>>> from numpy.core.umath_tests import matrix_multiply
>>> print(matrix_multiply.__doc__)
matrix_multiply(x1, x2[, out])

matrix multiplication on last two dimensions 
     "(m,n),(n,p)->(m,p)" 

I think there is a "should" missing at the end of this line

Owner

njsmith replied Feb 25, 2014

Thanks, fixed.

Contributor

abalkin commented Feb 25, 2014

@alan-isaac, @njsmith:

Let me try to state what we agree on:

  1. The ndim of the result of the @ operation is

     Nd @ Md -> (N+M-2)d

so, for matrices and vectors, we have the familiar 2d @ 2d -> 2d, 2d @ 1d -> 1d, 1d @ 2d -> 1d, and 1d @ 1d -> 0d. This also explains why @ does not make much sense with 0d operands.

  2. The shape resulting from the @ operation is

     (n1, ..., k, ..., nN) @ (m1, ..., k, ..., mM) -> (n1, ..., nN, m1, ..., mM)

In other words, conforming operands should have one axis of matching length, and that axis is removed from each of the shapes when forming the shape of the result. I deliberately do not specify which axis is dropped from which operand, so that we stay in complete agreement so far.

  3. There is no uncertainty in the 2d case: the axes to be removed are axis=1 from the first and axis=0 from the second operand:

     (n,k) @ (k,m) -> (n,m)

  4. I don't have a problem with the following 1d @ 2d shape rule:

     (k,) @ (k,m) -> (m,)

These all fit into the "remove the last axis from the first and the second-to-last axis from the second operand" rule. So far we are also in agreement with the numpy.dot behavior.

The case that does not fit, and causes associativity to be violated, is

     (m,k) dot (k,) -> (m,)

There is no "second to last" axis in a 1d array; numpy.dot makes an exception to the general Nd rule to make this work, and violates associativity as a result.

I argue that the new @ operator should not sacrifice associativity for compatibility with numpy.dot, and that 2d @ 1d should be an error.
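The associativity violation in question can be checked concretely with numpy's existing dot (small illustrative values chosen here; the two groupings produce different answers because the vector is consumed as a column in one and as a row in the other):

```python
import numpy as np

M1 = np.array([[1., 2.], [3., 4.]])
M2 = np.array([[0., 1.], [1., 0.]])
v = np.array([1., 2.])

# (M1 dot v) dot M2: v enters as a column, and the result is then
# fed to M2 as a row.
left = np.dot(np.dot(M1, v), M2)

# M1 dot (v dot M2): v enters as a row, and the result is then
# fed to M1 as a column.
right = np.dot(M1, np.dot(v, M2))

print(left)   # [11.  5.]
print(right)  # [ 4. 10.]  -- the two groupings disagree
```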

Contributor

abalkin commented Feb 25, 2014

@njsmith - I missed your note about ndim > 2 case. I actually like your proposal to effectively treat 2d case as "elementary" and broadcast @ over higher dimensions as any element-wise operation.

This is a significant departure from numpy.dot, but clearly a useful one. I almost wish that @@ were included, so that we could get a parallel inverse spelled as [A, B, C] @@ -1.

Owner

njsmith commented Feb 25, 2014

@abalkin: Thanks, that's a great summary for making clear the disagreement.

My counter-argument is that your rule gains associativity, but at the cost
of breaking possibly the single most common use of np.dot, i.e.,
transforming a vector by left-multiplication by a matrix. The first
equation everyone learns in linear algebra, Ax = b, is one that uses the
(m,k) dot (k,) -> (m,) rule. So we have to pick which is more important to
us: associativity, or matrix dot vector.

My argument is that in practical terms, associativity (specifically in the
cases where this rule breaks associativity) provides nowhere near the
practical value that matrix dot vector provides. My reason for thinking
this is that I've written lots and lots of code where matrix dot vector was
useful, and don't think I've ever written any code where this particular
non-associativity would even arise, never mind cause a problem. And it's
not an especially pernicious kind of problem -- every expression you can
write still does have a perfectly well-defined and predictable
interpretation, nothing will explode if someone writes M1 @ v @ M2. It's
just a bit confusing.

What about associativity makes you think that we should consider it more
important than matrix dot vector? Do you have any concrete examples or
anything like that?


Owner

njsmith commented Feb 25, 2014

The current spec for @@ doesn't allow -1 as a power -- maybe it should, but
I'm not sure, given how very rare it is for inversion to be actually the
right thing to do, and how verbose @@ -1 ends up being.

We already have vectorized inverse though: inv([A, B, C]) should work. (Or
maybe you have to use dstack or something, I'm not sure, but anyway once
you construct the matrix then inv() should process it.)


Contributor

abalkin commented Feb 25, 2014

What about associativity makes you think that we should consider it more
important than matrix dot vector?

It is not associativity as much as the desire to have consistent rules when going from higher to lower dimensionality.

I find it useful to have low-ndim tests for functions that are designed to operate on high-ndim arrays in real scenarios. It is annoying to always add dummy size 1 axes simply because scalars don't behave like arbitrary shape arrays filled with the same value or 1d arrays don't behave consistently as 1xN matrices.

I now realize that the current proposal for @ as a matrix product broadcast to dimensions higher than 2 is much better than numpy.dot, but it is still problematic to have, say, 2d @ 1d, because A @ [u, v] = [A @ u, A @ v] holds when u and v are at least 2d, but when they are 1d the same expression means something completely different.

Contributor

abalkin commented Feb 25, 2014

We already have vectorized inverse though: inv([A, B, C]) should work.

I'll try it now! What about vectorized linalg.solve?

Contributor

abalkin commented Feb 25, 2014

@njsmith : Nevermind - I should reread numpy docs more than once every ten years. :-)

Owner

njsmith commented Feb 25, 2014

I don't see how we can possibly have consistent rules when going from lower
to higher dimensionality, and I don't think that should bother us too much.

The basic idea of broadcasting is that you have some "core" operation which
takes a certain input. And then you have a loop that applies this core
operation to many instances of the same input, with some clever rules for
how to figure out which inputs should be used on each loop. For traditional
cases like "+" this core operation acts on scalars, and the broadcast loop
goes over the whole array; for matrix operations like @ or solve it acts on
2d matrices, and the broadcast loop just goes over the axes that are not
part of the "core". So there's nothing strange about saying that the rules
for how you treat axes "inside" the core is different from how you treat
the ones "outside" -- the whole system is based around this distinction,
they'll end up getting acted on by different code, etc.

And even in your proposal, we do treat these dimensions differently.
Consider (k,) @ (m, k, n). If we were treating dimensions consistently,
what we'd do is broadcast (k,) to match (m, k, n), which would mean first
extending the shorter shape by prepending ones: (k,) -> (1, 1, k), and then
lining up the two axes, replacing all the 1's by the value in the other
array, and then checking that the shapes match. So (1, 1, k) -> (m, k, k),
and then we error out because (m, k, k) != (m, k, n). And even if you skip
the shape check, you'll still get the wrong result, because (k,) @ (k, n)
is totally different from (k, k) @ (k, n). Here the (k, k) matrix is
created by doing np.row_stack([original_vector] * k).

Obviously this is all very silly, though, because the entire point of the
broadcasting rules is to let us line up two arrays to do a parallel loop
over them. And we don't want to do a parallel loop over the "core" dot
axes, we want to do a dot!

The proper way to think about this (IMO) is what it says in the PEP: first
as one operation promote the (k,) to 2d, (k,) -> (1, k), and then as a
second operation use broadcasting on the non-core axes. So (1, k) -> (1,
1, k) -> (m, 1, k). Or a better way to think about it is, the non-core axes
are () and (m,), respectively, so we broadcast these against each other to
get (m,), (m,), and then we do a loop over these m entries and at each
location we apply the dot product to the (1, k) and (k, n) matrices that we
find there. There's really no similarity between the "prepend a 1" rule
used in broadcasting and the "prepend a 1" rule used in promoting 1d -> 2d
for purposes of dot -- they look sort of similar, but this is just a red
herring.

Also I think it's a nice bit of elegance that once we write down the rule
to make vec @ Mat and Mat @ vec work, then it automatically also gives us a
very useful vec @ vec.

And BTW, something else you might want to worry about -- if we have to
choose between treating vectors as rows and treating them as columns, then
the overwhelming weight of mathematical convention and practice is on the
vectors-as-columns side..


Contributor

abalkin commented Feb 25, 2014

@njsmith - please bear with me while I am switching my mental model from dot's Nd @ Md -> (N+M-2)d to the PEP's Nd @ Md -> (max(M,N))d ndim rule.

I think what you propose can be explained by starting from vec @ vec -> scalar and vectorizing to higher dimensions. The only difference between vectorizing @ and vectorizing traditional operators element-wise is that left and right operands are vectorized over different axes.

When we replace a vector on the l.h.s. with a list of vectors, [v1, .., vN] @ v = [v1 @ v, .., vN @ v] rule is equivalent to mat @ vec if mat is treated as a list of rows, but to get vec @ mat from v @ [v1, .., vN] = [v @ v1, .., v @ vN] we treat the matrix as a list of columns.

In general,

     a @ b = [a[0], .., a[N-1]] @ b = a @ [b[...,0], .., b[...,M-1]]

in other words, instead of traditional vectorize over the first axis rule we have the vectorize over the axis away from @ rule.

It looks like this works.
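The "vectorize over the axis away from @" rule can be checked numerically (a sketch using modern numpy, where @ stands in for np.dot):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
M = rng.standard_normal((3, 5))
v = rng.standard_normal(3)

# mat @ vec: the matrix behaves as a list of rows
rows = np.array([row @ v for row in A])
assert np.allclose(A @ v, rows)

# vec @ mat: the matrix behaves as a list of columns
cols = np.array([v @ M[:, j] for j in range(M.shape[1])])
assert np.allclose(v @ M, cols)
```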

I will try to keep this short. I came to Python from Matlab and always found the difference between the " * " operator on array vs matrix to be bothersome, especially since matrices dominated my Matlab experience and arrays dominate my new numpy life. That was a LONG time ago, but before that I spent just as much time in APL, where the operation is " +.* ".

History colors all our judgements, but as this topic is re-raised, I believe it looks most "pythonic" to go with a " +.* " operator. A Python developer could reasonably be expected to assume that this is a special "mult" property on an "operator.add" operator (yeah, I know that's not the operator, but you get the idea). The dot there is an attribute lookup in Python, and why shouldn't a lookup on an operator return another operator?

I say, "Back to the Future" (part 2) " +.* "

Contributor

Nodd commented Feb 25, 2014

I think that the examples given by @njsmith in #4351 (comment) are the best arguments for being able to have a vector on either side of @.
Also note that in mathematical formulas, vectors are vertical, so the transposition is made on the left vector. It would feel very awkward to be able to write b @ A (where b is usually transposed in formulas) but to have to write (A @ b[:, np.newaxis]).ravel() where the usual formula is simply Ab.

I have another remark: why limit the new @ operator to matrix multiplication only? It could have uses in other libraries but with a totally different meaning. If it is added to Python, some people will definitely use it for lots of different things (networking comes to mind, but developer imagination has no limit). In that case, why not associate @ with the function __at__ rather than __matmul__, which is very specific? @ would still be used for matrix multiplication in the scientific libraries, but it could have another meaning somewhere else. A similar example is the % operator, which is both the modulo function and used for string formatting. Maybe promoting a more general operator, but with matrix multiplication in mind, would ease its adoption in Python?

Owner

charris commented Feb 25, 2014

I've been thinking about row/column vectors a bit and one possible solution I see is to add a flag to 1-D vectors. Then a = array([1,2,3]) has the flag set to 0, and a.T sets the flag to 1 - a_value.

The current spec for @@ doesn't allow -1 as a power -- maybe it should, but
I'm not sure, given how very rare it is for inversion to be actually the
right thing to do, and how verbose @@ -1 ends up being.

It's thinking like this that will keep this PEP from being accepted. You are thinking only about Numpy, but this PEP will affect all of Python---the whole ecosystem. Consider SymPy again. If you are doing things symbolically, then you do want to represent inverses. You write things down as you know them mathematically. You then take the symbolic representation and figure out how to actually compute it later. Blaze has a similar goal. Look up Matthew Rocklin's talk from SciPy 2013 (the talk, not the lightning talk, although the lightning talk is quite entertaining), and read his blog posts about computation.

Why won't @@ -1 work? I thought @@ had the exact same precedence rules as **.
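For what it's worth, numpy already spells this case as a function: np.linalg.matrix_power accepts a negative exponent and interprets it as a power of the inverse, so @@ -1 would at least have a well-defined meaning:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])   # invertible, det = 6

# A negative exponent means: invert, then raise to the absolute power
Ainv = np.linalg.matrix_power(A, -1)
assert np.allclose(A @ Ainv, np.eye(2))
assert np.allclose(np.linalg.matrix_power(A, -2), Ainv @ Ainv)
```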

Contributor

abalkin commented Mar 14, 2014

Nothing you posted contradicts the statement of mine that you quoted.

I am not trying to contradict what you are saying. Not defining whether 1d is a row or a column is a perfectly valid choice as well. It just does not necessarily help when you need to explain why np.dot([1,2], [[1,2,3], [4,5,6]]) works, but np.dot([[1,2,3], [4,5,6]], [1,2]) does not.
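The asymmetry in question, checked directly (shapes (2,) . (2, 3) versus (2, 3) . (2,)):

```python
import numpy as np

# (2,) . (2, 3): the 1d operand lines up with the first axis -> works
ok = np.dot([1, 2], [[1, 2, 3], [4, 5, 6]])
assert ok.shape == (3,)

# (2, 3) . (2,): dot contracts the *last* axis of the first argument
# (length 3) against the 1d operand (length 2) -> shape mismatch
raised = False
try:
    np.dot([[1, 2, 3], [4, 5, 6]], [1, 2])
except ValueError:
    raised = True
assert raised
```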

Contributor

abalkin commented Mar 14, 2014

@njsmith - I think I see the root of our misunderstanding each other. I use the term "broadcasting" in the original Numeric sense:

"""
In general, such expressions [a + b] raise an exception if the arrays a and b do not have the same shape. There is, however, one important exception: If an axis of an array has length one, this array will be compatible with any array that matches in the other axes. For example, an array with shape (3,1,2) can be combined with an array of shape (3,5,2) or (3,100,2). ...

This extension process goes even one step further. If the two arrays do not have the same number of dimensions, the lower-rank array is “upgraded” to the rank of the other one by adding axes of length one in front of its shape. For example, when an array of shape (3,2) is combined with an array of shape (5,3,2), it is considered to be of shape (1,3,2) and then repeated to have shape (5,3,2).
"""

P. F. Dubois, K. Hinsen, and J. Hugunin, "Numerical Python", Computers
in Physics, v. 10, #3, May/June 1996.

(The hyperlink is not authoritative - just what Google was able to find. The quote is from page 9.)

Contributor

abalkin commented Mar 14, 2014

@njsmith - another interesting bit of history:

"""
There were some changes made to the Python parser, involving imaginary constants, the ** operator for exponentiation, and multiple slicing subscripts. These changes were only needed to improve the syntactic appearance, such as being able to write a [i, j] rather than a [ (i, j)], and mostly were on the “to-do” list for the language anyway and didn’t break any existing code.
""" (Numerical Python)

Apparently ** operator and parentheses-less tuples were influenced by the numeric extension.

Contributor

fperez commented Mar 14, 2014

@njsmith, not sure why you felt it was necessary to accuse me of cargo cultism ("the ritual inclusion of code or program structures that serve no real purpose") or of not making serious comments.

My note about an alternate spelling was perfectly serious and meant in earnest as a way to make the intent of the new operator be a little less tied specifically to linear algebra. I imagined (perhaps incorrectly) that this could help make it more appealing to python-dev as something that could serve multiple communities equally well.

But it's a very minor point that will be certainly brought up by python-dev themselves if they happen to care about the rest of the PEP. So there's no need to sweat it for now.

Owner

njsmith commented Mar 14, 2014

Oh man, I didn't mean to accuse you of anything - sorry! I misunderstood
and thought you were suggesting we put mul2 into an alternatives
section even though none of us actually liked the name, just as a way of
showing how serious we were about considering alternatives, and my point
was that there are enough alternatives one could imagine someone liking,
and the PEP is complicated enough already, we could better leave this part
out until it proved useful for documenting real disagreements.

Anyway, yeah, people are starting to respond on the python-ideas thread
now, so soon we'll see whether or not this comes up for real!

cjw43 commented Mar 14, 2014

On 13-Mar-2014 7:33 PM, rfateman wrote:

On 3/13/2014 4:07 PM, njsmith wrote:

@rfateman https://github.com/rfateman:

...
I don't understand, then. It would seem to me to be important, in some
applications, to distinguish, for example:
A @ B    create a new matrix

We already have a += 25; does the same pattern, with the questionable @,
not give us a @= 25?

A !@ B   multiply A times B and overwrite A
A @! B   multiply A times B and overwrite B
...

Contributor

rlamy commented Mar 14, 2014

Why would it be better to raise an error on n-dimensional inputs, instead of following numpy's usual broadcasting rules?

The proposed implementation of @ offers users a choice between readable but wrong and correct but unreadable. That is one of the worst things library developers and language designers can do to their users. OTOH, if there was a clear indication that @ doesn't work with higher-dimensional arrays, then the nice syntax could be used without worry.

Contributor

Nodd commented Mar 14, 2014

It is very common for me to have (very big) arrays of matrices, and I'd like to be able to use a single optimized @ rather than writing an inefficient Python for loop by hand. I could use an adequate function, but then the infix operator loses its interest. Being able to use the @ notation for higher-dimensional arrays doesn't prevent using it for 2D arrays, so I don't see the need to artificially limit @.
Also, if some library wants to limit @ to 2D matrices, it can implement __matmul__ any way it wants.

I read the thread on python-ideas; is matrix power really used that often? Similarity to ** is a good thing in theory, but maybe not necessary. On the other hand, I'd use A@@-1 if it was available.

Member

rkern commented Mar 14, 2014

On the other hand, I'd use A@@-1 if it was available.

Hmm, I'd count that as a mark against the syntax, then. :-)

Contributor

Nodd commented Mar 14, 2014

Member

rkern commented Mar 14, 2014

Point taken.

Contributor

argriffing commented Mar 14, 2014

The several dozen replies to the PEP post on python-ideas seem to have reached a consensus: German keyboards have a degree (º) symbol whereas American keyboards do not.

Member

rkern commented Mar 14, 2014

That's pretty good for the first day of a PEP on python-ideas. Give it some time. :-)

Contributor

Nodd commented Mar 14, 2014

Hey, I like ° too! (I'm French)
The discussion is more about the choice of the symbol(s) than about whether the operator is appropriate or not. Is it because the need is so obvious? ^^

Owner

charris commented Mar 14, 2014

On Fri, Mar 14, 2014 at 10:23 AM, argriffing notifications@github.comwrote:

The several dozen replies to the PEP post on python-ideas seem to have
reached a consensus: German keyboards have a degree (º) symbol whereas
American keyboards do not.

TIL ;)

Chuck

+1 to right associativity, it feels more natural to me too.

Owner

njsmith commented Mar 14, 2014

The draft posted to the list has now been committed to the PEP repository and assigned the number 465. It should be available here once some cron job runs or something:
http://legacy.python.org/dev/peps/pep-0465

Contributor

takluyver commented Mar 14, 2014

It looks like the choice of the symbol is the biggest point of dissent so far. The PEP has a justification of why @ was chosen, but doesn't list alternatives. Is it worth listing considered alternatives with advantages/disadvantages of each, so the discussion doesn't spend too long going round in circles with that question?

Guido has just made a fairly strong statement in favour of keeping it to ASCII characters, so that should constrain the discussion a bit. Multi-character operators seem to be the other big thing.

Member

rkern commented Mar 14, 2014

I wouldn't read too much into the hyper-focus on the choice of character yet. Sometimes python-ideas discussions go on bikeshedding tangents for days that ultimately have no effect one way or the other on people's final opinions on the PEP. Any attempt to short-circuit the bikeshedding by enumerating the alternatives just spawns more bikeshedding. :-)

The discussion will get back on track to discuss more substantive issues soon enough.

Do you think they work if you pass them np.matrix?

To me, this just strengthens @rlamy's earlier point that np.matrix should not subclass from np.array. Subclasses should always be substitutable for the superclass.

Regarding the alternate names thing, I wouldn't worry about it. As I said, probably at the end of the day if this PEP is accepted, the BDFL will just choose whatever name he likes.

Owner

njsmith commented Mar 15, 2014

@asmeurer: Unfortunately, the problem isn't the implementation detail of whether np.matrix subclasses np.ndarray or not, it's whether there in practice exist 2 classes with subtly incompatible APIs and which people use for these kinds of number crunching tasks. Changing np.matrix to use, say, delegation, wouldn't make using it with scikit-learn any more workable. And in practice one of the worst pain points we get from this API fragmentation is caused by scipy.sparse matrices following np.matrix's API -- they're really frustrating to work with, even though they don't inherit from ndarray.

And yeah, someone did suggest __at__ as a name in the mailing list discussion, and got immediately shot down by Nick Coghlan. So... that was that.

Owner

njsmith commented Mar 15, 2014

Note for those who are subscribed to this PR, but don't follow python-ideas or numpy-discussion: Guido has accepted @ in principle: http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069439.html

He wants us to make a decision on the precedence/associativity thing, and he wants us to decide whether we really want @@ as well. Because this PR discussion has gotten so unwieldy, and it isn't really suited for more focused discussions like these, I've started threads on numpy-discussion about these two topics:

I'd especially appreciate people's help in figuring out what to do with the precedence/associativity thing; it's the main blocker for getting @ formally accepted, and it's not clear what data to base a solid decision on.

I believe that if you aren't subscribed to numpy-discussion, then you can still post through gmane (though unfortunately their anti-spam munging of the @ character kind of makes these threads hard to read!): http://news.gmane.org/gmane.comp.python.numeric.general

I'd also like to express my deep thanks to everyone here for your comments and feedback; it made a huge difference.

cjw43 commented Mar 15, 2014

Aaron,

I've looked through the thread, but I don't see what "them" means.

I feel that this is important, as I feel that np.matrix has been
dismissed too easily, without an adequate explanation of the reasons for
its dismissal.

I would appreciate your help.

Colin W.

On 14-Mar-2014 8:30 PM, Aaron Meurer wrote:

Do you think they work if you pass_them_ np.matrix?

To me, this just strengthens @rlamy https://github.com/rlamy's
earlier point that np.matrix should not subclass from np.array.
Subclasses should always be substitutable
https://en.wikipedia.org/wiki/Liskov_substitution_principle for the
superclass.

Regarding the alternate names thing, I wouldn't worry about it. As I
said, probably at the end of the day if this PEP is accepted, the BDFL
will just choose whatever name he likes.


Reply to this email directly or view it on GitHub
#4351 (comment).

Regarding whether A.B.C is (A.B).C or A.(B.C), you don't have
to make a choice. Parse it into
Dot(A,B,C) and let the implementation decide which order of
multiplication uses the fewest element adds and multiplies.
(This extends, of course, to A.B.C.D ....)

I don't know how python compilers work, but some others allow
optimization to vary the order of (say) additions
for speed. For example,

sin(a+b)+a+c+b might be rearranged to

sin(a+b)+ (a+b) +c

thereby "optimizing" away one addition.

The fact that this is sometimes bad numerically is ignored.

RJF
On 3/14/2014 9:47 PM, njsmith wrote:

Note for those who are subscribed to this PR, but don't follow
python-ideas or numpy-discussion: Guido has accepted |@| in principle:
http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069439.html

He wants us to make a decision on the precedence/associativity thing,
and he wants us to decide whether we really want |@@| as well. Because
this PR discussion has gotten so unwieldy, and it isn't really suited
for more focused discussions like these, I've started threads on
numpy-discussion about these two topics:

Precedence/associativity of |@|:
http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069444.html
Do we care about |@@|:
http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069448.html

I'd especially appreciate people's help in figuring out to do with the
precedence/associativity thing; it's the main blocker for getting |@|
formally accepted, and it's not clear what data to base a solid
decision on.

I believe that if you aren't subscribed to numpy-discussion, then you
can still post through gmane (though unfortunately their anti-spam
munging of the |@| character kind of makes these threads hard to
read!): http://news.gmane.org/gmane.comp.python.numeric.general

I'd also like to express my deep thanks to everyone here for your
comments and feedback; it made a huge difference.


Reply to this email directly or view it on GitHub
#4351 (comment).
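The reordering rfateman describes is the classic matrix-chain-multiplication problem; a minimal dynamic-programming sketch (illustrative only; CPython does not, and the PEP does not propose to, reorder operands like this):

```python
# Minimal matrix-chain-order DP: matrix i has shape (dims[i], dims[i+1]);
# find the cheapest parenthesization, counting scalar multiplications.
def chain_cost(dims):
    n = len(dims) - 1                     # number of matrices
    cost = [[0] * n for _ in range(n)]    # cost[i][j]: best cost for Ai..Aj
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

# (10x100) @ (100x5) @ (5x50): (A @ B) @ C costs 7500 multiplies,
# while A @ (B @ C) would cost 75000
print(chain_cost([10, 100, 5, 50]))  # -> 7500
```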

Contributor

rlamy commented Mar 15, 2014

Unfortunately, the problem isn't the implementation detail of whether np.matrix subclasses np.ndarray or not, it's whether there in practice exist 2 classes with subtly incompatible APIs and which people use for these kinds of number crunching tasks.

The inheritance hierarchy is not an implementation detail, it's an important part of the public API, with major consequences on the rest of the API. The problem with having subtly incompatible classes lies in the "subtly". If matrices are made suitably different from arrays (e.g. matrix + array should raise), then the two types won't be mixed randomly.

Owner

njsmith commented Mar 15, 2014

@rlamy: If you want to make a matrix type that acts like you describe, then
go for it! (Or maybe sympy or sage already have such a type?) It's a
different animal than numpy.matrix, that's all. (And probably not an animal
that has any reason to ship as part of numpy itself.)
On 15 Mar 2014 14:53, "Ronan Lamy" notifications@github.com wrote:

Unfortunately, the problem isn't the implementation detail of whether
np.matrix subclasses np.ndarray or not, it's whether there in practice
exist 2 classes with subtly incompatible APIs and which people use for
these kinds of number crunching tasks.

The inheritance hierarchy is not an implementation detail, it's an
important part of the public API, with major consequences on the rest of
the API. The problem with having subtly incompatible classes lies in the
"subtly". If matrices are made suitably different from arrays (e.g. matrix
+ array should raise), then the two types won't be mixed randomly.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351#issuecomment-37727682
.

Contributor

Nodd commented Mar 15, 2014

@rfateman As far as I know, this is not how operators in Python work. Each operator is independent, and the same operator can be linked to a different function, since any class can redefine __matmul__ (the same goes for any operator).

Just a thought: why is the subclassing of np.matrix from np.ndarray a problem, given that duck typing would still create ambiguity?

Contributor

rlamy commented Mar 15, 2014

@Nodd That's part of the design: duck-typing should not create ambiguity. If matrix op array raises for every op, calculations with mixed types cannot go very far.

Contributor

rlamy commented Mar 15, 2014

@njsmith I thought about it already, but it sounds like work... OTOH, recent discussions have clarified maybe 80% of the design for such a library, and it would be a shame to throw all that away.

Regarding your question, I don't know about Sage, but SymPy has some stuff like this, but it's not suitable for numerical work.

You should add this pull request (https://github.com/numpy/numpy/pull/4351) as the first reference. The page at http://legacy.python.org/dev/peps/pep-0465/ doesn't reference it at all!

As has been pointed out, the issue with np.matrix vs np.ndarray is largely an illustration of http://en.wikipedia.org/wiki/Composition_over_inheritance . It is rare for the two different kinds of multiplication to be needed in the same formula. Also, this doesn't solve A + r where r is a scalar: if A is a matrix then this is A + Ir, where I is the identity matrix. Subtraction too. Do we need @+ and @-?

Similarly, when one writes A == 1 in a matrix context, we want to know whether it's equal to the identity matrix (not the singular all-ones matrix), and we want a truth value (not an array of booleans with no boolean value; truth-testing, by the way, is another operator that could behave differently: the all-zero matrix is False). We're up to quite a few operators that have both matrix and array versions. Hmm... how do we assign different meanings to the same set of operators in an object-oriented language?

Do you have evidence that the pain of np.matrix is not due to the fact that it (sometimes, inconsistently) pretends to be a np.ndarray? If not, this PEP is not needed, and adding new (seldom used) operators is not to be done lightly.

Contributor

stefanv commented Mar 15, 2014

@robertwb I see your point about multiple operators, but are you suggesting either casting to or having a .M (matrix) property before using matrix multiplication on an array?

Owner

charris commented Mar 15, 2014

On Sat, Mar 15, 2014 at 1:33 PM, Robert Bradshaw
notifications@github.comwrote:

As has been pointed out, the issue with np.matrix vs np.ndarary largely an
illustration of http://en.wikipedia.org/wiki/Composition_over_inheritance. It is rare for the two different kinds of multiplication to be needed in
the same formula. Also, this doesn't solve A + r where r is a scalar--if A
is a matrix then this is A + Ir where I is the identity matrix. Subtraction
too. Do we need @+ and @-?

Similarly, when one writes A == 1 in a matrix context, we want to know if
it's equal to the identity matrix (not the singular all-one matrix) and
wants a truth value (not an array of booleans that doesn't have a
boolean operator, which, by the way, is another operator that could
behave differently: the all-zero matrix is False). We're up to quite a few
operators that have both matrix and array versions. Hmm... how to assign
different meanings to the same set of operators on an object oriented
language?

Do you have evidence that the pain of np.matrix is not due to the fact
that it (sometimes, inconsistently) pretends to be a np.ndarray? If not,
this PEP is not needed, and adding new (seldom used) operators is not to be
done lightly.

Those are good points. But even if we only used matrices, a full set of
operators, such as Matlab(TM) has, would be desirable. We can't have that,
but it looks like we can get '@' and possibly '@@', so the question is,
what should they be used to implement? Matrix-like multiplication looks
like the best bet. For the other things you mention, they are uncommon
enough that the available methods suffice. For instance, A == 1 becomes
all(A == eye(n)), etc. Equality of that sort actually isn't very useful in
numerics due to roundoff error, and it might even be considered a
programming error. It is, however, appropriate for symbolic systems. So I
would suggest that folks interested in the mathematical side of matrices
use sympy or sage, while for those doing numerical work numpy would be
better. Symbolic and numerical can be mixed; I believe theano does that,
with symbolic manipulation used as a sort of intermediate compiler step.
But numpy tries to fill the middle ground.

Chuck

On Sat, Mar 15, 2014 at 1:45 PM, Charles Harris
notifications@github.com wrote:

On Sat, Mar 15, 2014 at 1:33 PM, Robert Bradshaw
notifications@github.comwrote:

As has been pointed out, the issue with np.matrix vs np.ndarary largely an
illustration of http://en.wikipedia.org/wiki/Composition_over_inheritance. It is rare for the two different kinds of multiplication to be needed in
the same formula. Also, this doesn't solve A + r where r is a scalar--if A
is a matrix then this is A + Ir where I is the identity matrix. Subtraction
too. Do we need @+ and @-?

Similarly, when one writes A == 1 in a matrix context, we want to know if
it's equal to the identity matrix (not the singular all-one matrix) and
wants a truth value (not an array of booleans that doesn't have a
boolean operator, which, by the way, is another operator that could
behave differently: the all-zero matrix is False). We're up to quite a few
operators that have both matrix and array versions. Hmm... how to assign
different meanings to the same set of operators on an object oriented
language?

Do you have evidence that the pain of np.matrix is not due to the fact
that it (sometimes, inconsistently) pretends to be a np.ndarray? If not,
this PEP is not needed, and adding new (seldom used) operators is not to be
done lightly.

Those are good points. But even if we only used matrices a full set of
operators, such as Matlab(TM) has, would be desirable. We can't have that,
but it looks like we can get '@' and possibly '@@', so the question is,
what should they be used to implement?

Just because you can get something doesn't mean it's a good idea,
especially in language design (I'll refrain from mentioning some
languages that try to throw in everything under the sun...). See, e.g.
"import this."

Matrix like multiplication looks
like the best bet. For the other things you mention, they are uncommon
enough that the available methods suffice. For instance, A == 1 becomes
all(A == eye(n)), etc.

But you have to know that, and to write A + eye(A.shape[0]) * t, etc.
This is even uglier than calling dot by name.

Equality of that sort actually isn't very useful in
numerics due to roundoff error, and it might even be considered a
programming error. It is, however, appropriate for symbolic systems. So I
would suggest that folks interested in the mathematical side of matrices
use sympy or sage, while for those doing numerical work numpy would be
better. Symbolic and numerical can be mixed, I believe theano does that,
with symbolic manipulation used as a sort intermediate compiler step. But
numpy tries to fill the middle ground.

This only reinforces the point that one wants matrices or arrays,
rarely both at the same time, or an object that kind of behaves like
both.

On Sat, Mar 15, 2014 at 1:07 PM, Stefan van der Walt <
notifications@github.com> wrote:

@robertwb https://github.com/robertwb I see your point about multiple
operators, but are you suggesting either casting to or having a .M (matrix)
property before using matrix multiplication on an array?

Yes. In fact

a.M * b.M

is actually not too bad; IMHO more obvious than (a @ b). Note in nearly all
the examples I've seen, there isn't a lot (if any) swapping back and forth,
if one multiplies as matrices one wants to continue doing so, taking
matrix-vector products, adding/comparing scalars as matrices, etc. so it
wouldn't be that bad at all.

Contributor

abalkin commented Mar 16, 2014

Note in nearly all the examples I've seen, there isn't a lot (if any) swapping back and forth, if one multiplies as matrices one wants to continue doing so, taking matrix-vector products, adding/comparing scalars as matrices, etc.

Here is one example where mixing element-wise and matrix multiplication is useful: when one of the matrices in the chained product is diagonal, it is more efficient to multiply element-wise by a 1d array than to matrix-multiply by the diagonal matrix.
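Written out as a sketch (with modern numpy; @ stands in for np.dot here), the trick is:

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.standard_normal(4)          # the diagonal entries, as a 1d array
A = rng.standard_normal((4, 5))

# Materializing the diagonal matrix and matrix-multiplying: much more
# work than needed
slow = np.diag(d) @ A
# Broadcasting element-wise against the rows: same result, far cheaper
fast = d[:, np.newaxis] * A

assert np.allclose(slow, fast)
```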

Owner

njsmith commented Mar 16, 2014

@robertwb: I can see how a matrix object with all those behaviours could be
a useful thing, but it's not something I've ever had any use for... and
even in math formulas I've never seen "A + r" written as a shorthand for
"A + rI". It seems to target a very different set of use cases than numpy
does. And why should everyone be forced to learn two totally different
object APIs, whether they have a use for them or not?

On Sun, Mar 16, 2014 at 1:51 AM, abalkin notifications@github.com wrote:

Note in nearly all the examples I've seen, there isn't a lot (if any)
swapping back and forth, if one multiplies as matrices one wants to
continue doing so, taking matrix-vector products, adding/comparing scalars
as matrices, etc.

Here is one example where mixing element-wise and matrix multiplication is
useful: when one of the matrices in the chained product is diagonal - it is
more efficient to multiply element-wise by a 1d array than matrix-multipy
by the diagonal matrix.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351#issuecomment-37745471
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

On Sat, Mar 15, 2014 at 6:51 PM, abalkin notifications@github.com wrote:

Note in nearly all the examples I've seen, there isn't a lot (if any)
swapping back and forth, if one multiplies as matrices one wants to
continue doing so, taking matrix-vector products, adding/comparing scalars
as matrices, etc.

Here is one example where mixing element-wise and matrix multiplication is
useful: when one of the matrices in the chained product is diagonal - it is
more efficient to multiply element-wise by a 1d array than matrix-multipy
by the diagonal matrix.

Good one. Ideally the matrix class would have a special diagonal
implementation to handle this, but short of that I still think I'd prefer a
comment and explicit conversion to an array and back to explain this clever
trick.

On Sat, Mar 15, 2014 at 7:20 PM, njsmith notifications@github.com wrote:

@robertwb: I can see how a matrix object with all those behaviours could
be
a useful thing, but it's not something I've ever had any use for... and
even in math formulas I've never seen "A + r" written as a shorthand for
"A + rI". It seems to target a very different set of use cases than numpy
does. And why should everyone be forced to learn two totally different
object APIs, whether they have a use for them or not?

You only have to learn the part of the API you care about, e.g. "to
multiply arrays as matrices, (cheaply) turn them into matrices and
multiply." And though you don't care about anything but multiplication (and
possibly exponentiation) it's a shame to have a solution that's by nature
not extensible beyond these two operators. Let arrays be arrays, with all
their element-wise broadcasting goodness, and matrices be real matrices.

In the PEP you provide plenty of evidence that numpy.matrix is bad, but
you don't account for the possibility that it's the current implementation
(in particular the inheritance) that's the problem, a problem that a saner
implementation may well resolve. I'd like to see that addressed, 'cause
that'd be my preferred solution to the problem.

Owner

charris commented Mar 16, 2014

On Sat, Mar 15, 2014 at 9:46 PM, Robert Bradshaw
notifications@github.comwrote:

On Sat, Mar 15, 2014 at 6:51 PM, abalkin notifications@github.com wrote:

Note in nearly all the examples I've seen, there isn't a lot (if any)
swapping back and forth, if one multiplies as matrices one wants to
continue doing so, taking matrix-vector products, adding/comparing
scalars
as matrices, etc.

Here is one example where mixing element-wise and matrix multiplication
is
useful: when one of the matrices in the chained product is diagonal - it
is
more efficient to multiply element-wise by a 1d array than matrix-multipy
by the diagonal matrix.

Good one. Ideally the matrix class would have a special diagonal
implementation to handle this, but short of that I still think I'd prefer a
comment and explicit conversion to an array and back to explain this clever
trick.

Broadcasting is one of the strong points of numpy and fundamental to how it
is used. It is certainly a clever trick, but it is a trick that everyone
using numpy makes use of.

On Sat, Mar 15, 2014 at 7:20 PM, njsmith notifications@github.com wrote:

@robertwb: I can see how a matrix object with all those behaviours could
be
a useful thing, but it's not something I've ever had any use for... and
even in math formulas I've never seen "A + r" written as a shorthand for
"A + rI". It seems to target a very different set of use cases than numpy
does. And why should everyone be forced to learn two totally different
object APIs, whether they have a use for them or not?

You only have to learn the part of the API you care about, e.g. "to
multiply arrays as matrices, (cheaply) turn them into matrices and
multiply." And even if you don't care about anything but multiplication
(and possibly exponentiation), it's a shame to have a solution that's by
nature not extensible beyond these two operators. Let arrays be arrays,
with all their element-wise broadcasting goodness, and matrices be real
matrices.

In the PEP you provide plenty of evidence that numpy.matrix is bad, but
you don't account for the possibility that it's the current implementation
(in particular the inheritance) that's the problem, and that a saner
implementation may well resolve it. I'd like to see that addressed, 'cause
that'd be my preferred solution to the problem.

Matrices have limited usefulness in the broader computational picture. The
ndarray is a more general tool, and with the '@' operator there is no
particular reason for a separate matrix class.

Chuck
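The diagonal trick abalkin describes can be spot-checked as a sketch (arbitrary example data; @ used in the sense the PEP proposes, as implemented in today's NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
d = np.array([1.0, 2.0, 3.0])  # entries of a diagonal matrix D

# Matrix-multiplying by D scales the columns of A...
via_matmul = A @ np.diag(d)
# ...which broadcasting does without materializing D (O(n^2) instead of O(n^3)).
via_broadcast = A * d

assert np.allclose(via_matmul, via_broadcast)
```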

On Sat, Mar 15, 2014 at 10:07 PM, Charles Harris
notifications@github.com wrote:

On Sat, Mar 15, 2014 at 9:46 PM, Robert Bradshaw
notifications@github.comwrote:

On Sat, Mar 15, 2014 at 6:51 PM, abalkin notifications@github.com wrote:

Note in nearly all the examples I've seen, there isn't a lot (if any)
swapping back and forth, if one multiplies as matrices one wants to
continue doing so, taking matrix-vector products, adding/comparing
scalars
as matrices, etc.

Here is one example where mixing element-wise and matrix multiplication
is
useful: when one of the matrices in the chained product is diagonal - it
is
more efficient to multiply element-wise by a 1d array than matrix-multiply
by the diagonal matrix.

Good one. Ideally the matrix class would have a special diagonal
implementation to handle this, but short of that I still think I'd prefer a
comment and explicit conversion to an array and back to explain this clever
trick.

Broadcasting is one of the strong points of numpy and fundamental to how it
is used.

Agreed.

It is certainly a clever trick, but it is a trick that everyone
using numpy makes use of.

The clever trick is using broadcasting to efficiently do
multiplication by a diagonal matrix while you're in the middle of
doing a bunch of other matrix operations.

On Sat, Mar 15, 2014 at 7:20 PM, njsmith notifications@github.com wrote:
You only have to learn the part of the API you care about, e.g. "to
multiply arrays as matrices, (cheaply) turn them into matrices and
multiply." And though you don't care about anything but multiplication (and
possibly exponentiation) it's a shame to have a solution that's by nature
not extensible beyond these two operators. Let arrays be arrays, with all
their element-wise broadcasting goodness, and matrices be real matrices.

In the PEP you provide plenty of evidence that numpy.matrix is bad, but
don't account for the possibility that its current implementation (in
particular inheritance) that's the problem that a saner implementation may
well resolve. I'd like to see that addressed, 'cause that'd be my preferred
solution to the problem.

Matrices have limited usefulness in the broader computational picture. The
ndarray is a more general tool, and with the '@' operator there is no
particular reason for a separate matrix class.

To paraphrase (a bit tongue in cheek): "Matrices have limited
usefulness in the broader [numerical|scientific] computational
picture. Except for matrix multiplication. Oh, and sometimes
exponentiation. But there's no value in having more than that, no
siree."

Even if this were entirely true, it's a fairly weak argument. Adding a
class is much less invasive than adding a new infix operator.

  • Robert

@robertwb robertwb commented on the diff Mar 16, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+meets the high bar for adding a new operator.
+
+
+Why should matrix multiplication be infix?
+------------------------------------------
+
+Right now, most numerical code in Python uses syntax like
+``numpy.dot(a, b)`` or ``a.dot(b)`` to perform matrix multiplication.
+This obviously works, so why do people make such a fuss about it, even
+to the point of creating API fragmentation and compatibility swamps?
+
+Matrix multiplication shares two features with ordinary arithmetic
+operations like addition and multiplication on numbers: (a) it is used
+very heavily in numerical programs -- often multiple times per line of
+code -- and (b) it has an ancient and universally adopted tradition of
+being written using infix syntax. This is because, for typical
@robertwb

robertwb Mar 16, 2014

[Putting on mathematician hat.] This is because matrix multiplication correctly defines composition of linear operators. It's a "natural" definition independent of choice of basis. I don't even know what notation would be used for element-wise operations, short of "Let c_{i,j} = a_{i,j}b_{i,j}."

Carrying this over to code is of course an advantage of this PEP, but function call syntax isn't avoided in papers due to its readability.

@larsmans

larsmans Apr 11, 2014

Contributor

Elementwise products can be written A ∘ B.

Owner

rgommers commented Mar 16, 2014

In the PEP you provide plenty of evidence that numpy.matrix is bad, but you don't account for the possibility that it's the current implementation (in particular the inheritance) that's the problem, and that a saner implementation may well resolve it. I'd like to see that addressed, 'cause that'd be my preferred solution to the problem.
...
Even if this were entirely true, it's a fairly week argument. Adding a class is much less invasive than adding a new infix operator.

@robertwb the answer has been given several times in this discussion already. The problem is not the implementation of the class. This reply from @njsmith summarizes it well:

Ah, I think you've missed a key point :-). The problem is not implementing np.matrix -- that's trivial, it's like 100 lines of code or something. The problem is that if people are using both np.ndarray and np.matrix, then every single function that anyone ever writes has to be careful to handle both appropriately. Scikit-learn, as noted in the PEP, has a bit over 50,000 lines of code, which were written with np.ndarray in mind, not np.matrix. Do you think they work if you pass them np.matrix? Do you feel like auditing all of them to check? There are thousands of uses of * to check... If you did audit them and knew that they worked properly now, how would you make sure that they still work 6 months from now as the code gets updated? Aren't there better things the developers could be spending their time on, like making the code faster or more featureful, instead of worrying about np.matrix support?

In your proposal of a.M * b.M, the result would have to be a new matrix class, otherwise a.M * b.M * c.M wouldn't work. If you have that new class, you will have to cast it back to ndarray or be stuck with the same problem again that every function you write would have to explicitly check for this array class or risk having the wrong semantics for d * e.

Owner

rgommers commented Mar 16, 2014

And Re: inheritance: as has also been pointed out before, the sparse matrices provided by scipy.sparse do not inherit from ndarray and suffer from the same issue.

On 3/16/2014 2:16 AM, Robert Bradshaw wrote:

Adding a class is much less invasive than adding a new infix operator.

As others have stated, the issue is not about whether or
not to add a class. The issue is about the right way to
handle a certain set of common operations on arrays.
A side effect is that if the choice is well made, then
most (all?) need for numpy's matrix class will go away.

Separately, about associativity ...
For numpy the new operator will be used to implement
matrix multiplication (and generalizations) among arrays.
The discussion of associativity feels a bit jumbled to me,
but making it compatible with function composition in
general seems rather compelling, as this seems a likely
use outside the NumPy community.

Alan

Contributor

rlamy commented Mar 16, 2014

@rgommers Indeed the problem is not with the implementation of the matrix class, it's its design. A well-designed matrix class should not attempt to disguise itself as an array. Functions expecting an array should not be expected to work with matrices.

What exactly are the problems with scipy.sparse? I've never used it, but it seems obvious that it needs different algorithms than the dense case and that any implicit conversion to np.ndarray is a disaster.

Owner

njsmith commented Mar 16, 2014

The problem with scipy.sparse is that the sparse matrix objects follow the
np.matrix API, rather than the np.ndarray API, e.g., * on scipy.sparse
objects does matrix multiplication. And, like you say, the solution to this
is certainly not to just convert everything to ndarray. This makes it a
big hassle to write code that can work on both dense and sparse matrices.

On Sun, Mar 16, 2014 at 3:41 PM, Ronan Lamy notifications@github.comwrote:

@rgommers https://github.com/rgommers Indeed the problem is not with
the implementation of the matrix class, it's its design. A well-designed
matrix class should not attempt to disguise itself as an array. Functions
expecting an array should not be expected to work with matrices.

What exactly are the problems with scipy.sparse? I've never used it, but
it seems obvious that it needs different algorithms than the dense case and
that any implicit conversion to np.ndarray is a disaster.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351#issuecomment-37760318
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
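The API divergence njsmith describes can be seen without scipy at all, using np.matrix as a stand-in for the matrix-style API that scipy.sparse follows (a sketch; np.matrix is the very class the PEP argues against):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)  # matrix-style API, like scipy.sparse's classes

print(a * a)  # elementwise: [[1, 4], [9, 16]]
print(m * m)  # matrix product: [[7, 10], [15, 22]]
```

The same `*` expression means two different things depending on which type happens to flow in, which is exactly why code written for one cannot be trusted with the other.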

Contributor

rlamy commented Mar 16, 2014

Then that means that scipy.sparse inherits the flawed design of np.matrix. I'd say that it reinforces the point that there are no fundamental problems with having a matrix class.

Owner

njsmith commented Mar 16, 2014

No-one is arguing that there is a fundamental problem with having a matrix
class as a thing that exists on PyPI and you can use if you want. The point
is that only people who actually want "a matrix class" should have to deal
with one.

On Sun, Mar 16, 2014 at 4:17 PM, Ronan Lamy notifications@github.comwrote:

Then that means that scipy.sparse inherits the flawed design of np.matrix.
I'd say that it reinforces the point that there are no fundamental problems
with having a matrix class.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351#issuecomment-37761379
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Contributor

rlamy commented Mar 16, 2014

Actually, the PEP argues precisely that, cf. "The two-type solution is worse than the disease." in the "Rejected alternatives" section.

cjw43 commented Mar 16, 2014

On 16-Mar-2014 2:16 AM, Robert Bradshaw wrote:

...
To paraphrase (a bit tongue in cheek): "Matrices have limited
usefulness in the broader [numerical|scientific] computational
picture. Except for matrix multiplication. Oh, and sometimes
exponentiation. But there's no value in having more than that, no
siree."

Even if this were entirely true, it's a fairly weak argument. Adding a
class is much less invasive than adding a new infix operator.
+1

Having said that, there are some blemishes with np.matrix, which could
be addressed.

Colin W

  • Robert


Reply to this email directly or view it on GitHub
#4351 (comment).

Owner

njsmith commented Mar 16, 2014

@rlamy: The PEP argues that it's a bad idea to mix array and matrix classes
when doing the sort of stuff that number-crunchers do, and that they need a
better solution to the elmul/matmul problem. You've explicitly said that
when you refer to matrix classes, you're talking about something very
different than what the PEP is talking about, that is not designed to mix
with arrays, and thus is not trying to solve the problems that the PEP is
trying to solve. That's cool, the PEP has no argument with you, you're just
talking about something else.

On Sun, Mar 16, 2014 at 4:33 PM, Ronan Lamy notifications@github.comwrote:

Actually, the PEP argues precisely that, cf. "The two-type solution is
worse than the disease." in the "Rejected alternatives" section.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/4351#issuecomment-37761857
.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

cjw43 commented Mar 16, 2014

On 16-Mar-2014 9:37 AM, alan-isaac wrote:

On 3/16/2014 2:16 AM, Robert Bradshaw wrote:

Adding a class is much less invasive than adding a new infix operator.

As others have stated, the issue is not about whether or
not to add a class. The issue is about the right way to
handle a certain set of common operations on arrays.
A side effect is that if the choice is well made, then
most (all?) need for the numpy's matrix class will go away.
Wasn't .dot() the standard way of handling this in the past?

A * B, where A and B are appropriate matrices, has handled this, but this
class offers other useful functionality.

Separately, about associativity ...
For numpy the new operator will be used to implement
matrix multiplication (and generalizations) among arrays.
The discussion of associativity feels a bit jumbled to me,
but making it compatible with function composition in
general seems rather compelling, as this seems a likely
use outside the NumPy community.
It wasn't clear to me, from the PEP, just what types or classes @ was
intended to operate on. Something beyond numpy seems intended.

Colin W.

Alan


Reply to this email directly or view it on GitHub
#4351 (comment).

Member

rkern commented Mar 16, 2014

It wasn't clear to me, from the PEP, just what types or classes @ was intended to operate on. Something beyond numpy seems intended.

The Adoption section seems to cover this extensively.

Contributor

abalkin commented Mar 16, 2014

@alan-isaac

The discussion of associativity feels a bit jumbled to me, but making it compatible with function composition in general seems rather compelling, as this seems a likely use outside the NumPy community.

Can you explain how compatibility with function composition affects the right/left associativity of @?

On 3/16/2014 5:22 PM, abalkin wrote:

Can you explain how compatibility with function composition affects the right/left associativity of |@|?

On the one hand, composition is associative.
On the other hand, as someone else mentioned,
(f o g o h)(x) is usually read f(g(h(x))).

But I'm not pushing any particular choice on this.
I am just requesting that the principles for the
choice be articulated, and that they consider
possible applications beyond the dot product.

fwiw,
Alan Isaac

Contributor

abalkin commented Mar 16, 2014

On the one hand, composition is associative.

Yes, so if @ is composition operator then f @ g @ h = (f @ g) @ h = f @ (g @ h).

On the other hand, as someone else mentioned, (f o g o h)(x) is usually read f(g(h(x))).

Once again, very true. This follows from the definition of function composition: c = f @ g is a new function such that for any x in the domain of g, c(x) = f(g(x)). The associativity trivially follows from this definition:

Let

 c1 = (f @ g) @ h
 c2 = f @ (g @ h)

From the definition c1(x) = (f @ g)(h(x)). If h(x) = y, then applying the definition again gives c1(x) = f(g(y)) = f(g(h(x))).

Now, let c0 = g @ h, then c2(x) = f(c0(x)), and c0(x) = g(h(x)), so c2(x) = f(g(h(x))).

Thus for any x in the domain of h, c1(x) = c2(x) = f(g(h(x))) and therefore c1 = c2. Q. E. D.

I don't see any consideration here that would help selecting right or left associativity for @.
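The proof above can be spot-checked with plain Python functions, writing the composition as an ordinary helper (`compose` is a hypothetical stand-in for the @ notation used in the argument):

```python
def compose(f, g):
    """c = f @ g in the notation above: c(x) = f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: x * 3
h = lambda x: x - 7

c1 = compose(compose(f, g), h)  # (f @ g) @ h
c2 = compose(f, compose(g, h))  # f @ (g @ h)

# Both groupings agree with f(g(h(x))) at every point we test.
assert all(c1(x) == c2(x) == f(g(h(x))) for x in range(-5, 6))
```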

On 3/16/2014 7:05 PM, abalkin wrote:

I don't see any consideration here that would help selecting right or left associativity for |@|.

You are conflating algebraic and computational questions.
Alan Isaac

Contributor

argriffing commented Mar 17, 2014

I don't see any consideration here that would help selecting right or left associativity for @.

I could see a practical consideration, although I hope this is not taken as advocating for or against @ as function evaluation or composition, but rather as elaborating on Alan Isaac's point about how left vs. right associativity in Python could matter even though function composition is associative.

If a function evaluation Python operator has right-associativity then in some limited situations it looks like a function composition operator.

Hypothetically let a @ b be like a(b) and let f and g be Python functions that take foo objects and return foo objects. Let x be a foo object. Then with right-associativity Python would evaluate f @ g @ x as f(g(x)) without any error. On the other hand with left-associativity Python would try to evaluate f @ g @ x as (f(g))(x) which raises an error because f is a Python function expecting a foo object as input but g itself is not a foo object.

Similarly if f and g treat @ as an operator representing either evaluation or composition depending on context, as in pyoperators Operator or scipy LinearOperator then a similar practical argument for right-associativity still applies, where "raises an error" would be replaced by "starts building an unnecessarily complicated Python object."

Of course if f and g are defined as mathematical abstractions or if @ is interpreted in a way that always causes a complicated Python object to be constructed, then these asymmetry arguments may not apply.
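The context-dependent style mentioned above (compose when the right operand is callable, apply when it is a value) can be sketched with a hypothetical wrapper class; note that under Python's actual left-associativity, f @ g @ x then still works, precisely because f @ g yields another wrapped callable:

```python
class Composable:
    """Hypothetical wrapper: @ composes with callables, applies to values."""

    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def __matmul__(self, other):
        if callable(other):
            # f @ g -> a new composed callable
            return Composable(lambda x: self.func(other(x)))
        # f @ x -> plain application
        return self.func(other)

f = Composable(lambda x: x + 1)
g = Composable(lambda x: x * 2)

print((f @ g)(10))  # f(g(10)) = 21
print(f @ g @ 10)   # parsed left-associatively as (f @ g) @ 10, also 21
```

Bare functions without such a wrapper would fail under left-associativity exactly as described in the comment above.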

Hypothetically let a @ b be like a(b) and let f and g be Python functions that take foo objects and return foo objects. Let x be a foo object. Then with right-associativity Python would evaluate f @ g @ x as f(g(x)) without any error. On the other hand with left-associativity Python would try to evaluate f @ g @ x as (f(g))(x) which raises an error because f is a Python function expecting a foo object as input but g itself is not a foo object.

No, f @ g would create some intermediary function that applies g first and then f. f(g) is mathematical abuse of notation.

I don't think function composition would be accepted. Unlike Haskell, multivariable functions in Python are not just magic Currys of single variable ones. f @ g doesn't really work if f takes more than one input (unless you allow f @ *g or something, but that's new syntax). You could use functools.partial, but at that point you could just write your own lambda x: f(g(x), y).

Contributor

argriffing commented Mar 17, 2014

@asmeurer I see that my post above is confusing and has some mistakes, especially where I let f and g be Python functions. To make the point about practical implications of left vs. right associativity of the @ implementation maybe I should have let f and g be callable Python objects with __matmul__ implemented like __call__ for these objects. Just to be clear, I'm not suggesting doing this. In the second part of my post where I cited pyoperators and scipy, the implementation of f and g would be more complicated in ways that I also am not necessarily advocating.

I don't think function composition would be accepted.

I'm not sure exactly what you mean by this, but I almost certainly agree with you.

If you mean that you don't think that Python devs would accept @ to mean something more intrusive than an unspecified .__(r)matmul__ binary method, then I agree. Or maybe you mean that you don't think that sympy would accept @ as function composition? I'm not advocating using @ for function composition. But for the types for which __matmul__ will be implemented as function composition or evaluation based on context (e.g. scipy LinearOperator, maybe pyoperators Operator), I was trying to support Alan Isaac's point that there is some practical computational asymmetry between left and right associativity of the operator, and that right-associativity happens to be more computationally convenient in these cases. I was also trying to make this point in a way that applies not only to linear functions.

On Sun, Mar 16, 2014 at 5:53 PM, alan-isaac notifications@github.com wrote:

On 3/16/2014 7:05 PM, abalkin wrote:

I don't see any consideration here that would help selecting right or left associativity for |@|.

You are conflating algebraic and computational questions.

This.

You can't decide from a correctness point of view whether @ should be
left or right associative by considering a (mathematically)
associative operator (e.g. matrix multiplication or its generalization
function composition). You have to consider a case where (a @ b) @ c
!= a @ (b @ c). Unfortunately, it's hard to come up with other uses
for this operator.

From a computational point of view, if a, b are n by n, and c is an n
by m matrix, then (a @ b) @ c will be cheaper if n < m and more
expensive if n > m.

All that aside, I think the decision is fixed by giving it the same
precedence as *. If it's not left associative how would one parse a *
b @ c? Similarly @@, with the same precedence as **, must be right
associative.

  • Robert
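The two halves of Robert's point can be checked numerically in a sketch, using the rule that multiplying a (p x q) by a (q x r) matrix costs roughly p*q*r multiply-adds:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))
c = rng.standard_normal((n, m))

# Mathematically associative: both groupings give the same product.
assert np.allclose((a @ b) @ c, a @ (b @ c))

# But the multiply-add counts differ between groupings.
left_cost = n * n * n + n * n * m   # (a @ b) first, then times c
right_cost = 2 * n * n * m          # (b @ c) first, then a times it
print(left_cost, right_cost)  # 160 192: left grouping cheaper here, since n < m
```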

On Sun, Mar 16, 2014 at 9:22 AM, njsmith notifications@github.com wrote:

No-one is arguing that there is a fundamental problem with having a matrix
class as a thing that exists on PyPI and you can use if you want. The
point
is that only people who actually want "a matrix class" should have to deal
with one.

I find this statement quite ironic in light of a proposal to change the
Python language itself (and force everyone in the Python ecosystem to "deal
with it.") Extra classes are much more easily ignored than extra builtin
operators.

Your argument to reject the two-type solution is largely a straw man based
on the (bad in hindsight) existing numpy.matrix class. You don't address at
all in the PEP the idea that a better design could cut out most of the
pain, and it's clear that I'm not the only one unconvinced. Especially if
it's good enough to avoid the need to change the grammar. I filed
njsmith#1 to start separating these ideas.

Owner

njsmith commented Mar 17, 2014

I find this statement quite ironic in light of a proposal to change the
Python language itself (and force everyone in the Python ecosystem to "deal
with it.") Extra classes are much more easily ignored than extra builtin
operators.

These situations are not nearly as parallel as you make them seem. The only
people who are "forced" to deal with @ are those who are (a) writing code
where it's useful, (b) reading code where it's used, (c) looking at the
table of operators in their language reference, plus maybe a paragraph of
prose next to it explaining what @ is.

OTOH in your proposal, anyone who just needs matrix multiplication -- the
kind of people for whom the best solution right now is to use np.dot -- are
forced to deal with a whole second complex API for matrices, and to do so
every time they want to use matrix multiplication. This is probably worse
for them than just using np.dot, and certainly worse for them than having a
dedicated operator.

Your argument to reject the two-type solution is largely a straw man
based on the (bad in hindsight) existing numpy.matrix class.

Maybe we're just misunderstanding each other. Could you point out any
specific lines in the PEP that you feel apply specifically to np.matrix,
and not to, say, scipy.sparse matrices, or the pyviennacl array type (which
follows the ndarray API except for using * to mean matrix multiplication)?
I honestly don't think any exist, so if you're seeing them then something
is going wrong.

Member

rkern commented Mar 17, 2014

@robertwb The community has had many years to propose an alternative matrix implementation that does not have these issues. Quite a number of designs have been proposed and rejected over the years. Other matrix-like types like scipy.sparse have shown similar problems. Your PR has a sketch of a proposal, but you have not implemented it, and I'm willing to bet that it wouldn't hold up if you did. I think it's reasonable to take this history as evidence that the problems are more intrinsic to the design space of two-type/one-operator solutions than they are to specific implementation choices. You can disagree with that interpretation of our community's history, but it is not a straw man invented to cheaply bolster the argument for the operator.

On 3/17/2014 11:51 AM, Robert Bradshaw wrote:

Extra classes are much more easily ignored than extra builtin
operators.

In what sense? How many Python users have any
real awareness of the bitwise operators (which
have additionally been completely and usefully
hijacked for other uses).

You cannot make a serious cost case on that score.

Separately,
I would like to see posted objections avoid
the rhetorical tactic of discussing only costs,
when any reasonable decision must weigh both costs and benefits.
There may be serious costs. (Not the one above.)
There are real benefits. (One can dispute their size.)
They should be compared. Any comparison should include
the demand by a substantial community to do matrix multiplication
with an infix operator, the demand by a substantial community
to produce the Hadamard product with an infix operator, and
the huge convenience to both communities of having access to a
single library that handles the overlapping scientific
computation needs of the two communities.

In my opinion, most of the objections that have been posted
i. completely miscast the motivation for the PEP, and
ii. treat rather dismissively the needs of the large community
using Python for applied science. The emphasis of advocates
on pragmatics rather than principles has perhaps fostered this.

In particular, I would like to see the disappearance of objections
that attempt to tell users they don't "really" need to use infix
operators as they do. That one can use functions rather than
operators is both obviously true and almost completely irrelevant.
It would be more interesting if objectors would carefully explain
how their own Python programming needs will be compromised by the
addition of this operator.

Alan Isaac

njsmith added some commits Mar 18, 2014

@njsmith njsmith New version just submitted to PEP editors.
Main changes:
- @@ is gone
- A few small factual inaccuracies have been fixed
- I added a discussion of Guido's ".M *" idea to the "Rejected
  alternatives" section
- Added an (incomplete) "Implementation details" section, based on
  Nick's comments.
0e933e6
@njsmith njsmith fix PEP headers to placate PEP-0000 builder, mention Julia in notatio…
…n section, and remove stray ^^ operator left over from the @@ purge
1d884dd

cjw43 commented Mar 18, 2014

I would like to see a more complete examination of np.matrix before
considering new operators. The PEP dismisses the matrix class without
sufficient justification.

There are perhaps warts with np.matrix; can these be identified and
removed with less pain than introducing two new operators?

Colin W.

On 15-Mar-2014 3:33 PM, Robert Bradshaw wrote:

As has been pointed out, the issue with np.matrix vs np.ndarray is
largely an illustration of
http://en.wikipedia.org/wiki/Composition_over_inheritance . It is rare
for the two different kinds of multiplication to be needed in the same
formula. Also, this doesn't solve A + r where r is a scalar--if A is a
matrix then this is A + Ir where I is the identity matrix. Subtraction
too. Do we need @+ and @-?

Similarly, when one writes A == 1 in a matrix context, we want to know
if it's equal to the identity matrix (not the singular all-one matrix)
and wants a truth value (not an array of booleans that doesn't have a
boolean operator, which, by the way, is another operator that could
behave differently: the all-zero matrix is False). We're up to quite a
few operators that have both matrix and array versions. Hmm... how to
assign different meanings to the same set of operators on an object
oriented language?

Do you have evidence that the pain of np.matrix is not due to the fact
that it (sometimes, inconsistently) pretends to be an np.ndarray? If
not, this PEP is not needed, and adding new (seldom used) operators is
not to be done lightly.


Reply to this email directly or view it on GitHub
#4351 (comment).

Member

rkern commented Mar 18, 2014

@cjw43 There have been several attempts over the years. The PEP acknowledges that history without attempting to recapitulate it, proposal for proposal. The PEP is not the beginning of the examination of the problems with a two-type solution, but the culmination of many years of experience with them (yes, plural). The past threads on the subject can be found in the mailing list archive.

By all means, please make concrete proposals if you have them. In the absence of concrete proposals, I think we have a reasonable case not to hold out hope that a good one will show up after all of these years.

Contributor

rlamy commented Mar 19, 2014

@alan-isaac

In what sense? How many Python users have any real awareness of the bitwise operators (which have additionally been completely and usefully hijacked for other uses).

Every user of a library that hijacks these operators needs to be aware of both the hijacked and original meanings. Anybody who is used to a language where the power operator is written '^' gets painfully acquainted with operator.xor at some point. I bet that thousands of man-hours have been wasted answering the question "Why does 2^2 return 0 instead of 4?"
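The '^' confusion rlamy describes is easy to reproduce at a prompt:

```python
# In Python, ^ is bitwise XOR, not exponentiation -- hence the recurring
# "Why does 2^2 return 0?" question.
assert 2 ^ 2 == 0       # 0b10 XOR 0b10 -> 0b00
assert 2 ** 2 == 4      # exponentiation is spelled **

import operator
assert operator.xor(2, 2) == 0   # the function behind ^
print("ok")
```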

Any new operator increases the total complexity of the language, takes valuable developer time to implement, uses up some memory in the interpreter, etc. There are costs. They are small but they affect everyone in the Python ecosystem. Given that the benefits will only be felt by a small subset of the Python community, they had better be significant.

On Mar 19, 2014 5:13 AM, "Ronan Lamy" notifications@github.com wrote:

What specific confusions are likely with this proposal?

Yes, in a general sense that could be an issue with new operators, but
we are talking about a specific proposal here, so if you have such
criticisms you need to explain how they are relevant to the current
proposal.

Any new operator increases the total complexity of the language, takes
valuable developer time to implement, uses up some memory in the
interpreter, etc. There are costs. They are small but they affect everyone
in the Python ecosystem. Given that the benefits will only be felt by a
small subset of the Python community, they had better be significant.

First, I don't think the scientific community is a small subset, I think it
is a pretty large subset. Even when they don't directly use linear algebra
in their code, a huge improvement in an operation that underlies a lot of
other common data analysis operations will have a positive impact. The
discussions covered this in a lot of detail. This operator will greatly
simplify and improve key parts of the scientific python software stack,
which benefits all scientific python users.

Second, you need to demonstrate that these changes will be anything other
than negligible. For the memory example, are we talking megabytes,
kilobytes, bytes? The question isn't just whether there are effects, the
question is whether the effects are actually large enough to be noticeable
in the real world. If someone really cared about every last byte they
wouldn't be using python to begin with.

Third, in terms of developer time, you can't just look at the time needed
to implement the operator, you need to compare that to the developer time
needed to maintain the status quo. Again, the mailing list discussions
looked at this, and there is a lot of time wasted dealing with the current
situation. And it is ongoing, as long as the status quo is maintained,
further development in the scientific software stack will have to work
around the current limitations.

Member

rkern commented Mar 19, 2014

@rlamy I think that it's worth noting that no one on python-ideas, except perhaps yourself, has objected to the PEP on these grounds or asked us to do more soul searching about whether we really need the operator. These are people most of whom will never use matrix multiplication. It's nice to be concerned for them, but they can speak for themselves and have spoken in favor of the PEP.

Member

rkern commented Mar 19, 2014

@toddrjen Implementing @ and @= will require two new pointers added to the PyNumberMethods struct, thus increasing the size of many type objects by 8 or 16 bytes (e.g. all sequences that implement + for concatenation). This does not affect instances, only types, but with things like namedtuple, dynamically creating types is getting more common.
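At the C level those two slots became nb_matrix_multiply and nb_inplace_matrix_multiply in Python 3.5; at the Python level they back the __matmul__/__rmatmul__/__imatmul__ protocol. A minimal sketch (Diag is a hypothetical toy type used only for illustration):

```python
# @ dispatches to __matmul__ (and @= to __imatmul__) on Python 3.5+.
class Diag:
    """Hypothetical diagonal-matrix toy, for illustration only."""
    def __init__(self, entries):
        self.entries = list(entries)
    def __matmul__(self, other):
        # The product of two diagonal matrices multiplies the diagonals.
        return Diag(a * b for a, b in zip(self.entries, other.entries))

d = Diag([1, 2]) @ Diag([3, 4])
print(d.entries)  # -> [3, 8]
```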

Contributor

argriffing commented Mar 25, 2014

@ should definitely have right precedence.

@pchanial they are suggesting left precedence and asking for compelling arguments against it on the mailing list
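The "same-left" choice eventually adopted (see the next comment) means @ binds exactly like * and groups left to right; a quick sketch, using a hypothetical Tracer class that only records how an expression nests:

```python
class Tracer:
    """Hypothetical helper that records evaluation order, nothing more."""
    def __init__(self, name):
        self.name = name
    def __mul__(self, other):
        return Tracer("({}*{})".format(self.name, other.name))
    def __matmul__(self, other):
        return Tracer("({}@{})".format(self.name, other.name))

a, b, c = Tracer("a"), Tracer("b"), Tracer("c")
print((a @ b @ c).name)   # -> ((a@b)@c)   left-associative
print((a * b @ c).name)   # -> ((a*b)@c)   same precedence as *
```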

Owner

njsmith commented Apr 6, 2014

I've just added text for "same-left" associativity/precedence:
http://mail.scipy.org/pipermail/numpy-discussion/2014-April/069834.html
and sent the updated version of the PEP in to the official repository.

@larsmans larsmans commented on the diff Apr 11, 2014

doc/neps/return-of-revenge-of-matmul-pep.rst
+possible to switch between the conventions, because numpy provides two
+different types with different ``__mul__`` methods. For
+``numpy.ndarray`` objects, ``*`` performs elementwise multiplication,
+and matrix multiplication must use a function call (``numpy.dot``).
+For ``numpy.matrix`` objects, ``*`` performs matrix multiplication,
+and elementwise multiplication requires function syntax. Writing code
+using ``numpy.ndarray`` works fine. Writing code using
+``numpy.matrix`` also works fine. But trouble begins as soon as we
+try to integrate these two pieces of code together. Code that expects
+an ``ndarray`` and gets a ``matrix``, or vice-versa, may crash or
+return incorrect results. Keeping track of which functions expect
+which types as inputs, and return which types as outputs, and then
+converting back and forth all the time, is incredibly cumbersome and
+impossible to get right at any scale. Functions that defensively try
+to handle both types as input and DTRT, find themselves floundering
+into a swamp of ``isinstance`` and ``if`` statements.
@larsmans

larsmans Apr 11, 2014

Contributor

Actually handling both types as input is not so hard if you do it like scikit-learn, where matrices may come in, but never come out. The trouble starts when you also produce np.matrix when you get it as input.

@stefanv

stefanv Apr 11, 2014

Contributor

The point is still that libraries have to be aware of this double-class
convention, otherwise only the broken sub-class solution works (or
doesn't).

@rkern

rkern Apr 11, 2014

Member

Dealing with matrix in this fashion is not really an additional overhead to allowing lists or other sequences as inputs: you just always use np.asarray() on the inputs. If everyone did this, though, matrix would be useless since it would disappear whenever you passed one to a function. scikit-learn accepts this; it doesn't care about supporting matrix users except to transparently consume their data. It also doesn't have many functions that one would really use in expressions, so it's not much loss for matrix users. But there are a lot of other functions that do need to output matrix objects to be useful to matrix users, and this is difficult so few people do it right. That's why the status quo sucks. You can choose between having an operator for matrix multiplication by using the matrix type or having the full range of the community's code available to you. You can't have both. And just to be clear, there is nothing about this situation that depends on the implementation details of numpy.matrix. Subclass or not, all of these issues remain as long as you have two types that you want to interoperate.
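The "np.asarray() on the inputs" convention rkern describes can be sketched like this (sum_of_squares is a hypothetical example function, not anything from a real library):

```python
import numpy as np

def sum_of_squares(x):
    x = np.asarray(x)            # lists, ndarrays (and np.matrix) all
                                 # become a plain ndarray here
    return float((x * x).sum())  # * is now guaranteed elementwise

print(sum_of_squares([3, 4]))  # -> 25.0
```

Note that this is exactly the behavior that makes matrix "disappear": np.asarray strips the subclass, so a matrix going in never comes back out.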

@larsmans

larsmans Apr 11, 2014

Contributor

@rkern I agree with you. What I was really trying to point out is that DTRT is ambiguous in this context, because to a new user there may seem to be two "obvious" ways of "doing it right".

@rkern

rkern Apr 11, 2014

Member

Point.

@njsmith

njsmith Apr 11, 2014

Owner

Ah, I see. That sentence was meant as an alternative to the previous one,
i.e., the first point is that keeping track of which functions expect what
is impossible, so the obvious alternative is to write functions in such a
way that you don't need to keep track, but that is also impossible...


Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@cjw43

cjw43 Apr 11, 2014


Stefan,

Thanks for this clarification. You say "Code that expects an ``ndarray`` and gets a ``matrix``, or vice-versa, may crash or return incorrect results."

I would look at this as an error in the existing code. Would it not be simpler to correct this than construct code for a new operator? Has this been explored? It's been a long time since I looked at the code.

It would help if some example of the need for "converting back and forth" could be given.

Colin W.
@njsmith

njsmith Apr 11, 2014

Owner


No, it wouldn't be simpler. Fixing every piece of code to handle both types
as inputs and outputs means that every function must have special checks,
which creates a combinatorial explosion in code complexity and is a serious
obstacle to maintainability, testing, etc.

This is why if you look at actual code there are few (possibly zero)
libraries that actually handle mixed ndarray and matrix objects correctly
(even numpy itself doesn't in many cases) -- it's just not doable. Everyone
prefers to use ndarray objects with the cumbersome dot() function call
syntax for matrix multiplication. It's annoying, but it's still better than
checking that every single code path can handle arrays and matrices (and
combinations thereof etc). (In fact, there are some places where
scikit-learn is forced to handle matrix objects because scipy.sparse only
provides a matrix-style API, and even in these cases they don't actually
use '*' for matrix multiplication -- which is the point that started this
whole thread. Instead they have a special safe_sparse_dot() function which
provides a dot()-style API that works for both arrays and matrices.)

@stefanv

stefanv Apr 11, 2014

Contributor

@cjw43 Here's a library's code snippet for squaring the elements of an input array:

def square_it(A):
    return A * A

That code is now broken, even though the library author has never even heard of a Matrix class. The new, correct code, becomes:

def square_it(A):
    A = np.asarray(A)
    return A * A

That's not a problem as long as all consumers out there are aware of this possibility, but you are forcing a convention on programmers you don't control, and one which is auxiliary to their intention. Contrast that with the following library code:

def square_it(A):
    return A * A

We know what this does. It squares the elements of the array.

def square_it(A):
    return A @ A

We know what this does: it performs matrix multiplication of the two arrays.

samwyse commented Apr 21, 2014

This is probably a day late and a dollar short (as my Mom used to say) but has any serious consideration ever been given to defining dot (".") as an inner product operator? That's the way that APL does it, but every time it's brought up, it is dismissed out of hand. Yes, it would give an additional, different, meaning to ".", but "%" is used for very different purposes without too much confusion. Using the string "+.*" in Python code is a syntax error, so we don't have to worry about breaking pre-existing code. I envision A+.*B as syntactic sugar for

inner_product(operator.add, operator.mul)(A, B)

where inner_product is defined (for simple sequence objects) as equivalent to this:

from functools import reduce  # on Python 3, reduce lives in functools

class inner_product(object):
    def __init__(self, f, g):
        self.f, self.g = f, g
    def __call__(self, a, b):
        return reduce(self.f, (self.g(*pair) for pair in zip(a, b)))

The compiler would be allowed to instantiate +.* only once per sequence type, no matter how many times it appeared in the code; however, non-standard sequence types (such as numpy's ndarrays) would be able to provide alternative implementations which would only be used by those types. (I envision a dict attached to the PySequenceMethods struct that would cache, for example, inner_product(operator.add, operator.mul) under the key "+.*".)
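To make the proposed A+.*B semantics concrete, here is a self-contained Python 3 version of Sam's sketch with a quick check (the `dot` name is only illustrative):

```python
import operator
from functools import reduce

class inner_product(object):
    def __init__(self, f, g):
        self.f, self.g = f, g
    def __call__(self, a, b):
        # fold f over the pairwise g-products, exactly as A+.*B would
        return reduce(self.f, (self.g(*pair) for pair in zip(a, b)))

dot = inner_product(operator.add, operator.mul)
print(dot([1, 2, 3], [4, 5, 6]))  # -> 32  (1*4 + 2*5 + 3*6)
```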

Member

seberg commented Apr 21, 2014

@samwyse, I am not sure how this could be an option. . already is the __getattr__ operator. There would be no way I can see to distinguish between A @ B and A . B (or as it would normally be written: A.B), neither for the user nor for the language.

Owner

njsmith commented Apr 21, 2014

Hi Sam,

I think the challenge you'd face if you tried proposing this to
python-ideas/-dev is to justify what all the extra complexity gets you.
This proposal would require a lot of extra machinery that Python doesn't
otherwise use (there are no other "meta-operators" like this), and it's not
clear that the extra complexity provides extra value -- like, it allows a
lot of other weird operators like *.+ and /./ and so forth, but would
anyone actually use these? If you look at the discussion of the PEP on the
Python lists, the core devs spoke explicitly about how they felt it hit the
"sweet spot" of being just the minimal bit of added functionality needed to
provide 99% of the value.

-n


samwyse commented Apr 21, 2014

@seberg: I'm proposing "." as a meta-operator, which would not be used by itself. That particular character is currently used to access attributes, and also to separate the integral and fractional parts of floating point numbers. This would be a third use.

samwyse commented Apr 21, 2014

@njsmith: Way back when I learned APL, I recall lots of ways that f.g could be used besides matrix multiplication. For example, '<.&' would tell you if every number in a list was smaller than the corresponding number in another list. That said, my use of '&' in my example (as a logical and) has different semantics from Python's definition ('&' is bitwise and). Really, you'd want to use '<.and', which I suspect would really only be parsable because of 'and' being a keyword; and yes, Python already allows 'all(x < y for x, y in zip(a, b))' which means the same thing (albeit not as concisely). Iverson provides several examples, but they tend to use APL's 'max' and 'min' operators, which in Python aren't even keywords, and so would look to the compiler a lot like attribute access; I suspect that a proposal to implement GCC's minimum and maximum operators ('<?' and '>?') would be met with cat-calls.

Owner

njsmith commented Apr 21, 2014

I think a relevant data point here is that back in the day, Guido was
reluctant to even add the power operator ** -- it didn't show up until
Python 1.4. Since then, the only new punctuation operators have been //
(which was necessary to resolve the int/float mess), and now @. So yeah, I
don't think a proposal for a full APL-like system, min/max operators, etc.,
is likely to go very far :-).


cjw43 commented Apr 22, 2014

I see the relative merits of a Matrix Class and the proposed operator as not having been adequately explored. The destruction of the Class was not made clear in the PEP.
Colin W.
Member

rkern commented Apr 22, 2014

@cjw43 There is no "destruction of the Class" to be discussed in the PEP. If you are talking about the possible deprecation of the matrix class, it will only come quite a few years from now, if at all, and is irrelevant to the PEP as it is entirely internal to numpy.

Please, please, please use the Github interface for commenting or else trim your email replies to just your content. But as the PEP has already been accepted, continuing to press this particular point does not appear to be a good use of anyone's time. The decisionmakers in the core Python dev team consider the amount of discussion in the PEP as it stands perfectly adequate.

Owner

charris commented Jul 6, 2014

@njsmith Ready to merge?

Owner

njsmith commented Jul 6, 2014

Oh right, this PR.

Sure, I guess that makes sense to save the history, though I suppose the
version in the PEPs repository is the canonical one now. They should be
identical though.

@charris charris added a commit that referenced this pull request Jul 6, 2014

@charris charris Merge pull request #4351 from njsmith/matmul-pep
A new PEP for infix matrix multiplication
1c71d46

@charris charris merged commit 1c71d46 into numpy:master Jul 6, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
Owner

charris commented Jul 6, 2014

History saved! Thanks for that.

@charris charris added a commit to charris/numpy that referenced this pull request Jul 7, 2014

@charris charris Merge pull request #4351 from njsmith/matmul-pep
A new PEP for infix matrix multiplication
7d047ad

jiahao commented Nov 24, 2015

Super late to the party, but I'd like to cross-reference JuliaLang/julia#4774 here, where it looks like very similar discussions have taken place in the context of how to define vector transposition.

owendavey commented Jan 30, 2016

Re the use of a potential @@ operator: in a lot of mathematical contexts we actually consider exp(A) for some matrix A, where exp(A) = I + A + (A@A)/2! + (A@A@A)/3! + ..., continuing like the Taylor series. Other useful contexts for the @@ operator exist, but it is definitely specifically for math people and may be noise to most non-scientific programmers.
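The exp(A) example reads nicely with the new @ operator even without @@; a hedged sketch via a truncated Taylor series (illustrative only, scipy.linalg.expm is the robust tool for real work):

```python
import numpy as np

def expm_taylor(A, n_terms=30):
    """Truncated series exp(A) = I + A + A@A/2! + A@A@A/3! + ..."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k       # term is now A^k / k!
        result = result + term
    return result

A = np.diag([1.0, 0.0])
print(expm_taylor(A))  # -> approximately diag(e, 1)
```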

dimpase commented Jan 30, 2016

On 30 January 2016 at 17:12, owendavey notifications@github.com wrote:

Re the use of a potential @@ operator. In a lot of mathematical contexts,
we actually consider exp(A) for some matrix A. exp(A)=I + A A@! + A@!

  • A@! + .... continuing like the Taylor Series. Other useful contexts
    for the @@ operator exist, but it is definitely specifically for math
    people and may be noise to most non-scientific programmers.

In computer graphics, one multiplies matrices a lot. Considering that this
is certainly going beyond "science",
I don't quite understand what you meant to say.

@dimpase - I was not referring to matrix multiplication. I was referring to matrix exponentiation -- hence the double-@ operator, which above seems to have been considered not important enough to be added alongside the single @ operator for matrix multiplication in this PEP.

Member

rkern commented Jan 30, 2016

The discussion here has ended. The PEP has been accepted and implemented. If you would like to propose or discuss the merits of further changes, please find an appropriate mailing list. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment