matrix multiplication expression blowup control #18049
Conversation
✅ Hi, I am the SymPy bot (v149). I'm here to help you write a release notes entry. Please read the guide on how to write release notes. Your release notes are in good order. Here is what the release notes will look like:
This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.6. Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it. Click here to see the pull request description that was parsed.
Update: The release notes on the wiki have been updated.
Normally a matrix … For this reason two separate functions … Trivial examples (since this PR is really meant for large complicated matrices):
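As a rough sketch of the effect (plain SymPy only; this does not use the new keyword from this PR, and the helper name is made up):

```python
from sympy import Matrix, symbols, expand, count_ops

x = symbols('x')
A = Matrix([[x + 1, x - 1], [x - 1, x + 1]])

def mul_simplified(M, N):
    # Intermediate simplification: expand every entry of the product as
    # it is formed, so like terms cancel instead of accumulating.
    return (M * N).applyfunc(expand)

# Plain repeated multiplication leaves entries as nested, unexpanded
# sums of products; simplifying between multiplies keeps them small.
P_plain = A * A * A * A
A2 = mul_simplified(A, A)
P_simp = mul_simplified(A2, A2)
```

The two results are mathematically equal, but the per-entry operation count of the unsimplified product grows with every multiply, which is the blowup this PR targets.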
For an example of a larger matrix which does not currently terminate, see …
Some benchmarks using sympy_benchmarks. Four different matrices exponentiated in three ways: "master", which is just the current master branch, …
Matrices used: …
Codecov Report
@@ Coverage Diff @@
## master #18049 +/- ##
=============================================
+ Coverage 74.855% 74.946% +0.091%
=============================================
Files 640 642 +2
Lines 166551 166930 +379
Branches 39176 39275 +99
=============================================
+ Hits 124672 125109 +437
+ Misses 36361 36299 -62
- Partials 5518 5522 +4
sympy/matrices/common.py (outdated)
return self.mul(other)
def mul(self, other, mulsimp=None):
What about using the existing Matrix.multiply
in https://docs.sympy.org/latest/modules/matrices/matrices.html#sympy.matrices.matrices.MatrixBase.multiply?
The MatrixBase.mul() function essentially took the body of MatrixBase.__mul__() but allowed the extra mulsimp parameter, which is why I left it in common.py. I don't like the idea of transposing code from common into another module (like moving this code into Matrix.multiply), and I also didn't want to add a mulsimp keyword to the __mul__() function since that would be non-standard, so I split the function.
As for the regressions, there are two extra calls inserted into the chain for multiplication, __mul__() -> mul() and then a call into dotprodsimp(), where originally the operation was carried out directly in the individual _eval_matrix_mul() functions. I did this on the (correct, in my view) suggestion of @oscarbenjamin that I centralize the simplifying inner loop across all matrix type multiplies.
Another source could be the use of iterables instead of sequences in dotprodsimp() to make it more general. I will look at speeding this up in the centralized function, but another orthogonal option is to bring back the individual _eval_matrix_mul() function bodies for the non-simplified case and call out to dotprodsimp() only if simplification is requested. Will experiment...
Maybe it would be better to have a separate function for the mulsimp multiply, and mul can just dispatch to one or the other function, leaving the existing _eval_matrix_mul routines unmodified.
The slowdown is due entirely to calling out to the dotprodsimp() function. Even if I modify the function to do the simplest, fastest dot product, it is still slow, due I guess to the overhead of calling the function and passing iterators. So yes, what you say is the route I am taking now: only calling out to dotprodsimp() if needed. Otherwise the _eval_matrix_mul functions will essentially do what they did before.
I actually meant moving MatrixBase.multiply to common.py. It should already have been moved to common.py. If we add another mul, we may end up with a duplicate function.
Yeah, weird: multiply_elementwise is in MatrixArithmetic and multiply is in MatrixBase, so they probably should both be in common. As for naming, multiply as a function only appears in matrices.py and mathml.py and does not seem to actually be called from anywhere in SymPy, whereas mul seems to be the common shorthand across many modules.
So how about I move multiply into common.py alongside multiply_elementwise and leave mul, since it is so common? We do wind up with a duplicate function, but it doesn't actually harm anything.
No, the duplicates will be regretted and may have to be deprecated in the future. Even if you don't like the naming, it's better to respect the one that already came before. Otherwise, you should get agreement on deprecating Matrix.multiply and introducing some shorthand.
Ok, will go with MatrixArithmetic.multiply and nuke mul and MatrixBase.multiply.
There are some benchmark regressions …
And if the default flag …
Good catch on the slowdown. The mistake I made was assuming that if I did not apply the simplification step the performance would be equal to the original master, and I tested accordingly. It's surprising that it was simply a call to a separate dotprod function that caused it, so I restored the summation to the inner loops of … Also I moved the core …
Revised benchmarks: the standard "master" performance improves a good bit, but the "master + simplify" stays huge since the hit there is 99.99% from the …
I think that the current approach will complicate the code a lot, like … We should introduce a similar context manager like …
That would be easy enough to do, but it touches on something I had a pretty long back and forth about in #17247 with @oscarbenjamin: the idea of setting a global operating mode for matrix multiplication, simplified or not. TL;DR: he doesn't like the idea of globals, or even thread-local storage, for setting the operating mode, which is what a context manager would do in its operation, even if only in its own isolated scope. The other caveat is that on each context switch the cache would need to be flushed, since a function with the same signature could return a different value. Is there a way to flush only certain functions from the cache, not the whole thing? I could do this, even leaving the optional …
On the other hand, if all you want to do is avoid the ugliness of …
There are many bug reports for …

    with evaluate(False):
        # The author of some_function had no idea that you would do this
        x = some_function(a * b)

Maybe in the end some kind of context manager might be a useful way to go, but for now I think it is better to get the different options implemented so that they can be used explicitly in places where blowup is known to happen. For the particular case of multiplication that mostly happens when multiplying in a loop (i.e. matrix powers). Some other cases like the Jordan-power method are improved, but those only use a couple of multiplies, so post-simplification is typically viable. There are other places like rref, det etc. that also suffer from blowup but don't use multiplication (those would most likely be the problem for Jordan-pow and matrix exponential). If we can get implementations of those things working then we can discuss what the default behaviour of things like …
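The hazard in that snippet is that global evaluation state leaks into code that was never written for it. A minimal sketch using SymPy's evaluate context manager (from sympy.core.parameters in recent versions; the helper name here is invented):

```python
from sympy import Symbol
from sympy.core.parameters import evaluate

x = Symbol('x')

def naive_double(expr):
    # Library code written without global flags in mind: it assumes
    # expr + expr will collect to 2*expr as usual.
    return expr + expr

with evaluate(False):
    y = naive_double(x)  # built while automatic evaluation is globally off

# y is the unevaluated Add(x, x), not 2*x: the caller's context
# silently changed the helper's behaviour.
```

The same leakage would apply to any global or thread-local "simplified multiply" mode, which is the objection being discussed here.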
If you are worried about complicating code, do you mean code internal to sympy or code written by end users? I think that introducing global state will ultimately complicate the internals of sympy much more than having explicit method calls.
Can you give me some concrete examples of expressions which suffer from this?
The elimination algorithms in …
Depending how those multiplications are done it might be feasible to do a similar simplification, though that would be for another PR; this one deals with multiplication. So I would say: if it is good enough and nothing else is to be added (like the context manager), then it is good enough?
The example in #16823 is for rref. It's not hard to make others: you just need a Matrix of …
The time spent in charpoly there is in post-simplification. If you call it as …
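One easy trigger for this kind of blowup, illustratively (a fully symbolic matrix; this may or may not match the stripped example above):

```python
from sympy import Matrix, symbols, count_ops

# A fully symbolic matrix: nothing cancels on its own, so determinant-like
# computations (charpoly, det, rref) grow combinatorially with the size n.
M = Matrix(3, 3, list(symbols('a:9')))

p = M.charpoly()  # PurePoly in a dummy lambda
coeff_size = count_ops(p.as_expr())
```

Even at n = 3 the coefficients are sums over permutations of the entries; simplifying them only after the fact is what makes the post-simplification cost dominate.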
Ok, just a preliminary check...
Made two small changes to … to: …
And … to: …
These make charpoly() return nearly instantly with a clean poly.
… to: …
Which returns a cleaner rref, since expanding the sum of two scaled matrices forces the like terms to cancel. But again, this is just a quick prelim of one case; it seems doable.
I think that the fundamental source of expression blowup in the examples we have considered is the fact that sympy doesn't expand …
There are probably many places that can be used to control blowup, but that doesn't all need doing in one PR. The only thing to consider here is what are the most useful functions to define for use elsewhere. A matrix-multiply is clearly useful. The dotprodsimp function seems less likely to be useful in all situations (e.g. you haven't used it in rref there because it no longer makes sense in that context, even though essentially the same simplification is needed).
Btw I've just noticed that there is a function …
Yes.
Also yes, separate PR please.
Umm, I wouldn't expect dotprodsimp to be useful in an operation which is not actually a dot product, such as rref, which is a simple sum of scaled matrices. That is not exactly the case you delineate at the top; it is more like a*(x+y) + b*(s+t). EDIT: Which kind of is a dot product I guess, but it doesn't need the full treatment, or maybe it will if I look at other cases.
Sure. My question is how do you want to proceed...
I guess there are two parts to dotprodsimp used in matrix multiply: …
Is it possible (or desirable) to separate those? I think that the inner … The post-simplification could be done as a separate step at the whole matrix level and is presumably beneficial in the case of computing a matrix power by repeated multiply. In the context of the matrix power operation as a whole it isn't really a post-simplification, because it happens in between each inner multiply. What I wonder is if the post-simplification part of …
I'm not sure about the API or general applicability of … For the future the things that I think can (possibly) be improved are: …
I think that the answers to some of these questions might become clearer when looking at more cases of blowup in other operations. I'd like to know whether @sylee957 thinks this is mergeable as well though.
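To make the two parts concrete, here is one way they could be separated (a sketch only; these helper names are not the PR's API):

```python
from sympy import Matrix, symbols, expand

x = symbols('x')
A = Matrix([[x + 1, x - 1], [x - 1, x + 1]])

def dot_simp(terms):
    # "Inner" part: simplify the sum of products forming one entry
    # at the moment it is built.
    return expand(sum(terms))

def mul_inner_simp(M, N):
    # Multiply with per-dot-product simplification.
    return Matrix(M.rows, N.cols, lambda i, j:
                  dot_simp(M[i, k] * N[k, j] for k in range(M.cols)))

def mul_post_simp(M, N):
    # Alternative: plain multiply, then one whole-matrix post-pass.
    return (M * N).applyfunc(expand)
```

For a single product the two give the same result; the difference only matters over repeated multiplies (matrix powers), where the inner version never lets intermediate entries grow.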
I moved the …
Maybe, though I've left it as a single function for now, until the use-case for separating appears when I look at other functions. It is not really tuned exclusively to matrices but to the kind of blowup that occurs in any generic sum of products.
The quick prelim I did seems to point to …
For now …
Might be slower to try to detect than the operation itself. The difference in execution time for …
In my opinion, it would be better to have the keyword symmetrical, like … So I would expect …
Also, I just remembered I read an article a couple of weeks ago that says a new method has been discovered to derive eigenvectors from just eigenvalues; here is the article: https://www.quantamagazine.org/neutrinos-lead-to-unexpected-discovery-in-basic-math-20191113/ Ignore the neutrino stuff; it's the eigen stuff which is potentially revolutionary. Has either of you heard of this, or do you have access to smart-people publications which may have or may mention this?
Okay, I won't say anything more about the keyword naming.
Ok, …
The actual paper is https://arxiv.org/pdf/1908.03795.pdf, where you can find the formula. I have made the quick demonstration below.
But there can be some limitations like …
But if this can be implemented, it can be considered as some other named function …
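For reference, the eigenvector-eigenvalue identity from that paper can be checked in SymPy on a small real symmetric matrix (my own sketch, using the matrix that appears elsewhere in this thread):

```python
from sympy import Matrix, prod, simplify

A = Matrix([[1, 1, -1], [1, 3, 1], [-1, 1, 3]])  # real symmetric

# Normalized eigenpairs; the eigenvalues of A are 0, 3, 4 (all distinct).
pairs = [(lam, v / v.norm())
         for lam, mult, vecs in A.eigenvects() for v in vecs]
eigs = [lam for lam, _ in pairs]
n = A.rows

# Identity: |v_i[j]|^2 * prod(lam_i - lam_k, k != i)
#   == prod(lam_i - mu) over eigenvalues mu of the j-th principal minor.
for i, (lam_i, v) in enumerate(pairs):
    for j in range(n):
        Mj = A.copy()        # j-th principal minor: delete row j, col j
        Mj.col_del(j)
        Mj.row_del(j)
        lhs = v[j]**2 * prod(lam_i - eigs[k] for k in range(n) if k != i)
        rhs = prod(lam_i - mu for mu in Mj.eigenvals(multiple=True))
        assert simplify(lhs - rhs) == 0
```

As noted, this only recovers the squared moduli of the eigenvector components, so the signs (or phases) still have to be determined separately.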
Ok, well it is interesting then, but of no real practical use here :(
The restriction to Hermitian matrices is not a big problem, as symmetric/Hermitian is an important use case. However, only finding the mod-square of the components of the eigenvector is more of a problem. If the eigenvectors were known to be real you would only need to figure out the sign of each component, but although Hermitian matrices have real eigenvalues the eigenvectors can still be complex. However, real symmetric matrices always have real eigenvectors, so in that case you can just figure out the sign of each component. It could be worth investigating as a method for finding eigenvectors of a real symmetric matrix.
The big problem with analytically finding eigenvectors is that in general the eigenvalues/eigenvectors may not have simple forms. You mentioned the blowup of the roots earlier, but you only have to look at the general solutions for cubics/quartics to see where that blowup comes from. Of course in some situations the matrix can be large but still have simple eigenvalues and eigenvectors. Then there is the possibility to eliminate avoidable blowup.
From @sylee957's example:
In [15]: M = Matrix([[ 1, 1, -1],[ 1, 3, 1],[-1, 1, 3]])
In [16]: M
Out[16]:
⎡1 1 -1⎤
⎢ ⎥
⎢1 3 1 ⎥
⎢ ⎥
⎣-1 1 3 ⎦
In [17]: s1, s2, s3 = symbols('s1, s2, s3', real=True)
In [18]: vect = Matrix([s1*2, s2*1, s3*1])
In [19]: M*vect
Out[19]:
⎡ 2⋅s₁ + s₂ - s₃ ⎤
⎢ ⎥
⎢2⋅s₁ + 3⋅s₂ + s₃ ⎥
⎢ ⎥
⎣-2⋅s₁ + s₂ + 3⋅s₃⎦
In [20]: solve([M*vect, s1**2-1, s2**2-1, s3**2-1], [s1, s2, s3], dict=True)
Out[20]: [{s₁: -1, s₂: 1, s₃: -1}, {s₁: 1, s₂: -1, s₃: 1}]
In [21]: srep = solve([M*vect, s1**2-1, s2**2-1, s3**2-1], [s1, s2, s3], dict=True)[0]
In [22]: vect.subs(srep)
Out[22]:
⎡-2⎤
⎢ ⎥
⎢1 ⎥
⎢ ⎥
⎣-1⎦
(Note that this involves solving a linear system similar to the one that would be solved to find the eigenvectors in the normal way though...)
Which could be done with a simple dot product, comparing the sign with the sign of the eigenvalue times the sqrt of the v(i,j), if I understand this correctly. It might allow finding eigenvects for much larger matrices than is currently possible. I might check into it if you think it is worth it for real symmetric matrices, but must finish this PR first :)
Btw, I am obviously on track to do all the blowup control in one push, not splitting it up as we had said; hope that is ok...
I take it back, you are right, it is a system. I wonder if there would be a way to hack it though, given the unique circumstances... EDIT: I mean the fact that each element of the solution can only take on one of two values instead of an infinite range. EDIT: Excluding zeros...
Often, having a lot of stuff in one PR makes it difficult to deal with conflicts.
Ok, well I just pushed what I have so far; I will leave it there and continue with the remaining methods in a different PR once this is merged. Also, do you want me to push the slow but illustrative …
Two minor changes here unrelated to …
Currently, the property …
And probably that was the reason why I didn't touch that part in my previous attempt in #15887.
What about making the base …
Oops on the docs; this should be ready to merge if all tests pass.
sympy/matrices/matrices.py (outdated)
feature='clear_subproducts',
deprecated_since_version=1.4,
issue=15887
).warn()
What about moving the deprecation warnings to is_diagonalizable, and also deleting the arbitrary kwargs here?
Done...
Thank you for contributing.
There are still some functions left to add …
References to other Issues or PRs
Fixes #17246
Brief description of what is fixed or changed
Matrices tend to blow up during certain operations. This PR introduces an optional intermediate simplification step during multiplication which can keep these blowups in check and permit certain matrix operations to complete successfully on larger matrices where they currently do not.
Other comments
This PR is a restart of a larger previous PR, #17247, since that thread got too long to quickly digest for the person reviewing this.
Release Notes