New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default value of simplify flag of rref() should be True #10120

Open
MooVI opened this Issue Nov 5, 2015 · 15 comments

Comments

Projects
None yet
5 participants
@MooVI
Contributor

MooVI commented Nov 5, 2015

The default value of the simplify flag of rref() is False, which is confusing because it means some symbolic calculations do not work by default. This was recently highlighted in #10056 , which was caused by linsolve not setting the flag correctly. I notice in matrices.py both rank and QRDecomposition call rref without simplify by default, which also risks causing problems.

In this stackoverflow question about this issue @asmeurer says:

The option is off by default because in general such simplification can be expensive, through this can be controlled by passing in a less general simplify function than simplify (for rational functions, use cancel). The defaults here could probably be smarter.

Indeed, you would think it should be an option which you could turn off for speed if you knew it was safe, rather than the other way around, as users (like the stackoverflow user) might not know about the flag.

@aktech

@moorepants

This comment has been minimized.

Member

moorepants commented Nov 5, 2015

-1. Simplification is too slow of an operation to have the default as True. In general, no SymPy computations should rely on simplification. I agree with @asmeurer's comment.

@aktech

This comment has been minimized.

Member

aktech commented Nov 5, 2015

I think there could probably be some pre-processing in rref so that it knows, when to simplify the things.

@MooVI

This comment has been minimized.

Contributor

MooVI commented Nov 5, 2015

@moorepants Certainly simplification is slow, but slow is better than silently being wrong. I agree with @aktech 's suggestion if it is possible. At the very least on encountering the symbolic entries in the matrix rref could spit out a warning that it may well give an incorrect result.

@moorepants

This comment has been minimized.

Member

moorepants commented Nov 5, 2015

If you get wrong answers without simplification then there is likely another issue that needs to be dealt with.

Note that we solve linear systems in the sympy mechanics package that have huge expressions. If you want some test cases that show how detrimental the simplification routines can be, we can provide them.

@MooVI

This comment has been minimized.

Contributor

MooVI commented Nov 5, 2015

In the part of his answer @asmeurer I didn't quote, he explains why simplification is fundamentally necessary for rref() with symbolic entries:

The rref algorithm fundamentally requires the ability to tell if the elements of the matrix are identically zero. In SymPy, the simplify=True option instructs SymPy to simplify the entries first at the relevant stage of the algorithm. With symbolic entries, this is necessary, as you can easily have symbolic expressions that are identically zero but which don't simplify to such automatically, like x_(x - 1) - x_*2 + x.

I don't doubt this slows the algorithm down, and obviously the flag should remain an option so you can go fast if you want for large expressions which you know are safe. With the current default, most users won't initially be aware of the importance of this flag, and assume that rref() is giving them the correct result when it isn't. If instead the default is True, then once they find it is running slow they will look in the docs and see that they might be able to set it to false.

Of course, whether functions like linsolve which use rref() should expose this flag or not is another matter. I think they should, because then you could use linsolve for your large systems, but my understanding is that @aktech thinks this will complicate the input api.

@moorepants

This comment has been minimized.

Member

moorepants commented Nov 5, 2015

I see, yes the rref zero detection is an issue. I wonder if evaluating the expression with numbers could be faster than trying to simplify the expression. For an arbitrary expression in a matrix entry, simplification is likely the slowest possible way to determine if it is zero. This is an issue with all symbolic manipulators and others must have some solutions for this. Have you researched how this is done in other systems? Simplification is an obvious choice to find out if an expression is zero but I'd much rather see if there is a fast way to do this.

@asmeurer

This comment has been minimized.

Member

asmeurer commented Nov 5, 2015

Zero equivalence is an impossible problem in general (see Richardson's Theorem). For restricted classes of functions it is computable (e.g., for rational functions). I my opinion the only completely right way to solve this is to have domains for matrices, e.g., a matrix over QQ(x) would only allow rational functions in x with rational number coefficients as its entries. This domain would know how to solve its own zero equivalence, and the most efficient way to do it (with the right representation, zero equivalence becomes trivial).

That's a larger project, however, I do think we could be smarter here. Maybe use something a little smarter, but still not the full simplify. Also, fast checks using equals could be used to rule out expressions that definitely aren't zero.

@MooVI

This comment has been minimized.

Contributor

MooVI commented Nov 5, 2015

Possibly of relevance here is the Mathematica notes on implementation for symbolic linear algebra.

RowReduce, LinearSolve, NullSpace, and MatrixRank are based on Gaussian elimination. ... Zero testing for various functions is done using symbolic transformations and interval-based numerical approximations after random numerical values have been substituted for variables.

But how in practice you'd go about implementing without falling into the obvious pitfalls with substituting in random values, I don't know

@moorepants

This comment has been minimized.

Member

moorepants commented Dec 17, 2015

@MooVI If you want to submit this change, then let's do it. But we need to test it for speed, so a new benchmark needs to be added to:

https://github.com/sympy/sympy_benchmarks

This benchmark should test a Matrix with fairly substantial expressions.

@MooVI

This comment has been minimized.

Contributor

MooVI commented Dec 17, 2015

@moorepants OK, I'll add a submit changing the value of the flag shortly when I have a moment. Having been using it myself I can probably supply such matrices, and I guarantee you it will be very slow (so slow I had to switch to Mathematica in the end), but at least it won't silently give the wrong answers.

I profiled, and this slowness is of course entirely due to the slowness of simplify, for my case of cancel in particular. I played around with implementing something like how Mathematica does it with random evaluation, which I think is ultimately the answer for practical scientific problems, but I'm afraid it will require someone with better knowledge of numerical computing to get it to work at all stably.

@moorepants

This comment has been minimized.

Member

moorepants commented Dec 17, 2015

Ok, sounds good. I guess the main debate on PR with this functionality will be the default setting of simplify. My vote is to make it default to off so that problems that don't need simplification to zero will still be fast and to have a big bold warning telling you to try simplify True if you suspect wrong answers. But I don't really know. This is tricky. We should open an issue about adding in a more efficient zero check.

@moorepants

This comment has been minimized.

Member

moorepants commented Dec 17, 2015

This issue #9121 is also relevant.

@siefkenj

This comment has been minimized.

Contributor

siefkenj commented Sep 11, 2016

PR #11554 should mostly resolve this. The code is also set up to be able to raise warnings if non-zeroness of a pivot had to be assumed, though warnings aren't currently raised.

@MooVI MooVI referenced this issue Nov 2, 2016

Closed

Incorrect rref #10718

@moorepants

This comment has been minimized.

Member

moorepants commented Dec 28, 2016

@siefkenj You say "mostly" above. Should this issue be closed?

@siefkenj

This comment has been minimized.

Contributor

siefkenj commented Dec 28, 2016

As far as the simplify=True, the issue is resolved. What isn't resolved is giving a warning if the result might not be accurate. Perhaps that should be a separate bug, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment