Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Theano computation output #1905

Merged
merged 25 commits into from Mar 27, 2013
Merged

Theano computation output #1905

merged 25 commits into from Mar 27, 2013

Conversation

mrocklin
Copy link
Member

This transforms SymPy expressions into Theano computations. Theano can then be used to generate relatively high performance code. Perhaps this is an alternative to lambdify? Theano can use numpy or CUDA under the hood.

In [1]: expr = x**y + 1
In [2]: from sympy.printing.theanocode import theano_code
In [3]: output = theano_code(expr)
In [4]: inputs = map(theano_code, (x, y))

In [5]: from theano import function
In [6]: f = function(inputs, output)
In [7]: f(2, 3)
Out[7]: 9.0

@mrocklin
Copy link
Member Author

What is the best way to shield this code in case Theano isn't available on the system? For example, I fully expect travis to choke on this PR.

@mrocklin
Copy link
Member Author

This work derives from a project I did with @nouiz
github.com/nouiz/theano_sympy

@asmeurer
Copy link
Member

Checkout how the tests in external are done.

@asmeurer
Copy link
Member

So, um, now that you know how the printers work...

@mrocklin
Copy link
Member Author

So, um, now that you know how the printers work...

Ha! Maybe sometime, dot-printing isn't anywhere near the top of my stack though. In general though I still don't think that the printer system handles the dot situation well though. The output format isn't strictly hierarchical like all of our other printing systems. I don't think that they're a good fit.

@mrocklin
Copy link
Member Author

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (6d6c6ca) into master (33a926c).
✳️ Python 2.7.3-final-0: pass

@jrioux
Copy link
Member

jrioux commented Mar 18, 2013

SymPy Bot Summary: 🔴 Failed after merging mrocklin/theano-print (6d6c6ca) into master (b498c0f).
@mrocklin: Please fix the test failures.
🔴 PyPy 2.0.0-beta-1; 2.7.3-final-42: fail
🔴 Python 2.7.2-final-0: fail
🔴 Python 3.2.1-final-0: fail
✳️ Sphinx 1.1.3: pass
Docs build command: make clean && make html-errors && make latex && cd _build/latex && xelatex sympy-*.tex

@jrioux
Copy link
Member

jrioux commented Mar 19, 2013

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (7ddd3f9) into master (592024a).
✳️ PyPy 2.0.0-beta-1; 2.7.3-final-42: pass
✳️ Python 2.7.2-final-0: pass
✳️ Python 3.2.1-final-0: pass
✳️ Sphinx 1.1.3: pass
Docs build command: make clean && make html-errors && make latex && cd _build/latex && xelatex sympy-*.tex

@Krastanov
Copy link
Member

+1. I would be extremely happy to see a working and cleanly implemented lambdify. For the moment the original one is neither, while the experimental one might be working but is certainly not cleanly implemented.

However this PR will fail on stuff like Integral(sqrt(x**2+x+1),x) or bra*ket from quantum, right? Or even on LambertW...

The way this was "solved" in experimental_lambdify was to wrap the expression in evalf which returns uselessly high precision done in mpmath/sympy but at least returns. Is there a way for this to be done in theano (i.e. feeding it a blackbox function containing the evalf)?

@mrocklin
Copy link
Member Author

We could create a SymPyOp in Theano that calls wraps up SymPy calls. They might also have a generic Python function wrapper; I'm not sure.

I wouldn't mind failing on these cases though. This encourages users to simplify their expressions as much as possible in SymPy before moving down to a computational layer.

E.g. "You're using Bras and Kets in numeric code? Maybe you should think about how those can be represented with vectors...."

A numeric approach to quantum mechanics needs to think about this at some point; I don't mind forcing the issue sooner rather than later during a profiling stage. I'm afraid that wrapping up SymPy code will leave people wondering "why isn't this working faster?"

I remember a conversation with you when you espoused the "garbage in, garbage out" philosophy. I think I usually take the opposite approach, requiring clean inputs.

@Krastanov
Copy link
Member

I don't mind forcing the issue sooner rather than later

I do agree about that. I like "garbage in, garbage out" in cases when it simplifies the live of the developer and forces the user to understand the workflow (not always a selling point). I like the aforementioned "forcing of the issue" for similar reasons.

However, as we already started talking about that, I would like to mention a tangential issue. For use in the prototyping stage (e.g. heavy use of the plotting module) one needs a:

  1. relatively fast evaluation
  2. of all existing SymPy objects

Issue 2. is trivial if we use subs().evalf() in a loop over a python list of points. Issue 1. is relatively simple with translation to numpy/python.math/theano. Having both together results in compromises like experimental_lambdify. I would actually prefer to remove both lambdify and experimental_lambdify, and leave a very simplistic function that translates to numpy, without all the lambdify workarounds just like what is done here. For the plotting module we can use subs().evalf(). The obvious issue is that is about 10-100 times slower. Hopefully the world will switch overnight to a magical pypy-based interpreter and we will make this switch. :)

Anyway, this is not an issue in this PR. Again, I am +1 for merging.

@mrocklin
Copy link
Member Author

Hrm, automated and robust use by things like the plotting module is a good point. I hadn't though of non-human use. Maybe nouiz's suggestion about raising a warning when using slow SymPy operations is a good idea.

In the particular case of plotting, do we want SymPy to depend on Theano? It's substantially rarer than numpy and I suspect that the compilation time wouldn't be sensible for rapid prototyping.

@Krastanov
Copy link
Member

In the particular case of plotting, do we want SymPy to depend on Theano? It's substantially rarer than numpy and I suspect that the compilation time wouldn't be sensible for rapid prototyping.

It would be amusing to have the option to play with, but from practical/architectural/philosophical point of view I am against it. My reasons are slow compilation and a lot of added complexity. As I said, I actually prefer to see even lambdify removed and experimental_lambdify simplified.

@Krastanov
Copy link
Member

By the way, some time ago you employed lambdify for the Monte Carlo sampling in the stats module, right? It raises the same issues as the plotting module:

  1. lambdify - ugly and not always working
  2. experimental_lambdify - ugly but usually working
  3. subs.evalf - slow but simple and always working

@mrocklin
Copy link
Member Author

Ever tried sympy.utilities.autowrap.ufuncify? Probably not as robust as you need but it gives me good results and seems simple (although I haven't dived too deeply.)

@Krastanov
Copy link
Member

I have forgotten about autowrap. Wow, this needs a wiki or a docs page...

@mrocklin
Copy link
Member Author

Yes, I think my approach was

try:
   return lambdify(...)
except:
    return subs(...)

@raoulb
Copy link

raoulb commented Mar 19, 2013

  1. lambdify - ugly and not always working
  2. experimental_lambdify - ugly but usually working
  3. subs.evalf - slow but simple and always working

I'd not stress the "always" too much. For example it stops
working once we hit a special function not supported by mpmath.

I'd have to search one but I remember that it happened.

@asmeurer
Copy link
Member

Wow, this needs a wiki or a docs page...

Yes, the code generation/translation from SymPy to numerical has grown into several modules. This definitely needs to be all documented nicely, especially since this is such a big use-case for SymPy.

I'd not stress the "always" too much. For example it stops
working once we hit a special function not supported by mpmath.

Maybe not a case of exactly that, but issue 3706 is an example of the printer breaking because evalf breaks.

But you probably know more then me if this case will happen frequently enough to justify to implement it.

The example that @Krastanov gave of an Integral is in general very slow (on the order of half a minute to compute a single value). I think we would be better off shipping off to a compiled numerical integrator in that case. @mrocklin's argument that the user should try to simplify things isn't really helpful here, though. Yes, obviously the user should call integrate() instead of Integrate, and if he gets a closed-form solution, it will be orders of magnitude faster to evaluate than the integral itself. But, powerful as they are, our integrators are limited, and so in real life you are always going to come across an integral that SymPy can't do, sometimes because the algorithms aren't good enough, and sometimes just because there is no nice closed-form answer, even in terms of special functions. Furthermore, if you do get an answer, there's a good chance that it will be in terms of special functions. In the most general case with the meijerg algorithm, you will get a Meijer G-function, again, sometimes because hyperexpand isn't good enough yet, but also sometimes because there really isn't a better special function to represent it. Google seems to suggest that the only Python numerical implementation of this is in mpmath (note that unlike Integral, numerical evaluation of meijerg will be fast). There are also tons of other special functions which are likely hard to come by, lerchphi, hyper, polylog, and so on.

@mrocklin
Copy link
Member Author

But they should be aware that their call to integrate produced an unevaluated result. They should go and look for something. Maybe they convert it into a Sum and send it to Theano.sum or maybe we implement a QuadratureOp in Theano. Seems like a reasonable place for it.

Mainly the opinion that I'm voicing is that when we run into a not-very-well-handled case the user should be alerted. I think in the long run this will provide valuable feedback to us, the developers, so that we can implement quality solutions.

When working with sympy.stats I think I ran across the SciPy numerical quadrature routines. They call down to some battle-hardened Fortran code. Actually I think it'd be a pretty cool project to use SymPy to select good numerical quadrature routines. Many times when we can't compute integrals we can still compute derivatives which can be very useful for efficient adaptivity. I'll bet we could produce some pretty awesome results here.

@mrocklin
Copy link
Member Author

Wait, couldn't we describe numeric quadrature well with Sum? Given derivatives couldn't we describe an efficient numeric quadrature with known error bounds? This is an important numeric algorithm that we can derive and represent well entirely within SymPy.

@Krastanov
Copy link
Member

Here is a start of the wiki page https://github.com/sympy/sympy/wiki/Philosophy-of-Numerics-and-Code-Generation-in-SymPy

I'd not stress the "always" too much. For example it stops working once we hit a special function not supported by mpmath.

That is what I meant. lambdify fails on most input that is not arithmetics or elementary functions. experimental_lambdify defers to evalf in these cases and should not fail.

@Krastanov
Copy link
Member

Wait, couldn't we describe numeric quadrature well with Sum?

Yes (with some caveats), but this will still fail if the expression contains special functions not supported by SciPy/Theano/Fortran/etc.

My stance is that if the user (and we) should not expect magic (i.e. fast evaluation of all possible expressions). We have different tools for "all possible expression" and "fast evaluation". We will work to bridge the gap but given that we already cover the extremes I am rather happy.

@mrocklin
Copy link
Member Author

The quadrature bit is somewhat unrelated. I think it would make a cool project (and possibly small paper?) on its own. Computing integrals is harder than computing derivatives. Derivatives are useful for computing integrals, but just require bookkeeping. We're good at that; we can exploit this.

My stance is that if the user (and we) should not expect magic (i.e. fast evaluation of all possible expressions).

I'm not ready to give up on this as a goal yet. I think it can be rephrased in a more constructive and achievable manner. I think we need to create simple routes from specialized representations to general ones (e.g. Integral -> Sum via Quadrature). Ideally we have a few such routes and can select between them. Perhaps when people make new structures we ask them to provide routes to old structures (e.g. stats expressions -> Integral expressions).

We rarely stress this. Perhaps we should.

@raoulb
Copy link

raoulb commented Mar 19, 2013

That is what I meant. lambdify fails on most input that is not
arithmetics or elementary functions. experimental_lambdify defers
to evalf in these cases and should not fail.

Let me show an own use case: I use lambdify for getting potential definitions
of the form "V(x1,...,xn) = ..." ready for numerical evaluation (with numpy)
inside a physics simulation. This way my code can find out the derivatives.
It is important that it is really fast while allowing "generic" definitions.
But these definitions rarely contain anything more exotic than elementary
functions, maybe an "erf" or similar but nothing more fancy like unevaluated
integrals, sums, hyper or meijer function etc and let's not forget about limits ...

Summarized this is the "fast evaluation of a (small) subset of expressions"
case and deferring things to "evalf" is not an option.

@raoulb
Copy link

raoulb commented Mar 19, 2013

That is what I meant. lambdify fails on most input that is not
arithmetics or elementary functions. experimental_lambdify defers
to evalf in these cases and should not fail.

Maybe this discussion reflects in some parts why Mathematica has
"N[Integrate[..., x]]" and "NIntegrate[..., x]" and the same for
DSolve and a few others.

@Krastanov
Copy link
Member

@raoulb, I was unaware of these two different workflows in Mathematica. Could you mention them on the wiki when you have the time? https://github.com/sympy/sympy/wiki/Philosophy-of-Numerics-and-Code-Generation-in-SymPy

@jrioux
Copy link
Member

jrioux commented Mar 20, 2013

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (3539973) into master (5125961).
✳️ PyPy 2.0.0-beta-1; 2.7.3-final-42: pass
✳️ Python 2.7.2-final-0: pass
✳️ Python 3.2.1-final-0: pass
✳️ Sphinx 1.1.3: pass
Docs build command: make clean && make html-errors && make latex && cd _build/latex && xelatex sympy-*.tex

@asmeurer
Copy link
Member

As I recall Maple also splits things like this, at least for differential equations.

@asmeurer
Copy link
Member

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (3539973) into master (37a148e).
✳️ Python 2.5.6-final-0: pass
✳️ Python 2.6.8-final-0: pass
✳️ Python 2.7.3-final-0: pass
✳️ Python 3.2.3-final-0: pass
✳️ Sphinx 1.1.3: pass

@Krastanov
Copy link
Member

Is there anything blocking this. It look good for merging.

@mrocklin
Copy link
Member Author

I'm out of contact for a few days. please wait until I can take another look
On Mar 23, 2013 12:16 AM, "Stefan Krastanov" notifications@github.com
wrote:

Is there anything blocking this. It look good for merging.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1905#issuecomment-15316145
.

@mrocklin
Copy link
Member Author

This isn't perfect but it's in a usable state. +1

@smichr
Copy link
Member

smichr commented Mar 27, 2013

OK, on the basis of Krastanov and Rocklin, I'll commit this.

smichr added a commit that referenced this pull request Mar 27, 2013
@smichr smichr merged commit c255786 into sympy:master Mar 27, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants