Theano computation output #1905

mrocklin · 2013-03-17T17:50:27Z

This transforms SymPy expressions into Theano computations. Theano can then be used to generate relatively high performance code. Perhaps this is an alternative to lambdify? Theano can use numpy or CUDA under the hood.

In [1]: expr = x**y + 1
In [2]: from sympy.printing.theanocode import theano_code
In [3]: output = theano_code(expr)
In [4]: inputs = map(theano_code, (x, y))

In [5]: from theano import function
In [6]: f = function(inputs, output)
In [7]: f(2, 3)
Out[7]: 9.0

https://github.com/nouiz/theano_sympy

Recreated Theano variables with the same inputs are not equal. Recreated SymPy variables with the same inputs are. We cache the transformation to preserve SymPy behavior

mrocklin · 2013-03-17T17:51:27Z

What is the best way to shield this code in case Theano isn't available on the system? For example, I fully expect travis to choke on this PR.

mrocklin · 2013-03-17T17:59:30Z

This work derives from a project I did with @nouiz
github.com/nouiz/theano_sympy

asmeurer · 2013-03-17T18:06:17Z

Checkout how the tests in external are done.

asmeurer · 2013-03-17T18:13:31Z

So, um, now that you know how the printers work...

mrocklin · 2013-03-17T18:25:52Z

So, um, now that you know how the printers work...

Ha! Maybe sometime, dot-printing isn't anywhere near the top of my stack though. In general though I still don't think that the printer system handles the dot situation well though. The output format isn't strictly hierarchical like all of our other printing systems. I don't think that they're a good fit.

mrocklin · 2013-03-17T19:29:04Z

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (6d6c6ca) into master (33a926c).
✳️ Python 2.7.3-final-0: pass

jrioux · 2013-03-18T04:19:52Z

SymPy Bot Summary: 🔴 Failed after merging mrocklin/theano-print (6d6c6ca) into master (b498c0f).
@mrocklin: Please fix the test failures.
🔴 PyPy 2.0.0-beta-1; 2.7.3-final-42: fail
🔴 Python 2.7.2-final-0: fail
🔴 Python 3.2.1-final-0: fail
✳️ Sphinx 1.1.3: pass
Docs build command: make clean && make html-errors && make latex && cd _build/latex && xelatex sympy-*.tex

jrioux · 2013-03-19T13:01:38Z

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (7ddd3f9) into master (592024a).
✳️ PyPy 2.0.0-beta-1; 2.7.3-final-42: pass
✳️ Python 2.7.2-final-0: pass
✳️ Python 3.2.1-final-0: pass
✳️ Sphinx 1.1.3: pass
Docs build command: make clean && make html-errors && make latex && cd _build/latex && xelatex sympy-*.tex

Krastanov · 2013-03-19T13:24:05Z

+1. I would be extremely happy to see a working and cleanly implemented lambdify. For the moment the original one is neither, while the experimental one might be working but is certainly not cleanly implemented.

However this PR will fail on stuff like Integral(sqrt(x**2+x+1),x) or bra*ket from quantum, right? Or even on LambertW...

The way this was "solved" in experimental_lambdify was to wrap the expression in evalf which returns uselessly high precision done in mpmath/sympy but at least returns. Is there a way for this to be done in theano (i.e. feeding it a blackbox function containing the evalf)?

mrocklin · 2013-03-19T15:11:15Z

We could create a SymPyOp in Theano that calls wraps up SymPy calls. They might also have a generic Python function wrapper; I'm not sure.

I wouldn't mind failing on these cases though. This encourages users to simplify their expressions as much as possible in SymPy before moving down to a computational layer.

E.g. "You're using Bras and Kets in numeric code? Maybe you should think about how those can be represented with vectors...."

A numeric approach to quantum mechanics needs to think about this at some point; I don't mind forcing the issue sooner rather than later during a profiling stage. I'm afraid that wrapping up SymPy code will leave people wondering "why isn't this working faster?"

I remember a conversation with you when you espoused the "garbage in, garbage out" philosophy. I think I usually take the opposite approach, requiring clean inputs.

Krastanov · 2013-03-19T16:05:42Z

I don't mind forcing the issue sooner rather than later

I do agree about that. I like "garbage in, garbage out" in cases when it simplifies the live of the developer and forces the user to understand the workflow (not always a selling point). I like the aforementioned "forcing of the issue" for similar reasons.

However, as we already started talking about that, I would like to mention a tangential issue. For use in the prototyping stage (e.g. heavy use of the plotting module) one needs a:

relatively fast evaluation
of all existing SymPy objects

Issue 2. is trivial if we use subs().evalf() in a loop over a python list of points. Issue 1. is relatively simple with translation to numpy/python.math/theano. Having both together results in compromises like experimental_lambdify. I would actually prefer to remove both lambdify and experimental_lambdify, and leave a very simplistic function that translates to numpy, without all the lambdify workarounds just like what is done here. For the plotting module we can use subs().evalf(). The obvious issue is that is about 10-100 times slower. Hopefully the world will switch overnight to a magical pypy-based interpreter and we will make this switch. :)

Anyway, this is not an issue in this PR. Again, I am +1 for merging.

mrocklin · 2013-03-19T16:11:31Z

Hrm, automated and robust use by things like the plotting module is a good point. I hadn't though of non-human use. Maybe nouiz's suggestion about raising a warning when using slow SymPy operations is a good idea.

In the particular case of plotting, do we want SymPy to depend on Theano? It's substantially rarer than numpy and I suspect that the compilation time wouldn't be sensible for rapid prototyping.

Krastanov · 2013-03-19T16:20:21Z

In the particular case of plotting, do we want SymPy to depend on Theano? It's substantially rarer than numpy and I suspect that the compilation time wouldn't be sensible for rapid prototyping.

It would be amusing to have the option to play with, but from practical/architectural/philosophical point of view I am against it. My reasons are slow compilation and a lot of added complexity. As I said, I actually prefer to see even lambdify removed and experimental_lambdify simplified.

Krastanov · 2013-03-19T16:22:29Z

By the way, some time ago you employed lambdify for the Monte Carlo sampling in the stats module, right? It raises the same issues as the plotting module:

lambdify - ugly and not always working
experimental_lambdify - ugly but usually working
subs.evalf - slow but simple and always working

mrocklin · 2013-03-19T16:22:34Z

Ever tried sympy.utilities.autowrap.ufuncify? Probably not as robust as you need but it gives me good results and seems simple (although I haven't dived too deeply.)

Krastanov · 2013-03-19T16:23:46Z

I have forgotten about autowrap. Wow, this needs a wiki or a docs page...

mrocklin · 2013-03-19T16:24:03Z

Yes, I think my approach was

try:
   return lambdify(...)
except:
    return subs(...)

raoulb · 2013-03-19T16:29:46Z

lambdify - ugly and not always working

experimental_lambdify - ugly but usually working

subs.evalf - slow but simple and always working

I'd not stress the "always" too much. For example it stops
working once we hit a special function not supported by mpmath.

I'd have to search one but I remember that it happened.

asmeurer · 2013-03-19T17:07:13Z

Wow, this needs a wiki or a docs page...

Yes, the code generation/translation from SymPy to numerical has grown into several modules. This definitely needs to be all documented nicely, especially since this is such a big use-case for SymPy.

I'd not stress the "always" too much. For example it stops
working once we hit a special function not supported by mpmath.

Maybe not a case of exactly that, but issue 3706 is an example of the printer breaking because evalf breaks.

But you probably know more then me if this case will happen frequently enough to justify to implement it.

The example that @Krastanov gave of an Integral is in general very slow (on the order of half a minute to compute a single value). I think we would be better off shipping off to a compiled numerical integrator in that case. @mrocklin's argument that the user should try to simplify things isn't really helpful here, though. Yes, obviously the user should call integrate() instead of Integrate, and if he gets a closed-form solution, it will be orders of magnitude faster to evaluate than the integral itself. But, powerful as they are, our integrators are limited, and so in real life you are always going to come across an integral that SymPy can't do, sometimes because the algorithms aren't good enough, and sometimes just because there is no nice closed-form answer, even in terms of special functions. Furthermore, if you do get an answer, there's a good chance that it will be in terms of special functions. In the most general case with the meijerg algorithm, you will get a Meijer G-function, again, sometimes because hyperexpand isn't good enough yet, but also sometimes because there really isn't a better special function to represent it. Google seems to suggest that the only Python numerical implementation of this is in mpmath (note that unlike Integral, numerical evaluation of meijerg will be fast). There are also tons of other special functions which are likely hard to come by, lerchphi, hyper, polylog, and so on.

mrocklin · 2013-03-19T17:18:07Z

But they should be aware that their call to integrate produced an unevaluated result. They should go and look for something. Maybe they convert it into a Sum and send it to Theano.sum or maybe we implement a QuadratureOp in Theano. Seems like a reasonable place for it.

Mainly the opinion that I'm voicing is that when we run into a not-very-well-handled case the user should be alerted. I think in the long run this will provide valuable feedback to us, the developers, so that we can implement quality solutions.

When working with sympy.stats I think I ran across the SciPy numerical quadrature routines. They call down to some battle-hardened Fortran code. Actually I think it'd be a pretty cool project to use SymPy to select good numerical quadrature routines. Many times when we can't compute integrals we can still compute derivatives which can be very useful for efficient adaptivity. I'll bet we could produce some pretty awesome results here.

mrocklin · 2013-03-19T17:22:41Z

Wait, couldn't we describe numeric quadrature well with Sum? Given derivatives couldn't we describe an efficient numeric quadrature with known error bounds? This is an important numeric algorithm that we can derive and represent well entirely within SymPy.

Krastanov · 2013-03-19T18:27:28Z

Here is a start of the wiki page https://github.com/sympy/sympy/wiki/Philosophy-of-Numerics-and-Code-Generation-in-SymPy

I'd not stress the "always" too much. For example it stops working once we hit a special function not supported by mpmath.

That is what I meant. lambdify fails on most input that is not arithmetics or elementary functions. experimental_lambdify defers to evalf in these cases and should not fail.

Krastanov · 2013-03-19T18:32:17Z

Wait, couldn't we describe numeric quadrature well with Sum?

Yes (with some caveats), but this will still fail if the expression contains special functions not supported by SciPy/Theano/Fortran/etc.

My stance is that if the user (and we) should not expect magic (i.e. fast evaluation of all possible expressions). We have different tools for "all possible expression" and "fast evaluation". We will work to bridge the gap but given that we already cover the extremes I am rather happy.

mrocklin · 2013-03-19T18:49:10Z

The quadrature bit is somewhat unrelated. I think it would make a cool project (and possibly small paper?) on its own. Computing integrals is harder than computing derivatives. Derivatives are useful for computing integrals, but just require bookkeeping. We're good at that; we can exploit this.

My stance is that if the user (and we) should not expect magic (i.e. fast evaluation of all possible expressions).

I'm not ready to give up on this as a goal yet. I think it can be rephrased in a more constructive and achievable manner. I think we need to create simple routes from specialized representations to general ones (e.g. Integral -> Sum via Quadrature). Ideally we have a few such routes and can select between them. Perhaps when people make new structures we ask them to provide routes to old structures (e.g. stats expressions -> Integral expressions).

We rarely stress this. Perhaps we should.

raoulb · 2013-03-19T19:47:02Z

That is what I meant. lambdify fails on most input that is not
arithmetics or elementary functions. experimental_lambdify defers
to evalf in these cases and should not fail.

Let me show an own use case: I use lambdify for getting potential definitions
of the form "V(x1,...,xn) = ..." ready for numerical evaluation (with numpy)
inside a physics simulation. This way my code can find out the derivatives.
It is important that it is really fast while allowing "generic" definitions.
But these definitions rarely contain anything more exotic than elementary
functions, maybe an "erf" or similar but nothing more fancy like unevaluated
integrals, sums, hyper or meijer function etc and let's not forget about limits ...

Summarized this is the "fast evaluation of a (small) subset of expressions"
case and deferring things to "evalf" is not an option.

raoulb · 2013-03-19T19:48:43Z

That is what I meant. lambdify fails on most input that is not
arithmetics or elementary functions. experimental_lambdify defers
to evalf in these cases and should not fail.

Maybe this discussion reflects in some parts why Mathematica has
"N[Integrate[..., x]]" and "NIntegrate[..., x]" and the same for
DSolve and a few others.

Krastanov · 2013-03-19T20:14:19Z

@raoulb, I was unaware of these two different workflows in Mathematica. Could you mention them on the wiki when you have the time? https://github.com/sympy/sympy/wiki/Philosophy-of-Numerics-and-Code-Generation-in-SymPy

jrioux · 2013-03-20T02:21:47Z

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (3539973) into master (5125961).
✳️ PyPy 2.0.0-beta-1; 2.7.3-final-42: pass
✳️ Python 2.7.2-final-0: pass
✳️ Python 3.2.1-final-0: pass
✳️ Sphinx 1.1.3: pass
Docs build command: make clean && make html-errors && make latex && cd _build/latex && xelatex sympy-*.tex

asmeurer · 2013-03-20T03:55:41Z

As I recall Maple also splits things like this, at least for differential equations.

asmeurer · 2013-03-20T10:36:10Z

SymPy Bot Summary: ✳️ Passed after merging mrocklin/theano-print (3539973) into master (37a148e).
✳️ Python 2.5.6-final-0: pass
✳️ Python 2.6.8-final-0: pass
✳️ Python 2.7.3-final-0: pass
✳️ Python 3.2.3-final-0: pass
✳️ Sphinx 1.1.3: pass

Krastanov · 2013-03-22T19:16:17Z

Is there anything blocking this. It look good for merging.

mrocklin · 2013-03-23T16:33:16Z

I'm out of contact for a few days. please wait until I can take another look
On Mar 23, 2013 12:16 AM, "Stefan Krastanov" notifications@github.com
wrote:

Is there anything blocking this. It look good for merging.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/1905#issuecomment-15316145
.

mrocklin · 2013-03-26T14:36:36Z

This isn't perfect but it's in a usable state. +1

smichr · 2013-03-27T16:29:57Z

OK, on the basis of Krastanov and Rocklin, I'll commit this.

Theano computation output

mrocklin added 6 commits March 17, 2013 09:41

Print theano symbols

bb21dcf

print exprs and numbers - add mapping from

dd67dc1

https://github.com/nouiz/theano_sympy

dtype testing

19ca416

theano printing for matrix symbol and matmul

fb4e96f

cache functions that create theano variables

5ba5fcc

Recreated Theano variables with the same inputs are not equal. Recreated SymPy variables with the same inputs are. We cache the transformation to preserve SymPy behavior

add MatAdd and HadamardProduct matrix operations

3b5d8d2

mrocklin added 5 commits March 17, 2013 11:36

Safely import theano in test_theanocode

3d4c8d7

s -> ts in theanocode

a82c1f7

Theano: _print_MatrixExpr = _print_Expr

e7aed2e

_print_Expr -> _print_Basic

fe34e1a

add matrix inverse and trace

6d6c6ca

mrocklin added 9 commits March 18, 2013 09:25

tensor/scalar theano function

34a325c

support Rationals in theano printing

114c116

dim_handling supports alternative keys

0def7f6

theano prints integers as python ints

3d2d4c8

sympy.factorial -> theano.gamma

09ef839

theano printing: Transpose

1accbf4

theanocode.py guards theano import

f2ee4cd

dict comprehensions -> dict((k,v) for ...)

0f135c4

move errant theano code inside safezone

7ddd3f9

mrocklin added 2 commits March 19, 2013 08:51

test theano_function

35da710

add Derivative/gradient

605f515

remove excess newline

3539973

smichr added a commit that referenced this pull request Mar 27, 2013

Merge pull request #1905 from mrocklin/theano-print

c255786

Theano computation output

smichr merged commit c255786 into sympy:master Mar 27, 2013

mrocklin mentioned this pull request Mar 29, 2013

Current status? nouiz/theano_sympy#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Theano computation output #1905

Theano computation output #1905

mrocklin commented Mar 17, 2013

mrocklin commented Mar 17, 2013

mrocklin commented Mar 17, 2013

asmeurer commented Mar 17, 2013

asmeurer commented Mar 17, 2013

mrocklin commented Mar 17, 2013

mrocklin commented Mar 17, 2013

jrioux commented Mar 18, 2013

jrioux commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

raoulb commented Mar 19, 2013

asmeurer commented Mar 19, 2013

mrocklin commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

raoulb commented Mar 19, 2013

raoulb commented Mar 19, 2013

Krastanov commented Mar 19, 2013

jrioux commented Mar 20, 2013

asmeurer commented Mar 20, 2013

asmeurer commented Mar 20, 2013

Krastanov commented Mar 22, 2013

mrocklin commented Mar 23, 2013

mrocklin commented Mar 26, 2013

smichr commented Mar 27, 2013

Theano computation output #1905

Theano computation output #1905

Conversation

mrocklin commented Mar 17, 2013

mrocklin commented Mar 17, 2013

mrocklin commented Mar 17, 2013

asmeurer commented Mar 17, 2013

asmeurer commented Mar 17, 2013

mrocklin commented Mar 17, 2013

mrocklin commented Mar 17, 2013

jrioux commented Mar 18, 2013

jrioux commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

raoulb commented Mar 19, 2013

asmeurer commented Mar 19, 2013

mrocklin commented Mar 19, 2013

mrocklin commented Mar 19, 2013

Krastanov commented Mar 19, 2013

Krastanov commented Mar 19, 2013

mrocklin commented Mar 19, 2013

raoulb commented Mar 19, 2013

raoulb commented Mar 19, 2013

Krastanov commented Mar 19, 2013

jrioux commented Mar 20, 2013

asmeurer commented Mar 20, 2013

asmeurer commented Mar 20, 2013

Krastanov commented Mar 22, 2013

mrocklin commented Mar 23, 2013

mrocklin commented Mar 26, 2013

smichr commented Mar 27, 2013