Leveraging SymPy for PyMC3 #178
Very interesting. Theano_Sympy doesn't provide examples that make it clear how one could take a calculation in sympy and then convert it to a theano calculation, so it's a little hard to evaluate. I'm not sure what he means when he says 'you have created some sort of symbolic algebra class structure (you add two pymc.Normal objects)'. Is that just x = Normal('x', 0,1) ? |
Yes, that is what I mean. |
Okay that makes sense. Right now, in pymc3, theano takes care of that stuff. The pretty printing etc does sound pretty nice. If I wanted to calculate the probability density of a Normal distribution given some value for the mean, variance and data value using the SymPy distribution but do that calculation in Theano, how would I do that? |
It looks like theano_sympy was designed to take Theano expressions to SymPy, simplify them, and then send them back again. This is not your application. The function that seems most useful in your case is the aptly named sympy_to_theano. Here is an example using sympy.stats and theano:

In [1]: from sympy.stats import *
In [2]: from sympy import Symbol, pprint, pi
In [3]: import theano
In [4]: X = Normal('x', 2, 3)
In [5]: x = Symbol('x')
In [6]: expr = density(X)(x)
In [7]: pprint(expr)
2
-(x - 2)
─────────
___ 18
╲╱ 2 ⋅ℯ
────────────────
___
6⋅╲╱ π
In [8]: from graph_translation import sympy_to_theano
In [9]: var_map = {'x': ('float64', (False, False))} # x is a rank 2 tensor of floats
In [10]: texpr = sympy_to_theano(expr, var_map) # oops, SymPy.pi doesn't have a Theano analog
KeyError: <class 'sympy.core.numbers.Pi'>
In [12]: expr = expr.subs(pi, pi.evalf()) # replace Pi with the float equivalent in SymPy
In [13]: texpr = sympy_to_theano(expr, var_map) # try this again
In [14]: theano.printing.debugprint(texpr)
Elemwise{mul,no_inplace} [@A] ''
|InplaceDimShuffle{x,x} [@B] ''
| |TensorConstant{0.0940315946937} [@C]
|InplaceDimShuffle{x,x} [@D] ''
| |Elemwise{pow,no_inplace} [@E] ''
| |TensorConstant{2} [@F]
| |TensorConstant{0} [@G]
|Elemwise{exp,no_inplace} [@H] ''
|Elemwise{mul,no_inplace} [@I] ''
|InplaceDimShuffle{x,x} [@J] ''
| |TensorConstant{-1} [@K]
|Elemwise{pow,no_inplace} [@L] ''
|Elemwise{add,no_inplace} [@M] ''
| |InplaceDimShuffle{x,x} [@N] ''
| | |TensorConstant{-2} [@O]
| |x [@P]
|InplaceDimShuffle{x,x} [@Q] ''
|TensorConstant{2} [@R] |
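The pi workaround in the session above is general. Here is a hedged sketch, reconstructing the same density expression by hand in plain SymPy (the expression is transcribed from the pretty-printed output, not pulled from the session):

```python
from sympy import Symbol, exp, sqrt, pi

x = Symbol('x')

# The Normal(2, 3) density from the session above, written out by hand.
expr = sqrt(2) * exp(-(x - 2)**2 / 18) / (6 * sqrt(pi))

# The workaround: replace symbolic constants that lack a Theano analog
# with their float equivalents before translating.
expr_numeric = expr.subs(pi, pi.evalf())
```

After the substitution the expression contains no symbolic pi, so the translation no longer hits the KeyError.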
It looks like I can just say:
I'm curious whether that interacts well with things like indexing. For example if I say
Whether this will work well. |
In your example above is In SymPy |
Notes on my example above. You would want to extend the |
The example above doesn't really use SymPy for anything other than a database of distributions. The random variable formalism allows the creation of some interesting problems. |
I meant |
For this I would use a collection of Normals. They can be manipulated as normal symbolic variables to form products, tuples, etc.

>>> from sympy import *
>>> from sympy.stats import *
>>> xs = symbols('x:10')
>>> mus = symbols('mu:10')
>>> Xs = [Normal(x, mu, 1) for x, mu in zip(xs, mus)]
>>> pprint(simplify(E(sum(X**2 for X in Xs))))
2 2 2 2 2 2 2 2 2 2
μ₀ + μ₁ + μ₂ + μ₃ + μ₄ + μ₅ + μ₆ + μ₇ + μ₈ + μ₉ + 10 |
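The trailing "+ 10" is just the ten unit variances. As a sanity check, here is a sketch using plain SymPy integration rather than sympy.stats (the symbol names are my own):

```python
from sympy import Symbol, integrate, exp, sqrt, pi, oo

z = Symbol('z', real=True)

# Standard normal density, written out by hand.
pdf = exp(-z**2 / 2) / sqrt(2 * pi)

# E[Z**2] = 1 for Z ~ N(0, 1); since X = mu + Z when X ~ N(mu, 1),
# E[X**2] = mu**2 + 1, which is where the "+ 10" above comes from.
second_moment = integrate(z**2 * pdf, (z, -oo, oo))
```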
Or if you wanted the full pdf of the joint probability space you could access it as follows:

In [1]: from sympy.stats import *
In [2]: from sympy import *
In [4]: mus = symbols('mu:10', real=True)
In [5]: Xs = [Normal('x_%d'%i, mu, 1) for i, mu in enumerate(mus)]
In [6]: pprint(pspace(Tuple(*Xs)).pdf)
2 2 2 2 2 2 2 2 2 2
-(-μ₀ + x₀) -(-μ₁ + x₁) -(-μ₂ + x₂) -(-μ₃ + x₃) -(-μ₄ + x₄) -(-μ₅ + x₅) -(-μ₆ + x₆) -(-μ₇ + x₇) -(-μ₈ + x₈) -(-μ₉ + x₉)
──────────── ──────────── ──────────── ──────────── ──────────── ──────────── ──────────── ──────────── ──────────── ────────────
2 2 2 2 2 2 2 2 2 2
ℯ ⋅ℯ ⋅ℯ ⋅ℯ ⋅ℯ ⋅ℯ ⋅ℯ ⋅ℯ ⋅ℯ ⋅ℯ
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
5
32⋅π |
Okay, this makes pretty good sense to me. It looks like SymPy could be good as a source of distribution math, but probably not a good replacement for the symbolic algebra part, since it's not geared to be numpy-like, which I think is an important feature. So I think we'd probably use SymPy by
What are other people's thoughts? |
@jsalvatier I don't see where we would need 1. Couldn't the model specification consist of sympy symbols which then get translated to theano for computation? Also, I think there is a non-trivial probability that this approach will lead to some unexpected benefits further down the line. |
What numpy-like features do you need? SymPy has a dense Matrix type and a set of purely symbolic MatrixExpr types. Each of these is strictly a rank 2 tensor. Support for higher rank tensors in SymPy exists but is poorly integrated. SymPy.stats had a multivariate branch a while back that generated matrix expressions; I could resurrect it if there was a definite need. If you're interested, I would look at this blogpost. I recommend skipping over the beginning and just scrolling down to where the code starts. |
Perhaps you're right. I think the part that makes me nervous is that sympy wasn't designed to do computation, so there can be unknown pitfalls. For example, does sympy have a notion of multidimensional arrays? Does it do broadcasting? How about indexing and advanced indexing into vectors? Those seem pretty important to me. It's also adding another interface between packages, and thus more possibility for problems. Sympy computations need to convert pretty well into theano graphs. |
Definitely you're correct that trying to represent computations with SymPy is a recipe for disappointment. Rather, SymPy is good for representing mathematical models. In my experience there is usually a clear distinction between the model and the computation. Does such a distinction exist in your application? You're definitely also correct that adding SymPy will bring a bunch of issues with it. SymPy.stats really isn't battle-tested. I'm confident that if you go this route you'll expose bugs. I'm around to support those but it adds inertia to the development cycle. |
You're right that there is such a distinction, and I think the separation is pretty clean in pymc3 too. Perhaps a better way to say it is that things like indexing and advanced indexing often come in at the model level here. For example, you will want to specify things like

group = [0, 0, 1, 1, 2, 0]

meaning that the mean of the prior distribution for y_i is the value of x_{group_i}. These things are often treated in a handwavy fashion in mathematics; for example, the notion of broadcasting is not a commonly discussed issue in math. So I would be somewhat surprised to find features like these to be well developed in sympy, or to allow for easy implementation. I would love to be surprised though. |
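A minimal pure-Python sketch of the indexing semantics being described (the names here are illustrative, not pymc3 API):

```python
# Each observation i draws its prior mean from the latent value of its group:
# the prior mean of y_i is x[group[i]].
group = [0, 0, 1, 1, 2, 0]
x = [1.5, -0.3, 0.7]             # one latent group-level value per group
y_means = [x[g] for g in group]  # broadcast group values out to observations
```

In numpy/Theano this is exactly the fancy-indexing expression x[group], evaluated in one step.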
Something along these lines does seem to exist in SymPy, but it doesn't look that well developed. |
Yeah, I would steer clear of that. Your intuition is correct that SymPy would not cleanly allow that syntax. You would have to resort to list comprehensions, map, for-loops, etc. Fancy indexing has been on our issues list for matrices for a while, though. It's certainly conceivable that we would support this in the future. If you spoke up on the mailing list saying that you needed it for your project, it's likely that someone would implement it relatively soon. We're in GSoC application mode right now, so we have a number of aspiring contributors looking to pounce on things like this. I wouldn't support adding a

In [1]: from sympy.stats import *
In [2]: groups = [0, 0, 1, 1, 2, 0]
In [3]: numgroups = len(set(groups))
In [4]: x = Matrix(1, numgroups, lambda i,j: Normal('x_%d'%j, 0, 1))
In [5]: x
Out[5]: [x₀ x₁ x₂ x₃ x₄]
In [6]: y = Normal(x[1, groups], 1) # This fails due to lack of fancy indexing
IndexError: Invalid index a[[0, 0, 1, 1, 2, 0]]

In general though, if you switch to SymPy as a modeling input language your interface will change. You'll get a huge amount of general mathematical formalism but will lose the specific features you've cooked into your own language. For some of those features you can add them to SymPy, or ask the SymPy community to add them for you. I'm sure that some language features would be difficult to support though. |
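The comprehension workaround suggested above would look something like this (a sketch; the symbol names are illustrative):

```python
from sympy import symbols

# Emulate fancy indexing with a list comprehension instead of x[1, groups],
# which SymPy's Matrix does not support.
xs = symbols('x:3')                 # (x0, x1, x2)
groups = [0, 0, 1, 1, 2, 0]
selected = [xs[g] for g in groups]  # one symbol per observation
```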
Hm, the array broadcasting is a real issue. One other idea: if we mainly just wanted to use sympy.stats for its distributions, would it be possible to borrow those and replace the sympy symbols with Theano ones (e.g. sympy.sin -> Theano.sin)? |
The |
If you just wanted the distributions you might want to skip the random variable and probability space formalism. Functions like |
With this approach we don't have to invest too much. We could even have some Theano-based and some SymPy-based distributions. |
@mrocklin I didn't find |
https://github.com/sympy/sympy/blob/master/sympy/stats/crv_types.py#L1632
If you want SymPy to do statistical simplifications for you then use Using

>>> X = Normal('x', 0, 1)
>>> Y = Normal('y', 0, 1)
>>> Z = Normal('z', 0, 1)
>>> density(X**2 + Y**2 + Z**2)
ChiSquaredDistribution(3)

But it would force you to use SymPy as your modeling language, which may not be appropriate. |
Oh, that's really neat! Although it doesn't seem to work on my install; the last call just returns a lambda with a nasty integral. Maybe a version issue (I'm running 0.7.2; this is the sympy master branch, I suppose?). Do you think this could figure out conjugate priors? For example:
Where the result is another Beta distribution with updated parameters. There is an explicit example at https://en.wikipedia.org/wiki/Conjugate_prior |
In principle, yes that should be solvable. In practice no, that is not yet solvable. I've never used sympy.stats with random parameters so this exposes some bugs which I'm now fixing. Obviously this is an important application though. In general sympy.stats transforms random expressions like what you have above into integrals. Sympy.integrate is generally surprisingly good at solving integrals. Of course, there are limits. There is nothing stopping us from generating the relevant integrals; I can't speak to what will happen after that. The statistical simplification is in a separate branch and not yet in master. |
OK, I see that it's not practical. But just to finish this line of thought: the trick with conjugate priors is that there is no integration required. Just by simplifying the numerator and denominator one can see that this is again a beta distribution, but with different parameters. Not sure if that changes anything. |
Mixing finite and continuous RVs is a bit buggy. Here is an example with strictly normal random variables.

>>> from sympy.stats import *
>>> from sympy import *
>>> mu = Normal('mu', 2, 3)
>>> x = Normal('x', mu, 1)
>>> y, z = Symbol('y'), Symbol('z')
>>> simplify(density(mu, Eq(x, y))(z)) # density of mu given x == y
2 2
9⋅y y 5⋅z 2⋅z 1
- ──── + y⋅z - ─ - ──── + ─── - ──
___ 20 5 9 9 45
╲╱ 5 ⋅ℯ
─────────────────────────────────────────
___
3⋅╲╱ π

Internally it sets up and solves the following integral:

In [17]: simplify(density(mu, Eq(x, y), evaluate=False)(z)) # density of mu given x == y
Out[17]:
oo
/
|
| 2 2
| (x - z) (z - 2)
| - -------- - --------
| 2 18
| e *DiracDelta(x - y)
| ----------------------------------------------------------- dx
| oo oo
| / /
| | |
| | | 2 2
| | | (x - z) (z - 2)
| | | - -------- - --------
| | | 2 18
| | | e *DiracDelta(x - y)
| 6*pi* | | ---------------------------------------- dx dz
| | | 6*pi
| | |
| / /
| -oo -oo
|
/
-oo

This is all in a branch. |
Is it correct to assume you meant to say "The denominator doesn't come into it"? Otherwise I'm more confused than I thought :) I suspect that the |
Heh, yep. Looks like I was confused and not you! Yes, simplifying either the log posterior (logp) or the derivative (dlogp)
|
This is obviously not my field but I've been playing with this a while. Here is a problem with a Poisson distribution whose rate parameter is distributed with a Beta distribution. In this particular case it looks like dlogp is cheaper than logp.

In [1]: from sympy import *
In [2]: from sympy.stats import *
In [3]:
In [3]: x = Symbol('x', positive=True)
In [4]: l = Symbol('lambda', positive=True)
In [5]:
In [5]: rate = Beta(l, 2, 3)
In [6]: X = Poisson(x, rate)
In [7]: numer = density(X, rate)(X.symbol) * density(rate)(rate.symbol)
In [8]: numer
Out[8]:
x 2 -λ
12⋅λ⋅λ ⋅(-λ + 1) ⋅ℯ
─────────────────────
x!
In [9]: log(numer)
Out[9]:
⎛ x 2 -λ⎞
⎜12⋅λ⋅λ ⋅(-λ + 1) ⋅ℯ ⎟
log⎜─────────────────────⎟
⎝ x! ⎠
In [10]: simplify(log(numer))
Out[10]:
⎛ 2 ⎞
⎜λ - 2⋅λ + 1⎟
-λ + x⋅log(λ) + log(λ) + log⎜────────────⎟ + log(12)
⎝ x! ⎠
In [11]: simplify(log(numer).diff(l))
Out[11]:
2
- λ + λ⋅x + 4⋅λ - x - 1
────────────────────────
λ⋅(λ - 1)

If this is interesting at all let me know. If you can provide a more likely example (that doesn't include array broadcasting) let me know. |
It looks like the fully general solution to Poisson parametrized by a Beta is actually pretty simple:

In [25]: rate = Beta(l, a, b)
In [26]: X = Poisson(x, rate)
In [27]: numer = density(X, rate)(X.symbol) * density(rate)(rate.symbol)
In [28]: simplify(log(numer).diff(l))
Out[28]:
2
a⋅λ - a + b⋅λ - λ + λ⋅x - λ - x + 1
────────────────────────────────────
λ⋅(λ - 1)
In [29]: simplify(log(numer).diff(l)).count_ops()
Out[29]: 14
In [30]: print ccode(simplify(log(numer).diff(l)))
(a*lambda - a + b*lambda - pow(lambda, 2) + lambda*x - lambda - x + 1)/(lambda*(lambda - 1)) |
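For what it's worth, the same dlogp can be reproduced without sympy.stats by writing the unnormalized joint density by hand (a sketch; the Beta normalizing constant is omitted since it drops out of the derivative):

```python
from sympy import symbols, exp, log, factorial, diff, simplify

lam, x, a, b = symbols('lambda x a b', positive=True)

# Unnormalized Beta(a, b) prior on lambda times the Poisson(lambda) pmf,
# written out by hand rather than via sympy.stats.
numer = lam**(a - 1) * (1 - lam)**(b - 1) * lam**x * exp(-lam) / factorial(x)

# d/d(lambda) of the log joint density; algebraically equivalent to the
# rational expression printed in the session above.
dlogp = simplify(diff(log(numer), lam))
```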
I suppose the question is which (implicit) solution Theano would come up with using autodiff. @jsalvatier is there any way to get the complexity of that for this case? |
I get the not very informative result of:

'Flatten{1}(Elemwise{Composite{[Composite{[Composite{[Composite{[Composite{[sub(add(i0,

I'm working on graphing this. |
Try |
Also make sure that this comes after you've run all of the optimizations. Theano does simplification too; it just tends not to be quite as algebraically focused. The syntactically easiest way to do this is to get the fgraph object after it has been made into a function:

from theano import *
x = tensor.matrix('x')
y = 3 + x**2 + x + 1
f = function([x], y)
theano.printing.debugprint(f.maker.fgraph) |
The awesomest solution would be to use SymPy to generate the algebraic result and simplify algebraically, then print to Theano and use Theano to optimize numerically (e.g. inplace operations, constant folding). They optimize in very different ways and it'd be great to have a practical example where they work together. This was the original motivation behind the |
That's actually what I was thinking too. I'm still having trouble with pydot :( |
You only need to call pydotprint and debugprint like this:
Also, to disable the fusion of elemwise ops so you can see the operations more clearly, you can use this Theano flag: optimizer_excluding=local_elemwise_fusion |
Any development on this? If you provide comparison pymc code I can handle the Theano analysis side. |
Hey, I've been super busy, so I haven't done work on this, but I do have an example.
Gives:

'Flatten{1}(Elemwise{Composite{[Composite{[Composite{[Composite{[Composite{[sub(add(i0, i1), add(i2, i3))]}(Switch(i0, i1, i2), Switch(i3, i4, i2), Switch(i0, i5, i6), Switch(i3, i7, i2))]}(i0, true_div(i1, i2), i3, i4, true_div(i5, i2), i6, i7, true_div(i8, i9))]}(i0, i1, i2, i3, i4, add(i5, i6), i7, i8, add(i5, i9), sub(i7, i2))]}(Composite{[AND(i0, GT(i1, i2))]}(i0, i1, i2), i3, i1, i2, Composite{[Composite{[Composite{[Composite{[AND(AND(i0, i1), GT(i2, i3))]}(AND(i0, i1), GT(i2, i3), i4, i3)]}(AND(i0, i1), LE(i2, i0), i3, i4, i5)]}(i0, GE(i1, i2), i1, i3, i2, i4)]}(i0, i1, i2, i4, i5), i6, i4, i7, i8, i5)]}}(TensorConstant{1}, l, TensorConstant{0}, x, a, b, TensorConstant{-1.0}, TensorConstant{1.0}, TensorConstant{0.0}))'

I can translate these into Theano if that's better. I'm thinking that this is complex enough that we probably wouldn't want to put this work into pymc3. If it's worth pursuing, I think it would be better to put it into Theano as better optimizations. That way other people can benefit anyway. |
I'm thinking again about the Theano-SymPy-Theano crossover, particularly about what's possible and at what level of generality it should be done. I think that there is value here, but negotiating the best crossover point is complex.

It's clear that, at least for this problem, a fair amount can be gained by involving SymPy. The Theano result isn't very intuitive and appears to be a more complex solution. I assume that this complexity affects runtime.

One option is to take the result you present above, translate it to SymPy, see what it can do, and then translate back. This is good because it's general, applies to all of Theano, and doesn't require any legwork from the pymc folks. It's bad though because the form presented here is fairly low-level, potentially limiting SymPy's effectiveness. In general Theano representations are mathematically slightly lower-level than SymPy's. It feels weird to travel up the abstraction stack rather than down.

For this particular project I'm curious whether I can start from the Normal, Beta, etc. layer of random variables. I wonder if it is reasonable to implement these as standard Theano Ops. That way there would be a clear high-level transition over to SymPy. It would also mean that there was a single point of truth for statistical information.
Ah, well, maybe it's just not installed properly
Any tips? |
I'm not sure why quadpotential is giving you a problem; I'll check that out. I don't think the second problems are related to the first. |
I think I've solved the second problem; can you update and try again? |
For the first problem, it's related to not having https://pypi.python.org/pypi/scikits.sparse . You could install that, or I'll fix it in an hour or two. |
|
I think I've fixed the problem with importing (it did something conditional on scikits.sparse being available). |
By the way I'm running on Ubuntu 12.04 with the Enthought Python Distribution 7.3.2 |
If I pip install numdifftools on my own I can import pymc cleanly. I rarely use setup.py. Does it handle dependencies? |
Yes it does, and it's already listed there. I'm not sure what would cause that.
(Maybe I should also list pandas since the examples that load external data require it) |
I had been using distutils to do the install, but I switched to using setuptools. Then if you tell pip to install from source, it seems to work better:
|
SymPy now has a fairly natural Theano printer which supports dimensionality.

In [1]: from sympy.stats.crv_types import ExponentialDistribution
In [2]: from sympy.printing.theanocode import theano_code, theano
In [3]: from sympy import *
In [4]: rate = Symbol('lambda', positive=True)
In [5]: x = Symbol('x', real=True)
In [6]: ExponentialDistribution(rate) # This is a SymPy object
Out[6]: ExponentialDistribution(lambda)
In [7]: ExponentialDistribution(rate)(x) # This is a SymPy expression
Out[7]:
-λ⋅x
λ⋅ℯ
In [8]: theano_code(ExponentialDistribution(rate)(x)) # This is a Theano var
Out[8]: Elemwise{mul,no_inplace}.0
In [10]: theano_code(ExponentialDistribution(rate)(x), broadcastables={x: (False,), rate: (True,)}) # This is a Theano tensor var
Out[10]: Elemwise{mul,no_inplace}.0
In [11]: theano.printing.debugprint(_)
Elemwise{mul,no_inplace} [@A] ''
|lambda [@B]
|Elemwise{exp,no_inplace} [@C] ''
|Elemwise{mul,no_inplace} [@D] ''
|InplaceDimShuffle{x} [@E] ''
| |TensorConstant{-1} [@F]
|x [@G]
|lambda [@B]

I'm not sure if this is of any use to you all (I suspect that you've moved beyond this idea). I'm not sure how this would be integrated into your system, but it does supply a clean transition and opens up the possibility of using SymPy's simplification (both stats-specific and general algebraic simplification).

In [20]: simplify(log(ExponentialDistribution(rate)(x)))
Out[20]: -λ⋅x + log(λ)
In [28]: theano.printing.debugprint(theano_code(_, broadcastables={x: (False,), rate: (True,)}))
Elemwise{add,no_inplace} [@A] ''
|Elemwise{mul,no_inplace} [@B] ''
| |InplaceDimShuffle{x} [@C] ''
| | |TensorConstant{-1} [@D]
| |x [@E]
| |lambda [@F]
|Elemwise{log,no_inplace} [@G] ''
|lambda [@F]

I recently wrote about the benefits of SymPy and Theano integration here. If we can find a motivating use case I'd be happy to support it from the SymPy and Theano ends. |
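The simplification shown above can also be checked without sympy.stats by writing the exponential density directly (a sketch; the symbol assumptions matter for the log expansion):

```python
from sympy import symbols, exp, log, expand_log, simplify

lam, x = symbols('lambda x', positive=True)

# Exponential(lambda) density, written out by hand.
pdf = lam * exp(-lam * x)

# log-density; with positive symbols this expands to log(lambda) - lambda*x,
# matching the -λ⋅x + log(λ) result above.
logp = expand_log(log(pdf))
```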
This is pretty cool Matthew. I think we won't use this right now, but it might come in handy in the near future. |
I agree. It also seems like Theano is working on the Theano -> SymPy conversion, which would be easier to use in pymc3 for e.g. better simplification (as your blog post clearly shows). It would be easy enough to tie in SymPy on a per-need basis now that one can do the SymPy -> Theano conversion if, for example, someone requires a distribution that's only present there. Maybe I can cook up an example. |
I don't know where you got the idea that we are working on this. I just made a ticket so we don't forget it. I haven't heard anyone say they will work on it, and I have other things to do first. |
My bad. Let me restate that there is a chance this will be possible in Theano in the unspecified future. |
SymPy (http://sympy.org/en/index.html) is a Python library for symbolic mathematics.
My initial motivation for looking at SymPy resulted from #172 and #173. Instead of recoding all probability distributions, samplers etc in Theano, maybe we could just use the ones provided by sympy.stats (http://docs.sympy.org/dev/modules/stats.html).
For this to work we would need to convert the sympy computation graph to a theano one. It seems that there is some work showing that this is possible (https://github.com/nouiz/theano_sympy)
Looking at sympy (and sympy.stats) more closely it seems that there are potentially more areas where integrating this could help. Maybe this would give the best of both worlds: "Theano focuses more on tensor expressions than Sympy, and has more machinery for compilation. Sympy has more sophisticated algebra rules and can handle a wider variety of mathematical operations (such as series, limits, and integrals)."
There is additional discussion here: nouiz/theano_sympy#1.
Copy pasting some chunks from @mrocklin response to move the discussion over here:
Overlap
There are some obvious points of overlap between the various projects
What is the relationship with statsmodels? They also have a home-grown internal algebraic system. My guess is that if everyone were to unite under one algebraic system there would be some pleasant efficiencies. I obviously have a bias about what that algebraic system should be :)
Derivatives
Both Theano and SymPy provide derivatives which, apparently, you need. SymPy provides analytic ones; Theano provides automatic ones. My suggestion would be to use SymPy if it works and fall back on Theano if it doesn't. You don't need SymPy.stats for this (in case you didn't want to offload your distributions work); SymPy.core would be just fine.
Other benefits
In general the benefits to using symbolic systems tend to be unexpected. SymPy can provide lots of general aesthetic fluff like awesome pretty printing, symbolic simplification, C/Fortran code snippet generation, etc....