
Model specification syntax #189

Closed
fonnesbeck opened this issue Mar 20, 2013 · 38 comments

Comments

@fonnesbeck (Member)

I thought it would be worth discussing how models are specified in PyMC3, and revisiting whether it represents the best user experience that we can offer. I'm coming to terms with the fact that PyMC3 will not look like PyMC2, but I think we should offer something analogously clean and clear, since so many of our users will be scientists first and Python programmers second.

One example is the following:

H = model.d2logpc()

s = find_MAP(model)

step = hmc_step(model, model.vars, H(s))

I'm wondering if, for example, all the business with H couldn't happen automagically inside of hmc_step, so that we only need to do:

s = find_MAP(model)

step = hmc_step(model, model.vars, s)

The secret to making model specification clean is separating the "what" from the "how" and hiding as much of the "how" as possible. That's not to say, of course, that we should limit the flexibility of what users can do with PyMC, only that we expose the complexity when it is needed. An example of this from PyMC2 is that the selection of step methods is done automatically, so it appears nowhere in the user specification of the model, but at any time I can go in and call MCMC.use_step_method() to override the choice.

Let's use this thread to discuss any potential improvements to model specification in PyMC3. I'm hoping others aside from just @jsalvatier and me will participate! What works, what doesn't, what can be improved?

@twiecki

@jsalvatier (Member)

I like this example. We can just test whether something is a point or a matrix/vector and then try to make the inference. My only reluctance is that computations built using hessian() seem to be fairly slow to compile. Perhaps the solution to that is to cache the compiled versions.
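(Illustrative sketch, not PyMC3 code: the point-vs-matrix dispatch suggested here could look like the following, where hmc_step, d2logpc, and the return value are simplified stand-ins.)

```python
def hmc_step(model, vars, scaling_or_point):
    # Dispatch on the argument type, as suggested above: a dict is
    # treated as a point {var name: value} at which to evaluate the
    # Hessian; anything else is assumed to already be a scaling
    # vector/matrix. All names here are illustrative stand-ins.
    if isinstance(scaling_or_point, dict):
        hessian = model.d2logpc()      # compiled Hessian function
        C = hessian(scaling_or_point)  # evaluate at the given point
    else:
        C = scaling_or_point           # already a scaling vector/matrix
    return {"vars": vars, "scaling": C}
```

Caching the compiled Hessian function on the model, as suggested, would address the slow-compilation concern since repeated calls would reuse it.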

@jsalvatier (Member)

Possible usability improvements to hmc_step:

  • give the vars argument a default value of all the continuous parameters
  • use model.test_point or the MAP as the default point for finding the scaling matrix, as you suggest above, so users wouldn't even have to give any value for C.

@jsalvatier (Member)

Maybe post this to the mailing list?

@jsalvatier (Member)

We could probably get rid of the sampler states, or at least make them an internal property. My notion was that it would be good to explicitly pass around the sampler state, but I don't think we've seen that pay off, and it adds an extra return parameter to sample and makes CompoundStep more complex.

@aflaxman (Contributor)

I'd love to be part of this discussion, but I think I'll need to get up to speed to contribute anything useful. First question: what is "d2logpc"?

@fonnesbeck (Member, Author)

It generates the Hessian of the model log-probability for use in the Hamiltonian Monte Carlo step method. It is something I am suggesting could be automated.

@aflaxman (Contributor)

This is very exciting stuff. As best as I can understand it, @fonnesbeck's suggestion makes sense.

I'm sorry again for my ignorance, but how will deterministics and potentials fit into this framework?

I like the PyMC2 framework, but I hope to come around to PyMC3 as well. Progress is good.

@fonnesbeck (Member, Author)

Because stochastics are based on Theano tensor objects, you essentially get deterministics for free because you can perform arithmetic operations and transformations on them (see the probabilities in the logistic model example as a simple example of this). What we don't have (yet) are traces for these deterministics. Factor potentials are not available yet; we need to add this to the list.

@mamacneil

Gents,
As the guinea pig target audience (i.e. a user not a programmer) I am keen to pitch in and will likely push for more automation rather than less. In looking at the logistic model example:

effects = Var('effects', Normal(mu = 0, tau = 2.**-2), (1, npred))

looks less clean to me than

effects = Normal('effects', mu = 0, tau = 2.**-2, value=(1, npred))

which is unambiguous to anyone familiar with Bayes notation. I suppose this would be my wish: to keep as much as possible in notation style and automate the rest, with the option for the flexibility we have come to love. Even the numpy normal used to generate the data is xtrue = normal(scale = 2., size = 1), which makes sense to what I suspect is a major target audience: those fed up with WinBUGS.

One thing that might help the chattering classes is to get a rough idea of the challenges or ways in which the syntax needs to change; people might have better ideas if they know the constraints.

@jsalvatier (Member)

Hey @mamacneil , thanks for helping out.

That's something I've struggled with. I definitely agree it would be good to have the x = Normal(...) syntax. But there are a few issues I've run into.

First is that there is no clear ownership of random variables created in this way. There is no way for a model to say 'I should be tracking these random variables'. This is not a major issue, but does mean you have to manually tell the model which random variables it should care about.

Second, this syntax sort of ties you to one way of creating a random variable. It becomes significantly more complex to write a TransformedVar function (which creates a random variable in terms of one space, but allows you to sample in another space). It may be possible to keep this kind of functionality, but it would definitely be more complicated.

Third, the syntax sort of blurs the difference between a distribution and a random variable. Right now, when you call Normal(mu, tau) you create a distribution, while Var (and other functions) create a random variable. The x = Normal('x', ...) syntax doesn't leave a way for you to manipulate distributions themselves, which can be useful (truncating distributions, applying transformations, finding the expectation, etc.).

Overall these points don't seem that strong to me, so I'll think about whether there's a way to overcome them.
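(Illustrative sketch, not PyMC3 code: a toy example of why distribution objects are useful separately from random variables. The class and function names here are hypothetical; a distribution carrying its own logp can be truncated before any random variable is made from it.)

```python
import math

class NormalDist:
    # A bare distribution object: just parameters and a log-density,
    # with no model or random-variable machinery attached.
    def __init__(self, mu, tau):
        self.mu, self.tau = mu, tau

    def logp(self, value):
        # log density of Normal(mu, 1/tau)
        return (0.5 * math.log(self.tau / (2.0 * math.pi))
                - 0.5 * self.tau * (value - self.mu) ** 2)

def truncate(dist, lower, upper):
    # Manipulate the distribution itself: outside [lower, upper] the
    # log-probability is -inf. (A full version would also renormalize
    # by the probability mass inside the bounds.)
    class Truncated:
        def logp(self, value):
            if not (lower <= value <= upper):
                return float("-inf")
            return dist.logp(value)
    return Truncated()
```

With a random-variable-only API, there is no object on which an operation like truncate could act before the variable is bound to a model.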

@twiecki (Member) commented Mar 22, 2013

Maybe we don't have to break the existing way, which I kinda like, but just add a way of passing a list of variables to the init method of the Model class.

@jsalvatier (Member)

One way to think of Var and its variants is as the ~ bit in the x ~ Normal(mu, tau) syntax.

@jsalvatier (Member)

Possible alternative syntaxes:

1. x = Var('x', Normal(mu, tau, dtype, shape)) — a variation on the current syntax, where we make dtype and shape a property of distributions.
2. x = Var('x') << Normal(mu, tau, dtype, shape)
3. x = ~Normal('x', mu, tau, dtype, shape) — this one would require later manual collection; something like model = Model([x, y, observations])
4. x = model << Normal('x', mu, tau, dtype, shape)
5. x = 'x' << Normal(mu, tau, dtype, shape)

Thoughts?

@twiecki (Member) commented Mar 24, 2013

Personally, I'm not a fan of any of them (compared to the current syntax).

It's true that ideally we'd have 'x ~ Normal()' as in BUGS. But those packages keep model specification and other code well separated, which we can't. pymc2 solved this by requiring you to pass all random variables to MCMC(). The problem is that there is no real sense of a model other than a list of random variables. I think pymc3 improves on this by having the whole model in one place.

I guess what I'm saying is that the logic @jsalvatier outlined above is a good motivation to stick with the current interface. It's a deviation from what pymc2 was but so is everything else.

@aflaxman (Contributor)

I also am still attached to the x = Normal('x', mu, tau) approach in PyMC2, and I don't mind making a dict of all the nodes in a model.

I don't understand the other two points in @jsalvatier's comment above, though, so I may be missing something here. Perhaps more details will help me see.

What is a TransformedVar? My first impression is that this sounds like information that should go into a StepMethod, not a Stochastic.

What is the distinction about distributions and random variables you are making here? Is this about the composition of distributions, i.e. a Normal distribution composed with a truncation operator yields a truncated normal distribution?

It seems like you've put a lot of thought into this, so maybe some use cases/user stories would help to bring the rest of us along.

@jsalvatier (Member)

I'll try to supply some more detail tomorrow evening.

Let's say you have a scale variable tau, which has an InvGamma prior distribution: tau ~ InvGamma(.01, .01), which translates to tau = Var('tau', InvGamma(.01, .01)). However, you think that it would be easier to sample from the posterior if tau were in log space. You want the likelihood function to be parameterized by log_tau = log(tau), not tau. This translates to log_tau, tau = TransformedVar('tau', InvGamma(.01, .01), logtransform).

I like your idea of having that transformation step go into a step method, not the model. That does seem cleaner. We'd have to come up with a way to do that though.

Re: distributions vs. random variables. Yes, truncation operators are the sort of thing I have in mind. Sometimes you want to manipulate the distribution itself. As another example, you might want to find the expectation/median/mode/variance of a distribution as distinct from finding the expectation of a random variable (conditioning on other things).
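(Illustrative sketch, not PyMC3 code: the log-space reparameterization behind TransformedVar is a change of variables. The helper below is hypothetical; the sampler moves in log_x while the prior is written for x = exp(log_x), so the density picks up a Jacobian term.)

```python
import math

def logtransformed_logp(logp):
    # Change of variables x = exp(log_x): the transformed log-density is
    # logp(x) + log|dx/dlog_x| = logp(exp(log_x)) + log_x.
    def transformed(log_x):
        x = math.exp(log_x)
        return logp(x) + log_x
    return transformed
```

For example, with an Exponential(1) prior, logp(x) = -x, the transformed density at log_x = 0 (i.e. x = 1) is -1.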

@jsalvatier (Member)

@aflaxman The main notable use of different functions for adding random variables has been TransformedVar, so if that could effectively be moved to a post model building step, I think that would be a strong argument for doing so, and then using simpler syntax.

Does the part about distributions not being the same as random variables make sense now? There are a couple of use cases: truncation/bounding, linear transforms, finding properties (mean/mode/variance) of a distribution, and using them as building blocks (say ZeroInflatedPoisson out of Poisson and Bernoulli, or GaussianRandomWalk out of Normal).

@jsalvatier (Member)

@fonnesbeck Do you think it would be advisable to support both syntaxes?

x = Var('x', Normal(mu, tau, dtype, shape)) and
x = ~Normal('x', mu, tau, dtype, shape)

I've been thinking about it and I don't think it would be too challenging. However, it seems like it could make the package significantly more confusing.

@johnmcdonnell (Contributor)

As another ill-informed guinea pig on this project, I like the use of Var because it's clear what's general about the call: the call will create a new random variable, the first parameter for a random variable is its name, then the underlying distribution. If you made a variable with a different distribution, you'd still use Var but your second parameter would change, by syntax not just convention.

However I do consider these decisions to be mostly aesthetic, and I have no idea if my perspective is typical.

@jsalvatier (Member)

Here's my current idea: It turns out that Python with statements can be used to do context management. We can use a with statement to set a model as the 'currently active model' globally and functions that implicitly refer to a model will use this model behind the scenes.

with Model() as model: 
    x = ~Normal('x', 0, 1, shape, dtype)
    # same as 
    #x = Var('x', Normal(0, 1, shape, dtype))
    # which will also be supported
    log_tau, tau = TransformedVar('tau', InvGamma(.01, .01), logtransform)
...

with statements work well with nesting, so it would work fine to say

with Model() as model1: 
     x = ~Normal('x', 0, 1, shape, dtype)
     with Model() as model2: 
          y = ~Normal('y', 0, 1, shape, dtype)
     z = ~Normal('z', 0, 1, shape, dtype)

The result being that model1 will have two variables, x and z, and model2 will have one variable, y.
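(Illustrative sketch, not PyMC3 code: the nesting behavior described above falls out naturally if Model keeps a class-level stack of active contexts. All internals here are hypothetical.)

```python
class Model:
    _context_stack = []  # innermost active model is last

    def __init__(self):
        self.vars = {}

    def __enter__(self):
        Model._context_stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        Model._context_stack.pop()
        return False

    @classmethod
    def get_context(cls):
        if not cls._context_stack:
            raise TypeError("No model on context stack")
        return cls._context_stack[-1]

def Var(name, dist):
    # Attach the variable to whichever model context is innermost.
    model = Model.get_context()
    model.vars[name] = dist
    return dist
```

Because __exit__ pops the stack, leaving an inner with block restores the outer model as the current context, which is exactly the nesting behavior described.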

@twiecki (Member) commented Mar 27, 2013

I like the with statement -- very pythonic.

The ~Normal syntax strikes me as a bit odd, though. Do we need it?

@johnmcdonnell (Contributor)

Yeah I find it weird to have "TransformedVar" and also ~Normal, but I guess like I said I don't really understand the problem with Var.

@mamacneil

Interesting,
I agree that this is much improved over the Var statements and that we don't need the tilde at all (they're not in there now). If having the with statement is the major change in the syntax I think that could work. I will say pushing to keep as near the current syntax as possible should be the goal - it's damn good in my opinion - but a single line to initiate things isn't much of a burden.
A


@jsalvatier (Member)

I think that the ~ is important actually. The ~ lets us distinguish between distributions and random variables.

~ acts as a constructor for a random variable, basically acting like Var, except that it's a method on the distribution. What I have in mind is that x = Normal('x', mu, tau, shape, dtype) constructs a normal distribution with a particular shape, data type, and a name, 'x', that will be used if the distribution is used to construct a random variable but not otherwise. x.__invert__ (used to implement the ~ operator) constructs a random variable with x as its conditional distribution and adds it to the model in the current context.

@aflaxman (Contributor)

I'm growing skeptical of these syntactic changes, but I'll try to keep an open mind. For example, I don't see why the proposed approach to transformations of

log_tau, tau = TransformedVar('tau', InvGamma(.01, .01), logtransform)

is preferable to the current solution in PyMC2, which seems simpler and cleaner to me:

tau = InverseGamma('tau', .01, .01)
log_tau = log(tau)

I would like to see as clean a separation as possible between model and step method, so if log_tau is not needed for the model, I propose something akin to

mcmc.use_step_method(LogTransform(Metropolis), tau)

Regarding distributions not being the same random variables, maybe there is room for a middle-ground solution here, too. For example, if you could say

tau.logp(parameter_value)

in the InverseGamma stochastic above, would that meet your needs for calculating moments? The approach PyMC2 has with InverseGamma matched to inverse_gamma_like, etc, has worked for simple building blocks for me so far. Maybe there is a way to extend this so that it works for you, too.

@jsalvatier (Member)

  1. Those solve different problems: TransformedVar puts the actual random variable, through which you do the sampling, into log space. You can't tell a step method to sample the log_tau variable in pymc2 because it's a deterministic. I do agree that your suggestion of moving the TransformedVar functionality into step methods or some other post-model-building step would be best. Like you say, it's cleaner to separate model and sampling issues as much as possible. I intend to move TransformedVar in pymc3 to some post-model-building stage at some point; however, I'm not sure how to do it yet (I haven't spent time thinking about it).

The issue I see with transforming step methods directly, as in mcmc.use_step_method(LogTransform(Metropolis), tau), is that in pymc3 step methods commonly do block updating, and you often will want to transform just one variable. I'm sure something along these lines is workable, though.

  2. Re: distributions/random variables: Unfortunately that doesn't satisfy me. For example, it doesn't work cleanly with truncating distributions. Random variables and distributions are different concepts and should be different types. Having the nice x = Normal('x', mu, tau, ...) syntax is valuable, though, so it's worth having some limited magic in order to have it work nicely. Coming up with the right magic is the tricky part. Some options:
  • The ~ operator turns a distribution into a random variable. Seems unpopular.
  • If there is a current model context when Normal() is called with a name (or possibly the observed keyword), it creates a random variable; otherwise, a distribution. If there's no current model context, then passing a name or observed keyword will throw an error.

@fonnesbeck (Member, Author)

Two suggestions:

  • how about using lower case for distributions and upper case for random variables? This would allow for:

    x = Var('x', normal(mu, tau, dtype, shape))
    

    as well as:

    x = Normal('x', mu, tau, dtype, shape)
    

    this would both honor the difference between a distribution and a random variable and maintain a natural syntax if we prefer it.

  • alternatively, we could concatenate the distribution name to Var for random variables:

    x = NormalVar('x', mu, tau, dtype, shape)
    

@twiecki (Member) commented Apr 3, 2013

Personally, I think the upper/lower case is too similar syntax wise.

I do like the second option. Could be even more explicit and do NormalVar and NormalDist.

@fonnesbeck (Member, Author)

A third option would be to have something like Normal for the random variable and normal_dist for the distribution, which is analogous with PyMC2's x_like convention for log-likelihoods.

@jsalvatier (Member)

I don't really like the 'different names for variables and dists' approach. Seems like making a random variable should be a function.

I have a version working where Normal('x', mu, tau, shape) produces a random variable (and attaches it to the current model, throwing an error if there is no current model) and Normal(mu, tau, shape) produces a distribution. I'll post it soon.

Another possibility, which I personally like, is x = Normal(mu, tau, shape)('x'). Normal() produces a distribution, and calling a distribution produces a random variable.
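(Illustrative sketch, not PyMC3 code: the first behavior described above can be implemented by dispatching on whether the first argument is a name string. The class names and the module-level stack are hypothetical.)

```python
_MODEL_STACK = []  # stand-in for a real model-context stack

class Model:
    def __init__(self):
        self.vars = {}
    def __enter__(self):
        _MODEL_STACK.append(self)
        return self
    def __exit__(self, *exc):
        _MODEL_STACK.pop()
        return False

class NormalDist:
    # A bare distribution object, independent of any model.
    def __init__(self, mu, tau):
        self.mu, self.tau = mu, tau

def Normal(*args):
    # Normal('x', mu, tau): make a random variable in the current model,
    # raising an error if there is no current model.
    # Normal(mu, tau): return a bare distribution object.
    if args and isinstance(args[0], str):
        name, params = args[0], args[1:]
        if not _MODEL_STACK:
            raise TypeError("No model on context stack")
        rv = NormalDist(*params)
        _MODEL_STACK[-1].vars[name] = rv
        return rv
    return NormalDist(*args)
```

The string check is the "limited magic" mentioned earlier: one callable covers both the distribution and the random-variable use, at the cost of overloading its signature.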

@fonnesbeck (Member, Author)

I really like your first implementation.

I suppose the second one would make sense when you would want to use the same distribution to generate several random variables:

normal_prior = Normal(0, 0.0001)
betas = [normal_prior(i) for i in ('intercept', 'age_effect', 'treatment_effect')]

Though this may be more confusing than convenient.

@mamacneil

Yes,
I like the first implementation also - clear, simple, and close to current function on the surface.
A


@jsalvatier (Member)

New implementation is finished: see examples http://nbviewer.ipython.org/urls/raw.github.com/pymc-devs/pymc/pymc3/examples/stochastic_volatility.ipynb and http://nbviewer.ipython.org/urls/raw.github.com/pymc-devs/pymc/pymc3/examples/tutorial.ipynb (the text doesn't reflect the changes yet).

I've also made the functions find_MAP, find_hessian, sample, and so forth take a model parameter, which you do not need to supply if you're within a model context (with statement). I like this change, but I'm not committed to it, so let me know if you like it.

Syntax suggestions still welcome, but I'm feeling pretty good about this.

@twiecki (Member) commented Apr 26, 2013

That syntax looks great, @jsalvatier!

A few questions:

with model: 
    sigma, log_sigma = model.TransformedVar('sigma', Exponential(1./.02, testval = .1),
                 logtransform)

Is the model.TransformedVar required, as opposed to just TransformedVar like the other calls?

Similarly:

start = find_MAP(model, vars = [s], fmin = optimize.fmin_l_bfgs_b)

@jsalvatier (Member)

The first one is, but the second one is not (I should remove that one).

Currently those model parameters are basically default parameters that can go up front. The machinery is in "withcontext" here: https://github.com/pymc-devs/pymc/blob/pymc3/pymc/model.py. I wonder if this is worth it; perhaps it would be best to just have them be regular arguments with defaults (and the default be whatever is in the context). This would look a little less pretty, but would mean there's less funny business that can cause problems in the future.

Basically, should the explicit syntax look like

start = find_MAP([s], model = model)

or

start = find_MAP(model, [s])

The first one is doing exactly what it looks like it's doing; the second one has some funny business going on: it checks to see if the first arg is a Model and, if so, does nothing; if it's not, it gets the model context and inserts it into the argument list.
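(Illustrative sketch, not PyMC3 code: the "regular argument with a context default" version could look like this, where modelcontext and the stack are hypothetical names.)

```python
_MODEL_STACK = []  # stand-in for a real model-context stack

class Model:
    def __enter__(self):
        _MODEL_STACK.append(self)
        return self
    def __exit__(self, *exc):
        _MODEL_STACK.pop()
        return False

def modelcontext(model=None):
    # An explicit model argument wins; otherwise fall back to the
    # innermost model on the context stack.
    if model is not None:
        return model
    if not _MODEL_STACK:
        raise TypeError("No model given and none on the context stack")
    return _MODEL_STACK[-1]

def find_MAP(vars=None, model=None):
    model = modelcontext(model)
    # ... optimization would happen here; return the resolved model
    # so the dispatch is visible.
    return model
```

This keeps the "less funny business" property: find_MAP's signature does exactly what it says, and the only implicit behavior is the documented fallback to the context.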

@twiecki (Member) commented Apr 26, 2013

I'd say definitely the first one then. Less funny business = good. ;)

@jsalvatier (Member)

I've implemented the first, like you say, less funny business is good.

With that, I'm pretty happy with the syntax.

@twiecki (Member) commented Apr 29, 2013

Yeah, I saw the updated syntax and code. I agree that it looks great now. It's a nice combination of pymc2 flexibility and BUGS-style model specification. Also, I imagine the with syntax to be the main form of using pymc3 so I think it's fine to have a model kwarg since it's mostly used by people who know what they are doing.
