Model specification syntax #189
Comments
I like this example. We can simply test whether the argument is a point or a matrix/vector and then make the appropriate inference. My only reluctance is that computations built using hessian() seem to be fairly slow to compile. Perhaps the solution is to cache the compiled versions.
Possible usability improvements to hmc_step:
Maybe post this to the mailing list?
We could probably get rid of the sampler states, or at least make them an internal property. My notion was that it would be good to pass the sampler state around explicitly, but I don't think we've seen that pay off, and it adds an extra return value to sample and makes CompoundStep more complex.
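A rough sketch of the "internal property" alternative, assuming a step method simply keeps whatever adaptive state it needs as an attribute instead of returning it from `sample`. The `Metropolis` class, its crude tuning rule and this `sample` signature are hypothetical and are not the PyMC3 code.

```python
import math
import random


class Metropolis:
    """Random-walk Metropolis that keeps its tuning state on the instance."""

    def __init__(self, logp, scale=1.0):
        self.logp = logp
        self.scale = scale    # adapted during tuning, never returned
        self.accepted = 0
        self.proposed = 0

    def step(self, x, tune=False):
        proposal = x + random.gauss(0.0, self.scale)
        self.proposed += 1
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - random.random()) < self.logp(proposal) - self.logp(x):
            x = proposal
            self.accepted += 1
        if tune and self.proposed % 50 == 0:
            # Crude tuning: widen when accepting often, shrink otherwise.
            rate = self.accepted / self.proposed
            self.scale *= 1.1 if rate > 0.44 else 0.9
        return x


def sample(draws, step, start, tune=0):
    """Return only the trace; no sampler state is threaded through the caller."""
    x, trace = start, []
    for i in range(draws):
        x = step.step(x, tune=i < tune)
        trace.append(x)
    return trace
```

With state held inside each step object, a compound step could just hold a list of such objects, and `sample` keeps a single return value.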
I'd love to be part of this discussion, but I think I'll need to get up to speed to contribute anything useful. First question: what is "d2logpc"?
It generates the Hessian of the model log-probability for use in the Hamiltonian Monte Carlo step method. It is something I am suggesting could be automated.
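For a single normal variable the Hessian is easy to write down, which shows why it is useful for HMC. The negative Hessian recovers the precision 1/σ², and at the mode its inverse approximates the posterior covariance (a Laplace approximation), which is what the step method uses to scale its moves:

```latex
\log p(x) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const},
\qquad
\frac{\partial^2 \log p}{\partial x^2} = -\frac{1}{\sigma^2},
\qquad
\left(-\frac{\partial^2 \log p}{\partial x^2}\right)^{-1} = \sigma^2 .
```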
This is very exciting stuff. As best I can tell, @fonnesbeck's suggestion makes sense. I'm sorry again for my ignorance, but how will deterministics and potentials fit into this framework? I like the PyMC2 framework, but I hope to come around to PyMC3 as well. Progress is good.
Because stochastics are based on Theano tensor objects, you essentially get deterministics for free, because you can perform arithmetic operations and transformations on them (the probabilities in the logistic model example are a simple case of this). What we don't have (yet) are traces for these deterministics. Factor potentials are not available yet; we need to add this to the list.
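A toy illustration of why deterministics come "for free", assuming random variables are symbolic objects with overloaded arithmetic, as Theano tensors are. The `Sym` class and `invlogit` helper are hypothetical stand-ins, not Theano's API: combining variables builds a new expression that can be evaluated at any point, and that expression is all a deterministic is.

```python
import math


class Sym:
    """Minimal symbolic expression: a function from a point (dict) to a value."""

    def __init__(self, fn):
        self.fn = fn

    @staticmethod
    def var(name):
        return Sym(lambda point: point[name])

    @staticmethod
    def lift(other):
        # Wrap plain numbers so they can be combined with expressions.
        return other if isinstance(other, Sym) else Sym(lambda point: other)

    def __add__(self, other):
        other = Sym.lift(other)
        return Sym(lambda p: self.fn(p) + other.fn(p))

    __radd__ = __add__

    def __mul__(self, other):
        other = Sym.lift(other)
        return Sym(lambda p: self.fn(p) * other.fn(p))

    __rmul__ = __mul__

    def eval(self, point):
        return self.fn(point)


def invlogit(s):
    return Sym(lambda p: 1.0 / (1.0 + math.exp(-s.fn(p))))


# Stochastics of a logistic model ...
a = Sym.var('a')
b = Sym.var('b')
x = 2.0  # a fixed covariate value
# ... and a deterministic built from them with ordinary arithmetic.
prob = invlogit(a + b * x)
```

Tracing such a deterministic would amount to calling `prob.eval` on every sampled point.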
Gents,
looks less clean to me than
which is unambiguous to anyone familiar with Bayesian notation. I suppose this would be my wish: to keep as much as possible in notation style and automate the rest, with the option for the flexibility we have come to love. Even the numpy normal used to generate the data is …

One thing that might help the chattering classes is to get a rough idea of the challenges, or of the ways in which the syntax needs to change; people might have better ideas if they know the constraints.
Hey @mamacneil, thanks for helping out. That's something I've struggled with. I definitely agree it would be good to have the …

First, there is no clear ownership of random variables created in this way. There is no way for a model to say 'I should be tracking these random variables'. This is not a major issue, but it does mean you have to tell the model manually which random variables it should care about.

Second, this syntax sort of ties you to one way of creating a random variable. It becomes significantly more complex to write a …

Third, the syntax sort of confuses the difference between a distribution and a random variable. Right now, when you call …

Overall these points don't seem that strong to me, so I'll think about whether there's a way to overcome them.
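A sketch of the ownership and distribution/variable points, assuming the current style where random variables are created through a model method, so the model knows which variables it owns. `Model`, `RandomVariable` and `Normal` are simplified, hypothetical classes; `Var(name, dist)` takes the name first and the distribution second, as discussed in the thread.

```python
import math


class Normal:
    """A distribution: parameters plus a log-density, no name, no model."""

    def __init__(self, mu=0.0, sd=1.0):
        self.mu, self.sd = mu, sd

    def logp(self, value):
        z = (value - self.mu) / self.sd
        return -0.5 * z * z - math.log(self.sd) - 0.5 * math.log(2 * math.pi)


class RandomVariable:
    """A named variable that draws its density from a distribution."""

    def __init__(self, name, dist):
        self.name, self.dist = name, dist


class Model:
    def __init__(self):
        self.vars = []

    def Var(self, name, dist):
        # Creating the variable through the model records ownership.
        rv = RandomVariable(name, dist)
        self.vars.append(rv)
        return rv

    def logp(self, point):
        return sum(rv.dist.logp(point[rv.name]) for rv in self.vars)


model = Model()
x = model.Var('x', Normal(0.0, 1.0))
y = model.Var('y', Normal(1.0, 2.0))
```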
Maybe we don't have to break the existing way, which I kind of like, but could just add a way of passing a list of variables to the init method of the Model class.
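One reading of this suggestion, as a sketch: variables are created on their own, and `Model.__init__` accepts the list of variables it should track. The `variables` keyword and both classes are hypothetical.

```python
class RandomVariable:
    """A free-standing random variable: a name plus a log-density."""

    def __init__(self, name, logp):
        self.name = name
        self.logp = logp


class Model:
    def __init__(self, variables=()):
        # Ownership comes from this explicit list, not from how the
        # variables were created.
        self.variables = list(variables)

    def logp(self, point):
        return sum(v.logp(point[v.name]) for v in self.variables)


x = RandomVariable('x', lambda v: -0.5 * v * v)
y = RandomVariable('y', lambda v: -abs(v))
model = Model(variables=[x, y])
```

The ownership concern raised above still applies: a variable left out of the list is silently ignored by the model.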
One way to think of …
Possible alternative syntaxes:
Thoughts?
Personally, I'm not a fan of any of them (compared to the current syntax). It's true that ideally we'd have 'x ~ Normal()' as in BUGS. But those packages keep model specification and other code well separated, which we can't. pymc2 solved this by requiring you to pass all random variables to MCMC(). The problem is that there is no real sense of a model other than a list of random variables. I think pymc3 improves on this by having the whole model in one place. I guess what I'm saying is that the logic @jsalvatier outlined above is a good motivation to stick with the current interface. It's a deviation from what pymc2 was, but so is everything else.
I also am still attached to the …

I don't understand the other two points in @jsalvatier's earlier comment, though, so I may be missing something here. Perhaps more details will help me see. What is a TransformedVar? My first impression is that this sounds like information that should go into a StepMethod, not a Stochastic.

What is the distinction between distributions and random variables you are making here? Is this about the composition of distributions, i.e. a Normal distribution composed with a truncation operator yields a truncated normal distribution?

It seems like you've put a lot of thought into this, so maybe some use cases/user stories would help to bring the rest of us along.
I'll try to supply some more detail tomorrow evening. Let's say you have a scale variable …

I like your idea of having that transformation step go into a step method, not the model. That does seem cleaner. We'd have to come up with a way to do that, though.

Re: distributions vs. random variables. Yes, truncation operators are the sort of thing I have in mind. Sometimes you want to manipulate the distribution itself. Another example: you might want to find the expectation/median/mode/variance of a distribution, as distinct from finding the expectation of a random variable (conditioning on other things).
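The transformation for a scale variable can be written out. If σ > 0 has density p(σ) and the sampler works on y = log σ, the density of y picks up the Jacobian |dσ/dy| = e^y, so log p_Y(y) = log p(e^y) + y. A minimal sketch, assuming an Exponential prior on σ; `exponential_logp` and `log_transformed` are hypothetical helpers, not PyMC's `logtransform`.

```python
import math


def exponential_logp(lam):
    """Log-density of an Exponential(lam) distribution on sigma > 0."""
    def logp(sigma):
        if sigma <= 0:
            return -math.inf
        return math.log(lam) - lam * sigma
    return logp


def log_transformed(logp):
    """Log-density of y = log(sigma), including the Jacobian term y."""
    def logp_y(y):
        return logp(math.exp(y)) + y
    return logp_y


logp_sigma = exponential_logp(lam=1.0)
logp_y = log_transformed(logp_sigma)  # unconstrained: defined for every real y
```

Because y ranges over the whole real line, a sampler on y never proposes an invalid negative scale.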
@aflaxman The main notable use of different functions for adding random variables has been TransformedVar, so if that could effectively be moved to a post-model-building step, I think that would be a strong argument for doing so, and then using simpler syntax. Does the part about distributions not being the same as random variables make sense now? There are a couple of use cases: truncation/bounding, linear transforms, finding properties (mean/mode/variance) of a distribution, and using them as building blocks (say, ZeroInflatedPoisson out of Poisson and Bernoulli, or GaussianRandomWalk out of Normal).
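A sketch of what treating distributions as first-class objects buys, assuming plain-Python classes (hypothetical, not the PyMC API). Each distribution knows its own log-probability and mean, independent of any random variable, and `ZeroInflatedPoisson` is assembled from a Poisson and a Bernoulli-style mixing weight. Here `psi` is assumed to be the probability of drawing from the Poisson component; other parameterizations use the complement.

```python
import math


class Poisson:
    def __init__(self, theta):
        self.theta = theta

    def logp(self, k):
        return k * math.log(self.theta) - self.theta - math.lgamma(k + 1)

    def mean(self):
        return self.theta


class ZeroInflatedPoisson:
    """With probability 1 - psi emit 0, otherwise draw Poisson(theta)."""

    def __init__(self, theta, psi):
        self.pois = Poisson(theta)  # building block reused, not re-derived
        self.psi = psi              # Bernoulli weight of the Poisson component

    def logp(self, k):
        pois = self.psi * math.exp(self.pois.logp(k))
        if k == 0:
            return math.log((1.0 - self.psi) + pois)
        return math.log(pois)

    def mean(self):
        # A property of the distribution itself: no sampling, no conditioning.
        return self.psi * self.pois.mean()
```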
@fonnesbeck Do you think it would be advisable to support both syntaxes?
I've been thinking about it and I don't think it would be too challenging. However, it seems like it could make the package significantly more confusing.
As another ill-informed guinea pig on this project, I like the use of Var because it's clear what's general about the call: the call will create a new random variable, the first parameter for a random variable is its name, and then comes the underlying distribution. If you made a variable with a different distribution, you'd still use Var, but your second parameter would change, by syntax and not just by convention. However, I do consider these decisions to be mostly aesthetic, and I have no idea whether my perspective is typical.
Here's my current idea: It turns out that Python
The result being that …
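The replies below indicate that this idea paired a `with model:` block with a `~Normal` form. A minimal sketch of the `with` half only, assuming Python's context-manager protocol and a class-level stack of active models; `Model`, `Normal`, `Var` and `get_context` are hypothetical names and this is not the code that was proposed. The `~` half is not sketched here.

```python
class Model:
    _stack = []  # models entered via `with`, innermost last

    def __init__(self):
        self.vars = {}

    def __enter__(self):
        Model._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        Model._stack.pop()
        return False  # never swallow exceptions

    @classmethod
    def get_context(cls):
        if not cls._stack:
            raise TypeError("no model context is active")
        return cls._stack[-1]


class Normal:
    """A distribution only: parameters, no name and no model."""

    def __init__(self, mu=0.0, sd=1.0):
        self.mu, self.sd = mu, sd


def Var(name, dist):
    """Register a random variable with the innermost active model."""
    model = Model.get_context()
    model.vars[name] = dist
    return name  # stand-in for a real random-variable object
```

Inside `with Model() as m:`, each `Var(...)` call lands in `m` without being passed the model, which addresses the ownership problem without a separate list.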
I like the with statement: very pythonic. The ~Normal syntax strikes me as a bit odd, though. Do we need it?
Yeah, I find it weird to have "TransformedVar" and also ~Normal, but as I said, I don't really understand the problem with Var.
Interesting. (In reply to John Salvatier, 27/03/2013, 2:44 AM.)
I think that the
I'm growing skeptical of these syntactic changes, but I'll try to keep an open mind. For example, I don't see why the proposed approach to transformations of
is preferable to the current solution in PyMC2, which seems simpler and cleaner to me:
I would like to see as clean a separation as possible between model and step method, so if log_tau is not needed for the model, I propose something akin to
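A sketch of the separation being asked for: the model defines a density only for the positive variable tau, and the step method alone decides to propose on the log scale. It is a random-walk Metropolis step in log(tau). Because the proposal is symmetric in log space, the acceptance ratio needs the Jacobian term log(tau') - log(tau). The function name and the Exponential test density are illustrative assumptions.

```python
import math
import random


def log_scale_metropolis(logp, tau0, n, step=1.0, seed=None):
    """Sample a positive variable with random-walk Metropolis on log(tau).

    `logp` is the model's log-density in tau; the model never sees log(tau).
    """
    rng = random.Random(seed)
    tau, trace = tau0, []
    for _ in range(n):
        proposal = tau * math.exp(rng.gauss(0.0, step))
        # Target ratio plus the Jacobian of the log transform.
        log_ratio = (logp(proposal) - logp(tau)
                     + math.log(proposal) - math.log(tau))
        if math.log(1.0 - rng.random()) < log_ratio:
            tau = proposal
        trace.append(tau)
    return trace


def exponential_logp(tau):
    # Exponential(1) log-density for tau > 0, written only on the natural scale.
    return -tau
```

Swapping in a different step method requires no change to `exponential_logp`, which is the model/step-method separation described above.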
Regarding distributions not being the same random variables, maybe there is room for a middle-ground solution here, too. For example, if you could say
in the InverseGamma stochastic above, would that meet your needs for calculating moments? The PyMC2 approach, with InverseGamma matched to inverse_gamma_like, etc., has worked for simple building blocks for me so far. Maybe there is a way to extend this so that it works for you, too.
The issue I see with transforming step methods directly, as in
Two suggestions:
Personally, I think the upper/lower-case distinction is too similar syntax-wise. I do like the second option. It could be even more explicit, with NormalVar and NormalDist.
A third option would be to have something like
I don't really like the 'different names for variables and dists' approach. It seems like making a random variable should be a function. I have a version working where …

Another possibility, which I personally like, is …
I really like your first implementation. I suppose the second one would make sense when you would want to use the same distribution to generate several random variables:
Though this may be more confusing than convenient.
Yes. (In reply to John Salvatier, 2013-04-03, 12:44 PM.)
New implementation is finished: see the examples http://nbviewer.ipython.org/urls/raw.github.com/pymc-devs/pymc/pymc3/examples/stochastic_volatility.ipynb and http://nbviewer.ipython.org/urls/raw.github.com/pymc-devs/pymc/pymc3/examples/tutorial.ipynb (the text doesn't reflect the changes yet). I've also made the functions find_MAP, find_hessian, sample and so forth take a model parameter, which you do not need to supply if you are within a model context (with statement). I like this change, but I'm not committed to it, so let me know if you like it. Syntax suggestions are still welcome, but I'm feeling pretty good about this.
That syntax looks great, @jsalvatier! A few questions:

    with model:
        sigma, log_sigma = model.TransformedVar('sigma', Exponential(1./.02, testval=.1),
                                                logtransform)

Is the …

Similarly:

    start = find_MAP(model, vars=[s], fmin=optimize.fmin_l_bfgs_b)
The first one is, but the second one is not (I should remove that one). Currently those model parameters are basically default parameters that can go up in front. The machinery is in "withcontext" here: https://github.com/pymc-devs/pymc/blob/pymc3/pymc/model.py. I wonder if this is worth it; perhaps it would be best to just have them be regular arguments with defaults (the default being whatever is in the context). This would look a little less pretty, but would mean there's less funny business that can cause problems in the future. Basically, should the explicit syntax look like
or
The first one is doing exactly what it looks like it's doing. The second one has some funny business going on: it checks whether the first argument is a 'Model' and, if so, does nothing; if not, it gets the model context and inserts it into the argument list.
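The first, explicit option can be sketched as an ordinary keyword argument whose default is resolved from the enclosing `with` block. `Model`, `modelcontext` and `toy_find_map` (which just picks the best of a few candidate points) are hypothetical names for illustration; the point is that nothing rewrites the caller's argument list.

```python
class Model:
    _stack = []  # models entered via `with`, innermost last

    def __init__(self, logp):
        self.logp = logp

    def __enter__(self):
        Model._stack.append(self)
        return self

    def __exit__(self, *exc):
        Model._stack.pop()
        return False


def modelcontext(model=None):
    """Return `model` if given, else the innermost model from a `with` block."""
    if model is not None:
        return model
    if not Model._stack:
        raise TypeError("no model given and no model context active")
    return Model._stack[-1]


def toy_find_map(candidates, model=None):
    # A plain keyword argument with a context-derived default: no magic.
    model = modelcontext(model)
    return max(candidates, key=model.logp)
```

Both `toy_find_map(points, model=m)` and a bare `toy_find_map(points)` inside `with m:` do the same thing, and the signature tells the reader exactly where the model comes from.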
I'd say definitely the first one, then. Less funny business = good. ;)
I've implemented the first; like you say, less funny business is good. With that, I'm pretty happy with the syntax.
Yeah, I saw the updated syntax and code. I agree that it looks great now. It's a nice combination of pymc2 flexibility and BUGS-style model specification. Also, I imagine the with syntax will be the main way of using pymc3, so I think it's fine to have a model kwarg, since it will mostly be used by people who know what they are doing.
I thought it would be worth discussing how models are specified in PyMC3, and revisit whether it represents the best user experience that we can offer. I'm coming to terms with the fact that PyMC3 will not look like PyMC2, but I think we should offer something analogously clean and clear, since so many of the user base will be scientists first and Python programmers second.
One example is the following:
I'm wondering if, for example, all the business with H could happen automagically inside of hmc_step, so that we only need to do: …

The secret to making model specification clean is separating the "what" from the "how" and hiding as much of the "how" as possible. That is not to say, of course, that we should limit the flexibility of what users can do with PyMC, only that complexity should be exposed when it is needed. An example of this from PyMC2 is that the selection of step methods is done automatically, so it appears nowhere in the user specification of the model, but at any time I can go in and call MCMC.use_step_method() to override the choice.

Let's use this thread to discuss any potential improvements to model specification in PyMC3. I'm hoping others aside from just @jsalvatier and me will participate! What works, what doesn't, what can be improved?
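A sketch of the PyMC2-style behaviour described above: step methods are chosen automatically from each variable's type, and the user can override any choice afterwards. The selection rule (HMC for continuous, Metropolis for discrete), the `Sampler` class and the string labels for step methods are assumptions for illustration. The override method borrows the PyMC2 name and argument order (step method first, variable second) but is not PyMC2's implementation.

```python
class Sampler:
    def __init__(self, variables):
        # variables: mapping from name to "continuous" or "discrete"
        self.variables = dict(variables)
        self.step_methods = {name: self._default_step(kind)
                             for name, kind in self.variables.items()}

    @staticmethod
    def _default_step(kind):
        # The "how": picked automatically, never written in the model.
        return 'HamiltonianMC' if kind == 'continuous' else 'Metropolis'

    def use_step_method(self, step, name):
        # Explicit override, only when the user asks for it.
        if name not in self.variables:
            raise KeyError(name)
        self.step_methods[name] = step
```

The model specification only says what each variable is; the choice of sampler stays out of it unless someone calls the override.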