coding sprint plan 2020 #322
I'm just starting out on a two-month (with some interruptions) coding sprint on greta. There are a bunch of bugfixes, half-finished (and some half-baked) features, and some required refactoring that I have been putting off because 2019 was a pretty busy year. I'm hoping to make some headway on some of these major issues and features I've been promising for a while.
I'll keep chipping away at things on this list, which I will modify as I go. The list is just here for me to keep track of things to do, and for others to follow along and comment if there are features they are particularly interested in. The items are listed neither in order of priority nor in the order in which I intend to work on them. I'll release new versions as I go, but I'm not sure yet which features and fixes each release will contain.
During this work, I'll also be trying to keep on top of other issues as they come up, as well as the forum, which I've been neglecting over the Christmas break. Now's a good time to ask questions over there :)
1. Bugfixes & misc
This is a subset of the open issues that I'm particularly keen to fix, and have a plan for. This doesn't mean the other open issues aren't important as well, and I'll try to get to some of those too.
A number of things have changed in the interfaces to TF and TFP (particularly in moving to TF 2.0). greta currently still works with the compatibility functions, but some refactoring is needed to fully support these.
There's an incomplete branch
4. Sampling discrete variables
This has been on the to-do list for a long time. It will require a bit of refactoring and redesigning internals, but there's nothing about sampling of discrete random variables that should be particularly tricky to implement.
Random independent sampling from a model object, optionally conditional on fixed values or posterior samples, is a much-requested feature that needs a surprising amount of engineering in the background, and careful thinking about an intuitive interface. There's some existing work that just needs implementing, polishing up, testing and documenting, along with some examples of posterior predictive checks etc.
6. Continuous integration & TF versions
greta versions are now being tied to specific releases of TF and TFP. I was trying for a while not to do this, because I believe it's best practice not to be overly prescriptive about dependencies. However, both TF and TFP are evolving fast and regularly introduce breaking changes. It would be good to catch these changes early with CI testing on the nightly releases of those dependencies.
What an impressive and exciting list of features/fixes!
I just wanted to let you know that I have some time on my hands at the moment (at least until the start of March, when I start my MSci at Melbourne) and would really like to contribute a bit more to greta.
Let me know if there is anything you would particularly like me to work on - happy to discuss!
Hi Jeffrey, that would be great! Good timing.
A couple of things spring to mind:
I've done most of 1.7 above, implemented on a branch called
I think I'll focus next on tasks 2 and 5, which is something you looked at previously. The way I'm planning to do that, it should be possible to use the TFP distribution objects more naturally, calling on their IID sampling methods as well as their log densities etc. There may well be some distributions (e.g. greta's 'mixture' and 'joint' distributions) that will need IID sampling algorithms coded up; we should keep in touch about that in the relevant issues.
The simulation interface turned out to be a huge job, but it's merged into master now!
I don't know of any other statistical modelling software that lets you define the generative model once, and then enables IID sampling from the prior, sampling from the posterior, sampling of data conditional on the posterior (or on fixed values for parameters) and posterior prediction to new data. So I'm pretty pleased :)
Please feel free to take over the implementation - it's turned out that I haven't had as much time to work on greta as I had hoped...
I made some initial attempts but I think they are now out of date due to the simulation update etc.
I'd be happy to share ideas or otherwise collaborate if that would be helpful.
Concerning point 1.7 from above (class and methods for the mcmc object), the Stan people are putting together a package to standardize the output of Bayesian models: https://mc-stan.org/posterior/. Would it be an option to make the output of mcmc conform to that format, to tap into the large resources that these guys are developing?