Replies: 5 comments 12 replies
-
Welcome :) I've never used GitHub Discussions before, but this looks like the perfect place to discuss things that are not directly related to the code. Indeed, tests and examples are very useful at this stage. Tests on distributions (especially regarding the shapes) are important, but also regression tests for the language and the API. Regarding examples, I started implementing examples from Statistical Rethinking and have my eye on Bayesian Methods for Hackers as well.
Tests are a great way to start contributing. I also added a few other suggestions in this issue; in particular, MCX could do with a
-
Sure, that's a great idea! I usually write notes in markdown when I'm learning/thinking through stuff, so if I can organize them into any usable/coherent fashion, that would be easy to add as documentation.
-
Thank you for taking the time to share your notes! Since you're the
first one (that I know of) to read the internals, I'd be happy to hear
your thoughts about the general design and the naming conventions. What
made sense immediately? What didn't? Maybe we could start from your
notes and your answers to those questions to document the internals.
What the linear_regression function looks like in AST form is:
Oh that's cool! I haven't plotted this figure yet, but it would have
made things easier. That'd be great in the docs.
Finally, if an unanticipated case arises, the ModelDefinitionParser.recursive_visit
method throws a TypeError and tells the user to report it as an issue. This
means that some syntax has not been anticipated and perhaps needs to be
incorporated into the code.
Yes, added or explicitly forbidden. We don't want the code to
inexplicably crash, so we prefer to raise a SyntaxError by default.
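A minimal sketch of this fail-loudly pattern, using only the standard library `ast` module (toy code for illustration; this is not MCX's actual parser, and the whitelist of node types is hypothetical):

```python
import ast

class ToyModelParser(ast.NodeVisitor):
    """Sketch of a parser that fails loudly on unanticipated syntax."""

    # The handful of node types this toy parser anticipates.
    ALLOWED = (ast.Module, ast.FunctionDef, ast.arguments, ast.arg,
               ast.Assign, ast.Return, ast.BinOp, ast.Add, ast.Name,
               ast.Constant, ast.Load, ast.Store)

    def generic_visit(self, node):
        if not isinstance(node, self.ALLOWED):
            raise TypeError(
                f"Unsupported syntax: {type(node).__name__}. "
                "Please report this as an issue."
            )
        super().generic_visit(node)

# Anticipated syntax passes through silently...
ToyModelParser().visit(ast.parse("def f(x):\n    y = x + 1\n    return y"))

# ...while unanticipated syntax raises a TypeError asking for a bug report.
try:
    ToyModelParser().visit(ast.parse("for i in range(3): pass"))
except TypeError as e:
    print(e)
```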
Note that this is where we are actually sampling from the priors of the
model; this is what is called 'prior predictive sampling' in Richard
McElreath's Statistical Rethinking book.
There is an important distinction to make between the two prior
distributions. MCX models are functions that can return a value; the
prior distribution of this value is the _prior predictive distribution_,
given by `sample`. Calling the model `model(rng_key, *args)` also gives
you samples from this distribution.
MCX models also implicitly define a (multivariate) probability
distribution; samples from this distribution are given by
`joint_sample`.
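To make the distinction concrete, here is a plain-Python toy (not actual MCX code; the names `toy_joint_sample` and `toy_model` are hypothetical) for a linear regression with priors on `a` and `b`:

```python
import random

def toy_joint_sample(rng):
    # One sample from the implicit joint distribution over the random
    # variables: the analogue of `joint_sample`, returned as a dict.
    return {"a": rng.gauss(0.0, 1.0), "b": rng.gauss(0.0, 1.0)}

def toy_model(rng, x):
    # Calling the model returns a value; over repeated prior draws of
    # (a, b), its distribution is the prior predictive distribution,
    # the analogue of what `sample` draws from.
    params = toy_joint_sample(rng)
    return params["a"] * x + params["b"] + rng.gauss(0.0, 0.1)

rng = random.Random(0)
joint_draw = toy_joint_sample(rng)       # a dict {'a': ..., 'b': ...}
predictive_draw = toy_model(rng, x=2.0)  # a single predicted value
```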
Reading your notes I feel that `sample` is an unfortunate name choice;
it is not obvious what it samples exactly. Would you agree?
Now what we have is an MCX model function which we can:
Yes, it is important to note that MCX models will be model-first and not
distribution-first, as `model()` returns a value and not a class
instance. This may change in the future, as the latter would make the
internals slightly simpler.
But keep this in mind: models being hybrid objects can sometimes be
confusing.
b) Or pass this to a sampler for doing inference -> not really sure where/how
this happens right now on the compiler-refactor branch.
`samples = mcx.sampler(rng_key, model, (x_data,), {'observations': y_data}, kernel).run()`
I had to check the code, which must mean the API is not quite there yet. I sometimes wonder
if we should be more explicit:
```python
samples = mcx.sampler(
    rng_key,
    model.condition(X=x_data, observations=y_data),
    kernel
).run()
```
-
and then stepping through everything with a debugger. As I mentioned, I kept
notes in markdown, which I'd be happy to add some part of to the docs
if you think it would be useful; it's a bit pedantic and skips over some of the
finer details of how things are handled, so I'm not sure how useful it is.
That was exhaustive! Shortened, while keeping the big-picture ideas, this
could be very useful for contributors or anyone curious about how this
all works.
And the mcx.core.sample_joint function is pretty much doing
the same thing as the mcx.core.sample function, it's just returning a
dictionary with the parameters (i.e. random variables) and samples from those
priors? Is that correct?
I think I anticipated that question in my previous reply. Yes, that's
correct. The reason they don't share more code is the behavior
when an MCX model is called within the current model.
Another question: in the sample method of the model class, and in the call
method which is used when you call, for example, linear_regression(rng_key=
rng_key, x=some_data), where exactly is the sample size defined?
You _should_ be able to specify a `sample_shape` argument in the `sample`
method to follow the API of `mcx.Distribution`. I simply forgot!
However, you won't be able to specify the sample size for
`linear_regression(rng_key, x)`. You can, however, do:
```python
import jax

keys = jax.random.split(rng_key, num_samples)
samples = jax.vmap(linear_regression, in_axes=(0, None))(keys, x)
```
You can see the `sample` method as a shortcut for the more
barebones call.
was this removed/left out for a particular reason?
No, I forgot!
Hope this isn't too many questions!
Nope :)
-
Thanks for the feedback! I am glad this is understandable; very few
people are comfortable with parsing/modifying the AST so I was a bit
afraid this would be confusing. Still some effort needed on the naming
side (cf our discussion on prior sampling).
I used showast in a Jupyter Lab session. I'm quite a visual learner, so it
definitely helps me to visualize these trees. I could envision some boxes
around different parts of this tree indicating where things are handled
internally by MCX, but maybe it's a bit overkill!
I'm a visual learner too, and this is super helpful! It can be a great
debugging tool for the internals as well; reading through 200 lines of
an AST dump, or printing a graph traversal, is not particularly pleasant.
Sorry, I'm still not sure I get it. What exactly do you mean when you say "the
two prior distributions" and "functions that can return a value; the prior
distribution of this value is the prior predictive distribution"? My
understanding is that the prior predictive distribution would simply be a
joint distribution of the priors which you can use to simulate predictions, or
am I misunderstanding?
The model implicitly defines a joint distribution on the random
variables (say `a` and `b`). `sample_joint` returns prior samples
from this distribution where each sample is a dictionary of values
`{"a": val_a, "b": val_b}`.
The returned value is, by definition, the "predicted" value
of the model. The distribution of this value, when the random variables
are distributed according to their prior distribution, is what I call the
prior predictive distribution.
Yes, maybe my misunderstanding comes from the naming; perhaps it would be better
named otherwise, but I'm not sure what at the moment.
`predictive_sample` ?
This is a great point. I think making the prior predictive and inference stuff
as straightforward as possible is a great goal. Having used NumPyro a bit, I
found the use of their Predictive class for doing these two things quite
confusing, so having this easy and understandable would be a big plus.
Currently to get samples from the prior predictive distribution you
would call `mcx.predict(rng_key, model, *args)`.
To get the posterior predictive distribution you first need to evaluate
the model to use the posterior distribution of the random variables:
`evaluated_model = mcx.evaluate(model, trace)`. You can then use the
same function, `mcx.predict(rng_key, evaluated_model, *args)`, to get
samples from the predictive distribution.
I spent a lot of time thinking about this and I think it is the only way
to have a coherent API. `mcx.predict` returns samples from the
predictive distribution. Between prior and posterior predictive sampling
the difference is the model you sample from: one where random variables
are distributed according to their prior distribution for the former,
one where they are distributed according to their posterior distribution
for the latter.
Another option would be to use multiple dispatch so `mcx.predict`
returns prior predictive samples if `rng_key`, `model`, and the args are
specified, posterior predictive samples if you specify the trace as
well. But it feels a bit magical. Yet another one is to split between
`mcx.prior_predict` and `mcx.posterior_predict`.
In terms of how to call it, I think what you propose could be good; I like the
use of "condition" as a term. But would there be a case where you would be passing
a model to a sampler without conditioning it on data (if the
whole prior predictive stuff is a method that you can call on the model
object)?
That can happen in theory, for instance when people try samplers on
Neal's funnel.
Good then I'll keep asking as I dig in more, very cool stuff thus far! 🎉
Great! I've been working on this pretty much in isolation so questions
and constructive criticism are much appreciated.
-
Hi, I'd be interested in contributing but am fairly new to PPLs and to Bayesian stats (i.e. I have read Statistical Rethinking and worked through a number of the end-of-chapter problems, and have used PyMC3 and NumPyro a bit for personal/work things), so I'm a bit unsure about what I can contribute, but I would see contributing as a great way to dive deeper into these topics and contribute to some OSS. The only issues I see marked as "good first issue" are the call for use cases and examples and the one with tests. I'd be happy to start with the tests.
I'd maybe have a few more questions:
- I've read through /design_notes/mcx_design.md, but I'm wondering where I could start reading a bit more so as to be more useful (e.g. I've never dealt with abstract syntax trees in Python; I guess this would be something very fundamental to read up on). As is often the case when learning new stuff, it's pretty easy to get overwhelmed by what I don't know yet 😄
- In the docs it's mentioned that there is inspiration drawn from PyMC3, TensorFlow Probability, NumPyro etc., but I'm wondering: what exactly would be the value add for MCX, or rather what would it be adding/doing differently from all these other Python PPLs? Is it simply, as stated in /design_notes/mcx_design.md, that it treats the whole model definition part as the program and the sampling procedure as the "compiler", or is there something I've missed?
- Is this the correct way/place to discuss such questions? I've actually never seen the Discussions feature in GitHub 🤷‍♂️
Cheers!