Neural network layers #16
Conversation
I believe `path_length` is still in use for historical reasons; however, it makes more sense to reason in terms of the number of integration steps, and it also reduces the cost (no dynamic computation of the number of integration steps and no cast to int). I thus replaced every mention of `path_length` in the HMC proposal and program with `num_integration_steps`.
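To make the difference concrete, here is a minimal sketch; the names `step_size` and `path_length` are used as generic HMC parameters, not a reference to mcx's actual signatures:

```python
import math

step_size = 0.1

# Parameterizing by path length: the step count has to be derived at run time,
# which means a division and a cast to int inside the proposal.
path_length = 1.0
num_integration_steps = int(math.ceil(path_length / step_size))

# Parameterizing by the number of steps: the value is an integer from the start.
num_integration_steps = 10
```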
The boundary between programs (sampling algorithms) and runtimes (executors) was not very clear. I removed every dependence of the program on the model, so responsibilities are now clear, and most of the initialization has been transferred to the runtime. I also improved the performance of initial state creation.
@rlouf I'm not sure if this helps, but would a blog post I wrote on shapes be useful to you? Just a thought, no pressure to read it.
@ericmjl Thank you for the link, I did read your post before implementing distributions. It was really helpful to dive into TFP's shape system! Is there anything in particular you think I might have missed that could help me?
@rlouf thank you for the kind words! I think (but I'm not 100% sure) maybe working backwards from the desired semantics might be helpful? Personally, when I think of Gaussian priors on a neural network's weights, I tend to think of them as being the "same" prior (e.g. …). I think I might still be unclear, so let me attempt an example that has contrasts in there. Given the following NN:

```python
@mcx.model
def mnist(image):
    nn ~ ml.Serial(
        dense(400, Normal(0, 1)),
        dense(400, Normal(3, 4)),
        dense(10, Normal(-2, 7)),
        softmax(),
    )
    p = nn(image)
    cat ~ Categorical(p)
    return cat
```

I would read it as:
I think the suggestion I have here matches your 2nd option exactly:
You don't have to accept the exact suggestion, but maybe implementing it one way first and then trying it out might illuminate whether it's good or not? In reimplementing an RNN, I did the layers in an opinionated, "my-way" fashion first, then realized it'd be easier and more compatible to just go stax-style.
Interesting feedback, thank you for taking the time to explain! The NN API is indeed a bit tricky to get right the first time. I am currently leaning towards what you're proposing. Would you agree with simply broadcasting the parameters' shape with the layer's shape to obtain the batch shape of the prior? This way it is also compatible with crazy specs, like a different variance for each layer weight.
Yes, I would! It sounds like a sensible default to have.
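A quick way to see what this default gives for the two kinds of specs mentioned above; the `(784, 400)` weight shape is only an assumed example for a `dense(400)` layer on flattened MNIST images:

```python
import numpy as np

layer_shape = (784, 400)  # assumed weight shape of a dense(400) layer on MNIST

# Scalar parameters, e.g. Normal(0, 1): the prior broadcasts to one i.i.d. draw per weight.
print(np.broadcast_shapes((), layer_shape))           # (784, 400)

# A "crazy" spec with a different variance per weight broadcasts just as well.
print(np.broadcast_shapes((784, 400), layer_shape))   # (784, 400)

# Incompatible parameter shapes are rejected by the same rule:
# np.broadcast_shapes((3,), layer_shape)  -> ValueError
```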
Thank you for your insights! It feels good to have someone else's opinion. Was your RNN project Bayesian? If so, is the code available somewhere?
The RNN wasn't Bayesian, and it was mostly a re-implementation of the original, but done in JAX. Given that it's written stax-style, I'm sure it shouldn't be too hard to extend it to mcx 😄. You can find the repo here, and we have a mini-writeup available too.
To keep a simple API when building a Bayesian Neural Network, we don't want to have to specify the batch shape of the prior distribution so that it matches the layer size. Therefore we add a helper function that re-broadcasts a distribution to a destination shape (here the layer size) so it can be used in the neural network internals.
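A minimal sketch of what such a helper could look like, with a toy `Normal` standing in for mcx's distribution; the class, the function name, and the numpy backend are all assumptions for illustration:

```python
import numpy as np


class Normal:
    """Toy stand-in for a Normal distribution; the API is an assumption."""

    def __init__(self, mu, sigma):
        self.mu, self.sigma = np.asarray(mu), np.asarray(sigma)

    def sample(self, rng):
        # Generator.normal broadcasts loc and scale, one draw per element.
        return rng.normal(self.mu, self.sigma)


def broadcast_distribution(dist, shape):
    """Re-broadcast a distribution's parameters to `shape` (e.g. the layer size)."""
    return Normal(np.broadcast_to(dist.mu, shape), np.broadcast_to(dist.sigma, shape))


rng = np.random.default_rng(0)
layer_prior = broadcast_distribution(Normal(0.0, 1.0), (784, 400))
weights = layer_prior.sample(rng)  # one independent draw per weight, shape (784, 400)
```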
Closing for now; the relevant info is in the discussions.
I open this PR to start thinking about the design of Bayesian neural network layers. The idea is to subclass trax's constructs and allow the use of distributions for the weights and for transformations of the weights.
The goal is to be able to take any model expressed with `trax` and make it Bayesian by adding prior distributions on the weights.
Of course, we should be able to construct hierarchical models by adding hyperpriors on the priors' parameters.
Layers are distributions over functions; let us see what it could look like on a naive MNIST example:
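Something along these lines, reusing the `ml.Serial`, `dense`, and `softmax` combinators from the discussion above; the exact layer sizes are an assumption:

```python
@mcx.model
def mnist(image):
    nn ~ ml.Serial(
        dense(400, Normal(0, 1)),
        dense(400, Normal(0, 1)),
        dense(10, Normal(0, 1)),
        softmax(),
    )
    p = nn(image)
    cat ~ Categorical(p)
    return cat
```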
The above snippet is naive in the sense that the way `Normal(0, 1)` relates to each weight in the layer is not very clear; we need to specify broadcasting rules for the Bayesian layers. We should also be able to easily define hierarchical models:
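For instance, a hyperprior on the scale of the weight priors could read as follows; the `HalfNormal` hyperprior and the layer sizes are only illustrative:

```python
@mcx.model
def mnist(image):
    sigma ~ HalfNormal(1)          # hyperprior shared by every weight prior
    nn ~ ml.Serial(
        dense(400, Normal(0, sigma)),
        dense(10, Normal(0, sigma)),
        softmax(),
    )
    p = nn(image)
    cat ~ Categorical(p)
    return cat
```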
Forward sampling
Let's look now at the design of the forward sampler. We need to return forward samples of the layer's weights as well as of the other random variables. We could define a `sample` method that draws a realization of each layer and performs a forward pass with the drawn weights, where `weights` is a tuple that contains the realized value of every weight. This would keep an API similar to the distributions', with the added `output` return value that reflects the fact that we are sampling a function. Another option is to draw the weights and run the forward pass in two separate calls, which feels less magical.
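A toy sketch of the first option, just to make the calling convention concrete; the `BayesianDense` class and its `sample` signature are illustrative assumptions, not mcx's or trax's API:

```python
import jax
import jax.numpy as jnp


class BayesianDense:
    """Toy dense layer whose weights carry a Normal prior (names are assumptions)."""

    def __init__(self, in_dim, out_dim, mu=0.0, sigma=1.0):
        # Broadcast the scalar prior parameters to the layer's weight shape.
        self.mu = jnp.broadcast_to(mu, (in_dim, out_dim))
        self.sigma = jnp.broadcast_to(sigma, (in_dim, out_dim))

    def sample(self, rng_key, inputs):
        # Draw one realization of the weights from the prior...
        w = self.mu + self.sigma * jax.random.normal(rng_key, self.mu.shape)
        # ...and perform the forward pass with the drawn weights.
        output = inputs @ w
        return output, (w,)


layer = BayesianDense(784, 400)
rng_key = jax.random.PRNGKey(0)
output, weights = layer.sample(rng_key, jnp.ones((1, 784)))
print(output.shape, weights[0].shape)  # (1, 400) (784, 400)
```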
Log-probability density function
Note: the `__call__` method of the layers calls the `pure_fn` method, which is jit-able. I am not sure it is necessary to call it directly here.
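As a rough sketch of what a layer's contribution to the model's log-probability could reduce to under independent Normal priors on the weights (ignoring any transformation of the weights; the function names are placeholders):

```python
import jax.numpy as jnp


def normal_logpdf(w, mu, sigma):
    """Elementwise log-density of Normal(mu, sigma) evaluated at w."""
    return -0.5 * jnp.log(2 * jnp.pi * sigma ** 2) - 0.5 * ((w - mu) / sigma) ** 2


def layer_logpdf(weights, mu, sigma):
    # With an independent prior on each weight, the layer's log-probability
    # is the sum of the elementwise log-densities over the weight tensor.
    return jnp.sum(normal_logpdf(weights, mu, sigma))


# Example: log-probability of a (784, 400) weight draw under Normal(0, 1) priors.
w = jnp.zeros((784, 400))
print(layer_logpdf(w, 0.0, 1.0))
```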