
[WIP] Add sampling interface #310

Closed · wants to merge 4 commits

Conversation

@jeffreypullin (Contributor) commented Aug 17, 2019

Hi Nick,

This (single commit)^1 pull request outlines my idea for a sampling interface for distributions in greta.

A while ago I worked on a substantive pull request which added sampling to almost all distributions. Writing the code to sample from the distributions was pretty easy, but I was very unsure of the best (simplest, most consistent with other greta internals, etc.) interface. Some reasons I was confused about this were:

  • Confusion about the arguments to tf_distrib, which I thought tf_sample should try to emulate:

    • Why are the parameters passed, not accessed using self$? (I instinctively feel that tf_sample should have no arguments.)
    • What form are the parameters meant to take?
    • Why is dag passed but not used?
  • I wasn't sure what sort of internal juggling of TensorFlow tensors would be required to sample a full model, and how this might affect the necessary interface

The commit outlines my best idea but I am happy to receive any other ideas.

Once the interface is decided on I would be very happy to actually go through and implement the sampling for each distribution.

Jeffrey

  1. Or at least it would be if I knew how to edit commit messages properly

@goldingn (Member) commented Aug 17, 2019

I think a simpler approach for adding the sampling methods would be to add a sample() function to the output of tf_distrib() for those distributions that don't already return a TFP distribution object. E.g. for the current implementation of uniform, the returned list (which stands in for a TFP distribution object) should read:

 list(log_prob = log_prob, cdf = NULL, log_cdf = NULL, sample = sample)

and in the code above, the function sample() should be defined for this distribution alongside log_prob().

That way, all the probability distributions that are already using TFP distribution objects don't need to be changed at all.
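
As a rough sketch (not greta's actual internals — the parameter names min and max and the surrounding plumbing are assumptions), uniform's hand-rolled distribution list might then look like:

tf_distrib = function(parameters, dag) {

  # hypothetical sketch; parameter names are assumed, not greta's real code
  min <- parameters$min
  max <- parameters$max

  # the density is flat between the bounds
  log_prob <- function(x) {
    -tf$math$log(max - min)
  }

  # draw standard uniforms in the parameter shape, then rescale to the bounds
  sample <- function() {
    u <- tf$random_uniform(tf$shape(min), dtype = min$dtype)
    min + u * (max - min)
  }

  list(log_prob = log_prob, cdf = NULL, log_cdf = NULL, sample = sample)
}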


That handles the implementation of sampling algorithms. To use those to simulate from a whole model, I'm imagining an alternative version of dag$define_tf() that defines random tensors for unknown variables, rather than using parameters for random variables. I've done a bit of thinking today about how that would work, and whilst it's not straightforward I think I have a clear plan. I also have a bit of time to throw at this tomorrow.

These sampling methods would be called in a new distribution_node method - similar to distribution_node$tf() to create a tensor for a new random variable, using the values of the parameters (which may also be random variables). That would look something like this:

define_random = function (dag) {

  # fetch tensors for the parameters of this distribution
  tf_parameters <- self$tf_fetch_parameters(dag)

  # ensure the parameters are all expanded to have the n_chains
  # batch dimension, if any of them have it
  tf_parameters <- match_batches(tf_parameters)

  # define a tensor for the random variable
  distrib <- self$tf_distrib(tf_parameters, dag)
  variable <- distrib$sample()
  
  # assign to the graph in the environment for sampling
  assign(dag$tf_name(self$target, sampling = TRUE),
         variable,
         envir = dag$tf_environment)
  
}

Then we can run the tensorflow session, using the batch dimension to simultaneously generate random samples for the required variables.

I'll have a go at implementing the bare bones of that tomorrow and will share my progress.

@goldingn (Member) commented

Just to make the TF sampling a bit more concrete (for me), below is a hand-coded TensorFlow graph corresponding to a greta model. We'll have the dag create something like this from the greta model.

Example greta model:

n <- 10L
y <- rnorm(n)

library(greta)
sd <- lognormal(1, 2)
mu <- normal(0, 1, dim = n)
distribution(y) <- normal(mu, sd)

The type of TF graph the dag will write to sample from this model:

library(tensorflow)
library(tfprobability)
library(reticulate)

# n_samples defines the batch dimension, we'll use it to draw random samples
# from the model prior (we currently use this to run simultaneous chains)
n_samples <- tf$compat$v1$placeholder(dtype = tf$int32)

# define data
scalar_shape <- tf$stack(list(n_samples, 1L))
sd_mu <- tf$ones(shape = scalar_shape)
sd_sig <- tf$ones(shape = scalar_shape) * 2
mn_mu <- tf$zeros(shape = tf$stack(list(n_samples, n)))
mn_sig <- tf$ones(shape = scalar_shape)

# define random tensors for the three variables (sd, mn, y)
sd_dist <- tfp$distributions$LogNormal(sd_mu, sd_sig)
tf_sd <- sd_dist$sample()

mn_dist <- tfp$distributions$Normal(mn_mu, mn_sig)
tf_mn <- mn_dist$sample()

y_dist <- tfp$distributions$Normal(tf_mn, tf_sd)
tf_y <- y_dist$sample()

# set up tf session
sess <- tf$compat$v1$Session()

# get samples of the things we want
targets <- list(y = tf_y, mn = tf_mn, sd = tf_sd)
samples <- sess$run(targets, feed_dict = dict(n_samples = 50))

str(samples, 1)
List of 3
 $ y : num [1:50, 1:10] 0.502 -26.127 3.94 -8.052 -1.409 ...
 $ mn: num [1:50, 1:10] -0.1316 -0.4377 -0.0734 0.1137 0.607 ...
 $ sd: num [1:50, 1] 1.67 29.31 2.36 4.96 1.47 ...

@jeffreypullin (Contributor, Author) commented

Right, sounds good! rgreta (my attempt) takes a somewhat similar approach: going through the dag and changing each distribution/variable node to a data node filled with samples from that distribution.

@goldingn closed this Aug 18, 2019
@goldingn reopened this Aug 18, 2019
@goldingn (Member) commented

I've opened a forum post asking for feedback on the interface to this functionality.

@goldingn (Member) commented Aug 18, 2019

A couple of thoughts on writing the sampling code for distributions. You may have thought of these already!

  1. we need to account for truncation of the distribution (for distributions that support truncation)
  2. we need sampling for the joint and mixture distributions too
    1. for mixture, we need to select randomly, according to the weights, from between the component distributions for each observation-sample (see the sketch after this list)
    2. for joint, they are (a priori) independent, so just sample the component distributions separately and concatenate them
  3. we need to error nicely when sampling isn't implemented for a distribution
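
A minimal sketch of the mixture case (standalone TFP code with hypothetical values, not greta internals): draw a component index per observation-sample from a categorical distribution on the weights, then keep the draw from the chosen component.

library(tensorflow)
library(tfprobability)

# two mixture components and their weights (hypothetical values)
loc   <- tf$constant(c(-2, 2), dtype = tf$float32)
scale <- tf$constant(c(1, 1), dtype = tf$float32)
probs <- tf$constant(c(0.3, 0.7), dtype = tf$float32)

components <- tfp$distributions$Normal(loc, scale)
weights    <- tfp$distributions$Categorical(probs = probs)

# one component index and one draw per component for each observation-sample
n_obs <- 5L
idx   <- weights$sample(n_obs)       # shape [5]
draws <- components$sample(n_obs)    # shape [5, 2]

# keep only the draw from the selected component for each observation
mask  <- tf$one_hot(idx, 2L)                      # shape [5, 2]
mixed <- tf$reduce_sum(draws * mask, axis = 1L)   # shape [5]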

@jeffreypullin (Contributor, Author) commented

For (mainly my) reference, the greta distributions which do not use tfp$distributions$the_distribution are:

  • uniform
  • bernoulli probit
  • binomial probit
  • beta binomial
  • hypergeometric
  • weibull
  • F
  • multivariate normal
  • wishart
  • lkj

So these are the only ones for which we would need to implement errors / custom sampling.

Of these, the MVN, Wishart and LKJ distributions are all implemented in TFP, but we don't currently use those implementations. There was a bit of discussion of this in #272.

@goldingn (Member) commented

Cool, I'm happy to switch as many distributions as possible over to TFP, so long as they give the same densities (there should be tests for that anyway) and don't incur a major performance hit (I'd expect the opposite, if anything). It should be fairly straightforward [1].

As far as I can see, there's no truncation argument for the distributions in TFP, so we would have to write custom sampling code for all the distributions that support truncation. This could call truncated versions of TFP distributions where available, though TruncatedNormal is the only one of those I see.


[1]: The only one that might be fiddly to switch over to TFP is uniform. It has a different definition from the others, since handling the transform from the unconstrained free state to the observed scale is tricky if the bounds are variable. The solution is to transform from the free state to (0, 1], then modify the bounds. But that's a bit tricky to do with the current architecture. It could be resolved

@goldingn (Member) commented

For most of those univariate distributions, we could sample truncated distributions with something like this:

sample <- function () {
  
  # get correct shape as a tensor, including the batch dimension
  batch_size <- tf$shape(<parameter_1>)[0]
  target_dim <- self$target$dim
  shape <- tf$stack(c(list(batch_size), as.list(target_dim)))
  
  # get truncation bounds of the target
  lower <- tf$constant(self$target$lower, dtype = tf_float())
  upper <- tf$constant(self$target$upper, dtype = tf_float())
  
  # the distribution
  d <- tfp$distributions$<Distribution>(<parameter_1>, <parameter_2>)
  
  # transform bounds to (0, 1], and sample uniforms within these bounds
  u_lower <- d$cdf(lower)
  u_upper <- d$cdf(upper)
  u <- tf$random_uniform(shape, minval = u_lower, maxval = u_upper)
  
  # transform from (0, 1] back to target scale
  d$quantile(u)
  
}
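
For concreteness, here is a hypothetical instantiation of that template for a normal truncated to [0, 2], with explicit shapes and dtypes standing in for greta's helpers:

library(tensorflow)
library(tfprobability)

# standard normal, to be truncated to [0, 2]
d <- tfp$distributions$Normal(
  loc   = tf$constant(0, dtype = tf$float32),
  scale = tf$constant(1, dtype = tf$float32)
)

# map the truncation bounds through the CDF
u_lower <- d$cdf(tf$constant(0, dtype = tf$float32))
u_upper <- d$cdf(tf$constant(2, dtype = tf$float32))

# sample uniforms between the transformed bounds, then invert the CDF
u     <- tf$random_uniform(list(5L), minval = u_lower, maxval = u_upper)
draws <- d$quantile(u)  # five draws on the truncated target scale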

@jeffreypullin (Contributor, Author) commented

The other option that comes to mind is rejection sampling (sketched roughly below), though it may well be slower.
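
A minimal illustration of the idea in plain R (not TF or greta code; r_truncated is made up for this sketch): draw from the untruncated distribution and keep only the draws inside the bounds. The cost grows as the probability mass inside the bounds shrinks, which is why it may well be slower than the inverse-CDF approach above.

# rejection sampling from a truncated distribution
r_truncated <- function(n, r_fun, lower, upper) {
  out <- numeric(0)
  while (length(out) < n) {
    draws <- r_fun(n)
    out <- c(out, draws[draws >= lower & draws <= upper])
  }
  out[seq_len(n)]
}

# e.g. a standard normal truncated to [0, 2]
samples <- r_truncated(100, rnorm, 0, 2)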

@goldingn (Member) commented

I've made some good progress over on the feature/simulate branch. simulate() now works with list values (mcmc.list values shouldn't be too hard), and passes most tests. Most of the remaining tests are about sensible error messages.

The one awkward failing test is for running simulate() on models with variables that have different dimensions to the parameters of their distributions, e.g.:

x <- normal(0, 1, dim = 3)

Simulating from this currently only returns a scalar value of x per sample, rather than all three.

That's because the method that we use for broadcasting in forward mode (create the variable, evaluate the density on it, and let TFP handle the broadcasting) doesn't work with sampling. I think the solution is to work out the number of realisations per parameter (ignoring the batch dimension), pass that number in to distribution$sample(), and then reshape the tensor.

I think the following illustrates the right approach (dumping it here because I'll forget this otherwise!):

# set up data
mu_raw <- tf$zeros(shape(1, 1, 1))
sd_raw <- tf$ones(shape(1, 1, 1))

# output dim
batch_size <- tf$compat$v1$placeholder(dtype = tf$int32)
node_dim <- c(10L, 2L)

# distribution (using the raw parameter tensors defined above)
dist <- tfp$distributions$Normal(mu_raw, sd_raw)

# get TFP's batch shape (different meaning to our batch dim) and convert it
# to a dim, ignoring our batch dimension ('member' extracts a field from
# each element, e.g. function(x, name) x[[name]])
bs <- dist$batch_shape
dim <- vapply(bs$dims[-1], member, "value", FUN.VALUE = integer(1))

# get ratio with our expected output dimensions
ratio <- as.integer(node_dim / dim)

# sample, with this number of extra samples
draws <- dist$sample(sample_shape = ratio)

# reshape to the expected output shape, including the batch dimension
out_shape <- tf$stack(c(batch_size, node_dim))
draws_out <- tf$reshape(draws, shape = out_shape)

I'm not sure of TF's rules for reshaping, and whether these values will end up back in the right places (which may matter for multivariate distributions).

I'm not likely to have any more time to work on this this week, but hopefully next week!

@goldingn (Member) commented

I think this discussion is now superseded by the feature/simulate branch I just merged, which I think does everything we want. Thanks for your input here though, it was really helpful for getting my head around the problems!

I mostly ended up generating random samples for each distribution using TFP distributions (e.g. here), but for truncated univariate distributions I had to sample via the (truncated) inverse CDF (e.g. here and here). Occasionally I had to use other methods (e.g. here).

There are distributional tests for all of these random sampling methods here.

There were only two distributions I couldn't work out how to sample in TensorFlow: hypergeometric and truncated F. I don't think either is particularly urgent, but it would be nice to have solutions. There is a numpy algorithm for random hypergeometric samples, and probably one for the inverse CDF of the F distribution, so maybe when greta switches over to using eager mode, those could be used.

@jeffreypullin (Contributor, Author) commented

Definitely superseded - great work implementing sampling!
