[WIP] Add sampling interface #310
Conversation
I think a simpler approach for adding the sampling methods would be to add a sample element to the list each distribution returns, i.e. list(log_prob = log_prob, cdf = NULL, log_cdf = NULL, sample = sample). That way, all the probability distributions that are already using TFP distribution objects don't need to be changed at all. That handles the implementation of the sampling algorithms. To use those to simulate from a whole model, I'm imagining these sampling methods being called in a new define_random method, something like:

define_random = function(dag) {
  # fetch tensors for the parameters of this distribution
  tf_parameters <- self$tf_fetch_parameters(dag)
  # ensure the parameters are all expanded to have the n_chains
  # batch dimension, if any have it
  tf_parameters <- match_batches(tf_parameters)
  # define a tensor for the random variable
  distrib <- self$tf_distrib(tf_parameters, dag)
  variable <- distrib$sample()
  # assign to the graph in the environment for sampling
  assign(dag$tf_name(self$target, sampling = TRUE),
         variable,
         envir = dag$tf_environment)
}

Then we can run the tensorflow session, using the batch dimension to simultaneously generate random samples for the required variables. I'll have a go at implementing the bare bones of that tomorrow and will share my progress. |
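To make the flow of that define_random sketch concrete, here is a minimal NumPy stand-in (an illustration only: a plain dict plays the role of dag$tf_environment, a normal draw stands in for distrib$sample(), and all names here are hypothetical, not greta internals):

```python
import numpy as np

# Hypothetical sketch of the define_random flow: broadcast the parameters
# to a leading n_chains batch dimension, sample, and assign the result into
# an environment (a plain dict stands in for dag$tf_environment).
rng = np.random.default_rng(0)

def match_batches(params, n_chains):
    # expand each parameter to have the leading batch dimension
    return [np.broadcast_to(np.asarray(p), (n_chains,) + np.shape(p))
            for p in params]

def define_random(environment, tf_name, params, n_chains=4):
    mu, sigma = match_batches(params, n_chains)
    # define the "random tensor" and assign it to the "graph"
    environment[tf_name] = rng.normal(mu, sigma)

env = {}
define_random(env, "sampling_x", [0.0, 1.0])
print(env["sampling_x"].shape)  # (4,)
```

Running the "session" then amounts to looking up the sampled values by name, one batch element per chain.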
Just to make the TF sampling a bit more concrete (for me), below is a hand-coded TensorFlow graph corresponding to a greta model. We'll have the dag create something like this from the greta model. Example greta model:

library(greta)
n <- 10L
y <- rnorm(n)
sd <- lognormal(1, 2)
mu <- normal(0, 1, dim = n)
distribution(y) <- normal(mu, sd)

The type of TF graph the dag will write to sample from this model:

library(tfprobability)
library(tensorflow)
library(reticulate)
# n_samples defines the batch dimension, we'll use it to draw random samples
# from the model prior (we currently use this to run simultaneous chains)
n_samples <- tf$compat$v1$placeholder(dtype = tf$int32)
# define data
scalar_shape <- tf$stack(list(n_samples, 1L))
sd_mu <- tf$ones(shape = scalar_shape)
sd_sig <- tf$ones(shape = scalar_shape) * 2
mn_mu <- tf$zeros(shape = tf$stack(list(n_samples, n)))
mn_sig <- tf$ones(shape = scalar_shape)
# define random tensors for the three variables (sd, mn, y)
sd_dist <- tfp$distributions$LogNormal(sd_mu, sd_sig)
tf_sd <- sd_dist$sample()
mn_dist <- tfp$distributions$Normal(mn_mu, mn_sig)
tf_mn <- mn_dist$sample()
y_dist <- tfp$distributions$Normal(tf_mn, tf_sd)
tf_y <- y_dist$sample()
# set up tf session
sess <- tf$compat$v1$Session()
# get samples of the things we want
targets <- list(y = tf_y, mn = tf_mn, sd = tf_sd)
samples <- sess$run(targets, feed_dict = dict(n_samples = 50L))
str(samples, 1)
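For intuition, the same prior simulation can be sketched in plain NumPy (an illustration, not greta/TFP code; the seed and variable names are mine). The leading batch dimension of size n_samples plays the role of the n_samples placeholder:

```python
import numpy as np

# NumPy analogue of the TF graph above: 50 joint draws from the model prior
# in one vectorised pass, using a leading batch dimension of size n_samples.
rng = np.random.default_rng(42)
n, n_samples = 10, 50

sd = rng.lognormal(mean=1.0, sigma=2.0, size=(n_samples, 1))  # sd ~ lognormal(1, 2)
mu = rng.normal(0.0, 1.0, size=(n_samples, n))                # mu ~ normal(0, 1), dim n
y = rng.normal(mu, sd)                                        # y  ~ normal(mu, sd), broadcast

print(y.shape, mu.shape, sd.shape)  # (50, 10) (50, 10) (50, 1)
```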
|
Right, sounds good! |
I've opened a forum post asking for feedback on the interface to this functionality. |
A couple of thoughts for writing the sampling code for distributions (you may have thought of these already!). |
For (mainly my) reference, here is the list of all greta distributions which do not use TFP distribution objects.
These are the only ones for which we would need to implement errors / custom sampling. Of these, the MVN, Wishart and LKJ distributions are all implemented in TFP, but we don't currently use those implementations. There was a bit of discussion of this here: #272 |
Cool, I'm happy to switch as many distributions as possible over to TFP, so long as they give the same densities (there should be tests for that anyway) and don't incur a major performance hit (I'd expect the opposite anyway). It should be fairly straightforward [1]. As far as I can see, there's no truncation argument for the distributions in TFP, so we would have to write custom sampling code for all the distributions that support truncation. This could call truncated versions of TFP distributions where available, though.

[1]: The only one that might be fiddly to switch over to TFP is uniform. It has a different definition than the others, since handling the transform from the unconstrained free state to the observed scale is tricky if the bounds are variable. The solution is to transform from the free state to (0, 1], then modify the bounds. But that's a bit tricky to do with the current architecture. It could be resolved |
For most of those univariate distributions, we could sample truncated distributions with something like this:

sample <- function() {
  # get the correct shape as a tensor, including the batch dimension
  batch_size <- tf$shape(<parameter_1>)[0]
  target_dim <- self$target$dim
  shape <- tf$stack(c(list(batch_size), as.list(target_dim)))
  # get the truncation bounds of the target
  lower <- tf$constant(self$target$lower, dtype = tf_float())
  upper <- tf$constant(self$target$upper, dtype = tf_float())
  # the distribution
  d <- tfp$distributions$<Distribution>(<parameter_1>, <parameter_2>)
  # transform the bounds to (0, 1], and sample uniforms within these bounds
  u_lower <- d$cdf(lower)
  u_upper <- d$cdf(upper)
  u <- tf$random_uniform(shape, minval = u_lower, maxval = u_upper)
  # transform from (0, 1] back to the target scale
  d$quantile(u)
} |
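The same inverse-CDF scheme can be checked outside TensorFlow. A hedged SciPy sketch (scipy.stats in place of TFP, with ppf as the quantile function; the standard normal and the bounds are arbitrary choices of mine):

```python
import numpy as np
from scipy import stats

# Inverse-CDF sampling of a truncated distribution, as in the sketch above:
# map the truncation bounds through the CDF, draw uniforms between the
# mapped bounds, then map back through the quantile function.
rng = np.random.default_rng(1)

def sample_truncated(dist, lower, upper, shape):
    u_lower = dist.cdf(lower)
    u_upper = dist.cdf(upper)
    u = rng.uniform(u_lower, u_upper, size=shape)
    return dist.ppf(u)

d = stats.norm(loc=0.0, scale=1.0)
draws = sample_truncated(d, lower=1.0, upper=2.0, shape=(1000,))
print(draws.min() >= 1.0, draws.max() <= 2.0)  # True True
```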
The other option that comes to mind is rejection sampling - though it may well be slower. |
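For comparison, the rejection-sampling alternative can be sketched in a few lines of NumPy (an illustration with hypothetical names; as noted, every rejected draw is wasted work, which is why it may well be slower, especially for narrow bounds):

```python
import numpy as np

# Rejection sampling for a truncated distribution: draw from the full
# distribution and keep only the draws that land inside the bounds,
# repeating until enough draws have been accepted.
rng = np.random.default_rng(2)

def rejection_sample(sampler, lower, upper, n):
    accepted = np.empty(0)
    while accepted.size < n:
        draws = sampler(n)
        keep = draws[(draws >= lower) & (draws <= upper)]
        accepted = np.concatenate([accepted, keep])
    return accepted[:n]

draws = rejection_sample(lambda size: rng.normal(0.0, 1.0, size), 1.0, 2.0, 500)
print(draws.size)  # 500
```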
I've made some good progress over on the feature/simulate branch. The one awkward failing test is for running:

x <- normal(0, 1, dim = 3)

Simulating from this currently only returns a scalar value of x per sample, rather than all three. That's because the method that we use for broadcasting in forward mode (create the variable, evaluate the density on it, and let TFP handle the broadcasting) doesn't work with sampling. I think the solution is to work out the number of realisations per parameter (ignoring the batch dimension), pass that number in to sample(), and reshape the result. I think the following illustrates the right approach (dumping it here because I'll forget this otherwise!):

# set up data
mu_raw <- tf$zeros(shape(1, 1, 1))
sd_raw <- tf$ones(shape(1, 1, 1))
# output dim
batch_size <- tf$compat$v1$placeholder(dtype = tf$int32)
node_dim <- c(10L, 2L)
# distribution
dist <- tfp$distributions$Normal(mu_raw, sd_raw)
# get TFP's batch size (different meaning to our batch dim) and convert to dim, ignoring our batch dimension
bs <- dist$batch_shape
dim <- vapply(bs$dims[-1], member, "value", FUN.VALUE = integer(1))
# get ratio with our expected output dimensions
ratio <- as.integer(node_dim / dim)
# sample, with this number of extra samples
draws <- dist$sample(sample_shape = ratio)
# reshape this
out_shape <- tf$stack(c(list(batch_size), as.list(node_dim)))
draws_out <- tf$reshape(draws, shape = out_shape)

Not sure of TF's rules for reshaping and whether these will be going back in the right places (which may matter for multivariate distributions). I'm not likely to have any more time to work on this this week, but hopefully next week! |
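The shape arithmetic above can be checked with a NumPy sketch (illustration only: I mimic TFP's sample(sample_shape = ...) by prepending dimensions to the draw, and whether elements land in the right places after the reshape is exactly the open question raised above):

```python
import numpy as np

# Sketch of the sample_shape/reshape idea: the parameter dim (1, 1) is
# smaller than the target node_dim (10, 2), so draw extra realisations in
# the ratio of the two, then reshape to (batch_size, *node_dim).
rng = np.random.default_rng(3)
batch_size = 4
node_dim = (10, 2)
param_dim = (1, 1)  # the distribution's dim, ignoring our batch dimension

ratio = tuple(n // p for n, p in zip(node_dim, param_dim))  # (10, 2)
# TFP's sample(sample_shape = ratio) prepends these dims to each draw
draws = rng.normal(size=ratio + (batch_size,) + param_dim)
out = draws.reshape((batch_size,) + node_dim)
print(out.shape)  # (4, 10, 2)
```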
I think this discussion is now superseded by the work on the feature/simulate branch.

I mostly ended up generating random samples for each distribution using TFP distributions (e.g. here), but for truncated univariate distributions I had to sample via the (truncated) inverse CDF (e.g. here and here). Occasionally I had to use other methods (e.g. here). There are distributional tests for all of these random sampling methods here.

There were only two distributions I couldn't work out how to sample in TensorFlow: hypergeometric and truncated f. I don't think either is particularly urgent, but it would be nice to have solutions. There is a numpy algorithm for drawing random hypergeometric samples, and probably one for the inverse CDF of the f distribution, so maybe when greta switches over to using eager mode, those could be used. |
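For reference, NumPy's hypergeometric sampler mentioned above looks like this (the parameter values are arbitrary choices of mine):

```python
import numpy as np

# numpy's built-in hypergeometric sampler: each draw is the number of
# "good" items obtained when sampling nsample items without replacement
# from a population of ngood good and nbad bad items.
rng = np.random.default_rng(4)
draws = rng.hypergeometric(ngood=7, nbad=13, nsample=5, size=1000)
print(draws.min() >= 0, draws.max() <= 5)  # True True
```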
Definitely superseded - great work implementing sampling! |
Hi Nick,
This (single-commit) pull request outlines my idea for a sampling interface for distributions in greta.
I worked a bit a while ago on a substantive pull request which added sampling to almost all distributions. Writing the code to sample the distributions was pretty easy but I was very unsure of the best (simplest, most consistent with other greta internals etc.) interface. Some reasons I was confused about this were:
- Confusion about the arguments to tf_distrib, which I thought tf_sample should try to emulate:
  - self$? (I instinctively feel that tf_sample should have no arguments.)
  - dag passed but not used?
- I wasn't sure what sort of internal juggling of tensorflow tensors would be required to sample a full model, and how this might affect the necessary interface.
The commit outlines my best idea but I am happy to receive any other ideas.
Once the interface is decided on I would be very happy to actually go through and implement the sampling for each distribution.
Jeffrey