We set out to solve the problem of unsupervised learning of a probabilistic model via real-valued non-volume preserving (rNVP) transformations. By the change-of-variables formula for functions of random variables, one random variable can be transformed into another as long as the determinant of the Jacobian of the transformation is computable. For many transformations, this determinant is either impossible to obtain analytically or very expensive to compute numerically; that is where rNVP comes in. rNVP transformations are invertible, stable mappings between a data distribution and a base distribution (a multivariate normal) whose Jacobian determinants are cheap to compute. The result is a generative model that allows exact and efficient log-likelihood evaluation, sampling, and inference for the learned distribution. Using a Gaussian base distribution also makes the outcome of the learning process easier to interpret.
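For reference, the key formulas (following the Real NVP paper) are the change-of-variables identity and the affine coupling transform. Here x is a data point, y = f(x) its image under one coupling layer, p_Z the base density, and the first d of D dimensions are left unchanged:

```latex
p_X(x) = p_Z\!\big(f(x)\big)\,
         \left|\det\frac{\partial f(x)}{\partial x}\right|,
\qquad
\begin{aligned}
y_{1:d}   &= x_{1:d},\\
y_{d+1:D} &= x_{d+1:D} \odot \exp\!\big(s(x_{1:d})\big) + t(x_{1:d}),
\end{aligned}
\qquad
\log\left|\det\frac{\partial y}{\partial x}\right| = \sum_{j} s(x_{1:d})_{j}.
```

Because the Jacobian of a coupling layer is triangular, its log-determinant is just the sum of the outputs of the scale network s, which is what makes the likelihood tractable.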
We used two main papers as the basis for our work. The first, “Maximum Entropy Flow Networks”, builds a network that learns to recreate distributions while maximizing entropy, and applies the model to risk-neutral asset pricing for options as well as to texture generation. The second, “Density estimation using Real NVP”, uses rNVP transformations to generate images (e.g. faces of celebrities, animals, and bedrooms). We first wanted to test our model on toy distributions to show that the approach works, and then attempted to test its image-generation capabilities on the MNIST dataset that we had previously encountered in class. With further work, transformations learned with rNVP can be applied to arbitrarily complex distributions; for example, we could generate high-dimensional neural data in a biological context by learning the underlying distribution of firing rates in a network of neurons.
Both our scale (s) network and our translation (t) network are fully connected, with two hidden LeakyReLU layers of size 512. The final layer of t is linear, while the final layer of s uses a tanh activation. We stacked six such coupling blocks, each with its own s and t networks and with the transformed half of the dimensions alternating between blocks, and optimized the models using Adam with a learning rate of 0.0001 for the moons data and 0.001 for images. A minimal sketch of this architecture is given below.
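As an illustration only, here is a minimal PyTorch sketch consistent with the description above, specialized to the two-dimensional moons case; the class and variable names are our own, and details such as masking, device handling, and the image-specific setup are omitted:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One rNVP coupling block: one half of the dimensions passes through
    unchanged and conditions the scale (s) and translation (t) applied to
    the other half. Assumes an even input dimension."""
    def __init__(self, dim, hidden=512, flip=False):
        super().__init__()
        self.flip = flip
        half = dim // 2

        def mlp(final_act):
            layers = [nn.Linear(half, hidden), nn.LeakyReLU(),
                      nn.Linear(hidden, hidden), nn.LeakyReLU(),
                      nn.Linear(hidden, half)]
            if final_act is not None:
                layers.append(final_act)
            return nn.Sequential(*layers)

        self.s = mlp(nn.Tanh())   # scale network ends in tanh
        self.t = mlp(None)        # translation network ends linear

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        if self.flip:                 # alternate which half is transformed
            x1, x2 = x2, x1
        s, t = self.s(x1), self.t(x1)
        y2 = x2 * torch.exp(s) + t    # affine transform of the second half
        log_det = s.sum(dim=1)        # triangular Jacobian: log|det J| = sum(s)
        y1 = x1
        if self.flip:                 # restore the original ordering
            y1, y2 = y2, y1
        return torch.cat([y1, y2], dim=1), log_det

class RealNVP(nn.Module):
    def __init__(self, dim, n_blocks=6, hidden=512):
        super().__init__()
        self.blocks = nn.ModuleList(
            [AffineCoupling(dim, hidden, flip=(i % 2 == 1)) for i in range(n_blocks)])
        self.base = torch.distributions.MultivariateNormal(
            torch.zeros(dim), torch.eye(dim))

    def log_prob(self, x):
        log_det = torch.zeros(x.shape[0])
        for block in self.blocks:
            x, ld = block(x)
            log_det = log_det + ld
        return self.base.log_prob(x) + log_det

model = RealNVP(dim=2)   # the two-moons toy data lives in 2 dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Training step: minimize the negative log-likelihood of a batch.
# loss = -model.log_prob(batch).mean(); loss.backward(); optimizer.step()
```

Training simply maximizes the exact log-likelihood of the data under the flow; sampling runs the learned blocks in reverse, starting from draws of the multivariate normal base distribution.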