WIP: Implement opvi #1694

Merged

merged 60 commits into pymc-devs:master from ferrine:implement_opvi on Mar 15, 2017

Conversation
@ferrine
Member

ferrine commented Jan 22, 2017

This PR suggests a new approach to variational inference via the OPVI framework.

From the abstract of the OPVI paper (https://arxiv.org/abs/1610.09033):

Variational inference is an umbrella term for algorithms which cast Bayesian inference as optimization. Classically, variational inference uses the Kullback-Leibler divergence to define the optimization. Though this divergence has been widely used, the resultant posterior approximation can suffer from undesirable statistical properties. To address this, we reexamine variational inference from its roots as an optimization problem. We use operators, or functions of functions, to design variational objectives. As one example, we design a variational objective with a Langevin-Stein operator. We develop a black box algorithm, operator variational inference (OPVI), for optimizing any operator objective. Importantly, operators enable us to make explicit the statistical and computational tradeoffs for variational inference. We can characterize different properties of variational objectives, such as objectives that admit data subsampling---allowing inference to scale to massive data---as well as objectives that admit variational programs---a rich class of posterior approximations that does not require a tractable density. We illustrate the benefits of OPVI on a mixture model and a generative model of images.
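
In short, where classical VI minimizes KL(q || p), OPVI minimizes a worst-case expectation of an operator applied to test functions. A paraphrased sketch of the objective (see the paper for the precise statement):

    % operator variational objective, paraphrased from arXiv:1610.09033
    \mathcal{L}(q) = \sup_{f \in \mathcal{F}}
        t\left( \mathbb{E}_{z \sim q}\left[ (O^{p,q} f)(z) \right] \right)
    % The KL objective is recovered with
    %   (O^{p,q} f)(z) = \log q(z) - \log p(x, z),
    % while the Langevin-Stein operator
    %   (O^{p} f)(z) = \nabla_z \log p(x, z)^\top f(z) + \nabla^\top f(z)
    % requires no tractable density for q.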

A convenient interface is still an open question; I'm open to suggestions.

@ferrine

Member

ferrine commented Jan 22, 2017

There is no support for AEVB yet; it is going to be implemented soon.

# the ELBO node should yield one value per Monte Carlo sample `i`
elbo = mf.elbo(i)
elbos = theano.function([i], elbo)
self.assertEqual(elbos(1).shape[0], 1)
self.assertEqual(elbos(10).shape[0], 10)

@springcoil

springcoil Jan 22, 2017

Member

Why is this not ready yet?

@ferrine

ferrine Jan 22, 2017

Member

I've copied this file from #1600; the interface is different, so the tests will fail. I'll refactor them later once the interface is settled.

@springcoil

springcoil Jan 22, 2017

Member

Ok I get you :)


@change_flags(compute_test_value='off')
def launch_rng(rng):
    """Helper function for safe launch of rng.

@springcoil

springcoil Jan 22, 2017

Member

Maybe rng needs to be expanded on here in the docstring.

"""
vars_ = [var for var in model.vars if not isinstance(var, pm.model.ObservedRV)]
if any([var.dtype in pm.discrete_types for var in vars_]):
raise ValueError('Model should not include discrete RVs')

@springcoil

springcoil Jan 22, 2017

Member

This is good :)

@ferrine

ferrine Jan 23, 2017

Member

In the paper the authors suggest a way to deal with discrete vars; the approach is described in the appendix. See https://arxiv.org/abs/1610.09033

@ferrine

ferrine Jan 23, 2017

Member

I think it could be implemented in the future.

@springcoil changed the title from "Implement opvi" to "WIP: Implement opvi" on Jan 22, 2017

@springcoil added the WIP label on Jan 22, 2017

@springcoil

Member

springcoil commented Jan 22, 2017

I'll need to review the papers to really get on top of this, which will be difficult. Is anyone else au fait with this? Maybe @taku-y or @ColCarroll or @twiecki

@fonnesbeck

Member

fonnesbeck commented Jan 22, 2017

Thanks for working on this!

def tt_rng():
    """
    Get the package-level random number generator.
    Returns

@twiecki

twiecki Jan 23, 2017

Member

I think we need blank lines in-between.

Det = tt.nlinalg.det(S)
delta = x - mean
k = S.shape[0]
result = k * tt.log(2 * np.pi) + tt.log(Det)

@taku-y

taku-y Jan 23, 2017

Contributor

It would be better to use logdet(S), which is a more numerically stable version of log(det(S)), especially for high-dimensional variables. See #1012.
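
For reference, a sketch of the numerically stable form via the Cholesky factor (my own sketch, not code from this PR or #1012):

import theano.tensor as tt
from theano.tensor.slinalg import cholesky

def stable_logdet(S):
    # for symmetric positive-definite S, det(S) equals the squared
    # product of the diagonal of its Cholesky factor, so
    #   log(det(S)) = 2 * sum(log(diag(cholesky(S))))
    # which stays in log-space and avoids the overflow/underflow of
    # det(S) in high dimensions
    L = cholesky(S)
    return 2.0 * tt.sum(tt.log(tt.nlinalg.diag(L)))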

@ferrine

ferrine Jan 23, 2017

Member

That PR in Theano is not merged yet. Maybe I can copy and paste it into theanof or our math module.

@twiecki

twiecki Feb 17, 2017

Member

#1777 is merged now.


class ObjectiveFunction(object):
    def __init__(self, op, tf):
        self.random = lambda: op.approx.random()

@taku-y

taku-y Jan 23, 2017

Contributor

It would be better to use a def instead of a lambda, per PEP8.
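
i.e., something like this (a sketch of the suggested change; assumes op is stored on the instance):

class ObjectiveFunction(object):
    def __init__(self, op, tf):
        self.op = op
        self.tf = tf

    def random(self):
        # a named method instead of `self.random = lambda: ...` (PEP8 E731)
        return self.op.approx.random()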

    def __call__(self, z):
        return self.obj(z)

    def updates(self, z, obj_optimizer=None, test_optimizer=None, more_params=None):

@taku-y

taku-y Jan 23, 2017

Contributor

updates() looks good to me.

@ferrine

Member

ferrine commented Jan 24, 2017

Rebased

@springcoil

Member

springcoil commented Jan 28, 2017

This looks good. Are there changes that need to be made to Theano for this to work?

        return self.logq(z) - self.logp(z)


class LS(Operator):

@twiecki

twiecki Jan 28, 2017

Member

I suppose this is still not functional?

@ferrine

ferrine Jan 30, 2017

Member

Yes, this does not work yet. I'm on vacation and will resume work in 14 days :)
Feedback is welcome!

@ferrine

Member

ferrine commented Jan 30, 2017

@springcoil no changes to Theano are required. The only serious problem I know of is with the Langevin-Stein operator; I need theoretical help with it, and here I'm trying to resolve it.

@twiecki

Member

twiecki commented Feb 13, 2017

@ferrine I think we should forge ahead here without langevin stein.

@ferrine

Member

ferrine commented Feb 13, 2017

@twiecki I've thought about it. Good idea.

@twiecki

Member

twiecki commented Feb 17, 2017

@ferrine #1777 is merged.

@ferrine

Member

ferrine commented Feb 17, 2017

I've found problems with minibatch training. I can't figure out why they happen.

@ferrine

Member

ferrine commented Feb 17, 2017

Rebased

@ferrine

Member

ferrine commented Feb 25, 2017

Finally

Usage

# assuming neural_network, sigmoid_output, input_var are defined above
with neural_network:
    inference = ADVI()
    # BTW inference.approx is already created
    approx = inference.fit(10000)
    new_input = tt.matrix()
    proba = approx.apply_replacements(
        sigmoid_output,
        deterministic=False,  # it's the default, just for clarity
        more_replacements={input_var: new_input}
    )
    predict_proba = theano.function([new_input], proba)

Features

  • Unified interface for variational methods, with scope for future work
  • Simple and clear interface
  • Custom callbacks while training (add predefined ones?)
  • Replacements for prediction in one line of code
  • Fast sampling from the posterior into an NDArray trace
  • FullRankADVI - captures mutual correlations in the posterior at a computational cost
  • No minibatch pain - use generators in observed nodes, specify total_size, and run inference the regular way (see the sketch after this list)
  • Object-oriented approach - easy access to the parts of the implementation (useful for experiments)
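
A minimal sketch of the minibatch pattern above (data and batches are hypothetical names; the generator yields minibatches while total_size rescales the likelihood to the full dataset):

import numpy as np
import pymc3 as pm

data = np.random.randn(40000, 100)  # hypothetical full dataset

def batches(batch_size=128):
    # endless generator of random minibatches
    while True:
        idx = np.random.randint(0, data.shape[0], size=batch_size)
        yield data[idx]

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=1, shape=100)
    # generator as the observed node + total_size: the likelihood is
    # rescaled as if the full dataset were observed on each iteration
    pm.Normal('obs', mu=mu, sd=1,
              observed=batches(), total_size=data.shape[0])
    approx = ADVI().fit(10000)  # ADVI as in the usage example above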

Benchmarks

I've checked that this works with http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/
and got good results:

  • OPVI ADVI runs 6-8% slower than the previous implementation. I don't think that's a big deal.
  • FullRankADVI worked for that example too. Taking ADVI's speed as 1, the FullRank alternative runs at 0.12 of that. Tracking the ELBO matters a lot here: without tracking the relative speed is 0.12, with it 0.05.

TODO:

  • Add documentation
  • Maybe add some shortcuts

Further ideas

  • Construct an Approximation from a given trace (sample from the trace object directly, applying replacements as usual)
  • Implement Normalizing Flows
  • Implement the Langevin-Stein Operator
  • Implement Approximation transfer (pretrain MeanField and initialize FullRank with it)
  • Implement Structured Variational Inference (https://arxiv.org/abs/1404.4114)
  • Replace the old advi with the new one?? (in a future PR, of course)
  • Support pickling?
  • Other ideas @twiecki, @taku-y, @springcoil?
@ferrine

Member

ferrine commented Feb 26, 2017

What about merging?

@taku-y

Contributor

taku-y commented Feb 27, 2017

It might be better to update example notebooks with the new interface, while commenting out the old one.

    else:
        std = tau**(-1)
    std += eps
    return c - tt.log(tt.abs_(std)) - (x - mean) ** 2 / (2 * std ** 2)

@taku-y

taku-y Feb 28, 2017

Contributor

Shouldn't c be scaled by the number of elements in x, i.e. c -> c * x.ravel().shape[0]?

@ferrine

ferrine Mar 4, 2017

Member

These are elementwise operations, so that's OK here.
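
In other words (a sketch): the logp is computed per element, so the constant enters once per element and summing the result scales it automatically:

logp = log_normal(x, mean, tau=tau)  # elementwise, same shape as x
total = tt.sum(logp)  # contributes c once per element, i.e. c * x.size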

Returns
-------
order, flat_view, view

@taku-y

taku-y Feb 28, 2017

Contributor

This seems to conflict with the code: order, flat_view, view -> flat_view, order, view

@fonnesbeck

Member

fonnesbeck commented Mar 2, 2017

This looks really good. My only recommendation is a tweak to the API to make it more consistent with the rest of the PyMC fitting methods. At the moment, it is identical to the scikit-learn interface, where you instantiate a class and call its fit method. Can we eliminate the separation between the two and have something along the lines of:

with neural_network:
    approx = advi(10000)

this makes it closer to what we do for MCMC. Even better might be something like:

with neural_network:
    approx = approximate(10000, method='ADVI')

which mirrors sample (a verb) with approximate (also a verb).

I think a consistent interface is important, so I'd like to hear what others think. If we like a scikit-learn-like API better, then I'd suggest making changes across the board to accommodate it.

@ferrine

Member

ferrine commented Mar 2, 2017

@fonnesbeck Yes, that's a good idea. I think I'll keep the OOP interface for advanced users (for training a model sequentially) while adding a simple approximate shortcut.
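
Something along these lines, perhaps (a hypothetical sketch; final names and dispatch may differ, and it assumes the inference classes accept a model argument):

def approximate(n, method='ADVI', model=None, **kwargs):
    # hypothetical shortcut mirroring `sample`: pick the inference
    # class by name, then delegate to the OOP interface
    methods = {'ADVI': ADVI, 'FullRankADVI': FullRankADVI}
    inference = methods[method](model=model, **kwargs)
    return inference.fit(n)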

@fonnesbeck

Member

fonnesbeck commented Mar 2, 2017

Also, we probably want to modify one or more of the ADVI examples to use this interface.

@ferrine

Member

ferrine commented Mar 13, 2017

Rebased.

ferrine added some commits Mar 14, 2017

@twiecki merged commit 4a713dc into pymc-devs:master on Mar 15, 2017

2 checks passed

continuous-integration/travis-ci/pr - The Travis CI build passed
coverage/coveralls - Coverage increased (+0.2%) to 85.714%
@twiecki

Member

twiecki commented Mar 15, 2017

Congrats @ferrine, this is a huge one.

@ferrine deleted the ferrine:implement_opvi branch on Mar 15, 2017

@fonnesbeck

Member

fonnesbeck commented Mar 15, 2017

Yes, well done! I will be putting this interface to use right away.

@fonnesbeck

Member

fonnesbeck commented Mar 15, 2017

I don't see a fit function here -- just the method.

Edit: never mind, I see it now.

davidbrochart added a commit to davidbrochart/pymc3 that referenced this pull request Mar 27, 2017

WIP: Implement opvi (pymc-devs#1694)
* migrate useful functions from previous PR

(cherry picked from commit 9f61ab4)

* opvi draft

(cherry picked from commit d0997ff)

* made some test work

(cherry picked from commit b1a87d5)

* refactored approximation to support aevb (without test)

* refactor opvi

delete unnecessary methods from operator, change method order

* change log_q_local computation

* add full rank approximation

* add more_params argument to ObjectiveFunction.updates (aevb case)

* refactor density computation in full rank approximation

* typo: cast dict values to list

* typo: cast dict values to list

* typo: undefined T in dist_math

* refactor gradient scaling as suggested in approximateinference.org/accepted/RoederEtAl2016.pdf

* implement Langevin-Stein (LS) operator

* fix docstring

* add blank line in docs

* refactor ObjectiveFunction

* add not working LS Op test

* experiments with not working LS Op

* change activations

* refactor networks

* add step_function

* remove Langevin Stein, done refactoring

* remove Langevin Stein, done refactoring

* change optimizers

* refactor init params

* implement tests

* implement Inference

* code style

* test fix

* add minibatch test (fails now)

* add more tests for minibatch training

* add logdet to FullRank approximation

* add conversion of arrays to floatX

* tiny changes

* change number of iterations

* fix test and pylint check

* memoize functions in Objective function

* Optimize code a lot

* a bit more efficient pickling

* add docs

* Add MeanField -> FullRank parameter transfer

* refactor MeanField and FullRank a bit

* fix FullRank bug with shapes in random

* refactor Model.flatten (CC @taku-y)

* add `approximate` to inference

* rename approximate->fit

* change abbreviations

* Fix bug with scaling input variable in aevb

* fix theano bottleneck in graph

* more efficient scaling for local vars

* fix typo in local Q

* add aevb test

* refactor memoize to work with my objects

* add tests for numpy view usage

* pickle-hash fix

* pickle-hash fix again

* add node sampling + make up some code

* add notebook with example

* sample_proba explained

twiecki added a commit that referenced this pull request Mar 27, 2017

Added live_traceplot function (#1934)
* Added live_traceplot function

* Cosmetic change

* Changed the API to pm.sample(..., live_plot=True)

* Don't include `-np.inf` in calculating average ELBO (#1880)

* Adds an infmean for advi reporting

* fixing typo

* Add tutorial to detect sampling problems (#1866)

* Expand sampler-stats.ipynb example

include model diagnose from case study example in Stan http://mc-stan.org/documentation/case-studies/divergences_and_bias.html

* Sampler Diagnose for NUTS

* descriptive annotation and axis labels

* Fix typos

* PEP8 styling

* minor updates

1, add example to examples.rst
2, original content in Markdown code block

* Make install scripts idempotent (#1879)

* DOC Change heading names.

* Add examples of censored data models (#1870)

* Raise TypeError on non-data values of observed (#1872)

* Raise TypeError on non-data values of observed

* Added check for observed TypeError

* Make exponential mode have the correct shape

* Fix support of LKJCorr

* Added tutorial notebook on updating priors

* Fixed y-axis bug in forestplot; added transform argument to summary

* Style cleanup

* Made small changes and executed the notebook

* Added probit and invprobit functions

* Added carriage return to end of file

* Fixed indentation

* Changed probit test to use assert_allclose

* Fix tests for LKJCorr

* Added warning for ignoring init arguments in sample

* Kill stray tab

* Improve performance of transformations

* DOC Add new features

* Bump version.

* Added docs and scripts to MANIFEST

* WIP: Implement opvi (#1694)

* Revert "small fix for multivariate mixture models"

* Added message about init only working with auto-assigned step methods

* doc(DiagInferDiv): formatting fix in blog post quote. Closes #1895. (#1909)

* delete unnecessary text and add some benchmarks (#1901)

* Add LKJCholeskyCov

* Added newline to MANIFEST

* Replaced package list with find_packages in setup.py; removed examples/data/__init__.py

* Fix log jacobian in LKJCholeskyCov

* Updated version to rc2

* Fixed stray version string

* Fix indexing traces with steps greater one

* refactor variational module, add histogram approximation (#1904)

* refactor module, add histogram

* add more tests

* refactor some code concerning AEVB histogram

* fix test for histogram

* use mean as deterministic point in Histogram

* remove unused import

* change names of shortcuts

* add names to shared params

* add new line at the end of `approximations.py`

* Add documentation for LKJCholeskyCov

* SVGD problems (#1916)

* fix some svgd problems

* switch -> ifelse

* except in record

* Histogram docs (#1914)

* add docs

* delete redundant code

* add usage example

* remove unused import

* Add expand_packed_triangular

* improve aesthetics

* Bump theano to 0.9.0rc4 (#1921)

* Add tests for LKJCholeskyCov

* Histogram: use only free RVs from trace (#1926)

* use only free RVs from trace

* use memoize in Histogram.histogram_logp

* Change tests for histogram

* Bump theano to be at least 0.9.0

* small fix to prevent a TypeError with the ufunc true_divide

* Fix tests for py2

* Add floatX wrappers in test_advi

* Changed the API to pm.sample(..., live_plot=True)

* Better formatting