
Stochastic Gradient Fisher Scoring #1977

Merged
merged 11 commits into pymc-devs:master from shkr:stgbs on Jul 18, 2017

Conversation

8 participants
@shkr
Contributor

shkr commented Mar 31, 2017

Issue #1958

@shkr shkr changed the title [WIP] Stochastic Gradient Fisher Scoring #1958 #1958 [WIP] Stochastic Gradient Fisher Scoring Mar 31, 2017

@shkr shkr changed the title #1958 [WIP] Stochastic Gradient Fisher Scoring [WIP] Stochastic Gradient Fisher Scoring Mar 31, 2017

@ColCarroll
Member

ColCarroll left a comment

Made a few nitpicks to this before realizing it is a WIP! Looks like a great start - happy to keep looking when it is done.

If you want to hook this up to existing test machinery, check out TestStepMethods.test_step_continuous in tests/test_step.py. It does some sanity checking of recovering the mean and variance in a model. You can then run just that test with py.test -xv tests/test_step.py::TestStepMethods::test_step_continuous.

"""
default_blocked = False

def __init__(self, vars=None, batch_size=None, n_iter=None, total_size=None,


@ColCarroll

ColCarroll Apr 1, 2017

Member

need to add minibatches to signature here

-------
vars : list
    List of variables for sampler
total_size : int


@ColCarroll

ColCarroll Apr 1, 2017

Member

can you rearrange the docstring to match the order of the arguments?


@shkr

shkr Apr 3, 2017

Author Contributor

Sure. I will fix that.

shared = pm.make_shared_replacements(vars, model)
super(BaseStochasticGradient, self).__init__(vars, shared)

self.batch_size = batch_size or 128


@ColCarroll

ColCarroll Apr 1, 2017

Member

can you put these defaults in the signature instead?

def __init__(self, vars=None, batch_size=128, n_iter=100, ...):
    self.batch_size = batch_size
    self.n_iter = n_iter
    ...


@shkr

shkr Apr 3, 2017

Author Contributor

Ok



def _initialize_values(self):
    raise ValueError('NotImplemented')


@ColCarroll

ColCarroll Apr 1, 2017

Member

NotImplementedError
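
(For reference, a minimal sketch of the suggested change; the final code may differ:)

def _initialize_values(self):
    # Abstract hook; concrete subclasses such as SGFS are expected to override this.
    raise NotImplementedError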


@shkr

shkr Apr 3, 2017

Author Contributor

Using it now

return self.model.fn(tt.add(*terms))


def dlogp_elemwise(self):


@ColCarroll

ColCarroll Apr 1, 2017

Member

this is defined twice

@shkr shkr force-pushed the shkr:stgbs branch 3 times, most recently from a5bac12 to 5f92399 Apr 3, 2017

@fonnesbeck

Member

fonnesbeck commented Apr 4, 2017

I'd suggest a shorter, abbreviated filename, such as sgfs.py; fisher_scoring.py would work as well.

self.step_size = step_size
self.burn_in = burn_in


@twiecki

twiecki Apr 6, 2017

Member

these shouldn't turn up, best to install a plugin in your editor that automatically deletes these.

@@ -230,15 +222,15 @@ def astep(self, ps0):
g = dlogpt(q_params)

# 5. Calculate mean dlogp
avg_g = (1. / n) * (np.sum(g, axis=0))
avg_g = (1./n)*(np.sum(g, axis=0))


@twiecki

twiecki Apr 6, 2017

Member

these were correct before, see pep8.


@shkr

shkr Apr 9, 2017

Author Contributor

Okay. I will use the yapf tool with the pep8 style to fix the formatting.

return samples


class StochasticGradientFisherScoring(BaseStochasticGradient):


@twiecki

twiecki Apr 8, 2017

Member

For better or worse, we usually just abbreviate -> SGFS.

@staticmethod
def competence(var):
    if var.dtype in continuous_types:
        return Competence.IDEAL


@twiecki

twiecki Apr 8, 2017

Member

Should down-grade from IDEAL.


@shkr

shkr Apr 10, 2017

Author Contributor

Should I downgrade it to COMPATIBLE? What difference will it cause at runtime?


@twiecki

twiecki Apr 10, 2017

Member

Yes, otherwise it might get automatically chosen for continuous RVs.
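
(A minimal sketch of the downgraded competence method, assuming COMPATIBLE is the intended level; not necessarily the exact merged code:)

@staticmethod
def competence(var):
    # COMPATIBLE instead of IDEAL, so SGFS is never chosen automatically
    # for continuous variables; users have to request it explicitly.
    if var.dtype in continuous_types:
        return Competence.COMPATIBLE
    return Competence.INCOMPATIBLE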

@shkr shkr changed the title [WIP] Stochastic Gradient Fisher Scoring WIP: Stochastic Gradient Fisher Scoring Apr 10, 2017

@shkr shkr force-pushed the shkr:stgbs branch 2 times, most recently from 5c7210a to 162c50b Apr 10, 2017

with pm.Model() as model:
    a = pm.Normal('a', mu=0, sd=1.0)
    b = pm.Normal('b', mu=1.0, sd=1.0)
    c = pm.Normal('c', mu=a + b, sd=1.0, observed=cT)


@twiecki

twiecki Apr 10, 2017

Member

We should definitely port this to @ferrine's new pm.generator which would obsolete most kwargs below. @ferrine could you advise on how to best use it here?


@ferrine

ferrine Apr 10, 2017

Member

I consider the following interface very flexible: let the user choose freely which minibatch interface to use. There are four options.

  1. Set total_size=N for observed and local vars
  2. Set a minibatch generator via pm.generator. It is very easy and requires minimal code, but corner cases sometimes happen. See Theano/Theano#5809
  3. Use a shared variable in advance for data storage and define a callback that changes the storage. Requires the callbacks kwarg. It is also useful for getting custom statistics from the process; see this notebook for an example.
  4. Use the more_replacements kwarg. Usually it replaces the input with pm.generator to avoid Theano/Theano#5809

I hope in the future we'll have a consistent interface using this paradigm for scalability.
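
(A rough sketch of options 1 and 2; the names X, N and batch_size are illustrative, not taken from the PR:)

import numpy as np
import pymc3 as pm

N, batch_size = 10000, 100
X = np.random.randn(N)

# Option 1: pass total_size so the minibatch logp is rescaled to the full dataset
with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=10.)
    pm.Normal('obs', mu=mu, sd=1., observed=X[:batch_size], total_size=N)

# Option 2: wrap an infinite minibatch generator with pm.generator
def batches():
    while True:
        idx = np.random.randint(0, N, size=batch_size)
        yield X[idx]

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=10.)
    pm.Normal('obs', mu=mu, sd=1., observed=pm.generator(batches()), total_size=N)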


@shkr

shkr Apr 13, 2017

Author Contributor

@ferrine Are there any changes you would suggest to the way I have used the generator in the test file now?

@twiecki

Member

twiecki commented Apr 10, 2017

This would also profit from an example NB with a more complex / larger model, maybe this one: http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/ (just the last one, with convolutions).

@shkr shkr changed the title WIP: Stochastic Gradient Fisher Scoring Stochastic Gradient Fisher Scoring Apr 10, 2017

@shkr shkr force-pushed the shkr:stgbs branch 2 times, most recently from d55f6ac to 3adec9c Apr 10, 2017

g = dlogpt

#f = theano.function(inputs=input_tensors, outputs=g)
#vars_value = self.get_vars_value()


@twiecki

twiecki Apr 10, 2017

Member

can we remove this?

@shkr

Contributor Author

shkr commented Apr 10, 2017

@twiecki I will add a notebook in the PR, using the problem discussed in the blogpost http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/ (just the last one with convolutions). Thanks for the recommendation.

@shkr shkr force-pushed the shkr:stgbs branch 2 times, most recently from 44fcde7 to 480a938 Apr 11, 2017

@shkr

Contributor Author

shkr commented Apr 13, 2017

@twiecki @ColCarroll @ferrine I have moved the sgmcmc file. I had to spend time with the pymc3 library to understand how everything is wired together. I think the code uses the existing APIs and design to the best of my judgement. There is a bug in SGFS.astep that I am trying to fix, which is making the results wonky. I think that by adding the notebook @twiecki suggested I will be able to diagnose that issue.

This is my first PR, so the test that I wrote might fall a little short of expectations too. Please advise if I should change it.

@shkr shkr force-pushed the shkr:stgbs branch from d0928fc to 5c43ae4 Apr 13, 2017

@shkr

Contributor Author

shkr commented Apr 13, 2017

Okay. I saw @ColCarroll's earlier comment on testing:

If you want to hook this up to existing test machinery, check out TestStepMethods.test_step_continuous in tests/test_step.py. It does some sanity checking of recovering the mean and variance in a model. You can then run just that test with py.test -xv tests/test_step.py::TestStepMethods::test_step_continuous.

I will add that

obs = pm.Normal('obs', mu=mu, sd=sd, observed=batch_generator)
trace = pm.sample(
    draws=int(total_size / batch_size),
    step=SGFS(vars=model.vars, model=model, total_size=total_size))


@fonnesbeck

fonnesbeck Apr 14, 2017

Member

Does model need to be passed if it is inside the context manager?
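
(For reference: pymc3 step methods typically resolve the model from the context stack via modelcontext, so the explicit model= argument should indeed be redundant inside the with block. A sketch, not the code under review:)

from pymc3.model import modelcontext

def __init__(self, vars=None, model=None, **kwargs):
    # Falls back to the enclosing `with pm.Model():` context when model is None
    model = modelcontext(model)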

@ferrine

Member

ferrine commented Apr 15, 2017

I have a good find:
https://papers.nips.cc/paper/6293-variance-reduction-in-stochastic-gradient-langevin-dynamics.pdf
Maybe it can serve as an idea for further extensions.

@shkr shkr force-pushed the shkr:stgbs branch from 7208693 to 54f2e09 Apr 16, 2017

from ...model import modelcontext, inputvars
from .arraystep import Competence, ArrayStepShared
from ..vartypes import continuous_types
from ..model import modelcontext, inputvars
import theano.tensor as tt
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams


@ferrine

ferrine Apr 16, 2017

Member

we have pm.theanof.tt_rng() as a package-level random generator
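
(i.e., something along these lines; the size argument and variable names are illustrative:)

from pymc3.theanof import tt_rng

# reuse the package-level MRG_RandomStreams instead of instantiating a new one
noise = tt_rng().normal(avg=0.0, std=1.0, size=(batch_size,))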

tt_logp_elemwise = tt.as_tensor_variable(
    [v.logp_elemwiset for v in model.observed_RVs]).sum(axis=0)
terms = [
    v.logp_elemwiset for v in model.basic_RVs if var in inputs([v.logpt])


@ferrine

ferrine Apr 16, 2017

Member

What is this check for?

terms = [
    v.logp_elemwiset for v in model.basic_RVs if var in inputs([v.logpt])
]
elemwise_logp = tt.add(*terms)


@ferrine

ferrine Apr 16, 2017

Member

This will fail if the observed vars have different shapes.

@shkr

Contributor Author

shkr commented Jun 27, 2017

[plot] Preconditioning matrix = Identity

[plot] Preconditioning matrix proportional to Fisher Information

@twiecki

Member

twiecki commented Jun 27, 2017

The data is different in both runs, try setting the seed.

@shkr

Contributor Author

shkr commented Jun 27, 2017

Oops. My bad. Here is the difference after fixing the seed:

[plot comparing the two runs with the seed fixed]

@shkr

Contributor Author

shkr commented Jun 28, 2017

@twiecki Oops, that is my misunderstanding. The only requirement on the B matrix is that it be positive definite. There is no clear reason why making B proportional to I_t should outperform.

The large variance required in directions of low curvature when exploring the probability space is provided by the preconditioning matrix C, which is defined as:

[equation screenshot defining the preconditioning matrix C]

The constant matrix B provides constant support to the mixing behaviour of the algorithm:

[equation screenshot for the constant matrix B]

However, it is recommended in the paper and it does simplify the equation for the variance. That is my understanding.

Reference - https://www.slideshare.net/hustwj/austerity-in-mcmc-land-cutting-the-computational-budget/32

@shkr

Contributor Author

shkr commented Jun 29, 2017

@twiecki The unit test is included in the PR now.

@twiecki

Member

twiecki commented Jul 12, 2017

@shkr Now with 3.1 out the door I want to get back to this. Is the PR ready from your end?

@shkr

Contributor Author

shkr commented Jul 12, 2017

Yup

@shkr

Contributor Author

shkr commented Jul 12, 2017

There is a timeout issue with the Travis build, which is why it is failing.

@aseyboldt

Member

aseyboldt commented Jul 12, 2017

I restarted those builds, the timeout doesn't seem to be related to the PR.

@shkr

Contributor Author

shkr commented Jul 12, 2017

Thanks @aseyboldt, the Travis build succeeded!

@junpenglao

Member

junpenglao commented Jul 13, 2017

Looks nice and I am excited to try it! Just a small nitpick: should we put an EXPERIMENTAL_WARNING similar to the one in SMC?

@twiecki

Member

twiecki commented Jul 13, 2017

Yes, that's a good point.

@shkr

Contributor Author

shkr commented Jul 13, 2017

Sure. I will add the EXPERIMENTAL_TAG
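
(A minimal sketch of what the warning could look like, modelled on the SMC one; the exact message and placement are assumptions:)

import warnings

EXPERIMENTAL_WARNING = (
    "Warning: Stochastic Gradient based sampling methods are experimental step methods "
    "and not yet recommended for use in PyMC3!"
)

class BaseStochasticGradient(ArrayStepShared):
    def __init__(self, *args, **kwargs):
        warnings.warn(EXPERIMENTAL_WARNING)
        ...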

@junpenglao

Member

junpenglao commented Jul 13, 2017

LGTM

@ferrine

Member

ferrine commented Jul 14, 2017

Do we really want to use old minibatch interface here?

@junpenglao

Member

junpenglao commented Jul 14, 2017

@ferrine yeah good call, we should use the new API.

Returns None it creates class variables which are required for the training fn
"""

def __init__(self,


@ferrine

ferrine Jul 14, 2017

Member

We should use the new, consistent minibatch API here.

    Optional model for sampling step. Defaults to None (taken from context)
random_seed : int
    The seed to initialize the Random Stream
minibatches : iterator


@ferrine

ferrine Jul 14, 2017

Member

The old one should be removed altogether, I think.


@shkr

shkr Jul 17, 2017

Author Contributor

@twiecki @ferrine

There are two issues I am running into which are preventing me from using pm.Minibatch:

  1. The batch size parameter is not configurable. I have different-sized datasets for training and testing, so I cannot use the same variable and swap the data to run a PPC after training (see the sketch after this comment).

  2. The sampling speed slows down by 50% when using pm.Minibatch compared to a simple iterator.

Here I have included a screenshot. Let me know if there are any easy fixes that I am missing; otherwise I can post this problem in a separate issue.

[screenshot of the timing comparison]
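
(One possible workaround for issue 1, an assumption on my side rather than what this PR does: keep the observed data in a Theano shared variable and swap its value after training so the same model can be reused for the PPC. X_train / X_test are illustrative names.)

import theano
import pymc3 as pm

data = theano.shared(X_train)
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=10.)
    pm.Normal('obs', mu=mu, sd=1., observed=data)
    trace = pm.sample(1000)

data.set_value(X_test)                   # swap in the differently sized test set
ppc = pm.sample_ppc(trace, model=model)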


@ferrine

ferrine Jul 17, 2017

Member

How do you use total_size in step parameters?


@ferrine

ferrine Jul 17, 2017

Member

I can't find a problem there


@ferrine

ferrine Jul 17, 2017

Member

I'll make benchmarks for minibatch there


@shkr

shkr Jul 17, 2017

Author Contributor

Cool. I have added the experimental warning. So this is good to go from my end.


@ferrine

ferrine Jul 17, 2017

Member

I see some boost when dropping the indexing (i.e. going from pm.Minibatch to the old interface) for ADVI gradients.
nmc=1

160 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
159 µs ± 7.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

nmc=100

13.5 ms ± 92.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.38 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To conclude, theano.scan adds significant overhead there. From your code I see that you use scan a lot.


@ferrine

ferrine Jul 17, 2017

Member

Code to reproduce

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm
# assumes X_train, Y_train and construct_nn from the Bayesian deep learning notebook
NMC = 1
# first variant
ann_input = pm.Minibatch(X_train)
ann_output = pm.Minibatch(Y_train)
neural_network = construct_nn(ann_input, ann_output)
with neural_network:
    inference = pm.ADVI()
with theano.configparser.change_flags(compute_test_value='off'):
    obj1 = tt.grad(inference.objective(NMC), inference.approx.params)
    fn1 = theano.function([], obj1)

# second one
ann_input = tt.matrix()
ann_input.tag.test_value = X_train[:1]
ann_output = tt.vector()
ann_output.tag.test_value = Y_train[:1]
neural_network = construct_nn(ann_input, ann_output)
with neural_network:
    inference = pm.ADVI()
with theano.configparser.change_flags(compute_test_value='off'):
    def create_minibatch(data):
        rng = np.random.RandomState(0)

        while True:
            # Return random data samples of size 50 each iteration
            ixs = rng.randint(len(data), size=50)
            yield data[ixs]
    minibatches = zip(
            create_minibatch(X_train), 
            create_minibatch(Y_train),
        )

    obj2 = tt.grad(inference.objective(NMC), inference.approx.params)

    fn2_ = theano.function([ann_input, ann_output], obj2)
    def fn2():
        return fn2_(*next(minibatches))
%timeit fn1()
%timeit fn2()


@ferrine

ferrine Jul 17, 2017

Member

I'll profile this tomorrow to get more insight

@shkr shkr force-pushed the shkr:stgbs branch from 6301721 to 922923d Jul 17, 2017

@junpenglao junpenglao merged commit 3dd559b into pymc-devs:master Jul 18, 2017

2 checks passed

continuous-integration/travis-ci/pr — The Travis CI build passed
coverage/coveralls — Coverage increased (+0.04%) to 86.829%
@junpenglao

Member

junpenglao commented Jul 18, 2017

Thanks for the big contribution @shkr !!!

@twiecki

Member

twiecki commented Jul 18, 2017

This is amazing, thanks @shkr!

@ferrine

Member

ferrine commented Jul 18, 2017

Thanks! You did a great job.

@shkr

Contributor Author

shkr commented Jul 18, 2017

Thanks for being so helpful with my first contribution to pymc3 :D

denadai2 added a commit to denadai2/pymc3 that referenced this pull request Aug 9, 2017

Stochastic Gradient Fisher Scoring (pymc-devs#1977)
* BaseStochasticGradient and SGFS

SGFS implements algorithm 1 from http://people.ee.duke.edu/~lcarin/782.pdf
The BaseStochasticGradient is extensible to other
MCMC recipes listed here https://arxiv.org/pdf/1506.04696.pdf

* add pm.Minibatch example to a nb

* remove use of lambda scope

* theano non-sequences passed to lambda fn

* modify documentation

* add unit test for sgfs

* fix test

* vanilla test method

* experimental warning added

@shkr shkr deleted the shkr:stgbs branch May 13, 2018
