
Stochastic Gradient Fisher Scoring #1977

Merged
merged 11 commits into from Jul 18, 2017
Conversation

@shkr (Contributor) commented Mar 31, 2017

Issue #1958

@shkr shkr changed the title from "[WIP] Stochastic Gradient Fisher Scoring #1958" to "#1958 [WIP] Stochastic Gradient Fisher Scoring" Mar 31, 2017
@shkr shkr changed the title from "#1958 [WIP] Stochastic Gradient Fisher Scoring" to "[WIP] Stochastic Gradient Fisher Scoring" Mar 31, 2017
@ColCarroll (Member) left a comment

Made a few nitpicks to this before realizing it is a WIP! Looks like a great start - happy to keep looking when it is done.

If you want to hook this up to existing test machinery, check out TestStepMethods.test_step_continuous in tests/test_step.py. It does some sanity checking of recovering the mean and variance in a model. You can then run just that test with py.test -xv tests/test_step.py::TestStepMethods::test_step_continuous.
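For illustration, a minimal sketch of the kind of mean/variance recovery check meant here — the helper name, the model, and the tolerances are assumptions for illustration, not the actual contents of tests/test_step.py:

# Hypothetical sanity check: sample a simple normal model with a new step
# method and verify that the posterior mean and standard deviation are recovered.
import numpy as np
import pymc3 as pm

def check_step_recovers_normal(step_cls, draws=5000):
    data = np.random.RandomState(0).normal(loc=1.0, scale=2.0, size=10000)
    with pm.Model():
        mu = pm.Normal('mu', mu=0., sd=10.)
        sd = pm.HalfNormal('sd', sd=10.)
        pm.Normal('obs', mu=mu, sd=sd, observed=data)
        trace = pm.sample(draws, step=step_cls())
    np.testing.assert_allclose(trace['mu'].mean(), 1.0, atol=0.2)
    np.testing.assert_allclose(trace['sd'].mean(), 2.0, atol=0.2)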

"""
default_blocked = False

def __init__(self, vars=None, batch_size=None, n_iter=None, total_size=None,
Member

need to add minibatches to signature here
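A sketch of what the amended signature might look like — the minibatches parameter name, its position, and the extra kwargs are assumptions, not the final API:

# Hypothetical amended signature with minibatches included
def __init__(self, vars=None, batch_size=None, total_size=None, n_iter=None,
             minibatches=None, model=None, random_seed=None, **kwargs):
    ...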

-------
vars : list
    List of variables for sampler
total_size : int
Member

can you rearrange the docstring to match the order of the arguments?

Contributor Author

Sure. I will fix that.

shared = pm.make_shared_replacements(vars, model)
super(BaseStochasticGradient, self).__init__(vars, shared)

self.batch_size = batch_size or 128
Member

can you put these defaults in the signature instead?

def __init__(self, vars=None, batch_size=128, n_iter=100, ...):
    self.batch_size = batch_size
    self.n_iter = n_iter
    ...

Contributor Author

Ok



def _initialize_values(self):
    raise ValueError('NotImplemented')
Member

NotImplementedError
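i.e. something along these lines (sketch of the suggested change):

def _initialize_values(self):
    # Abstract hook: subclasses must override this; raising NotImplementedError
    # (rather than ValueError) signals that clearly.
    raise NotImplementedError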

Contributor Author

Using it now

return self.model.fn(tt.add(*terms))


def dlogp_elemwise(self):
Member

this is defined twice

@shkr shkr force-pushed the stgbs branch 3 times, most recently from a5bac12 to 5f92399 on April 3, 2017 04:57
@fonnesbeck (Member):

I'd suggest a shorter, abbreviated filename, such as sgfs.py; fisher_scoring.py would work as well.

self.step_size = step_size
self.burn_in = burn_in

Member

These shouldn't turn up; it's best to install an editor plugin that automatically deletes them.

@@ -230,15 +222,15 @@ def astep(self, ps0):
g = dlogpt(q_params)

# 5. Calculate mean dlogp
avg_g = (1. / n) * (np.sum(g, axis=0))
avg_g = (1./n)*(np.sum(g, axis=0))
Member

These were correct before; see PEP 8.

Contributor Author

Okay. I will use yapf with the pep8 style to fix the formatting.

return samples


class StochasticGradientFisherScoring(BaseStochasticGradient):
Member

For better or worse, we usually just abbreviate -> SGFS.

@staticmethod
def competence(var):
    if var.dtype in continuous_types:
        return Competence.IDEAL
Member

We should downgrade this from IDEAL.

Contributor Author

Should I downgrade it to COMPATIBLE? What difference will it make at runtime?

Member

Yes, otherwise it might get automatically chosen for continuous RVs.
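A sketch of the suggested change, assuming the module's existing continuous_types and Competence imports; the INCOMPATIBLE fallback is an assumption about the rest of the method:

@staticmethod
def competence(var):
    # COMPATIBLE instead of IDEAL: the sampler can still be chosen explicitly
    # but is never auto-assigned to continuous variables by pm.sample.
    if var.dtype in continuous_types:
        return Competence.COMPATIBLE
    return Competence.INCOMPATIBLE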

@shkr shkr changed the title from "[WIP] Stochastic Gradient Fisher Scoring" to "WIP: Stochastic Gradient Fisher Scoring" Apr 10, 2017
@shkr shkr force-pushed the stgbs branch 2 times, most recently from 5c7210a to 162c50b on April 10, 2017 00:23
with pm.Model() as model:
    a = pm.Normal('a', mu=0, sd=1.0)
    b = pm.Normal('b', mu=1.0, sd=1.0)
    c = pm.Normal('c', mu=a + b, sd=1.0, observed=cT)
Member

We should definitely port this to @ferrine's new pm.generator which would obsolete most kwargs below. @ferrine could you advise on how to best use it here?

Member

I consider the following interface very flexible — let the user choose freely which minibatch interface to use. There are four options (see the sketch after this list for the first two):

  1. Set total_size=N for observed and local vars.
  2. Set a minibatch generator via pm.generator. It is very easy and requires minimal code, but corner cases can happen; see the theano.clone bug Theano/Theano#5809.
  3. Use a shared variable for data storage up front and define a callback that changes the storage. This requires the callbacks kwarg and is also useful for getting custom statistics from the process; see this notebook for an example.
  4. Use the more_replacements kwarg. Usually it replaces an input with pm.generator to avoid the theano.clone bug Theano/Theano#5809.

I hope that in the future we'll have a consistent interface using this paradigm for scalability.
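A hedged sketch of options 1 and 2 from the list above; the toy model is illustrative, while total_size and pm.generator are the pymc3 features being referred to:

import numpy as np
import pymc3 as pm

data = np.random.randn(10000)

# Option 1: total_size rescales the minibatch likelihood to the full dataset
with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=1.)
    pm.Normal('obs', mu=mu, sd=1., observed=data[:100], total_size=len(data))

# Option 2: feed minibatches through pm.generator
def batches(arr, size=100):
    rng = np.random.RandomState(0)
    while True:
        yield arr[rng.randint(len(arr), size=size)]

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=1.)
    pm.Normal('obs', mu=mu, sd=1.,
              observed=pm.generator(batches(data)), total_size=len(data))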

Contributor Author

@ferrine Are there any changes you would suggest to the way I have used generator in the test file now?

@twiecki (Member) commented Apr 10, 2017

This would also profit from an example notebook with a more complex/large model, maybe this one: http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/ (just the last one, with convolutions).

@shkr shkr changed the title from "WIP: Stochastic Gradient Fisher Scoring" to "Stochastic Gradient Fisher Scoring" Apr 10, 2017
@shkr shkr force-pushed the stgbs branch 2 times, most recently from d55f6ac to 3adec9c on April 10, 2017 17:17
g = dlogpt

#f = theano.function(inputs=input_tensors, outputs=g)
#vars_value = self.get_vars_value()
Member

can we remove this?

@shkr (Contributor Author) commented Apr 10, 2017

@twiecki I will add a notebook to the PR, using the problem discussed in the blog post http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/ (just the last one, with convolutions). Thanks for the recommendation.

@shkr shkr force-pushed the stgbs branch 2 times, most recently from 44fcde7 to 480a938 on April 12, 2017 14:37
@shkr (Contributor Author) commented Apr 13, 2017

@twiecki @ColCarroll @ferrine I have moved the sgmcmc file. I had to spend some time with the pymc3 library to understand how everything is wired together. I think the code now uses the existing APIs and design to the best of my judgement. There is a bug in SGFS.astep that I am trying to fix, which is making the results wonky. I think that by adding the notebook @twiecki suggested I will be able to diagnose that issue.

This is my first PR, so the test that I wrote might fall a little short of expectations. Please advise if I should change it.

@shkr (Contributor Author) commented Apr 13, 2017

Okay, I saw @ColCarroll's earlier comment on testing:

If you want to hook this up to existing test machinery, check out TestStepMethods.test_step_continuous in tests/test_step.py. It does some sanity checking of recovering the mean and variance in a model. You can then run just that test with py.test -xv tests/test_step.py::TestStepMethods::test_step_continuous.

I will add that

obs = pm.Normal('obs', mu=mu, sd=sd, observed=batch_generator)
trace = pm.sample(
    draws=int(total_size / batch_size),
    step=SGFS(vars=model.vars, model=model, total_size=total_size))
Member

Does model need to be passed if it is inside the context manager?
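For reference, a sketch of why it shouldn't be needed: step methods typically resolve the model from the context via modelcontext. Whether SGFS accepts exactly these kwargs, and the variables batch_generator, batch_size and total_size, are assumptions carried over from the test snippet above:

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=10.)
    sd = pm.HalfNormal('sd', sd=10.)
    pm.Normal('obs', mu=mu, sd=sd, observed=batch_generator)
    # no model=... kwarg: inside the context the step method can look the
    # model up itself via modelcontext(None)
    step = SGFS(batch_size=batch_size, total_size=total_size)
    trace = pm.sample(draws=int(total_size / batch_size), step=step)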

@ferrine (Member) commented Apr 15, 2017

I have a good find:
https://papers.nips.cc/paper/6293-variance-reduction-in-stochastic-gradient-langevin-dynamics.pdf
Maybe it can serve as an idea for further extensions.

from ...model import modelcontext, inputvars
from .arraystep import Competence, ArrayStepShared
from ..vartypes import continuous_types
from ..model import modelcontext, inputvars
import theano.tensor as tt
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
Member

We have pm.theanof.tt_rng() for a package-level random generator.
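i.e. (sketch):

from pymc3.theanof import tt_rng

# Draw noise from the package-level random stream instead of constructing a
# new MRG_RandomStreams instance inside the step method.
normal_noise = tt_rng().normal(size=(10,))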

tt_logp_elemwise = tt.as_tensor_variable(
    [v.logp_elemwiset for v in model.observed_RVs]).sum(axis=0)
terms = [
    v.logp_elemwiset for v in model.basic_RVs if var in inputs([v.logpt])
Member

What is this check for?

terms = [
    v.logp_elemwiset for v in model.basic_RVs if var in inputs([v.logpt])
]
elemwise_logp = tt.add(*terms)
Member

This will fail if observed vars have different shape
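A minimal illustration of the concern (assumed for illustration, not taken from the PR):

import numpy as np
import theano
import theano.tensor as tt

a = tt.vector()  # elementwise logp of an RV observed on 100 data points
b = tt.vector()  # elementwise logp of an RV observed on 10 data points
f = theano.function([a, b], tt.add(a, b))
f(np.zeros(100), np.zeros(10))  # fails at runtime with an input dimension mismatch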

@twiecki (Member) commented Jun 27, 2017

The hyperplane is markedly different when I use identity as the pre-conditioning matrix.

In which way?

@shkr (Contributor Author) commented Jun 27, 2017

[plot omitted]

Preconditioning matrix = Identity

[plot omitted]

Preconditioning matrix is proportional to Fisher Information

@twiecki (Member) commented Jun 27, 2017

The data is different in both runs, try setting the seed.

@shkr (Contributor Author) commented Jun 27, 2017

Oops. My bad. Here is the difference after fixing the seed

[plot omitted]

@shkr (Contributor Author) commented Jun 28, 2017

@twiecki Oops, that was my misunderstanding. The only requirement on the B matrix is that it be positive definite. There is no clear reason why making B proportional to I_t should outperform.

The large variance required in directions of low curvature when exploring the probability space is provided by the preconditioning matrix C, which is defined as:

[equation image omitted]

The constant matrix B provides a constant contribution to the mixing behaviour of the algorithm:

[equation image omitted]

However, it is recommended in the paper and it does simplify the equation for the variance. That is my understanding.

Reference - https://www.slideshare.net/hustwj/austerity-in-mcmc-land-cutting-the-computational-budget/32

@shkr (Contributor Author) commented Jun 29, 2017

@twiecki The unit test is included in the PR now.

@twiecki (Member) commented Jul 12, 2017

@shkr Now with 3.1 out the door I want to get back to this. Is the PR ready from your end?

@shkr (Contributor Author) commented Jul 12, 2017

Yup

@shkr (Contributor Author) commented Jul 12, 2017

There is a timeout issue with the Travis build, which is why it is failing.

@aseyboldt (Member):

I restarted those builds, the timeout doesn't seem to be related to the PR.

@shkr (Contributor Author) commented Jul 12, 2017

Thanks @aseyboldt, the Travis build succeeded!

@junpenglao (Member):

Looks nice and I am excited to try it! Just a small nitpick: should we put an EXPERIMENTAL_WARNING similar to the one in SMC?

@twiecki (Member) commented Jul 13, 2017

Yes, that's a good point.

@shkr (Contributor Author) commented Jul 13, 2017

Sure. I will add the EXPERIMENTAL_WARNING.
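Something along these lines — the exact wording and placement are assumptions, modelled on the existing SMC warning:

import warnings

EXPERIMENTAL_WARNING = (
    "Warning: Stochastic Gradient based sampling methods are experimental "
    "step methods and not yet recommended for use in PyMC3!")

# emit the warning when a stochastic-gradient step method is constructed,
# e.g. at the top of BaseStochasticGradient.__init__:
#     warnings.warn(EXPERIMENTAL_WARNING)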

@junpenglao (Member):

LGTM

@ferrine (Member) commented Jul 14, 2017

Do we really want to use the old minibatch interface here?

@junpenglao (Member):

@ferrine yeah good call, we should use the new API.

Returns None it creates class variables which are required for the training fn
"""

def __init__(self,
Member

We should use the new, consistent minibatch API here.

    Optional model for sampling step. Defaults to None (taken from context)
random_seed : int
    The seed to initialize the Random Stream
minibatches : iterator
Member

The old one should be removed entirely, I think.

@shkr (Contributor Author) commented Jul 17, 2017

@twiecki @ferrine

There are two issues I am running into which are preventing me from using pm.Minibatch:

  1. The batch size parameter is not configurable. I have different-sized datasets for training and testing, so I cannot use the same variable and swap the data for running PPC after training.

  2. The sampling speed slows down by 50% when using pm.Minibatch instead of a simple iterator.

Here I have included a screenshot. Let me know if there are any easy fixes that I am missing; otherwise I can post this problem as a separate issue.

[screenshot of timing comparison omitted]

Member

How do you use total_size in step parameters?

Member

I can't find a problem there

Member

I'll make benchmarks for minibatch there

Member

+1

Contributor Author

Cool. I have added the experimental warning. So this is good to go from my end.

@ferrine (Member) commented Jul 17, 2017

I see some speedup when dropping the indexing (i.e. pm.Minibatch -> old interface) for ADVI gradients.
nmc=1

160 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
159 µs ± 7.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

nmc=100

13.5 ms ± 92.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.38 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To conclude, theano.scan adds significant overhead there. From your code I see that you use scan a lot.

@ferrine (Member) commented Jul 17, 2017

Code to reproduce

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

# X_train, Y_train and construct_nn come from the bayesian-deep-learning
# notebook referenced above.
NMC = 1

# first variant: pm.Minibatch
ann_input = pm.Minibatch(X_train)
ann_output = pm.Minibatch(Y_train)
neural_network = construct_nn(ann_input, ann_output)
with neural_network:
    inference = pm.ADVI()
with theano.configparser.change_flags(compute_test_value='off'):
    obj1 = tt.grad(inference.objective(NMC), inference.approx.params)
    fn1 = theano.function([], obj1)

# second one: plain tensors fed by a python generator
ann_input = tt.matrix()
ann_input.tag.test_value = X_train[:1]
ann_output = tt.vector()
ann_output.tag.test_value = Y_train[:1]
neural_network = construct_nn(ann_input, ann_output)
with neural_network:
    inference = pm.ADVI()
with theano.configparser.change_flags(compute_test_value='off'):
    def create_minibatch(data):
        rng = np.random.RandomState(0)
        while True:
            # Return a random data sample of size 50 each iteration
            ixs = rng.randint(len(data), size=50)
            yield data[ixs]

    minibatches = zip(
        create_minibatch(X_train),
        create_minibatch(Y_train),
    )

    obj2 = tt.grad(inference.objective(NMC), inference.approx.params)

    fn2_ = theano.function([ann_input, ann_output], obj2)
    def fn2():
        return fn2_(*next(minibatches))

%timeit fn1()  # IPython magic
%timeit fn2()

Member

I'll profile this tomorrow to get more insight

@junpenglao junpenglao merged commit 3dd559b into pymc-devs:master Jul 18, 2017
@junpenglao (Member):

Thanks for the big contribution @shkr !!!

@twiecki (Member) commented Jul 18, 2017

This is amazing, thanks @shkr!

@ferrine (Member) commented Jul 18, 2017

Thanks! You did a great job.

@shkr (Contributor Author) commented Jul 18, 2017

Thanks for being so helpful with my first contribution to pymc3 :D

denadai2 pushed a commit to denadai2/pymc3 that referenced this pull request Aug 9, 2017
* BaseStochasticGradient and SGFS

SGFS implements algorithm 1 from http://people.ee.duke.edu/~lcarin/782.pdf
The BaseStochasticGradient is extensible to other
MCMC recipes listed here: https://arxiv.org/pdf/1506.04696.pdf

* add pm.Minibatch example to a nb

* remove use of lambda scope

* theano non-sequences passed to lambda fn

* modify documentation

* add unit test for sgfs

* fix test

* vanilla test method

* experimental warning added
@shkr shkr deleted the stgbs branch May 13, 2018 20:07