Stochastic Gradient Fisher Scoring #1977
Conversation
Made a few nitpicks to this before realizing it is a WIP! Looks like a great start - happy to keep looking when it is done.
If you want to hook this up to the existing test machinery, check out TestStepMethods.test_step_continuous in tests/test_step.py. It does some sanity checking of recovering the mean and variance in a model. You can then run just that test with py.test -xv tests/test_step.py::TestStepMethods::test_step_continuous.
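A rough sketch of the kind of mean/variance sanity check being described (a hypothetical helper, not the actual TestStepMethods.test_step_continuous; tolerances are illustrative):

import numpy as np
import pymc3 as pm

def check_step_recovers_moments(step_cls):
    # Sample a simple Normal model with the step method under test and check
    # that the draws recover the known mean and standard deviation.
    with pm.Model():
        pm.Normal('x', mu=2.0, sd=1.5)
        trace = pm.sample(5000, step=step_cls(), progressbar=False)
    np.testing.assert_allclose(trace['x'].mean(), 2.0, atol=0.1)
    np.testing.assert_allclose(trace['x'].std(), 1.5, atol=0.2)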
""" | ||
default_blocked = False | ||
|
||
def __init__(self, vars=None, batch_size=None, n_iter=None, total_size=None, |
need to add minibatches to the signature here
    -------
    vars : list
        List of variables for sampler
    total_size : int
can you rearrange the docstring to match the order of the arguments?
Sure. I will fix that.
        shared = pm.make_shared_replacements(vars, model)
        super(BaseStochasticGradient, self).__init__(vars, shared)

        self.batch_size = batch_size or 128
can you put these defaults in the signature instead?
def __init__(self, vars=None, batch_size=128, n_iter=100, ...):
    self.batch_size = batch_size
    self.n_iter = n_iter
    ...
Ok

    def _initialize_values(self):
        raise ValueError('NotImplemented')
NotImplementedError
Using it now
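A minimal sketch of the agreed fix (method name taken from the diff above):

    def _initialize_values(self):
        # Subclasses are expected to override this; raising NotImplementedError
        # (rather than ValueError) is the idiomatic way to signal that.
        raise NotImplementedError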
        return self.model.fn(tt.add(*terms))

    def dlogp_elemwise(self):
this is defined twice
I'd suggest a shorter, abbreviated filename, such as sgmcmc.py.
pymc3/step_methods/hmc/sgmcmc.py
        self.step_size = step_size
        self.burn_in = burn_in
these shouldn't turn up, best to install a plugin in your editor that automatically deletes these.
pymc3/step_methods/hmc/sgmcmc.py
@@ -230,15 +222,15 @@ def astep(self, ps0):
        g = dlogpt(q_params)

        # 5. Calculate mean dlogp
-       avg_g = (1. / n) * (np.sum(g, axis=0))
+       avg_g = (1./n)*(np.sum(g, axis=0))
these were correct before, see pep8.
Okay. I will use the yapf tool with the pep8 style to fix the formatting.
pymc3/step_methods/hmc/sgmcmc.py
        return samples


class StochasticGradientFisherScoring(BaseStochasticGradient):
For better or worse, we usually just abbreviate -> SGFS.
pymc3/step_methods/hmc/sgmcmc.py
    @staticmethod
    def competence(var):
        if var.dtype in continuous_types:
            return Competence.IDEAL
Should down-grade from IDEAL.
Should I change it to Compatible? What difference will it cause at runtime?
Yes, otherwise it might get automatically chosen for continuous RVs.
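A sketch of the suggested downgrade, assuming the Competence enum and continuous_types imports already used in this diff:

    @staticmethod
    def competence(var):
        # COMPATIBLE keeps SGFS available without making it the automatic
        # choice for continuous random variables.
        if var.dtype in continuous_types:
            return Competence.COMPATIBLE
        return Competence.INCOMPATIBLE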
pymc3/tests/test_sg.py
with pm.Model() as model:
    a = pm.Normal('a', mu=0, sd=1.0)
    b = pm.Normal('b', mu=1.0, sd=1.0)
    c = pm.Normal('c', mu=a + b, sd=1.0, observed=cT)
I considered the following interface to be very flexible: let the user choose which minibatch interface to use. There are four options:

- Set total_size=N for observed and local vars.
- Set a minibatch generator via pm.generator (see the sketch after this list). It is very easy and requires minimal code, but corner cases sometimes happen; see the theano.clone bug Theano/Theano#5809.
- Use a shared variable in advance for storing the data and define a callback that changes the storage. This requires the callbacks kwarg. It is also useful for getting custom statistics from the process, see this notebook for example.
- Use the more_replacements kwarg. Usually it replaces the input with pm.generator to avoid the theano.clone bug Theano/Theano#5809.

I hope in the future we'll have a consistent interface using this paradigm for scalability.
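A minimal sketch of the first two options combined (total_size plus a pm.generator minibatch); the toy data, batch function, and model below are hypothetical, only pm.generator and total_size are the interfaces named in the list above:

import numpy as np
import pymc3 as pm

def batches(data, batch_size=128):
    # Hypothetical infinite generator yielding random minibatches of the data.
    rng = np.random.RandomState(0)
    while True:
        yield data[rng.randint(len(data), size=batch_size)]

data = np.random.randn(10000)
minibatch = pm.generator(batches(data))

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=10.)
    # total_size rescales the minibatch log-likelihood to the full dataset size.
    pm.Normal('obs', mu=mu, sd=1., observed=minibatch, total_size=len(data))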
@ferrine Any changes you would suggest to the way I have used the generator in the test file now?
Would also profit from an example NB with a more complex / large model, maybe this one: http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/ (just the last one with convolutions).
pymc3/step_methods/hmc/sgmcmc.py
        g = dlogpt

        #f = theano.function(inputs=input_tensors, outputs=g)
        #vars_value = self.get_vars_value()
can we remove this?
@twiecki I will add a notebook to the PR, using the problem discussed in the blog post http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/ (just the last one with convolutions). Thanks for the recommendation.
@twiecki @ColCarroll @ferrine I have moved the sgmcmc file. I had to spend time with the pymc3 library to understand how everything is wired together. I think the code I have tries to use the existing APIs and design to the best of my judgement. There is a bug in ... This is my first PR, so the test that I wrote might be a little short of expectations too. Please advise if I should change that.
Okay. I saw @ColCarroll's earlier comment on testing. I will add that.
pymc3/tests/test_sg.py
    obs = pm.Normal('obs', mu=mu, sd=sd, observed=batch_generator)
    trace = pm.sample(
        draws=int(total_size / batch_size),
        step=SGFS(vars=model.vars, model=model, total_size=total_size))
Does model need to be passed if it is inside the context manager?
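A sketch of the simplification being asked about: inside the model context the step method can pick the model up automatically, so the explicit model= argument is likely redundant. The names batch_generator, total_size, and batch_size are assumed from the test snippet above; the SGFS kwargs are taken from the signature shown earlier in this diff.

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=1.)
    sd = pm.HalfNormal('sd', sd=1.)
    obs = pm.Normal('obs', mu=mu, sd=sd, observed=batch_generator)
    # Inside the with-block the step method should resolve the model from the context.
    step = SGFS(vars=model.vars, total_size=total_size)
    trace = pm.sample(draws=int(total_size / batch_size), step=step)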
I have a good finding.
pymc3/step_methods/sgmcmc.py
from ...model import modelcontext, inputvars
from .arraystep import Competence, ArrayStepShared
from ..vartypes import continuous_types
from ..model import modelcontext, inputvars
import theano.tensor as tt
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
We have pm.theanof.tt_rng() for a package-level random generator.
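A small sketch of what the suggestion points at, assuming pymc3's package-level random stream:

from pymc3.theanof import tt_rng

# tt_rng() returns the shared MRG_RandomStreams instance used across pymc3,
# so the step method does not need to construct its own RandomStreams.
normal_noise = tt_rng().normal(size=(10,))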
pymc3/step_methods/sgmcmc.py
        tt_logp_elemwise = tt.as_tensor_variable(
            [v.logp_elemwiset for v in model.observed_RVs]).sum(axis=0)
        terms = [
            v.logp_elemwiset for v in model.basic_RVs if var in inputs([v.logpt])
What is this check for?
pymc3/step_methods/sgmcmc.py
        terms = [
            v.logp_elemwiset for v in model.basic_RVs if var in inputs([v.logpt])
        ]
        elemwise_logp = tt.add(*terms)
This will fail if the observed vars have different shapes.
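An illustration of the concern with a hypothetical model (not from the PR): the element-wise logp terms of the two observed variables have different lengths, so tt.add cannot combine them into a single element-wise sum.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as m:
    mu = pm.Normal('mu', mu=0., sd=10.)
    pm.Normal('obs1', mu=mu, sd=1., observed=np.random.randn(100))  # logp_elemwiset has shape (100,)
    pm.Normal('obs2', mu=mu, sd=1., observed=np.random.randn(37))   # logp_elemwiset has shape (37,)

terms = [v.logp_elemwiset for v in m.observed_RVs]
bad_sum = tt.add(*terms)  # shapes (100,) and (37,) cannot be broadcast together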
In which way?
The data is different in both runs, try setting the seed.
@twiecki Oops, that is my misunderstanding. The only requirement on the B matrix is the positive-definite property. There is no clear reason why B's proportionality to I_t should outperform. The large variance required in the direction of low curvature when exploring the probability space is provided by the preconditioning matrix C defined in the paper. The constant matrix B provides constant support to the mixing nature of the algorithm. However, it is recommended in the paper and it does simplify the equation for the variance. That is my understanding. Reference: https://www.slideshare.net/hustwj/austerity-in-mcmc-land-cutting-the-computational-budget/32
@twiecki The unit test is included in the PR now.
@shkr Now with 3.1 out the door I want to get back to this. Is the PR ready from your end?
Yup
There is a timeout issue with the Travis build, which is why it is failing.
I restarted those builds, the timeout doesn't seem to be related to the PR.
Thanks @aseyboldt, the Travis build succeeded!
Looks nice and I am excited to try! Just a small nitpick: should we put an experimental warning on it?
Yes, that's a good point.
Sure. I will add the EXPERIMENTAL_TAG
LGTM
Do we really want to use the old minibatch interface here?
@ferrine yeah good call, we should use the new API.
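For reference, a minimal sketch of the newer interface being referred to; the toy data and model are hypothetical, only pm.Minibatch and total_size come from this thread:

import numpy as np
import pymc3 as pm

data = np.random.randn(10000)
# pm.Minibatch keeps the full data in a shared variable and draws random
# slices of batch_size rows on each evaluation.
batch = pm.Minibatch(data, batch_size=128)

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=10.)
    pm.Normal('obs', mu=mu, sd=1., observed=batch, total_size=len(data))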
        Returns None it creates class variables which are required for the training fn
        """

    def __init__(self,
We should use the new, consistent minibatch API here.
        Optional model for sampling step. Defaults to None (taken from context)
    random_seed : int
        The seed to initialize the Random Stream
    minibatches : iterator
The old one should be removed entirely, I think.
There are two issues I am running into which prevent me from using pm.Minibatch:

- The batch size parameter is not configurable. I have different-sized datasets for training and testing, so I cannot use the same variable and just change the data for running PPC after training.
- The sampling speed slows down by 50% when using pm.Minibatch over a simple iterator.

Here I have included a screenshot. Let me know if there are any easy fixes that I am missing; otherwise I can post this problem as a separate issue.
How do you use total_size in step parameters?
I can't find a problem there
I'll make benchmarks for minibatch there
+1
Cool. I have added the experimental warning. So this is good to go from my end.
I see some boost when dropping the indexing (i.e. pm.Minibatch -> old interface) for ADVI gradients:

nmc=1
160 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
159 µs ± 7.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

nmc=100
13.5 ms ± 92.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.38 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To conclude, theano.scan places significant overhead there. From your code I see that you use scan a lot.
Code to reproduce (X_train, Y_train, and construct_nn come from the Bayesian deep learning notebook referenced earlier):

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

NMC = 1

# first variant: pm.Minibatch
ann_input = pm.Minibatch(X_train)
ann_output = pm.Minibatch(Y_train)
neural_network = construct_nn(ann_input, ann_output)
with neural_network:
    inference = pm.ADVI()
with theano.configparser.change_flags(compute_test_value='off'):
    obj1 = tt.grad(inference.objective(NMC), inference.approx.params)
    fn1 = theano.function([], obj1)

# second one: plain symbolic inputs fed from a simple iterator
ann_input = tt.matrix()
ann_input.tag.test_value = X_train[:1]
ann_output = tt.vector()
ann_output.tag.test_value = Y_train[:1]
neural_network = construct_nn(ann_input, ann_output)
with neural_network:
    inference = pm.ADVI()
with theano.configparser.change_flags(compute_test_value='off'):

    def create_minibatch(data):
        rng = np.random.RandomState(0)
        while True:
            # Return random data samples of set size 50 each iteration
            ixs = rng.randint(len(data), size=50)
            yield data[ixs]

    minibatches = zip(
        create_minibatch(X_train),
        create_minibatch(Y_train),
    )
    obj2 = tt.grad(inference.objective(NMC), inference.approx.params)
    fn2_ = theano.function([ann_input, ann_output], obj2)

    def fn2():
        return fn2_(*next(minibatches))

%timeit fn1()
%timeit fn2()
I'll profile this tomorrow to get more insight
Thanks for the big contribution @shkr!!!
This is amazing, thanks @shkr!
Thanks! You did a great job.
Thanks for being so helpful with my first contribution to pymc3 :D
* BaseStochasticGradient and SGFS: SGFS implements Algorithm 1 from http://people.ee.duke.edu/~lcarin/782.pdf. The BaseStochasticGradient is extensible for other MCMC recipes listed here: https://arxiv.org/pdf/1506.04696.pdf
* add pm.Minibatch example to a nb
* remove use of lambda scope
* theano non-sequences passed to lambda fn
* modify documentation
* add unit test for sgfs
* fix test
* vanilla test method
* experimental warning added
Issue #1958