
Commit

Merge 0bf373b into 7b0cf5d
aloctavodia committed Sep 25, 2019
2 parents 7b0cf5d + 0bf373b commit 2a30204
Showing 6 changed files with 591 additions and 551 deletions.
5 changes: 3 additions & 2 deletions RELEASE-NOTES.md
@@ -18,9 +18,10 @@
- Fixed a bug in `Categorical.logp`. In the case of multidimensional `p`'s, the indexing was done wrong leading to incorrectly shaped tensors that consumed `O(n**2)` memory instead of `O(n)`. This fixes issue [#3535](https://github.com/pymc-devs/pymc3/issues/3535)
- Fixed a defect in `OrderedLogistic.__init__` that unnecessarily increased the dimensionality of the underlying `p`. Related to issue [#3535](https://github.com/pymc-devs/pymc3/issues/3535) but was not the true cause of it.
- SMC: stabilize covariance matrix [3573](https://github.com/pymc-devs/pymc3/pull/3573)
- SMC is no longer a step method of `pm.sample` now it should be called using `pm.sample_smc` [3579](https://github.com/pymc-devs/pymc3/pull/3579)
- SMC: improve computation of the proposal scaling factor [3594](https://github.com/pymc-devs/pymc3/pull/3594)
- SMC: is no longer a step method of `pm.sample`; it should now be called using `pm.sample_smc` [3579](https://github.com/pymc-devs/pymc3/pull/3579)
- SMC: improve computation of the proposal scaling factor [3594](https://github.com/pymc-devs/pymc3/pull/3594) and [3625](https://github.com/pymc-devs/pymc3/pull/3625)
- SMC: reduce number of logp evaluations [3600](https://github.com/pymc-devs/pymc3/pull/3600)
- SMC: remove `scaling` and `tune_scaling` arguments, as it is a better idea to always allow SMC to automatically compute the scaling factor [3625](https://github.com/pymc-devs/pymc3/pull/3625)
- Now uses `multiprocessing` rather than `psutil` to count CPUs, which results in reliable core counts on Chromebooks.
- `sample_posterior_predictive` now preallocates the memory required for its output to improve memory usage. Addresses problems raised in this [discourse thread](https://discourse.pymc.io/t/memory-error-with-posterior-predictive-sample/2891/4).
- Fixed a bug in `Categorical.logp`. In the case of multidimensional `p`'s, the indexing was done wrong leading to incorrectly shaped tensors that consumed `O(n**2)` memory instead of `O(n)`. This fixes issue [#3535](https://github.com/pymc-devs/pymc3/issues/3535)
2 changes: 1 addition & 1 deletion pymc3/smc/__init__.py
@@ -1 +1 @@
from .smc import sample_smc
from .sample_smc import sample_smc
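
For orientation, a minimal usage sketch of the re-exported entry point, reflecting the release-note change above. This snippet is not part of the commit and the toy model below is hypothetical:

import numpy as np
import pymc3 as pm

data = np.random.normal(loc=0.0, scale=1.0, size=100)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)
    # Before this release SMC was used as a step method, roughly: pm.sample(step=pm.SMC())
    trace = pm.sample_smc(draws=2000)

As described in the docstring of `sample_smc` below, `draws` also sets the number of independent chains.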
161 changes: 161 additions & 0 deletions pymc3/smc/sample_smc.py
@@ -0,0 +1,161 @@
from .smc import SMC
import logging


def sample_smc(
    draws=1000,
    kernel="metropolis",
    n_steps=25,
    parallel=False,
    start=None,
    cores=None,
    tune_steps=True,
    p_acc_rate=0.99,
    threshold=0.5,
    epsilon=1.0,
    dist_func="absolute_error",
    sum_stat=False,
    progressbar=False,
    model=None,
    random_seed=-1,
    psis=True,
):
"""
Sequential Monte Carlo based sampling
Parameters
----------
draws : int
The number of samples to draw from the posterior (i.e. last stage). And also the number of
independent chains. Defaults to 1000.
kernel : str
Kernel method for the SMC sampler. Available option are ``metropolis`` (default) and `ABC`.
Use `ABC` for likelihood free inference togheter with a ``pm.Simulator``.
n_steps : int
The number of steps of each Markov Chain. If ``tune_steps == True`` ``n_steps`` will be used
for the first stage and for the others it will be determined automatically based on the
acceptance rate and `p_acc_rate`, the max number of steps is ``n_steps``.
parallel : bool
Distribute computations across cores if the number of cores is larger than 1.
Defaults to False.
start : dict, or array of dict
Starting point in parameter space. It should be a list of dict with length `chains`.
When None (default) the starting point is sampled from the prior distribution.
cores : int
The number of chains to run in parallel. If ``None`` (default), it will be automatically
set to the number of CPUs in the system.
tune_steps : bool
Whether to compute the number of steps automatically or not. Defaults to True
p_acc_rate : float
Used to compute ``n_steps`` when ``tune_steps == True``. The higher the value of
``p_acc_rate`` the higher the number of steps computed automatically. Defaults to 0.99.
It should be between 0 and 1.
threshold : float
Determines the change of beta from stage to stage, i.e.indirectly the number of stages,
the higher the value of `threshold` the higher the number of stages. Defaults to 0.5.
It should be between 0 and 1.
epsilon : float
Standard deviation of the gaussian pseudo likelihood. Only works with `kernel = ABC`
dist_func : str
Distance function. Available options are ``absolute_error`` (default) and
``sum_of_squared_distance``. Only works with ``kernel = ABC``
sum_stat : bool
Whether to use or not a summary statistics. Defaults to False. Only works with
``kernel = ABC``
progressbar : bool
Flag for displaying a progress bar. Defaults to False.
model : Model (optional if in ``with`` context)).
random_seed : int
random seed
Notes
-----
SMC works by moving through successive stages. At each stage the inverse temperature
:math:`\beta` is increased a little bit (starting from 0 up to 1). When :math:`\beta` = 0
we have the prior distribution and when :math:`\beta` =1 we have the posterior distribution.
So in more general terms we are always computing samples from a tempered posterior that we can
write as:
.. math::
p(\theta \mid y)_{\beta} = p(y \mid \theta)^{\beta} p(\theta)
A summary of the algorithm is:
1. Initialize :math:`\beta` at zero and stage at zero.
2. Generate N samples :math:`S_{\beta}` from the prior (because when :math `\beta = 0` the
tempered posterior is the prior).
3. Increase :math:`\beta` in order to make the effective sample size equals some predefined
value (we use :math:`Nt`, where :math:`t` is 0.5 by default).
4. Compute a set of N importance weights W. The weights are computed as the ratio of the
likelihoods of a sample at stage i+1 and stage i.
5. Obtain :math:`S_{w}` by re-sampling according to W.
6. Use W to compute the covariance for the proposal distribution.
7. For stages other than 0 use the acceptance rate from the previous stage to estimate the
scaling of the proposal distribution and `n_steps`.
8. Run N Metropolis chains (each one of length `n_steps`), starting each one from a different
sample in :math:`S_{w}`.
9. Repeat from step 3 until :math:`\beta \ge 1`.
10. The final result is a collection of N samples from the posterior.
References
----------
.. [Minson2013] Minson, S. E. and Simons, M. and Beck, J. L., (2013),
Bayesian inversion for finite fault earthquake source models I- Theory and algorithm.
Geophysical Journal International, 2013, 194(3), pp.1701-1726,
`link <https://gji.oxfordjournals.org/content/194/3/1701.full>`__
.. [Ching2007] Ching, J. and Chen, Y. (2007).
Transitional Markov Chain Monte Carlo Method for Bayesian Model Updating, Model Class
Selection, and Model Averaging. J. Eng. Mech., 10.1061/(ASCE)0733-9399(2007)133:7(816),
816-832. `link <http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9399
%282007%29133:7%28816%29>`__
"""

    smc = SMC(
        draws=draws,
        kernel=kernel,
        n_steps=n_steps,
        parallel=parallel,
        start=start,
        cores=cores,
        tune_steps=tune_steps,
        p_acc_rate=p_acc_rate,
        threshold=threshold,
        epsilon=epsilon,
        dist_func=dist_func,
        sum_stat=sum_stat,
        progressbar=progressbar,
        model=model,
        random_seed=random_seed,
    )

    _log = logging.getLogger("pymc3")
    _log.info("Sample initial stage: ...")
    stage = 0
    # Stage 0: draw the initial population from the prior and evaluate its log-probabilities.
    smc.initialize_population()
    smc.setup_kernel()
    smc.initialize_logp()

    # Anneal beta from 0 (the prior) towards 1 (the posterior), one stage at a time.
    while smc.beta < 1:
        # Increase beta and recompute the importance weights (steps 3-4 in the docstring).
        smc.update_weights_beta(psis=psis)
        _log.info(
            "Stage: {:3d} Beta: {:.3f} Steps: {:3d} Acce: {:.3f}".format(
                stage, smc.beta, smc.n_steps, smc.acc_rate
            )
        )
        # Resample according to the weights and update the proposal covariance (steps 5-6).
        smc.resample()
        smc.update_proposal()
        if stage > 0:
            # Use the previous acceptance rate to tune the proposal scaling and n_steps (step 7).
            smc.tune()
        # Mutate the particles by running short Metropolis chains (step 8).
        smc.mutate()
        stage += 1

    # Shut down the worker pool if parallel sampling was used.
    if smc.parallel and smc.cores > 1:
        smc.pool.close()
        smc.pool.join()

    trace = smc.posterior_to_trace()

    return trace
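
To make steps 3-5 of the algorithm summary above concrete, here is a rough, self-contained NumPy sketch of the ESS-driven beta update and resampling. It is illustrative only, not the code added in this commit; the function name, bisection tolerance, and fake log-likelihoods are assumptions.

import numpy as np

def update_beta(log_likelihoods, beta, threshold=0.5):
    # Illustrative sketch (not PyMC3 code): raise beta (step 3) until the effective
    # sample size of the importance weights (step 4) drops to threshold * N, by bisection.
    n = len(log_likelihoods)
    low, high = beta, 2.0  # search interval; the result is capped at 1 below
    new_beta = high
    while high - low > 1e-6:
        new_beta = 0.5 * (low + high)
        log_w = (new_beta - beta) * log_likelihoods
        weights = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        weights /= weights.sum()
        ess = 1.0 / np.sum(weights ** 2)
        if ess / n > threshold:
            low = new_beta   # weights still healthy, temper more aggressively
        else:
            high = new_beta
    new_beta = min(1.0, new_beta)
    log_w = (new_beta - beta) * log_likelihoods
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    return new_beta, weights

rng = np.random.default_rng(0)
log_like = rng.normal(-50.0, 5.0, size=1000)   # fake log-likelihoods for 1000 particles
beta, w = update_beta(log_like, beta=0.0)
resampled = rng.choice(1000, size=1000, p=w)   # step 5: resample according to W

In the actual sampler these steps are handled by `smc.update_weights_beta` and `smc.resample` in the loop above.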
