Some updates in Docs (#259)
* Clarifications in tutorial notebook

* Explain vectorization

* Address comments to PR
vuolleko committed Apr 5, 2018
1 parent 8c61b43 commit a35290a
Showing 2 changed files with 60 additions and 33 deletions.
27 changes: 27 additions & 0 deletions docs/faq.rst
@@ -10,3 +10,30 @@ produces outputs from the interval (1, 3).*
their definitions. There the uniform distribution uses the location/scale definition, so
the first argument defines the starting point of the interval and the second its length.
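
For instance, a minimal sketch (the node name ``t1`` is only illustrative) of a uniform prior on the
interval (1, 3) using the location/scale convention:

.. code:: python

    import elfi

    # loc=1, scale=2  ->  the interval (1, 1 + 2) = (1, 3)
    t1 = elfi.Prior('uniform', 1, 2)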

.. _vectorization:

*Q: What is vectorization in ELFI?*

**A**: Looping is relatively inefficient in Python, so whenever possible you should *vectorize*
your operations_. This means that repetitive computations are performed on whole batches of data by
precompiled libraries (typically NumPy_), which effectively runs the loops in faster, compiled C code.
Because of the potentially huge savings in CPU time, ELFI by default assumes that operations containing
user code are vectorized, and your code must take this into account.

.. _operations: good-to-know.html#operations
.. _NumPy: http://www.numpy.org/

For example, imagine you have a simulator that depends on a scalar parameter and produces a vector of 5
values. When this is used in ELFI with ``batch_size`` set to 1000, ELFI draws 1000 values from the
parameter's prior distribution and gives this *vector* to the simulator. Ideally, the simulator should
efficiently process all 1000 parameter cases in one go and output a NumPy array of shape (1000, 5). In
ELFI, the length (i.e. the first dimension) of all output arrays should equal ``batch_size`` **even if
it is 1**. Consequently, summary statistics, distances, etc. should leave the first (batch) dimension
intact and operate along the remaining dimensions (e.g. with NumPy functions using ``axis=1`` in this case).
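
As a rough sketch of what this means in practice (the functions below are hypothetical;
``batch_size`` and ``random_state`` are keyword arguments that ELFI can pass to operations that
accept them):

.. code:: python

    import numpy as np

    def simulator(mu, batch_size=1, random_state=None):
        # `mu` arrives as a vector of `batch_size` values drawn from the prior.
        # All cases are simulated with one NumPy call, so the output has shape
        # (batch_size, 5).
        mu = np.asanyarray(mu).reshape(batch_size, 1)
        random_state = random_state or np.random
        return mu + random_state.standard_normal(size=(batch_size, 5))

    def mean_summary(y):
        # Summaries reduce over the data dimensions but keep the first (batch)
        # dimension intact, hence axis=1.
        return np.mean(y, axis=1)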

See ``elfi.examples`` for tips on how to vectorize simulators and work with ELFI. In case you are
unable to vectorize your simulator, you can use `elfi.tools.vectorize`_ to mimic
vectorized behaviour, though without the performance benefits.

.. _`elfi.tools.vectorize`: api.html#elfi.tools.vectorize
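
A minimal sketch of such wrapping (the scalar simulator here is hypothetical):

.. code:: python

    import numpy as np
    import elfi

    def scalar_simulator(mu, random_state=None):
        # Handles a single parameter value at a time and returns 5 observations.
        random_state = random_state or np.random
        return mu + random_state.standard_normal(5)

    # Let ELFI call the simulator with whole batches of parameters; as noted
    # above, this only mimics vectorization and brings no speed-up.
    vectorized_simulator = elfi.tools.vectorize(scalar_simulator)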

66 changes: 33 additions & 33 deletions docs/usage/tutorial.rst
@@ -18,10 +18,9 @@ settings.
import numpy as np
import scipy.stats
import matplotlib
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)
logging.basicConfig(level=logging.INFO) # sometimes this is required to enable logging inside Jupyter
%matplotlib inline
%precision 2
@@ -251,7 +250,7 @@ a DAG.



.. note:: You will need the Graphviz_ software as well as the graphviz `Python package`_ (https://pypi.python.org/pypi/graphviz) for drawing this. The software is already installed in many unix-like OS.
.. note:: You will need the Graphviz_ software as well as the graphviz `Python package`_ (https://pypi.python.org/pypi/graphviz) for drawing this.

.. _Graphviz: http://www.graphviz.org
.. _`Python package`: https://pypi.python.org/pypi/graphviz
@@ -396,8 +395,8 @@ time is spent in drawing.

.. parsed-literal::
CPU times: user 2.28 s, sys: 165 ms, total: 2.45 s
Wall time: 2.45 s
CPU times: user 1.6 s, sys: 166 ms, total: 1.77 s
Wall time: 1.76 s
The ``sample`` method returns a ``Sample`` object, which contains
@@ -452,8 +451,8 @@ as long as it takes to generate the requested number of samples.
.. parsed-literal::
CPU times: user 222 ms, sys: 40.3 ms, total: 263 ms
Wall time: 261 ms
CPU times: user 198 ms, sys: 35.5 ms, total: 233 ms
Wall time: 231 ms
Method: Rejection
Number of samples: 1000
Number of simulations: 40000
@@ -497,9 +496,9 @@ been reached or a maximum of one second of time has been used.
Method: Rejection
Number of samples: 1000
Number of simulations: 190000
Threshold: 0.0855
Sample means: t1: 0.561, t2: 0.218
Number of simulations: 180000
Threshold: 0.088
Sample means: t1: 0.561, t2: 0.221
@@ -547,8 +546,8 @@ in our model:
.. parsed-literal::
CPU times: user 5.26 s, sys: 37.1 ms, total: 5.3 s
Wall time: 5.3 s
CPU times: user 5.01 s, sys: 60.9 ms, total: 5.07 s
Wall time: 5.09 s
@@ -558,8 +557,8 @@ in our model:
Method: Rejection
Number of samples: 1000
Number of simulations: 1000000
Threshold: 0.036
Sample means: t1: 0.561, t2: 0.227
Threshold: 0.0363
Sample means: t1: 0.554, t2: 0.216
@@ -580,8 +579,8 @@ anything. Let's do that.
.. parsed-literal::
CPU times: user 636 ms, sys: 1.35 ms, total: 638 ms
Wall time: 638 ms
CPU times: user 423 ms, sys: 3.35 ms, total: 426 ms
Wall time: 429 ms
@@ -591,8 +590,8 @@ anything. Let's do that.
Method: Rejection
Number of samples: 1000
Number of simulations: 1000000
Threshold: 0.0452
Sample means: t1: 0.56, t2: 0.228
Threshold: 0.0457
Sample means: t1: 0.55, t2: 0.216
@@ -610,8 +609,8 @@ simulations and only have to simulate the new ones:
.. parsed-literal::
CPU times: user 1.72 s, sys: 10.6 ms, total: 1.73 s
Wall time: 1.73 s
CPU times: user 1.44 s, sys: 17.9 ms, total: 1.46 s
Wall time: 1.47 s
@@ -621,8 +620,8 @@ simulations and only have to simulate the new ones:
Method: Rejection
Number of samples: 1000
Number of simulations: 1200000
Threshold: 0.0417
Sample means: t1: 0.561, t2: 0.225
Threshold: 0.0415
Sample means: t1: 0.55, t2: 0.215
@@ -640,8 +639,8 @@ standard numpy .npy files:
.. parsed-literal::
CPU times: user 25.8 ms, sys: 3.27 ms, total: 29 ms
Wall time: 28.5 ms
CPU times: user 28.7 ms, sys: 4.5 ms, total: 33.2 ms
Wall time: 33.4 ms
This stores the simulated data in binary ``npy`` format under
@@ -658,7 +657,7 @@ This stores the simulated data in binary ``npy`` format under
.. parsed-literal::
Files in pools/arraypool_3521077242 are ['d.npy', 't1.npy', 't2.npy', 'Y.npy']
Files in pools/arraypool_3375867934 are ['d.npy', 't1.npy', 't2.npy', 'Y.npy']
Now let's load all the parameters ``t1`` that were generated with numpy:
@@ -672,7 +671,7 @@ Now lets load all the parameters ``t1`` that were generated with numpy:
.. parsed-literal::
array([ 0.79, -0.01, -1.47, ..., 0.98, 0.18, 0.5 ])
array([ 0.36, 0.47, -1.66, ..., 0.09, 0.45, 0.2 ])
@@ -687,7 +686,7 @@ We can also close (or save) the whole pool if we wish to continue later:
.. parsed-literal::
arraypool_3521077242
arraypool_3375867934
And open it up later to continue where we left off. We can open it
@@ -718,12 +717,12 @@ You can delete the files with:
os.listdir(arraypool.path)
except FileNotFoundError:
print("The directry is removed")
print("The directory is removed")
.. parsed-literal::
The directry is removed
The directory is removed
Visualizing the results
@@ -820,8 +819,9 @@ sampler:
smc = elfi.SMC(d, batch_size=10000, seed=seed)
For sampling, one has to define the number of output samples, the number
of populations and a *schedule* i.e. a list of quantiles to use for each
population. In essence, a population is just refined rejection sampling.
of populations and a *schedule* i.e. a list of thresholds to use for
each population. In essence, a population is just refined rejection
sampling.
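
For instance, a minimal sketch of such a call (the sample size and threshold values here are only
illustrative):

.. code:: ipython3

    # One decreasing threshold per population.
    schedule = [0.7, 0.2, 0.05]
    %time result_smc = smc.sample(1000, schedule)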

.. code:: ipython3
@@ -839,8 +839,8 @@ population. In essence, a population is just refined rejection sampling.
.. parsed-literal::
CPU times: user 1.72 s, sys: 154 ms, total: 1.87 s
Wall time: 1.56 s
CPU times: user 1.6 s, sys: 156 ms, total: 1.75 s
Wall time: 1.38 s
We can have summaries and plots of the results just like above:
