
DifferentialEquation Op refactor #3634

Merged: 14 commits into pymc-devs:master from ode-shapes-and-caching on Nov 3, 2019

Conversation

@michaelosthege (Member) commented on Sep 25, 2019

Why

The current implementation of the DifferentialEquation Op involves a few "hacks":

  • a dictionary-based cache of simulation results
  • return value is a 1D tensor which the user must reshape
  • no support for compute_test_value

Changes (WIP)

  • strongly typed inputs/outputs
  • support for test values
  • optionally return sensitivities to the user
  • 2D return shapes of the Op
  • example notebook with a two-state-variable model and different observations per state

The notebook ODE_API_WIP.ipynb implements a simple enzymatic reaction (S->P) model with sparse observations.
It also has a benchmark cell that we can use for future ODE performance testing.
Finally, in the computation graph, one can see (after zooming in a lot) that the modified implementation does not use a helper Op for the gradient, but directly uses the symbolic output from the forward pass.

In a previous commit on this branch, the new refactors were benchmarked against the existing implementation. There was no significant performance difference.
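
For a quick impression of the new interface, here is a minimal usage sketch. It is not taken from the PR or the notebook: the reaction right-hand side, priors, and observations are made up for illustration, and the keyword names follow the pymc3.ode API as released.

import numpy as np
import pymc3 as pm
from pymc3.ode import DifferentialEquation

# S -> P with Michaelis-Menten kinetics; y = [S, P], p = [vmax, K_S]
def enzymatic_reaction(y, t, p):
    rate = p[0] * y[0] / (p[1] + y[0])
    return [-rate, rate]

times = np.linspace(0, 10, 20)
ode_model = DifferentialEquation(
    func=enzymatic_reaction, times=times, n_states=2, n_theta=2
)

# purely illustrative substrate observations
S_observed = np.random.uniform(1, 10, size=len(times))

with pm.Model():
    vmax = pm.Lognormal("vmax", mu=0.0, sigma=0.5)
    K_S = pm.Lognormal("K_S", mu=0.0, sigma=0.5)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    # the Op now returns a 2D tensor of shape (len(times), n_states),
    # so individual state variables can be sliced out directly
    # (and the call can optionally return the sensitivities as well)
    solution = ode_model(y0=[10.0, 0.0], theta=[vmax, K_S])
    pm.Normal("S_obs", mu=solution[:, 0], sigma=sigma, observed=S_observed)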

ToDo

  • notebooks were renamed
  • re-run notebooks on a fast machine
  • align existing tests
  • do we need more tests in this PR?
  • remove older implementation
  • align docstrings & code style (please review!)
  • updated changelog

+ full support for test_values
+ explicit input/output types
+ 2D return shape
+ optional return of sensitivities
+ gradient without helper Op
@review-notebook-app (bot) commented

Check out this pull request on ReviewNB.

You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@michaelosthege marked this pull request as ready for review on November 1, 2019 13:00
@codecov (bot) commented Nov 1, 2019

Codecov Report

Merging #3634 into master will decrease coverage by <.01%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3634      +/-   ##
==========================================
- Coverage   89.79%   89.78%   -0.01%     
==========================================
  Files         134      134              
  Lines       20050    20088      +38     
==========================================
+ Hits        18004    18037      +33     
- Misses       2046     2051       +5
Impacted Files Coverage Δ
pymc3/exceptions.py 100% <100%> (ø) ⬆️
pymc3/ode/utils.py 100% <100%> (ø) ⬆️
pymc3/tests/test_ode.py 100% <100%> (ø) ⬆️
pymc3/tests/test_util.py 100% <100%> (ø) ⬆️
pymc3/ode/ode.py 93.75% <91.54%> (-3.66%) ⬇️
pymc3/distributions/distribution.py 95.02% <0%> (-0.3%) ⬇️

@michaelosthege changed the title from "[WIP] DifferentialEquation Op improvements" to "DifferentialEquation Op refactor" on Nov 1, 2019
@ColCarroll (Member) left a comment

Code changes look good to me!

I did not check for "scientific correctness", but it looks like it follows the old code pretty closely.

(Review comment on pymc3/ode/ode.py: outdated, resolved)
@ColCarroll (Member) commented

I will try to also look at the notebooks and make sure they run on my machine later today, but I am ok merging before that happens (or if I do not get to it).

@michaelosthege added the "ready?" label and removed the "WIP" label on Nov 2, 2019
@Dpananos (Contributor) commented Nov 2, 2019

@michaelosthege This looks great! Thanks for all the effort!

@ColCarroll (Member) commented

Merging! Good work on this!

@ColCarroll merged commit 225ae82 into pymc-devs:master on Nov 3, 2019
@fonnesbeck (Member) commented

Sampling for the ODE models in the notebooks takes 4+ hours, even on my souped-up MBP. Is this expected (and is that why they are unrendered)?

@Dpananos (Contributor) commented Nov 4, 2019

@fonnesbeck Both notebooks? Even the original one which doesn't use DifferentialEquation?

@fonnesbeck (Member) commented Nov 4, 2019

I'm looking specifically at the one with the SIR and falling object examples.

@Dpananos (Contributor) commented Nov 4, 2019

Hmm. Can confirm. Models say sampling will take approx 2.5 hours. The first example in that notebook shouldn't take longer than 10 minutes from what I remember.

@michaelosthege (Member, Author) commented

I can confirm as well. With NUTS I get ~5 s/it (1 chain, 1 core), whereas DEMetropolis runs at 1.5 it/s (12 chains, 1 core), so NUTS is about 40x slower.

The relative speeds sound realistic, because NUTS evaluates the model several times per iteration.

The absolute speed, however, is quite slow indeed.

I will try to check out the previous version and run the same comparison.

@michaelosthege (Member, Author) commented Nov 4, 2019

So here is the puzzle:

  • The original implementation (by Demetri) runs NUTS 20x faster than my refactor of it
  • DEMetropolis runs with the same speed on both versions
  • My benchmark (notebook) gives approximately the same speed for both versions

Here are two hypotheses:

  1. something with the caching gives NUTS an advantage?
  2. If I remember correctly, my refactor moves the vector-sensitivities product from numpy to theano (sketched below). Could this explain why only NUTS shows a performance difference?
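
To make hypothesis 2 concrete: the question is whether the vector-Jacobian product over the sensitivities is evaluated numerically inside a helper Op (old) or built into the theano graph (refactor). A sketch with assumed shapes, not the actual PR code:

import numpy as np
import theano.tensor as tt

# assumed shapes: g ~ (n_times, n_states)             gradient flowing back from the loss
#                 S ~ (n_times, n_states, n_params)   forward sensitivities dy/dparams

def vjp_numpy(g, S):
    # numeric contraction, as a helper Op would do inside its perform()
    return np.einsum('ts,tsk->k', g, S)

def vjp_theano(g, S):
    # the same contraction, but built symbolically into the graph
    return tt.tensordot(g, S, axes=[[0, 1], [0, 1]])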

For reference, the approximate seconds/it/chain:

              original   refactor
NUTS            0.1        5
DEMetropolis    0.05       0.05

Maybe my benchmark did not capture the relevant parts of the graph?

import numpy as np
import theano
import theano.tensor as tt

def make_benchmark():
    # get_model() is defined earlier in the notebook
    pmodel = get_model()

    # benchmark using the full gradient
    test_t = tt.grad(pmodel.logpt, [pmodel.sigma, pmodel.vmax, pmodel.K_S, pmodel.red_0])
    # compile a function to evaluate the gradient
    test_f = theano.function(inputs=pmodel.cont_vars, outputs=test_t)

    # test the compiled function with the true parameters
    print('Test gradient:')
    print(test_f(0.2, 0.5, 2, 10))

    # make a benchmarking function that uses random inputs
    # - to avoid cheating by caching
    # - to get a more realistic distribution over simulation times
    def bm():
        test_f(
            np.log(np.random.uniform(0.1, 0.2)),
            np.log(np.random.uniform(0.4, 0.6)),
            np.log(np.random.uniform(1.9, 2.1)),
            np.random.uniform(9, 11)
        )
    return pmodel, bm

@michaelosthege (Member, Author) commented

I found two things that look suspicious:

1.: Benchmark gradient is not representative

The benchmark (ODE_API_shapes_and_benchmarking notebook) computes a different gradient than the HMC implementation:

# benchmark using the full gradient
test_t = tt.grad(pmodel.logpt, [pmodel.sigma, pmodel.vmax, pmodel.K_S, pmodel.red_0])
# compile a function to evaluate the gradient
test_f = theano.function(inputs=pmodel.cont_vars, outputs=test_t)

compared to model.py:734 and model.py:467
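
As a sanity check, one could time the gradient the same way HMC consumes it, through logp_dlogp_function instead of a hand-built tt.grad. A rough sketch (assuming the pmodel from get_model() in the notebook and the pymc3 ValueGradFunction interface as I remember it; not code from this PR):

import numpy as np

pmodel = get_model()
# the value-and-gradient function that the HMC/NUTS step methods use internally
f = pmodel.logp_dlogp_function(pmodel.cont_vars)
f.set_extra_values(pmodel.test_point)        # extra (non-gradient) variables must be set before calling
x0 = f.dict_to_array(pmodel.test_point)      # flatten the start point into one vector

def bm_hmc_style():
    # jitter the point so repeated calls cannot benefit from any caching
    logp, dlogp = f(x0 + np.random.normal(0, 1e-3, size=x0.shape))

# e.g. %timeit bm_hmc_style() in the notebook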

2.: Symbolic sensitivity handover does not work when using NUTS

If you take ODE_API_shapes_and_benchmarking.ipynb and append the following two cells at the bottom, you can get the debug output of the graph construction:

with get_model() as model:
    pm.find_MAP()
# Output:
#Applied log-transform to sigma and added transformed sigma_log__ to model.
#Applied log-transform to vmax and added transformed vmax_log__ to model.
#Applied log-transform to K_S and added transformed K_S_log__ to model.
#make_node for inputs 726405093253646730
#grad w.r.t. inputs 726405093253646730
#grad w.r.t. inputs 726405093253646730
#grad w.r.t. inputs 726405093253646730
#logp = -37.589, ||grad|| = 21.255: 100%|███████████████████| 27/27 [00:01<00:00, 15.57it/s]

The output above shows that for find_MAP, the grad method of the Op is called 3 times with the same inputs that were previously fed to make_node (same hash). This way the cached sensitivity Tensor is used, avoiding duplicate computation.

The following output shows that logp_dlogp_function, which (I think, @ColCarroll?) is also used by HMC, calls grad with different inputs, rendering the sensitivity caching ineffective.

with model:
    f = model.logp_dlogp_function(model.cont_vars)
# Output:
#grad w.r.t. inputs 5369686199239769812
#[MakeVector{dtype='float64'}.0, MakeVector{dtype='float64'}.0]      <---- _log.debug(inputs) in ode.py:197
#No cached sensitivities found!
#make_node for inputs 5369686199239769812

Finally, running NUTS gives a similar output to the one above:

with model:
    trace = pm.sample(
        draws=1000, tune=500,
        chains=1, cores=1,
        step=pm.NUTS()
    )
# Output:
#grad w.r.t. inputs -1522734632884762703
#[MakeVector{dtype='float64'}.0, MakeVector{dtype='float64'}.0]
#No cached sensitivities found!
#make_node for inputs -1522734632884762703
#Sequential sampling (1 chains in 1 job)
#NUTS: [red_0, K_S, vmax, sigma]
#Sampling chain 0, 0 divergences: .......

Conclusion & Actions

The failure to cache symbolic sensitivities could explain a 2x slowdown.

Maybe the move of the sensitivity-vector product from numpy into theano explains the rest?

@ColCarroll @aseyboldt do you have an idea how I could repair the symbolic sensitivity cache?

If the sensitivity cache trick does not work with HMC gradients or the product in theano is significantly slower, I could bring back the ODEGradOp.

For re-use of DifferentialEquation Op instances, either the self._output_sensitivities tensors or the self._grad_ops must be cached in dicts keyed by something like hash(inputs).

Should this be impossible, Op instance re-use will cause a 2x slowdown.
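
To illustrate the caching idea (a toy sketch only, not the PR's actual implementation; the attribute name mirrors the one mentioned above, everything else is made up):

class SensitivityCacheSketch:
    """Toy illustration: key the symbolic sensitivities produced in make_node
    by a hash of the symbolic inputs, so that a later grad() call with the
    *same* inputs can reuse them instead of rebuilding the forward pass."""

    def __init__(self):
        self._output_sensitivities = {}  # hash(inputs) -> sensitivity tensor

    def remember(self, inputs, sensitivities):
        self._output_sensitivities[hash(tuple(inputs))] = sensitivities

    def lookup(self, inputs):
        # Returns None on a cache miss. That is exactly what happens above:
        # logp_dlogp_function hands grad() freshly built MakeVector inputs,
        # their hash differs, and make_node has to run again.
        return self._output_sensitivities.get(hash(tuple(inputs)))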

@michaelosthege (Member, Author) commented

I'm continuing the discussion over in #3676

@michaelosthege deleted the ode-shapes-and-caching branch on August 7, 2021 11:22