Milstein (Strat), Milstein grad-free (Ito + Strat) #31
Conversation
Heyup. Overall, looks good! Re: code duplication: I'd suggest creating an abstract base class.

I'm surprised that the grad-free scheme is looking like order 0.5; it should still be an order-1 scheme for commutative noise.

Other comments:

(I'll do a proper code review once these points are organised.)
Force-pushed from c6e9fcf to 674549c.
@patrick-kidger Ok, applied all the above comments. Current rates for the modified solvers: for Ito, grad-free keeps the same rate as expected, but for Stratonovich it differs from 1.0. Side note: right now running diagnostics prints one additional warning, but I haven't looked at it yet.
I'm perturbed by the fact that the Strat derivative-free Milstein doesn't seem to be order 1, but I've been through your implementation and it looks correct to me. Can you try it on other problems / other noise types / with smaller steps, and see what happens? Hopefully that will help diagnose the issue. Regarding the warning: this can safely be ignored. (It's something I put in to warn about an easy-to-make inefficiency, but we know what we're doing. ;) ) If you like you can suppress it. Overall this PR looks in good shape to me.
@patrick-kidger All comments addressed. I've run
Other than the order for derivative-free Stratonovich Milstein, this all looks good to me. Pinging @lxuechen - maybe another set of eyes will help. Any idea why this solver seems to be getting order 0.5 rather than 1.0?
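For context, the "order" being debated here is usually estimated as the slope of a least-squares fit of log(error) against log(step size). A minimal sketch of that estimate (plain numpy; function name is mine, not torchsde's diagnostics API):

```python
import numpy as np

def empirical_order(step_sizes, errors):
    """Estimate convergence order from (step size, error) pairs.

    Assumes errors ~ C * h**order, so the order is the slope of the
    regression line in log-log space.
    """
    slope, _intercept = np.polyfit(np.log(step_sizes), np.log(errors), 1)
    return float(slope)
```

An order-1 scheme should give a slope near 1.0 here, and an order-0.5 scheme a slope near 0.5, up to Monte Carlo noise.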
What is the reference for the current impl of derivative-free Milstein? I think there might be a couple of issues in the math of the code, but I'm not completely certain given that I only went over it quickly.
Equation (24) here: https://infoscience.epfl.ch/record/143450/files/sde_tutorial.pdf
Alternate source - equation (9) in: https://core.ac.uk/download/pdf/82209565.pdf
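For readers following along, here is a minimal numpy sketch of the derivative-free Milstein step those references describe, for a scalar Itô SDE. This is an illustrative stand-in, not torchsde's implementation; the names and the geometric-Brownian-motion test problem are mine:

```python
import numpy as np

def milstein_grad_free_step(y, h, dW, f, g):
    """One derivative-free Milstein step for scalar dY = f(Y)dt + g(Y)dW.

    The g*g' term of vanilla Milstein is replaced by a finite difference
    of g at a supporting value, so no derivative of g is needed.
    """
    sqrt_h = np.sqrt(h)
    y_tilde = y + f(y) * h + g(y) * sqrt_h   # supporting value
    fd = (g(y_tilde) - g(y)) / sqrt_h        # approximates g(y) * g'(y)
    return y + f(y) * h + g(y) * dW + 0.5 * fd * (dW ** 2 - h)

# Strong error on geometric Brownian motion, which has a known exact solution.
rng = np.random.default_rng(0)
mu, sigma, y0, T = 0.5, 0.3, 1.0, 1.0

def strong_error(n_steps, n_paths=2000):
    h = T / n_steps
    errs = []
    for _ in range(n_paths):
        dWs = rng.normal(0.0, np.sqrt(h), n_steps)
        y = y0
        for dW in dWs:
            y = milstein_grad_free_step(y, h, dW,
                                        lambda x: mu * x, lambda x: sigma * x)
        # Exact GBM solution driven by the same Brownian path.
        exact = y0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * dWs.sum())
        errs.append(abs(y - exact))
    return float(np.mean(errs))
```

If the scheme really is strong order 1, halving the step size should roughly halve `strong_error`.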
I think I've spotted one divergence between the pdfs: the Julia DiffEq implementation differs in one term, and the second pdf you linked gives two different definitions of `Y_tilde`. I tried it with that different `Y_tilde`, but the slope is still 0.6, so there's still something else there.
torchsde/_core/methods/milstein.py (outdated)

```python
y0_prime = [
    y0_ + dt * f_eval_ + g_eval_ * sqrt_dt
    for y0_, f_eval_, g_eval_ in zip(y0, f_eval, g_eval)
]
g_prod_eval_prime = self.sde.g_prod(t1, y0_prime, v)
```
And according to DiffEq, here `t0` is being used (but with those fixes the slope is still 0.6).
Fair comment that Stratonovich specifically isn't mentioned in the second pdf, but that shouldn't affect things: the difference is just in what zeta means. Yeah, I'd spotted the difference in t, but (a) our test problem is independent of t, and (b) if we treat t as part of the state (with unit drift and zero diffusion) then we recover the version you've implemented, so I don't think it should matter.
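Point (b) is the standard autonomization trick, and can be written out (my notation, a sketch rather than the exact code):

```latex
% Original SDE: dY = f(t, Y)\,dt + g(t, Y)\,dW.
% Augment the state with time itself, Z = (s, Y), with ds = 1\,dt + 0\,dW:
dZ = \begin{pmatrix} 1 \\ f(s, Y) \end{pmatrix} dt
   + \begin{pmatrix} 0 \\ g(s, Y) \end{pmatrix} dW .
```

The augmented system is autonomous, the time component has zero diffusion so contributes nothing to the gdg term, and the supporting value advances that component by exactly the drift step: its time entry becomes t0 + h = t1. So applying an autonomous scheme to Z reproduces the t1 evaluation in the implementation.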
Maybe double check what convergence DiffEq.jl gets? In principle it should be possible (if perhaps a bit laborious) to get the two libraries to compute the exact same thing, and see where things diverge.
Hmm.. and running
I probably won't have time to look into this carefully this week, and I think it's only safe for us to merge this when we see the correct rate, or figure out why we don't.

One comment I'd make in general for the derivative-free scheme is to look into its motivation. The whole argument behind the derivative-free scheme is that it approximates the gdg term with a difference of function evaluations. The way to derive these schemes is via Taylor expansion (this is classical calculus and doesn't require any stochastic analysis). Concretely, you could do an expansion of g at y0 evaluated at y_tilde, and the overall finite-difference term (after dividing by root(h) and other ops) should match the overall gdg term in the vanilla Milstein scheme. This explains why removing the f h term is valid, as you described in Julia DiffEq's impl, and why you should eval at t0 (and not at t1). To double check that the math is correct, I'd also recommend starting from the Taylor expansion. More generally, it makes sense to learn about why the scheme works the way it does while coding it up. This will also make it clear why different people implement it differently, and validly.
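That Taylor-expansion argument can be written out for the scalar case (my notation; f, g and g' are evaluated at Y_n):

```latex
\tilde{Y} = Y_n + f h + g \sqrt{h}
\\
g(\tilde{Y}) = g + g' \bigl(f h + g \sqrt{h}\bigr) + O(h)
             = g + g' g \sqrt{h} + O(h)
\\
\frac{g(\tilde{Y}) - g}{2\sqrt{h}} \bigl(\Delta W^2 - h\bigr)
  = \tfrac{1}{2}\, g' g \,\bigl(\Delta W^2 - h\bigr)
  + O(\sqrt{h})\,\bigl(\Delta W^2 - h\bigr) .
```

Since ΔW² − h is O(h) in the mean-square sense, the finite difference reproduces the ½ g g′ (ΔW² − h) term of vanilla Milstein up to an O(h^{3/2}) residual, small enough to preserve strong order 1. Note the expansion is taken around Y_n at time t0: dropping the f h term from the supporting value only perturbs the result at the same O(h^{3/2}) order, which is why both that removal and the t0 evaluation are valid.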
@patrick-kidger Hi! I've tried to recreate
Fascinating. Perhaps reassuring that this seems to be happening. Btw, do you know what the difference is between the two? My next guess is that we're observing an intermediate regime, where we might only get 0.5 convergence at the timescales we're looking at, and would need to go much smaller. (I'd need to think a bit harder about whether this is possible in this case, but I do know of examples going the other way, where you can observe 1.0 convergence on intermediate scales despite 0.5 being the true rate.)
Ok, I've tried it a few times and here are the results. I couldn't come up with an analytical solution for it.

Here's the updated notebook: https://github.com/mtsokol/torchsde/blob/dev-notebook/examples/Milstein%20order%20check%202.ipynb

For Ito the slope is 1.0 (but in a few random runs it sometimes ended up as low as 0.6). For Stratonovich with the proper correction it's 1.0 (across a few runs it's in the range 0.8 - 1.1).

And now the fun part: I changed the SDE. My questions:
torchsde/_core/methods/milstein.py (outdated)

```python
g_prod_eval = self.sde.g_prod(t0, y0, I_k)
gdg_prod_eval = self.sde.gdg_prod(t0, y0, v)
g_eval = self.sde.g(t0, y0)
g_prod_eval = self.sde.prod(g_eval, I_k)
```
`prod` will break the adjoint code. I don't think we should optimize this for now, and I'd rather just use `g_prod` at the cost of a few extra function evaluations. Feel free to add this to #23.
Sure! Will revert it.
Thanks! We can brainstorm about ways to speed things up after this PR gets merged. On a high level, the reason we made `g_prod` in the first place is that it's a vjp for the adjoint, and vjps are faster than directly computing the Jacobian. Evaluating g directly for the adjoint would mean computing the whole Jacobian, which scales poorly w.r.t. the dimension.
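The scaling point can be illustrated with a toy finite-difference analogue (not torchsde's autograd-based vjp; names are mine). A single directional derivative costs O(1) evaluations of g regardless of dimension, while assembling the full Jacobian costs O(d):

```python
import numpy as np

calls = {"g": 0}

def g(y):
    # Toy elementwise "diffusion" function; counts its evaluations.
    calls["g"] += 1
    return np.sin(y) * y

def jvp_fd(g, y, v, eps=1e-6):
    # Directional derivative J @ v via one extra evaluation of g,
    # independent of the dimension of y.
    return (g(y + eps * v) - g(y)) / eps

def jacobian_fd(g, y, eps=1e-6):
    # Full Jacobian needs one extra evaluation per input dimension.
    n = y.size
    base = g(y)
    J = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (g(y + e) - base) / eps
    return J
```

For a 50-dimensional state, `jvp_fd` costs 2 evaluations of g while `jacobian_fd` costs 51, and the gap widens linearly with dimension; the autograd vjp story is analogous.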
Sure thing! For now I've just pushed the changes.
Nicely figured out! I can completely believe that it's down to the choice of SDE, and that we're just observing funny phenomena from this one. This is particularly backed up by the fact that the same behaviour occurs in a completely different package. I am surprised that the choice of SDE matters here, though. (And yeah, as Chen says, we're finding that things are a bit complicated in adjoint-land.)
Force-pushed from d1093d4 to 562c652.
Ok, so the current version incorporates all the latest remarks.
LGTM. Feel free to pull the squash-and-merge trigger after taking another look, @patrick-kidger.
@patrick-kidger Hi!

I'm sharing a WIP draft with three methods: Milstein for Strat, plus Milstein grad-free for Ito and Strat. Done as described: the grad-free variants live inside the original methods, while Milstein Strat is a separate one (as agreed regarding code duplication, but the whole code differs only in the `dt` handling - should I unify it into one method or leave that duplication?). Grad-free mode can be used by passing `{'grad_free': True}` in options; `False` or the absence of that option results in the default `sde.gdg_prod` usage.

Issues/questions:

- `sde.g_prod` shouldn't both compute the diffusion `g` and perform the mul with `I_k` -> in Milstein grad-free I need to compute `g`, `g * I_k` and `g * v`, so I end up with `g` being calculated three times instead of once. What do you think about extending the SDE API by adding a `def prod(self, g, v)` method that applies the correct `seq_mul` according to the noise type? (That will introduce a bit of code duplication, I guess.)
- The `gdg_prod` bug fix is merged, and `stratanovich_diagonal.py` with grad-free Milstein works fine. Here's one plot and rate: I think that usage of the grad-free variant changes the order to 0.5, right? (I can `if` it in the code.)
- `sqrt_dt = torch.sqrt(dt) if isinstance(dt, torch.Tensor) else math.sqrt(dt)` appears in a few places now - how about introducing `misc.sqrt`?
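A `misc.sqrt` helper along the lines proposed might look like this (a hypothetical sketch, not torchsde's actual code; the local torch import is just so the snippet also runs where torch isn't installed):

```python
import math

def sqrt(x):
    """Square root for both plain Python numbers and torch Tensors.

    Hypothetical `misc.sqrt` helper: dispatch on type so tensors keep
    their device and autograd behaviour, while floats stay cheap.
    """
    try:
        import torch
        if isinstance(x, torch.Tensor):
            return torch.sqrt(x)
    except ImportError:
        pass
    return math.sqrt(x)
```

With this, `sqrt_dt = sqrt(dt)` replaces the repeated `isinstance` one-liner at each call site.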