
Nail down stability of IBM with very high orders. #404

Merged

Conversation

pnkraemer
Collaborator

When playing with the code, I noticed that the changes in #386 were absolutely crucial for stability of high order IBMs.
I thought it would be good to have a test that nails this down. The changes in this PR are:

  1. Add an optional mini-damping factor for IBM covariances: this way (using 1e-15), the maximum order for IBM Kalman filtering is not 10 anymore, but 15! :D If we do crazy small steps, the max order is "only" 14. @nathanaelbosch
  2. Write tests for the above point, as well as for the no-damping limit of 10.
  3. Make minor updates to the car-tracking problem in the problem zoo that slightly improve its square-root usage.
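The damping trick from point 1 can be sketched in a few lines. Note that `damped_cholesky` is an illustrative helper, not probnum's actual API:

```python
import numpy as np

def damped_cholesky(cov, damping=1e-15):
    """Cholesky factor of ``cov + damping * I``.

    Adding a tiny multiple of the identity (a "nugget") keeps the matrix
    numerically positive definite, so the factorization succeeds even when
    ``cov`` is severely ill-conditioned, as IBM process-noise covariances
    are at high orders.
    """
    return np.linalg.cholesky(cov + damping * np.eye(len(cov)))

# An almost-singular covariance: the nugget keeps the factorization stable.
cov = np.diag([1.0, 1e-20])
chol = damped_cholesky(cov)
```

With `damping=0` this reduces to the plain factorization, which is why the new keyword can default to zero without changing existing behavior.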

@codecov

codecov bot commented May 5, 2021

Codecov Report

Merging #404 (cd3dedb) into master (fed9dfc) will decrease coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #404      +/-   ##
==========================================
- Coverage   83.67%   83.67%   -0.01%     
==========================================
  Files          99       99              
  Lines        4957     4955       -2     
  Branches      670      669       -1     
==========================================
- Hits         4148     4146       -2     
  Misses        598      598              
  Partials      211      211              
Impacted Files                                          Coverage Δ
src/probnum/statespace/discrete_transition.py           90.12% <ø> (ø)
...um/problems/zoo/filtsmooth/_filtsmooth_problems.py   90.90% <100.00%> (+0.09%) ⬆️
src/probnum/statespace/integrator.py                    99.02% <100.00%> (-0.02%) ⬇️

Comment on lines 207 to 210
process_noise_cov_cholesky = np.linalg.cholesky(
    self._proc_noise_cov_mat
    + self._process_noise_damping * np.eye(len(self._proc_noise_cov_mat))
)
Collaborator

Is this nugget necessary for numerical stability?

Collaborator Author

only for orders > 11 :)

Collaborator

Should we then actually add this argument? I guess it will be needed quite rarely, if not only for the test you added in this PR 😄 Correct me if this is wrong

Collaborator

Can be ignored, I read your answer below

Collaborator Author

no, not really. I was playing around with it and thought it was cool, but I see your concerns. Shall I remove it?

Comment on lines +504 to +507
state_trans_mat_fun=lambda t: state_trans_mat,
shift_vec_fun=lambda t: shift_vec,
proc_noise_cov_mat_fun=lambda t: proc_noise_cov_mat,
proc_noise_cov_cholesky_fun=lambda t: proc_noise_cov_cholesky,
Collaborator

👍 Thanks! I always get a bit twitchy when more than 2 arguments are passed positionally

Comment on lines +326 to +331
# The Cholesky factor of the process noise covariance matrix of the IBM
# always exists, even for non-square root implementations.
proc_noise_cov_cholesky = (
    self.precon(dt)
    @ self.equivalent_discretisation_preconditioned.proc_noise_cov_cholesky
)
Collaborator

Do we get any benefit from computing this Cholesky factor when not using the sqrt implementation?
Otherwise, we might make this conditional.

Collaborator Author

Hmmm, my thinking here was that this Cholesky factor is created for the equivalent discretisation (preconditioned), so it seemed super weird not to translate it here, too. What do you think?

Collaborator (@schmidtjonathan), May 5, 2021


Could we make the computation of the Cholesky factor - both in equivalent_discretisation_preconditioned and in discretise - conditional on self.forward-/backward_implementation?

Collaborator Author

Yes, we could. Though it only happens once, so the impact on runtime should be almost negligible. Do you think it is worth it nevertheless?
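The conditional computation discussed in this thread could look roughly like the following. The class and attribute names are illustrative stand-ins, not probnum's actual implementation:

```python
import numpy as np

class ToyTransition:
    """Minimal stand-in for a discrete transition with optional sqrt methods."""

    def __init__(self, proc_noise_cov, forward_implementation="classic",
                 backward_implementation="classic"):
        self.proc_noise_cov = proc_noise_cov
        self.forward_implementation = forward_implementation
        self.backward_implementation = backward_implementation
        # Only pay for the factorization if a sqrt implementation will use it.
        if "sqrt" in (forward_implementation, backward_implementation):
            self.proc_noise_cov_cholesky = np.linalg.cholesky(proc_noise_cov)
        else:
            self.proc_noise_cov_cholesky = None

sqrt_transition = ToyTransition(np.eye(3), forward_implementation="sqrt")
classic_transition = ToyTransition(np.eye(3))
```

Since the factorization happens only once per transition, the runtime savings are small, which is the counterargument raised in the reply above.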

step=1e-12,
forward_implementation="sqrt",
backward_implementation="sqrt",
_process_noise_damping=1e-14,
Collaborator

I am surprised that a nugget of 1e-14 does anything at all.

Do you think it necessary to establish an entirely new kwarg, solely for the case that somebody wants to use a 14 times integrated model? 😄

Collaborator Author

Well, I don't want to stretch the word "necessary" too much here... :) I just thought it was cool to get these high orders! ;)
And it is a "hidden" kwarg that defaults to 0, so I thought it doesn't hurt too much. Though if you find it annoying, I am open to talking about it :)

Collaborator (@schmidtjonathan), May 5, 2021


Well, maybe I am biased, but I quickly get confused when a method offers me a way to tweak every part of the inner workings, especially if the documentation takes half an hour to read (not that this would be the case here!). So I often prefer clarity over control, given that the kwargs are often only used in very special cases. Do hidden kwargs appear in the docs?

"""With damping, the filter achieves really high orders (we test 25, but 50 was
possible too, for some reason)."""
regression_problem, statespace_components = filtsmooth_zoo.car_tracking(
    model_ordint=25,
Collaborator

What is happening.

Collaborator Author

Just for the jokes, the test passes for order=75 as well (no typo!), but I thought 25 was sufficiently extreme :D

Collaborator

All just by removing the matrix inversion, right?

Collaborator Author

and by using this mini damping factor that makes the process noise always positive definite.
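A toy illustration of that last point (not probnum code): an exactly singular covariance breaks a plain Cholesky factorization, while a tiny nugget on the diagonal restores strict positive definiteness:

```python
import numpy as np

cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])  # positive SEMI-definite: rank 1, so Cholesky fails

try:
    np.linalg.cholesky(cov)
    plain_ok = True
except np.linalg.LinAlgError:
    plain_ok = False

# A tiny damping term makes the matrix strictly positive definite,
# while changing it by a numerically negligible amount.
damping = 1e-12
chol = np.linalg.cholesky(cov + damping * np.eye(2))
```

High-order IBM process-noise covariances are not exactly singular but extremely ill-conditioned, which in floating point has the same effect on the factorization.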

@schmidtjonathan (Collaborator)

Just checking: Your PR description is outdated, right?

the maximum order for IBM kalman filtering is not 10 anymore, but 15!

this should be 25 by now, right? I don't want to be petty here, just checking that the 25-order thing is not some kind of special case. 😆

@pnkraemer (Collaborator Author)

25 is only filter. filtsmooth starts complaining at 15

@schmidtjonathan (Collaborator)

25 is only filter. filtsmooth starts complaining at 15

I see. Still. That's quite remarkable

@pnkraemer (Collaborator Author)

As I said in one of the comments: 75 worked, too, but I thought no one would believe me, so I used 25 for the test ;)

@schmidtjonathan (Collaborator) left a comment


Crazy stuff! Thanks for working that out :)

@pnkraemer pnkraemer merged commit d050f26 into probabilistic-numerics:master May 5, 2021