
Fix numerical instability #149

Merged · 10 commits · Jun 22, 2022

Conversation

iangrooms
Member

From the beginning we've had problems with numerical stability, i.e. roundoff error that accumulates during the iterative application of the polynomial filter to data. See, for example, #33, #124, and #135, and the things we've tried to improve this, e.g. #67, #130, and #134.

This PR uses exactly the same polynomial approximation of the target filter, but applies the polynomial to the data in a completely different way. The old way was based on factoring the polynomial in terms of its roots and then iteratively applying each factor to the data. The new method evaluates the polynomial using its coordinates in the Chebyshev basis, via the three-term recurrence for Chebyshev polynomials.
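For intuition, the recurrence-based evaluation can be sketched as follows. This is an illustrative stand-in, not the package's actual code: `apply_A` represents one application of the (rescaled, shifted) Laplacian-like operator, and `coeffs` are assumed to be the Chebyshev coordinates of the filter polynomial.

```python
import numpy as np

def apply_chebyshev_filter(apply_A, field, coeffs):
    """Evaluate p(A) applied to field, where p is given by Chebyshev
    coefficients, using the three-term recurrence
    T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    T_prev = field                      # T_0(A) f = f
    result = coeffs[0] * T_prev
    if len(coeffs) == 1:
        return result
    T_curr = apply_A(field)             # T_1(A) f = A f
    result = result + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_next = 2 * apply_A(T_curr) - T_prev   # three-term recurrence
        result = result + c * T_next
        T_prev, T_curr = T_curr, T_next
    return result
```

Because each step only involves applying A once and taking linear combinations, no individual factor of the polynomial is ever applied in isolation, which is what made the root-factored form prone to roundoff amplification.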

For now I have left the "Numerical Instability" example notebook in, showing that the new code can filter 1/4 degree vorticity data to 10 degrees with no problems. In fact I have not been able to break the new code in any example I've tried, including the Taper filter with very large filter factors.


@NoraLoose
Member

Thanks @iangrooms, this sounds really exciting! 🚀

It is a bit difficult for me to review this PR because there are no notes on how the new method exactly operates. Ian, it would be helpful if you could share notes (if available) and/or provide more comments in the code. That way, I could try to match up theory and code.

In addition, I am wondering if we can leverage the verification tests that @rabernat set up in #79 (where we would need verifications not only for kernels but for full filter operators). The rationale is the following: This PR swaps out the central filtering algorithm which this package is based on for another algo. Don't we want to test that the new algo leads to the same/similar filtered fields as the old one?

@iangrooms
Member Author

It is a bit difficult for me to review this PR because there are no notes on how the new method exactly operates. Ian, it would be helpful if you could share notes (if available) and/or provide more comments in the code. That way, I could try to match up theory and code.

I think I'll update the Filter Theory section to describe how it works. In exact arithmetic it should produce exactly the same answer as the old method, but the new method is more stable to roundoff errors. Sort of like re-arranging the order of the filter steps.

In addition, I am wondering if we can leverage the verification tests that @rabernat set up in #79 (where we would need verifications not only for kernels but for full filter operators). The rationale is the following: This PR swaps out the central filtering algorithm which this package is based on for another algo. Don't we want to test that the new algo leads to the same/similar filtered fields as the old one?

I probably have to look more closely at this. I guess we could put in some kind of test to compare the result of the new algorithm to the old one, but eventually I think we should just drop the old one and use the new one.

@NoraLoose
Member

I think I'll update the Filter Theory section to describe how it works.

Sounds great! I will hold off on the review until the changes to the filter theory are pushed.

I guess we could put in some kind of test to compare the result of the new algorithm to the old one, but eventually I think we should just drop the old one and use the new one.

Yes, exactly. As outlined in #79, the idea is to have tests like these:

filtered = filter.apply(data)
np.testing.assert_allclose(filtered, expected)

where expected is a field we save explicitly into the test suite based on the current code.
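A self-contained sketch of this pattern, using a hypothetical stand-in for filter.apply (the real test would load `expected` from a reference file stored in the repository):

```python
import numpy as np

# Hypothetical stand-in for filter.apply; illustrative only.
def simple_filter(data):
    # three-point running mean along the last axis (periodic)
    return (np.roll(data, 1, axis=-1) + data + np.roll(data, -1, axis=-1)) / 3.0

data = np.arange(12.0).reshape(3, 4)
expected = simple_filter(data)   # in practice: loaded from a saved reference
filtered = simple_filter(data)   # output of the current code under test
np.testing.assert_allclose(filtered, expected)
```

The saved reference is generated once from a trusted version of the code, so any later change to the algorithm that alters the output beyond tolerance makes the test fail.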

@codecov-commenter

codecov-commenter commented Apr 19, 2022

Codecov Report

Merging #149 (0cbaba3) into master (e4177da) will increase coverage by 0.33%.
The diff coverage is 95.12%.

@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
+ Coverage   97.98%   98.32%   +0.33%     
==========================================
  Files           9        9              
  Lines        1044     1014      -30     
==========================================
- Hits         1023      997      -26     
+ Misses         21       17       -4     
Flag Coverage Δ
unittests 98.32% <95.12%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
gcm_filters/filter.py 98.01% <95.00%> (+1.55%) ⬆️
tests/test_filter.py 99.55% <100.00%> (-0.02%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@iangrooms
Member Author

I added a section on the theory that underpins the new iterative algorithm for applying the filter to data. @NoraLoose can you take a look?

.. math:: \mathbf{A} = -\left(\frac{2}{s_{\text{max}}}\Delta + \mathbf{I}\right)

be the discrete Laplacian :math:`\Delta` with a rescaling and a shift.
In principle one could apply the filter to a vector of data :math:`\mathbf{f}` by computing the vectors :math:`T_i(\mathbf{A})\mathbf{f}` and then taking a linear combination with weights :math:`c_i`.
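As a quick numerical sanity check (illustrative, not part of the docs or the package code): the rescale-and-shift maps eigenvalues of the discrete Laplacian, which lie in [-s_max, 0], onto [-1, 1], the interval on which the Chebyshev recurrence is well conditioned.

```python
import numpy as np

# Illustrative check: A = -((2 / s_max) * Delta + I) acts on an
# eigenvalue lam of Delta as lam -> -((2 / s_max) * lam + 1),
# mapping [-s_max, 0] onto [-1, 1].
def shift_eigenvalue(lam, s_max):
    return -((2.0 / s_max) * lam + 1.0)

s_max = 8.0
lams = np.linspace(-s_max, 0.0, 9)
shifted = shift_eigenvalue(lams, s_max)
# shifted runs from +1 (at lam = -s_max) down to -1 (at lam = 0)
```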
Member

Can we reformulate to avoid confusion with the "vector" Laplacian further down?

Member Author

Suggestions for how to do that are welcome. We don't want to be so loose with terminology that it's confusing, but at the same time we don't want to be pedantic.

Comment on lines 188 to 189
T_minus_2 = T_minus_1.copy()
T_minus_1 = T_minus_0.copy()
Member

Suggested change
- T_minus_2 = T_minus_1.copy()
- T_minus_1 = T_minus_0.copy()
+ T_minus_2 = T_minus_1
+ T_minus_1 = T_minus_0

Comment on lines 264 to 267
uT_minus_2 = uT_minus_1.copy()
uT_minus_1 = uT_minus_0.copy()
vT_minus_2 = vT_minus_1.copy()
vT_minus_1 = vT_minus_0.copy()
Member

Suggested change
uT_minus_2 = uT_minus_1.copy()
uT_minus_1 = uT_minus_0.copy()
vT_minus_2 = vT_minus_1.copy()
vT_minus_1 = vT_minus_0.copy()
uT_minus_2 = uT_minus_1
uT_minus_1 = uT_minus_0
vT_minus_2 = vT_minus_1
vT_minus_1 = vT_minus_0

Comment on lines 241 to 242
uT_minus_2 = ufield_bar.copy()
vT_minus_2 = vfield_bar.copy()
Member

Suggested change
uT_minus_2 = ufield_bar.copy()
vT_minus_2 = vfield_bar.copy()
uT_minus_2 = ufield_bar
vT_minus_2 = vfield_bar

field_bar += (
    temp_l * 2 * np.real(s_b) / np.abs(s_b) ** 2
    + temp_b * 1 / np.abs(s_b) ** 2
)
T_minus_2 = field_bar.copy()
Member

Suggested change
- T_minus_2 = field_bar.copy()
+ T_minus_2 = field_bar

Member Author

I'm responding here for all of the .copy() changes. I'm reluctant to remove them, because plain assignment would just make T_minus_2 and field_bar names for the same array, so every in-place update to T_minus_2 would also update field_bar, which we don't want.
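The concern can be demonstrated with a minimal NumPy sketch (variable names borrowed from the diff for illustration; this is not the package's code):

```python
import numpy as np

# Plain assignment binds a second name to the SAME array,
# so later in-place updates are visible through both names.
field_bar = np.zeros(3)
T_minus_2 = field_bar          # alias, not a copy
T_minus_2 += 1.0               # in-place update
# field_bar has also changed to [1., 1., 1.]

# With .copy() the two arrays are independent.
field_bar_safe = np.zeros(3)
T_safe = field_bar_safe.copy()
T_safe += 1.0
# field_bar_safe is still [0., 0., 0.]
```

Whether the copies are actually needed therefore depends on whether the arrays are later modified in place; if every subsequent update rebinds the name to a freshly allocated array, the reviewer's suggestion is safe.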

Comment on lines 33 to 34
1: {"offset": 2.2, "factor": 0.6, "exponent": 2.5, "max_filter_factor": 67},
2: {"offset": 3.2, "factor": 0.7, "exponent": 2.7, "max_filter_factor": 77},
Member

Do we still need max_filter_factor? Do we still want to throw numerical instability warnings? What are these values based on?

@NoraLoose
Member

Looks great @iangrooms! I left some minor comments above.

Two bigger comments are:

  1. We should probably delete the numerical instability notebook because it is now confusing: it says the filter becomes unstable, but with the new algorithm that no longer happens.
  2. Since this PR involves major refactoring of the code (the central filter algorithm is essentially swapped out for a new one!), I want to reiterate what I said above: we need verification tests that the new and old algorithms give the same / similar results. I think we can do this in a similar manner as set up by @rabernat in Refactor kernel tests #79, but we need verifications not only for kernels but for full filter operators. I'm happy to hear other opinions.

iangrooms and others added 3 commits May 4, 2022 16:40
Co-authored-by: Nora Loose <NoraLoose@users.noreply.github.com>
@iangrooms
Member Author

Thanks for the code review! In response to the two big comments:

  1. I agree that we should delete the Numerical Instability notebook before merging the PR. I'm leaving it in for now in case anyone wants to look at it and verify that indeed, the new algorithm does not suffer from numerical instability.
  2. I can work to provide some evidence that the old and new algorithms give the same results, at least when the old one is stable. But setting up a new suite of tests to verify that the filters produce the expected results (not necessarily 'the same as the old algorithm', which would be unnecessary if we believe the new algorithm is correct, not to mention being onerous to implement) seems outside the scope of this PR. I think it needs its own issue (cf Refactor kernel tests #79) and its own PR.

@NoraLoose
Member

which would be unnecessary if we believe the new algorithm is correct, not to mention being onerous to implement

The point of the verification tests would be that we don't have to "believe". They would catch any bugs in the derivation / implementation of the new algorithm, which we may have overlooked by simply looking at the code.

I think it needs its own issue (cf #79) and its own PR.

Sure, we can make the verification tests their own issue and PR. But I think those should be resolved first, before we merge this PR. I can help with these tests.

@iangrooms
Member Author

iangrooms commented May 5, 2022

I ran the old and new algorithms on the numerical instability notebook. I saved the result of the old algorithm (zeta_2Deg) then plotted a comparison with the result of the new algorithm. The plot is in the latest commit of the numerical instability notebook. The difference is about 7 orders of magnitude smaller than the field itself. So it looks like the old and new algorithms are giving the same results up to roundoff errors (single precision), at least in situations where the old algorithm is stable. Between the theory, the code implementing the theory, and the comparison of the old and new methods, I hope this is enough to convince that the new algorithm works as advertised.

@NoraLoose
Member

NoraLoose commented Jun 9, 2022

Hi @iangrooms, we merged #153 so we can finalize this PR too. Here are some final minor things that have to be done / discussed:

  • Remove numerical instability notebook?
  • What to do about the "Factoring the Gaussian Filter" option (including the docs section)? Do we want to get rid of it because there is no reason to use this functionality now that numerical instability is no longer a problem? Or do we want to keep it so as not to lose backward compatibility? (If we bump up to version v0.3, it is okay to drop the feature as long as we communicate it.) If we keep it, we should add some sentences at the top of the "Factoring the Gaussian Filter" docs section, to the effect of: "Using this feature is not recommended, and is only necessary if you run into numerical instability issues, which you most likely won't."
  • Merge the master branch into your branch three_term_recurrence, and run the tests without overwriting the test data. (You won't overwrite anything if you simply run pytest and don't mess with the GCM_FILTERS_OVERWRITE_TEST_DATA variable.) If the tests pass, the filter algorithm gives the same result as the previous one.

@iangrooms
Member Author

I'll merge the new tests. In the meantime, I do plan to remove the numerical instability notebook as well as any related comments in the docs, and the 'factoring the Gaussian.' It's a big update, so I think it makes sense to bump the version number. The old version will still be available if anyone needs it.

@NoraLoose
Member

If we remove the "Factoring the Gaussian" section in the docs, we should probably also delete the parameter n_iterations in the code (+ the functionality associated with it). I'm fine with this.

Removes discussion of the factored/iterated Gaussian filter from the docs
Removes the numerical instability notebook
Slightly updates the theory section of the docs
@iangrooms
Member Author

I've removed the numerical instability notebook, removed the factored Gaussian from the docs and code, and merged the latest updates to testing. For some reason building the docs fails (all other tests pass); it started failing after I pulled in the latest changes from the master branch, before I made my own changes. Not sure how to fix it. Can @NoraLoose or @rabernat take a look?

@NoraLoose
Member

I can build the docs locally for this branch. Not sure what is going on either.

@rabernat
Contributor

I am guessing what happened is that some version has changed in the readthedocs environment, since we are not pinning specific versions. It looks a bit like this: pydata/pydata-sphinx-theme#511

I will try to dig deeper. If you want, you can merge this PR and we will fix the docs in a separate PR.

@iangrooms
Member Author

I can also build the docs locally, so if you're both agreed then I think it's OK to merge this PR and fix the docs separately.

@iangrooms iangrooms merged commit 3505365 into ocean-eddy-cpt:master Jun 22, 2022