Fix polydiff for variable step size and missing data (issue #173) #190
pavelkomarov merged 3 commits into master
Conversation
- slide_function: accept dt_or_t as a scalar or array; when it's an array, pass the actual t[window] slice to func so inner functions see real time values and can handle nonuniform spacing (backward compatible: a scalar dt is still passed as before, so lineardiff and other callers are unchanged)
- polydiff/_polydiff: rename dt -> dt_or_t; build t from a scalar dt or use the passed array directly; filter NaN values from x before np.polyfit so missing data no longer causes all-NaN outputs; handle all-NaN windows by returning NaN arrays rather than crashing
- Tests: add test_polydiff_nan_inputs and test_polydiff_variable_step to test_diff_methods.py, and extend test_slide_function in test_utils.py to cover the array-t code path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
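The dispatch the first bullet describes can be sketched as follows. This is a toy illustration, not the real slide_function (which also does weighted accumulation of overlapping windows); the helper name and signature here are hypothetical.

```python
import numpy as np

def slide_function_sketch(func, x, dt_or_t, window=5):
    """Toy sketch of the scalar-or-array dispatch: a scalar dt is
    forwarded unchanged, while an array t is sliced so the inner
    func sees the window's actual time values."""
    scalar_dt = np.isscalar(dt_or_t)
    out = np.empty(len(x))
    for i in range(len(x) - window + 1):
        w = slice(i, i + window)
        # backward compatible: scalar passes through; array gets sliced
        dt_or_t_window = dt_or_t if scalar_dt else np.asarray(dt_or_t)[w]
        out[w] = func(x[w], dt_or_t_window)
    return out
```

Because the scalar path is untouched, existing callers like lineardiff keep working without modification.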
I generated this thing with a Claude Code instance spawned by an OpenClaw assistant. Mostly playing to see how well it can do, and this is a good problem to throw it at. I think this is a particularly tricky issue to address, with a redesign on multiple levels maybe necessary, so it may not get us all the way. But even most of the way of some of the way is a win. It's really quite something and makes evident why Amazon is laying off so many people. |
pavelkomarov left a comment
And an overall comment: The utilities tests are failing presently, because the test_peakdet function depends on some randomness that's only initialized at the beginning of the file. If you change the order of tests by adding one that comes before it alphabetically, or if you use randomness, then the answers it ends up getting don't match the check values, even if they're not wrong. It's a fragile test that way. Apologies. Can you set it up with its own source of randomness or otherwise make it more bulletproof?
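One way to make the test order-independent, as requested, is to give it a private RandomState so it no longer consumes the module-level stream. A minimal sketch; the helper name and signal construction are illustrative, not the actual test_peakdet body.

```python
import numpy as np

def make_test_signal(seed=42):
    """Build the noisy test signal from a private RandomState, so the
    result is identical no matter which other tests ran first or how
    much global randomness they consumed."""
    rng = np.random.RandomState(seed)  # test-local source of randomness
    t = np.linspace(0, 4 * np.pi, 100)
    return np.sin(t) + rng.normal(0, 0.1, size=t.size)

def test_peakdet():
    x = make_test_signal()
    # hard-coded check values are now safe: x is fully reproducible
    assert np.array_equal(x, make_test_signal())
```

Adding or reordering tests in the file then has no effect on the values this test sees.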
pynumdiff/tests/test_diff_methods.py
Outdated
fig.suptitle(f'{diff_method.__name__}', fontsize=16)

def test_polydiff_nan_inputs():
I don't love the addition of two more tests here. You can see the way I've structured the tests: the first commonly tests all methods for correctness, and the second tests each method's ability to support multi-dimensional data. I don't explicitly test for missing data anywhere, which is maybe a weakness, but only a few methods can handle missing data, so for those I've just checked offline directly. See Fig 57 here https://arxiv.org/pdf/2512.09090. We could add a test for just those in theory, but I'd like to cover them all by the one test, not just polydiff.
Agreed — the two standalone tests were too narrow. Removed both and added polydiff_irreg_step as an alias in diff_methods_and_params instead, mirroring the pattern for spline_irreg_step. It now exercises all 6 test functions with and without noise, which gives broader confidence for free. Held off on a dedicated NaN-handling test since that really should cover all missing-data-capable methods rather than singling out polydiff.
pynumdiff/tests/test_diff_methods.py
Outdated
assert np.max(np.abs(x_hat[outside_gap] - x_clean[outside_gap])) < 0.1

def test_polydiff_variable_step():
Variable step is covered in the first test in this file. See how I've done it there, and add polydiff with variable step as just another method in there. That will save lines and ensure coverage over all the test functions.
Done. Added a polydiff_irreg_step alias (same pattern as spline_irreg_step) and wired it into the irregular-step branch in the test body so it gets t_irreg as input. Had to compute fresh error bounds — they came out nearly identical to regular polydiff, which is reassuring: the variable-step path is not noticeably less accurate than the scalar-dt path on jittered-but-close-to-uniform grids.
pynumdiff/utils/utility.py
Outdated
# When t is an array, pass the actual time values for the window so func can handle
# nonuniform spacing; otherwise pass dt as before (backward compatible).
dt_window = dt_or_t if scalar_dt else t[window]
This is the right approach if the window is a constant size in index-space, I guess. I had imagined a window in actual t-space, which would then have more or fewer index points within it depending on how dense or thinned out they were over that little region. That adds the complication of having to find which index points fall within a window, and depending on how many your optimal polynomial degree might change. So this is perhaps the easier and more amenable approach to doing polyfits. What are your thoughts?
Good question, and a real tradeoff. A window in t-space would be more principled when the data is heavily non-uniform — a dense cluster of points in a short time interval would get a well-conditioned polyfit, while a sparse region with the same index-count window might be fitting over a very wide t-range and extrapolating at the edges. On the other hand, a t-space window complicates the implementation significantly: you need to binary-search for the window boundaries, the count of points inside varies per window (so the polynomial degree might need to adapt or you might get too few points for a stable fit), and the weighted overlap accumulation in slide_function assumes a fixed kernel size. The index-space approach here sidesteps all of that and is still a meaningful improvement over the previous uniform-dt assumption — the polynomial is now fit to the actual time coordinates even if the window boundaries are in index-space. Happy to revisit if you want to pursue the t-space approach, but it would need a more substantial redesign of slide_function.
I think this is the right approach and maybe the key insight that makes solving this issue more tractable.
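The index-space-window idea settled on above can be shown in miniature: the window has a constant number of points, but the fit uses the window's actual time values, so nonuniform spacing is absorbed by the polynomial itself. The helper name `_polydiff_window` is hypothetical, a sketch rather than the pynumdiff source.

```python
import numpy as np

def _polydiff_window(x_w, t_w, degree=2):
    """Fit one fixed-size (in index space) window against its real
    time values, then return the fitted values and derivative."""
    coeffs = np.polyfit(t_w, x_w, degree)  # real t, not a uniform 0..k*dt grid
    return np.polyval(coeffs, t_w), np.polyval(np.polyder(coeffs), t_w)

# irregularly spaced grid; with x = t^2, dx/dt should come back as 2t
t = np.sort(np.random.RandomState(0).uniform(0, 1, 10))
x_hat, dxdt_hat = _polydiff_window(t**2, t)
```

A t-space window would instead vary the point count per window, which is where the degree-adaptation and boundary-search complications discussed above come in.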
pynumdiff/utils/utility.py
Outdated
# run the function on the window and add weighted results to cumulative answers
- x_window_hat, dxdt_window_hat = func(x[window], dt, *args, **kwargs)
+ x_window_hat, dxdt_window_hat = func(x[window], dt_window, *args, **kwargs)
Can this be named dt_or_t_window? I just want to be extra clear that it can carry either kind of information.
Renamed. dt_or_t_window is clearer — at a glance it says both what the variable can be and that it is already sliced to the current window.
pynumdiff/polynomial_fit.py
Outdated
- :param np.array[float] x: data to differentiate
- :param float dt: step size
+ :param np.array[float] x: data to differentiate. May contain NaN values (missing data);
+     NaNs are excluded from fitting and imputed by polynomial interpolation.
Nice, love the idea of imputing. That's what the kalman filter function does with missing data in the kalman_smooth module.
Thanks for the note — worth calling out the parallel to the kalman smoother. The imputation here is implicit: windows straddling a NaN gap simply exclude the missing points from the polyfit, then evaluate the fitted polynomial at those t-values to fill them in. Effectively polynomial interpolation, which is fine for short gaps. For longer gaps or noisier data the kalman approach would be more robust, but for the sliding-window polyfit use case this keeps things self-contained.
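The implicit imputation described above can be sketched in a few lines: mask the NaNs out of the fit, then evaluate the fitted polynomial at every t, which fills the missing positions as a side effect. Illustrative only; the function name and all-NaN guard mirror the PR's description, not its exact source.

```python
import numpy as np

def fit_window_with_nans(x, t, degree=2):
    """Exclude NaN samples from the polyfit, then evaluate the fitted
    polynomial at every t, imputing the missing positions."""
    mask = ~np.isnan(x)
    if not np.any(mask):  # all-NaN window: nothing to fit
        return np.full_like(x, np.nan), np.full_like(x, np.nan)
    coeffs = np.polyfit(t[mask], x[mask], degree)
    return np.polyval(coeffs, t), np.polyval(np.polyder(coeffs), t)

t = np.linspace(0, 1, 11)
x = t**2
x[4] = np.nan  # a short gap of missing data
x_hat, dxdt_hat = fit_window_with_nans(x, t)
```

For a short gap like this the polynomial interpolation is accurate; as noted above, long gaps or heavy noise would favor a kalman-style approach instead.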
pynumdiff/polynomial_fit.py
Outdated
# Filter out NaN values so polyfit doesn't fail on missing data
mask = ~np.isnan(x)
if not np.any(mask):  # all NaN window — return NaNs
    return np.full_like(x, np.nan, dtype=float), np.full_like(x, np.nan, dtype=float)
Do you need to specify dtype=float here? NaNs are already float values, so I think it'll work if you exclude that parameter. I prefer to be less verbose when possible.
Removed. You are right that NaN implies float — np.full_like will infer the dtype from x, which at that point already contains NaN values (so it must be float). Unnecessary verbosity.
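A quick check of the point: by the time the guard runs, x already holds NaN, so it must be a float array, and np.full_like inherits that dtype without the explicit parameter.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])  # any array containing NaN is float already
filled = np.full_like(x, np.nan)  # dtype inferred from x: float64
# dtype=float would only matter if x could be an integer array,
# which it cannot be here
assert filled.dtype == np.float64 and np.isnan(filled).all()
```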
pynumdiff/polynomial_fit.py
Outdated
# Filter out NaN values so polyfit doesn't fail on missing data
mask = ~np.isnan(x)
if not np.any(mask):  # all NaN window — return NaNs
Can you put the line below on this same line after the if:? I know it's nonstandard style, but it better matches my style through this codebase.
Done. It is a one-liner guard clause — keeping it on one line makes the "early exit on degenerate input" intent obvious without taking up visual space.
pynumdiff/polynomial_fit.py
Outdated
mask = ~np.isnan(x)
if not np.any(mask):  # all NaN window — return NaNs
    return np.full_like(x, np.nan, dtype=float), np.full_like(x, np.nan, dtype=float)
w = weights[mask] if weights is not None else None
Can you put this inline in the np.polyfit(w=) call? Again, I know that's nonstandard style, perhaps, but since we only use the masked weights in the one place, I prefer not to define a named variable for it.
Inlined. Agreed — it only appears in one place so naming it w just adds a line of indirection for no gain.
Addressed all review feedback in commit 5201024. Summary of decisions:

- Rename dt_window -> dt_or_t_window in slide_function for clarity
- Inline the if-body and remove dtype=float in the _polydiff NaN guard
- Inline weights[mask] directly into the np.polyfit call
- Replace the standalone test_polydiff_nan_inputs/test_polydiff_variable_step with a polydiff_irreg_step entry in the main parametrized test (covers all test functions, matches codebase style)
- Fix test_peakdet to use its own RandomState(42) so it's robust to test ordering

All 280 tests pass.
Summary
Fixes #173 — makes polydiff work correctly when x contains NaN values (missing data) and/or when time points are nonuniformly spaced.

Changes

pynumdiff/utils/utility.py — slide_function:
- dt → dt_or_t, accepting either a scalar step size or an array of time locations
- when dt_or_t is an array, the actual t[window] slice is passed to func so inner functions see real time values and can handle nonuniform spacing
- a scalar dt is passed through unchanged, so lineardiff and all other callers are unaffected

pynumdiff/polynomial_fit.py — polydiff/_polydiff:
- dt → dt_or_t to match
- builds t from a scalar dt or uses the passed array directly
- filters NaN values from x before calling np.polyfit, so missing data no longer causes all-NaN output

pynumdiff/tests/test_diff_methods.py:
- test_polydiff_nan_inputs: verifies NaN gaps produce finite output and accurate values outside the gap
- test_polydiff_variable_step: verifies an irregular t array gives accurate derivatives, and beats the wrong-uniform-dt baseline

pynumdiff/tests/test_utils.py:
- extends test_slide_function to cover the array-t code path
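Putting the pieces of the summary together, the overall behavior can be sketched end to end. This is a toy stand-in, not the pynumdiff implementation (which uses weighted overlap accumulation and different parameter names): it accepts a scalar dt or an array t, tolerates NaNs, fits a polynomial per index-space window, and averages overlapping estimates.

```python
import numpy as np

def polydiff_sketch(x, dt_or_t, degree=2, window=7):
    """Toy end-to-end sketch of the PR's behavior: scalar-or-array time,
    NaN-tolerant per-window polynomial fits, averaged overlaps."""
    t = np.arange(len(x)) * dt_or_t if np.isscalar(dt_or_t) else np.asarray(dt_or_t)
    dxdt = np.zeros(len(x))
    counts = np.zeros(len(x))
    for i in range(len(x) - window + 1):
        w = slice(i, i + window)
        m = ~np.isnan(x[w])
        if m.sum() <= degree:  # too few valid points for a stable fit
            continue
        coeffs = np.polyfit(t[w][m], x[w][m], degree)  # fit against real t
        dxdt[w] += np.polyval(np.polyder(coeffs), t[w])
        counts[w] += 1
    return dxdt / np.maximum(counts, 1)
```

On an irregular grid with a NaN gap, a quadratic signal comes back with its derivative recovered everywhere, including at the missing sample, which is the fixed behavior issue #173 asked for.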