
Thoughts on _jacobi.py #1

Closed (wants to merge 1 commit)

Conversation

@mdhaber (Owner) commented Jun 23, 2023

No description provided.

@mdhaber (Owner, Author) left a comment

@HDembinski Last year, you opened scipy/scipy#17059. I'd be interested in moving something like that forward, although I think I'd like to work it into the framework of mdhaber/scipy#106. Here are a few thoughts I had about _jacobi.py.

If you don't have time to pursue this right now or are no longer interested, that's fine, but if you have a moment, I was hoping you could answer two questions about lines 157/158 (the even powers of h used in np.polyfit) and line 164 (no appearance of Student's t distribution). Images are from D'Errico's write-up. Thanks for your help!

src/jacobi/_jacobi.py
maxiter : int, optional
    Maximum number of iterations of the algorithm.
maxgrad : int, optional
    Maximum grad of the extrapolation polynomial.
@mdhaber (Owner, Author)

Consider expressing this in terms of the "degree" of the polynomial. Perhaps "grad" refers to the highest derivative that is nonzero?

@HDembinski Jun 26, 2023

Yes, that should be "degree". I don't know why I called it maxgrad; usually I try to follow established names used in established libraries. That may be a bit of German-language influence: in German, "degree" is "Grad".

@mdhaber (Owner, Author)

Ah, makes sense.

return (-3 * f0 + 4 * f1 - f2) * (0.5 / h)


def _first(method, f0, f, x, i, h, args):
@mdhaber (Owner, Author)

It looks like the purpose of this function is to detect whether NaNs are encountered and, if so, choose the appropriate one-sided formula. If this were to be added to SciPy, I'd suggest that this function just determine which formula should be used, then call _derive (_differentiate?), possibly passing the values at which f has already been evaluated (to eliminate the duplicate calculation). The purpose is to avoid repeating the formulas.
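Roughly what I have in mind; this is only a sketch with hypothetical names and simplified signatures (the real _first and _derive in _jacobi.py take more arguments):

import numpy as np

# Sketch: _first only detects NaNs and picks the method; the difference
# formulas live in one shared routine (a hypothetical _estimate here),
# so they are not repeated.
def _estimate(method, f0, f, x, h):
    if method == 0:                      # central difference
        return (f(x + h) - f(x - h)) * (0.5 / h)
    h = h if method == 1 else -h         # forward (+h) or backward (-h)
    f1, f2 = f(x + h), f(x + 2 * h)
    return (-3 * f0 + 4 * f1 - f2) * (0.5 / h)

def _first(method, f0, f, x, h):
    if method is None:
        fp, fm = f(x + h), f(x - h)      # probe both sides once
        if np.any(np.isnan(fm)):
            method = 1                   # backward side failed -> forward
        elif np.any(np.isnan(fp)):
            method = -1                  # forward side failed -> backward
        else:
            method = 0                   # central differences
        # fp and fm could be passed to _estimate to avoid re-evaluating f
    return method, _estimate(method, f0, f, x, h)

def f(t):
    return np.sqrt(t) if t >= 0 else np.nan   # undefined left of 0

method, _ = _first(None, f(0.0), f, 0.0, 1e-3)
print(method)   # 1: the forward formula is selected at the boundary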

@HDembinski

I agree that sounds like a cleaner design, but I remember I chose this design to avoid a duplicate calculation.

@mdhaber (Owner, Author)

I figured. I'd just suggest that there are other ways to avoid the duplicate calculation. I will probably work on that in my PR and then I'll have something more concrete to say.

else:
    method = 0
if method == 0:
    return method, None, (f1 - f2) * norm
@mdhaber (Owner, Author)

Suggested change:
- return method, None, (f1 - f2) * norm
+ return method, None, (f2 - f1) * norm

I think this produces the negative of the derivative estimate as-is.

@HDembinski Jun 26, 2023

No, this should be correct. I always compute f1 - f2 and h = (x1 - x2) * factor consistently everywhere, so h and norm should also be negative for x1 < x2. I hope that one of the unit tests breaks if you make that change.

@mdhaber (Owner, Author) Jun 26, 2023

I'd suggest taking a look; there are only a few lines to follow, starting at line 121 and then jumping into this function. I noticed the numerical effect before finding the cause here: the derivative estimate from _first changes sign depending on whether it's doing central or one-sided differences.

I haven't run the tests because I didn't install this properly (just copy-pasted), but I wouldn't be surprised if they didn't catch it. (Unless you have tests that confirm that polynomials only require the expected number of iterations.) It only affects the first estimate because every other iteration uses _derive. It could affect the number of iterations, but has little effect on the final result.

src/jacobi/_jacobi.py
start = i - (grad + 1)
stop = i + 1
q, c = np.polyfit(
    h[start:stop] ** 2, fd[start:], grad, rcond=None, cov=True
)
@mdhaber (Owner, Author)

Re: [screenshot from D'Errico's write-up on fitting only even powers of h]

I can see why the odd power terms are not included when the derivative is computed with central differences. Is that still true when the derivative is estimated with one-sided differences?

@HDembinski Jun 26, 2023

That's a good point. I think you are right and this should only be done for the central derivatives. Forward and backward derivatives should use a polynomial in h instead of h^2.

This may explain why the performance of the forward/backward derivatives was always worse than the central derivative. I don't see a fundamental reason for that in this scheme.

@mdhaber (Owner, Author)

Well, I'd think that they will require more iterations because you have twice as many error terms to cancel to get to a given order. But yes, I think that adjusting this will help.

@HDembinski

Yes, but on the other hand in each iteration you do half as many calculations, assuming that f0 is computed only once in the beginning. It should roughly cancel out.

@mdhaber (Owner, Author) Jun 26, 2023

Ah, theoretically that is possible, yes. I hadn't noticed that when h is reduced, two of the three one-sided points can be the same as in the previous iteration, so you only need to evaluate the function once per iteration with one-sided differences. Nice.

I don't think the code currently takes advantage of that, though; it performs two function evaluations in _derive whether using a one-sided or two-sided stencil. This would work if you use f(x0), f(x0 + h), f(x0 + h/factor) as the evaluation points. For example, with factor=0.5, you could use f(x0), f(x0 + h), f(x0 + 2*h) so that you only need to evaluate the function at one new point per step size.

Currently, though, you use f(x0), f(x0 + h), and f(x0 + 2h) with a default factor of 1/4. You would need to adjust the weights based on the factor if that is user-adjustable.
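To make the reuse bookkeeping concrete, here's a tiny standalone check (not taken from the package) counting how many new evaluation points the one-sided stencil {x0, x0 + h, x0 + 2h} needs per iteration for step factors 1/2 and 1/4:

x0, h0 = 1.0, 0.1

def new_points_per_iteration(factor, iters=4):
    seen = {x0}
    counts = []
    h = h0
    for _ in range(iters):
        points = {x0, x0 + h, x0 + 2 * h}
        counts.append(len(points - seen))   # evaluations not done before
        seen |= points
        h *= factor
    return counts

print(new_points_per_iteration(0.5))   # [2, 1, 1, 1]
print(new_points_per_iteration(0.25))  # [2, 2, 2, 2]

With factor 1/2, the new x0 + h coincides with the previous iteration's x0 + 2h, so only one fresh evaluation is needed per step size.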

@HDembinski

Ah, I forgot about that. I have to check, but then the h^2 powers in the polynomial fit may be correct for the forward/backward formula, since it is second order like the central difference formula. At least that was the idea I had there.

@mdhaber (Owner, Author) Jun 28, 2023

No, I don't think the order of the formula you use controls the power on h. It's a matter of cancelling out error terms. With central difference, there are only even-powered error terms, so you can use h**2. But for one-sided difference, all powers of h remain and need to be cancelled.

When these things are adjusted, each new function evaluation will reduce the order of the error by one whether you are using one-sided or central difference.
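For reference, the expansions behind this, using the one-sided formula from the code above:

$\frac{f(x+h)-f(x-h)}{2h} = f'(x) + \frac{h^2}{6}f'''(x) + \frac{h^4}{120}f^{(5)}(x) + \cdots$

$\frac{-3f(x)+4f(x+h)-f(x+2h)}{2h} = f'(x) - \frac{h^2}{3}f'''(x) - \frac{h^3}{4}f^{(4)}(x) - \cdots$

The central estimate is a series in $h^2$, while the one-sided estimate contains every power of $h$ from $h^2$ on, so extrapolating the one-sided estimates cancels one power of $h$ at a time, while extrapolating the central estimates cancels one power of $h^2$ at a time.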

@mdhaber (Owner, Author) Jun 29, 2023

I'll be doing the math on this today.

On one hand, mentally if I do Richardson extrapolation on two second order approximations that have both odd and even order error terms, I can only hope to solve for coefficients that satisfy two criteria: the first derivative itself remains, and the second order error term is cancelled. Symbolically, this would appear to leave the third order error terms if I know nothing about the coefficients.

On the other hand, I know that if I have five function evaluations, I can solve for coefficients that will give me a fourth order approximation of the derivative.

So there is definitely a way to use the n+1 function evaluations to get an nth order approximation. I don't think you can get that in general by extrapolating from n/2 second order approximations because in forming the second order approximations, you've locked some degrees of freedom you could have used to cancel out extra error terms.

For instance, suppose you have evaluated $f(x)$, $f(x+h/4)$, $f(x+h/2)$, $f(x+h)$, and $f(x+2h)$. (This is what you're currently doing with a step factor of 1/4). You can get a 4th order formula by choosing five weights $c_i$.

$c_0 f(x) + c_1 f(x+h/4) + c_2 f(x+h/2) + c_3 f(x+h) + c_4 f(x+2h) = hf'(x) + h\,O(h^4)$

to satisfy five constraints: canceling out all the terms of the Taylor series expansion through $f^{(4)}(x)$ except the one with $f'(x)$.

Instead, you choose three weights $d_i$ to get a second-order approximation:

$d_0 f(x) + d_1 f(x+h) + d_2 f(x+2h) = hf'(x) + h\,O(h^2)$

and you get another second-order approximation like:

$d_0 f(x) + d_1 f(x+h/4) + d_2 f(x+h/2) = \frac{h}{4}f'(x) + \frac{h}{4}O(h^2)$

You get to choose two more coefficients, $w_1$ and $w_2$, to weight your two lower-order approximations to get a higher-order approximation, so you can only satisfy two new constraints. You have to preserve the first derivative, so you can only cancel one higher-order term, leaving you with a third-order approximation. Only if you're lucky would you happen to cancel an extra term and get a fourth-order approximation, and I don't think you're lucky with this stencil and step factor. We could probably choose the step factor to allow for the cancellation of the extra error term, but then there's no guarantee that two steps of the same ratio will let you get a sixth-order approximation and so on.
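Here's a quick numerical check of that claim; it's a sketch with plain numpy, not anything from _jacobi.py. It solves the Taylor-matching conditions for five weights on the $\{0, h/4, h/2, h, 2h\}$ stencil and confirms the error of the resulting one-sided formula scales like $h^4$:

import numpy as np

# offsets of the evaluation points relative to x, in units of h
t = np.array([0.0, 0.25, 0.5, 1.0, 2.0])

# Taylor-matching conditions: sum_i c_i * t_i**k = (1 if k == 1 else 0)
# for k = 0..4, so that sum_i c_i f(x + t_i h) = h f'(x) + O(h**5).
A = np.vander(t, increasing=True).T   # A[k, i] = t_i**k
b = np.zeros(len(t))
b[1] = 1.0
c = np.linalg.solve(A, b)

# check on f = exp at x = 1: halving h should shrink the error about 16x
f, x = np.exp, 1.0
for h in (1e-1, 5e-2, 2.5e-2):
    approx = c @ f(x + t * h) / h
    print(h, abs(approx - f(x)))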

@mdhaber (Owner, Author) Jun 29, 2023

OK, here's something a bit more concrete. I modified the Jacobi code slightly, and here's some code that allows us to plot the error as a function of the initial step size while we play with the initial step size and maxgrad (while maxiter stays fixed):

import numpy as np
import matplotlib.pyplot as plt
from jacobi import jacobi

f = np.exp
df = np.exp
x = 1
df_ref = df(x)
maxiter = 5
maxgrad = 0

hs = np.logspace(np.log10(1), np.log10(5), 30)
err_central = []
err_oneside = []
for h in hs:
    step = (h, 0.25)

    df_approx, _ = jacobi(f, x, maxiter=maxiter, step=step, diagonal=True,
                          method=0, maxgrad=maxgrad)
    err_central.append(abs(df_approx - df_ref))

    df_approx, _ = jacobi(f, x, maxiter=maxiter, step=step, diagonal=True,
                          method=1, maxgrad=maxgrad)
    err_oneside.append(abs(df_approx - df_ref))

err_central = np.array(err_central)
err_oneside = np.array(err_oneside)

order_central = np.polyfit(np.log10(hs), np.log10(err_central), 1)[0]
order_oneside = np.polyfit(np.log10(hs), np.log10(err_oneside), 1)[0]
print(f'Central difference convergence order: {order_central}')
print(f'One-sided difference convergence order: {order_oneside}')

plt.loglog(hs, err_central, label=f'Central difference (order: {order_central})')
plt.loglog(hs, err_oneside, label=f'One-sided difference (order: {order_oneside})')
plt.legend()
plt.xlabel('Initial step size')
plt.ylabel('Error')
plt.show()

When maxgrad=0, we would expect both central-difference and one-sided difference to give us 2nd-order convergence.

[plot: error vs. initial step size, maxgrad=0]

Great! When maxgrad=1, we would hope that both central-difference and one-sided difference result in a fourth-order estimate, but:
[plot: error vs. initial step size, maxgrad=1]

maxgrad=2: [plot of error vs. initial step size]

maxgrad=3: [plot of error vs. initial step size]

If you adjust np.polyfit to look for a fit in h for the one-sided method instead of h**2, you can start to see convergence of order maxgrad + 1. I think this makes sense, because maxgrad is the degree of the polynomial, and that would correspond with the highest order error term you can cancel out. (For central difference, you get order 2*(maxgrad+1), which also makes sense for the same reason.)
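For anyone who wants to see the effect in isolation, here's a self-contained toy version of the extrapolation step (plain forward differences plus np.polyfit; it is not the actual _jacobi.py code):

import numpy as np

f, df, x = np.exp, np.exp, 1.0
hs = 0.1 * 0.5 ** np.arange(4)                        # shrinking step sizes
fd = np.array([(f(x + h) - f(x)) / h for h in hs])    # forward differences

# extrapolate the estimates to zero step size with a degree-3 polynomial,
# once as a function of h**2 (as for central differences) and once of h
for label, arg in [('fit in h**2', hs**2), ('fit in h', hs)]:
    p = np.polyfit(arg, fd, deg=len(hs) - 1)
    print(label, abs(np.polyval(p, 0.0) - df(x)))

The fit in h should land several orders of magnitude closer to the true derivative, consistent with the maxgrad + 1 behavior described above.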

@HDembinski

Very cool, thanks for investigating and fixing this.

Comment on lines +161 to +164
# pulls have roughly unit variance, however,
# the pull distribution is not gaussian and looks
# more like student's t
rei = c[-1, -1] ** 0.5
@mdhaber (Owner, Author)

It seems like you're using this idea:

[screenshot from D'Errico's write-up describing the Student's-t-based confidence interval]

so I was expecting the use of Student's t distribution, but I don't see it. Is the idea that you're just reporting the standard error instead of the size of a confidence interval?

@HDembinski Jun 26, 2023

I am not following the original recipe because the Student's t distribution is not exact for this case: it assumes Gaussian fluctuations for the residuals. The residuals here come from both round-off error and truncation error, which are not Gaussian. Their probability distribution is not well-defined.

In light of this ambiguity, I chose an empirical approach and tried out several examples. I could not find an estimate for the error which is generally satisfactory, so I chose the simplest possible calculation (which is also the fastest) and just return the standard error reported by the fit. The error is more an order-of-magnitude estimate than an interval with a well-defined statistical coverage of X%.
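For reference (a gloss on what this means in terms of the code shown above, not part of the original comment): the derivative estimate is the constant coefficient of the polynomial fit, i.e. the fitted polynomial evaluated at zero step size, and rei = c[-1, -1] ** 0.5 is the standard error that np.polyfit's covariance matrix assigns to that coefficient. A confidence interval in the spirit of the write-up would instead be roughly $\hat{f}'(x) \pm t_{1-\alpha/2,\,\nu} \cdot \mathtt{rei}$, with $\nu$ presumably the number of fitted step sizes minus $(\mathrm{grad}+1)$; that extra step is what relies on the Gaussian-residual assumption being avoided here.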

@mdhaber (Owner, Author) Jun 26, 2023

OK. Yeah, I thought the multivariate t idea might have been practical, but I also prefer not to abuse statistics. Just checking.

@mdhaber changed the title from "Update _jacobi.py" to "Thoughts on _jacobi.py" on Jun 23, 2023
@HDembinski commented Jun 26, 2023

@mdhaber Sorry for the silence. I still would very much like to move this to SciPy. I can only work on this in my free time, and I am currently handling another private project. If someone (maybe you) is interested in helping me with this, I would be happy to work with them; that would be a way to move this forward.

@mdhaber (Owner, Author) commented Jun 26, 2023

Great. Yes, I can work on it with you. How about this.

I want/need derivatives of scalar-valued functions soon for my work on scipy.stats, so I will finish up mdhaber/scipy#106 and then open a PR on the main SciPy repo. It's going to be a private function for now, so we don't need to worry about backwards compatibility if we want to change things later.

Then, as time permits, we can work on incorporating some of the features you have here. For instance, we can compare the merits of the np.polyfit approach to what I have and switch, if desired. And of course, we would want to add support for multivariate, vector-valued functions for it to be more broadly useful.

I know you already have these features here, so the other approach would be to mostly copy what you have. However, I have a pretty different initial set of requirements, so it made sense for me to start fresh. The other benefit is that by working on my own version, I get more familiarity with the subject. After that, we'll both have experience, and then we can combine the best of both to put together a public function. It will be good to finally have numerical differentiation in SciPy - it's been discussed for decades!

@HDembinski commented Jun 26, 2023

Sounds like a good plan; I am happy to work with you on this. I am also OK with your idea of starting again from scratch.

I had a quick look at mdhaber/scipy#106 and I saw that you calculate the weights for different orders (I think). I am convinced that with the DERIVEST approach, it makes no sense to calculate anything that's higher order than the central derivative, since you get the benefits of that effectively from the polynomial extrapolation.

@mdhaber (Owner, Author) commented Jun 26, 2023

> I saw that you calculate the weights for different orders (I think)

Yes.

> I am convinced that with the DERIVEST approach, it makes no sense to calculate anything that's higher order than the central derivative

Maybe. You might still need/want to calculate weights for a one-sided stencil to save function evaluations; see #1 (comment).

> since you get the benefits of that effectively from the polynomial extrapolation.

I didn't use polyfit, though. A call to solve is faster than a call to polyfit, and it only has to be done once per import (assuming the same sequence of step sizes) because it's cached. So after that is done, this really replaces a call to polyfit with a dot product (@). Considering that this is for use on an elementwise function and we can reuse the weights, it's bound to be faster for the intended use case.
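Roughly the shape of what I mean, as a sketch (this is not the code in mdhaber/scipy#106): solve for the weights once per offset sequence, cache the result, and apply it with a dot product on each call.

from functools import lru_cache
import numpy as np

@lru_cache
def _fd_weights(offsets):
    # same Taylor-matching system as in the earlier sketch:
    # sum_i w_i * t_i**k = (1 if k == 1 else 0)
    t = np.array(offsets)
    A = np.vander(t, increasing=True).T
    b = np.zeros(len(t))
    b[1] = 1.0
    return np.linalg.solve(A, b)

def derivative(f, x, h, offsets=(0.0, 0.25, 0.5, 1.0, 2.0)):
    w = _fd_weights(offsets)                      # solved once, then cached
    return w @ f(x + h * np.array(offsets)) / h   # a dot product per call

print(abs(derivative(np.exp, 1.0, 1e-2) - np.exp(1.0)))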

IIUC, if you were using a polynomial of exactly the right degree, these approaches must be equivalent if they are both producing approximations of the same order. The downside of exactly matching the degree to the number of points is noise sensitivity. But again,

> we can compare the merits of the np.polyfit approach to what I have and switch, if desired

The nice thing about having both versions is that we can compare the performance differences! One set of benchmark functions is the CDFs of all of SciPy's distributions because we can compare the results against the PDFs. Do you know of any other existing benchmark set?

Update: I used the SciPy distribution CDFs as a benchmark

Code:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from scipy.optimize import _zeros_py as zeros
from time import perf_counter
from jacobi import jacobi

times1 = []
times2 = []
acc1 = []
acc2 = []
err_ratio1 = []
err_ratio2 = []
nits = []
rtol = np.sqrt(np.finfo(np.float64).eps)

for case in stats._distr_params.distcont:
    distname, params = case
    print(distname)
    dist = getattr(stats, distname)(*params)
    x = dist.median() + 0.1
    # x = dist.ppf(rng.random(size=100))
    ref = dist.pdf(x)

    tic = perf_counter()
    res = zeros._differentiate(dist.cdf, x, rtol=rtol)
    toc = perf_counter()

    nits.append(res.nit)

    true_error = abs(res.df-ref)/ref
    times1.append(toc-tic)
    acc1.append(true_error)
    err_ratio1.append(res.error/true_error)

    tic = perf_counter()
    res, error = jacobi(dist.cdf, x, diagonal=True, rtol=rtol)
    toc = perf_counter()

    true_error = abs(res-ref)/ref
    times2.append(toc-tic)
    acc2.append(true_error)
    err_ratio2.append(error/true_error)

plt.hist(np.log10(times1), bins=np.linspace(-4, 0, 25), label='differentiate', alpha=0.5)
plt.hist(np.log10(times2), bins=np.linspace(-4, 0, 25), label='jacobi', alpha=0.5)
plt.legend()
plt.title('log10 of execution time (s)')
plt.show()

# acc1 = np.asarray(acc1)
# acc2 = np.asarray(acc2)
# acc1[acc1 == 0] = 1e-16
# acc2[acc2 == 0] = 1e-16
# plt.hist(np.log10(acc1), label='differentiate', alpha=0.5, bins=np.linspace(-16, -2, 25))
# plt.hist(np.log10(acc2), label='jacobi', alpha=0.5, bins=np.linspace(-16, -2, 25))
# plt.legend()
# plt.title('log10 of true error')
# plt.show()

# err_ratio1 = np.asarray(err_ratio1)
# err_ratio2 = np.asarray(err_ratio2)
#
# err_ratio1 = err_ratio1[np.isfinite(err_ratio1)]
# err_ratio2 = err_ratio2[np.isfinite(err_ratio2)]
# plt.hist(np.log10(err_ratio1), label='differentiate', alpha=0.5, bins=np.linspace(-5, 5, 25))
# plt.hist(np.log10(err_ratio2), label='jacobi', alpha=0.5, bins=np.linspace(-5, 5, 25))
# plt.legend()
# plt.title('log10 of error ratio')
# plt.show()

Results:

With relative tolerance set at the square root of floating point epsilon (~1e-8), and taking the derivative of a scalar-valued function at a single (scalar) point:

log10 of the execution time (s): [histogram]

log10 of the actual relative error (relative to the true derivative): [histogram]

log10 of the ratio of error estimate to true error (positive means overestimate): [histogram]

Same but with relative error tolerance at 1e-12: [the same three histograms]

Same but taking the derivative at 100 points in a single vectorized call (for jacobi, this means diagonal=True): [histogram of execution times]

Not too much difference really! _differentiate is a tad faster, but it overestimates the error at the loose rtol. Both get about the same accuracy at a given rtol setting, and both have pretty accurate error estimates for tighter rtol.
