
DOC: Clarify behavior of interp in presence of duplicated x's (= discontinuities) #20028

Open
anntzer opened this issue Oct 4, 2021 · 7 comments

Comments

@anntzer
Contributor

anntzer commented Oct 4, 2021

Issue with current documentation:

Consider e.g.

np.interp(x, [0, 1, 1, 2], [a, b, c, d])

If x does not contain the value 1, then the interpolation is well defined (it's a discontinuous piecewise linear function, going from a to b on [0, 1] and then from c to d on [1, 2]). AFAICT, np.interp indeed implements this correctly.

However, the docs of interp explicitly state that xp needs to be (strictly) increasing (thus without any duplicated values). It may be worth explicitly stating whether the behavior in the presence of duplicated xp's (i.e., modelling a discontinuous piecewise linear function) is guaranteed, or whether one cannot rely on it.
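A minimal sketch of the case described above, with made-up values a=0, b=10, c=20, d=30 (any distinct values would do):

```python
import numpy as np

# Duplicated xp value at 1 models a jump discontinuity there.
xp = [0, 1, 1, 2]
fp = [0, 10, 20, 30]  # a, b, c, d

print(np.interp(0.5, xp, fp))  # 5.0  -- interpolates a -> b on [0, 1]
print(np.interp(1.5, xp, fp))  # 25.0 -- interpolates c -> d on [1, 2]
print(np.interp(1.0, xp, fp))  # 20.0 -- an exact hit on the duplicated
                               #         value picks the rightmost entry (c)
```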

Idea or request for content:

No response

@bashtage
Contributor

bashtage commented Oct 5, 2021

I think this is a bug. There should be a check that xp is strictly increasing, e.g. np.all(np.diff(xp) > 0), to remove ambiguity.
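The suggested check could live in a user-side wrapper; `interp_checked` below is a hypothetical name, not an existing NumPy function:

```python
import numpy as np

def interp_checked(x, xp, fp):
    """Hypothetical wrapper: validate that xp is strictly increasing
    before delegating to np.interp, which skips the check for speed."""
    xp = np.asarray(xp, dtype=float)
    if not np.all(np.diff(xp) > 0):
        raise ValueError("xp must be strictly increasing")
    return np.interp(x, xp, fp)
```

With this wrapper, the non-monotonic example later in the thread would raise instead of silently returning arbitrary values.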

@bashtage
Contributor

bashtage commented Oct 5, 2021

Missing check here:

if xp.ndim != 1 or fp.ndim != 1:
    raise ValueError("Data points must be 1-D sequences")
if xp.shape[0] != fp.shape[0]:
    raise ValueError("fp and xp are not of the same length")

In fact, it doesn't do any monotonicity check at all.

np.interp([0.5, 1.0, 1.5, 2.0], [0, 1, -1, 2], [5, 10, 7, 20])

is accepted without complaint.

@anntzer
Contributor Author

anntzer commented Oct 5, 2021

That is an intended speed tradeoff (#17860).

@bashtage
Contributor

bashtage commented Oct 5, 2021

Hmm, this seems like a bad default to me, since it places too high an expectation of expertise on the end user. It would have been better if the check could be disabled with a keyword argument, e.g.,

np.interp(...,...,...,check_monotonicity=False)  # Default true

But if a check can't be added, then the docs definitely need to clarify what interp does when xp is not monotonic (including the case where it is not even weakly monotonic).

@bashtage
Contributor

bashtage commented Oct 5, 2021

Looks like it always chooses the rightmost value when searching, once it finds a run of equal xp values. This appears to be because it always uses a >= comparison when moving the endpoints of the bisection search:

/* finally, find index by bisection */
while (imin < imax) {
    const npy_intp imid = imin + ((imax - imin) >> 1);
    if (key >= arr[imid]) {
        imin = imid + 1;
    }
    else {
        imax = imid;
    }
}
return imin - 1;
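A Python transliteration of the C loop above makes the rightmost-match behavior easy to check on a sorted array (the full NumPy routine also uses an initial guess, which this sketch omits):

```python
def find_interval(arr, key):
    """Sketch of the C bisection above: returns the index j such that
    arr[j] <= key < arr[j+1]. Because the comparison is `key >= arr[imid]`,
    an exact hit on a run of equal values lands on its rightmost element."""
    imin, imax = 0, len(arr)
    while imin < imax:
        imid = imin + ((imax - imin) >> 1)
        if key >= arr[imid]:
            imin = imid + 1
        else:
            imax = imid
    return imin - 1

print(find_interval([0, 1, 1, 2], 1.0))  # 2 -- rightmost of the duplicates
print(find_interval([0, 1, 1, 2], 0.5))  # 0
print(find_interval([0, 1, 1, 2], 1.5))  # 2
```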

@seberg
Member

seberg commented Oct 5, 2021

Looks like it always chooses the rightmost value when searching

I don't think so? imax should be able to jump into the middle of a range of equal values (or onto the lower one), and there would be no mechanism to recover.

I suppose when the needle is large(ish), checking for monotonicity would not matter speed-wise, and even if we can't change the default, I guess there may still be a point in offering the option to check.

@bashtage
Contributor

bashtage commented Oct 5, 2021

@seberg You are right. I only thought through the case where xp has an exact match for x. When there is no exact match, you get results all over the place.

import numpy as np

xp = np.random.default_rng(1234).random(100)
xp.sort()
xp = np.tile(xp, (100, 1)).T.ravel()

y = np.array([list(range(100)) for _ in range(100)]).ravel()

# Exact match of x to xp - should be all 99.
np.interp(list(set(xp)), xp, y)
array([99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99., 99.,
       99., 99., 99., 99., 99., 99., 99., 99., 99.])

Inexact match:

np.interp(list(set(xp + .001)), xp, y)

The results seem nearly random, but they are clearly deterministic and just depend on hard-to-guess initial conditions of the bisection.

array([87.96769148, 93.84243592, 85.12541822, 84.67573251, 88.36026352,
       80.31543693, 94.93194401, 83.27246395, 89.3561392 , 94.02486031,
       91.0792067 , 83.9940723 , 92.93339844, 90.10397319, 92.26065234,
       92.82270194, 88.96416697, 91.53563299, 73.80561465, 91.40399054,
       96.80500538, 42.99499853, 94.64887174, 93.15935879, 74.07060325,
       32.64797238, 68.2636088 , 61.83641506, 80.89680615, 73.20359998,
       95.59032797, 91.74899275, 71.82678962, 93.1533579 , 92.0555191 ,
       69.40665182, 99.        , 60.82353685, 52.16686546, 90.16518594,
       16.7068954 , 48.42219743, 93.13043543, 84.91363759, 66.96696608,
       90.97557275, 88.57983828, 89.88841608, 93.74941515, 84.65643975,
       88.11523823, 30.36056718, 95.17903515, 85.8402638 , 81.3488221 ,
       90.16558052, 92.66362915, 26.94405898, 65.21250809, 79.70498582,
       69.95912546, 76.17549066, 91.76644704, 93.67513296, 92.05279913,
       89.58828826, 61.77119305, 89.67882374, 97.00172546, 80.43926469,
       77.91837697, 97.42794708,  6.43397811, 88.18515918, 82.28039367,
       86.41655247, 58.47935471, 97.64947033, 79.75484932, 89.92125392,
       71.68713907, 93.0715536 , 50.81298313, 93.75417528, 96.64065537,
       85.61698634, 59.60334096, 85.02912845, 78.44619974, 93.69784159,
       85.91437588, 85.90517943, 90.28802669, 92.86700456, 91.04774257,
       12.70079911, 82.9704702 , 24.19615079, 96.87864579, 64.21288643])
