Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix exceptions when Series.interpolate's `order` parameter is missing or invalid #25246

Merged
@@ -1115,11 +1115,8 @@ def check_int_bool(self, inplace):
fill_value=fill_value,
coerce=coerce,
downcast=downcast)
# try an interp method
try:
m = missing.clean_interp_method(method, **kwargs)
except ValueError:
m = None
# validate the interp method
m = missing.clean_interp_method(method, **kwargs)

This conversation was marked as resolved by gfyoung

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 9, 2019

Author Contributor

I believe the check if m is not None on the following line (1121) and the subsequent raise on line 1132 are now unnecessary, and can be removed. I'd appreciate any second opinions on that.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 9, 2019

Member

I'm all for removing it, so long as we are sure that we can't get None ever returned for m.

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 9, 2019

Author Contributor

Great. missing.clean_interp_method will not return None; if its method argument is None, it will raise (line 105 of missing.py below):

def clean_interp_method(method, **kwargs):
order = kwargs.get('order')
valid = ['linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear',
'quadratic', 'cubic', 'barycentric', 'polynomial', 'krogh',
'piecewise_polynomial', 'pchip', 'akima', 'spline',
'from_derivatives']
if method in ('spline', 'polynomial') and order is None:
raise ValueError("You must specify the order of the spline or "
"polynomial.")
if method not in valid:
raise ValueError("method must be one of {valid}. Got '{method}' "
"instead.".format(valid=valid, method=method))
return method

if m is not None:
r = check_int_bool(self, inplace)
@@ -293,8 +293,8 @@ def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None,
bounds_error=bounds_error)
new_y = terp(new_x)
elif method == 'spline':
# GH #10633
if not order:
# GH #10633, #24014
if order is None or not (0 < order):
This conversation was marked as resolved by gfyoung

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 9, 2019

Member

not (0 < order) --> order >= 0 ?

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 10, 2019

Author Contributor

Done. I had written it the original way so that it would raise on order=numpy.nan. After the simplification, numpy.nan won't trigger the exception, but I think that's a reasonable balance.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 10, 2019

Member

Hmm...I see. Then let's add a comment about that.

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 10, 2019

Author Contributor

Do you think we should catch the case where order is nan? I can add a comment either way, as you suggest.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 10, 2019

Member

Actually, you can use pandas.isnull, which would simplify this i.e.

if pandas.isnull(order) or order >= 0

Then we wouldn't even need the comment.

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 10, 2019

Author Contributor

Of course, that's a good suggestion. I used isna, which is already imported in the module and would catch a case of None.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 11, 2019

Member

Yep, that works, since isna and isnull are one in the same function.

raise ValueError("order needs to be specified and greater than 0")
terp = interpolate.UnivariateSpline(x, y, k=order, **kwargs)
new_y = terp(new_x)
@@ -1064,19 +1064,32 @@ def test_interp_limit(self):
# GH 9217, make sure limit is an int and greater than 0
methods = ['linear', 'time', 'index', 'values', 'nearest', 'zero',
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
'polynomial', 'spline', 'piecewise_polynomial', None,
'polynomial', 'spline', 'piecewise_polynomial',
This conversation was marked as resolved by gfyoung

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 9, 2019

Author Contributor

The specific value None is now checked in the test case test_interp_invalid_method below.

'from_derivatives', 'pchip', 'akima']
s = pd.Series([1, 2, np.nan, np.nan, 5])
msg = (r"Limit must be greater than 0|"
"time-weighted interpolation only works on Series or"
r" DataFrames with a DatetimeIndex|"
r"invalid method '(polynomial|spline|None)' to interpolate|"
"Limit must be an integer")
r"Limit must be an integer|"
r"You must specify the order of the spline")
for limit in [-1, 0, 1., 2.]:
for method in methods:
with pytest.raises(ValueError, match=msg):
s.interpolate(limit=limit, method=method)

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 11, 2019

Member

While we can, let's do same thing here: break up this test so that we can leverage pytest parameterization.

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 11, 2019

Author Contributor

I agree with you that this test ought to be broken up and simplified.

I will try to do that tonight, but if it proves difficult, I may propose doing it in a separate PR.

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 11, 2019

Author Contributor

Okay, ready for another look.


def test_interp_invalid_method(self):
s = Series([1, 3, np.nan, 12, np.nan, 25])

invalid_methods = [None, 'nonexistent_method']
for method in invalid_methods:
msg = "method must be one of.*\\. Got '{}' instead".format(method)
with pytest.raises(ValueError, match=msg):
s.interpolate(method=method)
# When an invalid method and invalid limit (such as -1) are
# provided, the error message reflects the invalid method.
with pytest.raises(ValueError, match=msg):
s.interpolate(method=method, limit=-1)

def test_interp_limit_forward(self):
s = Series([1, 3, np.nan, np.nan, np.nan, 11])

@@ -1277,7 +1290,7 @@ def test_interp_limit_no_nans(self):
@pytest.mark.parametrize("method", ['polynomial', 'spline'])
def test_no_order(self, method):
s = Series([0, 1, np.nan, 3])
msg = "invalid method '{}' to interpolate".format(method)
msg = "You must specify the order of the spline or polynomial"
with pytest.raises(ValueError, match=msg):
s.interpolate(method=method)

@@ -1315,16 +1328,17 @@ def test_spline_interpolation(self):

@td.skip_if_no_scipy
def test_spline_error(self):
# see gh-10633
# see GH-10633, GH-24014
s = pd.Series(np.arange(10) ** 2)
s[np.random.randint(0, 9, 3)] = np.nan
msg = "invalid method 'spline' to interpolate"
msg = "You must specify the order of the spline or polynomial"
with pytest.raises(ValueError, match=msg):
s.interpolate(method='spline')

msg = "order needs to be specified and greater than 0"
with pytest.raises(ValueError, match=msg):
s.interpolate(method='spline', order=0)
for invalid_order in [-1, -1., 0, 0., np.nan]:
with pytest.raises(ValueError, match=msg):
s.interpolate(method='spline', order=invalid_order)
This conversation was marked as resolved by gfyoung

This comment has been minimized.

Copy link
@gfyoung

gfyoung Feb 9, 2019

Member

Let's separate this code into a separate test. That way, we can also use pytest parameterization.

This comment has been minimized.

Copy link
@nmusolino

nmusolino Feb 10, 2019

Author Contributor

Done.


def test_interp_timedelta64(self):
# GH 6424
ProTip! Use n and p to navigate between commits in a pull request.
You can’t perform that action at this time.