Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC/API: limit argument to interpolate is somewhat non-intuitive #9218

Closed
shoyer opened this issue Jan 9, 2015 · 10 comments · Fixed by #10691
Closed

DOC/API: limit argument to interpolate is somewhat non-intuitive #9218

shoyer opened this issue Jan 9, 2015 · 10 comments · Fixed by #10691
Labels
API Design Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Milestone

Comments

@shoyer
Copy link
Member

shoyer commented Jan 9, 2015

The current documentation for the limit parameter looks like this:

limit : int, default None.
    Maximum number of consecutive NaNs to fill.

Several features were not immediately obvious to me and could probably be resolved by slightly better docstring or API design:

  1. What happens where the max number of consecutive NaNs is reached? pandas simply stops replacing values, e.g., if there is a gap of 3 values and limit=2, pandas replaces the first 2 values. The alternative would be to entirely ignore gaps of more than 2 values. I'm not saying this would be a better alternative, but we should clarify this.

  2. The limit refers to forward filling only, even though interpolation is not inherently directional. It would be nice if there was some way to trigger a limit for back filling at the same time, e.g., with forward_limit and backward_limit arguments:

    >>> pd.Series([1, np.nan, np.nan, np.nan, 5]).interpolate(forward_limit=1, backward_limit=1)
    0     1
    1     2
    2   NaN
    3     4
    4     5
    dtype: float64
    

Thoughts?

@jreback
Copy link
Contributor

jreback commented Jan 9, 2015

@shoyer this is very well defined when reindexing (e.g. for your first point, you simply don't fill forward/backward anymore). So should be the same for interpolation (and btw, you can require that a method be provided, e.g. forward/backward when using limit, no need to add another kw. This should only be uni-directional, way to complicated otherwise.

@shoyer
Copy link
Member Author

shoyer commented Jan 9, 2015

@jreback I agree this is well defined for reindex. But interpolate makes no reference to backward or forward filling, regardless of the method (e.g., consider method='linear', the default). Hence the confusion.

@jreback
Copy link
Contributor

jreback commented Jan 9, 2015

ahh that's right

but I think the direction of interpolation is by definition forward always right? (as these are mostly scipy routines) ?

@shoyer
Copy link
Member Author

shoyer commented Jan 9, 2015

Take a look at the scipy routines -- they really make no reference to forwards or backwards. They just interpolate between points. As you can see here, we handle the limit argument in pandas itself: https://github.com/pydata/pandas/blob/e2b014ce6791dc1907b9ed4dd3cd23cacea28af3/pandas/core/common.py#L1538

@jreback
Copy link
Contributor

jreback commented Jan 9, 2015

no what I mean is interpolation is by definition forward ATM
as their is no way to specify it
I mean u can just flip the array of backward and flip after done

can think about a direction kw

@TomAugspurger
Copy link
Contributor

@shoyer correct, we do all the limit handling.

The alternative would be to entirely ignore gaps of more than 2 values

I think I remember a SO post asking how to do this actually...

The limit refers to forward filling only, even though interpolation is not inherently directional

I guess it depends on the context. You might not want to fill backward for some time series. But that's up to the user to decide, so having the option would be nice.

As far as implementation goes, it shouldn't be too difficult, just rework the limit logic a bit. I wrote that originally, but I'm not sure I'll have time to get to this soon 😞

@lmjohns3
Copy link

I just tried adding a little bit of code to address this issue!

I also updated the test code that seemed to be aimed at interpolation limits.

I've never contributed to Pandas before so I'm not sure what the procedure usually is. I'd be happy if there was some discussion to make sure this change makes sense. I'm also unsure where to update documentation for the change.

What should I do next?

@TomAugspurger
Copy link
Contributor

Contribution guidelines are here

And there's some more information in the wiki

Basically, you'll want to fork this repository and open up a pull request.

@lmjohns3
Copy link

Ok, thanks!

I suppose the discussion about expected behavior, etc. will take place on the PR.

@lukecyca
Copy link

The alternative would be to entirely ignore gaps of more than 2 values

I think I remember a SO post asking how to do this actually...

For posterity, here is the SO post that discusses this:

https://stackoverflow.com/questions/30533021/interpolate-or-extrapolate-only-small-gaps-in-pandas-dataframe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants