Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: fix interpolate with kwarg limit area and limit direction using pad or bfill #31048

Conversation

cchwala
Copy link
Contributor

@cchwala cchwala commented Jan 15, 2020

This separates a large part of the code changes from #25141 to keep things comprehendible.

This test currently only test `limit_area`.

For `limit_direction` the implementation should later raise an error,
because `pad` and `bfill` both already define a direction. But let's
now first do the implementation of the `limit_area` for `pad`
and `bfill`.
Since methods `pad` and `bfill` in `blocks.interpolate` end up
using `missing.interpolate_2d` which can not (easily) be extended
to support `limit_area`, I introduce the new function
`missing.interpolate_1d_fill`. It is a modified copy of `interpolate_2d`
but only works for 1d data and uses newly introduced function
`_derive_indices_of_nans_to_preserve`, which is now also used in
`missing.interpolate_1d`. It works the same way as the
1D-interpolation functions which are based on scipy-interpolation which
are applied via np.apply_along_axis.
@pep8speaks
Copy link

pep8speaks commented Jan 15, 2020

Hello @cchwala! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-15 15:40:12 UTC

@WillAyd
Copy link
Member

WillAyd commented Jan 15, 2020

@cchwala so IIUC you are doing a precursor to solve the issue with pad and limit_area and then planning to add max_gap after that gets merged in the original PR right?

@cchwala
Copy link
Contributor Author

cchwala commented Jan 15, 2020

@WillAyd Yes. That is my plan.

I am reusing the code from the PR with max_gap that actually solves #26796. The max_gap PR will be a lot smaller when the changes from this PR are merge.

@cchwala cchwala changed the title FIX: fix interpolate with kward limit area and limit direction usung pad or bfill FIX: fix interpolate with kwarg limit area and limit direction using pad or bfill Jan 15, 2020
* black formatting
* typo
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. If you can annotate any new functions would be helpful. Added what I think should be

@@ -477,6 +496,65 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0):
else:
return [P(x, nu) for nu in der]

def interpolate_1d_fill(
values,
method="pad",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
method="pad",
method: str = "pad",

def interpolate_1d_fill(
values,
method="pad",
axis=0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
axis=0,
axis: Axis = 0,

(import from pandas._typing)

values,
method="pad",
axis=0,
limit=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
limit=None,
limit: Optional[int] = None,

method="pad",
axis=0,
limit=None,
limit_area=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
limit_area=None,
limit_area: Optional[str] = None,

limit=None,
limit_area=None,
fill_value=None,
dtype=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dtype=None,
dtype: Optional[Dtype] = None,

(import from pandas._typing)

axis=0,
limit=None,
limit_area=None,
fill_value=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fill_value=None,
fill_value: Optional[Hashable] = None,

@WillAyd WillAyd added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 15, 2020
@cchwala cchwala requested a review from WillAyd January 15, 2020 22:49
@cchwala
Copy link
Contributor Author

cchwala commented Jan 15, 2020

Should be ready now. I am offline for some hours of sleep, though.

@cchwala
Copy link
Contributor Author

cchwala commented Jan 16, 2020

@WillAyd Any further requests for changes?

pandas/core/generic.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good

pandas/core/missing.py Outdated Show resolved Hide resolved
pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
@@ -6720,6 +6722,28 @@ def interpolate(
"column to a numeric dtype."
)

# Set `limit_direction` depending on `method`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this logic does not simply belong in interpolate_2d?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or call something like clean_fill_method

Copy link
Contributor Author

@cchwala cchwala Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit_direction has no effect in interpolate_2d.

My idea was to detect conflicting user input, e.g. pad with limit_direction='backward' as early as possible in the chain of calls to the many functions that are involved.

But I can move the logic to a function like clean_fill_method, e.g. called set_limit_direction. Should this function belong to generic.py or missing.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interpolate_2d is also called from fillna. only NDFrame.interpolate allows limit_direction so does make sense to validate here.

clean_fill_method checks method. probably want to avoid passing more parameters around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls make a method similr to clean_fill_method then, e.g this logic should living pandas/core/missing.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what to do here. after merge with master, there is similar code at this location, but it is not mine...

@@ -478,6 +500,66 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0):
return [P(x, nu) for nu in der]


def interpolate_1d_fill(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this under interpolate_1d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interpolate_1d_fill is a special case needed by blocks._interpolate_with_fill() to support the limit kwargs with method pad. As you requested in the comment above interpolate_1d_fill is now called from within interpolate_2d which is what blocks._interpolate_with_fill() calls.

You request, moving it to missing.interpolate_1d will, in my opinion, not work, because missing.interpolate_1d is not reached in the call sequence for interpolating with method pad since blocks.Block.interpolate() splits up into blocks.Block._interpolate (for scipy wrappers) and blocks.Block._interpolate_with_fill (for pad methods). missing.interpolate_1d is only called in the former case, i.e. for methods using the scipy wrappers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cchwala interpolate_2d will work on a 1d array. did you investigate applying it along an axis (with masking logic) rather than creating a 1d version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #34749 for alternative implementation calling interpolate_2d instead. interpolate_2d already has the limit logic so no need to use the preserve_nans set based logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the nice solution from #34749 as suggested above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a consequence there is no interpolate_1d_fill anymore

doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
…ments

Test on my local machine are not affected by removing the unncessery
arguments `valid` and `invalid`, which are now derived within the
function.
@WillAyd
Copy link
Member

WillAyd commented Feb 12, 2020

@cchwala can you address comments?

@simonjayhawkins
Copy link
Member

not really sure of the status here and @simonjayhawkins had some cleanups as well, can we clarify.

I did break some bits off this PR. so does need a merge of master to see what's left (I did try this #31048 (comment))

@cchwala
Copy link
Contributor Author

cchwala commented Aug 25, 2020

Okay. I will do the merge with master today to see what is left. I guess, at least the tests will still be of value.

…imit_area_and_limit_direction_with_pad

I removed most of the code that I have added in this branch and will add it back in step by step in the next commits
Tests should be red now
@cchwala
Copy link
Contributor Author

cchwala commented Aug 26, 2020

I am now looking into the suggestion from @simonjayhawkins to use interpolate_2d instead of introducing interpolate_1d_fill to add less clutter to the already cluttered code.

@cchwala
Copy link
Contributor Author

cchwala commented Aug 26, 2020

@jreback can you please have another look? As expected there is not much left from this PR because parts of it have already been merged elsewhere. Most of your code review comments do not apply anymore, only this one.

@cchwala cchwala requested a review from jreback August 26, 2020 22:09
@cchwala
Copy link
Contributor Author

cchwala commented Aug 27, 2020

@jreback Sorry to ping you again, but it would be great if I could finish this PR with (hopefully) final adjustments tomorrow (Friday). Otherwise things would have to wait again another two weeks because I will be offline.

@jreback
Copy link
Contributor

jreback commented Aug 28, 2020

will have to wait as i am away

@cchwala
Copy link
Contributor Author

cchwala commented Aug 28, 2020

Okay. No problem.

limit=limit,
fill_value=fill_value,
dtype=self.dtype,
)

if limit_area is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put this in a function in missing and just call here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. But if I do this I could also directly integrate it into missing.interpolate_2d, e.g. like in this branch. Advantage, no need to introduce yet another function that does some interpolation. Disadvantage, obfuscates missing.interpolate_2d. For this, adding some comments would help for sure.

If you prefer the separate function in missing, what function name would you choose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. like in this branch.

I would merge recursive solution here for review (so that ci has been run against it, i think needs changes to pandas/core/arrays/categorical.py see master...simonjayhawkins:fix_interpolate_limit_area_and_limit_direction_with_pad)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Merged and CI is running.

I have also pushed a branch with an example implementation using a separate function in missing. The name of the new function is not good, but the implementation would probably look something like this.

For both solutions, tests for interpolation pass, but I did not run the full test suite. Will see what changes are required when CI is done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test are green.

If you find the recursive solution appropriate, I would add a comment to the function that briefly explains the reasoning. Anything else that needs improvement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cchwala can you add the type annotations as requested by @WillAyd https://github.com/pandas-dev/pandas/pull/31048/files#r367123208

@simonjayhawkins simonjayhawkins added this to the 1.2 milestone Sep 15, 2020
@simonjayhawkins
Copy link
Member

@cchwala can you resolve merge conflicts and can take another look

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, some comments, can you merge master and we can get this in.

# and `limit_area` logic along a certain axis.
if limit_area is not None:

def func(values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a module level function and just call it here

@jreback jreback removed this from the 1.2 milestone Nov 24, 2020
@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

moving off 1.2 need merge master and update to comments.

@arw2019
Copy link
Member

arw2019 commented Nov 27, 2020

Closing in favor of #38106

@arw2019 arw2019 closed this Nov 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

limit_area and limit_direction do not have an effect when interpolation method is 'pad'
7 participants