Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Add limit_area to EA ffill/bfill #56616

Merged
merged 10 commits into from
Jan 3, 2024

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Dec 25, 2023

Follow up to #56531.

This is slightly complicated by the EA.fillna deprecation.

  • If EA.fillna is implemented, we raise whenever limit_area is not None
  • If EA._pad_or_backfill is implemented and does not have a limit_area arg, we raise when limit_area is not None

My plan is to introduce a deprecation in 3.1 whenever _pad_or_backfill gets called (regardless of whether limit_area is None) informing EA authors they need to add this argument. Assuming this is a good way to do, I'll open a tracking issue for this.

For dtypes where the corresponding mask is not used after filling (e.g. NDArrayBackedExtensionArray), this takes a shortcut by modifying mask prior to filling. Compared to implementing limit_area after filling, this saves 1 or 2 copies (depending on the EA) and extra computation, at the cost of making mask no longer necessarily represent NA values.

@rhshadrach rhshadrach added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate ExtensionArray Extending pandas with custom dtypes or arrays. labels Dec 25, 2023
@rhshadrach rhshadrach added this to the 2.2 milestone Dec 25, 2023
@rhshadrach rhshadrach marked this pull request as ready for review December 27, 2023 19:55
@@ -313,7 +313,7 @@ Other enhancements
- :meth:`DataFrame.apply` now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
- :meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the ``explode`` method (:issue:`54833`)
- :meth:`ExtensionArray.duplicated` added to allow extension type implementations of the ``duplicated`` method (:issue:`55255`)
- :meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, and :meth:`DataFrame.bfill` have gained the argument ``limit_area`` (:issue:`56492`)
- :meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, and :meth:`DataFrame.bfill` have gained the argument ``limit_area``; 3rd part :class:`.ExtensionArray` authors need to add this argument to the method ``_pad_or_backfill`` (:issue:`56492`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3rd party instead of 3rd part?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this into the ExtensionArray section as well

@@ -1012,6 +1018,12 @@ def _pad_or_backfill(
DeprecationWarning,
stacklevel=find_stack_level(),
)
if limit_area is not None:
raise NotImplementedError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only get here if 3rd party authors didn't implement this themselves, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - this is only hit when EA author overrides .fillna but not ._pad_or_backfill. Currently pandas .ffill will call the EA's .fillna in such a case, which can only be correctly done when limit_area is None.

@rhshadrach
Copy link
Member Author

Friendly ping @phofl @jbrockmendel

@@ -205,7 +210,21 @@ def _pad_or_backfill(
if copy:
npvalues = npvalues.copy()
new_mask = new_mask.copy()
elif limit_area is not None:
mask = mask.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking, but why do we need this copy here?

@phofl phofl merged commit 9e87dc7 into pandas-dev:main Jan 3, 2024
50 checks passed
@phofl
Copy link
Member

phofl commented Jan 3, 2024

thx @rhshadrach

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 3, 2024
mroeschke pushed a commit that referenced this pull request Jan 3, 2024
…fill) (#56720)

Backport PR #56616: BUG: Add limit_area to EA ffill/bfill

Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
@rhshadrach
Copy link
Member Author

Opened #56729 to track the deprecation

@rhshadrach rhshadrach deleted the bug_ffill_limit_area branch January 4, 2024 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Axis behaviour not consistent for df.interpolate for pad with limit_area passed.
2 participants