Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix online ewma with CoW #55735

Merged
merged 1 commit into from
Oct 27, 2023
Merged

Conversation

jorisvandenbossche
Copy link
Member

Fixing some failing tests that turned up in #55732. This PR specifically fixes pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean for Copy-on-Write enabled.

The numba function takes the underlying frame's array of values, and then writes into those values:

weighted_avg[j] = cur[j]

(this weighted_avg is a view in the passed values, from weighted_avg = values[0])

That currently doesn't cause a bug, because before getting the values from the frame and passing it to the numba func, we cast to float64:

np_array = self._selected_obj.astype(np.float64).to_numpy()

This astype() call currently always returns a copy by default (even if you already have all float64 columns), but with CoW enabled, this avoids making a copy. Currently, with CoW enabled, the numba func fails because of the numpy array being marked as read-only. But if we wouldn't manually copy the data and just ignore this read-only flag, we would be mutating the calling dataframe (so marking as read-only prevented a new bug here!).

@jorisvandenbossche jorisvandenbossche added Bug Window rolling, ewma, expanding Copy / view semantics numba numba-accelerated operations labels Oct 27, 2023
@@ -52,7 +52,7 @@ def online_ewma(
exponentially weighted mean accounting minimum periods.
"""
result = np.empty(values.shape)
weighted_avg = values[0]
weighted_avg = values[0].copy()
Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the copy here inside the function, instead of copying the full array before passing it to the function, since here we only need to copy the first row of the array.

This copy ensures we don't mutate the passed values, and then this numba func will work fine on read-only input.

@mroeschke mroeschke added this to the 2.2 milestone Oct 27, 2023
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this definitely looks like safer usage.

@mroeschke mroeschke merged commit 984d755 into pandas-dev:main Oct 27, 2023
46 of 47 checks passed
@mroeschke
Copy link
Member

Thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the cow-numba branch October 27, 2023 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Copy / view semantics numba numba-accelerated operations Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants