Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/CLN: Decouple Series/DataFrame.transform #35964

Merged
merged 13 commits into from
Sep 12, 2020

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Aug 28, 2020

First step toward #35725. Currently transform just calls aggregate, and so if we are to forbid aggregate from transforming, these need to be decoupled. Other than the bugfix (#35811), the only other behavioral change is in the error messages.

Assuming the bugfix #35811 is the correct behavior, docs/whatsnew also needs to be updated.

I wasn't sure if tests should be marked with #35725 or perhaps this PR #. Any guidance here?

@rhshadrach rhshadrach added Apply Apply, Aggregate, Transform Bug Clean labels Aug 28, 2020
@@ -222,7 +223,7 @@ def test_transform(self, string_series):
expected.columns = ["sqrt"]
tm.assert_frame_equal(result, expected)

result = string_series.transform([np.sqrt])
result = string_series.apply([np.sqrt])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line transform([np.sqrt]) occurs 5 lines up. I believe the original intention was to test apply here.

@jreback
Copy link
Contributor

jreback commented Sep 2, 2020

I wasn't sure if tests should be marked with #35725 or perhaps this PR #. Any guidance here?
either is fine

@jreback jreback added this to the 1.2 milestone Sep 2, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

pandas/core/generic.py Outdated Show resolved Hide resolved
pandas/core/frame.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/tests/frame/apply/test_frame_apply.py Outdated Show resolved Hide resolved
pandas/tests/frame/apply/test_frame_apply.py Outdated Show resolved Hide resolved
@rhshadrach
Copy link
Member Author

@jreback Thanks for the review. Code has been reorganized and ready for another. For the comment about not adding things to the core files - I think the idea behind this that we should be distributing the implementation of methods in e.g. NDFrame into various submodules like aggregate instead of having one monolithic generic.py. Is that correct?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking really good, a few comments


def transform(
obj: FrameOrSeries,
func: Union[str, List, Dict, Callable],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have an alias for this anywhere?

pandas/core/aggregation.py Show resolved Hide resolved
# combine results
if len(results) == 0:
raise ValueError("Transform function failed")
from pandas.core.reshape.concat import concat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this at the top of the function

pandas/core/aggregation.py Show resolved Hide resolved
raise ValueError("transforms cannot produce aggregated results")

return result
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this? or is the doc-string used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just move the doc-string to shared_docs?

expected = f_sqrt
tm.assert_frame_equal(result, expected)

# list-like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would actually split these to separate tests easier to grok


# Determine which columns op will work on
columns = []
for column in df:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm can we explicit list the exluded ones rather than this?

@@ -29,3 +32,22 @@ def _check_mixed_int(df, dtype=None):
assert df.dtypes["C"] == dtypes["C"]
if dtypes.get("D"):
assert df.dtypes["D"] == dtypes["D"]


def zip_frames(frames, axis=1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type args and output

expected = f_sqrt.copy()
tm.assert_series_equal(result, expected)

# list-like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you split this one as well

@rhshadrach rhshadrach mentioned this pull request Sep 5, 2020
5 tasks
…ansform_refactor

� Conflicts:
�	pandas/core/frame.py
�	pandas/tests/frame/apply/test_frame_transform.py
�	pandas/tests/frame/common.py
�	pandas/tests/series/apply/test_series_apply.py
�	pandas/tests/series/apply/test_series_transform.py
@rhshadrach
Copy link
Member Author

@jreback changes made and green. In fixing the one test I found cumcount doesn't exist for DataFrames/Series (which, after thinking a bit, makes sense). This prompted me to add in tests for (almost) all the transformation kernels.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments, but can address in a followon. thanks!

@@ -342,6 +342,7 @@ Other
^^^^^
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
- Bug in :meth:`Series.transform` would give incorrect results or raise when the argument ``func`` was dictionary (:issue:`35811`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to reshaping section (generally we want almost nothing in Other)

assert not is_series
return transform(obj.T, func, 0, *args, **kwargs).T

if isinstance(func, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably use is_list_like here

else:
func = {col: func for col in obj}

if isinstance(func, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably use is_dict_like here

# GH 15931 - deprecation of renaming keys
raise SpecificationError("nested renamer is not supported")

results = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type results

@jreback jreback merged commit ab5b38d into pandas-dev:master Sep 12, 2020
@jreback
Copy link
Contributor

jreback commented Sep 12, 2020

thanks @rhshadrach

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
# GH 35964
s = Series(3 * [object]) # Series that will fail on most transforms
if op in ("backfill", "shift", "pad", "bfill", "ffill"):
pytest.xfail("Transform function works on any datatype")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the issue here that the behavior is wrong or that the test isn't applicable to these cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior is correct, the test is not applicable. This test is about behavior when transform fails and these ops are expected to never fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform Bug Clean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/QST: Series.transform with a dictionary
3 participants