FIX-#5112: allows empty partition to be passed into query_compiler.dt_prop_map #5133

Merged (8 commits, merged on Oct 27, 2022)


@billiam-wang (Collaborator) commented Oct 18, 2022

Signed-off-by: Bill Wang <billiam@ponder.io>

What do these changes do?

Adds a special case for when an empty partition is passed into the dt_prop_map function. Previously, calling squeeze on a DataFrame partition with no columns would return the same DataFrame, and the code would then try to access a Series-only property (the dt accessor) on it, which fails. After the fix, a DataFrame partition with no columns simply returns an empty DataFrame.
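A minimal plain-pandas sketch of the failure and of the guard described above. `dt_prop_sketch` is a hypothetical stand-in for the inner `dt_op_builder`, not Modin's actual code, and wrapping the property result with `pandas.DataFrame(...)` on the normal path is an assumption, since the diff excerpt in this PR does not show what the function returns.

```python
import pandas


def dt_prop_sketch(df: pandas.DataFrame, property_name: str) -> pandas.DataFrame:
    """Hypothetical stand-in for dt_op_builder with the empty-partition guard."""
    squeezed_df = df.squeeze(axis=1)
    # squeeze(axis=1) only collapses a column axis of length 1, so a partition
    # with zero columns comes back as a DataFrame, which has no .dt accessor.
    if isinstance(squeezed_df, pandas.DataFrame) and len(squeezed_df.columns) == 0:
        return squeezed_df  # zero columns, so pandas already treats it as empty
    prop_val = getattr(squeezed_df.dt, property_name)
    # Assumption: wrap the resulting Series back into a one-column frame.
    return pandas.DataFrame(prop_val)


# A partition with three rows but no columns.
no_columns = pandas.DataFrame(index=range(3))
print(type(no_columns.squeeze(axis=1)).__name__)  # DataFrame

# Without the guard, the old code path fails on such a partition.
try:
    getattr(no_columns.squeeze(axis=1).dt, "year")
except AttributeError as err:
    print("old behavior:", err)

print(dt_prop_sketch(no_columns, "year").empty)  # True

dates = pandas.DataFrame({"d": pandas.to_datetime(["2022-10-18", "2022-10-27"])})
print(dt_prop_sketch(dates, "day")["d"].tolist())  # [18, 27]
```

Returning `squeezed_df` itself keeps the partition's row index, and pandas reports a zero-column frame as `.empty`.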

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: (v0.16.0) Series object being treated as a data frame object when using dt accessor #5112
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@billiam-wang requested a review from a team as a code owner on October 18, 2022 09:39
@@ -147,7 +147,10 @@ def _dt_prop_map(property_name):

     def dt_op_builder(df, *args, **kwargs):
         """Access specified date-time property of the passed frame."""
-        prop_val = getattr(df.squeeze(axis=1).dt, property_name)
+        squeezed_df = df.squeeze(axis=1)
+        if isinstance(squeezed_df, pandas.DataFrame) and len(squeezed_df.columns) == 0:
Collaborator

Is it ever possible for squeezed_df to be an empty Series?

Collaborator Author

squeezed_df can become an empty Series if df is originally a DataFrame with a single column but no rows.

If squeezed_df becomes an empty Series, there are two possible outcomes. If the Series has a datetimelike dtype such as datetime64[ns], it should return an empty DataFrame as expected. For any other dtype, it raises an error, because the dt accessor is only available for datetimelike values.

Is there a situation where the passed-in df would be a single-column DataFrame with no rows whose dtype is not datetime64[ns], even though it is supposed to be?

Collaborator

I can't think of such a case.
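The two outcomes discussed in this thread for a single-column partition with no rows can be checked in plain pandas. This is an illustrative sketch rather than code from the PR; the column name `d` is arbitrary.

```python
import pandas

# One column, zero rows: squeeze(axis=1) collapses the length-1 column axis.
empty_dates = pandas.DataFrame({"d": pandas.Series([], dtype="datetime64[ns]")})
squeezed = empty_dates.squeeze(axis=1)
print(type(squeezed).__name__, len(squeezed))  # Series 0

# A datetimelike dtype keeps the .dt accessor, so the property is simply empty.
print(len(squeezed.dt.year))  # 0

# A non-datetimelike dtype rejects the .dt accessor, even when the Series is empty.
empty_objects = pandas.DataFrame({"d": pandas.Series([], dtype="object")})
try:
    empty_objects.squeeze(axis=1).dt
except AttributeError as err:
    print("AttributeError:", err)
```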

…y_compiler.dt_prop_map

Signed-off-by: Bill Wang <billiam@ponder.io>
@mvashishtha (Collaborator) left a comment

I have some minor comments.

modin/pandas/test/test_series.py (outdated, resolved)
modin/pandas/test/test_series.py (outdated, resolved)
modin/core/storage_formats/pandas/query_compiler.py (outdated, resolved)
@pyrito (Collaborator) left a comment

LGTM

@pyrito (Collaborator) commented Oct 26, 2022

Might need to re-run CI here, @Billy2551. Just push an empty commit to re-trigger the runs.

@mvashishtha merged commit f492ba9 into modin-project:master on Oct 27, 2022
Successfully merging this pull request may close these issues.

BUG: (v0.16.0) Series object being treated as a data frame object when using dt accessor
3 participants