Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#5112: allows empty partition to be passed into query_compiler.dt_prop_map #5133

Merged
merged 8 commits into from
Oct 27, 2022
6 changes: 5 additions & 1 deletion modin/core/storage_formats/pandas/query_compiler.py
Original file line number Diff line number Diff line change
Expand Up @@ -152,7 +152,11 @@ def _dt_prop_map(property_name):

def dt_op_builder(df, *args, **kwargs):
"""Access specified date-time property of the passed frame."""
prop_val = getattr(df.squeeze(axis=1).dt, property_name)
squeezed_df = df.squeeze(axis=1)
if isinstance(squeezed_df, pandas.DataFrame) and len(squeezed_df.columns) == 0:
billiam-wang marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ever possible for squeezed_df to ever be an empty Series?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

squeezed_df can become an empty Series if df is originally a DataFrame with a single column but no rows.

If squeezed_df becomes an empty Series, there are 2 possible outcomes. If the Series dtype is datetime64[ns], it should return an empty DataFrame as expected. If the Series dtype is anything else, it will error because dt is only accessible to datetimelike values.

Is there a situation where the passed in df would be a DataFrame with a single column but no rows and the dtype is not datetime64[ns] when it is supposed to be?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of such a case.

return squeezed_df
assert isinstance(squeezed_df, pandas.Series)
prop_val = getattr(squeezed_df.dt, property_name)
billiam-wang marked this conversation as resolved.
Show resolved Hide resolved
if isinstance(prop_val, pandas.Series):
return prop_val.to_frame()
elif isinstance(prop_val, pandas.DataFrame):
Expand Down
16 changes: 16 additions & 0 deletions modin/pandas/test/test_series.py
Original file line number Diff line number Diff line change
Expand Up @@ -1733,6 +1733,22 @@ def test_dt(timezone):
df_equals(modin_series.dt.end_time, pandas_series.dt.end_time)
df_equals(modin_series.dt.to_timestamp(), pandas_series.dt.to_timestamp())

def dt_with_empty_partition(lib):
# For context, see https://github.com/modin-project/modin/issues/5112
df_a = lib.DataFrame({"A": [lib.to_datetime("26/10/2020")]})
df_b = lib.DataFrame({"B": [lib.to_datetime("27/10/2020")]})
df = lib.concat([df_a, df_b], axis=1)
eval_result = df.eval("B - A", engine="python")
# BaseOnPython had a single partition after the concat, and it
# maintains that partition after eval. In other execution modes,
# eval() should re-split the result into two column partitions,
# one of which is empty.
if isinstance(df, pd.DataFrame) and get_current_execution() != "BaseOnPython":
assert eval_result._query_compiler._modin_frame._partitions.shape == (1, 2)
return eval_result.dt.days

eval_general(pd, pandas, dt_with_empty_partition)


@pytest.mark.parametrize(
"data", test_data_with_duplicates_values, ids=test_data_with_duplicates_keys
Expand Down