Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT-#6144: Stop defaulting at API layer for a bunch of methods #6145

Merged
merged 18 commits into from May 17, 2023

Conversation

vnlitvinov
Copy link
Collaborator

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Stop defaulting-to-pandas at frontend for a bunch of Series/DataFrame methods #6144
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

mvashishtha and others added 10 commits May 16, 2023 15:08
…bunch of Series/DataFrame methods

Co-authored-by: mvashishtha <mahesh@ponder.io>
Co-authored-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
Signed-off-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
Signed-off-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
Signed-off-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
Signed-off-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
@vnlitvinov vnlitvinov added the ponder modin work desired for the ponder product label May 16, 2023
@vnlitvinov vnlitvinov changed the title Stop defaulting at API layer for a bunch of methods FEAT-#6144: Stop defaulting at API layer for a bunch of methods May 16, 2023
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few quick questions but overall LGTM!

df = df.squeeze(axis=1)
return df.mask(cond, **kwargs)

return DataFrameDefault.register(mask)(self, cond, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we pass in pandas.DataFrame.mask here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can, I think I've copy-pasted some of my other fix where the behaviour was different... I had already implemented the change locally before your comment, but haven't tested to it wasn't pushed. Should be fixed now.

modin/core/storage_formats/base/query_compiler.py Outdated Show resolved Hide resolved
@doc_utils.add_refer_to("Series.argsort")
def argsort(self, **kwargs): # noqa: PR02
"""
Return the integer indices that would sort the Series values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit but why have we added a docstring here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our doc checker insists on having docstrings, so we'll have to either silence it with pragmas or fill in at least some docstrings... and for methods which are a direct match for pandas I feel it useful to copy their docstrings for clarity.

"mask",
cond,
other=other,
return self._create_or_update_from_compiler(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using this instead of just the constructor? Or will this method just use the constructor if in place is false, and otherwise do an in place update?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. It handles the inplace properly.

return self._default_to_pandas(pandas.Series.nlargest, n=n, keep=keep)
if len(self._query_compiler.columns) == 0:
raise NotImplementedError(
"Series.nlargest is not implemented for empty Series."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for pandas as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just checked, and in pandas it returns Series([], dtype: float64) thingy... should be fixed now.

Comment on lines +3531 to +3538
**kwargs,
): # noqa: PR01, RT01, D200
return self._create_or_update_from_compiler(
self._query_compiler.interpolate(
method=method,
axis=axis,
limit=limit,
inplace=False,

Check notice

Code scanning / CodeQL

Mismatch between signature and use of an overridden method Note

Overridden method signature does not match
call
, where it is passed an argument named 'bins'. Overriding method
method Series.value_counts
matches the call.
@vnlitvinov vnlitvinov marked this pull request as ready for review May 17, 2023 09:26
@vnlitvinov vnlitvinov requested a review from a team as a code owner May 17, 2023 09:26
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left 2 quick questions - lgtm otherwise!

@@ -1905,7 +1905,6 @@ def mask(
level=level,
errors=errors,
try_cast=try_cast,
squeeze_self=not self._is_dataframe,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this argument removed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is because mask and pct_change are equivalent in pandas whether applied to a Series or a 1 col DataFrame?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, they seem to behave identically in pandas

return super().mask(cond, other, inplace, axis, level, errors, try_cast)
# This method exists because `errors` has different default value for Series :(
return super(Series, self).mask(
cond, other, inplace, axis, level, "raise", try_cast
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we pass in errors here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also seems this argument has been deprecated since 1.5? So we should be coercing types anyways?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That deprecation is the reason why I'm not passing the errors through, as it is expected to always behave as if raise is given.

Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RehanSD RehanSD merged commit 53a04b8 into modin-project:master May 17, 2023
56 checks passed
@vnlitvinov vnlitvinov deleted the stop-frontend-default-1 branch May 18, 2023 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ponder modin work desired for the ponder product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop defaulting-to-pandas at frontend for a bunch of Series/DataFrame methods
2 participants