Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT-#6906: Update to pandas 2.2.* #6907

Merged
merged 27 commits into from Feb 9, 2024
Merged

Conversation

anmyachev
Copy link
Collaborator

@anmyachev anmyachev commented Feb 2, 2024

What do these changes do?

Main changes (what's new in Pandas):

  • Changed the minimum versions of supported dependencies, just like Pandas did.
  • Frequency aliases that are considered deprecated have been updated.
  • Added 2 new accessors for Series object: list and struct (and simple tests for them).
  • Added new function for Series: case_when. TODO: add a simple test.
  • TODO: fix fuzzydata tests. These tests are disabled due to sqlalchemy package incompatibility. Update SQLAlchemy pin from >=1.4.31,<2.0.0 -> >=2.0.0 suhailrehman/fuzzydata#11
  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Update to pandas 2.2.* #6906
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
This reverts commit 65879e4.
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Comment on lines +41 to +47
from .series_utils import (
CategoryMethods,
DatetimeProperties,
ListAccessor,
StringMethods,
StructAccessor,
)

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
modin.pandas.series_utils
begins an import cycle.
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@anmyachev anmyachev marked this pull request as ready for review February 8, 2024 13:26
@anmyachev anmyachev requested a review from aregm as a code owner February 8, 2024 13:26
@YarShev
Copy link
Collaborator

YarShev commented Feb 8, 2024

fuzzy / test-fuzzydata (3.9, ray) failed

@anmyachev
Copy link
Collaborator Author

fuzzy / test-fuzzydata (3.9, ray) failed

Yes, I mentioned it in the description. Generally speaking, I'm waiting for a response from fuzzydata owner.

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@dchigarev
Copy link
Collaborator

Looks good to me (besides fuzzy tests, we may want to disable them though if we won't get a response from them in reasonable time)

@@ -185,7 +192,7 @@ def get_col_names():
usecols_md = cls._prepare_pyarrow_usecols(kwargs)

po = ParseOptions(
delimiter="\\s+" if kwargs["delim_whitespace"] else delimiter,
delimiter="\\s+" if kwargs["delim_whitespace"] is True else delimiter,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we leave the previous code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib.no_default is equivalent for False value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably mean the opposite. I see, we can leave the new change as is.

from pandas._libs import lib

a = lib.no_default

def f(a):
    if a:
        print("A")
    else:
        print("B")

f(a)
# A

Copy link
Collaborator

@YarShev YarShev Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, maybe we should make this condition (and all related ones) more explicit?

delimiter="\\s+" if kwargs["delim_whitespace"] and kwargs["delim_whitespace"] is not lib.no_default else delimiter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably mean the opposite

Let me clarify. lib.no_default is equivalent for True value in if condition, but in pandas, for particular method it's equivalent for False value.

Although, maybe we should make this condition (and all related ones) more explicit?

It seems to me that this is an additional burden on the reader of the code.

modin/experimental/core/io/sql/utils.py Show resolved Hide resolved
modin/pandas/dataframe.py Show resolved Hide resolved
modin/pandas/io.py Show resolved Hide resolved
modin/pandas/series_utils.py Outdated Show resolved Hide resolved
modin/pandas/test/test_api.py Show resolved Hide resolved
modin/pandas/test/test_groupby.py Show resolved Hide resolved
modin/pandas/test/test_series.py Show resolved Hide resolved
modin/pandas/test/test_series.py Show resolved Hide resolved
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
modin/pandas/base.py Dismissed Show dismissed Hide dismissed
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
environment-dev.yml Outdated Show resolved Hide resolved
modin/pandas/series.py Outdated Show resolved Hide resolved
modin/pandas/dataframe.py Outdated Show resolved Hide resolved
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Copy link
Contributor

@sfc-gh-dpetersohn sfc-gh-dpetersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anmyachev , left a couple of structural comments

modin/pandas/series.py Show resolved Hide resolved
@@ -31,6 +31,83 @@
from pandas._typing import npt


@_inherit_docstrings(pandas.core.arrays.arrow.ListAccessor)
class ListAccessor(ClassLogger):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, is it possible to move the _default_to_pandas into the query compiler?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved

YarShev
YarShev previously approved these changes Feb 9, 2024
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
)

def explode(self):
from modin.pandas.dataframe import DataFrame

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
modin.pandas.dataframe
begins an import cycle.
@YarShev YarShev merged commit c555f59 into modin-project:master Feb 9, 2024
37 checks passed
@anmyachev anmyachev deleted the issue6906 branch February 10, 2024 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update to pandas 2.2.*
4 participants