[REVIEW] Drop DataFrame.append and Series.append #12839

Merged

Conversation

galipremsagar
Contributor

Description

This PR removes DataFrame.append and Series.append to match the pandas 2.0 API. Usages in the tests are replaced with concat calls (pd.concat and cudf.concat).
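
For readers migrating their own code, the replacement can be sketched roughly as follows. This is a minimal illustration using pandas only, with made-up data; it is not taken from the PR diff. The test snippet in this PR calls cudf.concat with the same `ignore_index` and `sort` keywords, so the cudf form is assumed to look the same.

```python
import pandas as pd

# Illustrative data, not from the PR.
df = pd.DataFrame({"a": [1, 2]})
other = pd.DataFrame({"a": [3]})

# Formerly: df.append(other, ignore_index=True)
combined = pd.concat([df, other], ignore_index=True)

s = pd.Series([1, 2], name="a")
s_other = pd.Series([3], name="a")

# Formerly: s.append(s_other, ignore_index=True)
combined_s = pd.concat([s, s_other], ignore_index=True)
```

Unlike append, concat takes the full list of objects at once, so several appends in a loop can usually be collapsed into a single call.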

pytests related to these changes:

(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_concat_index
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_dataframe.py ....                                                                                                                                                            [100%]

============================================================================================== 4 passed in 1.68s ===============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_axes
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_dataframe.py ...........                                                                                                                                                     [100%]

============================================================================================== 11 passed in 1.68s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_concat_different_column_dataframe
============================================================================================= test session starts ==============================================================================================                                                                                                                                                                                           
python/cudf/cudf/tests/test_dataframe.py ............                                                                                                                                                    [100%]

============================================================================================== 12 passed in 1.74s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_dataframe_concat_dataframe
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_dataframe.py ............................................................................................................................................................... [ 25%]
........................................................................................................................................................................................................ [ 57%]
........................................................................................................................................................................................................ [ 89%]
.................................................................                                                                                                                                        [100%]

============================================================================================= 624 passed in 4.68s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_dataframe_concat_series
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_dataframe.py ................................................................                                                                                                [100%]

============================================================================================== 64 passed in 1.98s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_dataframe_concat_series_mixed_index
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_dataframe.py .                                                                                                                                                               [100%]

============================================================================================== 1 passed in 1.64s ===============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_dataframe_concat_dataframe_lists
============================================================================================= test session starts ==============================================================================================
python/cudf/cudf/tests/test_dataframe.py ............................................................................................................................................................... [ 30%]
........................................................................................................................................................................................................ [ 67%]
.........................................................................................................................................................................                                [100%]

============================================================================================= 528 passed in 5.57s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_dataframe_concat_series_without_name
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_dataframe.py .                                                                                                                                                               [100%]

============================================================================================== 1 passed in 1.63s ===============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_concat_basic
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_series.py ..................                                                                                                                                                 [100%]

============================================================================================== 18 passed in 1.20s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_concat_basic_str
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_series.py ................                                                                                                                                                   [100%]

============================================================================================== 16 passed in 1.23s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_concat_series_with_index
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_series.py ................                                                                                                                                                   [100%]

============================================================================================== 16 passed in 1.20s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_concat_error_mixed_types
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_series.py .                                                                                                                                                                  [100%]

============================================================================================== 1 passed in 1.14s ===============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_concat_list_series_with_index
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_series.py ........................                                                                                                                                           [100%]

============================================================================================== 24 passed in 1.79s ==============================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_concat_existing_buffers
============================================================================================= test session starts ==============================================================================================

python/cudf/cudf/tests/test_series.py .                                                                                                                                                                  [100%]

============================================================================================== 1 passed in 1.21s ===============================================================================================

(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ conda list | grep "pandas"
pandas                    2.0.0rc0                 pypi_0    pypi

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added the labels 3 - Ready for Review, Python, 4 - Needs cuDF (Python) Reviewer, improvement, and breaking on Feb 24, 2023
@galipremsagar galipremsagar requested a review from a team as a code owner February 24, 2023 00:40
@galipremsagar galipremsagar self-assigned this Feb 24, 2023
@galipremsagar galipremsagar requested review from bdice and mroeschke and removed request for a team February 24, 2023 00:40
expected = pd.concat([pdf, other_pd], sort=sort, ignore_index=ignore_index)
actual = cudf.concat([gdf, other_gd], sort=sort, ignore_index=ignore_index)

# In some cases, Pandas creates an empty Index([], dtype="object") for
Contributor

Could you show an example where this happens? I think pandas would also return an empty RangeIndex if the object is empty-like.

Contributor Author

Apologies for the incorrect comment earlier. The actual issue is how cudf and pandas represent empty columns: in pandas it's always an empty RangeIndex, but in cudf it's always an empty Index:

In [1]: import cudf

In [2]: import pandas as pd

In [3]: df = cudf.DataFrame()

In [4]: pdf = pd.DataFrame()

In [5]: df.columns
Out[5]: Index([], dtype='object')

In [6]: pdf.columns
Out[6]: RangeIndex(start=0, stop=0, step=1)

Because of this difference, when we concat similar dataframes, pandas still returns a RangeIndex, whereas cudf returns an empty Index, which is why this special handling is needed here for pytest.
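
One way such special handling could look in a test is sketched below. The helper name `normalize_empty_columns` is hypothetical and is not the code in this PR; the sketch uses pandas only and assumes that coercing an empty column index to an object-dtype Index on both sides is acceptable for the comparison.

```python
import pandas as pd


def normalize_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: give empty frames an object-dtype column Index.

    Frames that have at least one column are returned unchanged.
    """
    if len(df.columns) == 0:
        df = df.copy()
        df.columns = pd.Index([], dtype="object")
    return df


# Two empty frames whose column index types may differ by library or version.
left = normalize_empty_columns(pd.DataFrame())
right = normalize_empty_columns(
    pd.DataFrame(columns=pd.Index([], dtype="object"))
)

# After normalization the column indexes compare equal, including their type.
pd.testing.assert_index_equal(left.columns, right.columns, exact=True)
```

Normalizing both sides keeps the rest of the comparison strict, so only the empty-columns case is relaxed.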

Contributor

Okay cool thanks for the clarification!

@galipremsagar galipremsagar merged commit e115ba5 into rapidsai:pandas_2.0_feature_branch Mar 10, 2023
@galipremsagar galipremsagar added the label 5 - Ready to Merge and removed the labels 3 - Ready for Review and 4 - Needs cuDF (Python) Reviewer on Mar 10, 2023
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
3 participants