[FEA] Consolidate shared method between `DataFrame` and `Series` into `Frame` #9038

isVoid · 2021-08-13T23:27:08Z

DataFrame and Series shares much similarity in structure. Operations that works on DataFrame can work (without much adaptation) in Series context, thus, we should consolidate these methods into Frame to reduce duplication.

Caveats:

Frame base class is also inherited by Index class. This consolidation would allow extra methods that can be operated on Index. It should also be pointed out that operation that depends on the frame having an index cannot be consolidated (such as sort_index).
After consolidation, the method may have shared docstring for both DataFrame and Series. As pointed out by @vyasr in Refactor Frame scans #9021 (comment). Some Pandas API docs also does this by including examples for both dataframe and series in the same docstring section. It's generally more preferable to have less code duplication to have longer docstrings.

Potential candidates:

The text was updated successfully, but these errors were encountered:

vyasr · 2021-08-16T18:33:02Z

As of #9021 I moved forward with the pandas approach of including both Series and DataFrame examples in the same docstring (in Frame), so we can standardize on that approach for the methods mentioned in this issue.

Partly addresses #9038 This function consolidate several (trivial) functions from `Series` and `DataFrame` into Frame. `__invert__` was consolidated to shared (more efficient) code path using factory methods. `deserialize` was not consolidated because we have to provide backward compatibility to older classes. But factory method was used for faster class construction. Authors: - Michael Wang (https://github.com/isVoid) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Sheilah Kirui (https://github.com/skirui-source) URL: #9059

Partially solves #9038 This PR moves binary ops from `Series` and `DataFrame` into `Frame` to reduce duplications. Because the arguments takes a different order in `series.py` and `dataframe.py` resort to `functools.wraps` to reorder the arguments while keeping the docstrings the same. Authors: - Michael Wang (https://github.com/isVoid) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #9357

#9378) This PR adds a new IndexedFrame subclass of Frame. For the moment this is a minimal class with one user-facing method and the indexer properties, but as we make progress on #9038 we are likely to find more methods that should live here. Additionally, once we remove references to Frame and Index types from the Cython layer of cudf we will be able to move all index-related logic from Frame into this class and provide a suitable ducktyped interface for methods that need to behave differently for frames depending on whether or not they have an index. In the process I also moved SingleColumnFrame into its own module. One class per file is a good organizational pattern, especially for huge classes like these, and it can also help reduce issues with circular imports and improve mypy runtime. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #9378

github-actions · 2021-11-15T21:04:31Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Partial of #9038 This is a rewrite of `reset_index` to share some common logics between `Series` and `DataFrame`. It extends it's capability to handle `level` argument for multi-level index, `col_level` and `col_fill` for multi-level column name support. And adds `name` argument support for series api. Authors: - Michael Wang (https://github.com/isVoid) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #9750

This PR contributes to #9038, merging implementations of memory_usage and __iter__. It also removes an unnecessary override of take (saves about 25% of runtime for data <1M rows) and deprecates Series.merge, which doesn't exist in pandas. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #10167

This PR builds on #10217 and #10287 to bring full ufunc support for Index types, expanding well beyond the small set previously supported in the `cudf.core.ops` namespace. By using most of the machinery introduced for IndexedFrame in the prior two PRs we avoid duplicating much logic so that all ufunc dispatches flow through a relatively standard path of known methods prior to a common cupy dispatch. With this change we are also able to deprecate the various ufunc operations defined in cudf/core/ops.py that exist only for this purpose as well as a number of Frame methods that are not defined for the corresponding pandas types. Users of those APIs are recommended to calling the corresponding numpy/cupy ufuncs instead to leverage the new dispatch. This PR also fixes a bug where index binary operations that output booleans would previously return instances of GenericIndex, whereas those pandas operations would return numpy arrays. cudf now returns cupy arrays in those cases. Resolves #9083. Contributes to #9038. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #10346

This PR contributes to #9038 by combining common code for `append`, `astype`, `drop`, and `explode`. It's not a small LOC change, but most of the changed lines are just moving code around. Some minor rewrites of `astype` in this PR do significantly speed up those calls (2x faster for <1M rows). Additionally, I've deprecated `append` since it is now deprecated in pandas. Note that it has not been removed in pandas, so we should not remove it until they do. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #10381

This PR resolves #9038. Not all the remaining functions could be consolidated (for various reasons), but the combination was explored and where necessary the reasons not to combine the functions was documented in that issue. Some additional cleanup that was done during the exploration is also included in this PR, but ultimately `groupby` is the only method that was moved up to `IndexedFrame`. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #11315

isVoid added feature request New feature or request Needs Triage Need team to review and classify labels Aug 13, 2021

isVoid added this to the CuDF Python Refactoring milestone Aug 13, 2021

isVoid added Python Affects Python cuDF API. tech debt labels Aug 13, 2021

isVoid added improvement Improvement / enhancement to an existing function and removed feature request New feature or request labels Aug 16, 2021

vyasr mentioned this issue Aug 16, 2021

Template function docstrings #9050

Open

isVoid mentioned this issue Aug 18, 2021

Consolidate Several Series and Dataframe Methods #9059

Merged

beckernick removed the Needs Triage Need team to review and classify label Aug 23, 2021

isVoid mentioned this issue Sep 16, 2021

[REVIEW] Dataframe.sort_index optimizations #9238

Merged

2 tasks

isVoid mentioned this issue Oct 1, 2021

Consolidate binary ops into Frame #9357

Merged

vyasr mentioned this issue Oct 5, 2021

Add IndexedFrame class and move SingleColumnFrame to a separate module #9378

Merged

github-actions bot added the inactive-30d label Nov 15, 2021

isVoid mentioned this issue Nov 22, 2021

Consolidate and improve reset_index #9750

Merged

vyasr assigned vyasr and isVoid Jan 28, 2022

vyasr mentioned this issue Jan 28, 2022

Some consolidation of indexed frame methods #10167

Merged

vyasr mentioned this issue Feb 23, 2022

Refactor array_ufunc for Index and unify across all classes #10346

Merged

vyasr mentioned this issue Mar 2, 2022

Consolidate some Frame APIs #10381

Merged

vyasr mentioned this issue Jul 20, 2022

Consolidate remaining DataFrame/Series APIs #11315

Merged

rapids-bot bot closed this as completed in #11315 Jul 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Consolidate shared method between `DataFrame` and `Series` into `Frame` #9038

[FEA] Consolidate shared method between `DataFrame` and `Series` into `Frame` #9038

isVoid commented Aug 13, 2021 •

edited by vyasr

Loading

vyasr commented Aug 16, 2021

github-actions bot commented Nov 15, 2021

[FEA] Consolidate shared method between DataFrame and Series into Frame #9038

[FEA] Consolidate shared method between DataFrame and Series into Frame #9038

Comments

isVoid commented Aug 13, 2021 • edited by vyasr Loading

vyasr commented Aug 16, 2021

github-actions bot commented Nov 15, 2021

[FEA] Consolidate shared method between `DataFrame` and `Series` into `Frame` #9038

[FEA] Consolidate shared method between `DataFrame` and `Series` into `Frame` #9038

isVoid commented Aug 13, 2021 •

edited by vyasr

Loading