Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Grouper objects #8776

Merged
merged 9 commits into from Mar 7, 2024
Merged

Refactor Grouper objects #8776

merged 9 commits into from Mar 7, 2024

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Feb 22, 2024

Some refactoring towards the Grouper refactor described in #8510

  1. Rename to Resampler from ResampleGrouper
  2. Refactor to a single "ResolvedGrouper" object that encapsulates the underling Grouper/Resampler object: UniqueGrouper, BinGrouper, or TimeResampler.

1. Rename to Resampler from ResampleGrouper
2. Move code from common.resample to TimeResampler
@dcherian dcherian changed the title Refactor resampling. Refactor Grouper objects Feb 22, 2024


class Grouper(ABC):
@property
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't make this abstractmethod since it'll be deleted once the squeeze kwarg is completely deprecated.

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I find deciphering what the individual classes stand for nontrivial. Maybe a short comment/ docstring can be added?

xarray/core/groupby.py Outdated Show resolved Hide resolved
xarray/core/groupby.py Outdated Show resolved Hide resolved
xarray/core/groupby.py Outdated Show resolved Hide resolved
is accomplished by calling the `factorize` method on the encapsulated Grouper
object.

This class is private API, while Groupers are public.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not rename it to _ResolvedGrouper if it is private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've just tended to not do this. Only things listed in api.rst and/or exposed in __init__.py are considered public. https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#what-parts-of-xarray-are-considered-public-api


@property
def size(self) -> int:
return len(self)

def __len__(self) -> int:
return len(self.full_index) # TODO: full_index not def, abstractmethod?
return len(self.full_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of self.full_index is False (which doesn't have a __len__), should it be an empty pd.Index instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is uninitialized (init=False). It will be assigned in the call to factorize in __post_init__ so should be fine.

xarray/core/groupby.py Outdated Show resolved Hide resolved
xarray/core/groupby.py Outdated Show resolved Hide resolved
# This design makes it clear to mypy that
# codes, group_indices, unique_coord, and full_index
# are set by the factorize method on the derived class.
def factorize(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this one have a group argument, like the others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a wrapper around the others, it knows what group is and passes it down to the Grouper instance.

dcherian and others added 2 commits March 3, 2024 14:19
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Mar 3, 2024
@dcherian dcherian merged commit 8468be0 into pydata:main Mar 7, 2024
27 of 29 checks passed
@dcherian dcherian deleted the refactor-groupby branch March 7, 2024 21:50
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 15, 2024
* main: (31 commits)
  correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713)
  Expand use of `.oindex` and `.vindex` (pydata#8790)
  Return a dataclass from Grouper.factorize (pydata#8777)
  [skip-ci] Fix upstream-dev env (pydata#8839)
  Add dask-expr for windows envs (pydata#8837)
  [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835)
  Add `dask-expr` to environment-3.12.yml (pydata#8827)
  Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736)
  Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784)
  try to get the `upstream-dev` CI to complete again (pydata#8823)
  Bump the actions group with 1 update (pydata#8818)
  Update documentation for clarity (pydata#8817)
  DOC: link to zarr.convenience.consolidate_metadata (pydata#8816)
  Refactor Grouper objects (pydata#8776)
  Grouper object design doc (pydata#8510)
  Bump the actions group with 2 updates (pydata#8804)
  tokenize() should ignore difference between None and {} attrs (pydata#8797)
  fix: remove Coordinate from __all__ in xarray/__init__.py (pydata#8791)
  Fix non-nanosecond casting behavior for `expand_dims` (pydata#8782)
  Migrate treenode module. (pydata#8757)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 16, 2024
* main: (42 commits)
  correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713)
  Expand use of `.oindex` and `.vindex` (pydata#8790)
  Return a dataclass from Grouper.factorize (pydata#8777)
  [skip-ci] Fix upstream-dev env (pydata#8839)
  Add dask-expr for windows envs (pydata#8837)
  [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835)
  Add `dask-expr` to environment-3.12.yml (pydata#8827)
  Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736)
  Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784)
  try to get the `upstream-dev` CI to complete again (pydata#8823)
  Bump the actions group with 1 update (pydata#8818)
  Update documentation for clarity (pydata#8817)
  DOC: link to zarr.convenience.consolidate_metadata (pydata#8816)
  Refactor Grouper objects (pydata#8776)
  Grouper object design doc (pydata#8510)
  Bump the actions group with 2 updates (pydata#8804)
  tokenize() should ignore difference between None and {} attrs (pydata#8797)
  fix: remove Coordinate from __all__ in xarray/__init__.py (pydata#8791)
  Fix non-nanosecond casting behavior for `expand_dims` (pydata#8782)
  Migrate treenode module. (pydata#8757)
  ...
dcherian added a commit to kmsquire/xarray that referenced this pull request Mar 21, 2024
* upstream/main: (765 commits)
  increase typing annotations coverage in `xarray/core/indexing.py` (pydata#8857)
  pandas 3 MultiIndex fixes (pydata#8847)
  FIX: adapt handling of copy keyword argument in scipy backend for numpy >= 2.0dev (pydata#8851)
  FIX: do not cast _FillValue/missing_value in CFMaskCoder if _Unsigned is provided (pydata#8852)
  Implement setitem syntax for `.oindex` and `.vindex` properties (pydata#8845)
  Support pandas copy-on-write behaviour (pydata#8846)
  correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713)
  Expand use of `.oindex` and `.vindex` (pydata#8790)
  Return a dataclass from Grouper.factorize (pydata#8777)
  [skip-ci] Fix upstream-dev env (pydata#8839)
  Add dask-expr for windows envs (pydata#8837)
  [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835)
  Add `dask-expr` to environment-3.12.yml (pydata#8827)
  Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736)
  Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784)
  try to get the `upstream-dev` CI to complete again (pydata#8823)
  Bump the actions group with 1 update (pydata#8818)
  Update documentation for clarity (pydata#8817)
  DOC: link to zarr.convenience.consolidate_metadata (pydata#8816)
  Refactor Grouper objects (pydata#8776)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants