Make head() robust to empty partitions by default #380

Open
dougbrn opened this issue Feb 20, 2024 · 0 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

Collaborator

dougbrn commented Feb 20, 2024

With Dask join/filter workflows, empty partitions are a common occurrence. Users will usually want to call head to inspect their result, but because head only searches the first partition by default, it will often return an empty dataframe. This is confusing for new users, who may conclude that they have lost all of their data. We should update head to account for potentially empty partitions by default and prioritize returning a result. We think adding a check_all=True kwarg to our implementation is appropriate, something like this:

# If check_all=True and the first npartitions return no rows, print an info message
# and then call head(n, npartitions=-1) on the rest of the partitions.
def head(self, n=5, npartitions=1, check_all=True):
    result = super().head(n, npartitions=npartitions)
    # A DataFrame has no unambiguous truth value, so test emptiness with len().
    if len(result) == 0 and check_all and npartitions != -1 and npartitions < self.npartitions:
        print(f"The first {npartitions} partition(s) were empty, checking remaining partitions...")
        # Search only the partitions that the first call did not look at.
        result = self.partitions[npartitions:].head(n, npartitions=-1)
    return result
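To make the failure mode concrete without needing a Dask cluster, here is a minimal sketch that models each partition as a plain Python list. The names `naive_head` and `robust_head` are hypothetical stand-ins for illustration only, not TAPE or Dask functions, and the fallback mirrors the proposal above rather than any merged implementation.

```python
# Toy stand-in: each partition is a plain list of rows, not a Dask object.
def naive_head(partitions, n=5, npartitions=1):
    """Mimic a head() that only reads the first `npartitions` partitions."""
    selected = partitions if npartitions == -1 else partitions[:npartitions]
    rows = [row for part in selected for row in part]
    return rows[:n]


def robust_head(partitions, n=5, npartitions=1, check_all=True):
    """Fall back to the remaining partitions when the first ones are empty."""
    result = naive_head(partitions, n, npartitions)
    if (len(result) == 0 and check_all and npartitions != -1
            and npartitions < len(partitions)):
        # Only look at partitions the first call skipped.
        result = naive_head(partitions[npartitions:], n, npartitions=-1)
    return result


# A filter that keeps only values >= 6 empties the first two partitions.
raw = [[1, 2, 3], [4, 5], [6, 7, 8], [9]]
filtered = [[x for x in part if x >= 6] for part in raw]

print(naive_head(filtered, n=2))   # []
print(robust_head(filtered, n=2))  # [6, 7]
```

The naive call returns nothing even though four rows survive the filter, which is exactly the situation that makes new users think their data is gone.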

Alternatively, head was recently added to LSDB, and that implementation is robust to empty partitions, with the additional feature that it searches partition by partition until it has collected the requested number of rows. It may be best to just align with their implementation: https://github.com/astronomy-commons/lsdb/blob/12271382ee6953c32d4422f0e777d05c0d1bd8f0/src/lsdb/catalog/catalog.py#L70
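The partition-by-partition behavior described above could look roughly like the following. This is a sketch of the idea only, using plain lists as stand-in partitions; it is not copied from the LSDB code, and `incremental_head` is a hypothetical name.

```python
def incremental_head(partitions, n=5):
    """Collect up to n rows, visiting partitions in order and stopping early.

    Returns the collected rows and how many partitions were visited.
    """
    collected = []
    visited = 0
    for part in partitions:
        visited += 1
        remaining = n - len(collected)
        # Take only as many rows as are still needed from this partition.
        collected.extend(part[:remaining])
        if len(collected) >= n:
            break
    return collected, visited


partitions = [[], [10], [], [20, 30, 40], [50]]
rows, visited = incremental_head(partitions, n=3)
print(rows, visited)  # [10, 20, 30] 4
```

Compared with the check_all fallback, this approach never computes more partitions than it needs, and it can fill a request for n rows from several sparsely populated partitions instead of returning a short result from the first non-empty group.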

Behavior should be aligned between EnsembleFrame.head and Ensemble.head.

@dougbrn dougbrn added bug Something isn't working enhancement New feature or request labels Feb 20, 2024
@wilsonbb wilsonbb self-assigned this Mar 25, 2024