
count_nested should correctly handle the empty NestedFrame case, to produce meta #291

@gitosaurus


Feature request

When a function that accepts a DataFrame or NestedFrame can handle zero-size input by returning a correctly structured zero-size output, lsdb.catalog.Catalog.map_partitions can deduce the meta= for that function on its own, without requiring the user to supply it.
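To illustrate the principle, here is a minimal pure-pandas sketch. It assumes, as described above, that meta can be derived by calling the user function on a zero-row frame; `infer_meta` and `add_flux_ratio` are hypothetical names, and this is not lsdb's actual implementation.

```python
import pandas as pd


def infer_meta(func, frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive meta by calling func on a zero-row slice."""
    empty = frame.iloc[:0]  # same columns and dtypes, no rows
    return func(empty)


def add_flux_ratio(df: pd.DataFrame) -> pd.DataFrame:
    # Vectorized column arithmetic works on zero rows, so no special case is
    # needed: the empty output still has the new column with the right dtype.
    return df.assign(ratio=df["flux"] / df["err"])


frame = pd.DataFrame({"flux": [10.0, 20.0], "err": [1.0, 4.0]})
meta = infer_meta(add_flux_ratio, frame)
print(list(meta.columns), len(meta))  # ['flux', 'err', 'ratio'] 0
```

A function whose normal code path breaks on zero rows would instead need an explicit guard, which is exactly the situation described for count_nested.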

Consider the case of count_nested:

from nested_pandas.utils import count_nested

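def count_points(pts):
    # Zero-row input (e.g. during meta inference): return an empty frame
    # that still carries the n_lc column count_nested would have added
    if len(pts) == 0:
        return pts.assign(n_lc=0)
    return count_nested(pts, "lc")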


ztf_objects.map_partitions(count_points).compute()

The above is relatively compact, but it still requires the user to know that the column count_nested adds will be called n_lc. If count_nested performed that same check internally, the interface would become even cleaner:

from nested_pandas.utils import count_nested

def count_points(pts):
    return count_nested(pts, "lc")


ztf_objects.map_partitions(count_points).compute()
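A minimal sketch of what "making that same test within itself" could look like, written against plain pandas so it runs without nested_pandas. `count_nested_sketch` is a hypothetical stand-in, not the nested_pandas API: each cell of the nested column holds a list as a placeholder for a nested frame, and the `n_<nested>` column name follows the `n_lc` convention mentioned above.

```python
import numpy as np
import pandas as pd


def count_nested_sketch(df: pd.DataFrame, nested: str) -> pd.DataFrame:
    """Hypothetical stand-in for count_nested (not the real nested_pandas API).

    Cells of `nested` are lists standing in for nested frames; the result
    gains a count column named n_<nested>.
    """
    col = f"n_{nested}"
    if len(df) == 0:
        # The user-side guard, moved inside the function: a zero-row input
        # yields a zero-row output that already has the int64 count column.
        return df.assign(**{col: np.zeros(0, dtype="int64")})
    # Normal path: count the entries in each nested cell.
    return df.assign(**{col: df[nested].map(len).astype("int64")})
```

With the guard living inside the library function, the user wrapper reduces to a direct call, and the empty and non-empty outputs share the same columns and dtypes, which is what meta inference needs.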

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.

Metadata

Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
