Count Nested: Sort output columns of by behavior alphabetically for nested-dask meta #109

@dougbrn

Feature request
When by is set for count_nested, the output columns follow the default ordering of value_counts. That ordering depends on the values present in the data, so it is unpredictable for Dask. The by output columns should instead be sorted alphabetically, which gives a meta-friendly output for Nested-Dask. The PR should only need to implement the behavior like this:

else:
    # this may be able to be sped up using tolists() as well
    counts = df[nested].apply(lambda x: x[by].value_counts(sort=False))
    counts = counts.rename(columns={colname: f"n_{nested}_{colname}" for colname in counts.columns})
    # sort the columns alphabetically so the layout does not depend on the data
    counts = counts.reindex(sorted(counts.columns), axis=1)
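
To illustrate the intended effect, here is a minimal sketch in plain pandas rather than the actual count_nested code. The helper name count_by, the dict-of-DataFrames input, and the "band" column are assumptions made up for this example. It applies the same rename and reindex steps as the snippet above and shows that two inputs with different value orderings end up with the same column order.

```python
import pandas as pd


def count_by(nested_frames, nested="nested", by="band"):
    """Hypothetical stand-in for the `by` branch, using plain pandas only.

    `nested_frames` maps a row label to the DataFrame nested under that row.
    """
    # One Series of counts per row; transposing puts the `by` values in the columns.
    counts = pd.DataFrame(
        {idx: frame[by].value_counts(sort=False) for idx, frame in nested_frames.items()}
    ).T
    counts = counts.rename(columns={c: f"n_{nested}_{c}" for c in counts.columns})
    # Sort alphabetically so the column layout does not depend on the data.
    return counts.reindex(sorted(counts.columns), axis=1)


# Two inputs (think: two Dask partitions) whose values appear in different orders.
part_a = {0: pd.DataFrame({"band": ["r", "g", "r"]}), 1: pd.DataFrame({"band": ["g"]})}
part_b = {0: pd.DataFrame({"band": ["g", "r"]})}

print(list(count_by(part_a).columns))  # ['n_nested_g', 'n_nested_r']
print(list(count_by(part_b).columns))  # ['n_nested_g', 'n_nested_r']
```

With the final reindex in place, the column order is a function of the set of by values only, which is what a Dask meta needs to describe the output up front.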

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.
