Closed
Labels
enhancement (New feature or request), good first issue (Good for newcomers)
Description
Feature request
When by is set for count_nested, the output columns follow the default ordering of value_counts. That ordering depends on the values present in the data, so Dask cannot predict it ahead of time. The by output columns should instead be sorted alphabetically, which gives a meta-friendly output for Nested-Dask. The PR should only need to implement behavior like this:
else:
    # this may be able to be sped up using tolists() as well
    counts = df[nested].apply(lambda x: x[by].value_counts(sort=False))
    counts = counts.rename(columns={colname: f"n_{nested}_{colname}" for colname in counts.columns})
    counts = counts.reindex(sorted(counts.columns), axis=1)
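For illustration, here is a minimal standalone sketch of the same sort-the-columns idea using plain pandas. It is not the nested-pandas implementation: the helper name count_by_sorted is hypothetical, and a plain list of DataFrames stands in for the nested column (an assumption made so the example runs without the library).

```python
import pandas as pd


def count_by_sorted(frames, by, nested_name):
    """Count values of `by` per frame and return alphabetically sorted columns.

    `frames` is a plain list of DataFrames standing in for a nested column
    (an assumption for this sketch, not the nested-pandas data structure).
    """
    # One value_counts Series per frame; each becomes one row of the result.
    rows = [frame[by].value_counts(sort=False) for frame in frames]
    counts = pd.DataFrame(rows).reset_index(drop=True)
    # Apply the same n_{nested}_{value} naming scheme used in the issue.
    counts = counts.rename(
        columns={colname: f"n_{nested_name}_{colname}" for colname in counts.columns}
    )
    # Sort columns so the layout does not depend on the order of values in the data.
    return counts.reindex(sorted(counts.columns), axis=1)


frames = [
    pd.DataFrame({"band": ["r", "g", "r"]}),
    pd.DataFrame({"band": ["g", "g", "b"]}),
]
result = count_by_sorted(frames, by="band", nested_name="nested")
print(list(result.columns))
```

Values that never occur in a given frame come out as NaN here, as they would in the snippet above. Whether to fill them with zeros is a separate decision that this sketch leaves open.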
Before submitting
Please check the following:
- I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
- I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
- If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.