Add nunique and unique to series apply whitelist #6146

@bburan-galenea

The unique and nunique methods are very useful in conjunction with series groupby operations. I used them extensively in previous versions of pandas whenever I needed the list of unique values for each subgroup (or the number of unique values). For example, they can be used to count the number of subjects in each treatment group (or to get a list of the subject IDs for reporting):

import pandas

data = pandas.DataFrame({
    'subject_id': ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'),
    'treatment': (0, 0, 0, 0, 0, 1, 1, 1, 0, 0),
})
# Number of distinct subjects, then the distinct subject IDs, per treatment group
print(data.groupby('treatment').subject_id.apply(lambda x: x.nunique()))
print(data.groupby('treatment').subject_id.apply(lambda x: x.unique()))

We'd get the following output:

treatment
0            7
1            3
dtype: int64

treatment
0            [A, B, C, D, E, I, J]
1                        [F, G, H]
dtype: object

This is very useful for generating summary statistics (e.g. N's) and for debugging (e.g. tracking down which subjects are in which groups). In previous versions of pandas, we could simply do:

print(data.groupby('treatment').subject_id.nunique())
print(data.groupby('treatment').subject_id.unique())

It would be nice to keep this working. Is there a reason why nunique and unique can't be added to the whitelist?
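
In the meantime, one possible workaround is to go through agg and apply explicitly. This is only a sketch of the pattern described above, not a statement about how the whitelist is implemented. It assumes a reasonably recent pandas, and the names grouped, counts, and subjects are purely illustrative:

```python
import pandas as pd

data = pd.DataFrame({
    'subject_id': list('ABCDEFGHIJ'),
    'treatment': [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
})

# Select the column once so both summaries share the same grouping.
grouped = data.groupby('treatment')['subject_id']

# Number of distinct subjects per treatment group.
counts = grouped.agg(lambda x: x.nunique())

# Distinct subject IDs per group, converted to plain lists for reporting.
subjects = grouped.apply(lambda x: list(x.unique()))

print(counts)
print(subjects)
```

Converting the result of unique() to a list keeps each group's IDs in order of first appearance and avoids storing NumPy arrays inside the resulting series.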
