API: add DataFrame.nunique() and DataFrameGroupBy.nunique() #14336
Comments
Agreed, I think this would be welcome functionality.
Note that these are already defined for Series.
jreback added Enhancement, Groupby, Difficulty Intermediate, Effort Medium labels Oct 3, 2016
jreback added this to the Next Major Release milestone Oct 3, 2016
Of course, extending the
>>> df = pd.DataFrame({'id': ['spam', 'eggs', 'eggs', 'spam', 'ham', 'ham'],
...                    'value1': [1, 5, 5, 2, 5, 5], 'value2': list('abbaxy')})
>>> df
     id  value1 value2
0  spam       1      a
1  eggs       5      b
2  eggs       5      b
3  spam       2      a
4   ham       5      x
5   ham       5      y
>>> df.groupby('id').filter(lambda g: (g.apply(pd.Series.nunique) > 1).any())
     id  value1 value2
0  spam       1      a
3  spam       2      a
4   ham       5      x
5   ham       5      y
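For comparison, the same check can be sketched with a group-wise nunique(). This assumes a pandas version that includes DataFrameGroupBy.nunique() (0.20.0 or later, going by the milestone in this thread); the names counts, keep, and result are illustrative only and do not come from the issue.

```python
import pandas as pd

df = pd.DataFrame({'id': ['spam', 'eggs', 'eggs', 'spam', 'ham', 'ham'],
                   'value1': [1, 5, 5, 2, 5, 5], 'value2': list('abbaxy')})

# Distinct values per column within each id. The value columns are
# selected explicitly so the grouping column is never part of the result.
counts = df.groupby('id')[['value1', 'value2']].nunique()

# ids for which at least one column holds more than one distinct value,
# i.e. ids that fail as a candidate primary key.
keep = counts.gt(1).any(axis=1)

# Keep only the rows that belong to those ids.
result = df[df['id'].isin(keep[keep].index)]
print(result)
```

This selects the same rows as the filter call above without running a Python lambda once per group, which may matter on frames with many groups.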
xflr6 referenced this issue Oct 7, 2016
Closed: API: add DataFrame.nunique() and DataFrameGroupBy.nunique() #14376
jreback modified the milestone: 0.20.0, Next Major Release Jan 2, 2017
mahnunchik commented Jan 23, 2017
Any news?
jreback closed this in a1b6587 Jan 23, 2017
just merged.
AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
xflr6 + AnkurDedania 51e32d0
jreback referenced this issue Mar 23, 2017
Closed: Add .nunique method to DataFrame and .groupby #15794
xflr6 commented Oct 3, 2016
When exploring a data set, I often need to df.apply(pd.Series.nunique) or df.apply(lambda x: x.nunique()). How about adding this as nunique()-method parallel to DataFrame.count() (count and unique are also the two most basic infos displayed by DataFrame.describe())?
I think there are also use cases for this as a groupby-method, for example when checking a candidate primary key for different lines (values):
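For the DataFrame-level part of this request, a minimal sketch that puts the two apply spellings next to the proposed method. It assumes pandas 0.20.0 or later, where DataFrame.nunique() is available; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': ['spam', 'eggs', 'eggs', 'spam', 'ham', 'ham'],
                   'value1': [1, 5, 5, 2, 5, 5], 'value2': list('abbaxy')})

# The two workarounds described in the post.
via_apply = df.apply(pd.Series.nunique)
via_lambda = df.apply(lambda x: x.nunique())

# The requested method (assumes pandas >= 0.20.0).
via_method = df.nunique()

# count() is the existing parallel: non-null values per column.
non_null = df.count()

print(via_method)
print(non_null)
```

All three nunique spellings should give the same per-column counts of distinct values, while count() reports how many non-null entries each column has.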