
Feature request: Option to include NaNs in value_counts() #5569

Closed
michaelaye opened this issue Nov 22, 2013 · 10 comments · Fixed by #7424
Labels
Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@michaelaye
Contributor

I find it highly valuable to also know how many NaN values are in my Series.
Could value_counts() take an option, maybe include_nans=True, that adds a count for those to its output?

@jtratner
Contributor

To be clear, you can already do this: series.isnull().sum()

The problem is that you'd end up with NaN in the resulting Index, which causes problems.
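The workaround in practice looks something like this (a minimal sketch, assuming a Series with missing values):

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan, np.nan])

# value_counts() silently drops missing values,
# so the resulting index stays NaN-free...
counts = s.value_counts()
print(counts)

# ...and the NaN tally has to be fetched separately.
print("NaN:", s.isnull().sum())
```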

@michaelaye
Contributor Author

Yes, I know the workaround, but I would like to see it solved in value_counts, because it is a relevant piece of information within the scope of value_counts.
I see the problem with NaN in the Index. How about converting the index to dtype 'object' just for this output, so the NaN could be packed into a string?

@jtratner
Contributor

This is trivial to implement, and the issues with NaN in an Index aren't that big. We wouldn't include NaN by default.

So we're down to a design decision. To me, NaN means "missing value", so it doesn't make sense for it to show up as a counted value, or in the mode.

@michaelaye
Contributor Author

I wouldn't include it by default either. But it is extremely helpful to see, in one overview, the rough ratio between successful measurements and NaNs.
In my case, I have 3 different categories plus 'not categorized'. When displayed like so:

'a'   1000
'b'   5000
'c'   4000
'nan' 100000

I immediately know that something went wrong, which I wouldn't suspect if I didn't see the NaNs. Without the NaN count, I first have to sum up all the real values and compare that to the length of the Series; it's always one more step. Sure, I can write my own wrapper, but I thought it would be useful to have this at least as an option to the value_counts call.
What did you mean by "or for the mode"?
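The wrapper mentioned above could look roughly like this (a sketch, not pandas API; the function name and the 'NaN' string label are arbitrary choices, echoing the pack-NaN-into-a-string idea):

```python
import numpy as np
import pandas as pd

def value_counts_with_nan(series):
    """value_counts() plus one extra row counting missing values,
    filed under a string label so the index stays NaN-free."""
    counts = series.value_counts()
    counts["NaN"] = series.isnull().sum()
    return counts

s = pd.Series(["a"] * 3 + ["b"] * 2 + [np.nan] * 4)
print(value_counts_with_nan(s))
```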

@jtratner
Contributor

Well, mode is really a special case of value_counts(), so if you add the option for one, it makes sense to use the same kwarg for the other.

@michaelaye
Contributor Author

I have never used 'mode' and don't understand what it does. In my case:

print(df.marking.mode())
print()
print(df.marking.value_counts())

0    blotch
dtype: object

blotch         3854641
fan            3192799
None           2785831
interesting     884843
dtype: int64

I cheated by replacing the NaNs with the string 'None'. ;)
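For context on the mode question: mode() returns the most frequent value(s), and, like value_counts(), it skips missing values by default. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(["blotch", "blotch", "fan", np.nan])

# mode() is effectively the index of the top row of value_counts();
# both ignore NaN unless told otherwise.
print(s.mode())
print(s.value_counts())
```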

@ghost ghost assigned jtratner Dec 5, 2013
@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@nmontpetit

From my experience, it is very helpful for nan to show up as a counted value.

I work in a SAS shop, but I'm moving all of my analysis and reporting work from SAS to Python. I use value_counts to give me the results I would get from PROC FREQ in SAS. I use PROC FREQ daily, and almost always I'm looking at real-world data with missing values. I honestly cannot remember a case where I didn't want the missing values to be in the frequency counts.

I've got to believe I'm nowhere near the only person who needs to see frequency counts for nan values. I could see the lack of this feature slowing the adoption of pandas among SAS users.

SAS does not report missing values in frequency reports by default, but I'm OK with always selecting that option when I run PROC FREQ.

I do know how to add the missing counts to my value_counts output, but it's annoying to need to do it pretty much every time I use value_counts. If I -- and probably others -- need to do this every time we use value_counts it seems like including nan counts in the results is a reasonable option to add to the method.

@hayd
Contributor

hayd commented Jun 13, 2014

+1 will fix along with #7424.

@jankatins
Contributor

Something similar is in #7217. If NaN is a problem in index, this will also come up in Categoricals: jreback@725a497

@abhishekmamdapure

Why wait for the feature? You can try something like this, which I have been using for a long time:

Name Designation
aaa Data Scientist
bbb Data Scientist
ccc Data Scientist
ddd  
eee ML Engineer
fff ML Engineer
ggg  
hhh Data Analyst
iii  

df['Designation'].astype('str').value_counts()

Data Scientist    3
nan               3
ML Engineer       2
Data Analyst      1
Name: Designation, dtype: int64
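For readers landing here later: current pandas supports this directly via the dropna keyword on value_counts() (the option that came out of the fix referenced at the top of this issue), which avoids the astype('str') detour:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    ["Data Scientist"] * 3 + [np.nan] * 3 + ["ML Engineer"] * 2 + ["Data Analyst"]
)

# dropna=False keeps missing values as a counted category.
print(s.value_counts(dropna=False))
```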
