Series.value_counts doesn't respect dropna = False for categorical series #9443

Kodiologist · 2015-02-08T14:50:01Z

Right:

$ python -c 'import pandas; print pandas.Series([1, 2, None, 1, 1, 3, None, 3]).value_counts(dropna = False)'
 1     3
 3     2
NaN    2
 2     1
dtype: int64

Wrong, because there is no row for NaN:

$ python -c 'import pandas; print pandas.Series([1, 2, None, 1, 1, 3, None, 3], dtype = "category").value_counts(dropna = False)'
1    3
3    2
2    1
dtype: int64
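
For anyone who wants to check this without the shell one-liners, the comparison above can be sketched as a small script. This is only an illustration, not part of the original report: the helper `nan_count` is a hypothetical name introduced here, and the script assumes a pandas version recent enough to provide `Index.isna`. On a release that includes the fix from #9459, both counts should agree; on 0.15.2 the categorical case is the one that loses the NaN row.

```python
import pandas as pd

values = [1, 2, None, 1, 1, 3, None, 3]

# Same data, once as a plain Series and once as a categorical Series.
plain = pd.Series(values).value_counts(dropna=False)
categorical = pd.Series(values, dtype="category").value_counts(dropna=False)


def nan_count(counts):
    """Return the count stored under the NaN index label, or 0 if absent.

    Hypothetical helper for this sketch; not a pandas API.
    """
    mask = counts.index.isna()
    return int(counts[mask].sum())


print("plain NaN count:      ", nan_count(plain))
print("categorical NaN count:", nan_count(categorical))
```

If the two printed numbers differ, the categorical path is dropping (or inventing) the NaN row.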

pandas.show_versions() yields:

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-30-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.18
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: None
patsy: 0.2.1
dateutil: 2.4.0
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.4.2
sqlalchemy: None
pymysql: None
psycopg2: None

bitzl · 2015-02-09T14:40:07Z

I can confirm the same issue here (NumPy 1.9.1, SciPy 0.15.1, pandas 0.15.2). It works fine as long as the series is not categorical.

jreback · 2015-02-10T14:28:21Z

Looks like a bug. Want to submit a pull request for this? It's probably not passing the parameter through (and it's not tested).

Kodiologist · 2015-02-10T15:32:35Z

Sure, I'll take a shot. I just noticed that dropna is ignored in the opposite sense when np.nan is in the Categorical's categories (that is, a row for NaN is always included even with dropna = True), so I'll try to fix that case too.

One question about another issue with dropna. Currently, with dropna = False, boolean series get a row for NaN even when there are no NaN values in the series, whereas integer series get a row for NaN only if there's at least one:

$ python -c 'import pandas; print pandas.Series([True, False]).value_counts(dropna = False)'
True     1
False    1
NaN      0
dtype: int64

$ python -c 'import pandas; print pandas.Series([1, 2]).value_counts(dropna = False)'
2    1
1    1
dtype: int64

The first case is wrong and the second case is right, right? In which case I guess I should attempt a PR for that, too.
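
A quick way to check the behaviour asked about here is to test whether the result of `value_counts(dropna=False)` has any NaN label in its index. The sketch below is illustrative only: `has_nan_row` is a hypothetical helper, not a pandas function, and it assumes a pandas version that provides `Index.isna`. Under the reading proposed above, a NaN row should appear only when the series actually contains a missing value.

```python
import pandas as pd


def has_nan_row(series):
    """Report whether value_counts(dropna=False) includes a NaN label.

    Hypothetical helper for this sketch; not a pandas API.
    """
    counts = series.value_counts(dropna=False)
    return bool(counts.index.isna().any())


# No missing values: under the proposed reading, no NaN row is expected.
print("bool, no NaN:", has_nan_row(pd.Series([True, False])))
print("int, no NaN: ", has_nan_row(pd.Series([1, 2])))

# One missing value: a NaN row is expected.
print("with NaN:    ", has_nan_row(pd.Series([1, None])))
```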

Kodiologist · 2015-02-10T21:11:07Z

See PR #9459.

jreback · 2015-03-08T16:13:04Z

closed by #9459

jreback added the Bug and Categorical labels Feb 10, 2015

jreback added this to the 0.16.0 milestone Feb 10, 2015

Kodiologist mentioned this issue Feb 10, 2015

BUG: improve handling of Series.value_counts's argument 'dropna' (GH9443) #9459

Closed

jreback modified the milestones: 0.16.0, Next Major Release Mar 5, 2015

jreback closed this as completed Mar 8, 2015
