Series.value_counts doesn't respect dropna = False for categorical series #9443

Closed
Kodiologist opened this issue Feb 8, 2015 · 5 comments
Labels
Bug Categorical Categorical Data Type

@Kodiologist
Contributor

Right:

$ python -c 'import pandas; print pandas.Series([1, 2, None, 1, 1, 3, None, 3]).value_counts(dropna = False)'
 1     3
 3     2
NaN    2
 2     1
dtype: int64

Wrong, because there is no row for NaN:

$ python -c 'import pandas; print pandas.Series([1, 2, None, 1, 1, 3, None, 3], dtype = "category").value_counts(dropna = False)'
1    3
3    2
2    1
dtype: int64
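
The comparison above can also be written as a small script. This is only a sketch: it assumes Python 3 and a recent pandas rather than the 2.7 / 0.15.2 setup reported here, and `nan_row_count` is a hypothetical helper introduced for illustration, not part of pandas. On a version with the fix, both counts should agree.

```python
import pandas as pd


def nan_row_count(counts):
    """Return the count stored under a missing-value label, or 0 if there is none.

    Hypothetical helper for this illustration; not a pandas function.
    """
    mask = counts.index.isna()  # True where the index label is NaN/None
    return int(counts[mask].sum())


values = [1, 2, None, 1, 1, 3, None, 3]

# Same data, once as a plain series and once as a categorical series.
plain = pd.Series(values).value_counts(dropna=False)
categorical = pd.Series(values, dtype="category").value_counts(dropna=False)

print("plain NaN count:      ", nan_row_count(plain))
print("categorical NaN count:", nan_row_count(categorical))
```

If the categorical count comes back as 0 while the plain count is 2, the installed version still has the behaviour described in this report.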

pandas.show_versions() yields:

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-30-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.18
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: None
patsy: 0.2.1
dateutil: 2.4.0
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.4.2
sqlalchemy: None
pymysql: None
psycopg2: None

@bitzl

bitzl commented Feb 9, 2015

I can confirm the same issue here (NumPy 1.9.1, SciPy 0.15.1, pandas 0.15.2). It works fine as long as the series is not categorical.

@jreback
Contributor

jreback commented Feb 10, 2015

Looks like a bug. Want to submit a pull request for this? Probably the parameter isn't being passed through (and isn't tested).

@jreback jreback added Bug Categorical Categorical Data Type labels Feb 10, 2015
@jreback jreback added this to the 0.16.0 milestone Feb 10, 2015
@Kodiologist
Contributor Author

Sure, I'll take a shot. I just noticed that dropna is ignored in the opposite sense when np.nan is in the Categorical's categories (that is, a row for NaN is always included even with dropna = True), so I'll try to fix that case too.

One question about another issue with dropna. Currently, with dropna = False, boolean series get a row for NaN even when there are no NaN values in the series, whereas integer series get a row for NaN only if there's at least one:

$ python -c 'import pandas; print pandas.Series([True, False]).value_counts(dropna = False)'
True     1
False    1
NaN      0
dtype: int64

$ python -c 'import pandas; print pandas.Series([1, 2]).value_counts(dropna = False)'
2    1
1    1
dtype: int64

The first case is wrong and the second case is right, right? In which case I guess I should attempt a PR for that, too.
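
The expectation stated above (a NaN row only when the series actually contains a missing value) can be checked with a short script. Again a sketch only, assuming Python 3 and a recent pandas; `has_nan_row` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def has_nan_row(counts):
    """Return True if the value_counts result has a row labelled NaN/None.

    Hypothetical helper for this illustration; not a pandas function.
    """
    return bool(counts.index.isna().any())


# Neither series contains a missing value, so under the expectation
# described above neither result should carry a NaN row.
bool_counts = pd.Series([True, False]).value_counts(dropna=False)
int_counts = pd.Series([1, 2]).value_counts(dropna=False)

print("boolean series has NaN row:", has_nan_row(bool_counts))
print("integer series has NaN row:", has_nan_row(int_counts))
```

A `True` for the boolean series would indicate the zero-count NaN row shown in the first command above.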

@Kodiologist
Contributor Author

See PR #9459.

@jreback
Contributor

jreback commented Mar 8, 2015

closed by #9459

@jreback jreback closed this as completed Mar 8, 2015