BUG: ordered categorical comparison with missing values evaluates to True #26504

Closed
mzwiessele opened this issue May 23, 2019 · 3 comments

Comments

@mzwiessele

commented May 23, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.Categorical(["1", "2", "3", None], categories=["1", "2", "3"], ordered=True) <= "2"
# => array([ True,  True, False,  True])

Problem description

Here a missing entry is being evaluated as None <= "2" == True. Shouldn't missing values always evaluate to False in any comparison?
I think this is related to #4537

Expected Output

import pandas as pd
pd.Categorical(["1", "2", "3", None], categories=["1", "2", "3"], ordered=True) <= "2"
# => array([ True,  True, False, False])

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.2
pytest: 3.6.2
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.5
lxml.etree: 4.2.2
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.8
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger

Contributor

commented May 23, 2019

Yep, should probably be False. It looks like we correctly mask for categorical : categorical comparisons, but not categorical : scalar.

In [16]: c
Out[16]:
[1, 2, 3, NaN]
Categories (3, object): [1 < 2 < 3]

In [17]: c <= c
Out[17]: array([ True,  True,  True, False])

The fix will be somewhere in the "if is_scalar(other):" branch, similar to what we do above.
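For illustration only, the masking idea described here can be sketched on a plain NumPy array of codes. This is not the pandas source: the helper name compare_codes_le_scalar is hypothetical, and the only assumption carried over from this thread is that missing values are stored as code -1.

```python
import numpy as np


def compare_codes_le_scalar(codes, scalar_code):
    """Hypothetical helper (not pandas internals).

    Computes `codes <= scalar_code` and then forces every position
    holding the missing-value sentinel (-1) to False.
    """
    codes = np.asarray(codes)
    result = codes <= scalar_code   # raw comparison; -1 would compare as True
    result[codes == -1] = False     # mask missing entries
    return result


# Codes for ["1", "2", "3", None] with categories ["1", "2", "3"].
codes = np.array([0, 1, 2, -1], dtype=np.int8)

# The scalar "2" has code 1 in those categories.
masked = compare_codes_le_scalar(codes, 1)
print(masked)
```

The last entry comes out False here, which matches the expected output in the report.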

@shreyateeza


commented May 24, 2019

@TomAugspurger I am new here. I would like to contribute to this issue. Can you tell me what needs to be done?

@shantanu-gontia

Contributor

commented May 24, 2019

The Categorical object has the _codes property, which is used for comparison.

In [1]: import pandas as pd

In [2]: a = pd.Categorical([1, 2, 3, 4], categories=[1, 2, 3], ordered=True)

In [3]: a._codes
Out[3]: array([ 0,  1,  2, -1], dtype=int8)

In [4]: a._codes < 2
Out[4]: array([ True,  True, False,  True])

_codes stores missing values (NaN) as -1, so a raw comparison on the codes treats them as smaller than every category, which leads to the incorrect result.


Perhaps the fix lies in correct construction of the Categorical object and not in the comparison itself.
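As a possible user-side workaround on affected versions, the comparison result can be combined with the missing-value mask using only public API (Categorical.codes and Categorical.isna()). This is a sketch of a workaround, not the fix itself; the variable names are illustrative.

```python
import pandas as pd

cat = pd.Categorical(
    ["1", "2", "3", None], categories=["1", "2", "3"], ordered=True
)

# Public view of the integer codes; the missing entry is stored as -1.
codes = cat.codes

# Comparison against a scalar category. On affected versions the
# missing entry compares as True here.
raw = cat <= "2"

# Force missing entries to False regardless of the pandas version.
safe = raw & ~cat.isna()
print(safe)
```

Because the mask is applied explicitly, the result does not depend on whether the installed pandas already masks missing values in scalar comparisons.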

yanglinlee added a commit to yanglinlee/pandas that referenced this issue May 24, 2019

yanglinlee added a commit to yanglinlee/pandas that referenced this issue May 24, 2019

yanglinlee added a commit to yanglinlee/pandas that referenced this issue May 24, 2019

yanglinlee added a commit to yanglinlee/pandas that referenced this issue May 25, 2019

@jorisvandenbossche jorisvandenbossche changed the title BUG: None comparison evaluates to True BUG: ordered categorical comparison with missing values evaluates to True May 28, 2019

yanglinlee added a commit to yanglinlee/pandas that referenced this issue May 29, 2019

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 May 30, 2019

jreback added a commit that referenced this issue Jun 1, 2019

vaibhavhrt added a commit to vaibhavhrt/pandas that referenced this issue Jun 6, 2019
