`pandas.Categorical.from_codes` incorrectly converts NaN codes to 0.
#21767
Labels: Categorical, Error Reporting, Missing-data
Code Sample, a copy-pastable example if possible
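A minimal sketch of a reproduction, assuming float codes that contain NaN are passed to `pandas.Categorical.from_codes`. The helper name `try_from_codes`, the category labels, and the specific code values are illustrative assumptions, not taken from the report itself. Depending on the pandas version, the call may either return coerced codes (as reported here for 0.23.2) or raise a `ValueError`.

```python
import numpy as np
import pandas as pd


def try_from_codes(codes, categories):
    """Return the resulting integer codes as a list, or the string
    "ValueError" if pandas rejects the input."""
    try:
        cat = pd.Categorical.from_codes(codes, categories=categories)
    except ValueError:
        return "ValueError"
    return cat.codes.tolist()


# Codes containing NaN. These values are assumptions for illustration.
result = try_from_codes(np.array([1, 2, np.nan]), ["a", "b", "c"])
print(pd.__version__, result)
```

If the printed result is a list of integers, the NaN code was silently coerced; if it is `"ValueError"`, the input was rejected.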
Problem description
`pandas.Categorical.from_codes` is incorrectly coercing `NaN` code values into `0` values. I think it should be raising a `ValueError`. I believe this is because `pandas._libs.algos.ensure_int8/16/32/64` is behaving in an unanticipated manner on `numpy` arrays.

The call chain is:

1. `pandas.core.arrays.categorical.Categorical.from_codes`
2. `pandas.core.dtypes.cast.coerce_indexer_dtype`
3. `pandas._libs.algos.ensure_int8/16/32/64` (aliased to `pandas.core.dtypes.common.ensure_int8/16/32/64`)

When given a single value, all the ensure functions behave correctly. But when given a `numpy.array`, they do not.

Expected Output
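As a rough sketch of the expected behaviour, codes could be validated before any integer cast, since casting a float NumPy array that contains NaN to an integer dtype does not raise on its own. The helper `validate_codes` below is hypothetical and is not how pandas implements this; it only illustrates raising a `ValueError` for NaN codes.

```python
import numpy as np


def validate_codes(codes):
    """Hypothetical helper: return codes as an integer array,
    or raise ValueError if they cannot be valid integer codes."""
    arr = np.asarray(codes)
    if arr.dtype.kind in "iu":
        # Already an integer array; nothing to check.
        return arr
    if arr.dtype.kind == "f":
        # Reject NaN and inf before casting, because the cast itself
        # would not raise.
        if not np.isfinite(arr).all():
            raise ValueError("codes must be finite (no NaN or inf)")
        if not np.all(arr == np.floor(arr)):
            raise ValueError("codes must be whole numbers")
        return arr.astype(np.int64)
    raise ValueError("codes must be numeric")


print(validate_codes([1.0, 2.0, 0.0]))

try:
    validate_codes([1, 2, np.nan])
except ValueError as exc:
    print("rejected:", exc)
```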
`test_categorical_from_codes` should pass. Instead, the resulting `Categorical` contains the codes `[1, 2, 0]`.

Output of `pd.show_versions()`

INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.2
pytest: 3.6.3
pip: 10.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: None
sqlalchemy: 1.2.6
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None