BUG: "S" dtype.kind not supported by pandas.core.internals.concat._dtype_to_na_value. #53525

garciampred · 2023-06-05T12:23:38Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

# Column "a" is a NumPy byte-string array (dtype "|S5", dtype.kind "S").
a = np.concatenate([np.repeat([b"Hello"], 200), np.repeat([b"bye"], 200)])

df = pd.DataFrame(
    dict(a=a, b=np.repeat([2.3223], 400), c=np.repeat([np.nan], 400)),
    index=range(400),
    copy=False,
)
df.copy()

Issue Description

The issue looks simple. The "S" dtype.kind, which corresponds to byte-string (character) arrays, is not handled in pandas.core.internals.concat._dtype_to_na_value, so the function raises a NotImplementedError.
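To illustrate the kind of gap described here, the following is a minimal sketch of a lookup keyed on dtype.kind. It is a simplified stand-in and not the pandas source: the function names `na_value_for_kind` and `is_supported` are hypothetical, and the NA values chosen for each branch are assumptions made only for illustration. The point is that any kind code without its own branch, such as "S", falls through to the final raise.

```python
import math


# Simplified stand-in for a kind-based NA lookup; NOT the pandas implementation.
# Kind codes follow numpy.dtype.kind: "f" float, "c" complex, "i"/"u" integer,
# "b" boolean, "m"/"M" timedelta/datetime, "O" object, "S" bytes, "U" str.
def na_value_for_kind(kind: str):
    if kind in ("f", "c", "O"):
        return math.nan
    if kind in ("m", "M"):
        return "NaT"  # placeholder for a not-a-time marker (assumption)
    if kind in ("i", "u", "b"):
        return None
    # Any kind not listed above, including "S", ends up here.
    raise NotImplementedError(f"no NA value defined for dtype kind {kind!r}")


def is_supported(kind: str) -> bool:
    """Return True if the lookup has a branch for this kind code."""
    try:
        na_value_for_kind(kind)
    except NotImplementedError:
        return False
    return True


print(is_supported("f"))  # True
print(is_supported("S"))  # False
```

In a lookup shaped like this, supporting "S" would mean adding one more branch; which NA value that branch should return is exactly the open question for the maintainers.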

What I found very hard was writing the MCVE; I gave up after more than an hour. I don't know how to make the code take that path. I built a dataframe with a "|S5" dtype column long enough to require truncation when printed, but that is not enough. So please note that the MCVE above does not actually reproduce the error.
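As a quick sanity check on the "|S5" column mentioned here, the array's dtype can be inspected directly. This is a small sketch using only NumPy; it confirms the dtype of the input array and makes no claim about what pandas does with it once it is inside a DataFrame.

```python
import numpy as np

# Same byte-string column as in the reproducible example above.
a = np.concatenate([np.repeat([b"Hello"], 200), np.repeat([b"bye"], 200)])

print(a.dtype.str)   # |S5
print(a.dtype.kind)  # S
```

For an existing DataFrame, the same `kind` attribute is available on each entry of `df.dtypes`, which is one way to find out which columns of real data carry this dtype.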

I can reliably reproduce it with my data, even after saving it to HDF5 and reading it back, but it does not seem appropriate to upload it here.

Fixing this looks very easy, but I wonder if there was a reason for leaving "S" outside that function.

Also, note that I was not able to install the version from the main branch (my CPU got stuck at 100% usage during "Preparing metadata (pyproject.toml)"), but I checked and the function is unchanged, so I think the bug is there too.

Regards

Expected Behavior

Print the data frame normally, without raising errors.

Installed Versions

INSTALLED VERSIONS

commit : 965ceca
python : 3.10.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-73-generic
Version : #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.2
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.1.2
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : 2.0.12
tables : 3.8.0
tabulate : None
xarray : 2023.4.2
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

garciampred · 2023-06-20T08:20:31Z

Could someone please look at this? I can write a PR to handle this dtype in .internals.con_dtype_ but I need someone to confirm that this makes sense. Thanks.

garciampred added the Bug and Needs Triage labels on Jun 5, 2023
