Regression: None in string input is wrongly cast to the "None" string #21083

Closed
pitrou opened this Issue May 16, 2018 · 4 comments

Contributor

pitrou commented May 16, 2018

In Pandas 0.22.0:

>>> df = pd.DataFrame({'data': ['x', None]}, dtype=str)
>>> df['data'].tolist()
['x', None]

In Pandas 0.23.0:

>>> df = pd.DataFrame({'data': ['x', None]}, dtype=str)
>>> df['data'].tolist()
['x', 'None']
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.23.0
pytest: 3.3.2
pip: 10.0.1
setuptools: 38.4.0
Cython: 0.28.2
numpy: 1.14.2
scipy: None
pyarrow: 0.9.1.dev14+g6599ab0.d20180323
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

pitrou added a commit to pitrou/arrow that referenced this issue May 16, 2018

ARROW-2589: [CI] Avoid Pandas 0.23.0
There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py.
Pin to 0.22.0 until the issue gets fixed upstream.

(*) pandas-dev/pandas#21083

@TomAugspurger TomAugspurger added this to the 0.23.1 milestone May 16, 2018

Contributor

TomAugspurger commented May 16, 2018

This sounds vaguely familiar... Interestingly, these two are different.

In [6]: pd.Series(['a', None]).values.tolist()
Out[6]: ['a', None]

In [7]: pd.Series(['a', None], dtype='str').values.tolist()
Out[7]: ['a', 'None']

Contributor

pitrou commented May 16, 2018

Hmm, in 0.23.0 the "str" dtype blindly converts every element to its string representation:

>>> pd.Series(['a', None, 0.4j, object()], dtype='str').values.tolist()
['a', 'None', '0.4j', '<object object at 0x7fc22cf8e6d0>']

On Pandas 0.22.0, however, the "str" dtype specification seems completely ignored:

>>> pd.Series(['a', None, 0.4j, object()], dtype='str').values.tolist()
['a', None, 0.4j, <object object at 0x7fe5ddcd1e90>]
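
To make the difference between the two outputs concrete, here is a plain-Python sketch of the two behaviours. It does not use pandas and is not a description of how pandas implements the cast; both function names are made up for illustration.

```python
def cast_blindly(values):
    """Apply str() to every element, including None (what 0.23.0 appears to do)."""
    return [str(v) for v in values]


def cast_preserving_none(values):
    """Apply str() to every element except None, which is kept as a missing value."""
    return [v if v is None else str(v) for v in values]


values = ['a', None, 0.4j]
print(cast_blindly(values))          # ['a', 'None', '0.4j']
print(cast_preserving_none(values))  # ['a', None, '0.4j']
```

The second function is only one possible reading of the expected result; the 0.22.0 output above leaves non-string values such as 0.4j untouched as well.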

Contributor

TomAugspurger commented May 16, 2018

I think we're hitting the else branch of this check, when we want to take the if branch:

if is_object_dtype(dtype) and (is_list_like(subarr) and

(Pdb) dtype
dtype('<U')
(Pdb) is_object_type(dtype)
*** NameError: name 'is_object_type' is not defined
(Pdb) dtype
dtype('<U')
(Pdb) is_object_dtype(dtype)

pitrou added a commit to pitrou/arrow that referenced this issue May 16, 2018

ARROW-2589: [Python] Workaround regression in Pandas 0.23.0
There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py.
Pandas does not have an actual "str" dtype anyway, so pass "object" instead.

(*) pandas-dev/pandas#21083

xhochy added a commit to apache/arrow that referenced this issue May 16, 2018

ARROW-2589: [Python] Workaround regression in Pandas 0.23.0
There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py.
Pandas does not have an actual "str" dtype anyway, so pass "object" instead.

(*) pandas-dev/pandas#21083

Author: Antoine Pitrou <antoine@python.org>

Closes #2051 from pitrou/ARROW-2589 and squashes the following commits:

b581ef3 <Antoine Pitrou> ARROW-2589:  Workaround regression in Pandas 0.23.0
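
For callers who want the same workaround in their own code, a minimal sketch follows. It assumes the caller controls the dtype argument before it reaches pandas; the helper name object_if_str is hypothetical and is not part of pandas or Arrow.

```python
def object_if_str(dtype):
    """Hypothetical helper: replace a "str" dtype request with "object".

    Any other dtype specification is returned unchanged.
    """
    if dtype is str or dtype == "str":
        return object
    return dtype


print(object_if_str(str))      # <class 'object'>
print(object_if_str("int64"))  # int64
```

With pandas installed, the result would be passed straight to the constructor, for example pd.DataFrame({'data': ['x', None]}, dtype=object_if_str(str)). Since no string cast is requested, None should stay None in the resulting column.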

@jreback jreback modified the milestones: 0.23.1, 0.23.2 Jun 7, 2018

Contributor

TomAugspurger commented Jun 7, 2018

This is a blocker for 0.23.1. Taking a look now.

@TomAugspurger TomAugspurger modified the milestones: 0.23.2, 0.23.1 Jun 7, 2018

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 7, 2018

REGR: NA-values in ctors with string dtype
```python
In [1]: import pandas as pd
In [2]: pd.Series([1, 2, None], dtype='str')[2]  # None

```

Closes pandas-dev#21083