BUG: Regression in 1.1.0. "invalid slicing for a 1-ndim ExtensionArray" #37631

Closed
2 of 3 tasks
buhrmann opened this issue Nov 4, 2020 · 6 comments · Fixed by #37657
Labels
Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version

@buhrmann

buhrmann commented Nov 4, 2020

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pickle
import pandas as pd

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
print(s1.dropna())
print(s2.dropna())
0    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-1-c7b1204ccdb7> in <module>
      6 s2 = pickle.loads(pickle.dumps(s1))
      7 print(s1.dropna())
----> 8 print(s2.dropna())

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in dropna(self, axis, inplace, how)
   4883 
   4884         if self._can_hold_na:
-> 4885             result = remove_na_arraylike(self)
   4886             if inplace:
   4887                 self._update_inplace(result)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in remove_na_arraylike(arr)
    564     """
    565     if is_extension_array_dtype(arr):
--> 566         return arr[notna(arr)]
    567     else:
    568         return arr[notna(np.asarray(arr))]

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    904             key = check_bool_indexer(self.index, key)
    905             key = np.asarray(key, dtype=bool)
--> 906             return self._get_values(key)
    907 
    908         return self._get_with(key)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in _get_values(self, indexer)
    966     def _get_values(self, indexer):
    967         try:
--> 968             return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self,)
    969         except ValueError:
    970             # mpl compat if we look up e.g. ser[:, np.newaxis];

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
   1559 
   1560         blk = self._block
-> 1561         array = blk._slice(slobj)
   1562         block = blk.make_block_same_class(array, placement=slice(0, len(array)))
   1563         return type(self)(block, self.index[slobj])

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
   1756             if not isinstance(first, slice):
   1757                 raise AssertionError(
-> 1758                     "invalid slicing for a 1-ndim ExtensionArray", first
   1759                 )
   1760             # GH#32959 only full-slicers along fake-dim0 are valid

AssertionError: ('invalid slicing for a 1-ndim ExtensionArray', array([ True]))

Problem description

I'm not sure what changes during the pickle round trip, but the unpickled Series can no longer be indexed with a boolean mask, which in turn breaks dropna(). The following code exposes the error more directly:

import pickle
import pandas as pd

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
s2[[True]]  # raises the AssertionError shown above

The error occurs, for example, when multiple Series are processed in parallel (which triggers serialization with pickle) and a categorical Series has been filtered down to a single row. With another dtype, or with more than one row, the error is not triggered.
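
The conditions described above (dtype and row count) can be probed with a small matrix of cases. This is only a sketch and not part of the original report: the helper name `mask_survives_pickle` and the case labels are hypothetical. It reports whether boolean-mask indexing raises after a pickle round trip on whichever pandas version is installed.

```python
import pickle

import pandas as pd


def mask_survives_pickle(ser):
    """Return True if an all-True boolean mask still works after pickling."""
    roundtripped = pickle.loads(pickle.dumps(ser))
    mask = [True] * len(roundtripped)
    try:
        roundtripped[mask]
    except AssertionError:
        # The failure mode reported in this issue.
        return False
    return True


base = pd.Series(list("abc"))
cases = {
    "object, 1 row": base.iloc[[0]],
    "category, 3 rows": base.astype("category"),
    "category, 1 row": base.astype("category").iloc[[0]],
}

results = {label: mask_survives_pickle(ser) for label, ser in cases.items()}
for label, ok in results.items():
    print(f"{label}: {'ok' if ok else 'AssertionError'}")
```

According to the report, only the single-row categorical case should fail on affected 1.1.x releases; the other cases are expected to pass.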

The regression must have been introduced in version 1.1.0, since the above code works as expected in 1.0.5.

Expected Output

The behaviour of slicing, dropna(), etc. should be the same before and after pickling a Series, and independent of the number of rows.
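
The expectation above could be written as a regression check along these lines. This is a sketch under assumptions, not the test that was added to pandas: the function name `check_dropna_roundtrip` is hypothetical, and it covers only the categorical case from this report.

```python
import pickle

import pandas as pd
import pandas.testing as tm


def check_dropna_roundtrip(n_rows):
    """Assert dropna() gives the same result before and after pickling."""
    # Select the first n_rows with a list indexer, as in the report.
    positions = list(range(n_rows))
    original = pd.Series(list("abc")).astype("category").iloc[positions]
    restored = pickle.loads(pickle.dumps(original))
    tm.assert_series_equal(original.dropna(), restored.dropna())


# The result should not depend on the number of rows.
for n_rows in (1, 2, 3):
    check_dropna_roundtrip(n_rows)
```

On an affected release, the `n_rows=1` call would be expected to fail with the AssertionError from the traceback rather than with a comparison failure.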

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.1.4
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 46.1.1.post20200322
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.46.0

@buhrmann buhrmann added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 4, 2020
@jorisvandenbossche jorisvandenbossche added Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 4, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1.5 milestone Nov 4, 2020
@jorisvandenbossche
Member

@buhrmann Thanks for the report! I can confirm this is a regression in 1.1 (and on master) compared to 1.0.

@jorisvandenbossche
Member

There is something wrong with the Block layout of the Series that went through the pickle round trip:

In [56]: s1._mgr
Out[56]: 
SingleBlockManager
Items: Int64Index([0], dtype='int64')
CategoricalBlock: 1 dtype: category

In [57]: s2._mgr
Out[57]: 
SingleBlockManager
Items: Int64Index([0], dtype='int64')
CategoricalBlock: slice(0, 1, 1), 1 x 1, dtype: category

In [58]: s1._mgr._block.ndim
Out[58]: 1

In [59]: s2._mgr._block.ndim
Out[59]: 2

The categorical block should not be 2D here.
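
The comparison above can be wrapped in a small helper for checking other Series. This is a hedged sketch: `_mgr` and `_block` are private pandas internals taken from the session above and may change or disappear between versions, so the hypothetical helper `block_ndim` returns None instead of raising when they are missing.

```python
import pickle

import pandas as pd


def block_ndim(ser):
    """Return the ndim of a Series' single block, or None if unavailable."""
    # _mgr and _block are private attributes; guard every lookup.
    mgr = getattr(ser, "_mgr", None)
    block = getattr(mgr, "_block", None)
    return getattr(block, "ndim", None)


s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))

# A Series block is expected to be 1-dimensional both before and after
# the round trip; a mismatch indicates the layout problem shown above.
print("before pickle:", block_ndim(s1))
print("after pickle: ", block_ndim(s2))
```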

@jorisvandenbossche
Member

cc @jbrockmendel

@simonjayhawkins
Member

simonjayhawkins commented Nov 4, 2020

I can confirm this is a regression in 1.1 (and on master) compared to 1.0.

The assertion was added in #32959. I could confirm with a bisect if needed.

@jbrockmendel
Member

It looks like the incorrect block.ndim is also present in 1.0.5

@jorisvandenbossche
Member

Indeed, but back then we didn't have the AssertionError checking it, so things were still working (at least in this case).
