BUG: Regression in 1.1.0. "invalid slicing for a 1-ndim ExtensionArray" #37631

buhrmann · 2020-11-04T15:09:52Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pickle
import pandas as pd

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]  # one-row categorical Series
s2 = pickle.loads(pickle.dumps(s1))                       # copy via a pickle round-trip
print(s1.dropna())
print(s2.dropna())

0    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-1-c7b1204ccdb7> in <module>
      6 s2 = pickle.loads(pickle.dumps(s1))
      7 print(s1.dropna())
----> 8 print(s2.dropna())

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in dropna(self, axis, inplace, how)
   4883 
   4884         if self._can_hold_na:
-> 4885             result = remove_na_arraylike(self)
   4886             if inplace:
   4887                 self._update_inplace(result)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in remove_na_arraylike(arr)
    564     """
    565     if is_extension_array_dtype(arr):
--> 566         return arr[notna(arr)]
    567     else:
    568         return arr[notna(np.asarray(arr))]

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    904             key = check_bool_indexer(self.index, key)
    905             key = np.asarray(key, dtype=bool)
--> 906             return self._get_values(key)
    907 
    908         return self._get_with(key)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in _get_values(self, indexer)
    966     def _get_values(self, indexer):
    967         try:
--> 968             return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self,)
    969         except ValueError:
    970             # mpl compat if we look up e.g. ser[:, np.newaxis];

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
   1559 
   1560         blk = self._block
-> 1561         array = blk._slice(slobj)
   1562         block = blk.make_block_same_class(array, placement=slice(0, len(array)))
   1563         return type(self)(block, self.index[slobj])

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
   1756             if not isinstance(first, slice):
   1757                 raise AssertionError(
-> 1758                     "invalid slicing for a 1-ndim ExtensionArray", first
   1759                 )
   1760             # GH#32959 only full-slicers along fake-dim0 are valid

AssertionError: ('invalid slicing for a 1-ndim ExtensionArray', array([ True]))
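In the 1.1.x code shown in the traceback, Series.dropna() on an extension dtype delegates to remove_na_arraylike, which indexes the Series with the boolean mask notna(arr). The failing step is therefore ordinary boolean-mask indexing. A minimal sketch of that equivalence, using only public API (pd.notna and boolean indexing) on a Series that has not been pickled:

```python
import pandas as pd

# A one-row categorical Series, as in the report (not pickled here).
s = pd.Series(list("abc")).astype("category").iloc[[0]]

# dropna() boils down to indexing with a boolean mask, mirroring
# `arr[notna(arr)]` in remove_na_arraylike from the traceback above.
mask = pd.notna(s)
via_mask = s[mask]
via_dropna = s.dropna()

# Both routes should give the same one-row categorical Series.
pd.testing.assert_series_equal(via_mask, via_dropna)
print(via_mask)
```

On the unpickled copy s2 from the report, the s2[mask] step is what raises the AssertionError.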

Problem description

I'm not sure what changes in the pickle serialization round-trip, but the unpickled copy can no longer be indexed with a boolean mask, which in turn breaks dropna(). The following code exposes the error more directly:

import pickle
import pandas as pd

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
s2[[True]]  # boolean-mask indexing raises the same AssertionError

The error occurs, for example, when multiple Series are processed in parallel (which serializes them with pickle) and a categorical Series has been filtered down to a single row. With another dtype, or with more than one row, the error is not triggered.

The regression must have been introduced in version 1.1.0, as the above code works as expected in 1.0.5.

Expected Output

The behaviour of slicing, dropna(), etc. should be the same before and after pickling a Series, and independent of the number of rows.
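The expected behaviour can be phrased as a small regression check. This is a minimal sketch, assuming only public pandas API plus the standard pickle module; pickle_roundtrip and check_roundtrip_invariance are hypothetical helper names. It compares boolean-mask indexing and dropna() before and after a pickle round-trip for several dtypes and row counts. It should pass on a pandas version that contains the fix from #37657; per the report, the one-row categorical case fails on 1.1.4.

```python
import pickle

import pandas as pd


def pickle_roundtrip(obj):
    """Serialize and deserialize an object, as process-based parallelism does."""
    return pickle.loads(pickle.dumps(obj))


def check_roundtrip_invariance(s: pd.Series) -> None:
    """Assert that mask indexing and dropna() agree before and after pickling."""
    restored = pickle_roundtrip(s)
    mask = [True] * len(s)  # an all-True boolean mask of matching length
    pd.testing.assert_series_equal(s[mask], restored[mask])
    pd.testing.assert_series_equal(s.dropna(), restored.dropna())


base = pd.Series(list("abc"))
for dtype in ["object", "category"]:
    for n_rows in [1, 2, 3]:
        # iloc with a list (not a scalar) keeps a Series even for one row.
        check_roundtrip_invariance(base.astype(dtype).iloc[list(range(n_rows))])
print("ok")
```

Covering both dtypes and several row counts matters because, as described above, only the combination of a categorical dtype and a single row triggered the error.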

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 67a3d42
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.1.4
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 46.1.1.post20200322
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.46.0


jorisvandenbossche · 2020-11-04T19:39:37Z

@buhrmann Thanks for the report! I can confirm this is a regression in 1.1 (and on master) compared to 1.0.

jorisvandenbossche · 2020-11-04T19:47:00Z

There is something wrong with the Block layout of the Series that went through the pickle roundtrip:

In [56]: s1._mgr
Out[56]: 
SingleBlockManager
Items: Int64Index([0], dtype='int64')
CategoricalBlock: 1 dtype: category

In [57]: s2._mgr
Out[57]: 
SingleBlockManager
Items: Int64Index([0], dtype='int64')
CategoricalBlock: slice(0, 1, 1), 1 x 1, dtype: category

In [58]: s1._mgr._block.ndim
Out[58]: 1

In [59]: s2._mgr._block.ndim
Out[59]: 2

The categorical block should not be 2D here.
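The inspection in the IPython session above can be scripted as a quick diagnostic. A minimal sketch, assuming the private internals Series._mgr and its .blocks attribute (not public API, and liable to change between versions); block_ndims is a hypothetical helper name. A Series is one-dimensional, so every block backing it should report ndim 1. On a version with the fix both lines should print [1], whereas on 1.1.4 the session above shows ndim 2 for the unpickled Series.

```python
import pickle

import pandas as pd


def block_ndims(s: pd.Series) -> list:
    """Return the ndim of every Block backing a Series.

    NOTE: `_mgr` and `.blocks` are private pandas internals, used here only
    for diagnosis.
    """
    return [blk.ndim for blk in s._mgr.blocks]


s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))

# Each block of a 1-D Series should itself be 1-D, before and after pickling.
for name, s in [("original", s1), ("unpickled", s2)]:
    print(name, block_ndims(s))
```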

jorisvandenbossche · 2020-11-04T19:47:31Z

cc @jbrockmendel

simonjayhawkins · 2020-11-04T20:09:41Z

I can confirm this is a regression in 1.1 (and on master) compared to 1.0.

The assertion was added in #32959. I could confirm with a bisect if needed.

jbrockmendel · 2020-11-04T22:30:14Z

It looks like the incorrect block.ndim is also present in 1.0.5.

jorisvandenbossche · 2020-11-05T13:21:49Z

Indeed, but back then we didn't have the AssertionError checking it, so things (here) still worked.

buhrmann added the Bug and Needs Triage labels Nov 4, 2020

jorisvandenbossche added the Categorical and Regression labels and removed the Bug and Needs Triage labels Nov 4, 2020

jorisvandenbossche added this to the 1.1.5 milestone Nov 4, 2020

jbrockmendel mentioned this issue Nov 6, 2020

BUG: unpickling modifies Block.ndim #37657 (merged)

jreback closed this as completed in #37657 Nov 9, 2020
