# Bug in Fancy/Boolean Indexing with nested lists #2702

Closed
opened this issue Jan 15, 2013 · 9 comments · Fixed by #4756

### jim22k commented Jan 15, 2013

Fancy or Boolean indexing on a Series has two strange behaviors. My examples only show the behavior with Fancy indexing, but it's the same for Boolean indexing.

## LHS vs RHS length

```python
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = range(27)
>>> list(s)
[0, 1, 2]
```

I would have expected an error, similar to what I get with slice indexing:

```python
>>> s = pd.Series(list('abc'))
>>> s[0:3] = range(27)
ValueError: cannot copy sequence with size 27 to array axis with dimension 3
```

An even odder behavior occurs when there are too few items on the RHS:

```python
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = range(2)
>>> list(s)
[0, 1, 0]
```

It seems to be recycling the values, something like `itertools.cycle`, which seems very arbitrary to me.
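For what it's worth, the recycling can be traced to plain NumPy fancy-index assignment; on recent NumPy versions the same length mismatch raises instead of cycling (a small sketch, assuming a modern NumPy install):

```python
import numpy as np

a = np.array([10, 20, 30])
try:
    a[[0, 1, 2]] = range(2)  # RHS shorter than the three selected positions
    outcome = "recycled"
except ValueError:
    # modern NumPy refuses the shape mismatch rather than cycling values
    outcome = "ValueError"
```
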

## Nested RHS

This may seem like a strange use of pandas, but I need to store Python lists:

```python
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = [[100,200], [300,400], [500,600]]
>>> list(s)
[100, 200, 300]
```

Very strange; it's as if the input is flattened first. But this flattening only happens when the nested lists are all the same length:

```python
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = [[100,200], [300,400], [500,600, 601, 602]]
>>> list(s)
[[100,200], [300,400], [500,600, 601, 602]]
```

I know the numpy array constructor makes a distinction between these two inputs (equal-length sublists become a 2-D array, ragged ones an object array), so maybe that's the reason for the difference, but I still don't see why the values are being flattened.
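That constructor distinction can be checked directly (a minimal sketch; note that on newer NumPy versions the ragged case may even raise unless `dtype=object` is passed explicitly):

```python
import numpy as np

# Equal-length sublists become a regular 2-D array: the assignment machinery
# then sees six scalar values to place, not three list objects.
regular = np.array([[100, 200], [300, 400], [500, 600]])
shape = regular.shape
```
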

I can work around the issue by converting the RHS to a 1-D object array and passing that in:

```python
>>> s = pd.Series(list('abc'))
>>> rhs = np.empty(3).astype('object')
>>> rhs[:] = [[100,200], [300,400], [500,600]]
>>> s[[0,1,2]] = rhs
>>> list(s)
[[100,200], [300,400], [500,600]]
```
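A more defensive variant of the same workaround fills the object array element by element; on newer NumPy versions the slice assignment of equal-length sublists can itself trip over the flattening, whereas per-element assignment never converts the lists at all (a sketch in plain NumPy, which is where the behavior originates):

```python
import numpy as np

lists = [[100, 200], [300, 400], [500, 600]]
rhs = np.empty(len(lists), dtype=object)  # 1-D container of Python objects
for i, sub in enumerate(lists):
    rhs[i] = sub  # element-wise assignment sidesteps any array conversion
```
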

Slice indexing doesn't have this problem at all:

```python
>>> s = pd.Series(list('abc'))
>>> s[0:3] = [[100,200], [300,400], [500,600]]
>>> list(s)
[[100,200], [300,400], [500,600]]
```

My question: are these behaviors a bug or a "feature"? I think fancy/boolean indexing should operate the same way as slice indexing -- i.e. check for matching lengths and don't auto-convert the RHS to a numpy array.

Member

### wesm commented Jan 20, 2013

Oh boy. Hitting a bunch of buggy/underspecified NumPy stuff here. I'm having a look, but may kick this can down the road.
Member

### wesm commented Jan 20, 2013

This is all NumPy behavior. It's going to be too much work for me to fix anytime soon. I'm already completely fed up with the NumPy library, so I would like to overhaul all of this mess to make it consistent at some point in the future.
Author

### jim22k commented Jan 21, 2013

 You're right. I just validated the same bugs on a plain ndarray. Do you think there is any value in raising this issue on a NumPy forum? Thanks for looking into these corner cases. Pandas just keeps getting better and I find myself using it more and more when dealing with any non-trivial dataset.
Contributor

### jtratner commented Sep 5, 2013

 @jreback is this resolved for pandas now that `Series` isn't an `ndarray` anymore?
Member

### cpcloud commented Sep 5, 2013

 Did I miss something? Series is no longer an NDFrame?
Contributor

### jreback commented Sep 5, 2013

 I will take a look - haven't seen this issue before
Contributor

### jtratner commented Sep 5, 2013

@cpcloud whoops! Miswrote -- I meant no longer an `ndarray`.
Member

### cpcloud commented Sep 5, 2013

@jtratner No worries! Figured it was something like that... just wanted to stay in the loop!
Contributor

### jreback commented Sep 5, 2013

It's easy to make all of these act the same -- it's just an extension in `where`. Right now for `ndim==1` we basically handle a single element and a single list element on the RHS, as well as a boolean indexer that matches the RHS, so this is good (#2745):

```python
In [3]: s = Series([1, 2])

In [4]: s[[True, False]] = [0, 1]

In [5]: s
Out[5]:
0    0
1    2
dtype: int64
```

Anything else is converted to an `ndarray`, so we just need to deal with shorter/longer RHS values and raise a `ValueError`.

https://github.com/pydata/pandas/blob/master/pandas/core/generic.py#L2285
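The length check jreback describes can be sketched in isolation; the function and names below are illustrative, not pandas internals -- the point is just to compare the number of selected positions against the number of RHS values before assigning, and to assign per element so nested lists survive:

```python
import numpy as np

def checked_fancy_setitem(values, indexer, rhs):
    """Hypothetical sketch: fancy/boolean-index assignment with a length check."""
    out = np.array(values, dtype=object)
    idx = np.asarray(indexer)
    # a boolean mask selects wherever it is True; an integer list selects directly
    positions = np.flatnonzero(idx) if idx.dtype == bool else idx
    rhs = list(rhs)
    if len(rhs) != len(positions):
        raise ValueError(
            f"cannot set {len(positions)} positions with {len(rhs)} values"
        )
    for pos, val in zip(positions, rhs):
        out[pos] = val  # per-element assignment also preserves nested lists
    return out
```

With this check in place, `checked_fancy_setitem(['a','b','c'], [0,1,2], range(2))` raises `ValueError` instead of recycling, and a nested-list RHS of matching length is stored intact.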