Is there a bug while slicing in the non monotonic multi indices? #44380

Open
AayushSameerShah opened this issue Nov 10, 2021 · 4 comments
Labels
Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex


AayushSameerShah commented Nov 10, 2021

Please take this question lightly; I am asking out of curiosity.

As I was trying to see how the slicing in MultiIndex works, I came across the following situation ↓

import numpy as np
import pandas as pd

# Simple MultiIndex creation
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])

# Making Series with that MultiIndex
data = pd.Series(np.random.randint(10, size=6), index=index)

Returns:

a  1    5
   2    0
c  1    8
   2    6
b  1    6
   2    3
dtype: int32

Note that the first-level labels are not in sorted order (the order is a, c, b), so slicing is expected to raise an error.

# When we do slicing
data.loc["a":"c"]

This raises:

UnsortedIndexError

----> 1 data.loc["a":"c"]
UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
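The failing slice above can be reproduced in a self-contained way. The following is a minimal sketch, assuming a pandas version in which `UnsortedIndexError` is a subclass of `KeyError`; it uses `np.arange` instead of random values so the data is deterministic, and the variable name `error` is only for illustration.

```python
import numpy as np
import pandas as pd

# Same unsorted first level as in the report: a, c, b
index = pd.MultiIndex.from_product([["a", "c", "b"], [1, 2]])
data = pd.Series(np.arange(6), index=index)

error = None
try:
    # Label slicing on the first level of an unsorted MultiIndex
    data.loc["a":"c"]
except KeyError as exc:  # UnsortedIndexError derives from KeyError
    error = exc

# Show which exception type was actually raised, if any
print(type(error).__name__, error)
```

Catching the `KeyError` base class keeps the sketch independent of where the specific exception class is exposed in a given pandas release.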

That's expected. But now, after doing the following steps:

# Making a DataFrame
data = data.unstack()

# Reindexing - to unsort the indices like before
data = data.reindex(["a", "c", "b"])

# Which looks like 
   1  2
a  5  0
c  8  6
b  6  3

# Then again making series
data = data.stack()

# Reindex Again!
data = data.reindex(["a", "c", "b"], level=0)


# Which looks like before
a  1    5
   2    0
c  1    8
   2    6
b  1    6
   2    3
dtype: int32

The Problem

So, now the process is: Series → Unstack → DataFrame → Stack → Series

Now, if I do the same slicing as before (with the index still unsorted), we don't get any error!

# The same slicing
data.loc["a":"c"]

Results without an error:

a  1    5
   2    0
c  1    8
   2    6
dtype: int32

Even though data.index.is_monotonic is False, why can we still slice?

So the question is: why?

I hope the situation is clear. The same series that raised the error before no longer raises it after the unstack and stack operations.

So is that a bug, or a new concept that I am missing here?

Thanks!

Aayush ∞ Shah

@mroeschke added the Bug, Indexing, and MultiIndex labels on Nov 14, 2021
@taozuoqiao

I don't know whether this is relevant to #44752, which I encountered, but one thing they have in common is that _lexsort_depth (or is_lexsorted) is not set correctly. In your example:

data.index.is_monotonic
# False

but

data.index._lexsort_depth
# 2
data.index.is_lexsorted()
# True
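The comparison above can be wrapped in a small diagnostic helper. This is only a sketch: `index_sort_report` is a hypothetical name, `_lexsort_depth` is a private attribute that may change or disappear between pandas versions (hence the `getattr` fallback), and the values reported for the index from this issue may differ depending on the pandas version in use.

```python
import pandas as pd


def index_sort_report(mi):
    """Collect sortedness indicators of a MultiIndex (hypothetical helper)."""
    return {
        # Public check on the actual label order
        "is_monotonic_increasing": bool(mi.is_monotonic_increasing),
        # Private attribute used by label slicing; None if not available
        "lexsort_depth": getattr(mi, "_lexsort_depth", None),
        "nlevels": mi.nlevels,
    }


# Demonstrated on the freshly built index, before any unstack/stack round trip
index = pd.MultiIndex.from_product([["a", "c", "b"], [1, 2]])
report = index_sort_report(index)
print(report)
```

Running the same helper on the index obtained after the unstack, reindex, stack steps would show whether the two indicators disagree in a given environment.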

Member

phofl commented Dec 10, 2021

This is not related to #44752, but thanks for checking @taozuoqiao

The codes are wrong here.
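As background for readers unfamiliar with the term: a MultiIndex stores, for each level, the unique labels (`levels`) and an integer array of positions into those labels (`codes`). The sketch below only shows how to inspect them on the original index from this issue; it does not reproduce the incorrect codes themselves, and the variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["a", "c", "b"], [1, 2]])

# Unique labels of the first level and the integer codes pointing into them
level0_labels = list(index.levels[0])
level0_codes = list(index.codes[0])

# The displayed first-level labels are levels looked up by codes
rebuilt = [level0_labels[code] for code in level0_codes]

print(level0_labels)
print(level0_codes)
print(rebuilt)
```

Comparing `levels` and `codes` before and after the unstack and stack round trip is one way to see how the internal representation changed.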

Member

phofl commented Dec 10, 2021

This is non-trivial to fix because, unfortunately, unstack relies on the wrong codes.

@taozuoqiao

Thanks for pointing this out. I don't have much knowledge of pandas internals.
