ENH: support for removing unused levels of a MultiIndex (internally) #15700

Closed
wants to merge 3 commits into from

jreback (Contributor) commented Mar 16, 2017

on top of #15694

xref #2770

Here's an example of what we could do with this

In [1]: df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=pd.MultiIndex(
   ...:              levels=[['a', 'b'], ['bb', 'aa']],
   ...:              labels=[[0, 0, 1, 1], [0, 1, 0, 1]]))

In [2]: df
Out[2]: 
      value
a bb      1
  aa      2
b bb      3
  aa      4

In [14]: df.index.is_lexsorted()
Out[14]: True

In [15]: df.index.is_monotonic
Out[15]: False

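Sorting makes the index monotonic and usually lexsorted, but not always. Here the second level is still stored as ['bb', 'aa'], so after sorting by value the integer labels are no longer in order: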

In [3]: df2 = df.sort_index()

In [4]: df2
Out[4]: 
      value
a aa      2
  bb      1
b aa      4
  bb      3

In [12]: df2.index.is_lexsorted()
Out[12]: False

In [13]: df2.index.is_monotonic
Out[13]: True
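
The difference between the two flags can be modelled in a few lines of plain Python. This is only a sketch of the idea, not pandas' implementation: the helper names `is_lexsorted` and `is_monotonic` below are hypothetical stand-ins that mimic the index attributes, and the `labels` lists are written out by hand from the two frames above.

```python
# Toy model of a MultiIndex: `levels` holds the distinct values per level,
# `labels` holds integer positions into those levels (one list per level).
levels = [['a', 'b'], ['bb', 'aa']]


def is_lexsorted(labels):
    """True if the rows of integer labels are in non-decreasing order."""
    rows = list(zip(*labels))
    return all(x <= y for x, y in zip(rows, rows[1:]))


def is_monotonic(levels, labels):
    """True if the values looked up through the levels are non-decreasing."""
    rows = [tuple(lev[i] for lev, i in zip(levels, row))
            for row in zip(*labels)]
    return all(x <= y for x, y in zip(rows, rows[1:]))


# df: the labels are ordered, the values are not ('bb' precedes 'aa').
labels_df = [[0, 0, 1, 1], [0, 1, 0, 1]]
# df2: rows reordered by value, but the levels themselves are untouched.
labels_df2 = [[0, 0, 1, 1], [1, 0, 1, 0]]

print(is_lexsorted(labels_df), is_monotonic(levels, labels_df))    # True False
print(is_lexsorted(labels_df2), is_monotonic(levels, labels_df2))  # False True
```

The two checks disagree whenever a level's stored order differs from its sorted order, which is the case the rest of this comment addresses.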

We could expose a method such as .remove_unused_labels() (or even just do this under the hood on certain operations):

In [5]: df3 = df2.copy()

In [6]: df3.index._reconstruct(sort=True)
Out[6]: 
MultiIndex(levels=[['a', 'b'], ['aa', 'bb']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

In [7]: df3.index = df3.index._reconstruct(sort=True)

In [8]: df3
Out[8]: 
      value
a aa      2
  bb      1
b aa      4
  bb      3

In [9]: df3.index.is_lexsorted()
Out[9]: True

In [11]: df3.index.is_monotonic
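
As a rough illustration of what the reconstruction does, here is a pure-Python sketch operating on the same levels/labels representation. The function `reconstruct` is a hypothetical stand-in, not the code in this PR, and it assumes there are no missing-value labels (-1).

```python
def reconstruct(levels, labels, sort=True):
    """Keep only the level values that are used, optionally sort them,
    and remap the integer labels to the new positions."""
    new_levels, new_labels = [], []
    for lev, lab in zip(levels, labels):
        used = {lev[i] for i in lab}
        # Either sort the surviving values or keep their stored order.
        kept = sorted(used) if sort else [v for v in lev if v in used]
        position = {v: k for k, v in enumerate(kept)}
        new_levels.append(kept)
        new_labels.append([position[lev[i]] for i in lab])
    return new_levels, new_labels


levels = [['a', 'b'], ['bb', 'aa']]
labels = [[0, 0, 1, 1], [1, 0, 1, 0]]   # the index of df2 after sort_index()

new_levels, new_labels = reconstruct(levels, labels)
print(new_levels)  # [['a', 'b'], ['aa', 'bb']]
print(new_labels)  # [[0, 0, 1, 1], [0, 1, 0, 1]]

# A level value that no label points to ('c') is dropped.
print(reconstruct([['a', 'b', 'c']], [[1, 0, 1]]))  # ([['a', 'b']], [[1, 0, 1]])
```

Because the levels are re-sorted and the labels remapped together, the values seen by the user do not change; only the internal encoding does.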

codecov bot commented Mar 16, 2017

Codecov Report

Merging #15700 into master will decrease coverage by <.01%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master   #15700      +/-   ##
==========================================
- Coverage   91.01%   91.01%   -0.01%     
==========================================
  Files         143      143              
  Lines       49400    49448      +48     
==========================================
+ Hits        44963    45006      +43     
- Misses       4437     4442       +5
Impacted Files Coverage Δ
pandas/core/groupby.py 95.48% <100%> (+0.51%) ⬆️
pandas/core/frame.py 97.86% <100%> (-0.1%) ⬇️
pandas/core/reshape.py 99.27% <100%> (-0.01%) ⬇️
pandas/core/sorting.py 97.81% <100%> (+0.03%) ⬆️
pandas/core/series.py 94.79% <85.71%> (-0.08%) ⬇️
pandas/indexes/multi.py 96.37% <90.69%> (-0.22%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/indexes/base.py 96.08% <0%> (-0.06%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94720d9...f3ec8ac. Read the comment docs.

jreback force-pushed the unused branch 2 times, most recently from dbf1c94 to aa6190f, March 22, 2017 13:58

jreback (Contributor, Author) commented Mar 22, 2017

going to roll this into #15694

@jreback jreback closed this Mar 22, 2017