Fix multiindex selection #2621

fujiisoup · 2018-12-19T10:30:15Z

Closes #2619 (Selection of MultiIndex makes following unstack wrong)
Tests added
Fully documented, including whats-new.rst for all changes and api.rst for new API

Fixed by using MultiIndex.remove_unused_levels().

pep8speaks · 2018-12-19T10:30:19Z

Hello @fujiisoup! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 24, 2018 at 12:51 Hours UTC

shoyer · 2018-12-19T17:26:36Z

Should we update levels after indexing or when unstacking? Pandas does the latter:

In [31]: import pandas as pd

In [32]: mindex = pd.MultiIndex.from_product([[1, 2, 3], ['a', 'b']])

In [33]: s = pd.Series(range(6), mindex)

In [34]: s.loc[:1].index
Out[34]:
MultiIndex(levels=[[1, 2, 3], ['a', 'b']],
           labels=[[0, 0], [0, 1]])

In [35]: s.loc[:1].unstack()
Out[35]:
   a  b
1  0  1

The advantage of waiting until unstacking is that removing unused levels could be expensive compared to indexing: it runs in time O(L+N), where L is the size of the original level and N is the length of the new multi-index.
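A minimal sketch of the pandas behavior described above, assuming a pandas version recent enough to provide `MultiIndex.remove_unused_levels()`. Indexing leaves the levels untouched; the cleanup only happens when it is requested explicitly or inside `unstack()`.

```python
import pandas as pd

# Same setup as the IPython session above
mindex = pd.MultiIndex.from_product([[1, 2, 3], ["a", "b"]])
s = pd.Series(range(6), mindex)

subset = s.loc[:1]
# Indexing is cheap: the levels are not rewritten, so 2 and 3 are still listed
print(list(subset.index.levels[0]))  # [1, 2, 3]

# Dropping unused levels scans the codes and levels (the O(L+N) step)
trimmed = subset.index.remove_unused_levels()
print(list(trimmed.levels[0]))  # [1]

# unstack() performs this cleanup itself, so no empty rows appear
print(subset.unstack().shape)  # (1, 2)
```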

fujiisoup · 2018-12-19T21:38:18Z

@shoyer

This issue matters not only in unstack but also in sel (and maybe reindex?).
We could do it in both methods for efficiency, but I preferred to do it in one place for a quick fix.

But where is the right place to do it?
If we invoke it in sel, it would again be inefficient, since it would run every time we call .sel.

Perhaps the most efficient option is to keep a flag in PandasIndexAdapter and do it in _get_item_with_mask and get_loc only if the flag is raised?
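A minimal sketch of the flag idea, using a hypothetical `LazyLevelsIndex` wrapper rather than xarray's actual `PandasIndexAdapter` (the PR ultimately did not take this route). Positional indexing only raises a flag; `remove_unused_levels()` runs at most once, on the first lookup that needs clean levels.

```python
import pandas as pd


class LazyLevelsIndex:
    """Hypothetical wrapper (not xarray's PandasIndexAdapter): positional
    indexing only marks the levels as stale; cleanup is deferred."""

    def __init__(self, index, levels_may_be_stale=False):
        self._index = index
        self._stale = levels_may_be_stale

    def __getitem__(self, positions):
        # Cheap path for a slice or array of positions: levels are not touched.
        return LazyLevelsIndex(self._index[positions], levels_may_be_stale=True)

    @property
    def index(self):
        # Pay the O(L+N) cleanup at most once, and only when a lookup needs it.
        if self._stale:
            self._index = self._index.remove_unused_levels()
            self._stale = False
        return self._index

    def get_loc(self, key):
        # Lookups always go through the cleaned-up index.
        return self.index.get_loc(key)


full = pd.MultiIndex.from_product([[1, 2, 3], ["a", "b"]])
subset = LazyLevelsIndex(full)[:2]  # flag raised, levels still [1, 2, 3]
print(subset.get_loc(1))            # cleanup happens here
```

The trade-off is an extra piece of state on the adapter in exchange for never paying the cleanup cost on plain positional indexing.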

shoyer · 2018-12-20T01:07:27Z

This issue does not matter only in unstack but also in sel (reindex also?).

Can you explain why it matters for .sel()? I guess it slows down repeated indexing?

My inclination was just to copy what pandas does (which is only removing unused levels in unstack)

fujiisoup · 2018-12-20T06:22:38Z

@shoyer

My inclination was just to copy what pandas does (which is only removing unused levels in unstack)

Thanks. I will take a look at the source later.

Can you explain why it matters for .sel()?

Selecting a level value that no longer exists should raise a KeyError, but instead it returns a 0-size result.

In [8]: ds = xr.DataArray(np.arange(40).reshape(8, 5), dims=['x', 'y'],  
                  coords={'x': np.arange(8), 'y': np.arange(5)}).stack(xy=['x', 'y']) 

In [9]: ds.isel(xy=ds['x'] < 4).sel(x=5)  # should be KeyError
Out[9]: 
<xarray.DataArray (y: 0)>
array([], dtype=int64)
Coordinates:
  * y        (y) int64

fujiisoup · 2018-12-20T06:59:43Z

But this .sel problem can be solved simply by raising a KeyError manually if get_loc returns a size-0 array.
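A minimal sketch of that check, using hypothetical helpers `selects_nothing` and `checked_get_loc` (not xarray functions). `get_loc` can return an int, a slice or an array, so each form is tested for emptiness; newer pandas versions may already raise a KeyError for a label that is kept in `levels` but unused, in which case the extra check is simply never reached.

```python
import numpy as np
import pandas as pd


def selects_nothing(loc, n):
    """Return True if a get_loc() result (int, slice or array) selects no rows
    of an index of length n."""
    if isinstance(loc, slice):
        return len(range(n)[loc]) == 0
    if isinstance(loc, (int, np.integer)):
        return False
    arr = np.asarray(loc)
    if arr.dtype == bool:
        return not arr.any()
    return arr.size == 0


def checked_get_loc(index, key):
    """Hypothetical helper: like index.get_loc, but treat an empty match
    (a label kept in `levels` but no longer used) as a missing key."""
    loc = index.get_loc(key)
    if selects_nothing(loc, len(index)):
        raise KeyError(key)
    return loc


full = pd.MultiIndex.from_product([[1, 2, 3], ["a", "b"]])
subset = full[:2]  # levels still contain 2 and 3
print(checked_get_loc(subset, 1))
```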

fujiisoup · 2018-12-20T08:09:09Z

I moved MultiIndex.remove_unused_levels() inside unstack, so it is now invoked only when unstack is called.
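A minimal sketch of why the cleanup belongs in unstack, assuming (hypothetically) an unstack that builds its dense output grid with `pd.MultiIndex.from_product(index.levels)` and then reindexes onto it. `unstack_via_product` is an illustrative helper, not xarray's actual implementation: with stale levels, every unused level value turns into an all-NaN row.

```python
import numpy as np
import pandas as pd


def unstack_via_product(s, drop_unused=True):
    """Hypothetical unstack: build a dense grid from index.levels, then
    reindex onto it (missing combinations become NaN)."""
    index = s.index
    if drop_unused:
        # Trim levels once, right before they are used to build the grid.
        index = index.remove_unused_levels()
    full_idx = pd.MultiIndex.from_product(index.levels, names=index.names)
    dense = s.reindex(full_idx)
    shape = [len(level) for level in full_idx.levels]
    return dense.to_numpy().reshape(shape)


s = pd.Series(range(6), pd.MultiIndex.from_product([[1, 2, 3], ["a", "b"]]))
subset = s.iloc[:2]  # levels still contain 2 and 3

print(unstack_via_product(subset).shape)                     # (1, 2)
print(unstack_via_product(subset, drop_unused=False).shape)  # (3, 2), two all-NaN rows
```

Doing the trim here keeps .sel and .isel free of the O(L+N) cost, matching what pandas does.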

shoyer · 2018-12-23T19:48:33Z

xarray/core/pdcompat.py

@@ -0,0 +1,79 @@
+import numpy as np
+import pandas as pd
+import pandas.core.algorithms as algos


Can we do a local import inside the function instead?

I'm a little nervous that some future version of pandas may drop this (private) module, which would then break imports even though this isn't actually used.

Good catch! Fixed.

shoyer · 2018-12-23T19:49:32Z

xarray/core/pdcompat.py

+import pandas as pd
+import pandas.core.algorithms as algos
+
+


Can you copy the pandas license directly into this file, too? See coding/cftimeindex.py for an example.

* master: DEP: drop python 2 support and associated ci mods (pydata#2637) TST: silence warnings from bottleneck (pydata#2638) revert to dev version DOC: fix docstrings and doc build for 0.11.1 Source encoding always set when opening datasets (pydata#2626) Add flake check to travis (pydata#2632) Fix dayofweek and dayofyear attributes from dates generated by cftime_range (pydata#2633) silence import warning (pydata#2635) fill_value in shift (pydata#2470) Flake fixed (pydata#2629) Allow passing of positional arguments in `apply` for Groupby objects (pydata#2413) Fix failure in time encoding for pandas < 0.21.1 (pydata#2630) Fix multiindex selection (pydata#2621) Close files when CachingFileManager is garbage collected (pydata#2595) added some logic to deal with rasterio objects in addition to filepaths (pydata#2589) Get 0d slices of ndarrays directly from indexing (pydata#2625) FIX Don't raise a deprecation warning for xarray.ufuncs.{angle,iscomplex} (pydata#2615) CF: also decode time bounds when available (pydata#2571)

Fix multiindex selection

50eab43

fujiisoup added 2 commits December 19, 2018 22:21

Support pandas0.19

762f496

a bugfix

6bb8166

fujiisoup added 3 commits December 20, 2018 08:31

Do remove_unused_levels only once in unstack.

a806c64

import algos

205f948

Remove unused import

b15cab3

shoyer reviewed Dec 23, 2018


fujiisoup added 2 commits December 24, 2018 12:23

Adopt local import

edb4a24

Merge branch 'master' into multiindex_remove_unused

61d1d49

shoyer merged commit b5059a5 into pydata:master Dec 24, 2018
