WIP/API/ENH: IntervalIndex #8707

Closed
wants to merge 8 commits into from

shoyer
Member

@shoyer shoyer commented Nov 2, 2014

closes #7640
closes #8625

This is a work in progress, but it's far enough along that I'd love to get
some feedback.

TODOs (more called out in the code):

  • documentation
  • docstrings
  • implement an IntervalTree under the hood to handle index methods
  • index methods handle point queries properly:
    • get_loc
    • get_indexer
    • slice_locs
  • index methods handle interval queries properly
    • get_loc
    • get_indexer (for non-overlapping monotonic indexes)
  • comparison operations (should be lexicographic)
  • ensure is_monotonic always works (also lexicographic)
  • ensure order works
  • cast arrays of intervals when input to pd.Index
  • cythonize the bottlenecks:
    • values
    • from_intervals?
    • Interval?
    • from_tuples?
  • Categorical/cut
  • nice to haves (not essential for MVP)
    • get_indexer in full generality (for randomly ordered indexes)
    • support for arithmetic operations
    • MultiIndex -- should at least give a sane error messages
    • serialization (HDF5) -- should at least raise NotImplementedError
    • consider creating interval_range -- like period_range
    • add to_interval for casting strings (e.g., to_interval('(0, 1]'))
    • release the GIL in IntervalTree queries
  • loads more tests:
    • Series.loc.__getitem__
    • Series.loc.__setitem__
    • indexing a dataframe
    • groupby
    • reset_index
    • add tests in test_index for the subcommon API

CC @jreback @cpcloud @immerrr

@jreback
Contributor

jreback commented Nov 2, 2014

so the whole notion of boxing - eg having an underlying impl then getting a boxed scalar type out - is inherent in what u r doing

you get this for free - see here
https://github.com/pydata/pandas/blob/master/pandas/tseries/base.py

though prob need to strip out the box/iter stuff (and maybe some other stuff - contains)

can put in another mixin that u can use (it could live in index.py maybe BoxMixIn)
and as long as update tseries/base.py should be good to go

@jreback
Contributor

jreback commented Nov 2, 2014

so one of the big use cases here is as the index in a series coming out of cut/qcut
eg it returns an IntervalIndex

def __ne__(self, other):
    return not self == other

def __lt__(self, other):
Contributor


you can generically do the index ops like this: https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L43
alternatively you can have a _make_index_op that returns a function
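
A minimal sketch of the factory idea, not the pandas implementation: `_make_compare_op` and `SimpleInterval` are hypothetical names, and the ordering is assumed to be lexicographic on `(left, right)` as the TODO list suggests.

```python
import operator


def _make_compare_op(op):
    """Hypothetical factory: build a rich-comparison method from an operator."""
    def method(self, other):
        if not isinstance(other, SimpleInterval):
            return NotImplemented
        # Lexicographic comparison: left endpoint first, then right endpoint.
        return op((self.left, self.right), (other.left, other.right))

    method.__name__ = "__{}__".format(op.__name__)
    return method


class SimpleInterval:
    """Toy interval used only to illustrate the factory."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    # Every comparison is generated by the same factory.
    __eq__ = _make_compare_op(operator.eq)
    __ne__ = _make_compare_op(operator.ne)
    __lt__ = _make_compare_op(operator.lt)
    __le__ = _make_compare_op(operator.le)
    __gt__ = _make_compare_op(operator.gt)
    __ge__ = _make_compare_op(operator.ge)
```

Generating all six methods from one factory keeps them consistent with each other, which hand-written `__ne__`/`__lt__` pairs do not guarantee.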

@jreback
Contributor

jreback commented Nov 2, 2014

@shoyer

looks really good!

Instead of cythonizing Interval, you can prob get away with just creating a fastpath constructor like this: https://github.com/pydata/pandas/blob/master/pandas/tseries/period.py#L68

You are using an _engine of ObjectEngine (the default) in IntervalIndex. You might need to create an IntervalIndexEngine to handle the indexing (but not necessary for first version).

@jreback jreback added the API Design, Enhancement, Internals, and Indexing labels Nov 2, 2014
@jreback jreback added this to the 0.16.0 milestone Nov 2, 2014
@jreback
Contributor

jreback commented Nov 2, 2014

@shoyer sort of confused why _data materializes Intervals at ALL.

you don't need it for indexing (e.g. you have left/right for that purpose).

ONLY on boxing do you need to do this (e.g. you need to implement the format methods I think).
e.g. displaying a frame with this as an index would materialize only a small subset of the values

Think of this like DatetimeIndex. You have an underlying impl (e.g. an array of i8) where you do virtually all computations. ONLY when asked (e.g. could be a selection or iteration) do you actually materialize the values (which can be expensive, I agree).
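
The lazy-boxing pattern described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `LazyIntervalArray` and `Box` are hypothetical names, and only integer positions are supported by `__getitem__`.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the boxed scalar type (Interval in the PR).
Box = namedtuple("Box", ["left", "right"])


class LazyIntervalArray:
    """Store endpoints as two arrays; create scalar objects only on access."""

    def __init__(self, left, right):
        self.left = np.asarray(left)
        self.right = np.asarray(right)
        if self.left.shape != self.right.shape:
            raise ValueError("left and right must have the same shape")

    def __len__(self):
        return len(self.left)

    @property
    def mid(self):
        # Vectorized on the underlying arrays: no scalar objects are created.
        return 0.5 * (self.left + self.right)

    def __getitem__(self, i):
        # Box exactly one element, on demand (integer position only).
        return Box(self.left[i].item(), self.right[i].item())

    def __iter__(self):
        # Iteration boxes elements one at a time instead of caching them all.
        for i in range(len(self)):
            yield self[i]
```

Formatting a few rows for display would then box only those rows, while computations such as `mid` never box anything.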

@shoyer
Member Author

shoyer commented Nov 2, 2014

@jreback Yes, not including _data at all and skipping all the standard index machinery was my original idea.

But, there are actually a few advantages to keeping that representation around (the object index):

  1. It gives us default implementations for miscellaneous index methods like is_unique (not entirely sure this is worth the trouble, though).
  2. It gives us the possibility of doing fast O(1) lookups for Interval objects in the index. This could make a significant difference for operations like reindexing/get_indexer (although, again, np.searchsorted is already quite fast).

@jreback
Contributor

jreback commented Nov 2, 2014

I disagree entirely. The whole point is that you don't need the object representation. You already have the impl in _left/_right which is very fast to lookup.

In [18]: def f1():
   ....:     i = pd.IntervalIndex.from_breaks(np.arange(1000))
   ....:     iv = pd.Interval(50,51)
   ....:     return i.get_loc(iv)
   ....: 

In [19]: %timeit f1()
100 loops, best of 3: 6.19 ms per loop

In [20]: def f2():
   ....:     i = pd.IntervalIndex.from_breaks(np.arange(1000))
   ....:     iv = pd.Interval(50,51)
   ....:     return i._left.get_loc(iv.left), i._right.get_loc(iv.right)
   ....: 

In [21]: f2()
Out[21]: (50, 50)

In [22]: f1()    
Out[22]: 50

In [23]: %timeit f2()
10000 loops, best of 3: 72.3 us per loop

caveat:

what I did is naive. And not sure of the exact lookup semantics for IN an interval, but easy enough to maybe keep a 'freq' for left/right (or can just do right-left), e.g. to find the 'natural' interval of the Index. Then you can take a number and find the left and right of it (WITHOUT using searchsorted, but with indexing, which is O(1)), then do a simple comparison.

you ONLY need to use searchsorted I think if there is no freq. E.g. you have a bunch of non-regular intervals (but since you have left and right you can prob do a pretty good job, e.g. if you find where it is on the left, then you know about where it is on the right). You may need a custom searchsorted for these 'irregular' intervals.
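
A rough sketch of the two lookup paths discussed above, assuming sorted, non-overlapping, right-closed intervals `(left[i], right[i]]`. The function name `locate_point` and its `freq` parameter are hypothetical, and the grid path assumes contiguous intervals of width `freq`.

```python
import numpy as np


def locate_point(left, right, x, freq=None):
    """Return the position of the interval (left[i], right[i]] containing x, or -1."""
    left = np.asarray(left)
    right = np.asarray(right)
    n = len(left)
    if n == 0:
        return -1
    if freq is not None:
        # Regular grid: compute the candidate position arithmetically, O(1).
        i = int(np.ceil((x - left[0]) / freq)) - 1
    else:
        # Irregular intervals: binary search on the right edges, O(log n).
        i = int(np.searchsorted(right, x, side="left"))
    # A final comparison confirms the candidate (this also rejects gaps).
    if 0 <= i < n and left[i] < x <= right[i]:
        return i
    return -1
```

With floating-point edges the arithmetic candidate can be off by one near a boundary, so a real implementation would likely also check the neighbouring positions.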

@jreback
Contributor

jreback commented Nov 2, 2014

unique is easy too, something like

(this is w/o boxing)

In [26]: %timeit Index(zip(i._left,i._right)).unique()
1000 loops, best of 3: 1.27 ms per loop

With Boxing

In [32]: %timeit [ pd.Interval(*x) for x in Index(zip(i._left,i._right)).unique() ]
100 loops, best of 3: 2.18 ms per loop

remember these still beat the original construction + ._data

and keeping it around has memory overhead (much more than 2x the original)
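
The same idea can be sketched with plain NumPy, finding unique `(left, right)` pairs without creating any scalar objects. The helper name `unique_pairs` is hypothetical, and both endpoint arrays are assumed to share a numeric dtype.

```python
import numpy as np


def unique_pairs(left, right):
    """Return the unique (left, right) rows, sorted lexicographically."""
    pairs = np.column_stack([left, right])
    # axis=0 treats each row as one item, so uniqueness is per pair.
    return np.unique(pairs, axis=0)


uniq = unique_pairs([0, 1, 0, 2], [1, 2, 1, 3])
# Box only the unique pairs, and only if scalar objects are really needed.
boxed = [tuple(row) for row in uniq.tolist()]
```

Only the deduplicated rows are ever boxed, so the cost of boxing scales with the number of unique intervals rather than the length of the index.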

@shoyer
Member Author

shoyer commented Nov 2, 2014

@jreback OK, I will give this a try without using _data at all. Using a natural interval "frequency" is indeed an appealing idea for facilitating O(1) lookups (this is similar to the "grid index" I discussed in the original proposal).

@jreback
Contributor

jreback commented Nov 2, 2014

haha, ok.

so the freq I am talking about is between 2 Intervals and not the distance in a single interval (which is a value like the left/right). Hmm do they have to be the same? I guess for a regular interval series they do.

In [40]: x = pd.IntervalIndex.from_breaks(pd.timedelta_range('1s',periods=1000,freq='s'))

In [41]: pd.Interval(pd.Timedelta('1s'),pd.Timedelta('3s'))
Out[41]: Interval(Timedelta('0 days 00:00:01'), Timedelta('0 days 00:00:03'), closed='right')

have a look at tseries.frequency.FrequencyInferer, though in this case it's almost trivial

e.g. (i._right-i._left).unique(): if you have 1 value only then that's the freq

In [44]: (x._right-x._left).unique()[0]
Out[44]: Timedelta('0 days 00:00:01')

BTW, my point of this example is to have tests for the possible types of interval operands:

integer, float, Timedeltas come to mind, though I suppose Timestamps/Periods make sense too.

hmm, then there is:

Interval('a','b')

Which i suppose 'works', but not sure how searching this would work. Maybe only allow certain types (and the left/right of an Interval should be the same).

I think CategoricalIndex is the way to handle the above.

@shoyer
Member Author

shoyer commented Nov 2, 2014

I think you do need both the space between intervals and the distance for single intervals to be at a constant frequency in order to be able to do fast lookups. Everything needs to be able to map to a Int64Index under the hood (not unlike PeriodIndex).

I may try to get things working first for the general case and then add the optimized freq later, though I agree that this is important, as intervals with a constant frequency are extremely common.

Also agreed that we need tests for many types. Strings can't have a well-defined frequency, but I don't see much harm in allowing their use.
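
The regularity condition from this comment (constant interval width and constant spacing between consecutive intervals) could be checked along these lines. `infer_interval_freq` is a hypothetical helper, and exact equality is assumed, so float endpoints may need a tolerance.

```python
import numpy as np


def infer_interval_freq(left, right):
    """Return (width, step) if both are constant across the index, else None."""
    left = np.asarray(left)
    right = np.asarray(right)
    if len(left) < 2:
        # Spacing between intervals is undefined for fewer than two intervals.
        return None
    widths = np.unique(right - left)   # distance within each interval
    steps = np.unique(np.diff(left))   # spacing between consecutive intervals
    if len(widths) != 1 or len(steps) != 1:
        return None
    return widths[0], steps[0]
```

For contiguous intervals built from evenly spaced breaks, `width` and `step` coincide; they differ when the intervals are regular but separated by gaps.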

@jreback
Contributor

jreback commented Nov 2, 2014

yup

though you don't need it to map to Int64Index per-se, more like you need to have the Index support certain operations, e.g. get_loc and subtraction (which all of the indexes do).

This Index is just a dispatcher really to the combination of the underlying indexes. Much like MultiIndex is a collection of Index objects (that is well-ordered).
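
As a sketch of the dispatcher idea, an exact interval lookup can be delegated to the two underlying indexes, much like `f2` above. `PairedIndex` is a hypothetical class, and both endpoint indexes are assumed to be unique so that `get_loc` returns an integer.

```python
import pandas as pd


class PairedIndex:
    """Toy dispatcher: every lookup is delegated to the left/right indexes."""

    def __init__(self, left, right):
        self.left = pd.Index(left)
        self.right = pd.Index(right)

    def get_loc(self, left, right):
        # Each endpoint is located in its own index (KeyError if missing).
        loc_left = self.left.get_loc(left)
        loc_right = self.right.get_loc(right)
        # The interval exists only if both endpoints sit at the same position.
        if loc_left != loc_right:
            raise KeyError((left, right))
        return loc_left
```

The final position check is what `f2` leaves to the reader: it returns `(50, 50)` and relies on the two values agreeing.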


@cache_readonly
def mid(self):
    # TODO: figure out how to do add/sub as arithmetic even on Index
Member Author


@jreback I need some way to take the difference of two DatetimeIndex objects (i.e., returning a TimedeltaIndex). Because +/- still means union/set difference for indexes, I can't think of any way to do this right now short of converting to series objects (ugh) or converting the indexes to ndarrays and doing the NaT checks manually (also ugh).

What do you think? I think we either need to define add and sub methods for indexes (kind of pointless in the long term) or make +/- on indexes do arithmetic in 0.16. Or we could define temporary private _add and _sub methods on indexes for the strict benefit of this method.

Member Author


I guess in reality all I really need is the mean of two indexes. So I could also write a one-off function for that, doing the special work arounds for time indexes.
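
A sketch of such a one-off midpoint function, assuming a recent pandas release in which `+`/`-` on indexes perform element-wise arithmetic (which was not the case when this comment was written). The name `midpoint` is hypothetical.

```python
import pandas as pd


def midpoint(left, right):
    """Element-wise midpoint of two indexes (numeric or datetime-like)."""
    left = pd.Index(left)
    right = pd.Index(right)
    # left + (right - left) / 2 avoids adding two datetimes, which is undefined;
    # for datetimes the difference is a TimedeltaIndex and NaT propagates.
    return left + (right - left) / 2


numeric_mid = midpoint([0, 2], [2, 4])
datetime_mid = midpoint(
    pd.to_datetime(["2000-01-01", None]),
    pd.to_datetime(["2000-01-03", "2000-01-05"]),
)
```

Writing the mean as `left + (right - left) / 2` rather than `(left + right) / 2` is the design choice that lets one expression cover both numeric and time-like endpoints.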

@jreback
Contributor

jreback commented Nov 5, 2014

Just do this to subtract the 2 dti's (rather than take the set difference). BTW, for now I would not allow tz's in any passed-in left/right (it may work, though)

In [8]: rng = date_range('20130101',periods=5)

In [9]: rng2 = rng+pd.to_timedelta(np.arange(5),unit='s')

In [10]: rng
Out[10]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-01-05]
Length: 5, Freq: D, Timezone: None

In [11]: rng2
Out[11]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-05 00:00:04]
Length: 5, Freq: 86401S, Timezone: None

In [12]: pd.TimedeltaIndex(rng2.values-rng.values)
Out[12]: 
<class 'pandas.tseries.tdi.TimedeltaIndex'>
['00:00:00', ..., '00:00:04']
Length: 5, Freq: None

@shoyer
Member Author

shoyer commented Nov 5, 2014

@jreback you're not worried about NaT screwing things up? I suppose we might as well insist that left and right cannot have missing values, anyways.

@shoyer
Member Author

shoyer commented Nov 5, 2014

Actually, turns out subtracting NaT is not entirely broken in numpy:

# wtf
In [9]: pd.to_datetime(['NaT']).values - pd.to_datetime(['2000-01-01']).values
Out[9]: array([-9223372036854775808], dtype='timedelta64[ns]')

# but put it in an index, and it's still NaT!
In [10]: pd.Index(pd.to_datetime(['NaT']).values - pd.to_datetime(['2000-01-01']).values)
Out[10]:
<class 'pandas.tseries.tdi.TimedeltaIndex'>
[NaT]
Length: 1, Freq: None

I am shocked!

@jreback
Contributor

jreback commented Nov 5, 2014

hahha

there is a simple routine to do this
maybe I need a dti.sub(dti2)

though let me see about making this not do set difference and do subtraction on - it does make more sense

maybe

dti - dti2.values works now

@jreback
Contributor

jreback commented Jan 24, 2016

@shoyer how's this coming?

@wesm
Member

wesm commented Jan 24, 2016

I suggest moving this code to the new pandas.indexes subpackage

@jreback
Contributor

jreback commented Jan 24, 2016

+1 on moving as well

@jreback
Contributor

jreback commented Mar 12, 2016

can you rebase/update

@jreback jreback removed this from the Next Major Release milestone Mar 13, 2016
@auvipy

auvipy commented May 2, 2016

@shoyer can u rebase it?

@jreback
Contributor

jreback commented May 20, 2016

@shoyer status (can you rebase as well)

@shoyer
Member Author

shoyer commented May 20, 2016

I still would love to see Interval and IntervalIndex happen in pandas, but realistically I probably need to abandon this. I think we did finally figure out the right approach, but this will still require some significant work to finish up and merge, and I don't think I'll be able to find the time to make that happen. I'll still be around if someone else wants to take this over the finish line, though.

It feels like this change is 80% of the way there, but that last 20% (tests, docs, handling edge cases, compat with other parts of the library) is a lot of work. I also worry about adding in more ad-hoc Cython templating when we are on the verge of a core rewrite that will provide a much more sustainable way to do this (with C++ templates).

@jreback
Contributor

jreback commented Sep 9, 2016

status?

@shoyer
Member Author

shoyer commented Sep 9, 2016

See above. Nothing new to add.

@jreback
Contributor

jreback commented Sep 9, 2016

ok closing then
still searchable of course and your prior art is pretty good

@jreback jreback closed this Sep 9, 2016
@jreback
Contributor

jreback commented Feb 2, 2017

@shoyer thinking of finishing this
can easily change to use tempita and our Index API is pretty clean now

assume no objections and still quite useful feature?

@shoyer
Member Author

shoyer commented Feb 2, 2017

@jreback Absolutely, I would be totally supportive if you or anyone else wants to finish this!

I agree that tempita is the way to go for templating.

@jreback
Contributor

jreback commented Feb 2, 2017

@shoyer

        exp_levels = np.array([Interval(0, 2.664, closed='both'),
                               Interval(2.664, 5.328), Interval(5.328, 8)])

so this is possible as a numpy array, but does not make sense as an IntervalIndex, IOW, closed has to be the same across all Intervals. or is this valid as an II?

This comes from a test in pandas/tools/tests/test_tile.py (this is a diff of this existing PR vs master at the time).

     def test_qcut_include_lowest(self):
@@ -183,9 +179,8 @@ class TestCut(tm.TestCase):
 
         cats = qcut(values, 4)
 
-        ex_levels = [Interval(0, 2.25, closed='both'), Interval(2.25, 4.5),
-                     Interval(4.5, 6.75), Interval(6.75, 9)]
-        self.assert_numpy_array_equal(unique(cats), ex_levels)
+        ex_levels = ['[0, 2.25]', '(2.25, 4.5]', '(4.5, 6.75]', '(6.75, 9]']
+        self.assertTrue((cats.categories == ex_levels).all())
 

?

@shoyer
Member Author

shoyer commented Feb 2, 2017

@jreback Yes, that was one of the trickier aspects of this PR. Basically, the fundamental problem here is the include_lowest argument to pandas.cut:

>>> pd.cut([1, 2, 3, 4, 5], 3, include_lowest=True)
[[0.996, 2.333], [0.996, 2.333], (2.333, 3.667], (3.667, 5], (3.667, 5]]
Categories (3, object): [[0.996, 2.333] < (2.333, 3.667] < (3.667, 5]]

The lowest category is an interval that is closed on both sides, different from all the other intervals.

Possibly the best path forward here is to deprecate include_lowest, and for now still return a string categorical (rather than an interval categorical) if include_lowest=True.

@jreback
Contributor

jreback commented Feb 2, 2017

hmm, this should be then equivalent (at least for floats) to:

Interval(lb-1e-12, ub, closed='right') and Interval(lb, ub, closed='both')?

or really just -precision

(this would work for Timestamp/Timedelta as well, though for ints it would end up changing the dtype.....)
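
A small sketch of the edge adjustment proposed here, assuming numeric bins and a recent pandas in which `pd.cut` on a list returns a Categorical. `lower_first_edge` and its `precision` parameter are hypothetical, not part of the pandas API.

```python
import numpy as np
import pandas as pd


def lower_first_edge(bins, precision=3):
    """Shift the lowest bin edge down so a right-closed bin covers the minimum."""
    # Casting to float is the dtype change noted above for integer bins.
    edges = np.asarray(bins, dtype="float64").copy()
    edges[0] -= 10.0 ** -precision
    return edges


values = [0, 1, 2]
# With right-closed bins the value 0 falls outside (0, 2] and gets code -1.
plain = pd.cut(values, bins=[0, 2, 4])
# After the adjustment every bin is still closed='right', and 0 is covered.
adjusted = pd.cut(values, bins=lower_first_edge([0, 2, 4]))
```

This keeps a single `closed` value for all intervals at the cost of a slightly misleading lowest edge, which is the trade-off against a mixed `closed='both'` first bin.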

@shoyer
Member Author

shoyer commented Feb 2, 2017

So there's also a case for ditching closed='both' and closed='neither', only sticking with right/left, maybe using right=True like pd.cut. There are lots of messy edge cases with both/neither and I don't think they have many use cases.

@jreback
Contributor

jreback commented Feb 2, 2017

hmm, ok, have it mostly working. rebasing was easy. but have to fix things to be in the style of Index, and the templating. soon.

@jreback
Contributor

jreback commented Feb 3, 2017

@shoyer I think this was your intention, and after mulling it over for a while it makes sense.
cutting produces a 'categorical'-like thing which, if no labels are specified, will be an IntervalIndex (rather than, as previously, a Categorical with interval-like string labels). BUT the labels= case will label as specified by the user and by definition return a Categorical.

Even though this might be a bit odd (as the API will now return different 'types' depending on whether the labels are specified), this makes intuitive sense (if a user provides labels, I could make an IntervalIndex out of them, though it looks a bit odd).

In point of fact these 2 things (Categoricals and IntervalIndex) work very similarly when grouping and such, so I think this is ok.

0.19.2

In [1]: pd.cut(np.arange(5), bins=3)
Out[1]: 
[(-0.004, 1.333], (-0.004, 1.333], (1.333, 2.667], (2.667, 4], (2.667, 4]]
Categories (3, object): [(-0.004, 1.333] < (1.333, 2.667] < (2.667, 4]]

# this is unchanged
In [3]: pd.cut(np.arange(5), bins=3, labels=list('abc'))
Out[3]: 
[a, a, b, c, c]
Categories (3, object): [a < b < c]

# this is unchanged
In [1]: pd.cut(np.arange(5), bins=3, labels=False)
Out[1]: array([0, 0, 1, 2, 2])

with PR

In [1]: pd.cut(np.arange(5), bins=3)
Out[1]: 
IntervalIndex(left=[-0.004, -0.004, 1.333, 2.667, 2.667],
              right=[1.333, 1.333, 2.667, 4.0, 4.0],
              closed='right')

# I just noticed that this one is un-ordered, while the 0.19.2 is ordered
# thoughts?
In [3]: pd.cut(np.arange(5), bins=3, labels=list('abc'))
Out[3]: 
[a, a, b, c, c]
Categories (3, object): [a, b, c]

@jreback
Contributor

jreback commented Feb 4, 2017

@shoyer pls have a look, all tests finally passing, though a few issues remain.
https://github.com/pandas-dev/pandas/compare/master...jreback:intervalindex?expand=1

here's summary of changes since your version.

  • redid the construction impl a bit, now uses the proper Index calling conventions (as well as other sub-classing conventions).
  • more testing, including all generic index tests (but still prob need some more)
  • add ._mask to track internal nans (it's lazily computed), IOW you can have null entries in the index.
  • using tempita, so have intervalindex.pxi.in
  • the extension is housed in _interval now, rather than a part of lib
  • tiny tiny hack of a Interval[...] dtype (going to fix this soon and make it part of the index dtype), but still as object in a frame (similar to how Period works now).
  • not really tested for perf of anything, will have to see about that
I needed an API change on pd.cut:

Going to work some more on this before a PR, but any comments welcome.

# we now return an II rather than a Categorical (of course this is the API we all want!)
In [9]: pd.cut(np.arange(3), 2)
Out[9]: 
IntervalIndex(left=[-0.002, -0.002, 1.0],
              right=[1.0, 1.0, 2.0],
              closed='right')

In [10]: pd.cut(np.arange(3), bins=[0, 2, 4])
Out[10]: 
IntervalIndex(left=[nan, 0.0, 0.0],
              right=[nan, 2.0, 2.0],
              closed='right')

# include_lowest now adjusts the left-hand bin edge so everything is closed='right'
In [11]: pd.cut(np.arange(3), bins=[0, 2, 4], include_lowest=True)
Out[11]: 
IntervalIndex(left=[-0.001, -0.001, -0.001],
              right=[2, 2, 2],
              closed='right')

# I now also return the uniques (because you can't necessarily compute them from the result,
# and the input bins are not good enough because we may have adjusted them, e.g. include_lowest)
In [12]: pd.cut(np.arange(3), bins=[0, 2, 4], include_lowest=True, retbins=True)
Out[12]: 
(IntervalIndex(left=[-0.001, -0.001, -0.001],
               right=[2, 2, 2],
               closed='right'),
 array([0, 2, 4]),
 IntervalIndex(left=[-0.001, 2.0],
               right=[2, 4],
               closed='right'))

the retbins change is no big deal as it's really just used internally.

I had a heck of a time getting .value_counts() to work properly (IOW to handle dropna).

@jorisvandenbossche
Member

Re the cut change: could what we want also be a Categorical with IntervalIndex categories? (which would be a bit more backwards compatible)
(I have to look in more detail and try to wrap my head around what the practical difference would be between a Categorical of Intervals and an IntervalIndex)

@jreback
Contributor

jreback commented Feb 5, 2017

@jorisvandenbossche yeah I think we did talk about doing a Categorical with an IntervalIndex as the categories. it should work.

In [2]: pd.Categorical.from_codes([0, 1, 1, -1, 0, 1], categories=pd.IntervalIndex.from_breaks(np.arange(3)))
Out[2]: 
[(0, 1], (1, 2], (1, 2], NaN, (0, 1], (1, 2]]
Categories (2, object): [(0, 1], (1, 2]]

In [3]: pd.Categorical.from_codes([0, 1, 1, -1, 0, 1], categories=pd.IntervalIndex.from_breaks(np.arange(3))).codes
Out[3]: array([ 0,  1,  1, -1,  0,  1], dtype=int8)

In [4]: pd.Categorical.from_codes([0, 1, 1, -1, 0, 1], categories=pd.IntervalIndex.from_breaks(np.arange(3))).categories
Out[4]: 
IntervalIndex(left=[0, 1],
              right=[1, 2],
              closed='right')

@jreback
Contributor

jreback commented Feb 5, 2017

actually this just works (so will fix API).

In [2]: pd.Categorical(pd.cut(range(3), 3))
Out[2]: 
[(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0]]
Categories (3, object): [(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0]]

In [3]: pd.Categorical(pd.cut(range(3), 3)).codes
Out[3]: array([0, 1, 2], dtype=int8)

In [6]: pd.Categorical(pd.cut(range(3), 3)).categories
Out[6]: 
IntervalIndex(left=[-0.002, 0.667, 1.333],
              right=[0.667, 1.333, 2.0],
              closed='right')

and in 0.19.2

In [1]: pd.cut(range(3), 3)
Out[1]: 
[(-0.002, 0.667], (0.667, 1.333], (1.333, 2]]
Categories (3, object): [(-0.002, 0.667] < (0.667, 1.333] < (1.333, 2]]

In [2]: pd.cut(range(3), 3).codes
Out[2]: array([0, 1, 2], dtype=int8)

In [3]: pd.cut(range(3), 3).categories
Out[3]: Index(['(-0.002, 0.667]', '(0.667, 1.333]', '(1.333, 2]'], dtype='object')

@jreback
Contributor

jreback commented Feb 5, 2017

ok pretty simple. added .astype as well. and they are ordered.

In [1]: pd.cut(range(3), 3)
Out[1]: CategoricalIndex([(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0]], categories=[(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0]], ordered=True, dtype='category')

In [2]: pd.cut(range(3), 3).categories
Out[2]: 
IntervalIndex(left=[-0.002, 0.667, 1.333],
              right=[0.667, 1.333, 2.0],
              closed='right')

In [3]: pd.cut(range(3), 3).categories.astype('category')
Out[3]: CategoricalIndex([(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0]], categories=[(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0]], ordered=True, dtype='category')

@jreback jreback mentioned this pull request Feb 5, 2017
jreback added a commit that referenced this pull request Apr 14, 2017
closes #7640
closes #8625

reprise of #8707

Author: Jeff Reback <jeff@reback.net>
Author: Stephan Hoyer <shoyer@climate.com>

Closes #15309 from jreback/intervalindex and squashes the following commits:

11ab1e1 [Jeff Reback] merge conflicts
834df76 [Jeff Reback] more docs
fbc1cf8 [Jeff Reback] doc example and bug
7577335 [Jeff Reback] fixup on merge of changes in algorithms.py
3a3e02e [Jeff Reback] sorting example
4333937 [Jeff Reback] api-types test fixing
f0e3ad2 [Jeff Reback] pep
b2d26eb [Jeff Reback] more docs
e5f8082 [Jeff Reback] allow pd.cut to take an IntervalIndex for bins
4a5ebea [Jeff Reback] more tests & fixes for non-unique / overlaps rename _is_contained_in -> contains add sorting test
340c98b [Jeff Reback] CLN/COMPAT: IntervalIndex
74162aa [Stephan Hoyer] API/ENH: IntervalIndex
Labels
API Design, Enhancement, Indexing, Internals, Interval
Successfully merging this pull request may close these issues.

API/ENH: create Interval class Proposal: New Index type for binned data (IntervalIndex)