BUG: pandas.cut should optionally allow overlapping IntervalIndex bins #27654

@c-thiel

Due to #23980, the following code raises a ValueError as of 0.25:

Code Sample, a copy-pastable example if possible

import pandas as pd

# Three overlapping bins; both 5 and 6 fall into all of them
ii = pd.IntervalIndex.from_tuples([(0, 10), (2, 12), (4, 14)])
pd.cut([5, 6], bins=ii)

Problem description

Before #23980, an IntervalIndex with overlapping intervals could be used as bins. cut would return every Interval that contains a given value, which I consider the correct behavior.

In #23980 it was stated that this doesn't make sense in the context of cut. Unfortunately I missed the discussion over there (there really was none). I would argue that raising a ValueError unnecessarily removes a valid feature: I frequently use cut as a more versatile replacement for pd.rolling, with overlapping custom windows of unequal size.

If there is a smarter way to do this, I am happy to learn about it. Otherwise we should at least provide an option to use overlapping intervals in cut. I would therefore recommend raising a warning instead of an error here:

    elif isinstance(bins, IntervalIndex):
        if bins.is_overlapping:
            raise ValueError("Overlapping IntervalIndex is not accepted.")
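
A minimal sketch of what the warning variant could look like. `check_bins` is a hypothetical helper written only for illustration, not the actual pandas code path, and the warning text and category are assumptions:

```python
import warnings

import pandas as pd


def check_bins(bins):
    """Hypothetical stand-in for the overlap check quoted above.

    Warns instead of raising when the bins overlap, then returns
    the bins unchanged.
    """
    if isinstance(bins, pd.IntervalIndex) and bins.is_overlapping:
        # Assumed wording and category; the real message is up for discussion.
        warnings.warn(
            "Overlapping IntervalIndex passed as bins; "
            "a value may fall into more than one bin.",
            UserWarning,
            stacklevel=2,
        )
    return bins


overlapping = pd.IntervalIndex.from_tuples([(0, 10), (2, 12), (4, 14)])
disjoint = pd.IntervalIndex.from_tuples([(0, 2), (2, 4), (4, 6)])

check_bins(overlapping)  # emits a UserWarning
check_bins(disjoint)     # silent
```

Only `IntervalIndex.is_overlapping` is taken from pandas here; everything else is scaffolding around it.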

Expected Output

Possibly raise a warning (I am still not sure this is necessary) and return:

[(0, 10], (2, 12], (4, 14], (0, 10], (2, 12], (4, 14]]
Categories (3, interval[int64]): [(0, 10] < (2, 12] < (4, 14]]
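
As long as cut rejects overlapping bins, the same value-to-interval matches can be computed outside of cut. The following is only a sketch: it assumes every interval is closed on the right (the `from_tuples` default), and the variable names and the long-format DataFrame are illustrative choices, not a pandas API:

```python
import numpy as np
import pandas as pd

ii = pd.IntervalIndex.from_tuples([(0, 10), (2, 12), (4, 14)])
values = np.array([5, 6])

# Interval bounds as plain arrays; assumes closed="right", i.e. left < v <= right
left = ii.left.to_numpy()
right = ii.right.to_numpy()

# mask[i, j] is True when values[i] lies inside interval j
mask = (values[:, None] > left) & (values[:, None] <= right)

# One row per (value, matching interval) pair
value_pos, interval_pos = np.nonzero(mask)
matches = pd.DataFrame(
    {"value": values[value_pos], "interval": ii[interval_pos]}
)
print(matches)
```

For this input each value matches all three intervals, so `matches` has six rows. Unlike cut, this yields a long-format table rather than a Categorical, which is usually enough for a subsequent `groupby("interval")`.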

Labels: Enhancement, Interval, Needs Discussion, cut
