New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate IntervalIndex.from_intervals in favor of the IntervalIndex constructor #19263

Closed
jschendel opened this Issue Jan 16, 2018 · 10 comments

Comments

Projects
None yet
5 participants
@jschendel
Member

jschendel commented Jan 16, 2018

Currently, the IntervalIndex constructor and IntervalIndex.from_intervals essentially have the same functionality:

In [2]: ivs = [pd.Interval(0, 1), pd.Interval(2, 3)]

In [3]: pd.IntervalIndex(ivs)
Out[3]:
IntervalIndex([(0, 1], (2, 3]]
              closed='right',
              dtype='interval[int64]')

In [4]: pd.IntervalIndex.from_intervals(ivs)
Out[4]:
IntervalIndex([(0, 1], (2, 3]]
              closed='right',
              dtype='interval[int64]')

However, both are using independent code to do this, which seems redundant. This has also led to two being slightly out of sync, as the IntervalIndex constructor supports a few more parameters (dtype, closed) and has some additional checks that IntervalIndex.from_intervals does not.

Seems like there are a few options to clean things up:

  1. Move all code to the constructor, and redirect from_intervals to the constructor
  2. Move all construction code to from_intervals, and redirect the constructor to from_intervals
    • Would still have some constructor specific behavior, e.g. fastpath
  3. Remove from_intervals entirely, and only use the constructor
  4. Something else?

I'm leaning towards 1, as it looks like that's what eventually happens elsewhere for other from_* methods, e.g. Categorical.from_codes, MultiIndex.from_product. Don't really have a strong opinion though.

@jreback jreback added this to the 0.23.0 milestone Jan 16, 2018

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 16, 2018

lets do option 1 and deprecate

@jschendel jschendel changed the title from Refactor or remove IntervalIndex.from_intervals to Deprecate IntervalIndex.from_intervals in favor of the IntervalIndex constructor Jan 16, 2018

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 17, 2018

+1. A few things become easier if we can get rid of the special cases for fooIndex._constructor, which this would be a step towards.

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Jan 22, 2018

What was the original motivation to add this?

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 22, 2018

@jorisvandenbossche I would ask you the reverse question? why have another method of constructing things.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 23, 2018

@shoyer I know you prefer the from_* but we already have a lot of them.

@shoyer

This comment has been minimized.

Member

shoyer commented Jan 23, 2018

In the original design the IntervalIndex constructor looked like IntervalIndex(left, right, closed='right', name=None, ...), so we needed a separate constructor to create one from a list of Interval objects. At some point, that was changed to the current constructor IntervalIndex(data, closed='right', name=None, ...) and a separate constructor IntervalIndex.from_arrays() was added for constructing from separate arrays.

So I agree that we don't need the separate from_intervals() classmethod now.

I will also add that I'm slightly puzzled by why we have a closed argument in the IntervalIndex constructor now. It seems like that should be set entirely by the data (and/or maybe the dtype).

@jschendel

This comment has been minimized.

Member

jschendel commented Jan 24, 2018

I will also add that I'm slightly puzzled by why we have a closed argument in the IntervalIndex constructor now. It seems like that should be set entirely by the data (and/or maybe the dtype).

Was actually going to open an issue regarding this later today, but will share my thoughts here too.

There are a couple corner cases where closed cannot be inferred from the data, specifically for empty or purely NA data, so specifying closed is necessary to get non-default in these cases:

In [2]: pd.IntervalIndex([], closed='both')
Out[2]:
IntervalIndex([]
              closed='both',
              dtype='interval[int64]')

In [3]: pd.IntervalIndex([np.nan, np.nan], closed='neither')
Out[3]:
IntervalIndex([nan, nan]
              closed='neither',
              dtype='interval[float64]')

This currently isn't supported (it's what the issue I'm planning to open is about), but I'm of the opinion that we should allow users to override the inferred closed if they explicitly provide a different value. This how other parameters behave, like dtype, so it seems reasonable. The current behavior (at least after #19339 is merged) seems a bit inconsistent in terms of what parameters override vs get ignored vs raise:

In [4]: ii = pd.interval_range(0, 3)

In [5]: ii
Out[5]:
IntervalIndex([(0, 1], (1, 2], (2, 3]]
              closed='right',
              dtype='interval[int64]')

In [6]: pd.IntervalIndex(ii, closed='both', dtype='interval[float64]')  # dtype changes, closed ignored
Out[6]:
IntervalIndex([(0.0, 1.0], (1.0, 2.0], (2.0, 3.0]]
              closed='right',
              dtype='interval[float64]')

In [7]: pd.IntervalIndex(ii.values, closed='both', dtype='interval[float64]')  # closed raises
---------------------------------------------------------------------------
ValueError: conflicting values for closed: constructor got 'both', inferred from data 'right'
@shoyer

This comment has been minimized.

Member

shoyer commented Jan 24, 2018

Can we add closed to the IntervalDtype instead?

@jschendel

This comment has been minimized.

Member

jschendel commented Jan 24, 2018

Can we add closed to the IntervalDtype instead?

Adding closed to IntervalDtype makes sense to me, as in my mind closed is inherently part of the dtype. Since differing closed makes two IntervalIndex incompatible, I'd expect their dtypes to not be equal, but they currently are, which seems a little strange:

In [2]: ii1 = pd.interval_range(0, 3, closed='left')

In [3]: ii2 = pd.interval_range(0, 3, closed='right')

In [4]: ii1.dtype == ii2.dtype
Out[4]: True

That being said, I'm not sure it'd be best to remove the closed parameter from the IntervalIndex constructor, even if we add closed to IntervalDtype. Using a closed parameter would be more convenient for users than having them to construct an IntervalDtype, especially if that's the only thing they want to alter. For what it's worth, CategoricalDtype has both categories and ordered parameters, but the CategoricalIndex constructor still takes them as parameters. I'm open to being convinced otherwise though. Will create a new issue for discussion on this topic.

@jschendel

This comment has been minimized.

Member

jschendel commented Jan 24, 2018

xref #19370 (allowing closed to override inferred value)
xref #19371 (adding closed to IntervalDtype)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment