Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Index DeprecationWarning in Series getitem #31361

Merged
merged 8 commits into from
Jan 28, 2020

Conversation

TomAugspurger
Copy link
Contributor

xref #30867

cc @jorisvandenbossche. The alternative is to use a warnings filter inside SingleBlockManger.get_slice to filter the warning, but I think avoiding the warning in the first place is a bit nicer.

@TomAugspurger TomAugspurger added this to the 1.0.0 milestone Jan 27, 2020
@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas labels Jan 27, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

pandas/tests/series/indexing/test_indexing.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

Oh, I got who calls what backwards here... Will need to re-think things...

We don't want to change all the index subclasses __getitem__s to do the _getitem_deprecate_nd version that maybe silences warnings.

@TomAugspurger
Copy link
Contributor Author

Changed to using a warnings filter. Note that this has a perf impact for slicing a Series. ~25% slower

Master

In [9]: s = pd.Series(range(10000))

In [10]: %timeit s[:5]
43.5 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: t = pd.Series([], dtype=float)

In [12]: %timeit t[:]
47.8 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This PR

In [7]: s = pd.Series(range(10000))

In [8]: %timeit s[:5]
55.6 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: t = pd.Series([], dtype=float)

In [10]: %timeit t[:]
64.2 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Is that slowdown acceptable? Right now, I don't think it is, given that

  1. This is just a DeprecationWarning
  2. We'll be deprecating this in the future anyway.

@jbrockmendel
Copy link
Member

jbrockmendel commented Jan 27, 2020

I think there are 2 ways we get to get_slice with non-slice. One of them is L931, and can be avoided be extracting the single-item list after potentially warning in #31333. The other is on L940 and is explicltly an "mpl hackaround", so we can maybe-unpack there.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jan 28, 2020

Thanks @jbrockmendel, that did it. We're able to apply the filter only when using the mpl compat path.

master

In [2]: s = pd.Series(range(10000))

In [3]: %timeit s[:, None]
36.2 µs ± 293 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: t = pd.Series([], dtype=float)

In [5]: %timeit t[:]
52 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit s[:5]
45 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

PR

In [4]: s = pd.Series(range(10000))

In [5]: %timeit s[:, None]
50.2 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: t = pd.Series([], dtype=float)

In [7]: %timeit t[:]
48.4 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit s[:5]
44.3 µs ± 2.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So the first case is the only one with a slowdown, which I think is acceptable given that we want to deprecate that behavior anyway.

@TomAugspurger
Copy link
Contributor Author

Had to xfail a test asserting that this produced a warning. Will want to restore that with one from Series.__getitem__ I think, but not for 1.0

@@ -1505,7 +1505,7 @@ def get_slice(self, slobj, axis=0):
if axis >= self.ndim:
raise IndexError("Requested axis not found in manager")

return type(self)(self._block._slice(slobj), self.index[slobj], fastpath=True)
return type(self)(self._block._slice(slobj), self.index[slobj], fastpath=True,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we dont want the trailing comma here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth rerunning CI over?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can change it in my next "assorted cleanups" PR

@jbrockmendel
Copy link
Member

So the first case is the only one with a slowdown, which I think is acceptable given that we want to deprecate that behavior anyway.

Sounds good

@TomAugspurger TomAugspurger merged commit 0575149 into pandas-dev:master Jan 28, 2020
@TomAugspurger TomAugspurger deleted the ndim-indexing-series branch January 28, 2020 20:55
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 28, 2020
TomAugspurger added a commit that referenced this pull request Jan 28, 2020
…31403)

Co-authored-by: Tom Augspurger <TomAugspurger@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants