Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: avoid is_bool_indexer check where possible #31399

Merged
merged 3 commits into from Feb 4, 2020

Conversation

jbrockmendel
Copy link
Member

xref #30349, not a lot of ground to pick up here, but we can avoid a few calls

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show some timings?

else:
except (KeyError, ValueError):
if isinstance(key, tuple) and isinstance(self.index, MultiIndex):
# kludge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a comment is not really helpful if you don't explain the kludge ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, but this comment is present in the status quo

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make any difference or just moving code around?

pandas/core/series.py Show resolved Hide resolved
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Jan 31, 2020
@jbrockmendel
Copy link
Member Author

does this make any difference or just moving code around?

This should not change any behavior.

prefer you just do a return rather than assign and return

I like doing it in two lines inside try/excepts so i can see in coverage measurements whether the call is ever successful

@jreback
Copy link
Contributor

jreback commented Feb 2, 2020

can you rebase; i am not sure if @jorisvandenbossche had more comments.

@jorisvandenbossche
Copy link
Member

Just my question from above: Can you show some timings?

@jbrockmendel
Copy link
Member Author

Can you show some timings?

It's pretty small (especially since is_scalar and is_iterator were recently optimized)

In [1]: import pandas as pd 
   ...: import numpy as np 
   ...:  
   ...: ser = pd.Series(range(10**5)) 
   ...:  
   ...: indexer = np.random.random(10**5).astype(bool) 
   ...:                                                                                                                                                                                                                                           

In [2]: %timeit ser[indexer]                                                                                                                                                                                                                      
220 µs ± 3.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master
213 µs ± 9.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- PR

@jorisvandenbossche
Copy link
Member

And with a scalar indexer?

@jbrockmendel
Copy link
Member Author

for the scalar case I expect the difference to be zero, or even slower by the amount of a isinstance(self.index, MultiIndex) check

In [4]: %timeit ser[999]                                                        
1.85 µs ± 81.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- master
1.82 µs ± 57.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- PR

@jorisvandenbossche
Copy link
Member

or even slower

That's what I expected as well.

(in general, if you do changes specifically for performance, it's helpful for reviewing that you show some relevant timings)

@jorisvandenbossche jorisvandenbossche merged commit feee467 into pandas-dev:master Feb 4, 2020
@jreback jreback added this to the 1.1 milestone Feb 4, 2020
@jreback
Copy link
Contributor

jreback commented Feb 4, 2020

@jorisvandenbossche pls make sure labels and versions on the issue and PR are good before merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants