Why is DataFrame.loc[[1]] 1,800x slower than df.ix[[1]] and 3,500x slower than df.loc[1]? #9126

Closed
sergeny opened this issue Dec 22, 2014 · 1 comment · Fixed by #9127
Labels
Performance Memory or execution speed performance

sergeny commented Dec 22, 2014

import pandas as pd
s = pd.Series(range(5000000))  # the original Python 2.7 session used xrange
%timeit s.loc[[0]]  # You need pandas 0.15.1 or newer for it to be that slow
1 loops, best of 3: 445 ms per loop
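For anyone reproducing this outside IPython or on Python 3, a rough equivalent of the `%timeit` call above uses the standard-library `timeit` module and keeps the best of three runs, as the IPython output above does. This is a sketch assuming Python 3 and a recent pandas; since this issue was closed by #9127, recent versions should not show the 445 ms figure, so the printed numbers are only useful for comparing the list and scalar lookups with each other. The helper name `best_per_call` is hypothetical.

```python
import timeit

import pandas as pd

# Python 3 version of the report's setup (range replaces Python 2's xrange).
s = pd.Series(range(5_000_000))


def best_per_call(stmt, number=10, repeat=3):
    """Best-of-`repeat` average time per call, similar to IPython's %timeit."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number


list_time = best_per_call(lambda: s.loc[[0]])  # list-based lookup from the report
scalar_time = best_per_call(lambda: s.loc[0])  # scalar lookup of the same label

print(f"s.loc[[0]]: {list_time * 1e3:.3f} ms per call")
print(f"s.loc[0]:   {scalar_time * 1e3:.3f} ms per call")
```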

Also see the question and the answer here: http://stackoverflow.com/questions/27596832/why-is-dataframe-loc1-1-800x-slower-than-df-ix-1-and-3-500x-than-df-loc

Since .loc[] started raising KeyError in pandas 0.15.1, lookups that pass a list, such as .loc[[1]], have slowed down enormously. I think this is because an exhaustive search is done to decide whether a KeyError has to be raised, even when the list has only one element that could be located directly.
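The profile below points at the membership check `idx.isin(ax).any()`. One way to see whether its cost tracks the length of the axis rather than the one-element key is to time that same expression against axes of increasing size. This is a sketch assuming a recent pandas and an integer axis built with `np.arange`; in versions where `isin` hashes the whole axis on every call, the printed times should grow roughly with `len(ax)`, but exact numbers depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# A one-element key, like the list passed in s.loc[[0]].
key = pd.Index([0])

for n in (10_000, 100_000, 1_000_000, 5_000_000):
    ax = pd.Index(np.arange(n))
    # The expression flagged by the profiler: is any key label present in the axis?
    best = min(timeit.repeat(lambda: key.isin(ax).any(), number=3, repeat=3)) / 3
    print(f"len(ax) = {n:>9,}: {best * 1e3:8.3f} ms per check")
```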

Profiling:
File: .../anaconda/lib/python2.7/site-packages/pandas/core/indexing.py,

Line #  Hits  Time    Per Hit   % Time  Line Contents
1278                                    # require at least 1 element in the index
1279    1     241     241.0     0.1     idx = _ensure_index(key)
1280    1     391040  391040.0  99.9    if len(idx) and not idx.isin(ax).any():
1281
1282                                        raise KeyError("None of [%s] are in the [%s]" %
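One way to make this check depend on the size of the key instead of the axis is to look the key up through the axis's own lookup engine with `Index.get_indexer`, which returns -1 for missing labels and, as far as I know, caches its hash table on the Index after the first call. This is a hypothetical sketch of the idea (both helper names are made up), not necessarily what #9127 does. `get_indexer` requires a unique axis, so the sketch falls back to `isin` otherwise.

```python
import timeit

import numpy as np
import pandas as pd


def any_in_axis_isin(key, ax):
    """The check flagged by the profiler: compares the key against the whole axis."""
    return bool(pd.Index(key).isin(ax).any())


def any_in_axis_indexer(key, ax):
    """Hypothetical alternative: reuse the axis's cached lookup engine.

    get_indexer returns -1 for labels that are not found. It only works on a
    unique axis, so fall back to the isin-based check otherwise.
    """
    if not ax.is_unique:
        return any_in_axis_isin(key, ax)
    return bool((ax.get_indexer(pd.Index(key)) != -1).any())


ax = pd.Index(np.arange(5_000_000))
any_in_axis_indexer([0], ax)  # warm-up: lets the axis build and cache its engine once

for check in (any_in_axis_isin, any_in_axis_indexer):
    best = min(timeit.repeat(lambda: check([0], ax), number=5, repeat=3)) / 5
    print(f"{check.__name__}: {best * 1e3:.3f} ms per call")
```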
shoyer added the Performance (Memory or execution speed performance) label on Dec 22, 2014

shoyer commented Dec 22, 2014

Thanks for the report!

There were no performance tests for .loc previously, so this snuck through (in #8003).

#9127 includes a fix.
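A regression like this is what a timing benchmark is meant to catch. As a sketch, here is a benchmark in the style of airspeed velocity (asv), the framework pandas uses for its `asv_bench` suite today: asv calls `setup()` before timing each `time_*` method. The class and method names are hypothetical, and this is not taken from #9127.

```python
import numpy as np
import pandas as pd


class SeriesLocLookup:
    """asv-style benchmark comparing list-based and scalar .loc lookups."""

    def setup(self):
        # Same size as the Series in the report.
        self.s = pd.Series(np.arange(5_000_000))

    def time_loc_list(self):
        # The slow path from the report: a one-element list key.
        self.s.loc[[0]]

    def time_loc_scalar(self):
        # Scalar lookup of the same label, for comparison.
        self.s.loc[0]
```

Comparing `time_loc_list` across commits would then flag a jump like the 445 ms per loop reported above.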
