Why is DataFrame.loc[[1]] 1,800x slower than df.ix [[1]] and 3,500x than df.loc[1]? #9126

sergeny · 2014-12-22T06:14:12Z

import pandas as pd
s=pd.Series(xrange(5000000))
%timeit s.loc[[0]] # You need pandas 0.15.1 or newer for it to be that slow
1 loops, best of 3: 445 ms per loop
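
For readers without IPython or Python 2, here is a rough sketch of the same measurement using the standard-library timeit module. The function name time_loc_variants and its parameters are made up for illustration, and the numbers it reports depend heavily on the pandas version installed, so the 445 ms figure above should not be expected on a recent release.

```python
import timeit

import numpy as np
import pandas as pd


def time_loc_variants(n=5_000_000, number=3):
    """Return average seconds per call for s.loc[[0]] and s.loc[0].

    `n` and `number` are illustrative parameters, not part of the
    original report.
    """
    s = pd.Series(np.arange(n))
    # List key: the case reported as slow in pandas 0.15.1.
    list_key = timeit.timeit(lambda: s.loc[[0]], number=number) / number
    # Scalar key: the fast case used for comparison in the issue title.
    scalar_key = timeit.timeit(lambda: s.loc[0], number=number) / number
    return {"s.loc[[0]]": list_key, "s.loc[0]": scalar_key}


if __name__ == "__main__":
    for label, seconds in time_loc_variants().items():
        print("%-12s %.6f s per call" % (label, seconds))
```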

Also see the question and the answer here: http://stackoverflow.com/questions/27596832/why-is-dataframe-loc1-1-800x-slower-than-df-ix-1-and-3-500x-than-df-loc

Since .loc[] started raising KeyError in pandas 0.15.1, calls that pass a list, such as .loc[[1]], have slowed down enormously. I think this is because an exhaustive search over the whole index is done to decide whether a KeyError has to be raised, even when the list has only one element that can be located easily.

Profiling:
File: .../anaconda/lib/python2.7/site-packages/pandas/core/indexing.py,

1278                                                       # require at least 1 element in the index
1279         1          241    241.0      0.1              idx = _ensure_index(key)
1280         1       391040 391040.0     99.9              if len(idx) and not idx.isin(ax).any():
1281                                           
1282                                                           raise KeyError("None of [%s] are in the [%s]" %
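
A minimal sketch of why the check on line 1280 is expensive, written against a current pandas and Python 3 rather than the 0.15.1 internals. The helper names any_key_present_scan and any_key_present_lookup are hypothetical. The first mirrors the shape of the profiled expression, idx.isin(ax).any(), which has to process every value of the axis to test a one-element key. The second is one possible alternative that looks the key up in the axis instead, and assumes the axis index is unique. It is not claimed to be the change made in #9127.

```python
import numpy as np
import pandas as pd


def any_key_present_scan(key, axis_index):
    """Same shape as the profiled check: work grows with len(axis_index)."""
    idx = pd.Index(key)
    # isin() must consume every value of axis_index to build its lookup set,
    # even when idx holds a single element.
    return bool(len(idx) and idx.isin(axis_index).any())


def any_key_present_lookup(key, axis_index):
    """Hypothetical alternative: look each key up in the axis.

    Assumes axis_index is unique; get_indexer returns -1 for missing labels.
    """
    positions = axis_index.get_indexer(pd.Index(key))
    return bool((positions != -1).any())


if __name__ == "__main__":
    ax = pd.Index(np.arange(5000000))
    print(any_key_present_scan([0], ax), any_key_present_lookup([0], ax))
```

Both helpers answer the same question (is at least one requested label in the axis?), so either could guard the KeyError. They differ only in whether the cost scales with the size of the key or the size of the axis.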


shoyer · 2014-12-22T08:31:45Z

Thanks for the report!

There were no performance tests for .loc previously, so this snuck through (in #8003).

#9127 includes a fix.

shoyer added the Performance label Dec 22, 2014

shoyer mentioned this issue Dec 22, 2014

PERF: fix slow s.loc[[0]] #9127

Merged

shoyer added this to the 0.16.0 milestone Dec 22, 2014

shoyer closed this as completed in #9127 Dec 23, 2014

sergeny mentioned this issue Dec 24, 2014

Poor performance for .loc and .iloc compared to .ix #6683

Closed
