PERF: multi-index selection vs repeated selections #10287

jreback · 2015-06-05T13:12:06Z

From Stack Overflow:

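    import numpy as np
    import pandas as pd

    idx = pd.IndexSlice

    # Build a 10M-row frame: four integer key columns (A-D) and two value columns.
    n = 10000000
    np.random.seed(1234)
    mdt = pd.DataFrame()
    mdt['A'] = np.random.choice(range(10000,45000,1000), n)
    mdt['B'] = np.random.choice(range(10,400), n)
    mdt['C'] = np.random.choice(range(1,150), n)
    mdt['D'] = np.random.choice(range(10000,45000), n)
    mdt['x'] = np.random.choice(range(400), n)
    mdt['y'] = np.random.choice(range(25), n)

    # Centre of the slice taken on each index level.
    test_A = 25000
    test_B = 25
    test_C = 40
    test_D = 35000

    # Half-width of the slice taken on each index level.
    eps_A = 5000
    eps_B = 5
    eps_C = 5
    eps_D = 5000

    # Index on the four keys and sort so that label slicing is valid.
    # (sortlevel() is the API of this era; later pandas versions use sort_index().)
    mdt2 = mdt.set_index(['A','B','C','D']).sortlevel()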

Single selection

    In [106]: %timeit  mdt2.loc[idx[test_A-eps_A:test_A+eps_A,test_B-eps_B:test_B+eps_B,test_C-eps_C:test_C+eps_C,test_D-eps_D:test_D+eps_D],:]
    1 loops, best of 3: 4.34 s per loop
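
For readers unfamiliar with `pd.IndexSlice`: the object passed to `.loc` in the timing above is just a tuple holding one slice per index level. A minimal sketch, with the bounds written out as literals and the `C` level left open purely for illustration:

```python
import pandas as pd

idx = pd.IndexSlice

# Indexing IndexSlice returns whatever was placed between the brackets,
# so a comma-separated list of slices becomes a plain tuple of slice objects.
key = idx[20000:30000, 20:30, :, 30000:40000]

is_tuple = isinstance(key, tuple)
n_levels = len(key)  # one entry per MultiIndex level being addressed
```

`.loc` then has to resolve all four level slices against the full index in one pass, which is the path being timed.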

Repeated selection

    In [105]: %timeit mdt2.loc[idx[test_A-eps_A:test_A+eps_A],:].loc[idx[:,test_B-eps_B:test_B+eps_B],:].loc[idx[:,:,test_C-eps_C:test_C+eps_C],:].loc[idx[:,:,:,test_D-eps_D:test_D+eps_D],:]
    10 loops, best of 3: 140 ms per loop
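
The comparison only makes sense if both forms return the same rows. The sketch below checks that on a small, hypothetical grid (not the 10M-row frame from the report), using `sort_index()`, the current spelling of `sortlevel()`. The names `small`, `lo` and `hi` are assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice

# Hypothetical small grid: every combination of four integer levels 0..9.
index = pd.MultiIndex.from_product([range(10)] * 4, names=['A', 'B', 'C', 'D'])
small = pd.DataFrame({'x': np.arange(len(index))}, index=index).sort_index()

lo, hi = 2, 5  # label slices include both end points

# One selection that constrains all four levels at once.
single = small.loc[idx[lo:hi, lo:hi, lo:hi, lo:hi], :]

# The same constraints applied one level at a time.
chained = (small.loc[idx[lo:hi], :]
                .loc[idx[:, lo:hi], :]
                .loc[idx[:, :, lo:hi], :]
                .loc[idx[:, :, :, lo:hi], :])

rows_match = single.equals(chained)
n_rows = len(single)  # 4 labels kept per level
```

In the chained form, each step works on the already-reduced result of the previous one, whereas the single selection resolves every level against the full index.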


* commit 'v0.16.2-42-g383865f': (72 commits)
  BUG: provide categorical concat always on axis 0, pandas-dev#10430 numpy 1.10 makes this an error for 1-d on axis != 0
  DOC: update missing.rst with ref to groupby.rst
  BUG: Timedeltas with no specified units (and frac) should raise, pandas-dev#10426
  BUG: using .loc[:,column] fails when the object is a multi-index, pandas-dev#10408
  Removed scikit-timeseries migration docs from FAQ
  BUG: GH10395 bug in DataFrame.interpolate with axis=1 and inplace=True
  BUG: GH10392 bug where Table.select_column does not preserve column name
  TST: Use unicode literals in string test
  PERF: fix _get_level_indexer to accept an intermediate indexer result
  PERF: bench for pandas-dev#10287
  BUG: drop_duplicates drops name(s).
  ENH: Enable ExcelWriter to construct in-memory sheets
  BLD: remove support for 3.2, pandas-dev#9118
  PERF: timedelta and datetime64 ops improvements
  PERF: parse timedelta strings in cython pandas-dev#6755
  closes bug in reset_index when index contains NaT
  Check for size=0 before setting item Fixes pandas-dev#10193
  closes bug in apply when function returns categorical
  BUG: frequencies.get_freq_code raises an error against offset with n != 1
  CI: run doc-tests always
  ...

jreback added the Performance (Memory or execution speed performance) and MultiIndex labels Jun 5, 2015

jreback added this to the Next Major Release milestone Jun 5, 2015

jreback mentioned this issue Jun 5, 2015

PERF: improved performance of multiindex slicing #10290

Merged

jreback modified the milestones: 0.16.2, Next Major Release, 0.17.0 Jun 5, 2015

jreback added a commit to jreback/pandas that referenced this issue Jun 22, 2015

PERF: bench for pandas-dev#10287

b069253

jreback closed this as completed in #10290 Jun 24, 2015
