DataFrame constructor speed #621

dieterv77 · 2012-01-12T21:13:05Z

Hi, I was playing around with constructing DataFrames from nested dicts and noticed that things have gotten a bit slower since v0.6.1.

Here's some sample code I was playing with:

import time
import pandas

print pandas.__version__

data = dict((i,dict((j,float(j)) for j in xrange(100))) for i in xrange(5000))
t0 = time.time(); df = pandas.DataFrame(data); t1 = time.time(); print t1 - t0

With version 0.6.1, the printed time is about 0.21s on my machine. With a little help from git bisect,
I found that:

commit f3ca67d takes it from 0.21s to 0.44s
commit 9d65e8e takes it from 0.44s to 0.54s

It's possible that some of these slowdowns were unavoidable, since the commits may have been necessary bugfixes, but I wanted to
see if anyone else is seeing this too.

Environment info: 64-bit Ubuntu 11.10, Python 2.7, NumPy 1.6.1, Cython 0.15.1
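
The timing snippet above is Python 2. For anyone trying to reproduce it on Python 3, a rough equivalent using the standard-library timeit module might look like the sketch below. The helper names make_nested_dict and best_of are made up for illustration and are not part of pandas; the data shape (5000 outer keys, 100 inner keys) is taken from the snippet above.

```python
import timeit


def make_nested_dict(n_outer=5000, n_inner=100):
    """Build the same shape of data as the report: {int: {int: float}}."""
    return {i: {j: float(j) for j in range(n_inner)} for i in range(n_outer)}


def best_of(func, repeat=3, number=1):
    """Return the best per-call time in seconds over `repeat` runs."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not installed; skipping the benchmark")
    else:
        data = make_nested_dict()
        print(pandas.__version__)
        # Time only the constructor call, not the dict construction.
        print("%.3f s" % best_of(lambda: pandas.DataFrame(data)))
```

Taking the minimum over several runs reduces noise from other processes, which matters when comparing timings across commits during a bisect.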

wesm · 2012-01-12T21:31:00Z

Well rats, and you see in those commits I totally thought I was making things faster! I see the issue and I'm going to address it now and add a vbenchmark (http://pandas.sourceforge.net/vbench.html) so we can track the performance more systematically going forward.

wesm · 2012-01-12T21:52:54Z

OK I fixed things up and even made things a little faster.

before (3ed22d7):

In [3]: timeit df = DataFrame(data)
1 loops, best of 3: 690 ms per loop

after (HEAD):

In [3]: timeit df = DataFrame(data)
10 loops, best of 3: 167 ms per loop

and 0.6.1:

In [3]: timeit df = DataFrame(data)
1 loops, best of 3: 273 ms per loop

Note that this problem only affected integer-indexed data. The issue, if you're interested, had to do with boxing of int64 scalars (from the Index when doing dict lookups).
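
To get a feel for the general effect (this is only an illustration, not the actual pandas code path or the fix in the commit), the sketch below compares dict lookups keyed by NumPy int64 scalar objects against lookups keyed by built-in Python ints. The names lookup, boxed_keys, plain_keys and fetch_all are hypothetical, and it assumes NumPy is installed.

```python
import timeit

import numpy as np

n = 5000
lookup = {i: float(i) for i in range(n)}  # dict keyed by plain Python ints

arr = np.arange(n, dtype=np.int64)
boxed_keys = list(arr)      # each element is a NumPy int64 scalar object
plain_keys = arr.tolist()   # each element is a built-in Python int


def fetch_all(keys):
    """Look up every key in the dict and collect the values."""
    return [lookup[k] for k in keys]


# Both key types hash and compare equal, so the results are identical;
# only the per-lookup cost can differ.
assert fetch_all(boxed_keys) == fetch_all(plain_keys)

if __name__ == "__main__":
    for name, keys in [("int64 scalars", boxed_keys), ("Python ints", plain_keys)]:
        best = min(timeit.repeat(lambda: fetch_all(keys), repeat=3, number=100))
        print("%-14s %.4f s per 100 passes" % (name, best))
```

On many setups the int64-scalar pass is slower, but the size of the gap depends on the NumPy and Python versions, so it is worth measuring rather than assuming.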

wesm added a commit that referenced this issue Jan 12, 2012

BUG/ENH: fix performance regression in DataFrame constructor from nested dict with integer indexes, add vbench for it, speed up _stack_dict in internals, GH #621

79cc4e0

wesm closed this as completed Jan 12, 2012
