Hi, I was playing around with constructing DataFrames from nested dicts, and noticed that things have gotten a bit slower since v0.6.1.
Here's some sample code I was playing with:
import time
import pandas
print pandas.__version__
data = dict((i,dict((j,float(j)) for j in xrange(100))) for i in xrange(5000))
t0 = time.time(); df = pandas.DataFrame(data); t1 = time.time(); print t1 - t0
With version 0.6.1, the printed time is about 0.21s on my machine. With a little help from git bisect,
I found that:
commit f3ca67d takes it from 0.21s to 0.44s
commit 9d65e8e takes it from 0.44s to 0.54s
It's possible that some of these slowdowns were unavoidable, since the commits may have been necessary bugfixes, but I wanted to
see if anyone else is seeing this too.
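A single time.time() measurement like the one above can be noisy, so here is a minimal sketch of a more repeatable version of the same benchmark. It is written for Python 3 (the snippet above is Python 2), and the helper names build_nested and best_time are made up for illustration; they are not part of pandas.

```python
import timeit


def build_nested(n_outer=5000, n_inner=100):
    """Build the same nested dict shape as in the report: {i: {j: float(j)}}."""
    return {i: {j: float(j) for j in range(n_inner)} for i in range(n_outer)}


def best_time(func, repeat=3, number=1):
    """Return the best (minimum) per-call time in seconds over several runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not installed; skipping the DataFrame timing")
    else:
        data = build_nested()
        print(pandas.__version__)
        # Time only the DataFrame construction, not the dict building.
        print(best_time(lambda: pandas.DataFrame(data)))
```

Taking the minimum over a few repeats is the same idea as the "best of 3" that IPython's timeit reports, which makes numbers from different commits easier to compare.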
Well rats, and you see in those commits I totally thought I was making things faster! I see the issue and I'm going to address it now and add a vbenchmark (http://pandas.sourceforge.net/vbench.html) so we can track the performance more systematically going forward.
before:
In [3]: timeit df = DataFrame(data)
1 loops, best of 3: 690 ms per loop
after (HEAD):
In [3]: timeit df = DataFrame(data)
10 loops, best of 3: 167 ms per loop
and 0.6.1:
In [3]: timeit df = DataFrame(data)
1 loops, best of 3: 273 ms per loop
Note this problem only affected integer-indexed data. The issue, if you're interested, had to do with boxing of int64 scalars (from the Index when doing dict lookups).
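For anyone curious what that kind of overhead looks like, here is a rough illustration. It is not the actual pandas code path, just a sketch assuming NumPy on Python 3: iterating over an int64 array creates a NumPy scalar object per element, and using those as dict keys tends to be slower than using native Python ints. The names column, index_values, lookup_boxed and lookup_python_ints are made up for this example.

```python
import timeit

import numpy as np

# Hypothetical data: one inner dict keyed by plain Python ints, as in the report.
column = {j: float(j) for j in range(100)}
index_values = np.arange(100, dtype=np.int64)


def lookup_boxed():
    # Iterating the array yields np.int64 scalar objects (one boxed per element).
    return [column[k] for k in index_values]


def lookup_python_ints():
    # tolist() converts the whole array to native Python ints in one pass.
    return [column[k] for k in index_values.tolist()]


if __name__ == "__main__":
    boxed = min(timeit.repeat(lookup_boxed, repeat=3, number=1000))
    native = min(timeit.repeat(lookup_python_ints, repeat=3, number=1000))
    print("np.int64 keys:   %.4f s" % boxed)
    print("python int keys: %.4f s" % native)
```

Both functions return the same values; only the type of the lookup key differs, so any gap in the timings comes from creating and hashing the NumPy scalars.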
Environment info for the original report: 64-bit Ubuntu 11.10, Python 2.7, NumPy 1.6.1, Cython 0.15.1