PERF: speed-up DataFrame.itertuples() with namedtuples #11625
you would have to add a benchmark for this

As far as I can see,
I disagree, to me this is exactly what the
|
jreback and 1 other commented on an outdated diff Nov 17, 2015
         pass
+    else:
jreback
Contributor
|
|
Here are some simple timings:

import collections
import pandas as pd
from pandas.compat import map, zip

class DataFrame(pd.DataFrame):

    def itertuples_new(self, index=True, name="Pandas"):
        (...)
            else:
                return (itertuple(*row) for row in zip(*arrays))
        # fallback to regular tuples
        return zip(*arrays)

    def itertuples_make(self, index=True, name="Pandas"):
        (...)
            else:
                return map(itertuple._make, zip(*arrays))
        # fallback to regular tuples
        return zip(*arrays)

df = DataFrame({'A': 'spam', 'B': range(1000), 'C': None,
                'D': range(1000), 'E': range(1000), 'F': range(1000)})

%timeit list(df.itertuples_new())
100 loops, best of 3: 3.04 ms per loop

%timeit list(df.itertuples_make())
100 loops, best of 3: 2.68 ms per loop

%timeit list(df.itertuples_make(name=None))
1000 loops, best of 3: 1.17 ms per loop
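For anyone who wants to reproduce the gap without pandas, here is a minimal stdlib-only sketch of the two construction styles timed above. The names `Row`, `columns`, `build_star` and `build_make` are made up for illustration and are not part of the PR, and absolute numbers will differ by machine and Python version.

```python
import collections
import timeit

# Hypothetical stand-in for the namedtuple class built inside itertuples().
Row = collections.namedtuple("Row", ["Index", "A", "B", "C"])

# Column-wise data, analogous to the `arrays` list in the snippet above.
columns = [range(1000), ["spam"] * 1000, range(1000), [None] * 1000]


def build_star():
    # Unpack each row into the namedtuple constructor.
    return [Row(*row) for row in zip(*columns)]


def build_make():
    # Hand each row tuple to the _make classmethod via map().
    return list(map(Row._make, zip(*columns)))


# Both styles must produce identical rows.
assert build_star() == build_make()

if __name__ == "__main__":
    for func in (build_star, build_make):
        best = min(timeit.repeat(func, number=100, repeat=3))
        print("%s: %.3f ms per call" % (func.__name__, best / 100 * 1e3))
```

The sketch only measures namedtuple construction, so it says nothing about the cost of extracting the column arrays from a real frame.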
|
pls add a benchmark to the asv suite (make about 10x bigger).
jreback added the Performance label Nov 17, 2015
$ asv continuous master HEAD -b frame_methods.frame_itertuples
· Creating environments
· Discovering benchmarks
·· Uninstalling from py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing into py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[ 0.00%] · For pandas commit hash 2238f73e:
[ 0.00%] ·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 0.00%] ·· Benchmarking py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running ...thods.frame_itertuples.time_frame_itertuples 10.03ms
[ 50.00%] · For pandas commit hash e29bf614:
[ 50.00%] ·· Building for py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ·· Benchmarking py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[100.00%] ··· Running ...thods.frame_itertuples.time_frame_itertuples 12.24ms
BENCHMARKS NOT SIGNIFICANTLY CHANGED.
jreback commented on an outdated diff Nov 17, 2015
|
jreback and 3 others commented on an outdated diff Nov 17, 2015
@@ -5545,6 +5545,8 @@ def test_itertuples(self):
         dfaa = df[['a', 'a']]
         self.assertEqual(list(dfaa.itertuples()), [(0, 1, 1), (1, 2, 2), (2, 3, 3)])
+        self.assertEqual(type(next(df.itertuples(name=None))), tuple)
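A note on why the added line checks the type rather than the value: a namedtuple compares equal to a plain tuple holding the same values, so the existing assertion on the duplicated columns passes with either return style. The stdlib-only sketch below illustrates this. The class name `Pandas` mirrors the default `name` argument, while the field list and the use of `rename=True` are assumptions about what itertuples() builds internally.

```python
import collections

# rename=True replaces invalid or duplicate field names with positional ones.
Pandas = collections.namedtuple("Pandas", ["Index", "a", "a"], rename=True)

row = Pandas(0, 1, 1)

# Equality ignores the namedtuple type, so comparing against plain tuples passes.
equal_to_plain = (row == (0, 1, 1))

# Only an exact type check distinguishes the two return styles.
is_plain_tuple = type(row) is tuple
```

Here the second `a` field ends up named `_2`, which is how duplicated column labels can still yield a valid namedtuple class.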
jreback
Contributor
|
|
@xflr6 pls just follow my directions. It has nothing to do with whether I like it or not. It's not consistent at all in the code base.
|
Performance comparison with the regular tuple returning branch:
|
|
ok, add this issue number onto where #11269 is in whatsnew/v0.17.0
|
There is no issue for this (only the PR), should I open one? |
|
no, use the PR number
jreback added this to the 0.17.1 milestone Nov 19, 2015
|
merged via 4ffc3ef, thanks!
xflr6 commented Nov 17, 2015
Also:
no bare except:, to avoid catching SystemExit and KeyboardInterrupt
move the return from the try-clause to an else
test name=None (docs?)
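The points above can be sketched roughly as follows. This is a simplified stdlib-only reading of the change, not the merged pandas code: the function name `itertuples_sketch` and its `arrays`/`fields` parameters are hypothetical, and `rename=True` is an assumption about how the namedtuple class is created.

```python
import collections


def itertuples_sketch(arrays, fields, name="Pandas"):
    """Iterate over rows of column-wise `arrays`, as namedtuples when possible."""
    if name is not None:
        try:
            itertuple = collections.namedtuple(name, fields, rename=True)
        except Exception:
            # Unlike a bare `except:`, this lets SystemExit and
            # KeyboardInterrupt propagate.
            pass
        else:
            # Runs only if the class was created, which keeps the
            # try body down to the single call that may fail.
            return map(itertuple._make, zip(*arrays))
    # fallback to regular tuples
    return zip(*arrays)
```

With `name=None` the namedtuple branch is skipped entirely, and an invalid type name such as `"1bad"` falls through to plain tuples instead of raising.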