Plotting performance deterioration on DataFrames with date-index #4705
The code sample below shows a big drop in plotting performance in pandas 0.12 compared to pandas 0.11:
In pandas 0.12: "Ran in 0:00:29.542475 secs"
It only happens on a date-indexed DataFrame:
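```python
from datetime import datetime
from numpy.random import randn
from pandas import DataFrame, date_range
N, M = 10000, 10  # M (number of columns) is not given in the snippet as posted; 10 is an assumed value
df = DataFrame(randn(N, M), index=date_range('1/1/1975', periods=N))
t0 = datetime.now()
df.plot()  # the plotting call being timed
print("Ran in %s secs" % (datetime.now() - t0))
```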
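The snippet times a single call with `datetime.now()`. When comparing two pandas versions on the same machine, taking the best of several runs with `time.perf_counter` tends to give less noisy numbers. The sketch below is a generic helper written for illustration only (`best_of` is a hypothetical name, not a pandas function), demonstrated on a stand-in workload so it runs without pandas or matplotlib installed.

```python
import time


def best_of(func, *args, repeat=5, **kwargs):
    """Call func(*args, **kwargs) `repeat` times and return the fastest wall time in seconds.

    Taking the minimum reduces noise from other processes, which helps when
    comparing two library versions on the same machine.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload (hypothetical); for this issue one would time df.plot
    # under pandas 0.11 and again under pandas 0.12.
    elapsed = best_of(sum, range(1_000_000), repeat=3)
    print("Ran in %.3f secs" % elapsed)
```

Because each `df.plot()` call opens a new figure, it may be worth closing figures between runs (for example with `matplotlib.pyplot.close('all')`) so they do not accumulate and skew later timings.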
As I also wrote on the mailing list, I can confirm this. For me it takes 35 s (0.12) vs 9 s (0.11) on Windows 7 (Matplotlib 1.2.1 in both cases).
You can see it here (together with the result of %prun):
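Outside IPython, output similar to `%prun` can be obtained with the standard library's `cProfile` and `pstats` modules. The helper below is a minimal sketch (`profile_call` is a hypothetical name): it profiles one call and returns the sorted report as text. A stand-in workload is used so the block runs without pandas; for this issue one would pass `df.plot` instead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by="cumulative", limit=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report_text).

    The report lists the `limit` most expensive functions, sorted by `sort_by`,
    roughly like what IPython's %prun prints.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats(sort_by).print_stats(limit)
    return result, buf.getvalue()


if __name__ == "__main__":
    # Stand-in workload (hypothetical); e.g. profile_call(df.plot) for the issue's case.
    _, report = profile_call(sorted, list(range(100000, 0, -1)))
    print(report)
```

Sorting by cumulative time puts the top-level calls that dominate the runtime first, which makes it easier to see which part of the plotting path got slower between versions.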
I looked into it a little. I am not an expert at all, but I thought I would share some insights in case they are useful:
After another look, I might have figured it out. Short version: with this commit (jorisvandenbossche@d63e77c), the time goes down from 30 s to only 210 ms!
In case you think this is a sensible change, I put it in a PR (Travis passes in any case, but I don't know how well this is tested).