Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: faster plotting of timeseries with PeriodIndex #4722

Merged
merged 2 commits into from
Sep 7, 2013

Conversation

jorisvandenbossche
Copy link
Member

See discussion in #4705.

This improves the speed of plotting timeseries with PeriodIndex (or with DatetimeIndex that is converted to a PeriodIndex during plotting) drastically: from 30 s in 0.12 (but 7 s in 0.11) to 0.2 s with this commit for the plotting of following data:

N = 10000
M = 25
df = pd.DataFrame(np.random.randn(N,M), index=pd.date_range('1/1/1975', periods=N))
df.plot()

Some questions I still have:

  • I added a check for PeriodIndex, but I left the original if isinstance(values, Index) in place. Is this needed? Is it possible that another index than PeriodIndex ends up in PeriodConverter?
  • @jreback You asked in the issue to do some tests (to at least ensure that their is a valid plot). Do you just mean by plotting it and looking if it seems OK? This seems OK. Or can you do some tests for plotting in the testing infrastructure?
  • Would it be interesting to add a performance test for this case? Because it was a big decrease in speed with 0.12 which was not detected.

@jreback
Copy link
Contributor

jreback commented Sep 1, 2013

@jreback
Copy link
Contributor

jreback commented Sep 1, 2013

@jorisvandenbossche I think you could try to create a vbench for this, nothing there now but create a file tlike this: (named say plotting or graphics): https://github.com/pydata/pandas/blob/master/vb_suite/ctors.py, and add it to the suite.py.

test using:

test_perf.sh -b <commit prior> -t <your last commit>

(lots of options to test_perf, e.g. you might want to start all of the tests with plot_, then you can add the
-r plot optino to ONLY test thoses specific vbenches

@jtratner
Copy link
Contributor

jtratner commented Sep 1, 2013

@jorisvandenbossche matplotlib actually tests this by comparing images. I don't know whether it's usable in client packages though.

@jorisvandenbossche
Copy link
Member Author

@jreback and @jtratner Thanks for the comments. Adding tests with comparing images does lead a little bit too far for this PR (and for me) I think (that is another discussion), but I will look at adding it to the current tests (the _ckeck_plot_works) and to add something to the vbench suite.

What is actually the difference between pandas/bench and pandas/vb_suite?

@jtratner
Copy link
Contributor

jtratner commented Sep 1, 2013

vb_suite tests pandas performance against itself - used for testing that changes don't introduce performance regressions. bench - pandas vs. other software.

On testing images - that seems fine to me, don't know what @jreback thinks.

@jorisvandenbossche
Copy link
Member Author

I added something to the vbench suite (see commit). This is the output I get when I run it with test_perf.sh ... -r plot:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
plot_timeseries_period                       | 267.8120 | 21528.0979 |   0.0124 |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [7c4866b] : PERF: add plotting to vbench
Base   [c472099] : Merge pull request #4613 from jtratner/refactor-datetimes

The base is now taking 21 s to run (the target 0.26 s), should I decrease the size of the dataframe so it runs faster?

@jtratner
Copy link
Contributor

jtratner commented Sep 1, 2013

If it's too fast (i.e., too small of a dataframe), it makes it harder to tell if there are actually differences...@jreback? @y-p?

@jreback
Copy link
Contributor

jreback commented Sep 1, 2013

prob make a bit smaller so that the older/slower one doesn't take so long
you will still see a similar ratio

@jorisvandenbossche
Copy link
Member Author

OK, I added a test (actually there was already a PeriodSeries defined, but not used in a test) and changed the dimensions of the dataframe in the benchmark (it now runs in 0.9 s for base and 0.07 s for target)

Should something be added to release notes? (it is not a bug fix, so maybe not?)

@hayd
Copy link
Contributor

hayd commented Sep 5, 2013

Sure it's a bug-fix (it fixes 4705) so you should add to release.rst, also mention you've added it to vbench. :)

@jorisvandenbossche
Copy link
Member Author

Oeps, git-newbie here. I wanted to finish this up, but while rebasing interactive to squash some commits I also picked the last three commits from @jreback

How can I remove them from this PR? (or is it not a problem for the history that it is included here in the overview?)

@jreback
Copy link
Contributor

jreback commented Sep 7, 2013

@jorisvandenbossche

rebase again to 1 commit prior to yours and force push

@jorisvandenbossche
Copy link
Member Author

@jreback What do you mean exactly? I dit git rebase -i HEAD~2 and a forced push, but your commits are still here.

@jreback
Copy link
Contributor

jreback commented Sep 7, 2013

I think this might work git rebase -i origin/master

@cpcloud
Copy link
Member

cpcloud commented Sep 7, 2013

@jorisvandenbossche @jreback's cmd will only work if your origin/master is up to date with upstream/master....

better to do

git remote add upstream git://github.com/pydata/pandas.git
git fetch upstream
git rebase upstream/master

@jreback
Copy link
Contributor

jreback commented Sep 7, 2013

@cpcloud just about to ask! you have git-fu.

I think I solved this once by

git checkout master
git pull 
git checkout -b new_branch --track master
git cherry_pick old_branch_commits (one by one)

but a little kludgy

@jorisvandenbossche
Copy link
Member Author

OK, I also just did a simple git rebase upstream/master, and this indeed seems to fixed it!

@cpcloud
Copy link
Member

cpcloud commented Sep 7, 2013

great!

@cpcloud
Copy link
Member

cpcloud commented Sep 7, 2013

@jorisvandenbossche you might want to read this wiki section. it gives a brief overview of rebase and merge

@jorisvandenbossche
Copy link
Member Author

@cpcloud Yeah, the basic rebase and merge I (think I) understand now, but it was because I picked too many commits with a rebase -i HEAD~4 to squash some commits, that I 'recommited' some commits from jreback ..

I don't know there is something else that I have to do for this PR, but for me this is ready to merge.

@cpcloud
Copy link
Member

cpcloud commented Sep 7, 2013

also @jorisvandenbossche if you ever run into this problem again one way to resolve it is this:

let's say you're 2 commits ahead of upstream/master

git fetch upstream  # don't forget this otherwise the next line will put you in a different state than current master!
git reset HEAD~2
git stash  # push on to the stash stack
git reset --hard upstream/master  # get back on master
git stash pop  # pop off the stack
git add .
git commit -m'my aw-sum commit mess-ige'

@cpcloud
Copy link
Member

cpcloud commented Sep 7, 2013

@jreback how's that for kludgy? 😄

@jorisvandenbossche
Copy link
Member Author

I think I prefer the git fetch upstream, git rebase upstream/master then :-)

@jreback
Copy link
Contributor

jreback commented Sep 7, 2013

@cpcloud and I thought mine was kludgy!

@jorisvandenbossche
Copy link
Member Author

But back tot the PR, anything else I have to do? For me it is ready to merge.

@cpcloud
Copy link
Member

cpcloud commented Sep 7, 2013

looks okay on my end

jreback added a commit that referenced this pull request Sep 7, 2013
PERF: faster plotting of timeseries with PeriodIndex
@jreback jreback merged commit a8e367b into pandas-dev:master Sep 7, 2013
@jreback
Copy link
Contributor

jreback commented Sep 7, 2013

@jorisvandenbossche thanks much!

@jorisvandenbossche
Copy link
Member Author

@jreback Enjoying that I can help!

@jtratner
Copy link
Contributor

jtratner commented Sep 7, 2013

or just use git reflog and go back to the last time the branch was good :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants