
ENH: Add vbench #1070

Closed

wants to merge 1 commit

Conversation

TomAugspurger
Contributor

Closes #936

Still a WIP atm. This is just the skeleton and I've got a few details to iron out over the next couple of days.

Do we know what kind of machine this will be running on? I can test in a Windows VM if need be; I just need to set up a Python environment.

I'm keeping some docs too.

Here's an example of the output!

[screenshot: example output for the ols benchmark]

@TomAugspurger
Contributor Author

I should re-emphasize that this is still very much a WIP. It's probably not worth reviewing yet. I just wanted to get it onto GH.

@TomAugspurger
Contributor Author

Are there any questions you anticipate having that I should address while I'm writing up the developer docs for this? Right now I have headings for

Writing a good vbench
Pre-PR testing

which I think will suffice for most cases. I'll also write something up for

Running the full suite

which most people won't have to do. And then some notes on the implementation in case people need to change it down the line. Perhaps I can write up some vbench docs and push them to the vbench repo.

@vincentarelbundock
Contributor

So just to be clear, the process is: we need a computer that checks git periodically and runs the vbench suite every time there's a commit. Then we upload the results and graphs to a website somewhere, right?

I have a couple of always-on machines that could be used for that, long-term or just for testing. Both run relatively recent versions of Ubuntu.

@josef-pkt
Member

@TomAugspurger Thanks again for working on this.

I only gave it a quick browse.

If you can link to documentation in other packages, then our documentation could be pretty short.

Some questions, besides the general "How do you run it?", that I thought about:

  • How do we add benchmarks? Should they have a loop to run the same code several times (e.g. 10 times OLS(..).fit())?
  • Do we add conditional code directly in the benchmark if the API has changed?
  • Can we make selective runs, in addition to the scheduled runs? Selective either in terms of additional commit points, or in terms of which benchmarks are run.
  • Related: in the test suite we have some unit tests marked as slow. Is it possible to run basic benchmarks at a higher frequency than slow benchmarks? Or, put another way: can we define groups/sets of benchmarks that can be run on demand?

I'm trying to figure out how we can handle benchmarks for the different submodules.
We will want to run the core models (linear models, discrete models, RLM, GLM, and the tsa models) on a regular basis, and as soon as it is working (plus the formula and pandas versions).
nonparametric and emplike are largely isolated from the other parts but are the most time-consuming to run.
And then there are many smaller functions (stats, distributions) where it is currently less important to benchmark them.

To the question by @vincentarelbundock: should we really run on every commit, or periodically?
Every commit sounds like a lot of processing, when often there will only be changes that affect a small part.

@vincentarelbundock
Contributor

Oh yeah, well the cron job can check GitHub for new commits once a day or once a week. It doesn't have to be every second...

@TomAugspurger
Contributor Author

There are actually two main ways to use vbench. I'll talk to the pandas people to get a better idea of how they use it for the long-term testing. But the thing most contributors will use is the test_perf.py file (which I haven't added yet). This is a command-line tool that lets you compare your commit against a known baseline and gives you a list of the benchmarks that differ. You'll get a nice output like

frame_reindex_axis0                         0.6189     1.8109     0.3418
frame_reindex_axis0                         1.0079     1.2519     0.8051
frame_reindex_axis0                         0.5227     0.6118     0.8543
frame_reindex_axis0                         0.4115     0.4681     0.8792
frame_reindex_axis0                         0.4141     0.4454     0.9296
frame_reindex_axis0                         0.5567     0.5934     0.9382
frame_reindex_axis0                         0.4374     0.4661     0.9385
frame_reindex_axis0                         0.4723     0.5015     0.9417
frame_reindex_axis0                         0.4407     0.4583     0.9616

where the columns are head (your PR), base, and the ratio.
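
Not part of test_perf.py, just a hedged illustration of how to read those columns: the ratio is the head timing divided by the base timing, so values below 1 mean the PR is faster and values above 1 point at a slowdown. The threshold below is hypothetical.

    # illustration only: how the ratio column relates to the two timings
    def perf_line(name, head_ms, base_ms, threshold=1.25):
        """Format a line like the output above and flag possible regressions.

        `threshold` is a made-up cutoff; test_perf.py's actual reporting may differ.
        """
        ratio = head_ms / base_ms
        flag = "  <- possible regression" if ratio > threshold else ""
        return "%-40s %10.4f %10.4f %10.4f%s" % (name, head_ms, base_ms, ratio, flag)

    print(perf_line("frame_reindex_axis0", 0.6189, 1.8109))  # ratio 0.3418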

@vincentarelbundock
Contributor

It would be nice if the initial run could be taken at different points in the commit history, so we can "backfill" the benchmark history. :)

@TomAugspurger
Contributor Author

@vincentarelbundock If I'm understanding you correctly, then that should be possible.

@josef-pkt I'll get those answered more formally later, but for now:

How do we add benchmarks?

Basically you write a module (a single .py file) for a related suite of tests. Each of those tests is run once, either for each commit in a specified date range, or just as a diff between your current commit and some known-good baseline.
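
For a concrete picture of what such a module might contain, here is a rough sketch in the style of pandas' vb_suite modules; the import path, the Benchmark arguments, and the setup snippet are from memory of vbench's API and are not taken from this PR, so treat them as assumptions:

    # hypothetical vb_suite module, e.g. vb_suite/linear_model.py
    from datetime import datetime
    from vbench.benchmark import Benchmark

    # setup is passed as a string and executed once before the timed statement
    common_setup = """
    import numpy as np
    import statsmodels.api as sm

    np.random.seed(12345)
    exog = sm.add_constant(np.random.randn(1000, 5))
    endog = exog.sum(1) + np.random.randn(1000)
    """

    # the first argument is the statement that vbench times;
    # start_date controls how far back the scheduled runs go
    ols_fit = Benchmark("sm.OLS(endog, exog).fit()", common_setup,
                        name="ols_fit",
                        start_date=datetime(2013, 1, 1))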

Do we add conditional code directly in the benchmark, if the API has changed?

Yep. There are a few examples in the pandas repo that do this. I'll link to them.
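
As a hedged sketch of what that kind of conditional code can look like (hypothetical, not one of the pandas examples; it reuses the `Benchmark` class imported in the sketch above), the setup string simply guards the import that changed:

    # a setup string that tolerates an API/namespace change between revisions
    compat_setup = """
    import numpy as np
    try:
        from statsmodels.api import OLS
    except ImportError:
        # hypothetical fallback for old revisions that used the scikits namespace
        from scikits.statsmodels.api import OLS

    np.random.seed(12345)
    exog = np.column_stack([np.ones(1000), np.random.randn(1000, 5)])
    endog = exog.sum(1) + np.random.randn(1000)
    """

    ols_fit_compat = Benchmark("OLS(endog, exog).fit()", compat_setup,
                               name="ols_fit_compat")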

Can we make selective runs, additional to the scheduled runs?

Yes. This will be the test_perf.py file.

I'll check about running subsets. I think it's possible.

@TomAugspurger
Contributor Author

Should I add a new vbench file to docs/source/dev for the notes on this? Or should I stick them into docs/source/dev/test_notes.rst?

Getting close on this. One thing is a bit weird, and I want to compare it to the pandas results before saying this is ready. But I'm hitting some error when running the pandas vbench, so I'm waiting on that.

@TomAugspurger
Contributor Author

Okay this is probably about ready to go. I typed up some notes that will hopefully clear some questions up. I might clean them up and submit them to the vbench repo, but this should do for now.

One question for now. Each benchmark takes a start_date argument for how far back you want to go. I've got it set to just a few months ago for testing, but we'll want to push that back. Any idea how far you want to go back?

And obviously we need to expand the coverage. I basically just took the discrete examples and converted them into benchmarks. I can probably add more later, but there's no reason this can't be merged before I get around to that. And that way anyone who wants to can throw one in.

Anyone mind checking out my branch and giving it a shot? You just need a config file; see the notes I wrote up.

@TomAugspurger
Contributor Author

For future reference: wesm/vbench#34

@TomAugspurger
Contributor Author

There's a final bit that I haven't implemented yet. pandas and NumPy have webpages with the performance benchmarks. There are makefiles, which I haven't added yet, that automate the job. Any interest?

@josef-pkt
Member

I wanted to look at it today, but didn't find time yet.

A makefile for creating the HTML would be needed (maybe you have it already). The part that pushes the results automatically to a webpage won't be necessary until we have decided how to publish them.

@TomAugspurger
Contributor Author

No rush on my end. The steps to make the docs are

  1. run the suite :) python run_suite.py
  2. Generate the rst files: python generate_rst.py
  3. Make the html: python make.py html

All from the statsmodels/vb_suite/ directory.

@TomAugspurger
Contributor Author

The things we'll need to change are in make.py, specifically the function upload() and the dict funcd that I've commented out for now.

It also expects config and credentials files.
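
For anyone setting this up, a hedged sketch of what reading such a config could look like: the `[setup]` section and `tmp_dir` option mirror the excerpt quoted in the review comment further down, while the file name and the Python 2 import fallback are assumptions of mine, not taken from the PR:

    # hypothetical: read the vb_suite configuration used by suite.py/make.py
    try:
        from ConfigParser import ConfigParser   # Python 2, current at the time
    except ImportError:
        from configparser import ConfigParser   # Python 3

    config = ConfigParser()
    config.read("benchmarks.cfg")                 # file name is an assumption
    tmp_dir = config.get("setup", "tmp_dir")      # matches the excerpt below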

@vincentarelbundock
Contributor

I'll give this a serious look next week if nobody has gotten around to it by then. Sorry I can't do it any sooner!

@TomAugspurger
Contributor Author

Thanks! No rush.

-Tom

[Review thread on the following excerpt from the vb_suite configuration:]

        TMP_DIR = config.get('setup', 'tmp_dir')
    except:
        REPO_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__), "../"))
        REPO_URL = 'git@github.com:statsmodels/statsmodels.git'
Contributor

On my machine, I had to change git@ to a URL: https://github.com/statsmodels/statsmodels.git

Contributor Author

Thanks. I forgot to check the except part of that block.

@vincentarelbundock
Contributor

Build works well for me. I uploaded it to my website in case @jseabold and @josef-pkt want to have a look.

http://umich.edu/~varel/vbench

This looks really great and it's easy to use. Thanks a lot Tom!

A couple things:

  • We probably need a very short blurb just to explain what this is
  • I like the folding TOC on the right-hand side, so perhaps we don't need to duplicate it with the ugly link hierarchy in the middle
  • If benchmark results are saved locally to benchmark.db, then I think we should go way back in time with benchmarks, like a year or two. Subsequent runs will be much cheaper than the first.

@vincentarelbundock
Contributor

Also, can we include a download link to the raw benchmark database?

@TomAugspurger
Contributor Author

Re: sharing the raw database, I'll have to check how vbench hashes the runs (so that it doesn't have to rerun one that it's already done). As long as it's not using anything specific to the local file system, that should be fine, I think.
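
Purely as a conceptual sketch (not vbench's actual implementation, and the table/column names are invented): if results are keyed by a checksum of the benchmark code plus the git revision, with nothing tied to local paths, the database should be safe to share.

    import hashlib
    import sqlite3

    def already_have_result(db_path, benchmark_code, revision):
        """Hypothetical lookup: is this (benchmark checksum, revision) pair
        already recorded in the results database?"""
        checksum = hashlib.md5(benchmark_code.encode("utf-8")).hexdigest()
        con = sqlite3.connect(db_path)
        try:
            cur = con.execute(
                "SELECT 1 FROM results WHERE checksum = ? AND revision = ?",
                (checksum, revision))
            return cur.fetchone() is not None
        finally:
            con.close()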

I'm guessing that this won't work on Windows right now. I've had a bit of trouble getting Python set up on Windows; I think my VM starves it of RAM. I might have a bit of time later this week to look into that.

@vincentarelbundock
Contributor

Well, I do know that vbench won't re-run the same benchmarks if they're already in the db. Just try running the suite twice in a row :)

Also, the build machine will likely be Linux or Mac, so that shouldn't be a problem.

@josef-pkt
Member

@TomAugspurger You don't need to get a Windows VM just for this. I can look at the Windows-specific problems when I try to run it.

(The cheapest way to get an almost fully loaded Python environment is https://code.google.com/p/winpython/, or pythonxy for integration with Windows. Git also has a portable package; only the SSH key needs to be set up.)

@vincentarelbundock
Contributor

For the download link, it would just be a matter of inserting a line in make.py to copy benchmark.db somewhere in the build/html path, and then including an appropriate link in the docs.
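
A minimal sketch of what that added line in make.py might be (the build/html path and current-directory layout are assumptions for illustration):

    import os
    import shutil

    # copy the raw benchmark database next to the generated HTML so the
    # docs can link to it
    shutil.copy("benchmark.db", os.path.join("build", "html", "benchmark.db"))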

@josef-pkt
Member

Thanks to both. It looks good on the website that Vincent posted.

One question will be figuring out how noisy the results are, or how to reduce the noise (the spike before the Sep 2013 label).

@vincentarelbundock
Contributor

Not sure about noise, but it looks like this varies from run to run. I uploaded a new set and it doesn't have the big Sep 2013 spike (it has other new ones). I think I put my laptop to sleep in the middle of the initial run, so that might explain it. It might be sensitive to what else is going on on the computer. I plan to run this on an "at rest" computer tonight, but I don't have that available to me now. Perhaps the noise will disappear.

@josef-pkt
Member

(aside for this PR:
ols 1.7 milliseconds, ols with formula 6 milliseconds)

Commits on this pull request:

  • ENH: Add vbench. Mostly self-contained in the `vb_suite` directory in the main statsmodels repo. Also added some docs under `/docs/source/vbench.rst`.
  • BUG: Change remote repo location to url.
  • BUG: Just ignore any database.
  • BUG: Change repo location to url.
  • Removed accidental addition of database.
  • BUG: Change version to .__version__
@TomAugspurger
Contributor Author

Just pushed those two fixes (the version and the git vs. https url).

Thanks for looking at this.

What else needs to be done? Vincent mentioned a short write-up of what this is and how to use it. I can trim down the note I put in the docs to something more manageable.

@TomAugspurger
Contributor Author

FYI, I just added some benchmarks for GLM, RLM, and WLS, and a bit more for ARIMA. I'm running those now, and if everything checks out I'll push them up too.

@josef-pkt josef-pkt added the PR label Feb 19, 2014
@jseabold
Member

jseabold commented Apr 2, 2014

Didn't forget about this (totally). Is this in decent shape? I was just thinking how I'd like some quick scripts that I can use to do some profiling.

@jseabold
Member

jseabold commented Apr 2, 2014

This looks ok to me. I made some changes locally, so I can add it to my cron jobs and push this along with the docs. I'm likely going to stick this in cron.monthly unless we start focusing on performance more. If everything looks ok after it finishes running and the docs build, then I'll probably open a new PR to add 1-2 commits to this.

@jseabold jseabold mentioned this pull request Apr 2, 2014
@jseabold
Member

jseabold commented Apr 2, 2014

@TomAugspurger I moved all of your work here https://github.com/statsmodels/vbench

Let me know if you'd like commit rights to this repo.

@jseabold jseabold closed this Apr 2, 2014
@josef-pkt
Member

@jseabold I think it would be better organized if we move the benchmarks into a subdirectory, to keep them separate from the vbench files.

something like:

__import__('benchmark_modules.' + modname) in suite.py
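
A rough sketch of what that could look like in suite.py, assuming a `benchmark_modules/` package (with an `__init__.py`) next to the vbench driver files; the module names and the collection loop are hypothetical:

    import importlib

    from vbench.benchmark import Benchmark

    # hypothetical benchmark modules living under benchmark_modules/
    module_names = ['discrete', 'glm', 'rlm']

    by_module = {}
    benchmarks = []
    for modname in module_names:
        mod = importlib.import_module('benchmark_modules.' + modname)
        # pick up every Benchmark instance defined at module level
        found = [v for v in vars(mod).values() if isinstance(v, Benchmark)]
        by_module[modname] = found
        benchmarks.extend(found)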

@jseabold
Member

jseabold commented Apr 2, 2014

Go ahead and file an issue on the other repo. Feel free to make a PR.

@TomAugspurger
Contributor Author

@jseabold Thanks. I've subscribed to the repo. I'll take a look this weekend to see if anything has gone stale. If you run into trouble, go ahead and ping me. I'm happy to maintain it.

@jseabold
Member

jseabold commented Apr 2, 2014

One thing I didn't do is add any config file to the repo. It seemed the defaults were fine. I also never looked at the vbench code. I just updated the e-mail and upload stuff and ran it.
