This is a harness for a (to-be-written) set of benchmarks for measuring Django's performance over time.
This doesn't test a single Django version in isolation -- that wouldn't be very useful. Instead, it benchmarks an "experiment" Django against a "control", reporting on the difference between the two and measuring for statistical significance.
So to run this, you'll need two complete Django source trees. By default djangobench.py
looks for directories named django-control
and django-experiment
here in this directory, but you can change that by using the --control
or --experiment
options.
So, for example, to benchmark Django 1.2 against trunk:
svn co http://code.djangoproject.com/svn/django/tags/releases/1.2/ django-control
svn co http://code.djangoproject.com/svn/django/trunk django-experiment
./djangobench.py
At the time of this writing Django's trunk hasn't significantly diverged from Django 1.2, so you should expect to see not-statistically-significant results:
Running 'startup' benchmark ...
Min: 0.138701 -> 0.138900: 1.0014x slower
Avg: 0.139009 -> 0.139378: 1.0027x slower
Not significant
Stddev: 0.00044 -> 0.00046: 1.0382x larger
Benchmarks are very simple: they're a Django app, along with a settings file, and an executable benchmarks.py
that gets run by the harness. This file should print timing data to stdout.
See the startup
directory benchmark for an example.
Please write new benchmarks and send me pull requests on Github!
- Right now each benchmark gets run multiple times by the harness, incurring startup overhead. The startup benchmark shows this is a non-trivial amount of time, so there really needs to be a way for individual benchmarks to run n-trials in-process to avoid that overhead and warmup time. Unladen's
perf.py
supports this; the harness code needs to, also. - The number of trials is hard-coded. This should be an --option, or, better yet, it could be automatically determined by running trials until the results reach a particular confidence internal or some large ceiling is hit.
Lots and lots and lots more benchmarks. Some ideas:
- template rendering (something useful, not the unladen one)
- ORM queries
- ORM overhead compared to cursor.execute()
- signal/dispatch
- datastructures (specifically MultiDict)
- url resolving and reversing
- form/model validation
- holistic request/response round-trip time
- middleware (piecemeal and standard "stacks" together)