pybench and test.pystone poorly documented #59574
The benchmarking tools "pystones" and "pybench", which are shipped with the Python standard distribution, are not documented. The only information is in the "What's New" document for Python 2.5. IMHO, they should be mentioned somewhere in the HOWTOs, the FAQ, or the standard library documentation ("Development Tools" or "Debugging and Profiling").
I disagree. They are outdated benchmarks and probably should either be removed or left undocumented. Proper testing of performance is with the Unladen Swallow benchmarks.
Brett Cannon wrote:
I disagree with your statement. Just like every benchmark, they serve …
Actually, I discovered "python -m test.pystone" during Mike Müller's talk at EuroPython: http://is.gd/fasterpy

Even if they are suboptimal for true benchmarks, they should probably be mentioned somewhere.

http://hg.python.org/benchmarks

The last paragraph of this wiki page might be reworded and included in the Python documentation.

BTW, there's also this website, which seems not to be updated anymore…
The Unladen Swallow benchmarks are in no way specific to JITs; they are a set of thorough benchmarks for measuring the overall performance of a Python VM. As for speed.python.org, we know that it is currently not being updated; we are waiting for people to have the time to move it forward and replace speed.pypy.org for all Python VMs.
I don't really think they deserve documenting. pystones can arguably be a cheap and easy way of comparing the performance of different systems *using the exact same Python interpreter*; that's the only point of running pystones. As for pybench, it probably had a point when there wasn't anything better, but I don't think it has one anymore. We have a much better benchmark suite right now, and we also have a couple of specialized benchmarks in the Tools directory.
New changeset 08a0b75904c6 by Victor Stinner in branch 'default': …
New changeset e03c1b6830fd by Victor Stinner in branch 'default': …
We now have a good and stable benchmark suite: https://github.com/python/performance

I removed pystone and pybench from Python 3.7. Please use performance instead of old and unreliable microbenchmarks like pybench or pystone.
Please add notes to the Tools/README pointing users to the performance suite.

I'd also like to request that you reword this dismissive line in the performance package's readme:

"""…"""

I suppose this was taken from the Unladen Swallow list of benchmarks and completely misses the point of what pybench is all about: it's a benchmark to run performance tests for individual parts of CPython's VM implementation. It never was intended to be representative. The main purpose is to be able to tell whether an optimization in CPython has an impact on individual areas of the interpreter or not.

Thanks.
Please report issues of the performance module on its own bug tracker: …

Can you please propose a new description? You might even create a pull request.

Note: I'm not sure that we should keep pybench; this benchmark really …
On 14.09.2016 15:20, STINNER Victor wrote:
I'll send a PR.
Well, pybench is not just one benchmark, it's a whole collection of benchmarks for various different aspects of the CPython VM, and per concept it tries to calibrate itself per benchmark, since each benchmark has different overhead.

The number of iterations per benchmark will not change between runs, since this number is fixed in each benchmark.

Here's the comment with the guideline for the number of rounds:

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds.
    rounds = 100000

BTW: Why would you want to run benchmarks in child processes and in parallel?

Ideally, the pybench process should be the only CPU intense work load on the entire CPU to get reasonable results.
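(For readers unfamiliar with pybench's layout, here is a rough, hypothetical sketch of how such a test is structured. It is simplified and not copied from the pybench sources; it only shows the idea of a hard-coded rounds value paired with an empty calibration loop.)

```python
# Hypothetical, simplified sketch of a pybench-style test (not verbatim pybench code).
# Each test hard-codes its own `rounds`, tuned by hand so that one run takes roughly
# 1-2 seconds on the target machine, and pairs the measured loop with an empty
# `calibrate()` loop whose timing can later be subtracted as overhead.
class SimpleIntegerArithmetic:
    rounds = 100000  # fixed per test; it never changes between runs

    def test(self):
        # the loop whose body is being measured
        for _ in range(self.rounds):
            a = 2 + 3
            b = a * 5

    def calibrate(self):
        # the same loop with an empty body: measures only the loop overhead
        for _ in range(self.rounds):
            pass
```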
Hum, since the discussion restarted, I reopen the issue...

"Well, pybench is not just one benchmark, it's a whole collection of benchmarks for various different aspects of the CPython VM and per concept it tries to calibrate itself per benchmark, since each benchmark has different overhead."

In the performance module, you now get individual timings for each pybench benchmark, instead of an overall total, which was less useful.

"The number of iterations per benchmark will not change between runs, since this number is fixed in each benchmark."

Please take a look at the new performance module; it has a different design. Calibration is based on a minimum time per sample, no longer on hardcoded values. I modified all benchmarks, not only pybench.

"BTW: Why would you want to run benchmarks in child processes and in parallel ?"

Child processes are run sequentially. Running benchmarks in multiple processes helps to get more reliable results. Read my article if you want to learn more about the design of my perf module: …

"Ideally, the pybench process should be the only CPU intense work load on the entire CPU to get reasonable results."

The perf module automatically uses isolated CPUs. It strongly suggests using this amazing Linux feature to run benchmarks! I started writing advice on how to get stable benchmarks: …

Note: See also the https://mail.python.org/mailman/listinfo/speed mailing list ;-)
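(As a rough illustration of that design, here is a minimal sketch using pyperf, the later name of the perf module discussed here; the benchmark name and the measured statement are made up for the example.)

```python
# Minimal sketch of the perf/pyperf design described above (assumed API: pyperf.Runner).
# The runner spawns worker child processes sequentially and calibrates the number of
# loops so that each sample lasts at least a minimum time, instead of relying on a
# hard-coded iteration count.
import pyperf

runner = pyperf.Runner()
runner.timeit("dict_literal", stmt="d = {'a': 1, 'b': 2, 'c': 3}")
```

Running such a script with -o result.json stores the samples as JSON so that two runs can be compared later.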
On 15.09.2016 11:11, STINNER Victor wrote:
pybench had the same intention. It was a design mistake to add an …

Perhaps it would make sense to try to port the individual benchmark …
I think we are talking about different things here: calibration is …

pybench runs a calibration method which has the same …

It then takes the minimum timing from overhead runs and uses …

This may not be ideal in all cases, but it's the closest …

I'll have a look at what performance does.
Ah, ok.
Will do, thanks.
I've read some of your blog posts and articles on the subject …
2016-09-15 11:21 GMT+02:00 Marc-Andre Lemburg <report@bugs.python.org>:
Calibration in perf means computing automatically the number of …

I simply removed the code to estimate the overhead of the outer loop:

    # Get calibration
    min_overhead = min(self.overhead_times)

There is no such "minimum timing", it doesn't exist :-) In benchmarks, …

If you badly estimate the minimum overhead, you might get negative …

It's not possible to compute *exactly* the "minimum overhead". Moreover, removing the code to estimate the overhead simplified the code.
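(To make the disagreement concrete, here is a hypothetical illustration, not pybench code, of the "take the minimum timing from overhead runs and subtract it" scheme, including the way an over-estimated overhead can produce negative results.)

```python
# Hypothetical illustration of subtracting a minimum overhead estimate -- not pybench code.
import time

def measure(func):
    t0 = time.perf_counter()
    func()
    return time.perf_counter() - t0

def empty_loop(rounds=100_000):
    # same loop shape as the workload, but with an empty body
    for _ in range(rounds):
        pass

def workload(rounds=100_000):
    for _ in range(rounds):
        x = 1 + 1

# Take the minimum of several overhead runs as the overhead estimate...
min_overhead = min(measure(empty_loop) for _ in range(10))
raw = measure(workload)
# ...and subtract it; if the estimate is too high, the result can go negative.
print(raw - min_overhead)
```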
Benchmarking has always been a hard problem. Modern hardware (out-of-order …
I'm closing the issue again. Again, pybench moved to http://github.com/python/performance: please continue the discussion there if you consider that we still need to do something on pybench.

FYI, I recently reworked pybench in depth using the new perf 0.8 API. perf 0.8 now supports running multiple benchmarks per script, so pybench was rewritten as just a benchmark runner. Comparison between benchmark results can be done using performance, or directly using perf (python3 -m perf compare a.json b.json).
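(To illustrate "running multiple benchmarks per script", here is a minimal sketch written against pyperf, the successor of the perf 0.8 API mentioned above; the benchmark names and statements are illustrative only, not the actual pybench tests.)

```python
# Sketch of a pybench-like runner script: one runner, several benchmarks.
# Written against pyperf (successor of the perf 0.8 API); names are illustrative.
import pyperf

runner = pyperf.Runner()
runner.timeit("concat_strings", stmt="s = 'abc' + 'def' + 'ghi'")
runner.timeit("dict_lookup", stmt="d['b']", setup="d = {'a': 1, 'b': 2}")
runner.timeit("list_slicing", stmt="lst[10:20]", setup="lst = list(range(100))")
```

Each run of such a script can be saved to JSON, and two result files can then be compared, for example with the python3 -m perf compare command mentioned above.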