Performance Profile Log #2463
I think you're correct @adam-grant-hendry, and we could use `pytest-benchmark` for this. Would be nice to have the tests in this repository, though I'm loath to add any more workflows to our CI/CD.
@akaszynski I completely understand. And thanks for introducing me to `pytest-benchmark`!
We should consider creating a new repository that includes our benchmarks outside this repository. We have enough workflows as it is, and it's becoming quite something. @pyvista/developers, your thoughts on this? Create a new repo with benchmarks?
I'm good with whatever decision is made.
I also don't think CI is the right place for this. We could do benchmarks for releases, and even periodically, but not for each PR.

There are also some points that I don't instantly see how we could handle best. Varying hardware means a given runtime "now" doesn't necessarily mean anything. We'd have to benchmark the current state of the library against a reference task also run "now". We could compare to older versions of the codebase, but then new features and deprecations might make this impractical.

I'm sure there are existing solutions for this, but when I hear "benchmark" my first association is comparing the performance of different implementations, whereas what we're looking for is a general check for performance regressions "anywhere". And then it could also be nice if users could run benchmarks locally to see if their changes are faster. Which is just to say the task might not be trivial.
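The "reference task run now" idea could be sketched in pure stdlib code like this; both workloads here are invented placeholders, and the ratio is only a rough way to factor out hardware speed:

```python
# Sketch: normalize a library benchmark against a fixed reference task
# run on the same machine at the same time. Both workloads are
# placeholders, not real pyvista operations.
import timeit

def reference_task():
    # fixed CPU-bound baseline, independent of the library under test
    sum(i * i for i in range(10_000))

def library_task():
    # stand-in for the pyvista operation being tracked
    sorted(range(10_000), reverse=True)

ref = min(timeit.repeat(reference_task, number=20, repeat=3))
lib = min(timeit.repeat(library_task, number=20, repeat=3))
score = lib / ref  # roughly hardware-independent ratio to log over time
```

Logging `score` rather than raw seconds makes results from different machines at least loosely comparable.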
Yes, this is a challenging problem. Personally, w.r.t. #500, I want to show that my change provides a needed performance improvement (esp. for large datasets). So in my particular instance, I am referring to performance improvements. I'm not sure how to show that an implementation improves performance without running a test against old code, which would mean maintaining a copy of the old code. That's why I was thinking a log might be a better approach.

There may be certain cases where a change would slow performance a bit, but it would be worthwhile to add. How much performance can be allowed to change should be a decision left up to the developers and maintainers and chosen based on the merits of the PR. That's another reason why I like the idea of a log: it gives us a goal to strive towards, much like coverage, but isn't entirely binding. Not to mention that, as you correctly stated, performance will vary from machine to machine (and possibly from run to run).

Benchmarking locally seems interesting: certain applications (e.g. SolidWorks, Blender, etc.) have benchmark scripts you can download and run locally, and then you can publish your results on their website. This approach might be a +1 in the column for a separate repository.

Ultimately though, I'm more concerned about regressions than improvements. I don't want a change I make that helps Windows users to hurt Linux/macOS users, for example. In my mind, the ideal situation would be something that confirms a change doesn't (significantly) hurt performance for everyone, and the developers can decide whether they want to add the change or not.
(Sorry, you edited your comment while I was writing this one, so quotes might be inaccurate)
OK, so I think there are two different goals here that might need different approaches.

The first goal is verifying that a specific planned change is a performance improvement. For this we could have a set of benchmark scripts in the repo, which can be run both on the PR's tip and on the base (hopefully `main`).

The second goal is to keep a rough idea about the performance of the library, and to catch unexpected regressions. Most PRs are not performance-related; in fact, they aren't expected to affect performance in any way. This is why I wouldn't want to have benchmarking in CI. The occasional regression will probably be due to unrelated changes that sneak in performance impacts without anyone noticing. So this is why I'm lumping regressions in this category. A "performance log" (whatever that may mean ;) would work for this, of course. Anything that lets us see if something's out of the ordinary. But in this case doing the benchmarking right is probably more difficult, since we have to use references further back in time.
This is a good point for having some automation for benchmarking. Although I don't think I'd expect regressions to be system-dependent typically.
I don't think we have the maintainer (and CI) bandwidth to really assess each PR from this angle as well. I think it should be enough if we catch severe regressions every once in a while. (Benchmark running every week?)
You also mentioned this in the PR comment. We already have the complete history of the library in git :) (So I'm probably missing your point.) Keeping the benchmarks separate from the main repo would also mean that they can be checked out independently. But, like I said, handling code changes over longer time scales might be problematic (we'd probably have to upgrade the benchmarks on a medium time-frame, maybe every (few?) release(s)?).
I would like to benchmark performance for the last few minor releases of critical workflows like:
Additionally, we're going to have to consider which version of VTK to use for this. I'd prefer keeping to one version for as long as possible, to identify which changes within `pyvista` itself affect performance.

Furthermore, hardware isolation and consistency is critical for any long-running benchmark. I have a few machines lying around that I could repurpose for this.
I wonder if it would be worth adding a range of sizes from small to huge. If we accidentally break the scaling of a given step, it would be more obvious. Of course, for large datasets it should all be clear anyway.
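A scaling check over a range of sizes could look like the sketch below; the workload is a placeholder, and the expectation in the comment is only a rule of thumb:

```python
# Sketch: time a placeholder workload over a range of sizes so that
# accidental complexity regressions (e.g. O(n) becoming O(n^2)) stand out.
import time

def workload(n):
    # stand-in for a mesh-processing step, expected to be ~O(n)
    return sum(range(n))

timings = {}
for n in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    workload(n)
    timings[n] = time.perf_counter() - start

# For a linear step, each 10x size increase should cost roughly 10x time;
# a much larger jump between adjacent sizes suggests broken scaling.
```

Plotting or logging these per-size timings over releases would make a scaling break visible even when absolute times drift with hardware.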
That is brilliant @adeak, I wish I'd thought of that. Doing this plus the fact that
would solve the first problem.
Completely agreed. We could get away with doing performance regression testing on a limited but regular basis.
Here's an initial stab at benchmarks:
@akaszynski This is a great start, but I also need to benchmark memory usage. It looks like
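For Python-level memory, the stdlib `tracemalloc` module can capture peak allocations without extra dependencies; a minimal sketch (the workload is invented for illustration):

```python
# Sketch: measure peak Python-level memory of a placeholder workload
# with the stdlib tracemalloc module.
import tracemalloc

def build_points(n):
    # stand-in for constructing a large dataset
    return [float(i) for i in range(n)]

tracemalloc.start()
points = build_points(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
# `peak` is the high-water mark in bytes while tracing was active
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held inside VTK's C++ objects would need a different tool (e.g. an RSS-based measurement).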
@akaszynski Can we add
@akaszynski
Just as an idea: I personally like to use https://github.com/airspeed-velocity/asv for benchmarking. It is the benchmark suite used, e.g., by NumPy and SciPy (an example for SciPy from Pauli Virtanen can be seen at https://pv.github.io/scipy-bench/).
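For context, asv discovers benchmark classes by naming convention: methods prefixed with `time_` are timed and `peakmem_` methods track peak memory. A minimal suite might look like the hypothetical sketch below (the class and workload are invented, not real pyvista code):

```python
# Sketch of an asv (airspeed-velocity) benchmark module, e.g. placed
# in a benchmarks/ directory. The workload is a placeholder.
class MeshSuite:
    # asv runs each benchmark once per parameter value
    params = [1_000, 10_000]
    param_names = ["n_points"]

    def setup(self, n_points):
        # called before each benchmark for each parameter value
        self.points = list(range(n_points))

    def time_sort(self, n_points):
        # timed by asv because of the `time_` prefix
        sorted(self.points, reverse=True)

    def peakmem_duplicate(self, n_points):
        # peak memory tracked by asv because of the `peakmem_` prefix
        self.points * 2
```

`asv run` can benchmark a range of commits over time, and `asv publish` renders an HTML dashboard like the SciPy one linked above, which fits the "catch regressions periodically" goal nicely.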
This is awesome! |
@prisae Sweet! I’m eager to take a look! |
While considering how best to tackle testing for #500, I think the best solution would be to add and maintain a performance profile log with the repo, similar to a coverage report.
Tests could be written with `timeit`, `cProfile`, `memory_profile`, etc., and perhaps we could write something like a `pytest` fixture that would generate an `instaviz` report, or similar? I'm not sure how to implement this yet (perhaps I would need to create a separate PyPI package or `pytest` plug-in the project would use?). Please let me know your thoughts.

I think a log makes the most sense because:

1. The alternative (in my mind) would be maintaining old copies of code to test against to demonstrate PRs introduce performance improvements, which seems silly and unmaintainable, and
2. It may be desirable in the future to add a PR that reduces performance to some degree for a big benefit in readability or usability, so rather than force future PRs to meet or exceed a certain level of performance, we could reference the log and use it as a guide (similar to how coverage reports are used as a guide, but aren't gospel).

What are your thoughts?

@akaszynski @MatthewFlamm @banesullivan I'm unaware of any such Python packages or plug-ins that would implement this. If you have any suggestions, that would be greatly appreciated! I think this could be a great addition!
Goals:

(@pyvista/developers: please update/correct as needed)

1. Verifying specific planned PRs are performance improvements
   - Benchmark against `pyvista:main` in `pyvista/pyvista-benchmarks`
   - Add `CONTRIBUTING.md` documentation on how to perform a performance improvement test and reference it in the PR for reviewers
2. Keep a rough idea of library performance and catch unexpected regressions
   - Run periodic benchmarks from the `pyvista/pyvista-benchmarks` repo