Performance Profile Log #2463

Open
1 of 6 tasks
adam-grant-hendry opened this issue Apr 13, 2022 · 17 comments
Labels
feature-request Please add this cool feature!

Comments

@adam-grant-hendry
Contributor

adam-grant-hendry commented Apr 13, 2022

While considering how best to tackle testing for #500, I think the best solution would be to add and maintain a performance profile log alongside the repo, similar to a coverage report.

Tests could be written with timeit, cProfile, memory_profiler, etc., and perhaps we could write something like a pytest fixture that would generate an instaviz report, or similar? I'm not sure how to implement this yet (perhaps I would need to create a separate PyPI package or pytest plug-in that the project would use?). Please let me know your thoughts.
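
A minimal sketch of what such a fixture could look like, using only cProfile from the standard library (the fixture name, output directory, and example test are hypothetical, not an existing PyVista API):

```python
import cProfile
import pstats
from pathlib import Path

import pytest
import pyvista as pv


@pytest.fixture
def profile(request):
    """Hypothetical fixture: profile the test body and dump stats to disk."""
    profiler = cProfile.Profile()
    profiler.enable()
    yield
    profiler.disable()
    out_dir = Path("profiles")
    out_dir.mkdir(exist_ok=True)
    # The resulting .prof file can be inspected with pstats, snakeviz, etc.
    pstats.Stats(profiler).dump_stats(str(out_dir / f"{request.node.name}.prof"))


def test_large_mesh(profile):
    # Example usage: anything executed in the test body is profiled.
    pv.Sphere(theta_resolution=300, phi_resolution=300)
```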

I think a log makes the most sense because:

  1. The alternative (in my mind) would be maintaining old copies of code to test against to demonstrate PRs introduce performance improvements, which seems silly and unmaintainable, and

  2. It may be desirable in the future to add a PR that reduces performance to some degree for a big benefit in readability or usability, so rather than force future PRs to meet or exceed a certain level of performance, we could reference the log and use it as a guide (similar to how coverage reports are used as a guide, but aren't gospel).

What are your thoughts?

@akaszynski @MatthewFlamm @banesullivan I'm unaware of any existing Python packages or plug-ins that implement this. If you have any suggestions, that would be greatly appreciated! I think this could be a great addition!


Goals:

(@pyvista/developers: Please update/correct as needed)

  1. Verify specific planned PRs are performance improvements

    • Create a workflow for marking a PR as a performance improvement and require testing against pyvista:main in pyvista/pyvista-benchmarks
    • Document in CONTRIBUTING.md how to run a performance improvement test and how to reference it in the PR for reviewers
  2. Keep a rough idea of library performance and catch unexpected regressions

    • Create pyvista/pyvista-benchmarks repo
    • Add benchmark scripts
    • Identify version of VTK to test against
    • Identify hardware to use for test isolation and consistency
@adam-grant-hendry added the feature-request label Apr 13, 2022
@akaszynski
Member

I think you're correct @adam-grant-hendry, and we could use pytest-benchmark for this. It would be nice to have the tests in this repository, though I'm loath to add any more workflows to our CI/CD.
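
For reference, a minimal pytest-benchmark test is just a regular test that takes the plugin's benchmark fixture; a rough sketch (the choice of pv.Sphere here is an arbitrary placeholder):

```python
import pyvista as pv


def test_sphere_creation(benchmark):
    # ``benchmark`` (from pytest-benchmark) calls the function repeatedly
    # and records timing statistics; it returns the function's result.
    mesh = benchmark(pv.Sphere, theta_resolution=100, phi_resolution=100)
    assert mesh.n_points > 0
```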

@adam-grant-hendry
Contributor Author

@akaszynski I completely understand. And thanks for introducing me to pytest-benchmark! If I can teach myself how to add workflows to our CI/CD, would you be willing to let me add it?

@akaszynski
Member

If I can teach myself how to add workflows to our CI/CD, would you be willing to let me add it?

We should consider creating a new repository that includes our benchmarks outside this repository. We have enough workflows as it is and it's becoming quite something.

@pyvista/developers, your thoughts on this? Create a new repo with benchmarks (pyvista-benchmarks) or add the workflow here in this repo?

@adam-grant-hendry
Contributor Author

I'm good with whatever decision is made.

@adeak
Member

adeak commented Apr 15, 2022

@pyvista/developers, your thoughts on this? Create a new repo with benchmarks (pyvista-benchmarks) or add the workflow here in this repo?

I also don't think CI is the right place for this. We could do benchmarks for releases, and even periodically, but not for each PR.

There are also some points that I don't instantly see how we could handle best. Varying hardware means a given runtime "now" doesn't necessarily mean anything. We'd have to benchmark the current state of the library against a reference task also run "now". We could compare to older versions of the codebase, but then new features and deprecations might make this impractical.

I'm sure there are existing solutions for this, but when I hear "benchmark" my first association is to comparing the performance of different implementations, whereas what we're looking for is a general check for performance regressions "anywhere". And then it could also be nice if users could run benchmarks locally to see if their changes are faster. Which is just to say the task might not be trivial.

@adam-grant-hendry
Contributor Author

adam-grant-hendry commented Apr 15, 2022

@pyvista/developers, your thoughts on this? Create a new repo with benchmarks (pyvista-benchmarks) or add the workflow here in this repo?

I also don't think CI is the right place for this. We could do benchmarks for releases, and even periodically, but not for each PR.

There are also some points that I don't instantly see how we could handle best. Varying hardware means a given runtime "now" doesn't necessarily mean anything. We'd have to benchmark the current state of the library against a reference task also run "now". We could compare to older versions of the codebase, but then new features and deprecations might make this impractical.

I'm sure there are existing solutions for this, but when I hear "benchmark" my first association is to comparing the performance of different implementations, whereas what we're looking for is a general check for performance regressions "anywhere". And then it could also be nice if users could run benchmarks locally to see if their changes are faster. Which is just to say the task might not be trivial.

Yes, this is a challenging problem. Personally, w.r.t. #500, I want to show that my change provides a needed performance improvement (esp. for large datasets). So in my particular instance, I am referring to performance improvements.

I'm not sure how to show that an implementation improves performance without running a test against old code, which would mean maintaining a copy of the old code. That's why I was thinking a log might be a better approach. There may be certain cases where a change would slow performance a bit, but it would be worthwhile to add. How much performance can be allowed to change should be a decision left up to the developers and maintainers, chosen based on the merits of the PR. That's another reason why I like the idea of a log: it gives us a goal to strive towards, much like coverage, but isn't entirely binding. Not to mention that, as you correctly stated, performance will vary from machine to machine (and possibly from run to run).

Benchmarking locally seems interesting: certain applications (e.g. SolidWorks, Blender, etc.) have benchmark scripts you can download and run locally and then you can publish your results on their website. This approach might be a +1 in the column for a separate pyvista-benchmarks repo.

Ultimately though, I'm more concerned about regressions than improvements. I don't want a change I make that helps Windows users to hurt Linux/MacOS users, for example. In my mind, the ideal situation would be something that confirms a change doesn't (significantly) hurt performance for everyone and the developers can decide whether they want to add the change or not.

@adeak
Member

adeak commented Apr 15, 2022

(Sorry, you edited your comment while I was writing this one, so quotes might be inaccurate)

Yes, this is a challenging problem. With respect to pyvista/pyvista-support#500, I'm interested in showing that my change provides a needed performance improvement (esp. for large datasets). So in fact, in my particular instance, I am speaking of performance improvements vs. regressions (though showing code doesn't regress would achieve the same result).

OK, so I think there are two different goals here that might need different approaches.

The first goal is verifying that a specific planned change is a performance improvement. For this we could have a set of benchmark scripts in the repo, which can be run both in the PR's tip and in the base (hopefully pyvista:main). We'd still need to take care that changes to the benchmarks themselves don't trip up benchmarking, but this shouldn't be an issue if benchmarks are changed in dedicated PRs and each library-code-PR is always compared to main.

The second goal is to keep a rough idea about the performance of the library, and to catch unexpected regressions. Most PRs are not performance-related, in fact they aren't expected to affect performance in any way. This is why I wouldn't want to have benchmarking in CI. The occasional regression will probably be due to unrelated changes that sneak in performance impacts without anyone noticing. So this is why I'm lumping regressions in this category. A "performance log" (whatever that may mean ;) would work for this of course. Anything that lets us see if something's out of the ordinary. But in this case doing the benchmarking right is probably more difficult, since we have to use references further back in time.

Ultimately though, I'm more concerned about regressions than improvements. I don't want a change I make that helps Windows users to hurt Linux/MacOS users, for example.

This is a good point in favor of having some automation for benchmarking, although I wouldn't typically expect regressions to be system-dependent.

In my mind, the ideal situation would be something that confirms a change doesn't (significantly) hurt performance for everyone and the developers can decide whether they want to add the change or not.

I don't think we have the maintainer (and CI) bandwidth to really assess each PR from this angle as well. I think it should be enough if we catch severe regressions every once in a while. (Benchmark running every week?)

I'm not sure how to show that my change improves performance without running a test against the old code, which would mean maintaining a copy of the old code.

You also mentioned this in the PR comment. We already have the complete history of the library in git :) (So I'm probably missing your point.) Keeping the benchmark separate from the main repo would also mean that they can be checked out independently. But, like I said, handling code changes over longer time scales might be problematic (we'd probably have to upgrade the benchmark on a medium time-frame, maybe every (few?) release?).

@akaszynski
Member

I would like to benchmark performance for the last few minor releases of critical workflows like:

  • Adding many actors to a Plotter.
  • Adding a single large dataset to a Plotter (probably one of each type).
  • Several key filters
  • Fully fledged demo/animation.

Additionally, we're going to have to consider which version of VTK to use for this. I'd prefer keeping to one version for as long as possible to identify which changes within PyVista are causing any potential regressions.

Furthermore, hardware isolation and consistency are critical for any long-running benchmark. I have a few machines lying around that I could repurpose for this.
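
As a rough sketch of the first workflow listed above (adding many actors to a Plotter), a pytest-benchmark test might look something like this; the actor count and off-screen rendering are assumptions, not agreed-upon benchmark settings:

```python
import pyvista as pv


def test_add_many_actors(benchmark):
    """Time adding many small meshes to a single Plotter (sketch only)."""
    spheres = [pv.Sphere(center=(i, 0, 0)) for i in range(100)]

    def add_actors():
        plotter = pv.Plotter(off_screen=True)
        for sphere in spheres:
            plotter.add_mesh(sphere)
        plotter.close()

    benchmark(add_actors)
```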

@adeak
Member

adeak commented Apr 15, 2022

Adding a single large dataset to a Plotter (probably one of each type).

I wonder if it would be worth adding a range of sizes from small to huge. If we accidentally break the scaling of a given step, it would be more obvious. Of course, for large datasets it should all be clear anyway.
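
A possible shape for such a size sweep using pytest-benchmark parametrization (the sizes and the point-cloud construction are placeholder choices):

```python
import numpy as np
import pytest
import pyvista as pv


@pytest.mark.parametrize("n_points", [1_000, 100_000, 1_000_000])
def test_wrap_point_cloud(benchmark, n_points):
    # Benchmark constructing a PolyData point cloud at several sizes so a
    # broken scaling step shows up as a disproportionate jump in timings.
    points = np.random.default_rng(seed=0).random((n_points, 3))
    benchmark(pv.PolyData, points)
```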

@adam-grant-hendry
Contributor Author

For this we could have a set of benchmark scripts in the repo, which can be run both in the PR's tip and in the base (hopefully pyvista:main).

That is brilliant @adeak, I wish I'd thought of that. Doing this plus the fact that

We already have the complete history of the library in git :)

would solve the first problem.

The second goal is to keep a rough idea about the performance of the library, and to catch unexpected regressions. Most PRs are not performance-related, in fact they aren't expected to affect performance in any way. This is why I wouldn't want to have benchmarking in CI...I think it should be enough if we catch severe regressions every once in a while. (Benchmark running every week?)

Completely agreed. We could get away with doing performance regression testing on a limited but regular basis.

@akaszynski
Member

Here's an initial stab at benchmarks:
https://github.com/pyvista/pyvista-benchmarks

@adam-grant-hendry
Contributor Author

@akaszynski This is a great start, but I also need to benchmark memory usage. It looks like pytest-benchmark only tests speed.
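
One standard-library option (a sketch, not a replacement for a dedicated plugin) is tracemalloc; note that it only sees Python-level allocations, so memory held by VTK's C++ objects would not be counted:

```python
import tracemalloc

import pyvista as pv


def peak_python_memory(func, *args, **kwargs):
    """Return (result, peak bytes of Python allocations) for one call (sketch)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


mesh, peak = peak_python_memory(pv.Sphere, theta_resolution=200, phi_resolution=200)
print(f"peak Python allocations: {peak / 1e6:.1f} MB")
```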

@adam-grant-hendry
Contributor Author

@akaszynski Can we add pytest-monitor to pyvista-benchmarks?

@adam-grant-hendry
Contributor Author

@akaszynski pytest-leaks would also be nice (it seems like #500 might have a memory leak).

@prisae
Member

prisae commented Jul 14, 2022

Just as an idea: I personally like to use https://github.com/airspeed-velocity/asv for benchmarking. It is the benchmark suite used, e.g., by NumPy and SciPy (an example for SciPy from Pauli Virtanen can be seen at https://pv.github.io/scipy-bench/).
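
For context, asv benchmarks are plain Python classes or functions whose names use prefixes like time_ and peakmem_; a minimal sketch (the benchmarked operation here is a placeholder):

```python
import pyvista as pv


class SphereSuite:
    """asv benchmark suite (sketch); asv discovers methods by name prefix."""

    params = [50, 200]
    param_names = ["resolution"]

    def time_create_sphere(self, resolution):
        # ``time_`` methods are timed by asv for each parameter value.
        pv.Sphere(theta_resolution=resolution, phi_resolution=resolution)

    def peakmem_create_sphere(self, resolution):
        # ``peakmem_`` methods report the peak memory of a single call.
        pv.Sphere(theta_resolution=resolution, phi_resolution=resolution)
```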

@akaszynski
Member

Just as an idea: I personally like to use https://github.com/airspeed-velocity/asv for benchmarking. It is the benchmark suite used, e.g., by NumPy and SciPy (an example for SciPy from Pauli Virtanen can be seen at https://pv.github.io/scipy-bench/).

This is awesome!

@adam-grant-hendry
Contributor Author

@prisae Sweet! I’m eager to take a look!
