Testing/Benchmarking of examples #1024

Open
MuawizChaudhary opened this issue Sep 12, 2023 · 3 comments
Comments

@MuawizChaudhary
Collaborator

MuawizChaudhary commented Sep 12, 2023

I believe it was a design decision to not test nor benchmark our examples. (If this is correct, please remind me.)

However, some examples produce an accuracy, and I don't believe the accuracy produced by the examples varies much across different systems or hardware.

I wonder if we should have a special Jenkins test that benchmarks specific examples only when there has been a change to the examples or to the parts of the API they use. I think a good check would be whether the accuracy is close enough to that of some stable commit.

For example, @vivianwhite has provided pull request #1022. Right now I am testing the examples before and after Vivian's PR. Should I just look at the accuracy of one run for each possible argument, or should I be doing multiple runs and checking whether the results are within error bars? If so, how many runs?

It makes sense to me to have a stable commit which we run a reasonable number of times (3-5) over all possible arguments, and then check whether the new commit (with its own 3-5 runs making up the mean accuracy) is within error bars over those same arguments.
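To make that concrete, here is a minimal sketch of the comparison I have in mind. The accuracy values, the two-standard-deviation tolerance, and the `within_error_bars` helper are all hypothetical placeholders, not anything we currently have:

```python
import statistics

def within_error_bars(baseline_accs, candidate_accs, k=2.0):
    """Return True if the candidate mean accuracy lies within k standard
    deviations of the baseline (stable-commit) mean accuracy."""
    base_mean = statistics.mean(baseline_accs)
    base_std = statistics.stdev(baseline_accs)
    cand_mean = statistics.mean(candidate_accs)
    return abs(cand_mean - base_mean) <= k * base_std

# Hypothetical accuracies from 4 runs on the stable commit and 4 runs on the PR:
baseline = [0.842, 0.838, 0.845, 0.840]
candidate = [0.839, 0.843, 0.841, 0.837]
assert within_error_bars(baseline, candidate)
```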

What do we think?

@lostanlen
Collaborator

Related: #858

@MuawizChaudhary
Collaborator Author

Good memory, thanks.

@janden
Collaborator

janden commented Oct 30, 2023

> I wonder if we should have a special Jenkins test that benchmarks specific examples only when there has been a change to the examples or to the parts of the API they use.

I don't know that this is possible to do. Also, there may be changes that don't affect the API yet still cause the examples to give different results.

One option is to have a workflow that is only triggered manually or on some other event (such as a beta release).

> It makes sense to me to have a stable commit which we run a reasonable number of times (3-5) over all possible arguments, and then check whether the new commit (with its own 3-5 runs making up the mean accuracy) is within error bars over those same arguments.

That sounds tricky. I think a safer bet here is to fix the seed(s). That way we can guarantee that the result is exactly the same and test for that.
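Something like the sketch below is what I have in mind, where `set_all_seeds` fixes every RNG an example might touch and `run_example` is only a stand-in for running one of our example scripts end to end:

```python
import random

import numpy as np
import torch


def set_all_seeds(seed: int = 0) -> None:
    """Fix the seeds of every RNG an example script might touch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def run_example() -> float:
    """Stand-in for running one of the example scripts end to end."""
    return float(torch.rand(1000).mean())


def test_example_is_reproducible():
    set_all_seeds(0)
    first = run_example()
    set_all_seeds(0)
    second = run_example()
    # With the seeds fixed, both runs must agree exactly, so the value can
    # also be pinned to a number recorded once from a stable commit.
    assert first == second
```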
