
Add benchmark checks to CI #5424

Merged: 58 commits into scikit-image:main on Jul 8, 2021

Conversation

@jaimergp (Contributor) commented Jun 10, 2021

Description

This PR provides a new CI workflow that runs the current commit against the PR target branch. The different strategies are discussed in the conversation below, with conclusions driven by data points collected in my fork.

It also adds an asv check command to the standard CI suite.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

Apparently time_* functions need to accept the params too, even if they don't use them (same signature as setup). Adding *args is enough to fix this; there is no need to copy the full signature.
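
For reference, a minimal sketch of the pattern, with made-up benchmark names (asv passes each entry of params to setup and to every time_* method):

  import numpy as np
  from skimage import filters

  class GaussianSuite:
      # asv runs each benchmark once per entry in params
      params = [32, 128, 512]
      param_names = ["size"]

      def setup(self, size):
          self.image = np.random.rand(size, size)

      def time_gaussian(self, *args):
          # *args absorbs the `size` param that asv also passes here,
          # so we don't have to repeat setup()'s full signature.
          filters.gaussian(self.image)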
@jaimergp (Contributor, Author)

Noting here that some of the benchmarks run by asv dev result in memory allocation errors, and it takes 30 minutes to run a quick pass. It might only be worth running for some entries in the job matrix. I'll let the team guide that decision... thoughts?

@grlee77 (Contributor) commented Jun 11, 2021

Thanks, I just merged #5426 and manually started the CI here. We will see if it autostarts on the next commit now...

@jaimergp (Contributor, Author)

That seemed to do the trick, thanks @grlee77!

@grlee77 (Contributor) commented Jun 11, 2021

FYI, if you are seeing warnings about a deprecated multichannel argument, those will be resolved by #5427 for the 0.19 release; see 2c6e721.

Basically, we changed the API to allow specifying the axis containing "channels" for color images rather than using a boolean multichannel argument. For 0.19 both arguments are still present, but the user will get a warning if multichannel is used. To be able to benchmark against older releases, I added this _channel_kwarg helper function.
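
For context, such a helper boils down to something like the sketch below; the version check and exact names are assumptions, and the actual implementation in 2c6e721 may differ:

  from packaging.version import parse
  import skimage

  def _channel_kwarg(color=True):
      """Return the kwarg marking the channel axis, whichever API exists."""
      if parse(skimage.__version__) >= parse("0.19"):
          # 0.19+ API: pass the channel axis explicitly (None = grayscale)
          return {"channel_axis": -1 if color else None}
      # Pre-0.19 API: the boolean multichannel flag
      return {"multichannel": color}

A benchmark can then call, e.g., filters.gaussian(image, **_channel_kwarg(True)) and keep working across releases.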

@jaimergp (Contributor, Author)

I'm going to aim for relative performance measurements in GH Actions. I've seen several projects be ok with this approach, since it at least catches regressions on PRs. We won't get a history of performance, but that won't be possible without dedicated hardware.

I've merged a similar PR in my own fork that will run the benchmarks every 6h for a week for the same two commits. I will then collect the results (uploaded via artifacts) and compare how reliable they are. We should expect the same relative differences (± some statistical deviation) across all runs if the approach is stable enough.

I'll ping you when I have some numbers next week!

@grlee77 (Contributor) commented Jun 11, 2021

> I've merged a similar PR in my own fork that will run the benchmarks every 6h for a week for the same two commits. I will then collect the results (uploaded via artifacts) and compare how reliable they are. We should expect the same relative differences (± some statistical deviation) across all runs if the approach is stable enough.

Nice. This will be interesting to see!

@jaimergp (Contributor, Author)

This notebook (attached) will help us assess the stability of the GHA CI for benchmarking purposes. We still need to wait a bit to collect more data points, but it looks ok so far!

timeseries.ipynb.zip

@jaimergp (Contributor, Author)

Over the weekend we collected 15 independent attempts. See the analysis in this gist.

Reliability is okayish, but it does give some false positives here and there... Let's see how it behaves on weekdays instead of the weekend.
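
(For reference, the gist's stability check amounts to something like the sketch below; the file layout and JSON schema here are placeholders, not the actual artifact format.)

  import glob
  import json
  import statistics

  # Assume each CI run uploaded one JSON file mapping benchmark names
  # to the PR-vs-target timing ratio.
  ratios = {}
  for path in sorted(glob.glob("runs/*/ratios.json")):
      with open(path) as fh:
          for bench, ratio in json.load(fh).items():
              ratios.setdefault(bench, []).append(ratio)

  for bench, values in sorted(ratios.items()):
      spread = statistics.stdev(values) if len(values) > 1 else 0.0
      # Ratios near 1.0 with a small spread mean the runner is quiet
      # enough to trust; a large spread flags a benchmark prone to
      # false positives.
      print(f"{bench}: mean {statistics.mean(values):.2f} +/- {spread:.2f}")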

@jni (Member) commented Jul 6, 2021

Super amazing work @jaimergp! I agree with @stefanv that some instructions would be great. 😃

By the way, I just removed/added the label and that worked too! So great! 🎉 🚀

@jaimergp (Contributor, Author) commented Jul 6, 2021

> How do I look at the artifact via asv? It may be worth adding a README.md into the artifact zip file with instructions.

Got it! I'll add some instructions!

@jaimergp (Contributor, Author) commented Jul 6, 2021

@jni @stefanv -- I added some instructions here. This file will be included in the artifact, together with the CI logs and the JSON databases. Let me know if that's clear enough or if you need something else!

@stefanv (Member) commented Jul 6, 2021

> @jni @stefanv -- I added some instructions here. This file will be included in the artifact, together with the CI logs and the JSON databases. Let me know if that's clear enough or if you need something else!

This is... AMAZING?! Really, really nicely done. Thank you!

@grlee77 (Contributor) left a review comment

Thanks @jaimergp, the new docs are both clear and comprehensive. I had just one suggestion to correct a typo.

Suggestion on benchmarks/README_CI.md (resolved)
Co-authored-by: Gregory R. Lee <grlee77@gmail.com>
@jaimergp (Contributor, Author) commented Jul 7, 2021

Thanks for the kind words @stefanv @grlee77! I think we should run the benchmarks one more time before the merge so we can check the contents of the artifact. Can you please add the label one more time? Thanks!

@grlee77 (Contributor) commented Jul 7, 2021

> Can you please add the label one more time? Thanks!

Done

@jaimergp (Contributor, Author) commented Jul 7, 2021

Ok, confirmed! The files are there as expected! 🥳

@grlee77 merged commit 2fa66e5 into scikit-image:main on Jul 8, 2021
@grlee77 (Contributor) commented Jul 8, 2021

Hi @jaimergp, I am seeing many more Azure CI failures after this PR. There seem to be two separate issues related to this commit:
2fa66e5#diff-7915b9b726a397ae7ba6af7b9703633d21c031ebf21682f3ee7e6a4ec52837a5

Removing the condition on running the gallery examples has resulted in all jobs timing out now, although the timeouts should be resolved if #5446 is merged.

The second issue is that a couple of the cases fail to build wheels for (dependencies of) sphinx-gallery. I am not sure of the best fix for that one (see here).

The specific error is

  error: failed to run custom build command for `pyo3 v0.13.2`
  
  Caused by:
    process didn't exit successfully: `C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-vc714uet\pywinpty_c3cbae2a690e484a91f2810baf619595\target\release\build\pyo3-93fbd3a6215435b6\build-script-build` (exit code: 1)
    --- stderr
    Error: "Your Rust target architecture (64-bit) does not match your python interpreter (32-bit)"

It is probably easiest to just restore the conditional running of the gallery examples, but let me know what you think.

@jaimergp (Contributor, Author) commented Jul 8, 2021

Oh, wow, that's totally an accident on my end. I shouldn't have removed that conditional. Will open a PR to revert now.

Not sure about the 2nd issue.

@grlee77 (Contributor) commented Jul 8, 2021

> Not sure about the 2nd issue.

Yeah, I am not familiar with the package it is trying to build, but it seems to be something pulled in by sphinx-gallery. It might work if we selected a 64-bit Python interpreter, but I don't know. I think those failure cases will be bypassed by restoring the conditional in any case.
