
Run and test batch_processing.sh using GitHub Actions #3198

Merged
merged 19 commits into from Mar 6, 2021

Conversation

joshuacwnewton
Member

@joshuacwnewton joshuacwnewton commented Jan 25, 2021

Checklist

GitHub

  • I've given this PR a concise, self-descriptive, and meaningful title
  • I've linked relevant issues in the PR body
  • I've applied the relevant labels to this PR
  • I've assigned a reviewer

PR contents

Description

This PR follows the experimentation done in #3196. It introduces a new GitHub Actions workflow that does the following:

  • Runs batch_processing.sh
  • Runs a test that compares the results to cached csv files committed to the repo.

Some remaining design questions:

  • When should we run this?
  • What tolerance values should be chosen for each metric?
    • I've chosen tolerances as small as possible without the tests failing, so that any deviation will be caught.
  • Should any other new values be tested?

Linked issues

Fixes #2888.

@joshuacwnewton joshuacwnewton added the tests (context: unit, integration, or functional tests) and CI (category: TravisCI, GitHub Actions, etc.) labels Jan 25, 2021
@joshuacwnewton joshuacwnewton added this to the next-release milestone Jan 25, 2021
@joshuacwnewton
Member Author

joshuacwnewton commented Jan 25, 2021

The test is failing, but that's intentional: I set it up to showcase the fact that different values need different tolerances.

(Also, look at the pretty colour in that test! I found a workaround to re-enable colour in GH Actions.)

https://github.com/neuropoly/spinalcordtoolbox/blob/796d3011b1e7b50104b833a979272b1fae4ab2d4/.github/workflows/tests.yml#L56-L58

@jcohenadad
Member

(Also, look at the pretty colour in that test! I found a workaround to re-enable colour in GH Actions.)

awesome, i love pretty colors!

@joshuacwnewton
Member Author

joshuacwnewton commented Jan 26, 2021

Here's a question that's been on my mind this past day: Should we even use this Artifact API? Or, would it be better to hardcode the results via a commit to the repo?

Pros for Artifact API

  • In theory, it would be a more seamless way of updating the cached values by doing everything automatically in the background.

Cons for Artifact API

  • The API is pretty opaque -- there's not really an easy way to check what values are being stored. You can query for artifacts, but then you still have to download the .csv files to check. (Commits, on the other hand, put the results directly in the repo, visible for anyone to see.)
  • The API makes it too easy to accidentally upload erroneous results. (Commits would force us to be more transparent/explicit about the current state of batch_processing.sh. To update the values, we would need to be clearer in our intentions that something has changed by making a commit, which IMO is a good thing.)
  • The API feels brittle. It relies on refreshing the values within a 90 day window, which is doable, but it's inherently temporary. Also, what happens if the .csv files ever don't get uploaded/downloaded properly? (Commits feel much more reliable -- no chance of "oops, the cached values are gone!")
  • Using the API makes us dependent on an extra GitHub service, which carries documentation issues, could change unpredictably in the future, etc.

Using the Artifacts API was a fun experiment, but I'm leaning towards not using it at all.

@kousu
Contributor

kousu commented Jan 26, 2021

Since you're using this as a cache, I lean towards committing them to the repo too.

I keep linking this, and I wish I had a better example than a js project, but https://jestjs.io/docs/en/snapshot-testing shows us the way to do this, imo: when values change it fails the test; to update the values on purpose, the process is just rm sct_example_data/t2/csa_c2c3.csv; pytest; git add -u; git commit -m "update snapshot tests because we changed [...]". Uh, assume we have some plugin that can make the snapshots for us.
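The snapshot workflow kousu describes can be sketched in Python with the standard library alone (the function name and tolerance are hypothetical, and this is not the plugin alluded to above): record the value on the first run, compare on later runs, and update by deleting the snapshot file, re-running, and committing the regenerated file.

```python
import json
import os

def check_snapshot(snapshot_path, value, rel_tol=1e-3):
    """Snapshot-style check: record on first run, compare on later runs.

    To update a snapshot intentionally, delete the file and re-run the
    test, then commit the regenerated file (mirroring the
    `rm ...; pytest; git add -u; git commit` workflow above).
    """
    if not os.path.exists(snapshot_path):
        with open(snapshot_path, "w") as f:
            json.dump(value, f)
        return  # first run: snapshot recorded, nothing to compare yet
    with open(snapshot_path) as f:
        expected = json.load(f)
    assert abs(value - expected) <= rel_tol * abs(expected), (
        f"{snapshot_path}: got {value}, snapshot has {expected}"
    )
```

Because the snapshot lives in the repo, every intentional update leaves a commit explaining why the values changed.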

@Drulex
Collaborator

Drulex commented Jan 26, 2021

Here's a question that's been on my mind this past day: Should we even use this Artifact API? Or, would it be better to hardcode the results via a commit to the repo?

I agree that using the artifacts API introduces unneeded complexity. At the end of the day we just need to compare numbers. Committing the files to the repo sounds reasonable to me.

@joshuacwnewton
Member Author

batch_processing.sh is now failing because of #3202.

This is actually pretty neat -- the CI run of batch_processing.sh caught a bug that the tests didn't catch.

@joshuacwnewton
Member Author

joshuacwnewton commented Feb 1, 2021

Huzzah, #3203 fixed the failing CI in this PR.

I have one more thing I want to try: Running batch_processing.sh on both Ubuntu and macOS to more clearly demonstrate #3194. (If my test works well, then it should pass on Ubuntu but fail on macOS.)

@joshuacwnewton
Member Author

Well, the macOS CI job did fail, but not for the reason I predicted. Instead, it ran into the concurrency bug from #2957, which should be fixed by #3152.

I'm going to try restarting the build...

@joshuacwnewton
Member Author

joshuacwnewton commented Feb 2, 2021

Huzzah x2!

#3194 suggests that 3 values (mt/MTR(WM), dmri/FA(CST_r), and dmri/FA(CST_l)) aren't consistent between Ubuntu and macOS. And, as hoped, those 3 tests are failing for macOS.

@kousu kousu mentioned this pull request Feb 2, 2021
Collaborator

@Drulex Drulex left a comment


Thanks for the update @joshuacwnewton, this looks good.

At what point are new values added to the cached results file?

@joshuacwnewton joshuacwnewton removed this from the 5.2.0 milestone Feb 23, 2021
@joshuacwnewton
Member Author

joshuacwnewton commented Feb 25, 2021

My apologies, @Drulex. I thought I wrote a reply, but I must not have submitted it.

At what point are new values added to the cached results file?

To clarify, do you mean updating the values for existing metrics? Or, adding new (different) metrics to test?

  • For updating existing metrics, I was thinking we would decide on a case-by-case basis whenever a PR causes a shift:
    • If we expected a PR to shift the values, we would update the cached values within that PR. (This has the benefit of creating a history of changes that are linked with specific PR changesets.)
    • If the shift was unexpected, then we would look more closely at the PR to see why it caused a change.
  • For adding new metrics, it would depend on which other metrics are valuable to test. I'm still becoming familiar with typical MRI processing pipelines, so instead @jcohenadad (or a student in the lab) might be the best to ask.

@Drulex
Collaborator

Drulex commented Feb 25, 2021

If we expected a PR to shift the values, we would update the cached values within that PR. (This has the benefit of creating a history of changes that are linked with specific PR changesets.)

That answers the question, thanks!

Member Author

@joshuacwnewton joshuacwnewton left a comment


I've created a follow-up issue in #3250 to investigate testing more than just the current 6 values. For now, I think this PR would still be useful as-is, and perhaps more values could be added in another PR.

@kousu @Drulex Is there anything else, then, that needs to be done before this can be approved/merged?

Comment on lines +28 to +29
os: [ ubuntu-18.04 ] # TODO: Change to [ ubuntu-18.04, macos-10.15 ]
# macOS currently fails due to https://github.com/neuropoly/spinalcordtoolbox/issues/3194
Member Author


I've removed macos-10.15 in b978a6e, because I figure it will only be useful when the discrepancies are fixed.

@Drulex
Collaborator

Drulex commented Feb 25, 2021

I've created a follow-up issue in #3250 to investigate testing more than just the current 6 values. For now, I think this PR would still be useful as-is, and perhaps more values could be added in another PR.

@kousu @Drulex Is there anything else, then, that needs to be done before this can be approved/merged?

Looks ok to me.

@joshuacwnewton joshuacwnewton merged commit 1eee837 into master Mar 6, 2021
@joshuacwnewton joshuacwnewton deleted the jn/2888-test-batch_processing.sh branch March 6, 2021 22:37
@joshuacwnewton joshuacwnewton mentioned this pull request Mar 7, 2021
@joshuacwnewton joshuacwnewton added this to the 5.3.0 milestone Apr 25, 2021