
Run and test batch_processing.sh using GitHub Actions #3198

Merged
merged 19 commits into from Mar 6, 2021

Conversation

joshuacwnewton
Member

@joshuacwnewton joshuacwnewton commented Jan 25, 2021

Checklist

GitHub

  • I've given this PR a concise, self-descriptive, and meaningful title
  • I've linked relevant issues in the PR body
  • I've applied the relevant labels to this PR
  • I've assigned a reviewer

PR contents

Description

This PR follows the experimentation done in #3196. It introduces a new GitHub Actions workflow that does the following:

  • Runs batch_processing.sh
  • Runs a test that compares the results to cached csv files committed to the repo.

Some remaining design questions:

  • When should we run this?
  • What tolerance values should be chosen for each metric?
    • I've chosen tolerances as small as possible without the tests failing, so that any deviation will be caught.
  • Should any other new values be tested?

Linked issues

Fixes #2888.

@joshuacwnewton joshuacwnewton added the tests (context: unit, integration, or functional tests) and CI (category: TravisCI, GitHub Actions, etc.) labels Jan 25, 2021
@joshuacwnewton joshuacwnewton added this to the next-release milestone Jan 25, 2021
@joshuacwnewton
Member Author

joshuacwnewton commented Jan 25, 2021

The test is failing, but that's intentional: I set it up to showcase the fact that different values need different tolerances.

(Also, look at the pretty colour in that test! I found a workaround to re-enable colour in GH Actions.)

https://github.com/neuropoly/spinalcordtoolbox/blob/796d3011b1e7b50104b833a979272b1fae4ab2d4/.github/workflows/tests.yml#L56-L58

@jcohenadad
Member

(Also, look at the pretty colour in that test! I found a workaround to re-enable colour in GH Actions.)

awesome, i love pretty colors!

@joshuacwnewton
Member Author

joshuacwnewton commented Jan 26, 2021

Here's a question that's been on my mind this past day: Should we even use this Artifact API? Or, would it be better to hardcode the results via a commit to the repo?

Pros for Artifact API

  • In theory, it would be a more seamless way of updating the cached values by doing everything automatically in the background.

Cons for Artifact API

  • The API is pretty opaque -- there's not really an easy way to check what values are being stored. You can query for artifacts, but then you still have to download the .csv files to check. (Commits, on the other hand, put the results directly in the repo, visible for anyone to see.)
  • The API makes it too easy to accidentally upload erroneous results. (Commits would force us to be more transparent/explicit about the current state of batch_processing.sh. To update the values, we would need to be clearer in our intentions that something has changed by making a commit, which IMO is a good thing.)
  • The API feels brittle. It relies on refreshing the values within a 90 day window, which is doable, but it's inherently temporary. Also, what happens if the .csv files ever don't get uploaded/downloaded properly? (Commits feel much more reliable -- no chance of "oops, the cached values are gone!")
  • Using the API makes us dependent on an extra GitHub service, which carries documentation issues, could change unpredictably in the future, etc.

Using the Artifacts API was a fun experiment, but I'm leaning towards not using it at all.

@kousu
Contributor

kousu commented Jan 26, 2021

Since you're using this as a cache, I lean towards committing them to the repo too.

I keep linking this, and I wish I had a better example than a js project, but https://jestjs.io/docs/en/snapshot-testing shows us the way to do this, imo: when values change it fails the test; to update the values on purpose, the process is just rm sct_example_data/t2/csa_c2c3.csv; pytest; git add -u; git commit -m "update snapshot tests because we changed [...]". Uh, assume we have some plugin that can make the snapshots for us.
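The snapshot workflow kousu describes can be sketched in Python with the standard library alone (the function name and tolerance are hypothetical, and this is not the plugin alluded to above): record the value on the first run, compare on later runs, and update by deleting the snapshot file, re-running, and committing the regenerated file.

```python
import json
import os

def check_snapshot(snapshot_path, value, rel_tol=1e-3):
    """Snapshot-style check: record on first run, compare on later runs.

    To update a snapshot intentionally, delete the file and re-run the
    test, then commit the regenerated file (mirroring the
    `rm ...; pytest; git add -u; git commit` workflow above).
    """
    if not os.path.exists(snapshot_path):
        with open(snapshot_path, "w") as f:
            json.dump(value, f)
        return  # first run: snapshot recorded, nothing to compare yet
    with open(snapshot_path) as f:
        expected = json.load(f)
    assert abs(value - expected) <= rel_tol * abs(expected), (
        f"{snapshot_path}: got {value}, snapshot has {expected}"
    )
```

Because the snapshot lives in the repo, every intentional update leaves a commit explaining why the values changed.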

@Drulex
Collaborator

Drulex commented Jan 26, 2021

Here's a question that's been on my mind this past day: Should we even use this Artifact API? Or, would it be better to hardcode the results via a commit to the repo?

I agree that using the artifacts API introduces unneeded complexity. At the end of the day we just need to compare numbers. Committing the files to the repo sounds reasonable to me.

@joshuacwnewton
Member Author

batch_processing.sh is now failing because of #3202.

This is actually pretty neat -- the CI run of batch_processing.sh caught a bug that the tests didn't catch.

@joshuacwnewton
Member Author

joshuacwnewton commented Feb 1, 2021

Huzzah, #3203 fixed the failing CI in this PR.

I have one more thing I want to try: Running batch_processing.sh on both Ubuntu and macOS to more clearly demonstrate #3194. (If my test works well, then it should pass on Ubuntu but fail on macOS.)

@joshuacwnewton
Member Author

Well, the macOS CI job did fail, but not for the reason I predicted. Instead, it ran into the concurrency bug from #2957, which should be fixed by #3152.

I'm going to try restarting the build...

@joshuacwnewton
Member Author

joshuacwnewton commented Feb 2, 2021

Huzzah x2!

#3194 suggests that 3 values (mt/MTR(WM), dmri/FA(CST_r), and dmri/FA(CST_l)) aren't consistent between Ubuntu and macOS. And, as hoped, those 3 tests are failing for macOS.

@kousu kousu mentioned this pull request Feb 2, 2021
Collaborator

@Drulex Drulex left a comment


Thanks for the update @joshuacwnewton, this looks good.

At what point are new values added to the cached results file?

@joshuacwnewton joshuacwnewton removed this from the 5.2.0 milestone Feb 23, 2021
@joshuacwnewton
Member Author

joshuacwnewton commented Feb 25, 2021

My apologies, @Drulex. I thought I wrote a reply, but I must not have submitted it.

At what point are new values added to the cached results file?

To clarify, do you mean updating the values for existing metrics? Or, adding new (different) metrics to test?

  • For updating existing metrics, I was thinking we would decide on a case-by-case basis whenever a PR causes a shift:
    • If we expected a PR to shift the values, we would update the cached values within that PR. (This has the benefit of creating a history of changes that are linked with specific PR changesets.)
    • If the shift was unexpected, then we would look more closely at the PR to see why it caused a change.
  • For adding new metrics, it would depend on which other metrics are valuable to test. I'm still becoming familiar with typical MRI processing pipelines, so instead @jcohenadad (or a student in the lab) might be the best to ask.

@Drulex
Collaborator

Drulex commented Feb 25, 2021

If we expected a PR to shift the values, we would update the cached values within that PR. (This has the benefit of creating a history of changes that are linked with specific PR changesets.)

That answers the question, thanks!

Member Author

@joshuacwnewton joshuacwnewton left a comment


I've created a follow-up issue in #3250 to investigate testing more than just the current 6 values. For now, I think this PR would still be useful as-is, and perhaps more values could be added in another PR.

@kousu @Drulex Is there anything else, then, that needs to be done before this can be approved/merged?

Comment on lines +28 to +29
os: [ ubuntu-18.04 ] # TODO: Change to [ ubuntu-18.04, macos-10.15 ]
# macOS currently fails due to https://github.com/neuropoly/spinalcordtoolbox/issues/3194
Member Author


I've removed macos-10.15 in b978a6e, because I figure it will only be useful when the discrepancies are fixed.

@Drulex
Collaborator

Drulex commented Feb 25, 2021

I've created a follow-up issue in #3250 to investigate testing more than just the current 6 values. For now, I think this PR would still be useful as-is, and perhaps more values could be added in another PR.

@kousu @Drulex Is there anything else, then, that needs to be done before this can be approved/merged?

Looks ok to me.

@joshuacwnewton joshuacwnewton merged commit 1eee837 into master Mar 6, 2021
@joshuacwnewton joshuacwnewton deleted the jn/2888-test-batch_processing.sh branch March 6, 2021 22:37
@joshuacwnewton joshuacwnewton mentioned this pull request Mar 7, 2021
@joshuacwnewton joshuacwnewton added this to the 5.3.0 milestone Apr 25, 2021