Dry run #4214

Draft: wants to merge 5 commits into master

bentsherman
Member

Close #844

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

netlify bot commented Aug 24, 2023

Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: 93af8fe
🔍 Latest deploy log: https://app.netlify.com/sites/nextflow-docs-staging/deploys/64e778e021aea7000817e30c

@pditommaso

This comment was marked as off-topic.


netlify bot commented Apr 19, 2024

Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: a1caced
🔍 Latest deploy log: https://app.netlify.com/sites/nextflow-docs-staging/deploys/6626e2d9fe8f97000a4d0dbd

bentsherman changed the title from "Add dry resume option" to "Dry resume" on Apr 19, 2024
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
bentsherman linked an issue on Apr 22, 2024 that may be closed by this pull request
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
bentsherman
Member Author

Added basic cache invalidation reporting to this PR. It requires the cache to also load the index file of the previous run, so that when a cache entry can't be found, it can search the previous run for a matching task (i.e. matching process + tag) and report any differences between the two tasks.
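A rough sketch of the lookup described above, written in Python only for brevity. It assumes a hypothetical TaskRecord with process, tag, hash and script fields, and an in-memory dict standing in for the previous run's index file. This is not how Nextflow's cache store is actually laid out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical task record; field names are illustrative only.
    process: str
    tag: str
    hash: str
    script: str


def index_previous_run(records):
    """Build a (process, tag) -> record map from the previous run's index."""
    return {(r.process, r.tag): r for r in records}


def explain_cache_miss(task, cache_hashes, previous_index) -> Optional[str]:
    """Return a human-readable report for a cache miss, or None on a cache hit."""
    if task.hash in cache_hashes:
        return None  # cache hit: nothing to explain
    prev = previous_index.get((task.process, task.tag))
    if prev is None:
        return f"{task.process} ({task.tag}): no matching task in previous run"
    lines = [f"{task.process} ({task.tag}): hash changed {prev.hash} -> {task.hash}"]
    if prev.script != task.script:
        lines.append("  script changed")
    return "\n".join(lines)
```

Matching on process plus tag only works when tags are unique per process, so a real implementation would need to handle duplicate or missing tags.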

Currently it only reports the different hashes and the scripts (if they are different), but could be expanded to include things like:

  • container / conda / spack / modules config
  • stdin
  • input environment vars
  • input files (compare metadata or checksum based on cache mode)
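One possible way to extend the report to the fields listed above is a generic field-by-field comparison. This sketch assumes each task is available as a plain dict. The keys (container, conda, spack, modules, stdin, env) are hypothetical names chosen to mirror the list, not Nextflow's actual cache schema.

```python
# Fields to compare; names are hypothetical and mirror the list above.
COMPARED_FIELDS = ("container", "conda", "spack", "modules", "stdin", "env")


def diff_task_fields(prev: dict, curr: dict, fields=COMPARED_FIELDS) -> dict:
    """Return {field: (previous value, current value)} for fields that differ.

    Missing keys are treated as None, so a field that is absent in both
    tasks is not reported.
    """
    changes = {}
    for name in fields:
        before, after = prev.get(name), curr.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```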

The last point requires the input file metadata to be saved to the cache, as discussed for #3802 and #3849. I would be fine addressing that in a future iteration, since for now, even just reporting the changed hashes is a great improvement.
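Once input file metadata is stored, the comparison could depend on the cache mode. The sketch below roughly follows the modes documented for Nextflow's cache directive: the default uses path, size and last-modified timestamp, lenient uses path and size, and deep uses file content. The FileMeta record, the MODE_ATTRIBUTES table and the idea of storing a content checksum are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    # Hypothetical per-file metadata saved alongside a cache entry.
    path: str
    size: int
    last_modified: int  # e.g. epoch milliseconds
    checksum: str       # content digest, only consulted in 'deep' mode


# Attributes compared in each mode (modeled loosely on the cache directive).
MODE_ATTRIBUTES = {
    "standard": ("path", "size", "last_modified"),
    "lenient": ("path", "size"),
    "deep": ("checksum",),  # content only, in this sketch
}


def changed_attributes(prev: FileMeta, curr: FileMeta, mode: str = "standard"):
    """List the attributes that differ between two snapshots under a cache mode."""
    try:
        attrs = MODE_ATTRIBUTES[mode]
    except KeyError:
        raise ValueError(f"unknown cache mode: {mode!r}") from None
    return [a for a in attrs if getattr(prev, a) != getattr(curr, a)]
```

For a file that was touched but not modified, the default mode would report last_modified while lenient and deep would report nothing. That difference is what would tell a user why a task re-ran.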

Other notes:

  • we can use java-diff-utils to print a nice diff of the script text
  • comparing against the previous run adds a lot of read pressure on the cache store, and I'm concerned it might significantly slow down large runs. I would consider enabling this "deep" cache analysis only when the dry run is enabled
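The PR proposes java-diff-utils for the script diff. To show the same idea without committing to that library's API, this sketch uses Python's standard difflib.unified_diff as a stand-in. The "previous run" / "current run" labels are arbitrary.

```python
import difflib


def script_diff(old_script: str, new_script: str) -> str:
    """Return a unified diff of two task scripts (empty string if identical)."""
    diff = difflib.unified_diff(
        old_script.splitlines(keepends=True),  # keep newlines so output joins cleanly
        new_script.splitlines(keepends=True),
        fromfile="previous run",
        tofile="current run",
    )
    return "".join(diff)
```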

bentsherman changed the title from "Dry resume" to "Dry run" on Apr 22, 2024
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
ewels
Member

ewels commented May 20, 2024

  • I would consider enabling this "deep" cache analysis only when the dry run is enabled

And/or as an opt-in? I can imagine some people (me) wanting it routinely if not doing large runs. It might be annoying to have to go back and redo a dry run every time.
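The two positions above can coexist: the deep analysis could run during a dry run, or on routine runs when a user opts in. Here is a minimal sketch where dry_run and explain_cache are hypothetical setting names, not existing Nextflow options.

```python
from dataclasses import dataclass


@dataclass
class RunOptions:
    # Hypothetical settings; neither name is an existing Nextflow option.
    dry_run: bool = False
    explain_cache: bool = False  # opt-in for routine (non-dry) runs


def deep_cache_analysis_enabled(opts: RunOptions) -> bool:
    """Compare against the previous run only on a dry run or when opted in,
    so large runs do not pay the extra cache reads by default."""
    return opts.dry_run or opts.explain_cache
```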

Development

Successfully merging this pull request may close these issues:

  • [feature] Resume: print cache invalidation reason to console
  • "dry run" option?