feat: add Bash script functionality #1821
Nice! I like the approach! Feel free to finalize and let me know once it should be reviewed in detail. One thing to note immediately: I think the variables should be lower case. The upper case seems a bit heavy on the eye.
echo "The first input file is ${SNAKEMAKE_INPUT[0]}" > "${SNAKEMAKE_OUTPUT[0]}" 2> "${SNAKEMAKE_LOG[0]}"
# vs
echo "The first input file is ${snakemake_input[0]}" > "${snakemake_output[0]}" 2> "${snakemake_log[0]}"
Okay @johanneskoester, I've made the variables lowercase and added documentation with two examples that show some of the slight differences in usage compared to scripts in other languages. One thing: the Windows tests seem to be failing, as Bash doesn't seem to be available as a conda package for Windows? Not sure what to do about that...
Very nice work!
Kudos, SonarCloud Quality Gate passed!
Description
This PR adds the ability to access snakemake/rule variables in Bash scripts [closes #294].
There are some design choices here that deserve some discussion. And I will update the docs when/if it is agreed that this PR is desirable.
The main thing to know is that the interface with the snakemake object in the Bash script is a little different to other languages we support.
We use associative arrays (AAs) to provide access to rule variables, but you can't nest AAs. As such, there is not a single snakemake variable (AA), but one for each rule attribute that is encoded as a Namedlist (i.e., input, output, wildcards, log, params etc.), and then a "master" variable for everything else (i.e., threads, rule name, scriptdir etc.).
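As a rough illustration of the layout described above, the sketch below declares the arrays by hand. In a real run Snakemake would generate these declarations; the file names, the rule name, and the values are made up, and the name `snakemake` for the "master" array is an assumption on my part. Associative arrays need Bash 4.0 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical preamble: every value here is invented for illustration.
# One associative array per Namedlist attribute (input, output, wildcards, ...).
declare -A snakemake_input=( [0]="data/a.txt" [reads]="data/a.txt" )
declare -A snakemake_output=( [0]="results/a.out" )
declare -A snakemake_wildcards=( [sample]="a" )
# Assumed name for the "master" array that holds everything else.
declare -A snakemake=( [threads]="4" [rule]="count_lines" )

# Bash cannot store one associative array inside another, so each
# attribute is looked up in its own array.
summary="rule=${snakemake[rule]} threads=${snakemake[threads]} sample=${snakemake_wildcards[sample]} in=${snakemake_input[reads]} out=${snakemake_output[0]}"
echo "$summary"
```

Note that an entry can be reachable both by position (`[0]`) and by name (`[reads]`) in this sketch, which is one way to mirror how a Namedlist behaves in Python.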
The other thing to know is we cannot fully support all Python data structures. For example, if you have a dict as the value for a variable in params, it cannot be correctly represented, as it would require us to nest an AA. In addition, lists (of say files) are encoded as space-separated strings - because again, you can't nest arrays.
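To make the list encoding concrete, here is a minimal sketch of how a script might turn such a space-separated string back into a regular Bash array. The array name `snakemake_params` follows the naming pattern used for input, output, and log; the key `files` and the file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical value: a Python list ["a.fq", "b.fq", "c.fq"] under params
# would arrive as one space-separated string.
declare -A snakemake_params=( [files]="a.fq b.fq c.fq" )

# Split the string on whitespace into a regular indexed array.
read -r -a files <<< "${snakemake_params[files]}"

count="${#files[@]}"
last="${files[count-1]}"
echo "got ${count} files, last is ${last}"

# Iterate over the recovered elements one by one.
for f in "${files[@]}"; do
    echo "processing ${f}"
done
```

One consequence of this encoding is that a path containing a space cannot be told apart from two separate list entries once it has been flattened, so this approach only round-trips lists whose elements are free of whitespace.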
Despite these limitations, I think the functionality is still super useful, and I know it will save me a lot of effort in my workflows.
Example
Here is an example script that we create for the Bash test rule I added
I'm happy to take feedback and make any changes, but this is a good starting point I think.
QC
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).