
Use module specific config files for module versions/default settings #39

Closed
svandenhoek opened this issue Sep 8, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@svandenhoek
Contributor

svandenhoek commented Sep 8, 2020

Is your feature request related to a problem? Please describe.
Related to:

Using a separate .config file for each bash script should make it easier to see the current configuration and adjust it (compared to looking it up in the script itself). An added benefit is that the .config file for a bash script shows exactly which modules that script depends on to run (this is not documented right now and has to be looked up in the source code).
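As a rough illustration of the "indirect documentation" point, the module versions a script needs could be listed straight from its config. This is a minimal sketch that assumes every module version follows the `<NAME>VERSION=` naming convention of the example config in this issue; `list_module_versions` is a hypothetical helper, not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<MODULE> <version>" for every NAMEVERSION=value
# line in a config file. Assumes the NAMEVERSION naming convention.
list_module_versions() {
  local config_file="$1"
  # Keep only module-version assignments (comments and paths are skipped),
  # then split "CAPICEVERSION=v1.2" into "CAPICE v1.2".
  grep -E '^[A-Za-z_]+VERSION=' "$config_file" \
    | sed -E 's/^([A-Za-z_]+)VERSION=/\1 /'
}

# Example: list_module_versions pipeline_annotate.config
```

With the example config, this would print the CAPICE version line and nothing for VEP_DATA or the default parameters.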

Describe the solution you'd like
For each bash script, an identically named .config file, for example pipeline_annotate.config for pipeline_annotate.sh.
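A minimal sketch of how the "identically named" convention could be resolved inside a script, assuming the .config file lives next to the .sh file. `config_path_for` and `load_script_config` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Map a script path to its identically named config file:
# /path/pipeline_annotate.sh -> /path/pipeline_annotate.config
config_path_for() {
  local script_path="$1"
  # Strip a trailing ".sh" (if present) and append ".config".
  printf '%s\n' "${script_path%.sh}.config"
}

# Source the config that belongs to the given script; fail loudly if missing.
load_script_config() {
  local config_file
  config_file="$(config_path_for "$1")"
  if [[ ! -f "$config_file" ]]; then
    echo "error: config file not found: $config_file" >&2
    return 1
  fi
  # Plain assignments in the sourced file become global variables.
  # shellcheck source=/dev/null
  source "$config_file"
}

# Inside pipeline_annotate.sh this would be called as:
#   load_script_config "${BASH_SOURCE[0]}"
```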

Example config file:

# Set module versions to be used
CAPICEVERSION=v1.2-foss-2018b

# Set paths to needed files/dirs
VEP_DATA=/apps/data/Ensembl/VEP/100

# Configure default values for input parameters (overridable by command line arguments)
CPU_CORES=4
ASSEMBLY=GRCh37

# Set default arguments of used tools
VEP_ARGS=" --stats_text \
--offline --cache --dir_cache ${VEP_DATA} \
--species homo_sapiens --assembly ${ASSEMBLY} \
--flag_pick_allele \
--coding_only \
--no_intergenic \
--af_gnomad --pubmed --gene_phenotype \
--shift_3prime 1 \
--no_escape \
--numbers \
--dont_skip \
--allow_non_variant \
--fork ${CPU_CORES}"
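One caveat with building VEP_ARGS as a string in the config: it is expanded when the config is sourced, so a later command-line override of CPU_CORES or ASSEMBLY would not reach it. The sketch below shows the effect and one possible alternative (an assumption, not the project's design): keep plain defaults in the config and assemble the arguments as a bash array after option parsing. `demo_stale_expansion`, `build_vep_args` and `VEP_ARGS_ARRAY` are hypothetical names.

```shell
#!/usr/bin/env bash
# Shows that a string built from CPU_CORES keeps the value it had at build time.
demo_stale_expansion() {
  local CPU_CORES=4
  local VEP_ARGS="--fork ${CPU_CORES}"  # expanded now, with CPU_CORES=4
  CPU_CORES=8                           # later override, e.g. from a CLI flag
  printf '%s\n' "$VEP_ARGS"             # still "--fork 4"
}

# Assemble the VEP arguments only after defaults and overrides are final.
# An array keeps a path containing spaces as a single argument.
build_vep_args() {
  VEP_ARGS_ARRAY=(
    --stats_text
    --offline --cache --dir_cache "${VEP_DATA}"
    --species homo_sapiens --assembly "${ASSEMBLY}"
    --flag_pick_allele --coding_only --no_intergenic
    --af_gnomad --pubmed --gene_phenotype
    --shift_3prime 1 --no_escape --numbers --dont_skip --allow_non_variant
    --fork "${CPU_CORES}"
  )
}

# Usage (after parsing options): vep "${VEP_ARGS_ARRAY[@]}" -i input.vcf -o output.txt
```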

Example bash script:

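# Source the config next to this script, independent of the current working directory
source "$(dirname "${BASH_SOURCE[0]}")/pipeline_annotate.config"
module load "CAPICE/${CAPICEVERSION}"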
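A slightly fuller sketch of the script side, showing the order implied by the config comments: load defaults from the config, then let command-line arguments override them, then load modules. The `-t`/`-a` option letters and the `parse_overrides`/`main` names are hypothetical; `module load` assumes an Lmod/Environment Modules setup.

```shell
#!/usr/bin/env bash
# Apply command-line overrides on top of the defaults loaded from the config.
parse_overrides() {
  local OPTIND=1 opt  # reset getopts state for each call
  while getopts "t:a:" opt; do
    case "$opt" in
      t) CPU_CORES="$OPTARG" ;;
      a) ASSEMBLY="$OPTARG" ;;
      *) echo "usage: [-t cpu_cores] [-a assembly]" >&2; return 1 ;;
    esac
  done
}

main() {
  local script_dir
  script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  # 1. defaults, 2. overrides, 3. modules
  # shellcheck source=/dev/null
  source "${script_dir}/pipeline_annotate.config"
  parse_overrides "$@" || return 1
  module load "CAPICE/${CAPICEVERSION}"
}

# In the real script: main "$@"
```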

Describe alternatives you've considered
Use a single global config file:

  • The config file could become cumbersome in the long term if many modules exist.
  • If different bash scripts need to load different versions of the same module, this could cause issues unless the module version is always named explicitly per script as well (in which case the benefit of a single config file might be reduced).
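To make the second point concrete, a single global config would need per-script variable names once two scripts pin different versions of the same module. A minimal sketch, assuming a hypothetical `<SCRIPT>_<MODULE>VERSION` naming scheme and bash 4+ (for `${var^^}`); the second script name and its version are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical global config: versions are prefixed with the script name.
PIPELINE_ANNOTATE_CAPICEVERSION=v1.2-foss-2018b
PIPELINE_OTHER_CAPICEVERSION=v1.1-foss-2018b  # hypothetical second script

# Look up <SCRIPT>_<MODULE>VERSION via indirect expansion.
# Assumes script and module names contain only letters, digits and underscores.
module_version_for() {
  local script_name="$1" module_name="$2"
  local var="${script_name^^}_${module_name^^}VERSION"
  printf '%s\n' "${!var:-}"  # empty if the variable is not defined
}
```

Each script then has to spell out its own prefix, which is the extra naming burden the bullet refers to.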

Setting module versions/default values in each bash script and needed files/dirs in a global config file:

  • All bash scripts would need to load the single config file and would therefore load settings they don't need.
  • Easier to set up the basic configuration needed to run the pipeline (as only a single file needs to be adjusted).
  • Changing a default, e.g. setting a different CPU_CORES, requires editing the bash script itself. When updating to a new version, it is also less clear which custom changes need to be reapplied.
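For the second alternative, defaults kept in the script could be written so they only apply when the caller has not already set a value, which would allow changing e.g. CPU_CORES without editing the script. This is a sketch of a general bash technique, not something discussed in this issue; `apply_defaults` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assign each default only if the variable is unset or empty.
# ":" is a no-op command; the ":=" expansion performs the assignment.
apply_defaults() {
  : "${CPU_CORES:=4}"
  : "${ASSEMBLY:=GRCh37}"
}

# Usage: CPU_CORES=8 ./pipeline_annotate.sh  -> runs with 8 cores, ASSEMBLY=GRCh37
```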
@joerivandervelde added the enhancement (New feature or request) label Sep 8, 2020
@svandenhoek
Contributor Author

Edited the main text to add a note on how a .config file for each bash script also serves as indirect "documentation" of all module dependencies of that script.

@svandenhoek
Contributor Author

Another possible feature would be an optional argument for a custom .config file that overrides the default one when extra flexibility is needed (e.g. load the default .config file first and then override any settings defined in the supplied .config file).
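The layering described above can be sketched by sourcing the two files in order, so that later assignments win. `load_configs` is a hypothetical helper; note that values computed inside the default config from other settings (such as a string built from CPU_CORES) are not recomputed when the custom file overrides those settings.

```shell
#!/usr/bin/env bash
# Source the default config, then an optional custom config that overrides it.
load_configs() {
  local default_config="$1" custom_config="${2:-}"
  # shellcheck source=/dev/null
  source "$default_config" || return 1
  if [[ -n "$custom_config" ]]; then
    if [[ ! -f "$custom_config" ]]; then
      echo "error: custom config not found: $custom_config" >&2
      return 1
    fi
    # Assignments here replace any values set by the default config.
    # shellcheck source=/dev/null
    source "$custom_config"
  fi
}

# Usage: load_configs pipeline_annotate.config my_site.config
```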

@dennishendriksen
Contributor

The proposed alternative solution using one global config file was implemented in #132 with a default config added in #133 and support for multiple config files added in #149.

3 participants