divvy adapters #47

Closed
nsheff opened this issue Mar 17, 2020 · 14 comments
Labels
enhancement likely-solved


@nsheff nsheff commented Mar 17, 2020

adapters allow you to use divvy with any source of variables.

divvy originally was part of looper. therefore, the default divvy variables (like {CODE}, etc) are from looper. removing divvy from looper decoupled the software, but the variables are still tightly coupled. To make it more flexible, we need to remove this coupling. divvy adapters do that.

here's a config file with adapters:

adapters:
  code: looper.command
  logfile: looper.logfile
  jobname: looper.jobname
  cores: compute.cores
  time: compute.time
  mem: compute.mem
  docker_args: compute.docker_args
  docker_image: compute.docker_image
  singularity_image: compute.singularity_image
  singularity_args: compute.singularity_args
compute_packages:
  default:
    submission_template: submit_templates/localhost_template.sub
    submission_command: sh
  local:
    submission_template: submit_templates/localhost_template.sub
    submission_command: sh
    adapters:
      custom: custom_adapter_here
  slurm:
    submission_template: submit_templates/slurm_template.sub
    submission_command: sbatch
  singularity:
    submission_template: submit_templates/localhost_singularity_template.sub
    submission_command: sh
    singularity_args: ""
  singularity_slurm:
    submission_template: submit_templates/slurm_singularity_template.sub
    submission_command: sbatch
    singularity_args: ""

adapters are simple variable mappings from one name to another. they can just be straight-up var:var mappings, but they can also include namespaces (on the supply side; divvy variables aren't namespaced).
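
To make the mapping idea concrete, here is a minimal sketch of how an adapter table could be resolved. It is not divvy's actual implementation: `resolve_adapters`, the `flat_vars` argument, and the sample values are hypothetical names made up for illustration. Namespaced sources (`namespace.key`) are looked up in nested dicts, and plain var:var sources are looked up in a flat dict.

```python
def resolve_adapters(adapters, namespaces, flat_vars=None):
    """Return {divvy_variable: value} for every adapter whose source exists.

    Hypothetical helper for illustration only, not part of divvy.
    """
    flat_vars = flat_vars or {}
    resolved = {}
    for divvy_var, source in adapters.items():
        if "." in source:
            # Namespaced source, e.g. "looper.command"
            namespace, key = source.split(".", 1)
            value = namespaces.get(namespace, {}).get(key)
        else:
            # Plain var:var mapping with no namespace
            value = flat_vars.get(source)
        if value is not None:
            resolved[divvy_var] = value
    return resolved


adapters = {
    "CODE": "looper.command",
    "CORES": "compute.cores",
    "TIME": "compute.time",   # no such value supplied below, so it is skipped
    "USER": "username",       # hypothetical plain var:var mapping
}
namespaces = {
    "looper": {"command": "sra_convert.py --srr x.sra"},
    "compute": {"cores": "1"},
}
resolved = resolve_adapters(adapters, namespaces, {"username": "alice"})
print(resolved)
```

Note that the result is flat: the namespaces exist only on the supply side, and the divvy variable names carry no namespace.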

This system would allow us to include a 'divvy-looper' adapter. this adapter could be modified either for a universal divvy config, or for a particular compute package, which would enable divvy templates to be used with multiple variable sources.
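
A rough sketch of the "universal vs. per-package" idea, using the config above written out as a Python dict. The function name `adapters_for_package` is hypothetical, and the rule that package-level adapters take precedence over the top-level ones is an assumption on my part, not something this thread settles.

```python
def adapters_for_package(config, package_name):
    """Combine top-level adapters with a package's own adapters.

    Assumption: entries under a compute package override top-level entries.
    """
    merged = dict(config.get("adapters", {}))
    package = config.get("compute_packages", {}).get(package_name, {})
    merged.update(package.get("adapters", {}))
    return merged


# Trimmed-down version of the example config file
config = {
    "adapters": {
        "code": "looper.command",
        "cores": "compute.cores",
    },
    "compute_packages": {
        "local": {
            "submission_command": "sh",
            "adapters": {"custom": "custom_adapter_here"},
        },
        "slurm": {"submission_command": "sbatch"},
    },
}

local_adapters = adapters_for_package(config, "local")
slurm_adapters = adapters_for_package(config, "slurm")
print(local_adapters)
print(slurm_adapters)
```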

under this system, looper would simply provide divvy with all available namespaces, the same as it does for command templates. the adapter would convert these into the divvy variables. the advantage is that divvy templates are now useful beyond looper. it also simplifies what looper has to do: nothing.

divvy should ship with looper adapters, something like the above example.

what do you think @stolarczyk ?

@nsheff nsheff added the enhancement label Mar 17, 2020
@nsheff nsheff added this to the 0.5 milestone Mar 17, 2020
@stolarczyk stolarczyk self-assigned this Mar 17, 2020
stolarczyk added a commit that referenced this issue Mar 18, 2020
stolarczyk added a commit that referenced this issue Mar 18, 2020
stolarczyk added a commit that referenced this issue Mar 18, 2020

@nsheff nsheff commented Mar 19, 2020

In my testing of looper I'm missing how to use the new adapters on rivanna. I need an example.


@nsheff nsheff commented Mar 19, 2020

I put these adapters into my divvy config file:

adapters:
  code: looper.command
  jobname: looper.jobname
  cores: compute.cores
  logfile: compute.logfile
  time: compute.time
  mem: compute.memory
  docker_args: compute.docker_args
  docker_image: compute.docker_image
  singularity_image: compute.singularity_image
  singularity_args: compute.singularity_args

It correctly populated the {CODE} variable, but none of the others:

#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="/home/ns5bc/code/sra_convert/sra_convert.py --srr /project/shefflab/data/sra/SRR8435075.sra /project/shefflab/data/sra/SRR8435076.sra /project/shefflab/data/sra/SRR8435077.sra /project/shefflab/data/sra/SRR8435078.sra -O /project/shefflab/processed/paqc/results_pipeline --verbosity 4 --logdev"

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"


@stolarczyk stolarczyk commented Mar 19, 2020

it's because it's looking for the exact keys in the template, uppercase

  CODE: looper.command
  LOGFILE: looper.log_file
  JOBNAME: looper.job_name
  CORES: compute.cores
  TIME: compute.time
  MEM: compute.mem
  DOCKER_ARGS: compute.docker_args
  DOCKER_IMAGE: compute.docker_image
  SINGULARITY_IMAGE: compute.singularity_image
  SINGULARITY_ARGS: compute.singularity_args
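
A small sketch of why key case matters when the lookup is an exact match. This is not divvy's template code; `populate` is a hypothetical stand-in that fills `{NAME}` placeholders only when a variable with exactly that name exists and leaves everything else untouched, which is the symptom seen in the script above.

```python
import re


def populate(template, variables):
    """Fill {NAME} placeholders by exact, case-sensitive key match.

    Hypothetical stand-in for illustration; unmatched placeholders are kept.
    """
    def replace(match):
        key = match.group(1)
        return str(variables[key]) if key in variables else match.group(0)

    return re.sub(r"\{(\w+)\}", replace, template)


template = "#SBATCH --job-name='{JOBNAME}'\n#SBATCH --mem='{MEM}'"

# Lowercase adapter keys do not match the uppercase placeholders
lower = populate(template, {"jobname": "convert", "mem": "8000"})
# Uppercase keys match exactly
upper = populate(template, {"JOBNAME": "convert", "MEM": "8000"})

print(lower)
print(upper)
```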


@nsheff nsheff commented Mar 19, 2020

got it! code worked lowercase...


@nsheff nsheff commented Mar 19, 2020

great, those looper variables are working for me now. But the compute namespace is not working yet, is that expected?


@nsheff nsheff commented Mar 19, 2020

I've added an adapter version here: https://github.com/pepkit/divcfg/blob/master/uva_rivanna_adapters.yaml

will later integrate into the main config (should be backwards compatible)


@stolarczyk stolarczyk commented Mar 19, 2020

the compute namespace is not working yet, is that expected?

it works for me in looper, hmm... maybe we're doing something differently? How are you testing it?


@nsheff nsheff commented Mar 19, 2020

DIVCFG=/project/shefflab/rivanna_config/divcfg/uva_rivanna_adapters.yaml looper run paqc.yaml --amendments sra_convert -d
cat /project/shefflab/processed/paqc/submission/convert_ATAC-seq_Suspension_rep3.sub
#!/bin/bash
#SBATCH --job-name='convert_ATAC-seq_Suspension_rep3'
#SBATCH --output='/project/shefflab/processed/paqc/submission/convert_ATAC-seq_Suspension_rep3.log'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="/home/ns5bc/code/sra_convert/sra_convert.py --srr /project/shefflab/data/sra/SRR8435075.sra /project/shefflab/data/sra/SRR8435076.sra /project/shefflab/data/sra/SRR8435077.sra /project/shefflab/data/sra/SRR8435078.sra -O /project/shefflab/processed/paqc/results_pipeline --logdev"

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"


@nsheff nsheff commented Mar 19, 2020


@stolarczyk stolarczyk commented Mar 19, 2020

have you specified size_dependent_variables as a TSV in the compute section of the sra_convert piface?


@stolarczyk stolarczyk commented Mar 19, 2020

I didn't make it backwards compatible. Only the TSV way is supported now


@stolarczyk stolarczyk commented Mar 19, 2020

have you specified size_dependent_variables as a TSV in the compute section of the sra_convert piface?

worked for me this way:

[mjs5kd@udc-ba36-36 paqc](master): echo $DIVCFG
/project/shefflab/rivanna_config/divcfg/uva_rivanna_adapters.yaml
[mjs5kd@udc-ba36-36 paqc](master): looper run paqc.yaml --amendments sra_convert -d --limit 1
Command: run (Looper version: 0.12.6-dev)
Using amendments: sra_convert
Finding pipelines for protocol(s): *
Known protocols: *
## [1 of 17] GSM4289908 (*)
Writing script to /project/shefflab/processed/paqc/submission/convert_GSM4289908.sub
Job script (n=1; 0.00 Gb): /project/shefflab/processed/paqc/submission/convert_GSM4289908.sub
Dry run, not submitted

Looper finished
Samples valid for job generation: 1 of 1
Successful samples: 1 of 1
Commands submitted: 1 of 1
Jobs submitted: 1
Dry run. No jobs were actually submitted.
[mjs5kd@udc-ba36-36 paqc](master): c /project/shefflab/processed/paqc/submission/convert_GSM4289908.sub
#!/bin/bash
#SBATCH --job-name='convert_GSM4289908'
#SBATCH --output='/project/shefflab/processed/paqc/submission/convert_GSM4289908.log'
#SBATCH --mem='8000'
#SBATCH --cpus-per-task='1'
#SBATCH --time='00-04:00:00'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="sra_convert.py --srr /project/shefflab/data/sra/SRR10988638.sra "

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"

[mjs5kd@udc-ba36-36 paqc](master): c ${CODE}/sra_convert/pipeline_interface_convert.yaml
protocol_mapping:
  "*": convert

pipelines:
  convert:
    name: convert
    path: sra_convert.py
    # required_input_files: SRR_files
    arguments:
      "--srr": SRR_files
    command_template: >
      {pipeline.path} --srr {sample.SRR_files}
    compute:
      bulker_crate: databio/sra_convert
      size_dependent_variables: resources.tsv
[mjs5kd@udc-ba36-36 paqc](master): c ${CODE}/sra_convert/resources.tsv 
max_file_size	cores	mem	time
NaN	1	8000	00-04:00:00
0.05	2	12000	00-08:00:00
0.5	4	16000	00-12:00:00
1	8	16000	00-24:00:00
10	16	32000	02-00:00:00
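
For anyone following along, here is a hedged end-to-end sketch of how one row of `resources.tsv` could flow through the adapters into the `#SBATCH` lines. It is not looper's or divvy's code. In particular it does not model how looper chooses a row: the first data row is used only because its values (1 core, 8000 mem, 00-04:00:00) are the ones that appear in the generated script above.

```python
import csv
import io

# First two data rows of the resources.tsv shown above
RESOURCES_TSV = (
    "max_file_size\tcores\tmem\ttime\n"
    "NaN\t1\t8000\t00-04:00:00\n"
    "0.05\t2\t12000\t00-08:00:00\n"
)

rows = list(csv.DictReader(io.StringIO(RESOURCES_TSV), delimiter="\t"))

# Assumption: the chosen row becomes the "compute" namespace.
# Row selection itself is not modeled; the first row is picked by hand.
compute = {k: v for k, v in rows[0].items() if k != "max_file_size"}
namespaces = {"compute": compute}

# Uppercase adapter keys, as in the working config
adapters = {
    "CORES": "compute.cores",
    "MEM": "compute.mem",
    "TIME": "compute.time",
}

variables = {}
for name, source in adapters.items():
    namespace, key = source.split(".", 1)
    variables[name] = namespaces[namespace][key]

template = (
    "#SBATCH --mem='{MEM}'\n"
    "#SBATCH --cpus-per-task='{CORES}'\n"
    "#SBATCH --time='{TIME}'"
)
script = template.format(**variables)
print(script)
```

If the compute section of the pipeline interface supplies no `cores`, `mem`, or `time` values, the adapters have nothing to map, which would be consistent with the `{MEM}`, `{CORES}`, and `{TIME}` placeholders staying unfilled earlier in this thread.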


@nsheff nsheff commented Mar 19, 2020

perfect -- can you push those changes to sra_convert ?

I got mixed up between the adapter changes and the compute changes :)

nsheff added a commit to pepkit/sra_convert that referenced this issue Mar 19, 2020

@nsheff nsheff commented Mar 19, 2020

nm I got it. works! thanks.

stolarczyk added a commit that referenced this issue May 3, 2020