Project.get_outputs() functionality #165

nsheff · 2019-04-16T21:02:18Z

Caravel would like a list of outputs produced by a pipeline.

This information is stored in the pipeline_interface, which looper.Project contains. The pipeline_interface could encode outputs using syntax like this:

pipelines:
  pepatac.py:
    name: PEPATAC
    path: pipelines/pepatac.py
    looper_args: True
    arguments:
      "--sample-name": sample_name
    optional_arguments:
      "--input2": read2
    outputs:
      smooth_bw: "aligned_{sample.genome}/{sample.name}_smooth.bw"
      pre_smooth_bw: "aligned_{project.prealignments}/{sample.name}_smooth.bw"
    compute:
      singularity_image: ${SIMAGES}pepatac
    summarizers:
      - tools/PEPATAC_summarizer.R
    summary_results:
      - alignment_percent_file:
        caption: "Alignment percent file"
        description: "Plots percent of total alignment to all pre-alignments and primary genome."
        thumbnail_path: "summary/{name}_alignmentPercent.png"
        path: "summary/{name}_alignmentPercent.pdf"
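
The values under `outputs` read as Python format strings whose fields refer to attributes of a `sample` or `project` object. As a minimal sketch, the standard library's `string.Formatter` can list which attributes a template needs before it is populated. The helper name `template_fields` is hypothetical and is not part of looper.

```python
from string import Formatter


def template_fields(template):
    """Return the replacement-field names used in a format-string template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None when a chunk is literal text only.
    return [field for _, field, _, _ in Formatter().parse(template) if field]


fields = template_fields("aligned_{sample.genome}/{sample.name}_smooth.bw")
print(fields)  # ['sample.genome', 'sample.name']
```

A check like this could flag templates that reference anything other than `sample.*` or `project.*` before any path is built.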

The get_outputs function should return a nested dict:

{
  pipeline: {
    output_name: {
      path: output_path,
      samples: [sample_key1, sample_key2, ...]
    }
  }
}

For the PEPATAC pipeline above, that would be:

{
  PEPATAC: {
    smooth_bw: {
      path: "aligned_{sample.genome}/{sample.name}_smooth.bw",
      samples: [sample_key1, sample_key2, ...]
    }
  }
}

This best preserves the structure of the outputs, which need not have unique names across pipelines.

The Project object will need to look at each PipelineInterface it holds, see if it provides any outputs, and then identify any samples that would run that pipeline.
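
The steps just described can be sketched with plain dicts. This is only an illustration under stated assumptions, not looper's implementation: `pipeline_interfaces` is assumed to be a list of dicts shaped like the YAML above, and `sample_pipelines` is a hypothetical mapping from sample key to the pipeline keys that sample would run (in looper that relationship would come from the Project itself).

```python
def get_outputs(pipeline_interfaces, sample_pipelines):
    """Collect declared outputs per pipeline, with the samples that would run it.

    pipeline_interfaces: list of dicts shaped like the YAML above,
        i.e. {"pipelines": {key: {"name": ..., "outputs": {...}}}}.
    sample_pipelines: hypothetical mapping of sample key -> pipeline keys.
    """
    result = {}
    for interface in pipeline_interfaces:
        for key, pipeline in interface.get("pipelines", {}).items():
            outputs = pipeline.get("outputs")
            if not outputs:
                continue  # this pipeline declares no outputs
            # Samples that would run this pipeline.
            samples = [s for s, keys in sample_pipelines.items() if key in keys]
            # Prefer the declared pipeline name; fall back to its key.
            name = pipeline.get("name", key)
            result[name] = {
                out_name: {"path": path, "samples": list(samples)}
                for out_name, path in outputs.items()
            }
    return result


interface = {
    "pipelines": {
        "pepatac.py": {
            "name": "PEPATAC",
            "outputs": {
                "smooth_bw": "aligned_{sample.genome}/{sample.name}_smooth.bw",
            },
        },
        "other.py": {"name": "OTHER"},  # no outputs section
    }
}
sample_pipelines = {
    "sample1": ["pepatac.py"],
    "sample2": ["pepatac.py", "other.py"],
    "sample3": ["other.py"],
}
outputs = get_outputs([interface], sample_pipelines)
print(outputs)
```

Keying the result by pipeline first is what lets two pipelines both declare an output called `smooth_bw` without colliding.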


nsheff · 2019-04-16T21:07:07Z

Related to other issues dealing with the pipeline_interface structure:

#61

#32

#5

nsheff · 2019-04-18T12:22:56Z

I wrote a function that takes this output and populates the actual paths... maybe this should just belong on the project object as well?

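    populated_outputs = {}
    # Populate path variables for every sample of every output.
    for pipeline_name, pipeline_outputs in project_outputs.items():
        populated_outputs[pipeline_name] = {}
        for output_name, output_info in pipeline_outputs.items():
            populated_outputs[pipeline_name][output_name] = {}
            # Prefix the output template with the sample's results folder
            # (assumes output paths are relative to that folder).
            template = "/".join([
                "{base_url}data/{project.metadata.results_subdir}/{sample.name}",
                output_info.path])
            for sample in output_info.samples:
                populated_output = template.format(
                    sample=globs.p.get_sample(sample),
                    base_url=request.url_root,
                    project=globs.p)
                # Store the resolved path under the sample key.
                populated_outputs[pipeline_name][output_name][sample] = populated_output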
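
If this logic moved onto the project object, it would no longer need the web request or the `globs` module. Here is a minimal standalone sketch under that assumption: `populate_outputs` and its parameters are hypothetical names, `SimpleNamespace` objects stand in for real samples and the project, and output paths are assumed to be relative to each sample's results folder.

```python
from types import SimpleNamespace


def populate_outputs(project_outputs, project, samples, base_url=""):
    """Expand each output's path template once per sample.

    project_outputs: nested dict in the shape proposed for get_outputs.
    samples: mapping of sample key -> object with attributes (name, genome, ...).
    """
    prefix = "{base_url}data/{project.metadata.results_subdir}/{sample.name}/"
    populated = {}
    for pipeline_name, pipeline_outputs in project_outputs.items():
        populated[pipeline_name] = {}
        for output_name, info in pipeline_outputs.items():
            template = prefix + info["path"]
            # One resolved path per sample that produces this output.
            populated[pipeline_name][output_name] = {
                key: template.format(
                    sample=samples[key], project=project, base_url=base_url)
                for key in info["samples"]
            }
    return populated


project = SimpleNamespace(
    metadata=SimpleNamespace(results_subdir="results_pipeline"))
samples = {"s1": SimpleNamespace(name="s1", genome="hg38")}
outputs = {"PEPATAC": {"smooth_bw": {
    "path": "aligned_{sample.genome}/{sample.name}_smooth.bw",
    "samples": ["s1"]}}}
populated = populate_outputs(
    outputs, project, samples, base_url="http://localhost/")
print(populated["PEPATAC"]["smooth_bw"]["s1"])
# http://localhost/data/results_pipeline/s1/aligned_hg38/s1_smooth.bw
```

Passing `base_url` as an argument keeps the function usable both from a web front end such as Caravel and from plain scripts.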

nsheff added this to the 0.12 milestone Apr 16, 2019

nsheff assigned vreuter Apr 16, 2019

nsheff mentioned this issue Apr 16, 2019

a platform-agnostic way to add sample attributes #94

Closed

vreuter mentioned this issue Apr 25, 2019

Outputs, and extras #175

Merged

vreuter added the likely-solved label Apr 26, 2019

stolarczyk closed this as completed Feb 10, 2020
