Project.get_outputs() functionality #165

Closed
nsheff opened this issue Apr 16, 2019 · 2 comments

@nsheff
Contributor

nsheff commented Apr 16, 2019

Caravel would like a list of outputs produced by a pipeline.

This information is stored in the pipeline_interface, which looper.Project contains. The pipeline_interface could encode outputs using syntax like this:

pipelines:
  pepatac.py:
    name: PEPATAC
    path: pipelines/pepatac.py
    looper_args: True
    arguments:
      "--sample-name": sample_name
    optional_arguments:
      "--input2": read2
    outputs:
      smooth_bw: "aligned_{sample.genome}/{sample.name}_smooth.bw"
      pre_smooth_bw: "aligned_{project.prealignments}/{sample.name}_smooth.bw"
    compute:
      singularity_image: ${SIMAGES}pepatac
    summarizers:
      - tools/PEPATAC_summarizer.R
    summary_results:
      - alignment_percent_file:
        caption: "Alignment percent file"
        description: "Plots percent of total alignment to all pre-alignments and primary genome."
        thumbnail_path: "summary/{name}_alignmentPercent.png"
        path: "summary/{name}_alignmentPercent.pdf"
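
As a minimal sketch of how the proposed `outputs` templates could be resolved, the snippet below treats each value as a Python `str.format` template whose fields use attribute access on `sample` and `project`. The `SimpleNamespace` objects and their values (`frog_1`, `hg38`, `rCRSd`) are hypothetical stand-ins, not real looper objects.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for a looper sample and project; only the
# attributes referenced by the templates are defined.
sample = SimpleNamespace(name="frog_1", genome="hg38")
project = SimpleNamespace(prealignments="rCRSd")

# Output templates copied from the pipeline_interface example above.
templates = {
    "smooth_bw": "aligned_{sample.genome}/{sample.name}_smooth.bw",
    "pre_smooth_bw": "aligned_{project.prealignments}/{sample.name}_smooth.bw",
}

# str.format resolves dotted fields such as {sample.genome} by attribute lookup.
resolved = {
    key: template.format(sample=sample, project=project)
    for key, template in templates.items()
}
print(resolved)
```

Because the fields are plain attribute lookups, any object exposing `name`, `genome`, and so on would work in place of the stand-ins.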

The get_outputs function should return a nested dict:

{
  pipeline: {
    output_name: {
      path: output_path,
      samples: [sample_key1, sample_key2, ...]
    }
  }
}

For example:

{
  PEPATAC: {
    smooth_bw: {
      path: "aligned_{sample.genome}/{sample.name}_smooth.bw",
      samples: [sample_key1, sample_key2, ...]
    }
  }
}

This best preserves the structure of the outputs, which need not have unique names across pipelines.

The Project object will need to look at each PipelineInterface it holds, see if it provides any outputs, and then identify any samples that would run that pipeline.
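
The steps above can be sketched roughly as follows. This is not looper's implementation: `get_outputs` is written here as a free function, the interfaces are plain dicts shaped like the parsed YAML, and `samples_by_pipeline` is a hypothetical mapping standing in for however the Project decides which samples run which pipeline.

```python
def get_outputs(pipeline_interfaces, samples_by_pipeline):
    """Collect declared outputs per pipeline, with the samples that would run it.

    pipeline_interfaces: list of dicts shaped like the parsed YAML
        (each with a top-level "pipelines" mapping).
    samples_by_pipeline: hypothetical mapping from pipeline key
        (e.g. "pepatac.py") to the sample keys that would run it.
    """
    result = {}
    for iface in pipeline_interfaces:
        for pipeline_key, spec in iface.get("pipelines", {}).items():
            declared = spec.get("outputs")
            if not declared:
                continue  # this pipeline declares no outputs
            # Key the result by the pipeline's display name when it has one.
            name = spec.get("name", pipeline_key)
            samples = samples_by_pipeline.get(pipeline_key, [])
            for output_name, path_template in declared.items():
                result.setdefault(name, {})[output_name] = {
                    "path": path_template,
                    "samples": list(samples),
                }
    return result


iface = {
    "pipelines": {
        "pepatac.py": {
            "name": "PEPATAC",
            "outputs": {
                "smooth_bw": "aligned_{sample.genome}/{sample.name}_smooth.bw",
            },
        },
        "other.py": {"name": "OTHER"},  # no outputs, so it is skipped
    }
}
outputs = get_outputs([iface], {"pepatac.py": ["sample_key1", "sample_key2"]})
print(outputs)
```

Keying by pipeline first and output name second matches the nested structure proposed above, so two pipelines can both declare an output called `smooth_bw` without colliding.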

nsheff added this to the 0.12 milestone on Apr 16, 2019
@nsheff
Contributor Author

nsheff commented Apr 16, 2019

Related to other issues dealing with the pipeline_interface structure:

#61

#32

#5

@nsheff
Contributor Author

nsheff commented Apr 18, 2019

I wrote a function that takes this output and populates the actual paths... maybe this should just belong on the project object as well?

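    populated_outputs = {}
    # Populate the path variables for every output of every pipeline.
    for pipeline_name, pipeline_outputs in project_outputs.items():
        populated_outputs[pipeline_name] = {}
        for output_name, output_info in pipeline_outputs.items():
            populated_outputs[pipeline_name][output_name] = {}
            # Prefix the output's path template with the served results location.
            template = "/".join([
                "{base_url}data/{project.metadata.results_subdir}/{sample.name}",
                output_info["path"]])
            for sample in output_info["samples"]:
                # Store each populated path under its sample key.
                populated_outputs[pipeline_name][output_name][sample] = template.format(
                    sample=globs.p.get_sample(sample),
                    base_url=request.url_root,
                    project=globs.p)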

