WDL imports #43

Merged
merged 47 commits into from
Dec 28, 2017

@ekoltsova (Contributor) commented Nov 22, 2017

General idea

Currently, pipeline-builder has no support for import statements (https://software.broadinstitute.org/wdl/documentation/spec#import-statements).

This PR resolves this problem.

Changes

Added support for sub workflows and import statements:

  • Relative imports, with or without an alias, such as import "foo.wdl" as Foo and import "bar.wdl"
    • In this case, the imported files can be provided via a .zip archive or resolved against a baseURI (baseURI + fileName)
  • Imports over the http:// and https:// protocols (such as import "http://foo/bar.wdl" as Bar)
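
For illustration only, a minimal sketch of how an import path could be mapped to its source under the rules listed above. This is not the code from this PR: the function name resolveImport, the zipFiles map, the returned object shape, and the choice to check the .zip archive before the baseURI are all assumptions made for the example.

```javascript
// Hypothetical helper: decide where the text of an imported WDL file comes from.
// `zipFiles` is assumed to be a plain object mapping file names to file contents.
function resolveImport(importPath, { zipFiles = {}, baseURI = '' } = {}) {
  // Absolute http:// and https:// imports are used as-is.
  if (/^https?:\/\//i.test(importPath)) {
    return { source: 'url', location: importPath };
  }
  // Assumption: relative imports are looked up in the .zip archive first...
  if (Object.prototype.hasOwnProperty.call(zipFiles, importPath)) {
    return { source: 'zip', location: importPath };
  }
  // ...and otherwise resolved as baseURI + fileName.
  if (baseURI) {
    const base = baseURI.endsWith('/') ? baseURI : `${baseURI}/`;
    return { source: 'url', location: base + importPath };
  }
  throw new Error(`Cannot resolve import "${importPath}": no .zip entry and no baseURI`);
}
```

With the baseURI from the 2nd scenario below, resolveImport('cnv_common_tasks.wdl', { baseURI }) would yield the same URL that the 3rd scenario writes directly in the import statement.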

Added tests

Acceptance criteria

1st scenario:

  • Run gulp serve to launch the demo
  • Type .wdl script in textarea
  • Click Load zip button and choose .zip file containing .wdl import files

Example:

WDL script with import statements:

import "tasks.wdl"
import "sub_workflow.wdl" as SubWorkflow

workflow RootWorkflow {
    File? wfInput
    File? wfInputTwo
    File? wfInputThree

    call tasks.TaskOne {
        input:
            taskInput = wfInput
    }

    call SubWorkflow.Workflow {
        input:
            wf_input = TaskOne.task_output,
            wf_input_two = wfInput
    }

    call SubWorkflow.Workflow as WorkflowAliasOne {
        input:
            wf_input = TaskOne.task_output,
            wf_input_two = wfInputTwo
    }

    call SubWorkflow.Workflow as WorkflowAliasTwo {
        input:
            wf_input = TaskOne.task_output,
            wf_input_two = wfInputThree
    }

    output {
        String output_1 = WorkflowAliasTwo.output_string
        String? output_2 = Workflow.output_string
        String? output_3 = WorkflowAliasOne.output_string
    }
}

.zip archive containing files for imports:
importsTestArchive.zip

2nd scenario:

  • Run gulp serve to launch the demo
  • Type .wdl script in textarea
  • Type baseURI in corresponding input
  • Click Build button

Example:

WDL script with import statements:

import "cnv_common_tasks.wdl" as CNVTasks

workflow CNVSomaticPairWorkflow {
    # Workflow input files
    File? targets
    String gatk_jar

    # If no target file is input, then do WGS workflow
    Boolean is_wgs = select_first([targets, ""]) == ""
 
    # docker images
    String gatk_docker
 
    if (!is_wgs) {
        call CNVTasks.PadTargets {
            input:
                targets = targets,
                gatk_jar = gatk_jar,
                gatk_docker = gatk_docker
        }
    }
 
    output {
        String tumor_entity_id = PadTargets.padded_targets
    }
}

baseURI:
https://raw.githubusercontent.com/broadinstitute/gatk/master/scripts/cnv_wdl

3rd scenario:

  • Run gulp serve to launch the demo
  • Type a .wdl script whose import statement contains an http:// or https:// URI into the textarea
  • Click Build button

Example:

WDL script with import statements:

import "https://raw.githubusercontent.com/broadinstitute/gatk/master/scripts/cnv_wdl/cnv_common_tasks.wdl" as CNVTasks

workflow CNVSomaticPairWorkflow {
    # Workflow input files
    File? targets
    String gatk_jar

    # If no target file is input, then do WGS workflow
    Boolean is_wgs = select_first([targets, ""]) == ""
 
    # docker images
    String gatk_docker
 
    if (!is_wgs) {
        call CNVTasks.PadTargets {
            input:
                targets = targets,
                gatk_jar = gatk_jar,
                gatk_docker = gatk_docker
        }
    }
 
    output {
        String tumor_entity_id = PadTargets.padded_targets
    }
}

The visualization result for the first example should look like:
image

The visualization result for the last two examples should look like:
image

4th scenario:

  • Run gulp serve to launch the demo
  • Type .wdl script in textarea
  • Set the sub workflow expansion level (Depth of recursion; default is 0) and specify which sub workflows to expand to that level (SubWorkflow to be detailed; type the sub workflow names as they appear on the graph, comma-separated, or * to expand all sub workflows)
  • Click Load zip button and choose .zip file containing .wdl import files
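
For illustration only, a minimal sketch of how the two parameters from the steps above could be interpreted. The helper names parseDetailed and shouldExpand are hypothetical, and the convention that sub workflows called directly from the root workflow sit at level 1 is an assumption, not something taken from the PR code.

```javascript
// Hypothetical helper: turn the "SubWorkflow to be detailed" input into a list of names.
function parseDetailed(detailed) {
  return detailed
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Hypothetical helper: decide whether the sub workflow `name`, found at nesting
// `level` (assumed to be 1 for calls made directly from the root workflow),
// should be drawn expanded for the given "Depth of recursion".
function shouldExpand(name, level, depth, detailed) {
  if (level > depth) {
    return false;
  }
  const names = parseDetailed(detailed);
  return names.includes('*') || names.includes(name);
}
```

Under these assumptions, the default depth of 0 expands nothing, while a depth of 1 with * expands every sub workflow called directly from the root workflow.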

Example:

WDL script with import statements:

import "tasks.wdl"
import "sub_workflow.wdl" as SubWorkflow
import "sub_workflow2.wdl" as SubWorkflow2

workflow RootWorkflow {
    File? wfInput
    File? wfInputTwo
    File? wfInputThree

    call tasks.TaskOne {
        input:
            taskInput = wfInput
    }

    call SubWorkflow2.SubWorkflow as wf2 {
        input:
            wf_input = wfInput
    }

    call SubWorkflow.Workflow {
        input:
            wf_input = TaskOne.task_output,
            wf_input_two = wfInput
    }

    call SubWorkflow.Workflow as WorkflowAliasOne {
        input:
            wf_input = TaskOne.task_output,
            wf_input_two = wfInputTwo
    }

    call SubWorkflow.Workflow as WorkflowAliasTwo {
        input:
            wf_input = TaskOne.task_output,
            wf_input_two = wfInputThree
    }

    output {
        String output_1 = WorkflowAliasTwo.output_string
        String? output_2 = Workflow.output_string
        String? output_3 = WorkflowAliasOne.output_string
    }
}

.zip archive containing files for imports:
importsTestArchive2.zip

Visualization result:

Specified parameters:
SubWorkflow to be detailed - *
Depth of recursion - 1
image

Specified parameters:
SubWorkflow to be detailed - SubWorkflow2_SubWorkflow
Depth of recursion - 1
image

Example from the Broad Institute, run using the 1st scenario:

WDL script:

# Workflow for running GATK CNV (and optionally, ACNV) on tumor/normal or tumor-only cases. Supports both WGS and WES.
#
# Notes:
#
# - The target file (targets) is required for the WES workflow and should be a TSV file with the column headers:
#    contig    start    stop    name
#   These targets will be padded on both sides by the amount specified by PadTargets.padding (default 250).
#
# - If a target file is not provided, then the WGS workflow will be run instead and the specified value of
#   wgs_bin_length (default 1000) will be used.
#
# - A normal BAM (normal_bam) is required for the tumor/normal workflow.  If not provided, the tumor-only workflow
#   will be run.
#
# - The sites file (common_sites) is required for the ACNV workflow and should be a Picard interval list.
#   If not provided, the ACNV workflow will not be run.
#
# - Example invocation:
#    java -jar cromwell.jar run cnv_somatic_pair_workflow.wdl myParameters.json
#   See cnv_somatic_pair_workflow_template.json for a template json file to modify with your own parameters (please save
#   your modified version with a different filename and do not commit to the gatk repository).
#
#############

import "cnv_common_tasks.wdl" as CNVTasks
import "cnv_somatic_copy_ratio_bam_workflow.wdl" as CopyRatio
import "cnv_somatic_allele_fraction_pair_workflow.wdl" as AlleleFraction
import "cnv_somatic_oncotate.wdl" as Oncotate

workflow CNVSomaticPairWorkflow {
    # Workflow input files
    File? targets
    File? common_sites
    File tumor_bam
    File tumor_bam_idx
    File? normal_bam
    File? normal_bam_idx
    File ref_fasta
    File ref_fasta_dict
    File ref_fasta_fai
    File cnv_panel_of_normals
    String gatk_jar

    # If no target file is input, then do WGS workflow
    Boolean is_wgs = select_first([targets, ""]) == ""
    # If no sites file is input, then do not do ACNV workflow
    Boolean is_cnv_only = select_first([common_sites, ""]) == ""
    # If no normal BAM is input, then do tumor-only workflow
    Boolean is_tumor_only = select_first([normal_bam, ""]) == ""

    Boolean is_run_oncotator = false

    # docker images
    String gatk_docker
    String oncotator_docker="broadinstitute/oncotator:1.9.3.0-eval-gatk-protected"

    if (!is_wgs) {
        call CNVTasks.PadTargets {
            input:
                # The task will fail if targets is not defined when it gets here, but that should not be allowed to happen.
                targets = select_first([targets, ""]),
                gatk_jar = gatk_jar,
                gatk_docker = gatk_docker
        }
    }

    call CopyRatio.CNVSomaticCopyRatioBAMWorkflow as TumorCopyRatioWorkflow {
        input:
            padded_targets = PadTargets.padded_targets,
            bam = tumor_bam,
            bam_idx = tumor_bam_idx,
            ref_fasta = ref_fasta,
            ref_fasta_dict = ref_fasta_dict,
            ref_fasta_fai = ref_fasta_fai,
            cnv_panel_of_normals = cnv_panel_of_normals,
            gatk_jar = gatk_jar,
            gatk_docker = gatk_docker
    }

    if (!is_tumor_only) {
        call CopyRatio.CNVSomaticCopyRatioBAMWorkflow as NormalCopyRatioWorkflow {
            input:
                padded_targets = PadTargets.padded_targets,
                bam = select_first([normal_bam, ""]),
                bam_idx = select_first([normal_bam_idx, ""]),
                ref_fasta = ref_fasta,
                ref_fasta_dict = ref_fasta_dict,
                ref_fasta_fai = ref_fasta_fai,
                cnv_panel_of_normals = cnv_panel_of_normals,
                gatk_jar = gatk_jar,
                gatk_docker = gatk_docker
        }
    }

    if (!is_cnv_only) {
        call AlleleFraction.CNVSomaticAlleleFractionPairWorkflow as TumorAlleleFractionWorkflow {
            input:
                common_sites = select_first([common_sites, ""]),
                tumor_bam = tumor_bam,
                tumor_bam_idx = tumor_bam_idx,
                normal_bam = normal_bam,    # If no normal BAM is input, tumor-only GetBayesianHetCoverage will be run
                normal_bam_idx = normal_bam_idx,
                tumor_tn_coverage = TumorCopyRatioWorkflow.tn_coverage,
                tumor_called_segments = TumorCopyRatioWorkflow.called_segments,
                ref_fasta = ref_fasta,
                ref_fasta_dict = ref_fasta_dict,
                ref_fasta_fai = ref_fasta_fai,
                gatk_jar = gatk_jar,
                gatk_docker = gatk_docker,
                is_wgs = is_wgs
        }
    }

    if (is_run_oncotator) {
        call Oncotate.CNVOncotateCalledSegments as OncotateCalledCNVWorkflow {
            input:
                 called_file=TumorCopyRatioWorkflow.called_segments,
                 oncotator_docker=oncotator_docker
        }
    }

    output {
        String tumor_entity_id = TumorCopyRatioWorkflow.entity_id
        File tumor_tn_coverage = TumorCopyRatioWorkflow.tn_coverage
        File tumor_called_segments = TumorCopyRatioWorkflow.called_segments
        String? normal_entity_id = NormalCopyRatioWorkflow.entity_id
        File? normal_tn_coverage = NormalCopyRatioWorkflow.tn_coverage
        File? normal_called_segments = NormalCopyRatioWorkflow.called_segments
        File? tumor_hets = TumorAlleleFractionWorkflow.tumor_hets
        File? tumor_acnv_segments = TumorAlleleFractionWorkflow.acnv_segments
        File? oncotated_called_file = OncotateCalledCNVWorkflow.oncotated_called_file
    }
}

.zip archive containing files for imports:
broad_cnvSomaticLegacy.zip

Visualization result:

Specified parameters:
SubWorkflow to be detailed - *
Depth of recursion - 1
(Expanding all sub workflows)
image

Specified parameters:
SubWorkflow to be detailed - AlleleFraction_CNVSomaticAlleleFractionPairWorkflow
Depth of recursion - 1
(Expanding only 1 sub workflow)
image

ekoltsova and others added 28 commits November 13, 2017 11:57
…ols for imports. Exceptions added: 'file://' protocol for imports; generation from graph with imports
@coveralls

Coverage Status

Coverage decreased (-1.3%) to 95.104% when pulling 951b3eb on import_wdl into ac1cf43 on dev.

@sidoruka
Contributor

sidoruka commented Nov 23, 2017

@ekoltsova, @TimSPb89, thanks for your work!
Several things to improve:

  1. As we can see, coverage is down by 1.3%; please fix that
  2. Provide a sample WDL with imports and all necessary instructions for loading it, including an expected visualization (this can be added to the PR description)
  3. Update the documentation on this feature, where appropriate

@coveralls

Coverage Status

Coverage increased (+0.2%) to 96.534% when pulling 71113aa on import_wdl into ac1cf43 on dev.

@sidoruka
Contributor

That looks great to me as a starting point. Thank you!

@daniilsavchuk, could you please review the code once you have time?

@daniilsavchuk
Collaborator

daniilsavchuk commented Nov 23, 2017

Dear @sidoruka and colleagues,

It looks like the PR is too big to be reviewed by me alone. May I add @paulsmirnov to the reviewers?
I guess that checking 32 commits with 26 changed files will take a while, and there are also visual issues to be discussed.

Thank you in advance!

@sidoruka
Contributor

sidoruka commented Nov 24, 2017

@daniilsavchuk, in fact I've counted 6 files that are changed in some way (not counting tests and configuration changes). But if you are busy now, that's fine; I'll ask someone else to perform the review.

@sidoruka sidoruka removed the request for review from daniilsavchuk November 24, 2017 12:02
@daniilsavchuk
Collaborator

I reviewed the part of the changes in the files I am responsible for. The code looks quite good, but there are some minor issues (see the comments on the file changes).

Member

@paulsmirnov paulsmirnov left a comment


Thank you for your valuable efforts to enhance PB's support of WDL syntax! Import statements are quite important. However, I have a few inline comments on the changed files. Please consider revising the changes.

src/app.js Outdated
flow1 = res.model[0];
diagram.attachTo(flow1);
} else {
throw new Error(res.message, 'wdl parsing error');
Member

  1. I'm not quite sure why you throw here and where you catch this exception. Once you migrate to a promise-based solution, you should handle errors differently.
  2. The second argument to the Error constructor is a (non-standard) file name, not part of the message. Shouldn't you remove the erroneous argument or use a plus (+) instead of a comma? I think it's an old bug, but I've only just noticed it.
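
For illustration only, a minimal sketch of the fix suggested in point 2: the prefix is joined into the message instead of being passed as a second argument. The helper name parsingError and the exact message format are assumptions, not the code that was eventually committed.

```javascript
// Hypothetical helper that builds the error for a failed parse result.
function parsingError(res) {
  // Join the prefix and the parser message into one string;
  // a second positional string argument would not become part of error.message.
  return new Error(`wdl parsing error: ${res.message}`);
}
```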

Contributor Author

Fixed


return `${res}${EOL}`;
}

Member

The method genWorkflow() completely duplicates genCall(). Please refactor this.

Contributor Author

Generation feature is not part of this PR.

@@ -0,0 +1,22 @@
export default function $http(method, url, data) {
Member

The name of a file and its default export should match and describe their purpose. Please decide if it's a single method (then rename both) or a module (then don't use a default export and rename the method properly).

Contributor Author

Fixed

@@ -27,6 +28,7 @@ export default class Context {
* @param {ast} ast - Root ast tree node of parsing result
*/
buildWorkflowList(ast) {
if (ast.attributes.imports && ast.attributes.imports.list.length) this.hasImports = true;
Member

Shouldn't we use a single statement per line for easier debugging and comprehension?

Contributor Author

Fixed

src/app.js Outdated
} else {
throw new Error(res.message, 'wdl parsing error');
}
});
Member

It seems that this block duplicates the one above.

Contributor Author

After the error handling changes, this is no longer relevant.


// check if calls're already in existing tasks
calls = calls.filter(call => !tasks.includes(call));
if (!calls.length) return result;
Member

Let's stick to using one statement per line.

Contributor Author

Fixed

ast = ret.ast;
const ret = hermesStage(data);
result.status = ret.status;
result.message = ret.message;
Member

We should consider revising the approach to returning results, since we are changing the parse() method to an async promise-based implementation. Errors should be thrown so that they can be caught in a .catch() clause.
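
For illustration only, a minimal sketch of the approach described in this comment: a thrown error becomes a rejected promise instead of a status/message pair on the result. The names hermesStageStub and parseSketch are hypothetical stand-ins, and the empty-input check is an invented failure condition used only to trigger an error.

```javascript
// Hypothetical stand-in for the synchronous parsing stage.
function hermesStageStub(text) {
  if (text.trim() === '') {
    throw new Error('WDL source is empty');
  }
  return { ast: { source: text } };
}

// An async function turns any error thrown inside it into a rejected promise,
// so callers handle failures in .catch() instead of inspecting result.status.
async function parseSketch(text) {
  const { ast } = hermesStageStub(text);
  return { model: [ast] };
}
```

A caller would then write parseSketch(text).then(useModel).catch(reportError), where useModel and reportError are placeholders for application code.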

Contributor Author

Fixed.

parseWDL(text, opts).then((data) => {
resolve(data);
});
});
Member

  • Am I right in saying that with this implementation we are unable to catch exceptions raised in the parseWDL() function?
  • I'm not sure that we need a new Promise() here, since we already have one from the parseWDL() function.

Please test thoroughly if a user can successfully intercept all exceptions by artificially generating them in different parts of code, and then adjust the code correspondingly.
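
For illustration only, a minimal sketch of the concern raised in this comment. parseWDLStub is a hypothetical stand-in for parseWDL() that rejects on empty input; parseWrapped mirrors the shape of the reviewed code and parseDirect shows the simpler alternative.

```javascript
// Hypothetical stand-in for parseWDL(): rejects when the text is empty.
function parseWDLStub(text) {
  return text
    ? Promise.resolve({ model: [text] })
    : Promise.reject(new Error('empty WDL'));
}

// Wrapped version: a rejection from parseWDLStub() is never forwarded to the
// outer promise, so the outer promise stays pending and a caller's .catch()
// never fires (the error only surfaces as an unhandled rejection).
function parseWrapped(text) {
  return new Promise((resolve) => {
    parseWDLStub(text).then((data) => {
      resolve(data);
    });
  });
}

// Direct version: returning the inner promise propagates results and errors alike.
function parseDirect(text) {
  return parseWDLStub(text);
}
```

With parseDirect(''), a caller's .catch() receives the 'empty WDL' error, which is the behavior the comment asks to be verified.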

Contributor Author

Fixed

@@ -13,11 +13,15 @@ export default class VisualWorkflow extends VisualGroup {
super(_.defaultsDeep(opts, {
attrs: {
'.label': {
text: opts.step.type,
text: `${opts.step.type} ${opts.step.name}`,
Member

Is it really necessary to see a step type on a diagram?

Contributor Author

There can be multiple subWorkflows in real WDL examples, so it's better to show the name, to tell one subWorkflow from another, and to show its type, to tell a subWorkflow visualisation from a task visualisation.

})).to.throw(Error);
return parse(src, { format: 'cwl' }).catch((e) => {
expect(e).to.be.an('Error');
});
Member

Unfortunately, the logic of the assertion has changed and it tests almost nothing now. If parse() completes successfully, the test will succeed (but it should fail). Consider using the chai-as-promised plugin for this and other tests, to test the promise-based interface in a more intuitive manner.
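
For illustration only, a minimal sketch of an assertion that fails when the promise is fulfilled. It uses plain promises rather than the chai-as-promised plugin suggested above, and the helper name expectRejection is hypothetical. With chai-as-promised installed and registered, the equivalent check would typically read expect(promise).to.be.rejectedWith(Error).

```javascript
// Hypothetical helper: the returned promise is fulfilled only if `promise`
// rejects with an instance of `ErrorType`; in every other case it rejects.
function expectRejection(promise, ErrorType = Error) {
  return promise.then(
    () => {
      // Reaching this branch means the promise was fulfilled, which must fail the test.
      throw new Error('Expected the promise to be rejected, but it was fulfilled');
    },
    (e) => {
      if (!(e instanceof ErrorType)) {
        throw new Error(`Expected ${ErrorType.name}, got ${e}`);
      }
      return e;
    }
  );
}
```

In a test, the result must be returned to the test runner, for example return expectRejection(parse(src, { format: 'cwl' })), so that a fulfilled parse() fails the test.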

Contributor Author

Fixed

return pipeline.parse(wdl, { format: 'wdl' }).model[0];
async function createFlow() {
const res = await pipeline.parse(wdl);
return res.model[0];
Collaborator

@paulsmirnov what do you think about compatibility with all supported browsers?

Member

Thanks. I forgot to check whether polyfills are included in the demo app build. I guess they are not, neither for the Promise class nor for Array/Object methods. We should fix this too, then. Of course, the library itself should not include polyfills, as that is a concern of the final application. We could carefully use babel-polyfill and adjust the docs to let users know which steps to take.
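
For illustration only, a minimal sketch of a check that a final application could run to see whether it needs polyfills before loading the library. The helper name missingFeatures is hypothetical, and the list of checked features is an assumption: the comment above names only the Promise class and Array/Object methods in general.

```javascript
// Hypothetical check: list the features that `env` (for example, the browser's
// window object) does not provide natively.
function missingFeatures(env) {
  const required = [
    ['Promise', env.Promise],
    // Assumed examples of the "Array/Object methods" mentioned in the comment.
    ['Object.assign', env.Object && env.Object.assign],
    ['Array.prototype.includes', env.Array && env.Array.prototype.includes],
  ];
  return required
    .filter(([, impl]) => typeof impl !== 'function')
    .map(([name]) => name);
}
```

An application could load a polyfill bundle only when the returned list is not empty.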

@@ -238,6 +241,46 @@ export default class WorkflowGenerator {
return res;
}

genWorkflow(child) {
Collaborator

Could you please explain what genWorkflow does? It looks like the genCall function.

Contributor Author

Generation feature is not part of this PR.

@@ -25,7 +27,11 @@ export default function generate(objectModel) {
actionSelector(objectModel.children);

_.forEach(actionsToBeRendered, (action) => {
tasks += new TaskGenerator(action).renderTask();
if (!!action.type && action.type.toLowerCase() === 'workflow') {
Collaborator

What is the generation result?
Users expect that the WDL script will not differ from the original (apart from spacing) if nothing was changed on the diagram, even after multiple generations.
If so, I would expect the function that generates strings like " "smthng"" to be implemented as a separate WDL script item class, like WorkflowGenerator or TaskGenerator, rather than injected into an existing one.

Contributor Author

Generation feature is not part of this PR.

@@ -10,6 +10,9 @@ import generateWDL from './WDL/generate';
* @returns {string} Textual representation of the workflow.
*/
function generate(flow, opts = {}) {
if (flow.hasImports) {
Collaborator

Does it mean that the declared "importing" feature is in fact not supported?

Contributor Author

Generation feature is not part of this PR.

}

/** Parse function with WDL imports support */
async function importParsingStage(firstAst, opts) {
Collaborator

In my opinion, this function is a copy of the existing workflow parser's code. It would be better to reuse the existing parser and patch it to support multilevel name accessors like ".".
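
For illustration only, one possible reading of "multilevel name accessors": splitting a call target such as SubWorkflow.Workflow into an import namespace and a task or workflow name. The helper name splitCallTarget and the returned shape are hypothetical, and this sketch is not necessarily what the reviewer had in mind.

```javascript
// Hypothetical helper: split a call target such as "SubWorkflow.Workflow"
// into the import namespace and the task/workflow name.
function splitCallTarget(target) {
  const idx = target.lastIndexOf('.');
  if (idx === -1) {
    // No namespace: the target is defined in the current document.
    return { namespace: null, name: target };
  }
  return { namespace: target.slice(0, idx), name: target.slice(idx + 1) };
}
```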

Contributor Author

We need you to explain your idea in more detail.

@@ -13,11 +13,15 @@ export default class VisualWorkflow extends VisualGroup {
super(_.defaultsDeep(opts, {
attrs: {
'.label': {
text: opts.step.type,
text: `${opts.step.type} ${opts.step.name}`,
Collaborator

Has this solution been approved by the final user? It breaks our initial ideology of keeping just names for steps and just types for groups.

Contributor Author

Answered above

@coveralls

Coverage Status

Coverage increased (+0.6%) to 96.982% when pulling 9ab68fe on import_wdl into ac1cf43 on dev.

@ekoltsova
Contributor Author

Added sub workflow expansion, with new examples added to the PR body, and updated the code according to the reviews.

@coveralls

Coverage Status

Coverage increased (+0.7%) to 97.1% when pulling 076c3d9 on import_wdl into ac1cf43 on dev.

@sidoruka sidoruka merged commit baef7b2 into dev Dec 28, 2017
@sidoruka sidoruka deleted the import_wdl branch December 28, 2017 14:38
6 participants