GitHub - sevenbridges-openworkflows/Broad-Best-Practice-Data-pre-processing-CWL1.0-workflow: The Broad Best Practice Data Pre-processing Workflow 4.1.0.0 is used to prepare data for variant calling analysis

Description

The Broad Best Practice Data Pre-processing Workflow 4.1.0.0 is used to prepare data for variant calling analysis.

It can be divided into two major segments: alignment to reference genome and data cleanup operations that correct technical biases [1].

A list of all inputs and parameters with corresponding descriptions can be found at the bottom of this page.

Common Use Cases

The Broad Best Practice Data Pre-processing Workflow 4.1.0.0 is designed to operate on individual samples.
Resulting BAM files are ready for variant calling analysis and can be further processed by other Broad best practice pipelines, like Generic germline short variant per-sample calling workflow [2], Somatic CNVs workflow [3] and Somatic SNVs+Indel workflow [4].

Changes Introduced by Seven Bridges

This pipeline represents the CWL implementation of BROADs original WDL file available on github. Minor differences are introduced in order to successfully adapt to the Seven Bridges Platform. These differences are listed below:

SamToFastqAndBwaMem step is divided into elementary steps: SamToFastq and BWA Mem
SortAndFixTags is divided into elementary steps: SortSam and SetNmMdAndUqTags
Added SBG Lines to Interval List: this tool is used to adapt results obtained with CreateSequenceGroupingTSV for platform execution, more precisely for scattering.

Common Issues and Important Notes

The Broad Best Practice Data Pre-processing Workflow 4.1.0.0 expects unmapped BAM file format as the main input.
Input Alignments (--in_alignments) - provided unmapped BAM (uBAM) file should be in query-sorter order and all reads must have RG tags. Also, input uBAM files must pass validation by ValidateSamFile.
For each tool in the workflow, equivalent parameter settings to the one listed in the corresponding WDL file are set as defaults.

Performance Benchmarking

Since this CWL implementation is meant to be equivalent to GATKs original WDL, there are no additional optimization steps beside instance and storage definition. The c5.9xlarge AWS instance hint is used for WGS inputs and attached storage is set to 1.5TB. In the table given below one can find results of test runs for WGS and WES samples. All calculations are performed with reference files corresponding to assembly 38.

Cost can be significantly reduced by spot instance usage. Visit knowledge center for more details.

Input Size	Experimental Strategy	Coverage	Duration	Cost (spot)	AWS Instance Type
6.6 GiB	WES	70	1h 9min	$2.02	c5.9
3.4 GiB	WES	40	39min	$1.15	c5.9
2.1 GiB	WES	20	28min	$0.82	c5.9
0.7 GiB	WES	10	14min	$0.42	c5.9
116 GiB	WGS (HG001)	50	1day 1h 22min	$49.19	r4.8
185 GiB	WGS	50	1day 8h 14 min	$56.05	c5.9
111.3 GiB	WGS	30	19h 27min	$33.82	c5.9
37.2 GiB	WGS	10	6h 22min	$11.09	c5.9

API Python Implementation

The app's draft task can also be submitted via the API. In order to learn how to get your Authentication token and API endpoint for corresponding platform visit our documentation.

# Initialize the SBG Python API
from sevenbridges import Api
api = Api(token="enter_your_token", url="enter_api_endpoint")
# Get project_id/app_id from your address bar. Example: https://igor.sbgenomics.com/u/your_username/project/app
project_id = "your_username/project"
app_id = "your_username/project/app"
# Replace inputs with appropriate values
inputs = {
	"in_alignments": list(api.files.query(project=project_id, names=["HCC1143BL.reverted.bam"])), 
	"reference_index_tar": api.files.query(project=project_id, names=["Homo_sapiens_assembly38.fasta.tar"])[0], 
	"in_reference": api.files.query(project=project_id, names=["Homo_sapiens_assembly38.fasta"])[0], 
	"ref_dict": api.files.query(project=project_id, names=["Homo_sapiens_assembly38.dict"])[0],
	"known_snps": api.files.query(project=project_id, names=["1000G_phase1.snps.high_confidence.hg38.vcf"])[0],
        "known_indels": list(api.files.query(project=project_id, names=["Homo_sapiens_assembly38.known_indels.vcf", Mills_and_1000G_gold_standard.indels.hg38.vcf]))}
# Creates draft task
task = api.tasks.create(name="BROAD Best Practice Data Pre-processing Workflow 4.1.0.0 - API Run", project=project_id, app=app_id, inputs=inputs, run=False)

Instructions for installing and configuring the API Python client, are provided on github. For more information about using the API Python client, consult the client documentation. More examples are available here.

Additionally, API R and API Java clients are available. To learn more about using these API clients please refer to the API R client documentation, and API Java client documentation.

References

[1] Data Pre-processing

[2] Generic germline short variant per-sample calling

[3] Somatic CNVs

[4] Somatic SNVs+Indel pipeline

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
steps		steps
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
broad-best-practice-data-pre-processing-workflow-4-1-0-0_decomposed.cwl		broad-best-practice-data-pre-processing-workflow-4-1-0-0_decomposed.cwl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

steps

steps

.travis.yml

.travis.yml

LICENSE

LICENSE

README.md

README.md

broad-best-practice-data-pre-processing-workflow-4-1-0-0_decomposed.cwl

broad-best-practice-data-pre-processing-workflow-4-1-0-0_decomposed.cwl

Repository files navigation

Description

Common Use Cases

Changes Introduced by Seven Bridges

Common Issues and Important Notes

Performance Benchmarking

API Python Implementation

References

About

Releases 3

Packages

Languages

License

sevenbridges-openworkflows/Broad-Best-Practice-Data-pre-processing-CWL1.0-workflow

Folders and files

Latest commit

History

Repository files navigation

Description

Common Use Cases

Changes Introduced by Seven Bridges

Common Issues and Important Notes

Performance Benchmarking

API Python Implementation

References

About

Resources

License

Stars

Watchers

Forks

Languages