This is a Nextflow-based pipeline built to analyze exome data. The pipe runs one of two main pathways:
- Unrelated samples
- Trio samples (technically there is no limit; any number of related samples can be included)
Nextflow script containing pre-processing and instructions for running sarek and Exomiser on FastQ data.
Pipeline Flow:
Stage 1:
- cleanBeds: Cleans up the BED file for the sample group
- renameFastQs: Groups FASTQ files into pairs and renames them
- runPipe: Runs the sarek pipeline
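The pairing done by renameFastQs can be pictured with a small shell sketch. This is not the code in pipe.nf; it assumes files named like proband_1.fastq.gz and proband_2.fastq.gz (sample name, underscore, read number), and the function name list_fastq_pairs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from pipe.nf.
# Assumes FASTQ files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
# Prints one tab-separated line per complete pair: sample, read 1, read 2.
list_fastq_pairs() {
    local dir="$1" r1 r2 sample
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                  # glob matched nothing
        sample="$(basename "$r1" _1.fastq.gz)"    # strip the read suffix
        r2="$dir/${sample}_2.fastq.gz"
        if [ -f "$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
        else
            echo "warning: no mate found for $r1" >&2
        fi
    done
}
```

Running list_fastq_pairs on the --fastq directory before starting the pipe is a quick way to spot a sample that is missing one of its two read files.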
Stage 2 (differs for multisample and triosample):
- produceHpoString: Produces a short string from the HPO file using the proband as key
- produceExomiserYAML: Produces a YAML analysis file specifically for the proband
- produceExomiserBatch: Produces a batch.txt file which lists the analysis YAML files (only in the multiSample pipe)
- runExomiser: Runs Exomiser using the batch analysis file
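As an illustration of the produceExomiserBatch step, the sketch below writes a batch file that lists analysis YAML files. It is an assumption-laden sketch rather than the pipeline's own code: the function name write_exomiser_batch is hypothetical, the analysis files are assumed to end in .yml and sit in one directory, and the batch file is assumed to hold one path per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from pipe.nf.
# Assumes analysis files end in .yml and the batch file lists one path per line.
# Usage: write_exomiser_batch <yaml directory> <batch file>
write_exomiser_batch() {
    local yaml_dir="$1" batch_file="$2" f
    : > "$batch_file"                      # start from an empty batch file
    for f in "$yaml_dir"/*.yml; do
        [ -e "$f" ] || continue            # glob matched nothing
        printf '%s\n' "$f" >> "$batch_file"
    done
}
```

Keeping the batch file outside the YAML directory avoids any risk of it being picked up as an analysis file if the extension convention changes.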
Set alias nextflow=/efs/sam/bin/nextflow
Then copy and paste pipe.nf into Haggis (vi pipe.nf, press i to insert, then :wq to save and quit)
-c: Configuration file
--bed: BED file to use with samples (always SureSelect_v6.bed)
--fastq: Directory containing .fastq files
--hpo: Directory containing HPO files for unrelated samples, or the HPO file for the proband in a trio sample
--ped: .ped file containing the pedigree
--pipe: multiSample or trioSample
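Because the meaning of --hpo depends on --pipe, a pre-flight check can catch a mismatch before a run that takes hours. The sketch below is a hypothetical wrapper-side check (check_pipe_args is not part of pipe.nf); it only encodes the rules listed above: a directory for multiSample, a single file for trioSample.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of pipe.nf.
# Usage: check_pipe_args <pipe> <hpo path>
check_pipe_args() {
    local pipe="$1" hpo="$2"
    case "$pipe" in
        multiSample)
            # Unrelated samples: --hpo should be a directory of HPO files.
            [ -d "$hpo" ] || { echo "error: --hpo must be a directory for multiSample" >&2; return 1; }
            ;;
        trioSample)
            # Trio: --hpo should be the proband's single HPO file.
            [ -f "$hpo" ] || { echo "error: --hpo must be a file for trioSample" >&2; return 1; }
            ;;
        *)
            echo "error: --pipe must be multiSample or trioSample, got '$pipe'" >&2
            return 1
            ;;
    esac
}
```

For example, check_pipe_args trioSample /efs/sam/trio_example/hpo/141641.hpo returns success only if that path is an existing regular file.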
Example Commands:
nextflow run pipe.nf -profile slurm \
-c /efs/sam/configScripts/slurm.config \
--bed /efs/sam/Macrogen_HN00115050/SureSelect_v6.bed \
--fastq /efs/sam/Macrogen_HN00115050/fastq \
--hpo /efs/sam/Macrogen_HN00115050/hpo \
--ped /efs/sam/Macrogen_HN00115050/ped \
--pipe multiSample
OR
nextflow run pipe.nf -profile slurm \
-c /efs/sam/configScripts/slurm.config \
--bed /efs/sam/Macrogen_HN00115050/SureSelect_v6.bed \
--fastq /efs/sam/trio_example/fastq \
--hpo /efs/sam/trio_example/hpo/141641.hpo \
--ped /efs/sam/trio_example/ped \
--pipe trioSample
Runtime will be roughly 6-7 hours.
- Samples are named in the format proband_1.fastq.gz
- .bed files contain a chr prefix and 2 lines of unnecessary headers.
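A cleanup along the lines of the BED note above could look like the following sketch. It assumes that cleaning means dropping the first two lines and removing a leading chr from each remaining line; whether cleanBeds does exactly this is an assumption, and the function name clean_bed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not taken from pipe.nf.
# Assumes "cleaning" means: skip the 2 header lines, strip a leading "chr".
# Usage: clean_bed <input.bed>   (cleaned BED is written to stdout)
clean_bed() {
    tail -n +3 "$1" | sed 's/^chr//'   # start output at line 3, then drop the prefix
}
```

The result goes to standard output, so it can be redirected to a new file without touching the original BED.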
Configuration parameters for Slurm when running pipe.nf and nf-core/sarek.
Each contains the test pipelines for analyzing trio or multi-sample data. Replaced by the --pipe parameter in pipe.nf.
Template for exomiser analysis.
- Add a function that automatically uploads data to the database