⚠️ This pipeline is in its early stages. Please use with caution.
Minghao Jiang, jiang01@icloud.com
- SV callers: cuteSV, Sniffles, SVIM, SVision, Severus, NanoSV, NanoVar, Delly, DeBreak
- Annotation tools
- R packages: vroom, tibble, glue, dplyr, tidyr, purrr, GenomicRanges, stringr, BiocParallel, parallel
- Other tools: Minimap2, SAMtools, BCFtools, SURVIVOR, vcf2maf (1.6.21), SnpSift, duphold
%%{
init: {
'theme': 'base',
'themeVariables': {
'fontFamily': 'Comic Sans MS',
'primaryColor': '#F4CE14',
'primaryTextColor': '#FFFFFF',
'lineColor': '#EE6983',
'secondaryColor': '#00B8A9',
'tertiaryColor': '#FFFFFF'
}
}
}%%
flowchart TD
classDef myclass fill:#00B8A9, stroke-width:0px, padding:0px, margin:0px;
classDef myclass2 fill:#A5DD9B, stroke-dasharray:5 5;
fastq([FASTQ]) -- Minimap2 --> bam([BAM])
bam([BAM]) -- cuteSV --> cutesv_vcf([VCF])
bam([BAM]) -- Sniffles2 --> sniffles2_vcf([VCF])
cutesv_vcf([VCF]) -- BCFtools --> cutesv_filtered_vcf([filtered VCF])
sniffles2_vcf([VCF]) -- BCFtools --> sniffles2_filtered_vcf([filtered VCF])
cutesv_filtered_vcf([filtered VCF]) --- survivor["SURVIVOR"]
sniffles2_filtered_vcf([filtered VCF]) --- survivor["SURVIVOR"]
survivor["SURVIVOR"] --> merged_vcf([merged VCF])
merged_vcf([merged VCF]) -- VEP, AnnotSV, and SnpEff --> annotated_vcf([annotated VCF/TSV])
annotated_vcf([annotated VCF/TSV]) --> somatic_vcf([somatic SVs])
annotated_vcf([annotated VCF/TSV]) --> germline_vcf([germline SVs])
bam([BAM]) -. "other callers (e.g. SVIM)" .-> other_vcfs([VCFs])
other_vcfs([VCFs]) -. BCFtools .-> other_filtered_vcf([filtered VCFs])
other_filtered_vcf([filtered VCFs]) -.- survivor["SURVIVOR"]
survivor:::myclass
other_vcfs:::myclass2
other_filtered_vcf:::myclass2
- Clone this repo and navigate into it:
  git clone https://github.com/jasonwong-lab/smk_sv.git
  cd smk_sv
  - Follow all steps below from the top directory of this repo.
  - Uncomment all rules in the Snakefile.
  - Check the predefined wildcards_constraints in the Snakefile and modify or delete it if necessary.
  - Validating the configuration file against a JSON schema might prevent Snakemake from detecting changes to the parameters. To avoid this, you can comment out the validate(config, "config/config.schema.json") call in the Snakefile.
- Install AnnotSV manually.
  - AnnotSV is not included in the image because its annotation resources are large (~20 GB) and their location cannot be specified elsewhere.
  - A lock file is created for each combination of sample and type_sv. However, AnnotSV might still fail, since it does not support processing multiple files within the same directory. To address this, an additional resource parameter, constraint_annotsv=1, has been added to the rule annotate_sv_annotsv to ensure that only one instance of AnnotSV runs at a time. You can modify this parameter in workflow/profiles/default/config.yaml, where its default is 1.
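To see why a cap of 1 serialises the AnnotSV jobs, here is a deliberately simplified model of resource-limited scheduling. It is not Snakemake's scheduler, only a sketch of the idea: each job requests some units of a named resource, and jobs run together only while the summed request stays within the global limit. The function name schedule_rounds and the job names are hypothetical.

```python
def schedule_rounds(jobs, limit):
    """Group jobs into consecutive rounds whose summed request never exceeds `limit`.

    `jobs` is a list of (name, requested_units) pairs. This is a toy model,
    not the algorithm Snakemake uses internally.
    """
    rounds, current, used = [], [], 0
    for name, need in jobs:
        if need > limit:
            raise ValueError(f"{name} requests {need} units but the limit is {limit}")
        if used + need > limit:
            # The current round is full: close it and start a new one.
            rounds.append(current)
            current, used = [], 0
        current.append(name)
        used += need
    if current:
        rounds.append(current)
    return rounds


# Hypothetical AnnotSV jobs, one per sample and SV type, each requesting 1 unit.
jobs = [("s1_DEL", 1), ("s1_INS", 1), ("s2_DEL", 1)]
print(schedule_rounds(jobs, limit=1))  # one job per round
print(schedule_rounds(jobs, limit=2))  # up to two jobs share a round
```

With a limit of 1, every job lands in its own round, which is the behaviour the default setting aims for. Raising the limit lets jobs overlap again, so only do that if AnnotSV copes with concurrent runs in your setup.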
- Prepare config files:
  - Copy config/config-test.yaml to config/config.yaml.
    - Adjust the configuration settings according to your project's needs.
    - Specification of important elements:
      - dir_run: working directory where all results will be stored.
      - mapper: dict whose keys are mapper names and whose boolean values indicate whether to perform mapping. Only the first mapper is used. When a mapper is listed with the value false, no mapping is performed by this mapper, but its existing results are used in the following steps.
      - callers: dict whose keys are caller names and whose boolean values indicate whether to perform SV calling with that caller. When a caller is listed with the value false, no SV calling is performed by this caller, but its existing results are used in the following steps.
      - types_sv: SV types to be called. BND indicates translocations.
      - threads: number of CPUs to be used by each rule.
      - ...
  - Copy workflow/profiles/default/config-test.yaml to workflow/profiles/default/config.yaml.
    - Bind the directories you need in the container.
    - Set the number of CPUs you prefer.
    - Modify, add, or delete other parameters of this Snakemake pipeline.
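The semantics of mapper and callers described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline's own code: the config values, the lower-case tool names, and the helper names select_mapper and split_callers are all assumptions made for the example.

```python
# Hypothetical config mirroring the keys described above; all values are illustrative.
config = {
    "dir_run": "/path/to/run",
    "mapper": {"minimap2": True},
    "callers": {"cutesv": True, "sniffles": True, "svim": False},
    "types_sv": ["BND", "DEL", "DUP", "INS", "INV"],
    "threads": 8,
}


def select_mapper(mapper):
    """Return (name, run) for the first mapper listed; later entries are ignored."""
    name, run = next(iter(mapper.items()))
    return name, bool(run)


def split_callers(callers):
    """Split callers into those to run and those whose existing results are reused."""
    to_run = [name for name, flag in callers.items() if flag]
    reuse_only = [name for name, flag in callers.items() if not flag]
    return to_run, reuse_only


mapper_name, run_mapping = select_mapper(config["mapper"])
to_run, reuse_only = split_callers(config["callers"])
print(mapper_name, run_mapping)  # first mapper and whether mapping is performed
print(to_run, reuse_only)        # callers to execute vs. callers with reused results
```

Note that a caller set to false still contributes to downstream steps, so its result files are expected to exist already; to leave a caller out entirely, remove its key rather than setting it to false.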
- Prepare sample data:
  - Copy config/pep/samples-test.csv to config/pep/samples.csv, and update sample_name in the CSV.
  - Copy config/pep/config-test.yaml to config/pep/config.yaml. For more information, see Portable Encapsulated Projects (PEP).
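If you prefer to script the copy steps for the config and sample files, the following is a minimal sketch. It assumes it is run from the top directory of the repo and that each step means "copy the -test template to the non-test name"; the helper name copy_templates is hypothetical. Existing targets are left alone by default so that edits you have already made are not overwritten.

```python
import shutil
from pathlib import Path

# Template -> target pairs taken from the copy steps described above.
TEMPLATES = {
    "config/config-test.yaml": "config/config.yaml",
    "workflow/profiles/default/config-test.yaml": "workflow/profiles/default/config.yaml",
    "config/pep/samples-test.csv": "config/pep/samples.csv",
    "config/pep/config-test.yaml": "config/pep/config.yaml",
}


def copy_templates(repo_root=".", overwrite=False):
    """Copy each template to its target and return the targets that were written."""
    root = Path(repo_root)
    written = []
    for src, dst in TEMPLATES.items():
        source, target = root / src, root / dst
        if not source.is_file():
            raise FileNotFoundError(f"template not found: {source}")
        if target.exists() and not overwrite:
            continue  # keep a file that has already been created or edited
        shutil.copyfile(source, target)
        written.append(dst)
    return written
```

Calling copy_templates() from the repo root creates any missing files; you still need to edit them afterwards, for example to update sample_name in config/pep/samples.csv.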
- Set up Conda environments:
  snakemake --conda-create-envs-only
- Run the pipeline locally:
  snakemake
- Run the pipeline on a cluster: If you want to run this pipeline on a cluster (e.g., SLURM or PBS), customise your own profile and place it in ~/.config/snakemake/, then run the pipeline with the profile passed as a parameter:
  snakemake --profile <your_profile_name>
  Or run the pipeline with the profile set as an environment variable:
  export SNAKEMAKE_PROFILE=<your_profile_name>
  snakemake
  You can refer to the profile I have been using at workflow/profiles/mycluster, or consult the Snakemake documentation.
If you are using a cluster that does not support Singularity well, please switch to the without_docker branch. This branch is tailored for environments where containers might not be the best option.
git checkout without_docker
The code here is licensed under the GNU General Public License v3.