A Snakemake pipeline to call structural variants from tumor-only ONT data

⚠️ This pipeline is in its early stages. Please use with caution.

Author

Tools used

SV callers

cuteSV, Sniffles, SVIM, SVision, Severus, NanoSV, NanoVar, Delly, Debreak

Annotation tools

AnnotSV, VEP (release/111), SnpEff

R packages

vroom, tibble, glue, dplyr, tidyr, purrr, GenomicRanges, stringr, BiocParallel, parallel

Other tools

Minimap2, SAMtools, BCFtools, SURVIVOR, vcf2maf (1.6.21), SnpSift, duphold

Pipeline structure

```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'fontFamily': 'Comic Sans MS',
      'primaryColor': '#F4CE14',
      'primaryTextColor': '#FFFFFF',
      'lineColor': '#EE6983',
      'secondaryColor': '#00B8A9',
      'tertiaryColor': '#FFFFFF'
    }
  }
}%%

flowchart TD

  classDef myclass fill:#00B8A9, stroke-width:0px, padding:0px, margin:0px;
  classDef myclass2 fill:#A5DD9B, stroke-dasharray:5 5;

  fastq([FASTQ]) -- Minimap2 --> bam([BAM])
  bam([BAM]) -- cuteSV --> cutesv_vcf([VCF])
  bam([BAM]) -- Sniffles2 --> sniffles2_vcf([VCF])
  cutesv_vcf([VCF]) -- BCFtools --> cutesv_filtered_vcf([filtered VCF])
  sniffles2_vcf([VCF]) -- BCFtools --> sniffles2_filtered_vcf([filtered VCF])
  cutesv_filtered_vcf([filtered VCF]) --- survivor["SURVIVOR"]
  sniffles2_filtered_vcf([filtered VCF]) --- survivor["SURVIVOR"]
  survivor["SURVIVOR"] --> merged_vcf([merged VCF])
  merged_vcf([merged VCF]) -- VEP, AnnotSV, and SnpEff --> annotated_vcf([annotated VCF/TSV])
  annotated_vcf([annotated VCF/TSV]) --> somatic_vcf([somatic SVs])
  annotated_vcf([annotated VCF/TSV]) --> germline_vcf([germline SVs])

  bam([BAM]) -. "other callers (e.g. SVIM)" .-> other_vcfs([VCFs])
  other_vcfs([VCFs]) -. BCFtools .-> other_filtered_vcf([filtered VCFs])

  other_filtered_vcf([filtered VCFs]) -.- survivor["SURVIVOR"]

  survivor:::myclass
  other_vcfs:::myclass2
  other_filtered_vcf:::myclass2
```

Getting started

Prerequisites

Clone this repo and navigate into it:
```
git clone https://github.com/jasonwong-lab/smk_sv.git
cd smk_sv
```
- Run all of the following steps from the top-level directory of this repository.
- Uncomment all rules in the Snakefile.
- Check the predefined wildcard_constraints in the Snakefile and modify or delete them if necessary.
- Validating the configuration file against a JSON schema might prevent Snakemake from detecting changes to the parameters. If this happens, you can comment out the validate(config, "config/config.schema.json") call in the Snakefile.

Install AnnotSV manually.
- AnnotSV is not included in the image because its annotation resources are large (~20 GB) and cannot be placed in a separate, user-specified location.
- A lock file is created for each combination of sample and type_sv. However, AnnotSV might still fail, since it does not support processing multiple files within the same directory. To address this, the rule annotate_sv_annotsv has an additional resource parameter, constraint_annotsv=1, which ensures that only one instance of AnnotSV runs at a time. You can change this limit in workflow/profiles/default/config.yaml, where its default is 1.

Configuration

Prepare config files:
1. Copy config/config-test.yaml to config/config.yaml.
  - Adjust the configuration settings according to your project's needs.
  - Specification of important elements:
    - dir_run: working directory where all results will be stored.
    - mapper: dict whose keys are mapper names and whose values (boolean) indicate whether to perform mapping. Only the first mapper is used. If a mapper is listed with the value false, mapping with it is skipped, but its existing results are still used in the following steps.
    - callers: dict whose keys are caller names and whose values (boolean) indicate whether to perform SV calling with that caller. If a caller is listed with the value false, SV calling with it is skipped, but its existing results are still used in the following steps.
    - types_sv: SV types to be called. BND indicates translocations.
    - threads: number of CPUs used by each rule.
    - ...
2. Copy workflow/profiles/default/config-test.yaml to workflow/profiles/default/config.yaml.
  - Bind the directories you need inside the container.
  - Set the number of CPUs you prefer.
  - Modify, add, or delete other parameters of this Snakemake pipeline as needed.
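
The mapper and callers semantics described above can be easier to see in a concrete fragment. The sketch below only prints an illustrative piece of config/config.yaml; the exact key spellings under mapper and callers (minimap2, cutesv, sniffles, svim), the path, and the SV types other than BND are assumptions for illustration, so check config/config-test.yaml for the real names.

```shell
#!/usr/bin/env bash
# Print an illustrative config fragment. Key spellings under mapper/callers,
# the path, and the DEL type are assumptions, not taken from the repository.
example_config() {
cat <<'EOF'
# Working directory where all results are stored (placeholder path).
dir_run: /path/to/results
# Only the first mapper is used.
mapper:
  minimap2: true
# true: run the caller. false: skip calling but reuse its existing results.
callers:
  cutesv: true
  sniffles: true
  svim: false
# BND indicates translocations.
types_sv:
  - BND
  - DEL
EOF
}

example_config
```

Here svim is listed with the value false, so under the rules above no new SVIM calling would be run, but previously generated SVIM results would still be used downstream. A caller whose results should not be used at all would presumably be left out of the dict.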
Prepare sample data:
1. Copy config/pep/samples-test.csv to config/pep/samples.csv, and update sample_name in the CSV.
2. Copy config/pep/config-test.yaml to config/pep/config.yaml. For more information, see Portable Encapsulated Projects (PEP).
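
The copy steps above can be scripted. This is a minimal sketch, assuming it is run from the top-level directory of the repository; copy_template is a hypothetical helper introduced here (not part of the pipeline) that refuses to overwrite a file you have already edited.

```shell
#!/usr/bin/env bash
# copy_template SRC DST: copy a *-test template to its working name.
# Hypothetical helper for illustration; it never overwrites an existing DST.
copy_template() {
  local src="$1" dst="$2"
  if [ ! -f "$src" ]; then
    echo "missing template: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "keeping existing $dst"
    return 0
  fi
  cp "$src" "$dst" && echo "created $dst"
}

# The four templates named in the steps above.
copy_template config/config-test.yaml config/config.yaml || true
copy_template workflow/profiles/default/config-test.yaml workflow/profiles/default/config.yaml || true
copy_template config/pep/samples-test.csv config/pep/samples.csv || true
copy_template config/pep/config-test.yaml config/pep/config.yaml || true
```

After copying, edit each new file by hand; the script only creates the starting points.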
Set up Conda environments:
```
snakemake --conda-create-envs-only
```

Execution

Run the pipeline locally:
```
snakemake
```
Run the pipeline on a cluster: to run this pipeline on a cluster (e.g. SLURM or PBS), create your own profile, place it in ~/.config/snakemake/, and pass its name with the --profile option:
```
snakemake --profile <your_profile_name>
```
Alternatively, set the profile through an environment variable:
```
export SNAKEMAKE_PROFILE=<your_profile_name>
snakemake
```
For reference, see the profile I have been using at workflow/profiles/mycluster, or consult the Snakemake documentation.
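
As a rough starting point, a custom profile is a directory under ~/.config/snakemake/ containing a config.yaml whose keys mirror Snakemake's long command-line options. The sketch below creates one; the profile name my_cluster and all values are illustrative assumptions. Executor or scheduler settings are deliberately left out because they depend on your Snakemake version and cluster.

```shell
#!/usr/bin/env bash
# Hypothetical profile name; choose whatever suits your cluster.
profile_name="my_cluster"
profile_dir="$HOME/.config/snakemake/$profile_name"
mkdir -p "$profile_dir"

# Write a starter config.yaml unless one already exists.
if [ ! -e "$profile_dir/config.yaml" ]; then
  cat > "$profile_dir/config.yaml" <<'EOF'
# Keys correspond to --jobs, --latency-wait and --rerun-incomplete.
# Values are illustrative; add executor settings for your scheduler.
jobs: 20
latency-wait: 60
rerun-incomplete: true
EOF
fi

echo "profile available at $profile_dir"
```

With such a directory in place, the profile would be selected with snakemake --profile my_cluster.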

Note for Cluster Users

If you are using a cluster that does not support Singularity well, please switch to the without_docker branch. This branch is tailored for environments where containers might not be the best option.

```
git checkout without_docker
```

License

The code in this repository is licensed under the GNU General Public License v3.
