RB-SV

Pipeline of structural-variation calling for Retinoblastoma samples

Sample Information

CustomerID	Name	Group	SampleID	Data
FC_RB	范宸乐	RB	N02	Nanopore sequencing
FJ	范家乐	WT_RB	N01	Nanopore sequencing
ZJ	朱玥雯之姐	WT_RB_TSC	N03	Nanopore sequencing
ZY_RT	朱玥雯	RB_TSC	N04	Nanopore sequencing
unkown	unkown	unkown	BJ181303-02	PacBio CLR sequencing
unkown	unkown	unkown	HPDE6C7	PacBio CLR sequencing
unkown	unkown	unkown	PANC1	PacBio CLR sequencing

Pipeline

A two-step structural variation (SV) detection pipeline was used for the dataset, i.e., breakpoints were first detected per each sample, and then merged to genotype for the population. Detailed information was described as follows.

Minimap2 was used to align each sample by nanopore sequencing to GRCh38 reference, using the default parameters of Minimap for ONT.

  minimap2 --MD -ax map-ont -t $NTHREADS $PATH_REFERENCE_FASTA $PATH_NANOPORE_FASTQ |
    samtools sort -@ $NTHREADS -o $PATH_TO_SAVE_DATA/$RESULT_BAM
  samtools index -@ $NTHREADS $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM;

Pbmm2 was used to align each sample by PacBio sequencing to GRCh38 reference.

  pbmm2 align $PATH_REFERENCE_MMI $PATH_PACBIO_BAM_SUBREADS $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM \
    --sort --bam-index BAI -j $NTHREADS -J $NTHREADS --median-filter

For each sample, SV was detected simultaneously using Sniffles, cuteSV and SVIM. In addition, pbsv was used to detect SVs from PacBio data instead of cuteSV. For each SV signature, a length of 50 bp and at least 5 supporting-reads are required. Each tool is required to record sequences of insertions and ID of supporting-reads.

   sniffles -s 5 -t $NTHREADS -l 50 -n -1 \
     -m $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM -v $PATH_TO_SAVE_DATA/$RESULT_VCF_SNIFFLES

     cuteSV --threads $NTHREADS \
       --sample $SAMPLEID \
       --min_support 5 \
       --report_readid \
       --genotype --min_size 50 \
       --max_cluster_bias_INS 100 \
       --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3 \
       $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM $PATH_REFERENCE_FASTA $PATH_TO_SAVE_DATA/$RESULT_VCF_CUTESV $PATH_TO_SAVE_DATA/

   pbsv discover \
     --tandem-repeats $HUMAN_GRCH38_NO_ALT_ANALYSIS_SET.TRF.BED \
     $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM $PATH_TO_SAVE_DATA/$RESULT_SV_SIGNATURE

   pbsv call \
     --gt-min-reads 5 \
     --call-min-reads-one-sample 5 \
     --call-min-reads-all-samples 5 \
     -j 40 \
     -m 50 \
     $PATH_REFERENCE_FASTA $PATH_TO_SAVE_DATA/$RESULT_SV_SIGNATURE $PATH_TO_SAVE_DATA/$RESULT_VCF_PBSV

   svim alignment --sample $SAMPLEID \
     --insertion_sequence \
     --min_sv_size 50 \
     --minimum_depth 5 \
     --read_names \
     --heterozygous_threshold 0.3 \
     $PATH_TO_SAVE_DATA $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM $PATH_REFERENCE_FASTA

   bcftools view -i 'QUAL >= 10 & SUPPORT >= 5' $PATH_TO_SAVE_DATA/variants.vcf > $PATH_TO_SAVE_DATA/$RESULT_VCF_SVIM

For each sample, SVs detected by above three methods were combined using SURVIVOR, requiring that each SV be detected by at least two tools and that adjacent SVs be separated by more than 1 kbp. The SVs detected by different software around the same location do not need to have the same SVtype or the same orientation, which is to obtain as many breakpoints with high confidence as possible. Finally breakpoints detected by all samples are merged using SURVIVOR and each breakpoint must be supported by at least one sample.
```
  for sampleID in $(ls $PATH_TO_SAVE_DATA); do   
      SURVIVOR merge $PATH_TO_SAVE_DATA/$SAMPLEID/LIST_MERGE_METHOD 1000 2 1 1 0 50 $PATH_TO_SAVE_DATA/$SAMPLEID/$SAMPLEID_VCF_MERGE_METHOD
  done

  SURVIVOR merge $PATH_TO_SAVE_DATA/LIST_MERGE_SAMPLE 1000 1 1 0 0 50 $PATH_TO_SAVE_DATA/$RESULT_VCF_MERGE_SAMPLE
```
Finally, we need to get a multi-sample vcf with complete genotypes. We re-run Sniffles and genotyped for each sample in all these potential breakpoints and then merge them using SURVIVOR. In this run, SURVIVOR only report SVs supported by at least one sample, and must have the same SVtype. In addition, we used a hard threshold of five supporting-reads, and all breakpoints smaller than this threshold were modified to ref (0/0).
```
 sniffles -s 5 -t $NTHREADS -l 50 -n -1 \
   -m $PATH_TO_SAVE_DATA/$RESULT_SORTED_BAM \
   -v $PATH_TO_SAVE_DATA/$RESULT_VCF_FORCE \
   --Ivcf $PATH_TO_SAVE_DATA/$RESULT_VCF_MERGE_SAMPLE
```
```
 SURVIVOR merge $PATH_TO_SAVE_DATA/LIST_MERGE_SAMPLE 1000 -1 1 1 0 50 $PATH_TO_SAVE_DATA/$RESULT_VCF_MERGE_SAMPLE_FORCE
```

We constructed the Ensembl Canonical transcript for GRCh38 and then annotated all regions where SVs are located using svtk in GATK.

   svtk annotate --gencode $GENCODE_GRCH38_CANONICAL_ANNOTATION_GTF \
     $PATH_TO_SAVE_DATA/$RESULT_VCF_FORCE $PATH_TO_SAVE_DATA/$RESULT_VCF_FORCE_SVTK

All SVs were fully annotated using AnnotSV and the annotation was visualized using KnotAnnotSV.

   AnnotSV -SVinputFile $PATH_TO_SAVE_DATA/$RESULT_VCF_FORCE_SVTK \
     -outputDir $PATH_TO_SAVE_DATA/ \
     -outputFile $RESULT_ANNOTSV_TSV \
     -promoterSize 2000 \
     -overlap 50 \
     -genomeBuild GRCh38 \
     -annotationMode full

   perl knotAnnotSV.pl \
     --configFile CONFIG_ANNOTSV_YAML \
     --annotSVfile $PATH_TO_SAVE_DATA/$RESULT_ANNOTSV_TSV \
     --outDir $PATH_TO_SAVE_DATA/ \
     --outPrefix $PATH_TO_SAVE_DATA/$RESULT_ANNOTSV_TSV \
     --genomeBuild hg38

Environments

Softwares	Version
Minimap	2.22-r1101
pbmm2	1.5.0
samtools	1.7
htslib	1.7-2
Sniffles	1.0.12
pbsv	2.4.1
CuteSV	1.0.8
SVIM	2.0.0
SURVIVOR	1.0.7
bcftools	1.9
GATK: svtk	0.1
AnnotSV	3.0.9
KnotAnnotSV	1.1.1
R	4.0.5
Metascape	3.5

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
plots		plots
scripts		scripts
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RB-SV

Sample Information

Pipeline

Environments

About

Releases

Packages

Languages

License

quanc1989/RB-SV

Folders and files

Latest commit

History

Repository files navigation

RB-SV

Sample Information

Pipeline

Environments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages