# Structural Variation Calling - Solutions
No questions in this section.

# Looking at Structural Variants in VCF

## Exercises

1. What does the CIPOS INFO tag indicate? __Confidence interval around POS for imprecise variants__

2. What does the PE tag indicate? __Number of paired-end reads supporting the variant across all samples__

3. What tag is used to describe an inversion event? __INV__

4. What tag is used to describe a duplication event? __DUP__

5. How many deletions were called in total? (**Hint:** DEL is the info field for a deletion. The -c option of the grep command can be used to return a count of matches.) __31, try__

__`grep -c "<DEL>" ERR1015121.vcf`__

6. What type of event is predicted at IV:437148? What is the length of the SV? How many paired-end reads and split-reads support this SV variant call? __Deletion; SVLEN is -370 (370 bp deleted); 20 paired-end reads (PE) and 21 split reads support the call__

__`grep "437148" ERR1015121.vcf`__

7. What is the total number of SV calls predicted on the IV chromosome? __10, try__

__`grep -c "^IV" ERR1015121.vcf`__
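Because `grep "437148"` matches the number anywhere on a line (for example inside an END value), it can return records other than the one at that position. As a minimal sketch, the helper below looks up a single INFO tag for an exact chromosome and position. The function name `vcf_info_tag` is hypothetical, and the tag names `SVLEN`, `PE` and `SR` are assumed to be the ones your caller writes to the INFO column.

```bash
# vcf_info_tag FILE CHROM POS TAG
# Print the value of one INFO tag (e.g. SVLEN) for the record at CHROM:POS.
# Hypothetical helper; assumes a tab-separated VCF with INFO in column 8.
vcf_info_tag() {
    awk -F'\t' -v chrom="$2" -v pos="$3" -v tag="$4" '
        /^#/ { next }                            # skip header lines
        $1 == chrom && $2 == pos {
            n = split($8, fields, ";")           # INFO entries are ";"-separated
            for (i = 1; i <= n; i++) {
                eq = index(fields[i], "=")       # position of "=" in KEY=VALUE
                if (eq > 0 && substr(fields[i], 1, eq - 1) == tag)
                    print substr(fields[i], eq + 1)
            }
        }
    ' "$1"
}
```

For example, `vcf_info_tag ERR1015121.vcf IV 437148 SVLEN` should print the SV length reported in answer 6, provided the record carries that tag.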

# Calling Structural Variants

**Q:** mean=454.87 std=86.29

## Breakdancer

### Exercises

`grep "83065" ERR1015121.breakdancer.out`

1. Inversion

2. -116

3. 42

`grep "258766" ERR1015121.breakdancer.out`

4. Deletion (7325, 99)

5. `grep DEL ERR1015121.breakdancer.out | awk '{print $1"\t"$2"\t"$5"\t"$7"\t"$9}' > breakdancer.dels.bed`
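The conversion in answer 5 can also be written as a single `awk` call that tests the type column directly instead of grepping the whole line. This is only a sketch: the function name `breakdancer_to_bed` is hypothetical, and it assumes the usual BreakDancer column layout (chromosome 1, position 1, orientation 1, chromosome 2, position 2, orientation 2, type, size, score), which matches the columns selected in the answer.

```bash
# breakdancer_to_bed FILE TYPE
# Print chrom, start, end, type and score for every call of the given TYPE.
# Hypothetical helper; column numbers assume the standard BreakDancer layout.
breakdancer_to_bed() {
    awk -v type="$2" '
        BEGIN { OFS = "\t" }                     # tab-separated output
        /^#/ { next }                            # skip header/comment lines
        $7 == type { print $1, $2, $5, $7, $9 }  # exact match on the type column
    ' "$1"
}
```

Usage would be `breakdancer_to_bed ERR1015121.breakdancer.out DEL > breakdancer.dels.bed`. Positions are copied unchanged, exactly as in the command above.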

## Inspecting SVs with IGV

### Exercises

1. Yes, a deletion (view as paired, sort by insert size, squish).

2. Very few reads map to the region, the reads that do map have low mapping quality (mapQ), and the call has an SV score of 99.

3. Size estimate: ~7.5 kb

Was the deletion at II:258766 also called by the other structural variant software, and what was the predicted size?

5. Yes, SVTYPE=DEL, SVLEN=-7438

6. DEL called by Breakdancer (score=59). Not found by the other caller, Lumpy.

7. Yes, 2 reads support (red).

## Dysgu

### Exercises

1. What was the total number of SVs identified? How many PASS SVs were identified by Dysgu? Why did the rest of the SVs fail?

`bcftools view -H ERR1015069.vcf | wc -l`

__30 SVs in total__

`bcftools view -H -i 'FILTER="PASS"' ERR1015069.vcf | wc -l`

__8 PASS SVs__

__The rest failed the lowProb filter:__ `##FILTER=<ID=lowProb,Description="Probability below threshold set with --thresholds">`

2. What type of SV event occurs at position IV:384221? What is the length of the SV event? What is the genotype quality?

DEL = Deletion, SVLEN=328, GQ=62

3. What type of SV event occurs at position XV:31115? What is the length of the SV event? What is the probability of the structural variant?

INS = Insertion, SVLEN=63, PROB=0.816
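To see at a glance why calls failed, the FILTER column can be tallied in one pass rather than running one `bcftools view` per filter value. The sketch below uses only `awk` and `sort`; the function name `count_filters` is hypothetical, and it assumes a plain-text (uncompressed) VCF with FILTER in column 7.

```bash
# count_filters FILE
# Print each FILTER value and the number of records carrying it.
# Hypothetical helper; assumes an uncompressed, tab-separated VCF.
count_filters() {
    awk -F'\t' '
        /^#/ { next }                            # skip header lines
        { count[$7]++ }                          # FILTER is column 7
        END { for (f in count) print f "\t" count[f] }
    ' "$1" | LC_ALL=C sort                       # stable, locale-independent order
}
```

Running `count_filters ERR1015069.vcf` should list `PASS` and `lowProb` with counts that add up to the total number of records.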


# Calling Structural Variants from Long Reads

### Align the reads with minimap2 and convert to bam

`minimap2 -t 2 -x map-pb -a ../ref/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa YPS128.filtered_subreads.10x.fastq.gz | samtools view -b -o YPS128.filtered_subreads.10x.bam -`

### Sort the bam
`samtools sort -T temp -o YPS128.filtered_subreads.10x.sorted.bam YPS128.filtered_subreads.10x.bam`

`samtools calmd -b YPS128.filtered_subreads.10x.sorted.bam ../ref/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa > YPS128.filtered_subreads.10x.sorted.calmd.bam`

### Index the sorted bam
`samtools index YPS128.filtered_subreads.10x.sorted.calmd.bam`

### Call SVs with sniffles
`sniffles --input YPS128.filtered_subreads.10x.sorted.calmd.bam --vcf YPS128.filtered_subreads.10x.vcf`

### Exercises
1. What sort of SV was called on chromosome 'XV' at position 854272? __Deletion__

2. What is the length of the SV? __344__

3. How many reads are supporting the SV? __14 (SUPPORT tag)__

4. What sort of SV was called on chromosome 'XI' at position 74608? __Insertion__

5. What is the length of the SV? __358__

6. How many reads are supporting the SV? __15__

7. How many inversions were called in the VCF? Note inversions are denoted by the type 'INV'. __6 in total, 5 passed__

8. How many duplications were called in the VCF? Note duplications are denoted by the type 'DUP'. __2__
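Answers 7 and 8 can be checked by counting records per SV type and, among those, the ones that passed filtering. The following is a sketch rather than the course's own method: `count_svtype` is a hypothetical name, and it assumes each record carries an `SVTYPE=` entry in the INFO column (column 8) and a FILTER value in column 7.

```bash
# count_svtype FILE TYPE
# Print how many records have SVTYPE=TYPE, and how many of those are PASS.
# Hypothetical helper; assumes an uncompressed, tab-separated VCF.
count_svtype() {
    awk -F'\t' -v type="$2" '
        /^#/ { next }                                # skip header lines
        $8 ~ ("(^|;)SVTYPE=" type "(;|$)") {         # exact SVTYPE match in INFO
            total++
            if ($7 == "PASS") pass++
        }
        END { printf "%d total, %d PASS\n", total, pass }
    ' "$1"
}
```

For example, `count_svtype YPS128.filtered_subreads.10x.vcf INV` reports the inversion counts, and the same call with `DUP` reports the duplications.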

# Bedtools

## Exercises

1. How many SVs found in `ERR1015069.dels.vcf` overlap with a gene? (**Hint:** Use bedtools intersect command)
__18, try (note the -u parameter is required to get the unique number of SVs)__

`bedtools intersect -u -a ERR1015069.dels.vcf -b Saccharomyces_cerevisiae.R64-1-1.82.genes.gff3  | wc -l`

2. How many SVs found in `ERR1015069.dels.vcf` do not overlap with a gene? (**Hint:** note the -v parameter to bedtools intersect)
__9, try__

`bedtools intersect -v -a ERR1015069.dels.vcf -b Saccharomyces_cerevisiae.R64-1-1.82.genes.gff3  | wc -l`

3. How many SVs found in `ERR1015069.dels.vcf` overlap with a more strict definition of 50%?
__14, try__

`bedtools intersect -u -f 0.5 -a ERR1015069.dels.vcf -b Saccharomyces_cerevisiae.R64-1-1.82.genes.gff3  | wc -l`

4. How many features does the deletion at VII:811446 overlap with? What type of genes? Note you will need to also use the -wb option in bedtools intersect.
`bedtools intersect -wb -a ERR1015069.dels.vcf -b Saccharomyces_cerevisiae.R64-1-1.82.genes.gff3 | grep 811446`
__4 features, all of them are protein coding genes (biotype=protein_coding)__


5. How many features does the deletion at XII:650823 overlap with? What type of genes? Note you will need to also use the -wb option in bedtools intersect.
`bedtools intersect -wb -a ERR1015069.dels.vcf -b Saccharomyces_cerevisiae.R64-1-1.82.genes.gff3 | grep 650823`
__2 features, all of them are protein coding genes (biotype=protein_coding)__

6. What is the closest gene to the structural variant at IV:384220 in `ERR1015069.dels.vcf`?
__YDL037C, try__

`bedtools closest -d -a ERR1015069.dels.vcf -b Saccharomyces_cerevisiae.R64-1-1.82.genes.gff3 | grep IV | grep 384220`

7. How many SVs overlap between the two files `ERR1015069.dels.vcf` and `ERR1015121.dels.vcf`?
__27, try__

`bedtools intersect -u -a ERR1015069.dels.vcf -b ERR1015121.dels.vcf | wc -l`

8. How many SVs have a 90% reciprocal overlap between the two files `ERR1015069.dels.vcf` and `ERR1015121.dels.vcf`? (**Hint:** first find the option for reciprocal overlap by typing: bedtools intersect -h)
__24, try__

`bedtools intersect -u -r -f 0.9 -a ERR1015069.dels.vcf -b ERR1015121.dels.vcf | wc -l`
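The difference between `-f 0.9` alone and `-r -f 0.9` is easiest to see with a worked example. With `-f` only, the overlap must cover at least 90% of the feature in A; adding `-r` requires the same fraction of the feature in B as well. The sketch below reproduces that arithmetic for one pair of intervals. The function name `reciprocal_overlap` is hypothetical, and intervals are assumed to be half-open `[start, end)` with non-zero length.

```bash
# reciprocal_overlap START_A END_A START_B END_B MIN_FRAC
# Print "yes" if the overlap covers at least MIN_FRAC of BOTH intervals,
# otherwise "no". Hypothetical helper for illustration only.
reciprocal_overlap() {
    awk -v s1="$1" -v e1="$2" -v s2="$3" -v e2="$4" -v f="$5" 'BEGIN {
        start = (s1 > s2) ? s1 : s2              # overlap starts at the later start
        end   = (e1 < e2) ? e1 : e2              # and stops at the earlier end
        ov = end - start
        if (ov < 0) ov = 0                       # intervals do not overlap
        fa = ov / (e1 - s1)                      # fraction of A covered
        fb = ov / (e2 - s2)                      # fraction of B covered
        print ((fa >= f && fb >= f) ? "yes" : "no")
    }'
}
```

For instance, `reciprocal_overlap 100 200 105 200 0.9` gives `yes` (95% of A and 100% of B are covered), while `reciprocal_overlap 100 200 100 1000 0.9` gives `no`: A is fully covered, but the overlap is only about 11% of B, so the pair would pass `-f 0.9` and fail `-r -f 0.9`.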