# Running IsoMut with BAQ recalibration indel filtering for SNVs

---

Test run for the post-processing

---

### Necessary things:

- samtools, which the method relies on heavily
    - http://www.htslib.org/
    - only versions 0.19+ have been tested


- Biopython, which is used to compute the genomic regions used for parallelisation
    - http://biopython.org/wiki/Main_Page
    - comes preinstalled with Anaconda Python
    - can be installed with pip:
        pip install biopython
    - or from various Linux package managers:
        - see http://biopython.org/wiki/Download


- BAM files and a reference genome
    - test data will be available soon
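
Before compiling anything, it can help to confirm that the two software dependencies listed above are visible from the Python environment you will run the notebook in. The following is a minimal sketch, not part of IsoMut: the helper name `check_prerequisites` is hypothetical, and it only checks that a `samtools` executable is on the PATH and that the `Bio` package (Biopython) can be found. It does not check versions.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether samtools is on the PATH and Biopython is importable.

    Hypothetical helper for illustration; it is not shipped with IsoMut.
    """
    return {
        # shutil.which returns None when no executable with that name is found
        "samtools": shutil.which("samtools") is not None,
        # find_spec returns None when the top-level package is not installed
        "biopython": importlib.util.find_spec("Bio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(name, "found" if found else "MISSING")
```

If either entry is reported as missing, install it using the links above before continuing.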
    
---


### First, compile the C application
- To use it later, add its location to the system path, or copy the compiled application to wherever you want to use it

In [1]:
%%bash 
gcc -c -O3 isomut_lib.c fisher.c  -W -Wall
gcc -O3 -o isomut isomut.c isomut_lib.o  fisher.o -lm -W -Wall

### Run IsoMut in parallel from the notebook:

- Note: The test BAM files cover only a single region, which intersects only the first 4 parallelization blocks, so only those blocks do any work. With full BAM files this is not the case.
- Note: There are more blocks than the requested minimum (n_min_block), because a block is not allowed to span more than one chromosome.
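
To see why the block count ends up above the requested minimum, here is a minimal sketch of one way to split a genome into blocks that never cross a chromosome boundary. It is an illustration under stated assumptions, not IsoMut's actual implementation: the function name `define_blocks` and the toy chromosome lengths are hypothetical, and the real wrapper reads chromosome lengths from the reference FASTA with Biopython.

```python
def define_blocks(chrom_lengths, n_min_block):
    """Split chromosomes into (chrom, start, end) blocks.

    Hypothetical sketch: the block size is chosen so that the whole genome
    would give about n_min_block blocks, but every chromosome is cut
    separately, so each chromosome usually ends with one shorter block.
    """
    total_length = sum(chrom_lengths.values())
    block_size = max(1, total_length // n_min_block)

    blocks = []
    for chrom, length in chrom_lengths.items():
        start = 0
        while start < length:
            # never extend a block past the end of its chromosome
            end = min(start + block_size, length)
            blocks.append((chrom, start, end))
            start = end
    return blocks


# toy chromosome lengths, made up for the example
toy_genome = {"1": 1000, "2": 550, "3": 130}
blocks = define_blocks(toy_genome, n_min_block=4)
print(len(blocks), "blocks for a requested minimum of 4")
for block in blocks:
    print(block)
```

Because each chromosome contributes its own leftover block, the total is at least the requested minimum and typically somewhat higher, which matches the behaviour described in the note.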

In [2]:
# %load example_script_isomut_w_pp.py
#!/usr/bin/env python
#################################################
# importing the wrapper
#################################################
#add the path to isomut_wrappers.py if it is not in the current directory
import sys,os
sys.path.append(os.getcwd())
#load the wrapper function
from isomut_wrappers import run_isomut_with_pp

#################################################
# defining administrative parameters
#################################################
#using a parameter dictionary, because there are a lot of parameters
params=dict()
#minimum number of blocks to run
# usually there will be 10-20 more blocks
params['n_min_block']=200
#number of concurrent processes to run
params['n_conc_blocks']=4
#genome
params['ref_fasta']="/home/ribli/input/index/gallus/Gallus_gallus.Galgal4.74.dna.toplevel.fa"
#input and output directories
params['input_dir']='/nagyvinyok/adat86/sotejedlik/ribli/dt40/test_bams/'
params['output_dir']='output/'
#the bam files used
params['bam_filenames']=['DS014.bam', 'DS051.bam', 'DS052.bam', 'DS053.bam', 'DS054.bam', 'DS055.bam',
         'DS056.bam', 'DS057.bam', 'DS058.bam', 'DS101.bam', 'DS102.bam', 'DS103.bam']

#################################################
# defining mutation calling parameters
#    default values here ...
#################################################
params['min_sample_freq']=0.31
params['min_other_ref_freq']=0.93
params['cov_limit']=7
params['base_quality_limit']=30
params['min_gap_dist_snv']=0
params['min_gap_dist_indel']=20

#################################################
# and finally run it
#################################################
run_isomut_with_pp(params)

Defining parallell blocks ...
Done

blocks to run: 215
running: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 
Done
Defining parallell blocks ...
Done

blocks to run: 215
running: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 

In [3]:
%%bash

head output/all_indels.isomut
echo
head output/all_SNVs.isomut

#sample	chr	pos	type	score	ref	mut	depth	mut_freq
9	1	1109427	INS	3.78	-	A	12	0.50
9	1	2170843	INS	4.63	-	t	23	0.48
7	1	4682973	INS	3.87	-	a	43	0.42
9	1	6415587	INS	3.64	-	A	35	0.46
7	1	8176804	INS	2.46	-	c	23	0.48
10	1	8807101	DEL	3.91	c	-	22	0.50
1	1	9048557	DEL	1.85	CAAGCAAATCTC	-	12	0.42
9	1	9455648	INS	3.06	-	a	21	0.43
7	1	9904333	DEL	3.90	T	-	42	0.45

#sample	chr	pos	type	score	ref	mut	depth	mut_freq
7	1	54563	SNV	5.12	G	T	42	0.43
7	1	54573	SNV	2.85	A	T	38	0.45
10	1	161769	SNV	5.10	A	T	42	0.55
9	1	284156	SNV	10.21	G	T	34	0.62
3	1	315124	SNV	5.81	A	G	28	0.39
1	1	670109	SNV	4.04	C	A	21	0.48
10	1	738825	SNV	6.92	A	T	35	0.57
6	1	1183758	SNV	5.29	G	A	21	0.52
9	1	1240316	SNV	3.02	G	T	35	0.31
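
The result files shown above can be loaded for further filtering with the standard library alone. The sketch below assumes the files are tab-separated with the header line exactly as displayed; the function name `read_isomut` and the score cutoff of 5.0 are hypothetical choices for illustration, not values recommended by IsoMut. The example rows are copied from the SNV output above so the block runs without the real files.

```python
import csv
import io


def read_isomut(handle):
    """Parse an IsoMut result table into a list of dicts with typed fields.

    Assumes a tab-separated table whose header starts with '#sample',
    as in the output shown above.
    """
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "sample": row["#sample"],
            "chr": row["chr"],
            "pos": int(row["pos"]),
            "type": row["type"],
            "score": float(row["score"]),
            "ref": row["ref"],
            "mut": row["mut"],
            "depth": int(row["depth"]),
            "mut_freq": float(row["mut_freq"]),
        })
    return records


# three rows copied from the SNV output above
example = (
    "#sample\tchr\tpos\ttype\tscore\tref\tmut\tdepth\tmut_freq\n"
    "7\t1\t54563\tSNV\t5.12\tG\tT\t42\t0.43\n"
    "7\t1\t54573\tSNV\t2.85\tA\tT\t38\t0.45\n"
    "9\t1\t284156\tSNV\t10.21\tG\tT\t34\t0.62\n"
)
records = read_isomut(io.StringIO(example))

# keep only calls above an arbitrary, illustrative score cutoff
high_score = [r for r in records if r["score"] >= 5.0]
for r in high_score:
    print(r["sample"], r["chr"], r["pos"], r["ref"], r["mut"], r["score"])
```

For the real files, replace the `io.StringIO(example)` handle with `open('output/all_SNVs.isomut', newline='')`.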


### Without the notebook, just run the sample script
- First modify the path names and BAM filenames in the script to match your setup

In [5]:
%%bash
./example_script_isomut_w_pp.py 2> samtools.log

Defining parallell blocks ...
Done

blocks to run: 215
running: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 
Done
Defining parallell blocks ...
Done

blocks to run: 215
running: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 