### Parse Info
This function parses column 8 (INFO) of the Sniffles output to extract the type, end, length, and Chr2 of the structural variant.

Sample INFO data (column 8, i.e. index 7 of the tab-split line):
```
PRECISE;SVMETHOD=Snifflesv1.0.12;CHR2=1;END=3097326;ZMW=17;STD_quant_start=0.000000;STD_quant_stop=1.897367;Kurtosis_quant_start=2.000000;Kurtosis_quant_stop=1.946987;SVTYPE=INS;SUPTYPE=AL;SVLEN=3644;STRANDS=+-;STRANDS2=9,8,9,8;RE=17;REF_strand=14,14;Strandbias_pval=1;AF=0.377778
```

In [2]:
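def parseInfo(line_array, infoIndex):
    """Extract SVTYPE, END, CHR2 and SVLEN from the INFO column of one record."""
    
    # Defaults returned when a key (or the whole INFO column) is missing
    dict_values={'SVTYPE':'NN', 'END':-1, 'CHR2':'', 'SVLEN':0}
    
    if(len(line_array)<=infoIndex):
        return dict_values
    
    infoString=line_array[infoIndex]
    info_array = infoString.split(";")
    
    # Too few fields to be a usable INFO string
    if(len(info_array)<=2):
        return dict_values
    
    for k in dict_values:
        prefix=k+"="
        for item in info_array:
            # Match the key at the start of the field, so 'END=' cannot match inside a longer key
            if(item.startswith(prefix)):
                dict_values[k]=item[len(prefix):]
                break
    
    return dict_values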
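
As an alternative to searching the field list once per key, the INFO string can be split into a dictionary in a single pass. The sketch below is only an illustration: `parse_info_to_dict` is a hypothetical helper that is not part of this notebook, and the sample string is a shortened version of the sample data above.

```python
def parse_info_to_dict(info_string):
    """Turn a VCF INFO string into a dict (hypothetical helper, not part of the notebook)."""
    parsed = {}
    for field in info_string.split(";"):
        key, sep, value = field.partition("=")
        # Flags such as PRECISE carry no '=' and are stored as True
        parsed[key] = value if sep else True
    return parsed


# Shortened version of the sample INFO string shown above
sample = "PRECISE;SVMETHOD=Snifflesv1.0.12;CHR2=1;END=3097326;SVTYPE=INS;SVLEN=3644"
info = parse_info_to_dict(sample)
print(info["SVTYPE"], info["END"], info["SVLEN"], info["CHR2"])
```

All values stay strings here, as they do in `parseInfo`, so numeric fields still need an explicit `int(...)` before arithmetic.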

In [3]:
%%html 
<style>table {display: inline-block}</style>

### Generate BED File

The output file has the following format:

Column | Data         | Comment                          
:---   | ---          | ---                              
Col1   | chr num      | 'chr' prefixed                   
Col2   | start        | from Sniffles output column 2 (POS)
Col3   | end          | from INFO data (END)             
Col4   | length       | from INFO data (absolute value of SVLEN)
Col5   | type         | from INFO data (SVTYPE)          
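
To make the column mapping concrete, here is a minimal sketch that builds one output row from a chromosome, a start position, and a parsed INFO dictionary. `to_bed_row` is a hypothetical helper used only for illustration, and the negative sign on the example `SVLEN` is an assumption chosen to show why the absolute value is taken.

```python
def to_bed_row(chrom, pos, info):
    """Build one tab-separated output row (hypothetical helper for illustration)."""
    fields = [
        "chr" + chrom,                 # Col1: chromosome, 'chr' prefixed
        str(pos),                      # Col2: start, taken from column 2
        str(info["END"]),              # Col3: end, from INFO
        str(abs(int(info["SVLEN"]))),  # Col4: length, sign dropped
        info["SVTYPE"],                # Col5: type, from INFO
    ]
    return "\t".join(fields)


# Example values; the negative SVLEN is an assumed input
example_info = {"SVTYPE": "DEL", "END": "3101940", "CHR2": "1", "SVLEN": "-357"}
row = to_bed_row("1", "3101583", example_info)
print(row)
```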

In [4]:
sv_file_path='../../data/r64089e.sorted.vcf'
bed_file_path=sv_file_path + ".bed"

In [5]:
limit=-1             # stop after this many records have been read (-1 = no limit)
index=0              # number of records read so far
exclude_bnd=True     # skip breakend (BND) records
exclude_noend=True   # skip records that have no END value

# chr1 .. chr19
chr_names=["chr"+str(i) for i in range(1,20)]

# Open both files in one 'with' statement so the input file is closed as well
with open(sv_file_path) as sv_file, open(bed_file_path, 'w') as file:
    for line in sv_file:
        line_array=line.strip().split("\t")

        # Skip lines that do not have an INFO column (e.g. '##' header lines)
        if(len(line_array) <8):
            continue
        index+=1

        chr1="chr"+line_array[0]
        start=line_array[1]
        info=parseInfo(line_array, 7)

        if((exclude_bnd and info['SVTYPE']=='BND') or (chr1 not in chr_names) or (exclude_noend and info['END']==-1)):
            continue

        out_line=chr1+"\t"+str(start)+"\t"+str(info['END'])+"\t"+str(abs(int(info['SVLEN'])))+"\t"+info['SVTYPE']+"\n"
        file.write(out_line)

        if(limit!=-1 and index>=limit):
            break

In [6]:
with open(bed_file_path, 'r') as file:
    for i in range(10):
        line = file.readline()
        print(line)

chr1	3090436	3090438	2800	INS

chr1	3097326	3097326	3644	INS

chr1	3099521	3099521	32	INS

chr1	3101583	3101940	357	DEL

chr1	3108755	3108821	66	DEL

chr1	3120398	3120398	115	INS

chr1	3121617	3121617	104	INS

chr1	3135255	3135255	34	INS

chr1	3147022	3147484	462	DEL

chr1	3148891	3148891	38	INS

