### Optional: Another real-life example

#### Let us assume that each paired-end read (the whole fragment spanned by the read pair) from a sequencing experiment is one line of a BED file. We want to convert the reads into a genome browser track.

A BED file has the following format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1):<br>

`chromosome    (start-1)    end    name    score    strand`

Let us break it down into columns:
 - First Column: Chromosome
 - Second Column: Start position - 1 (BED start coordinates are 0-based)
 - Third Column: End position (1-based, inclusive)
 - Fourth Column: Name
 - Fifth Column: Score
 - Sixth Column: Strand
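To make the off-by-one explicit: a BED record with start `s` and end `e` covers the 1-based positions `s+1` through `e`. The sketch below parses one line under that convention. The function name `parse_bed_line` and the sample record are hypothetical, chosen only for illustration.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) with 1-based, inclusive coordinates.

    Assumes a whitespace-separated BED line with at least 3 columns.
    """
    cols = line.split()
    chrom = cols[0].replace('chr', '')   # '2L' instead of 'chr2L'
    start = int(cols[1]) + 1             # 0-based start -> 1-based
    end = int(cols[2])                   # BED end is already the last 1-based base
    return chrom, start, end

chrom, start, end = parse_bed_line("chr2L\t100\t250\tread1\t0\t+")
print(chrom, start, end, end - start + 1)   # 2L 101 250 150
```

The last number is the fragment length, which is simply `e - s` in the original BED coordinates.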
 
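In this example, we will ignore strand. To build a browser track, we count, at every genomic position, how many reads cover it. In the diagram below, the top row numbers positions 1-9, each following row is one read (dashes mark the bases it covers), and the bottom row is the resulting count at each position: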

`123456789`<br>
`----_____`<br>
`___----__`<br>
`_____----`<br>
`______---`<br>
`______---`<br>
`111212433`<br>
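The diagram can be reproduced in a few lines: each read is a `(start, end)` pair in 1-based, inclusive coordinates, and the coverage at a position is the number of reads whose interval contains it. The read coordinates below are read off the diagram; `collections.Counter` is used here because it returns 0 for positions it has not seen.

```python
from collections import Counter

# Reads from the diagram, as 1-based inclusive (start, end) pairs
reads = [(1, 4), (4, 7), (6, 9), (7, 9), (7, 9)]

coverage = Counter()
for start, end in reads:
    for pos in range(start, end + 1):   # + 1 so the last base is counted
        coverage[pos] += 1

print(''.join(str(coverage[pos]) for pos in range(1, 10)))   # 111212433
```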

In [None]:
read = {}  # coverage dictionary for our analysis
# Dictionary structure:
#   first level:  chromosome
#   second level: genomic coordinate (1-based) -> number of reads covering it

with open('80mM_Ctl_sub.bed', 'r') as bed_file:  # open the BED file (closed automatically)
    for line in bed_file:  # iterate through the BED file
        cols = line.split()  # split() also discards the pesky trailing newline
        if not cols or cols[0].startswith(('#', 'track', 'browser')):
            continue  # skip blank lines and BED header lines

        # read in the column values
        chrom = cols[0].replace('chr', '')  # remove the "chr" prefix from the chromosome field
        st = int(cols[1]) + 1  # BED start is 0-based; convert to 1-based
        en = int(cols[2])      # BED end is exclusive, i.e. the last 1-based base

        # setdefault creates the chromosome key if it doesn't exist yet
        read.setdefault(chrom, {})

        # add one to every base the read covers (en + 1 so the last base is included)
        for i in range(st, en + 1):
            read[chrom].setdefault(i, 0)
            read[chrom][i] += 1

# write the coverage to a file once, after all reads are counted
# columns: chromosome, 0-based start, end, count (the bedGraph layout)
with open("read_density.txt", "w") as fh:
    for chrom in sorted(read):
        for pos in sorted(read[chrom]):
            fh.write(f"{chrom}\t{pos - 1}\t{pos}\t{read[chrom][pos]}\n")
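The file written above has one line per base, in the column layout of the bedGraph format (chromosome, 0-based start, end, value). An optional refinement, not part of the original workflow, is to merge consecutive bases that share the same count into a single interval, which can shrink the file considerably. A minimal sketch, assuming the nested `{chrom: {position: count}}` dictionary built above; the function name `coverage_to_bedgraph` is hypothetical.

```python
def coverage_to_bedgraph(cov):
    """Yield bedGraph lines, merging runs of adjacent bases with equal counts.

    `cov` is {chrom: {pos: count}} with 1-based positions.
    """
    for chrom in sorted(cov):
        run_start = prev_pos = prev_val = None
        for pos in sorted(cov[chrom]):
            val = cov[chrom][pos]
            if run_start is not None and pos == prev_pos + 1 and val == prev_val:
                prev_pos = pos          # extend the current run
                continue
            if run_start is not None:   # close the previous run
                yield f"{chrom}\t{run_start - 1}\t{prev_pos}\t{prev_val}\n"
            run_start, prev_pos, prev_val = pos, pos, val
        if run_start is not None:       # close the last run on this chromosome
            yield f"{chrom}\t{run_start - 1}\t{prev_pos}\t{prev_val}\n"

# Coverage from the small diagram above
example = {'2L': {1: 1, 2: 1, 3: 1, 4: 2, 5: 1, 6: 2, 7: 4, 8: 3, 9: 3}}
print(''.join(coverage_to_bedgraph(example)), end='')
```

To write a file, pass the generator to `fh.writelines(...)` inside a `with open(...)` block. If you load the track into the UCSC browser, it will likely expect `chr`-prefixed names such as `chr2L`, so the prefix may need to be added back.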

How does the track look? Let us plot it (this example covers only a small region).

In [None]:
import matplotlib.pyplot as plt
import numpy as np

positions = sorted(read['2L'])  # covered positions on chromosome 2L, in order

# express positions relative to 21163511 to keep the x-axis labels short
x = np.asarray(positions) - 21163511
y = np.asarray([read['2L'][pos] for pos in positions])  # read count at each position

fig, ax = plt.subplots()
ax.plot(x, y)

ax.set(xlabel='Position (relative to 21163511)', ylabel='Read Density')

plt.show()
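One caveat with this plot: positions that no read covers never get a key in the dictionary, so `ax.plot` draws a straight line across any gap instead of dropping to zero. A minimal sketch of filling those gaps with zeros, assuming the `{position: count}` dictionary for one chromosome; `dense_track` is a hypothetical helper name.

```python
import numpy as np

def dense_track(chrom_cov, start, end):
    """Return (x, y) arrays for every position from start to end (inclusive),
    with 0 wherever the coverage dictionary has no entry."""
    x = np.arange(start, end + 1)
    y = np.array([chrom_cov.get(pos, 0) for pos in range(start, end + 1)])
    return x, y

x, y = dense_track({3: 2, 4: 1, 7: 5}, 1, 8)
print(y.tolist())   # [0, 0, 2, 1, 0, 0, 5, 0]
```

For the real data, something like `dense_track(read['2L'], min(read['2L']), max(read['2L']))` would give arrays that can be passed straight to `ax.plot`.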

This track includes fragments of all sizes. Can we build separate tracks for nucleosome-sized fragments (134-160 bp) and transcription factor-sized fragments (50 bp or shorter)?
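The size selection boils down to one decision per fragment: compute its length from the BED coordinates and assign it to a class. A minimal sketch using the cutoffs quoted above, treating both bounds of the nucleosome range as inclusive and the TF cutoff as at most 50 bp, as the next cell's size check does (`<= 50`). The function name `classify_fragment` is hypothetical.

```python
def classify_fragment(bed_start, bed_end):
    """Classify a fragment from its BED start/end (0-based start, exclusive end).

    Returns 'tf', 'nucleosome', or None for fragments in neither size class.
    """
    length = bed_end - bed_start   # equals en - st + 1 after converting to 1-based
    if length <= 50:
        return 'tf'
    if 134 <= length <= 160:
        return 'nucleosome'
    return None

for s, e in [(100, 140), (100, 250), (100, 190)]:
    print(e - s, classify_fragment(s, e))
# 40 tf
# 150 nucleosome
# 90 None
```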

In [None]:
read_nuc = {}  # coverage dictionary for nucleosome-sized fragments
read_tf  = {}  # coverage dictionary for transcription factor-sized fragments

# Dictionary structure (same as before):
#   first level:  chromosome
#   second level: genomic coordinate (1-based) -> number of fragments covering it

with open('80mM_Ctl_sub.bed', 'r') as bed_file:  # open the BED file
    for line in bed_file:  # iterate through the BED file
        cols = line.split()
        if not cols or cols[0].startswith(('#', 'track', 'browser')):
            continue  # skip blank lines and BED header lines

        # read in the column values
        chrom = cols[0].replace('chr', '')  # remove the "chr" prefix from the chromosome field
        st = int(cols[1]) + 1  # BED start is 0-based; convert to 1-based
        en = int(cols[2])      # BED end is exclusive, i.e. the last 1-based base
        fragment_length = en - st + 1

        # pick the dictionary for this fragment's size class
        if fragment_length <= 50:
            target = read_tf
        elif 134 <= fragment_length <= 160:
            target = read_nuc
        else:
            continue  # neither size class: skip this fragment

        # add one to every base the fragment covers (en + 1 so the last base is included)
        target.setdefault(chrom, {})
        for i in range(st, en + 1):
            target[chrom].setdefault(i, 0)
            target[chrom][i] += 1

In [None]:
def track_arrays(cov):
    """Return sorted positions and their counts as NumPy arrays."""
    positions = sorted(cov)
    return np.asarray(positions), np.asarray([cov[pos] for pos in positions])

x1, y1 = track_arrays(read_nuc['2L'])  # nucleosome-sized fragments
x2, y2 = track_arrays(read_tf['2L'])   # TF-sized fragments

fig, ax = plt.subplots()
ax.plot(x1, y1, color='black', label="Nucleosome")
ax.plot(x2, y2, color='red', label="TF")

ax.set(xlabel='Position', ylabel='Read Density')
ax.legend()

plt.show()