<span style="float:left;">Licence CC BY-NC-ND</span><span style="float:right;">François Rechenmann &amp; Thierry Parmentelat&nbsp;<img src="media/inria-25.png" style="display:inline"></span><br/>

# Searching for coding regions

We now turn to a second, more complete version of the algorithm that searches for coding regions: it accounts for all 3 phases as well as for the reverse complement.

We therefore need to search for coding regions 6 times:
 * on the three phases of the original DNA
 * and on the three phases of its reverse complement.
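
To make these six searches concrete, here is a minimal sketch that lists the codons read on each of the six (strand, phase) combinations of a toy sequence. The helper `toy_reverse_complement` is a stand-in written for this illustration only, not the course's own function, and `codons` and `six_reading_frames` are hypothetical names.

```python
# illustration only: a stand-in for a reverse complement function
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

def toy_reverse_complement(dna):
    # complement each base while reading the strand backwards
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def codons(dna, phase):
    # cut dna into complete triplets, starting at offset phase
    return [dna[i:i+3] for i in range(phase, len(dna) - 2, 3)]

def six_reading_frames(dna):
    # hypothetical helper: one list of codons per (strand, phase)
    frames = {}
    for strand, subject in ("direct", dna), ("reverse", toy_reverse_complement(dna)):
        for phase in 0, 1, 2:
            frames[strand, phase] = codons(subject, phase)
    return frames

for (strand, phase), triplets in sorted(six_reading_frames("ATGCCGTAA").items()):
    print("{} phase {}: {}".format(strand, phase, " ".join(triplets)))
```

Each phase simply shifts the starting point by one base, and the same three shifts are applied again to the reverse complement.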

In [None]:
# this is so that we can use print() in python2 like in python3
from __future__ import print_function
# with this, division will behave in python2 like in python3
from __future__ import division

### Assets

We actually already have all the pieces that we need to do this painlessly.

In [None]:
# searching on one phase, as seen in sequence 2
from w3_s02_c1_coding_regions_v1 import coding_regions_one_phase

In [None]:
# computing the reverse complement, that we have just seen 
# in this same sequence
from w3_s04_c1_reverse_complement import reverse_complement

### Computation on all 6 phases

We can now compute the genes on all 6 phases rather simply. The only notable difference with `coding_regions_one_phase` lies in the output format. We can no longer return a list of genes each defined as a `[start, stop]` pair of indices, since this would not tell us whether the gene was found on the input sequence or on its reverse complement. So this time we choose to return the genes in full, that is, as the corresponding pieces of DNA, which leads to the following code:

In [None]:
# the genes found on all 6 phases
def all_coding_regions(dna):
    reverse = reverse_complement(dna)
    # future result
    genes = [] 
    for subject in dna, reverse:
        for phase in 0, 1, 2:
            for start, end in coding_regions_one_phase(subject, phase):
                # let us extract the actual gene contents
                # and add it to the results
                genes.append(subject[start:end])
    return genes
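
Returning the gene contents means we lose track of where each gene sits. If that matters, one possible variant (not the one used in the rest of this notebook) is to keep the strand and the phase next to the indices, and to translate indices found on the reverse complement back to the original strand. The sketch below assumes that the one-phase search returns half-open `(start, end)` index pairs, as the slice `subject[start:end]` suggests; the names `all_coding_regions_located` and `find_one_phase` are made up for this illustration, and both helper functions are passed in as arguments so that the snippet does not depend on the course modules.

```python
def all_coding_regions_located(dna, find_one_phase, reverse_complement):
    # assumption: find_one_phase(subject, phase) returns (start, end) pairs
    # such that subject[start:end] is the gene
    n = len(dna)
    located = []
    for strand, subject in ("+", dna), ("-", reverse_complement(dna)):
        for phase in 0, 1, 2:
            for start, end in find_one_phase(subject, phase):
                if strand == "+":
                    first, last = start, end
                else:
                    # index i on the reverse complement faces index n-1-i
                    # on the original strand, so the slice [start:end]
                    # corresponds to the slice [n-end:n-start]
                    first, last = n - end, n - start
                located.append((strand, phase, first, last))
    return located
```

With this convention, for an entry marked `-`, the gene itself is the reverse complement of `dna[first:last]`.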

### An example

Let us stick with the same input sample as for `coding_regions_one_phase` (reminder: [Bacillus subtilis](http://www.ebi.ac.uk/ena/data/view/CP010053), key `CP010053`). Here is what we get now:

In [None]:
from samples import subtilis
print("subtilis has {} bases".format(len(subtilis)))

In [None]:
# reminder, genes found on phase 0 only
genes_phase_0 = coding_regions_one_phase(subtilis, 0)
print("{} genes were found on phase 0".format(len(genes_phase_0)))

In [None]:
# on all 6 phases now
genes = all_coding_regions(subtilis)
print("On all 6 phases we found {} genes".format(len(genes)))
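
Since `all_coding_regions` returns the genes themselves, their lengths can be inspected directly. As a small sketch, the hypothetical helper `length_summary` below reports the number of genes together with the shortest and longest lengths; it is run here on a made-up list, not on the actual subtilis results.

```python
def length_summary(genes):
    # genes is a list of strings, like the one built by all_coding_regions
    lengths = [len(gene) for gene in genes]
    if not lengths:
        return {"count": 0, "shortest": None, "longest": None}
    return {"count": len(lengths),
            "shortest": min(lengths),
            "longest": max(lengths)}

# made-up data, for illustration only
summary = length_summary(["ATGAAATAA", "ATGTAA"])
print("{count} genes, from {shortest} to {longest} bases".format(**summary))
```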