# How to write a script on your own

## Start with the problem: Analysis

The problem we want to solve is:
I want a program that translates all my protein files to nucleotide files.
Although this is a simple problem, there is not enough information for us to get started with the solution. A little more analysis is required. For example, how do we specify which files are to be processed? How does the translation work? Can all the information be translated, or should we skip some of it? How are the translated files stored?
After analysing the problem properly, we design our program. 

## Design and test 

We make a list of how our program should work. We break the program down into an ordered series of functional steps, and for each step we decide what information it needs and what result it should produce.

For instance, in our case we need the following steps:

1. Get a list of files that need to be processed
2. Process each file
<ul>
<li>a. Get the amino acid sequence to be translated</li>
<li>b. Translate the amino acid sequence to a DNA sequence</li>
<li>c. Store the translated DNA sequence in an output file</li></ul>

Since steps a, b and c are executed multiple times, it is best to put them in functions. A good way to start is to write simple function bodies and test the logic with print.


In [1]:
#!/usr/bin/env python3

""" translates all protein files to nucleotide files"""

__author__ = 'fennaf'

#imports
import sys

#functions
def fetch_sequence():
    print("fetch sequence")

def translate_aa2dna():
    print("translate aa to dna")

def store_translation():
    print("store translation")

def main(args):
    fetch_sequence()
    translate_aa2dna()
    store_translation()
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)


fetch sequence
translate aa to dna
store translation


So let us now think about what information is needed to process the step and what should be the result of the step. 
The fetch_sequence function needs a file to read and should return an amino acid sequence. The translate_aa2dna function needs an amino acid sequence to translate and a translation process; its output is a DNA sequence. The store_translation function needs the name of a file to write to and the translated sequence. If we design the needed input and output into our script, it looks as follows:



In [2]:
#!/usr/bin/env python3

""" translates all protein files to nucleotide files"""

__author__ = 'fennaf'

#imports
import sys

#functions
def fetch_sequence(file):
    sequence = ""
    print("fetch sequence")
    return sequence

def translate_aa2dna(aa_sequence):
    dna_sequence = ""
    print("translate aa to dna")
    return dna_sequence


def store_translation(output_file, sequence):
    print("store translation")
    # the with statement closes the file, even if writing fails
    with open(output_file, "w") as o:
        o.write(sequence)


def main(args):
    input_file = "CFTR_protein.fasta"
    aa_sequence = fetch_sequence(input_file)
    dna_sequence = translate_aa2dna(aa_sequence)
    store_translation("test.txt", dna_sequence)
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)

    

fetch sequence
translate aa to dna
store translation


It is now time to move one step further and develop the logic per function. Best practice is to develop and test small pieces. First let us develop the fetch_sequence function. We need to open the file and read it, skipping the header line that starts with '>' because it has no sequence information. Only if the sequence starts with the letter M do we know that it contains protein information; the function below does not check this yet.

In [3]:
def fetch_sequence(file):
    sequence = ""
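    # the with statement closes the file when the block ends
    with open(file) as f:
        for line in f:
            # header lines start with '>' and hold no sequence
            if not line.startswith('>'):
                sequence += line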
    print("fetch sequence", sequence)
    return sequence

def main(args):
    input_file = "CFTR_protein.fasta"
    aa_sequence = fetch_sequence(input_file)
    # dna_sequence = translate_aa2dna(aa_sequence)
    # store_translation("test.txt", dna_sequence)
    
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)

fetch sequence MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLI
NALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLHP
AIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQVAL
LMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCWEEA
MEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVLRMAV
TRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQNNNNRK
TSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHSGRISF
CSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLAR
AVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSSYFYGTF
SELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTGEFGEKRKNS
ILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPTLQARRRQSVL
NLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDLKECFFDDMESI
PAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHSRNNSYAVIITST
SSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTLKAGGI

The print statement helps us debug the function. It seems to work, so we remove the print call.

In [4]:
def fetch_sequence(file):
    sequence = ""
    # the with statement closes the file when the block ends
    with open(file) as f:
        for line in f:
            if not line.startswith('>'):
                sequence += line
    return sequence

def main(args):
    input_file = "CFTR_protein.fasta"
    aa_sequence = fetch_sequence(input_file)
    # dna_sequence = translate_aa2dna(aa_sequence)
    # store_translation("test.txt", dna_sequence)
    
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)
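
The fetch_sequence function above keeps the newline at the end of every line in the returned string, and it does not check whether the sequence starts with M. As a minimal sketch of how both could be handled, the variant below strips the line endings and adds a small check. The names fetch_sequence_clean and looks_like_protein are hypothetical and only used for illustration; they are not part of the script developed in this tutorial.

```python
def fetch_sequence_clean(file):
    """Return the sequence of a FASTA file as one string without newlines."""
    parts = []
    # the with statement closes the file when the block ends
    with open(file) as f:
        for line in f:
            # header lines start with '>' and hold no sequence information
            if not line.startswith('>'):
                parts.append(line.strip())
    return "".join(parts)


def looks_like_protein(sequence):
    """Assumption from the text: a protein sequence starts with the letter M."""
    return sequence.startswith("M")
```

Whether the newlines matter depends on the next step: a translation that skips unknown characters ignores them anyway, but a clean string is easier to inspect and to test.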

Now we move on to the translate function. To translate, we go character by character (one amino acid is one character) through the sequence and translate each character to a DNA codon. The codon is appended to dna_sequence. I use a simple reverse_codon table first to test the principle.

In [5]:

def fetch_sequence(file):
    sequence = ""
    # the with statement closes the file when the block ends
    with open(file) as f:
        for line in f:
            if not line.startswith('>'):
                sequence += line
    return sequence

def translate_aa2dna(aa_sequence):
    dna_sequence = ""
    # this needs to be extended
    reverse_codon = {
        "M" : "ATG",
        "F" : "TTT",
        "D" : "GAT"
        }
    for char in aa_sequence:
        if char in reverse_codon:
            dna_sequence += reverse_codon[char]
    print(dna_sequence)
    return dna_sequence

def main(args):
    input_file = "CFTR_protein.fasta"
    aa_sequence = fetch_sequence(input_file)
    dna_sequence = translate_aa2dna(aa_sequence)
    # store_translation("test.txt", dna_sequence)
    
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)

ATGTTTTTTGATGATGATGATTTTTTTTTTATGTTTTTTGATGATTTTTTTATGATGATGTTTGATTTTGATTTTATGTTTTTTTTTATGATGATGGATATGATGATGTTTTTTTTTTTTTTTTTTTTTTTTTTTATGTTTGATGATTTTATGTTTTTTTTTGATGATTTTTTTTTTGATTTTATGATGTTTTTTATGTTTGATGATTTTGATGATGATGATTTTGATTTTATGATGGATTTTTTTGATTTTATGGATTTTGATTTTTTTGATTTTTTTTTTATGGATGATGATATGGATGATTTTTTTGATGATATGTTTTTTGATTTTGATATGTTTTTTATGATGTTTGATGATGATTTTGATTTTTTTTTTATGTTTTTTTTTTTTTTTTTTTTTATGATGTTTTTTTTTTTTATGATGGATGATATGTTTTTTGATATGATGGATGATATGGATTTTTTTGATGATTTTTTTTTTTTTGATGATGATTTTGATTTTGATATGGATGATTTTGATATGTTTGATTTTGATTTTGAT
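
The three-entry reverse_codon table above translates only M, F and D, so all other amino acids are silently skipped. A possible extension is sketched below. Most amino acids are encoded by several codons, so translating a protein back to DNA is ambiguous; this sketch simply picks one codon per amino acid from the standard genetic code, and that choice is an arbitrary assumption. The names REVERSE_CODON and translate_aa2dna_full are hypothetical.

```python
# One arbitrarily chosen codon per amino acid (standard genetic code).
# '*' stands for a stop codon.
REVERSE_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",
}


def translate_aa2dna_full(aa_sequence):
    """Translate an amino acid sequence to one possible DNA sequence."""
    # characters that are not in the table (for example newlines) are skipped
    return "".join(REVERSE_CODON[aa] for aa in aa_sequence if aa in REVERSE_CODON)
```

With this table, every amino acid contributes exactly three nucleotides, so the DNA sequence should be three times as long as the protein sequence once the newlines are not counted.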


Now let us test the final part. To make sure that each output file is stored under a different name, we build the output file name from the input file name and keep it in the variable output_file.

In [6]:

def fetch_sequence(file):
    sequence = ""
    # the with statement closes the file when the block ends
    with open(file) as f:
        for line in f:
            if not line.startswith('>'):
                sequence += line
    return sequence

def translate_aa2dna(aa_sequence):
    dna_sequence = ""
    # this needs to be extended
    reverse_codon = {
        "M" : "ATG",
        "F" : "TTT",
        "D" : "GAT"
        }
    for char in aa_sequence:
        if char in reverse_codon:
            dna_sequence += reverse_codon[char]
    return dna_sequence


def store_translation(output_file, sequence):
    print("store translation")
    # the with statement closes the file, even if writing fails
    with open(output_file, "w") as o:
        o.write(sequence)

def main(args):
    input_file = "CFTR_protein.fasta"
    aa_sequence = fetch_sequence(input_file)
    dna_sequence = translate_aa2dna(aa_sequence)
    # drop the 6-character ".fasta" extension and prefix the name with "output"
    output_file = "output" + input_file[:-6] 
    store_translation(output_file, dna_sequence)
    
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)

store translation


The final thing we need to do is incorporate the processing of multiple files, so we add a for loop over the file names given on the command line:

In [8]:


def main(args):
    # args[0] is the script name; the remaining arguments are the input files
    files = args[1:]
    for input_file in files:
        aa_sequence = fetch_sequence(input_file)
        dna_sequence = translate_aa2dna(aa_sequence)
        output_file = "output" + input_file[:-6] 
        store_translation(output_file, dna_sequence)
    
if __name__ == '__main__':
    exitcode = main(sys.argv)
    sys.exit(exitcode)

store translation
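
Slicing with input_file[:-6] assumes that every file name ends in the six characters .fasta, and putting "output" in front of the name goes wrong when the input file is given with a directory. As a sketch of one alternative, the hypothetical helper make_output_name below uses pathlib to drop whatever extension is present and keeps the output next to the input file. The "output_" prefix and the ".txt" extension are assumptions for illustration, not the naming used in the cells above.

```python
from pathlib import Path


def make_output_name(input_file):
    """Build an output path in the same directory as the input file."""
    path = Path(input_file)
    # path.stem is the file name without its last extension
    return path.with_name("output_" + path.stem + ".txt")
```

In main, the line that builds output_file could then call make_output_name(input_file) instead of slicing the string.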


## Summary

After analyzing the problem properly, we design our program. We make a list of how our program should work. If you do the design yourself, you may not come up with the same analysis, since every person has their own way of doing things; that is perfectly okay.


## Software development process

In the practical lectures, we translated several problems into Python programs.
We had to go through various phases in the process of writing the software. 

- What (Analysis)
- How (Design)
- Do It (Implementation)
- Test (Testing and Debugging)
- Use (Operation or Deployment)
- Refine 
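
As an illustration of the Test phase, the print statements used during development can be replaced by a few assert statements that are run again after every change. The sketch below repeats the small three-entry translate_aa2dna function so that it runs on its own; the function test_translate_aa2dna and the chosen inputs are assumptions for illustration.

```python
def translate_aa2dna(aa_sequence):
    """Small version of the translate function with a three-entry table."""
    reverse_codon = {"M": "ATG", "F": "TTT", "D": "GAT"}
    dna_sequence = ""
    for char in aa_sequence:
        if char in reverse_codon:
            dna_sequence += reverse_codon[char]
    return dna_sequence


def test_translate_aa2dna():
    # each known amino acid becomes one codon, in the same order
    assert translate_aa2dna("MFD") == "ATGTTTGAT"
    # an empty sequence gives an empty result
    assert translate_aa2dna("") == ""
    # characters that are not in the table, such as newlines, are skipped
    assert translate_aa2dna("M\nF") == "ATGTTT"


test_translate_aa2dna()
print("all checks passed")
```

If one of the assert statements fails, Python stops with an AssertionError that points to the failing line, which is usually faster than reading printed output by eye.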

Best practice is to start implementing with a simple version. Test and debug it. Use it to ensure that it works as expected. Now, add any features that you want and continue to repeat the Do It-Test-Use cycle as many times as required.

Software is grown, not built. -- Bill de hÓra

Source: https://python.swaroopch.com/problem_solving.html
