# Lesson 10: Organization and code style

Jurre Hageman

This lesson is about code organization and code style.  
Organizing your code pays off:  
well-organized code can be re-used, contains fewer bugs and is easier to maintain.

## Code organization

Python offers different levels of code organization:
- Functions
- Classes (informatics 3)
- Modules (informatics 2)

Because classes and modules are covered in later courses, this lesson focuses on organizing a script into functions. Split your code into functions. As a rule of thumb, keep each function to around 12 lines of code. Re-use your functions as much as possible and avoid duplicated code!
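To see what avoiding duplicated code looks like in practice, here is a small sketch. The function name `base_percentage` and the example sequence are made up for illustration; the point is that the counting logic is written once and then re-used.

```python
# Duplicated version: the same calculation is written out twice.
seq = "AATG"
a_perc = seq.count("A") / len(seq) * 100
t_perc = seq.count("T") / len(seq) * 100


# Refactored version: the shared logic lives in one small function
# (base_percentage is a hypothetical name used only for this example).
def base_percentage(seq, base):
    """Returns the percentage of a given base in a DNA sequence"""
    return seq.count(base) / len(seq) * 100


a_perc = base_percentage(seq, "A")
t_perc = base_percentage(seq, "T")
print(a_perc, t_perc)
```

If the calculation ever needs to change, you now only have to change it in one place.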

Many programming languages use a main function that is executed automatically when a program starts. Python has no such function: the interpreter simply runs the statements of a script from top to bottom.  
Nevertheless, having an explicit starting point for a program is good practice.  
Therefore, Python programmers often define a `main` function as this starting point:

```python
def calc_cg(seq):
    """Calculates the CG percentage of a sequence"""
    c_count = seq.count("C")
    g_count = seq.count("G")
    cg_perc = (c_count + g_count) / len(seq) * 100
    return cg_perc


def main():
    """Main function that calls other functions"""
    seq = "GAGC"
    cg_perc = calc_cg(seq)
    print(cg_perc)
    print("Done")


main()
```

```
75.0
Done
```
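A common extension of this pattern, which you will meet again when modules are covered in informatics 2, is to call `main()` only when the file is run directly. Python sets the special variable `__name__` to `"__main__"` for the file you run, and to the module's name when the file is imported by another file. A sketch of the same script with this guard:

```python
def calc_cg(seq):
    """Calculates the CG percentage of a sequence"""
    c_count = seq.count("C")
    g_count = seq.count("G")
    return (c_count + g_count) / len(seq) * 100


def main():
    """Main function that calls other functions"""
    seq = "GAGC"
    print(calc_cg(seq))
    print("Done")


# Only call main() when this file is run as a script,
# not when it is imported as a module by another file.
if __name__ == "__main__":
    main()
```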


Note that in the previous example the `main` function calls the `calc_cg` function.  
The first line after each function header is a string that describes the function. By convention this description is written as a triple-quoted (multi-line) string, called a docstring, which Python stores as the function's documentation.  
Docstrings will be explained in informatics 2. For now, just remember to add a short description in a triple-quoted string.
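Because the description is a docstring, Python keeps it together with the function. A minimal sketch showing two ways to read it back:

```python
def calc_cg(seq):
    """Calculates the CG percentage of a sequence"""
    c_count = seq.count("C")
    g_count = seq.count("G")
    return (c_count + g_count) / len(seq) * 100


# The docstring is stored on the function object in the __doc__ attribute.
print(calc_cg.__doc__)  # prints: Calculates the CG percentage of a sequence

# help() shows the docstring together with the function signature.
help(calc_cg)
```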

Now imagine that you need to write a (command-line) program with functionality similar to the tool at [this link](https://www.bioinformatics.org/sms/rev_comp.html).
The program accepts a DNA sequence and can:
- reverse DNA
- complement DNA
- reverse complement DNA

How would you organize such a program?  
Start by writing the function headers.  
Give each function a multi-line string with a short description.  
Include a `main` function.  
Use `pass` as a placeholder body: a function must contain at least one statement, and `pass` is a statement that does nothing:  

```python
def reverse(seq):
    """Returns the reverse of a DNA string"""
    pass


def complement(seq):
    """Returns the complement of a DNA string"""
    pass


def reverse_complement(seq):
    """Returns the reverse complement of a DNA string"""
    pass


def main():
    """Main function that starts the script"""
    pass
```
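The skeleton above can already be run. A function whose body is only `pass` does nothing and returns `None`, so `main` can call the stubs before they are finished. A short sketch using two of the stubs:

```python
def reverse(seq):
    """Returns the reverse of a DNA string"""
    pass


def main():
    """Main function that starts the script"""
    pass


# The stubs can already be called: a function that reaches the end of its
# body without a return statement returns None.
print(reverse("GACC"))  # prints: None
main()
```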

The next step is to write the function bodies.  
Test them step by step.  
Make sure that each function works as expected:

```python
def reverse(seq):
    """Returns the reverse of a DNA string"""
    return seq[::-1]


def complement(seq):
    """Returns the complement of a DNA string"""
    bases = {"A": "T", "T": "A", "C": "G", "G": "C"}
    comp = ""
    for nuc in seq:
        comp += bases[nuc]
    return comp


def reverse_complement(seq):
    """Returns the reverse complement of a DNA string"""
    rev = reverse(seq)
    rev_comp = complement(rev)
    return rev_comp


def main():
    """Main function that starts the script"""
    seq = "GACC"
    print("input:", seq)
    print("reverse:", reverse(seq))
    print("complement:", complement(seq))
    print("reverse-complement:", reverse_complement(seq))


main()
```

```
input: GACC
reverse: CCAG
complement: CTGG
reverse-complement: GGTC
```
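One way to make this testing explicit is to check each function against results you worked out by hand, using `assert` statements, which raise an `AssertionError` when a check fails. The function name `test_functions` is hypothetical; the expected values come from the output above, plus the EcoRI site `GAATTC`, which is its own reverse complement:

```python
def reverse(seq):
    """Returns the reverse of a DNA string"""
    return seq[::-1]


def complement(seq):
    """Returns the complement of a DNA string"""
    bases = {"A": "T", "T": "A", "C": "G", "G": "C"}
    comp = ""
    for nuc in seq:
        comp += bases[nuc]
    return comp


def reverse_complement(seq):
    """Returns the reverse complement of a DNA string"""
    return complement(reverse(seq))


def test_functions():
    """Checks each function against results worked out by hand"""
    assert reverse("GACC") == "CCAG"
    assert complement("GACC") == "CTGG"
    assert reverse_complement("GACC") == "GGTC"
    # GAATTC (the EcoRI site) is its own reverse complement
    assert reverse_complement("GAATTC") == "GAATTC"
    # An empty sequence gives an empty result
    assert reverse_complement("") == ""
    print("All tests passed")


test_functions()
```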


Note that the `reverse_complement` function contains no duplicated code. Instead, it calls the `reverse` and `complement` functions to build the reverse complement of the sequence.
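Because `reverse_complement` only calls the other two functions, you can change how one of them works without touching the rest. As a sketch, here `complement` is rewritten with the built-in `str.maketrans` and `str.translate` instead of a loop; `reverse_complement` keeps the same logic and still gives the same result.

```python
# Translation table built once: maps each base to its complementary base.
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")


def reverse(seq):
    """Returns the reverse of a DNA string"""
    return seq[::-1]


def complement(seq):
    """Returns the complement of a DNA string"""
    return seq.translate(COMPLEMENT_TABLE)


def reverse_complement(seq):
    """Returns the reverse complement of a DNA string"""
    # Same logic as before: still built from the two functions above.
    return complement(reverse(seq))


print(reverse_complement("GACC"))  # prints: GGTC
```

One design difference: `translate` leaves characters that are not in the table (such as `N` or lowercase bases) unchanged, whereas the dictionary lookup in the loop version raises a `KeyError` for them.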

You will learn more about code organization in the informatics 2 course.