# Putting theory into practice
Today is the culmination of the last three to four weeks of your programming journey. We will finish the course by putting into practice some of the programming principles you have learnt to date.

These include debugging code, using version control, testing code and programming defensively.


# Debugging

You have been provided with a set of sequence data (on MyAberdeen) and some code with several functions defined, plus a bit of example code to show how to use them. You can copy and paste the function code into your Jupyter notebook, then run the test code using the sequence data you have been provided with.

Your first task in this practical is to debug the code and figure out why it does not work. The code is deliberately designed to give the following series of errors (in order). Therefore, you will need to fix each bug in turn to reveal the next error message.

__First error message:__

![error code 1](error_code1.png "First error message")

__Second error message:__

![error message 2](error_code2.png "Second error message")

__Third error message:__
![Third error to debug](error_code3.png)

__Fourth error message:__
![Error 4](error_code4.png)

Once all four bugs are fixed you will be able to execute the code properly - you will know because the output will be identical to the output below:

![code debugged](working_code_output.png)

__Exercise break__
How would you go about changing the code in the first place to ensure that these bugs did not happen? Think about what you know about molecular biology - this is knowledge that can be used to make your code better using defensive programming.

# Defensive programming

One way to implement defensive programming is to write out pseudocode before even touching a keyboard - this is a very good habit to get into, particularly for complex algorithms.
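
As a minimal sketch of the pseudocode-first habit, the plan can be written as comments and then filled in one step at a time. The function `gc_content` below is a hypothetical example chosen only for illustration; it is not part of the practical code.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (hypothetical example)."""
    # Pseudocode step 1: check that the input is a string
    if not isinstance(seq, str):
        raise TypeError("seq must be a str, not {}".format(type(seq)))

    # Pseudocode step 2: check that the sequence is not empty
    if len(seq) == 0:
        raise ValueError("seq is empty")

    # Pseudocode step 3: check that every character is a DNA base
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(
            "seq contains non-DNA characters: {}".format(",".join(sorted(unexpected)))
        )

    # Pseudocode step 4: count the G and C bases
    gc_count = seq.count("G") + seq.count("C")

    # Pseudocode step 5: divide by the sequence length
    return gc_count / len(seq)


print(gc_content("ATGC"))  # 0.5
```

Writing the steps first makes it obvious where a check is missing before any real code exists.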

The other thing we can do is to put checks inside the code, like the examples in the lecture, to either create informative error messages or to fail nicely. I will use the function below to illustrate this.

![defensive programming function](defensive_programming_pic.png)

```python
def transcribeDNA(seq):
    '''
    Transcribe a DNA sequence into an mRNA sequence
    '''

    # map each DNA base to its complementary RNA base
    trans_dict = {"A": "U", "C": "G", "T": "A", "G": "C"}
    rna_seq = "".join([trans_dict[x] for x in seq])

    # return the complemented sequence in reverse order
    return rna_seq[::-1]


def translateSeq(seq, codons, frame=1):
    '''
    Translate an mRNA sequence into a peptide
    frame is the reading frame to translate: 1, 2 or 3
    '''

    peptide = ""
    if frame == 1:
        fseq = seq
    elif frame == 2:
        fseq = seq[1:-2]
    elif frame == 3:
        fseq = seq[2:-1]
    else:
        raise ValueError("frame must be 1, 2 or 3, not {}".format(frame))

    for idx in range(0, len(fseq), 3):
        codon = fseq[idx: idx+3]
        aa = codons[codon]
        peptide += aa
        # stop translating at the first stop codon
        if aa == "*":
            break

    return peptide


def findLongestFrame(seq, codons):
    '''
    Find the longest reading frame.
    seq: string - an mRNA sequence containing A, U, C, G
    codons: dict - a dictionary that maps 3 nucleotide codons to amino acids

    Return:
    int - the number (1, 2 or 3) of the longest reading frame for the input sequence.
    '''

    if not isinstance(seq, str):
        raise TypeError("{} is an incorrect data format - must be str".format(type(seq)))

    # check that seq only contains A, U, C and G
    unique_nt = set(seq) # get the base characters of seq
    add_nt = unique_nt.difference(set(['A', 'U', 'C', 'G'])) # find any unexpected characters

    if len(add_nt) > 0:
        raise ValueError("seq contains non-nucleotide characters: {}".format(",".join(sorted(add_nt))))

    # generate all reading frames - select the longest
    aa1 = translateSeq(seq, codons)
    aa2 = translateSeq(seq[1:-2], codons)
    aa3 = translateSeq(seq[2:-1], codons)

    if len(aa1) >= len(aa2) and len(aa1) >= len(aa3):
        return 1
    elif len(aa2) >= len(aa1) and len(aa2) >= len(aa3):
        return 2
    else:
        return 3
```

```python
dna = "ATGATCCACGATTCCAGGCTTCCCATTCAAAATTGCCGCCATCCAAAGGCTGACTGGGGACGTGTAAGGAGCGTTCGAGAATATACAAAGTCAGATCGAGACACGCCGGTACATTCTATCTGGAACGGGCCGTCGCCGAGACCTTCGCTGTGCTTCTTTAGTAGTTCCTCAATGCCGAATGGGGCATCGGGCAGGTGTAACAACAAGCTTCGTATGAATGTATTTCTAAGTCTGGACTATTCCGTATATGCTGCTTTATTAGTGCGACTATGGCGAGATCATAGGCAATCGTGTCCCATGGGTCAGGACGCGGTAAGGAGTACAACCAGCATCAACTGA"

codon_dict = {"UUU": "F", "UUC": "F", "UUA": "L", "UUG": "L", "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S",
              "UAU": "Y", "UAC": "Y", "UAA": "*", "UAG": "*", "UGU": "C", "UGC": "C", "UGA": "*", "UGG": "W",
              "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L", "CCU": "P", "CCC": "P", "CCA": "P", "CCG": "P",
              "CAU": "H", "CAC": "H", "CAA": "Q", "CAG": "Q", "CGU": "R", "CGC": "R", "CGA": "R", "CGG": "R",
              "AUU": "I", "AUC": "I", "AUA": "I", "AUG": "M", "ACU": "T", "ACC": "T", "ACA": "T", "ACG": "T",
              "AAU": "N", "AAC": "N", "AAA": "K", "AAG": "K", "AGU": "S", "AGC": "S", "AGA": "R", "AGG": "R",
              "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V", "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
              "GAU": "D", "GAC": "D", "GAA": "E", "GAG": "E", "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G"}

rna = transcribeDNA(dna)
findLongestFrame("AUCG", codons=codon_dict)
```

Running this cell fails with:

```
KeyError: 'G'
```

__Exercise__
The current function only works with RNA sequences of specific lengths - can you figure out why, and then come up with some defensive programming so that sequences of the wrong length do not cause this error message?
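
Separately from the exercise, a bare `KeyError` says very little about what went wrong. The sketch below shows one way to turn a failed dictionary lookup into a more informative message. The helper `lookup_amino_acid` and the toy table `mini_codons` are hypothetical names used only for this illustration; they are not part of the practical code and do not solve the exercise.

```python
def lookup_amino_acid(codon, codons):
    """Look up one codon, raising an informative error if it is missing (hypothetical helper)."""
    try:
        return codons[codon]
    except KeyError:
        # replace the bare KeyError with a message that says what was expected
        raise ValueError(
            "'{}' is not in the codon table ({} codons known)".format(codon, len(codons))
        ) from None


# a tiny, made-up codon table used only for this illustration
mini_codons = {"AUG": "M", "UAA": "*"}

print(lookup_amino_acid("AUG", mini_codons))

try:
    lookup_amino_acid("GGG", mini_codons)
except ValueError as err:
    # fail nicely: report the problem instead of stopping with a traceback
    print("Could not translate:", err)
```

The design choice here is to catch the low-level error close to where it happens and re-raise it with context, so the person running the code can see which input caused the problem.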

# Version control

Go back to Anaconda Navigator and select the JupyterLab app:

![jupyter lab app](jupyterlab.png)

This will open up the JupyterLab environment, which allows you to have multiple tabs open, including code files, notebooks and terminal sessions. The latter is a command-line interface to your computer, and is a natural and powerful way to work on a computer, but can take some practice if you are not familiar with it.

We are going to use `git` from the terminal for version control. The first step is to go to your GitHub page and create a new repository:

![new repo](new_repo.png)

### Setting up a repository

Go ahead and fill in a name and description for this workshop's repository. This will create a repository similar to the one below:

![a repo](a_repo.png)


Go and click on the blue `<> Code` box to get the dropdown menu and copy the URL to your clipboard.


![cloning a repo](clone_repo.png)

The next step is to go back to JupyterLab and open a new `Terminal` session inside it:

![open terminal](jupyterlab_app.png)

I strongly recommend opening this in your Home drive so that you can keep track of this in the future.

### Cloning a repository

Now type the following, and paste the URL that you copied from GitHub:

`git clone <paste URL>` (the URL will be specific to your repository)

![clone repo](git_clone.png)

Hit Enter and this will clone the repository that you made. The repository doesn't contain anything yet, but cloning sets up version control on your computer in a new folder, named after the repository, inside the directory where your Terminal session is open.

### Using git

While learning how to use git I strongly recommend keeping a git cheatsheet close by: https://training.github.com/downloads/github-git-cheat-sheet/. The series of commands can get confusing but the main ones you need to know are:

```
git add <file paths here>
git commit -m 'useful message here'

git pull
git push
```

These commands are executed in this order whenever you want to log your code changes in a repository.

`git add` "stages" your changes and tells git that you are going to log the changes in the specific set of files listed in the `git add` command.

`git commit -m 'message'` commits the changes to _your local repository_ - this means you can add and change things locally without affecting what is happening to your central repository on GitHub.

`git pull` retrieves the latest version of the repository from GitHub. You will mostly be the only one working on these repos, but on many software projects there will be multiple team members making code contributions, so this is a good habit to get into.

`git push` sends your changes to the central repository and logs them.

This process is also very useful if you need to switch machines and retrieve your code from the central GitHub repository - you can just pull in the changes you made elsewhere and keep working seamlessly.

Now try this with your repository and the Python code file - but include the new defensive programming features that you wrote.

Once you've done that you will see that your GitHub page now contains the code file that you just added.

![added to repo](populated_repo.png)

__Exercise break__
Spend a bit of time familiarising yourself with git and GitHub - try adding some other code files. BEWARE - do not add large files (megabytes) as the repository can grow very quickly and block you from logging changes or new files. Therefore, the best practice is to add specific files, and the mantra is to submit little and often.
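
One way to avoid staging large files by accident is to list them before running `git add`. The sketch below is only an illustration: the function name `find_large_files` and the 1 MB default threshold are assumptions, not part of git or of this practical.

```python
from pathlib import Path


def find_large_files(directory, max_bytes=1_000_000):
    """Return (path, size) pairs for files bigger than max_bytes (hypothetical helper)."""
    large = []
    for path in Path(directory).rglob("*"):
        # skip git's own bookkeeping folder
        if ".git" in path.parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                large.append((path, size))
    # biggest files first
    return sorted(large, key=lambda item: item[1], reverse=True)


# list anything over the threshold in the current directory
for path, size in find_large_files("."):
    print("{:.1f} MB  {}".format(size / 1_000_000, path))
```

Anything this prints is a candidate for leaving out of the `git add` command.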

__Final exercise__

Send me your GitHub ID, if you haven't already. I will add you to the class GitHub team, from where you can create your own repository for the final assignment: https://github.com/BT5511-BiocomputationProjects-2024. Your repository name should include your name and the year, i.e. 2024. I will make sure that you only have read/write access to your own code and not anyone else's.