# Introduction to Python Programming for Biology

This notebook introduces basic Python concepts using simple biology-related examples.

## Part 1: Hello World 👋

The traditional first program in any language prints the words "Hello, world!".

**Example:**

In [None]:
print("Hello, world!")

### Exercise
Modify the code above to print your name.

## Part 2: Learning Functions

A function is a reusable block of code that performs a specific task.

**Example: A simple function**

In [None]:

def greet(name):
    print("Hello", name)

# call the function
greet("Alice")
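
Functions can also hand a result back to the caller with `return` instead of printing it. The sketch below is only an illustration: the function name `count_gc` and the variable `n_gc` are made up for this example.

```python
def count_gc(dna):
    # count how many letters in the string are G or C
    return dna.count("G") + dna.count("C")

# store the returned value in a variable, then use it
n_gc = count_gc("ATGCGTACGTTAG")
print(n_gc)
```

Returning a value is useful when you want to reuse the result later, for example in a calculation or as input to another function.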


### Using conditions inside functions

We can change what a function prints depending on the input.

In [None]:

def describe_mood(mood):
    if mood == "happy":
        print("I am feeling happy today!")
    elif mood == "sad":
        print("I am feeling a bit sad today.")
    else:
        print("I am feeling okay.")

describe_mood("happy")
describe_mood("sad")
describe_mood("excited")


### Exercise
Write a function called `get_organism_type` that prints:

* "This is a bacterium" if input is `"bacteria"`

* "This is a virus" if input is `"virus"`

* "Unknown organism" otherwise

## Part 3: Reading and Writing DNA Sequences

DNA sequences can be stored as strings in Python.

**Example: DNA Sequence as a string**

In [None]:

dna = "ATGCGTACGTTAG"
print(dna)


### Reading a DNA sequence from a file

In [None]:

with open("dna.txt", "w") as file:
    file.write("ATGAAATTTGGG")

with open("dna.txt", "r") as file:
    dna = file.read().strip()

print(dna)


### Writing a DNA sequence to a file

In [None]:
new_dna = "ATGAAATTTGGG"


with open("output_dna.txt", "w") as file:
    file.write(new_dna)

**Exercise**: Write the sentence **'I enjoy Python programming'** to a file named `like_python.txt`. Read the sentence back and print it out.

### Better ways of handling biological sequences

The exercise above illustrates a limitation of reading and writing sequences as plain text: the same code works on any text, and nothing about it is specific to biological sequences.

In Python there are *packages* specifically designed for handling and analysing biological data.

In the lecture we introduced the FASTA format for storing biological sequences. A Python package named Biopython provides functionality for defining DNA and amino acid sequences, as well as reading and writing sequences in the FASTA format.

In [None]:
# Run once if Biopython is not installed
# in a Jupyter notebook, a '!' at the start of a line means the line is run in the system terminal rather than as a Python command
!pip install biopython

In [None]:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

### Defining a DNA sequence

In [None]:
dna_seq = Seq("ATGCGTACGTTAG")
dna_seq


### Create a SeqRecord

A `SeqRecord` also lets you attach an identifier and a description to the sequence. (`SeqRecord` and `Seq` are Python *classes*; the values you create from them, such as `dna_seq` above, are *objects*. Recall the discussion on object-oriented programming in the lecture.)

In [None]:
record = SeqRecord(
    dna_seq,
    id="example_dna",
    description="A short example DNA sequence"
)

record
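
To make the class/object distinction concrete, here is a toy class written from scratch. It is a hypothetical stand-in for illustration only: `SimpleRecord` and its `length` method are made up and are not part of Biopython.

```python
class SimpleRecord:
    """A toy sequence record (hypothetical, not Biopython's SeqRecord)."""

    def __init__(self, seq, id, description=""):
        # store the data on the object itself
        self.seq = seq
        self.id = id
        self.description = description

    def length(self):
        # a method is a function that belongs to the object
        return len(self.seq)

# the class is the blueprint; each call creates a separate object
rec_a = SimpleRecord("ATGCGTACGTTAG", id="example_dna")
rec_b = SimpleRecord("ATGAAATTTGGG", id="another_dna", description="second example")

print(rec_a.id, rec_a.length())
print(rec_b.id, rec_b.length())
```

Both objects come from the same class, but each one carries its own sequence, identifier and description.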


### Write the DNA sequence to a FASTA file

`SeqIO` is the Biopython utility that allows you to read and write sequences. 'IO' refers to input/output, one of the most basic operations on data in computer programming!

In [None]:
with open("example_dna.fasta", "w") as output_handle:
    SeqIO.write(record, output_handle, "fasta")

Try opening the `example_dna.fasta` file in text format and remind yourself of the requirements of the FASTA format.
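
As a reminder of the layout, a FASTA record is a header line starting with `>`, followed by one or more lines of sequence. The sketch below builds such a record by hand with plain Python, purely for illustration. The file name `by_hand.fasta` is made up, and in practice `SeqIO.write` does this work for you.

```python
# header line: '>' then the identifier, optionally followed by a description
header = ">example_dna A short example DNA sequence"
sequence = "ATGCGTACGTTAG"

with open("by_hand.fasta", "w") as handle:
    handle.write(header + "\n")
    handle.write(sequence + "\n")

with open("by_hand.fasta", "r") as handle:
    lines = handle.read().splitlines()

print(lines[0])  # the header line
print(lines[1])  # the sequence line
```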

### Read the DNA sequence back from the file

In [None]:
with open("example_dna.fasta", "r") as input_handle:
    records = list(SeqIO.parse(input_handle, "fasta"))

records

# Access the sequence
dna_from_file = records[0].seq
dna_from_file



Another advantage of manipulating sequences with Biopython classes is that the classes come with methods (functions attached to an object) that perform useful operations:

In [None]:
# Length of the sequence
len(dna_from_file)

# Reverse complement
dna_from_file.reverse_complement()
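
To see what a reverse complement is, here is a rough plain-Python equivalent for an uppercase DNA string. It is a sketch, not Biopython's implementation: the name `simple_reverse_complement` is made up, and the function assumes the sequence contains only A, C, G and T.

```python
def simple_reverse_complement(dna):
    # swap each base for its partner: A<->T, C<->G
    complement = dna.translate(str.maketrans("ACGT", "TGCA"))
    # reverse the string with a slice that steps backwards
    return complement[::-1]

print(simple_reverse_complement("ATGC"))
```

You can compare its result with the Biopython method on the same sequence to check that the two agree.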


## Part 4: Translating DNA to Amino Acids 🧬

Here we will practice writing simple algorithms in Python. As an example we will consider the problem of translation: a coding DNA sequence is read in **codons** (groups of three nucleotides), and each codon specifies an amino acid or a stop signal.

In [None]:

codon_table = {
    "ATG": "M", # Methionine (start codon)
    "TTT": "F",
    "TTC": "F",
    "TAA": "*", # Stop codon
    "TAG": "*",
    "TGA": "*"
}

*(Note: This is a simplified table just for illustration. The full standard codon table has 64 entries!)*

At the end of this part we would like to write a simple algorithm that takes a DNA sequence as input and returns the translated amino acid sequence according to a codon table.

This is possible with functionality available in Biopython, but for teaching and learning purposes we will break the problem down step by step and think about how to implement it using basic Python operations, without specialised packages.

### Some related tools we need for this

#### 'Looping' over lists

One thing we need to learn before approaching the problem is how to efficiently perform repeated operations over a collection of data entries. In programming we call this 'looping'.

A simple loop is the 'for' loop:

In [None]:
animals = ["cat", "dog", "mouse"]

for animal in animals:
    print(animal)


You can also loop over a list of numbers:

In [None]:
# to define a sequence of numbers we can use the Python function 'range'
# (printing it shows range(0, 5), a compact description rather than the numbers themselves)
print(range(5))

In [None]:
for i in range(5):
    print(i) # each number is printed on its own line


In [None]:
# By default the count starts from 0, but it doesn't have to:
for i in range(1, 10):
    print(i) 


In [None]:
# Also, we don't have to count up in steps of 1.
for i in range(0, 10, 2):
    print(i)

# function 'range' expects: range(start, stop, step); 'stop' itself is not included
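
To see all the numbers a `range` produces at once, you can wrap it in `list()`. This is a small sketch; the variable names are made up for illustration.

```python
# list() turns a range into an ordinary list so the numbers are visible
first_five = list(range(5))
one_to_nine = list(range(1, 10))
evens = list(range(0, 10, 2))

print(first_five)   # [0, 1, 2, 3, 4]
print(one_to_nine)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(evens)        # [0, 2, 4, 6, 8]
```

Notice that the `stop` value (5 or 10 here) never appears in the output.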

In [None]:
# going back to strings: a string is a sequence of characters, so you can loop over it just like a list. Try this out:
for i in 'I like Python programming':
    print(i) 
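
Because a string is a sequence, you can also pick out single characters or short stretches by position. The sketch below shows indexing and slicing on a DNA string; the variable names are made up for illustration.

```python
dna = "ATGCGTACGTTAG"

first_base = dna[0]     # positions are counted from 0
first_three = dna[0:3]  # characters at positions 0, 1 and 2 (position 3 is not included)
next_three = dna[3:6]   # characters at positions 3, 4 and 5

print(first_base)
print(first_three)
print(next_three)
print(len(dna))  # number of characters in the string
```

Slices follow the same rule as `range`: the start position is included and the stop position is not.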


In [None]:
# we can also use a for loop to build up a new string!
my_string = 'I like Python programming'
print(my_string)
for i in ['because', 'it', 'is', 'fun']:
    j = ' ' + i # '+' here means concatenation (i.e. sticking them together); the space in front just looks nicer
    my_string += j # '+=' does BOTH concatenation and assignment (e.g. y += x means the same as y = y + x)
    print(my_string)
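
The same build-up pattern works one character at a time. As a sketch, the loop below writes the RNA version of a DNA coding sequence by copying each base and swapping T for U. The variable names `rna` and `base` are made up for illustration.

```python
dna = "ATGTTTTAA"
rna = ""  # start with an empty string

for base in dna:
    if base == "T":
        rna += "U"   # RNA uses U where DNA has T
    else:
        rna += base  # every other base is copied unchanged

print(rna)
```

Starting from an empty string and appending inside the loop is a common way to build a result piece by piece.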

**Exercise**: Write a 'for' loop that prints an input DNA sequence in triplets. (Use the input string `"ATGTTTTAA"` as an example.)

### Python 'dictionary'

In Python, a 'dictionary' is a data structure that maps 'keys' to their corresponding 'values'. Think of it like a real dictionary: *word* maps to *definition*.

In [None]:
ages = {
    "Alice": 20,
    "Bob": 25,
    "Charlie": 30
}

print(ages)


The advantage of the key-value structure is that you can look up the value corresponding to a key (just like you look up the definition of a word in a dictionary):

In [None]:
print(ages["Alice"])


Keys must be unique! If the same key appears twice, only the last value is kept:

In [None]:
example = {
    "A": 1,
    "A": 2
}

print(example) # the second value overwrites the first!!!


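Looking up a missing key with square brackets (e.g. `ages["Dave"]`) raises a `KeyError`. A safe way to perform this lookup is to use the *dictionary method* `.get(key, default_value)`.
* If the key exists → return its value
* If the key does not exist → return `default_value`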



In [None]:
print(ages.get("Bob", "N/A")) # should return a value ('Bob' is one of the keys in the dictionary!)
print(ages.get("Dave", "N/A")) # but 'Dave' isn't ...
print(ages.get("Dave", "I don't know")) # How about this?
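
The same ideas carry over to codons. The sketch below uses a deliberately tiny table; the name `small_table` and the placeholder `"?"` for unknown codons are assumptions made for this example.

```python
small_table = {"ATG": "M", "TTT": "F", "TAA": "*"}

# 'in' checks whether a key exists without looking up its value
print("ATG" in small_table)
print("GGG" in small_table)

known = small_table.get("ATG", "?")
unknown = small_table.get("GGG", "?")  # missing key, so the default "?" comes back
print(known, unknown)
```

Choosing a visible placeholder as the default makes it easy to spot unrecognised input in the result.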

**Exercise:** Create a dictionary that functions as the table of triplet DNA codes you learnt in basic molecular biology, and verify that you can query the dictionary to return an amino acid for a given input triplet.

(You might want to search for the codon table - this is a classic Biocomputing exercise, e.g. https://gist.github.com/juanfal/09d7fb53bd367742127e17284b9c47bf)

### Final Exercise

Write the complete algorithm! Here are some helpful hints:

* You want the program to stop when it sees a stop codon.
* Make sure it can handle 'wrong' DNA triplets, i.e. triplets that are not in your codon table (otherwise the lookup will fail and break the for loop ...).
* Remember that a string behaves like a list of letters. This applies when iterating through an existing string, and it is also useful when you want to generate a new string (by repeatedly appending characters to the end of it).
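
The first hint relies on leaving a loop early, which Python does with the `break` statement. Here is a minimal sketch on an unrelated task: collecting bases from a list until an ambiguous `"N"` appears. The list contents and the names `bases` and `seen` are made up for illustration.

```python
bases = ["A", "T", "G", "N", "C", "A"]
seen = []  # an empty list to collect the bases read so far

for base in bases:
    if base == "N":
        break          # leave the loop immediately; later items are never visited
    seen.append(base)  # add the base to the end of the list

print(seen)
```

Everything after the `"N"` is ignored, because the loop ends as soon as `break` runs.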