# Introduction to Python Programming for Bioinformatics

## About this notebook

This notebook was originally written by [Marc Cohen](https://github.com/mco-gh), an engineer at Google. The original source can be found on [Marc's short link service](https://mco.fyi/), starting with [Python lesson 0](https://mco.fyi/py0). I encourage you to work through that notebook if you find some details missing here.

Rob Edwards edited the notebook, adapted it for bioinformatics using some simple genetics examples, condensed it into a single notebook, and rearranged some of the lessons, so if some of it does not make sense, it is Rob's fault!

It is intended as a hands-on companion to an in-person course, and if you would like Rob to teach this course (or one of the other courses) don't hesitate to get in touch with him.

## Using this notebook

You can download the original version of this notebook from [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_5.ipynb) and from [Rob's Google Drive]()

**You should make your own copy of this notebook by selecting File->Save a copy in Drive from the menu bar above, and then you can edit the code and run it as your own**

There are several lessons, and you can do them in any order. I've tried to organise them in the order I think most appropriate, but you may disagree!


# Lesson Links

* [Lesson 5 - Conditionals](#Lesson-5---Conditionals)
  * [Controlling Program Flow](#Controlling-Program-Flow)
  * [if Statements](#if-Statements)
  * [if Block Structure](#if-Block-Structure)
  * [Python's use of whitespace](#Python's-use-of-whitespace)
  * [else Statements](#else-Statements)
  * [elif Statements](#elif-Statements)
  * [For loops](#For-loops)

[Previous Lesson](Python_Lesson_4.ipynb) | [Next Lesson](Python_Lesson_6.ipynb)

# Lesson 5 - Conditionals


# Controlling Program Flow

Question:

Does a stretch of DNA 2329 bp long encode a gene?

* Up to now, we've looked at very simple programs, involving a sequence of statements (A, then B, then C…)
* But what you really want to do is probably a lot more complex than adding numbers or simple statements.

For our question, we are going to assume that we are talking about simple phage and bacterial genes that don't have introns (after all, you have more phage and bacterial genes than human genes).

Genes start with the codon `ATG` and end with one of the codons `TAA`, `TGA`, or `TAG`, and the codons need to be in-frame.

<details>
<summary>More assumptions!</summary>
Of course, we are assuming that this is a phage or bacterial gene that doesn't have an intron, that it doesn't start with `GTG` or `TTG`, and that the bacterium doesn't carry suppressor mutations that allow it to insert an amino acid at the standard stop codons. In real life, we need to consider all those cases, but for now, we'll keep it simple!
</details>
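
As a rough sketch, the simplified rule above (start with `ATG`, end with a stop codon, stay in frame) could be written as the function below. It relies on `if` statements, which the next section introduces. The name `looks_like_gene` is a hypothetical helper, not part of the original lesson, and it makes the same simplifying assumptions as the text. It also does not look for stop codons in the middle of the sequence.

```python
# Stop codons listed in the text above.
STOP_CODONS = ("TAA", "TGA", "TAG")


def looks_like_gene(dna):
    """Hypothetical helper: apply the three simplified rules from the text."""
    if not dna.startswith("ATG"):
        return False  # no start codon at the beginning
    if dna[-3:] not in STOP_CODONS:
        return False  # no stop codon at the end
    if len(dna) % 3 != 0:
        return False  # start and stop are not in the same reading frame
    return True


print(looks_like_gene("ATGAAATAA"))  # True: ATG, AAA, TAA
print(looks_like_gene("ATGAATAA"))   # False: 8 bases is not a multiple of 3
print(2329 % 3)                      # 1
```

Note the last line: 2329 is not a multiple of 3, so under these rules a 2329 bp stretch cannot be a complete gene from start codon to stop codon on its own, although it could still contain one.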


# `if` Statements

* The `if` statement is how we express conditional logic in Python.
* Virtually every programming language has this concept.
* `if` statements define a condition and a sequence of statements to execute if the condition is `True`.

Prototype...

```
if some_expression:    
  do_this()
  do_that()
```

If the condition is true, the indented statements are executed.
Otherwise, the indented statements are skipped and program execution continues after the `if` statement.


In [None]:
bases = "AAAAATGCCCCC"
start = "ATG"
if start in bases:
    print(f"The sequence {bases} has a start in it!")
else:
    print(f"Sorry, no start in {bases}")

The sequence AAAAATGCCCCC has a start in it!


## Challenge

In Python, we use indentation to associate a block of statements with a condition, for example...

```
print("1")
if some_condition:    
  print("2")    
  print("3")
print("4") # this line is NOT part of the if block
```

What does the output look like...
* when the `some_condition` is True?
* when the `some_condition` is False?

Here’s a slightly different example...
```
print("1")
if some_condition:    
  print("2")    
print("3")  # this line is NOT part of the if block
print("4")  # this line is NOT part of the if block
```
What’s different?
What does the output look like...
* when the `some_condition` is True?
* when the `some_condition` is False?


## `if` Block Structure

* In Python, `if` statement blocks are defined by indentation.
* This idea of using indentation to delineate program structure is pervasive in Python and unusual among widely used programming languages, most of which mark blocks with braces or keywords.
* For now, we're focusing on `if` statements, but later we'll see how indentation is used to define other kinds of statement blocks.

### Block Structure in Other Languages

In many other languages, explicit delimiters such as braces are used. For example, in Java we would write (C and C++ use the same braces, though their string-search calls differ):

```
if (bases.contains("ATG")) {
    has_start = true;
}
```

whereas, in Python we write:

```
if "ATG" in bases:
    has_start = True
```
Indentation in Java/C/C++ is a helpful practice for program readability but it does not affect program functionality.
In Python, indentation is not just a good idea - it affects program logic!
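
Here is a small sketch of how indentation alone changes what a program does. The two functions are hypothetical examples (their names and messages are not from the original lesson) and differ only in the indentation of one line.

```python
def report_indented(bases):
    """Both messages belong to the if block."""
    messages = []
    if "ATG" in bases:
        messages.append("found a start codon")
        messages.append("worth a closer look")  # inside the if block
    return messages


def report_dedented(bases):
    """Only the first message belongs to the if block."""
    messages = []
    if "ATG" in bases:
        messages.append("found a start codon")
    messages.append("worth a closer look")  # outside the if block: always runs
    return messages


print(report_indented("CCCCCC"))  # []
print(report_dedented("CCCCCC"))  # ['worth a closer look']
```

With a sequence that does contain `ATG`, both functions return the same two messages, so this kind of mistake only shows up when the condition is false.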


## Python's use of whitespace

* Many people have strong opinions about this aspect of Python.
* Don’t get hung up on this feature. Try it and see what you think after you've written a few Python programs.
* Pitfalls:
  * watch out for mismatched indentation within a block
  * avoid mixing tabs and spaces in your code
  * I prefer spaces because they are more explicit, and most editors will automatically insert spaces even if you press Tab

**Pick _either_ tabs _or_ spaces _but_ be consistent.**
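
To see what the two pitfalls look like in practice, the sketch below asks Python to compile three tiny programs held in strings, using the built-in `compile()` function, and reports which error (if any) it raises. The helper name `indentation_problem` and the three sample strings are made up for this illustration.

```python
def indentation_problem(source):
    """Return the name of the indentation error Python raises, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:  # TabError is a kind of IndentationError
        return type(err).__name__
    return None


consistent = "if True:\n    x = 1\n    y = 2\n"    # four spaces on both lines
mismatched = "if True:\n    x = 1\n  y = 2\n"      # four spaces, then two
mixed = "if True:\n\tx = 1\n        y = 2\n"       # a tab, then eight spaces

print(indentation_problem(consistent))  # None
print(indentation_problem(mismatched))  # IndentationError
print(indentation_problem(mixed))       # TabError
```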

## `else` Statements

Sometimes we want to specify an alternative to the `if` condition, which we do with an `else` statement, for example...

```
if <condition>:
    <block1>
else:
    <block2>
```

* If the condition is true, block1 is executed.
* If the condition is false, block2 is executed.

The `else` clause is Python's way of saying "otherwise..."
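
As a minimal sketch, here is an `if`/`else` applied to the question that opened this lesson. It only checks whether the length is a whole number of codons. The variable names are chosen for this example.

```python
length = 2329  # length of the DNA stretch from the opening question, in bp

if length % 3 == 0:
    verdict = "a whole number of codons"
else:
    verdict = "not a whole number of codons"

print(f"{length} bp is {verdict}")  # 2329 bp is not a whole number of codons
```

Try changing `length` to 2328 and running it again to see the other branch.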



Just as `if` blocks are defined by indentation, `else` blocks are also defined by indentation.

For example, this:

```
if <condition>:
    <statement1>
else:
    <statement2>
    <statement3>
```
is different from this:
```
if <condition>:
    <statement1>
else:
    <statement2>
<statement3>
```

In the first version, `<statement3>` belongs to the `else` block, so it runs only when the condition is false. In the second version, `<statement3>` is not indented, so it sits outside the `if`/`else` entirely and runs every time.


## `elif` Statements

Sometimes we need one or more intermediate conditions between the if and else parts, for example...

`if A then do X, else if B then do Y, otherwise do Z`

We use the `elif` statement to express this in Python...
```
if condition1:
    do_thing_1()
elif condition2:
    do_thing_2()
else:
    do_thing_3()
```
* If `condition1` is true, `do_thing_1()` is executed.
* Otherwise, if `condition2` is true, `do_thing_2()` is executed.
* Otherwise, `do_thing_3()` is executed.


* `elif` blocks are defined the same way as `if` and `else` blocks - using indentation.

* It's good to have an if/elif for every condition of interest and not lump errors together with cases of interest.

For example, if you care about the start codons `ATG` and `TTG`, and everything else is considered an error, this code:

```
if "ATG" in bases:      # deal with ATG here
  starts_with_atg()
elif "TTG" in bases:    # deal with TTG here
  starts_with_ttg()
else:                   # deal with errors here
  no_start_codon()
```
is better than this:
```
if "ATG" in bases:      # deal with ATG here
  starts_with_atg()
else:                   # must be TTG then, right? Not necessarily!
  starts_with_ttg()
```
The latter code hides errors by combining a valid case with error cases.
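
A runnable sketch of the first, safer pattern is shown below. The function `classify_start` and its return strings are hypothetical, and it uses `str.startswith()` rather than `in`, on the assumption that we only care about a codon at the very beginning of the sequence.

```python
def classify_start(bases):
    """Hypothetical helper: name the start codon at the beginning of bases."""
    if bases.startswith("ATG"):
        return "ATG start"
    elif bases.startswith("TTG"):
        return "TTG start"
    else:
        return "no recognised start codon"


for seq in ["ATGCCC", "TTGCCC", "GTGCCC"]:
    print(seq, "->", classify_start(seq))
```

Because `TTG` has its own `elif`, the sequence beginning with `GTG` ends up in the `else` branch instead of being silently treated as a `TTG` start.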


# For loops

We have already used `for` loops when we looked at dictionaries and lists, but to reiterate: if you want to iterate over a series of things, you can do so with a `for` loop.


In [None]:
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
# loop through a dictionary (this iterates over the dictionary keys)
for codon in genetic_code:
    amino_acid = genetic_code[codon]
    print(f"The translation of {codon} is {amino_acid}")


The translation of UUU is Phe
The translation of UUA is Leu
The translation of CGA is Arg
The translation of CGC is Arg
The translation of CGG is Arg
The translation of CGU is Arg
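
Loops and conditionals combine naturally: an `if` inside a `for` loop lets you act on only some of the items. The sketch below reuses the same small dictionary and collects just the codons that encode arginine. The list name `arg_codons` is chosen for this example.

```python
genetic_code = {'UUU': 'Phe', 'UUA': 'Leu', 'CGA': 'Arg',
                'CGC': 'Arg', 'CGG': 'Arg', 'CGU': 'Arg'}

arg_codons = []
for codon in genetic_code:
    # only keep the codons whose amino acid is arginine
    if genetic_code[codon] == 'Arg':
        arg_codons.append(codon)

print(arg_codons)  # ['CGA', 'CGC', 'CGG', 'CGU']
```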


<details>
<summary>Advanced for loops</summary>

Python also has a compact way of building a list from a loop that you will often see people use. It is called a `list comprehension`, and it lets you apply an expression to every item you iterate over and collect the results in a new list.

```
genetic_code = { 'UUU' : 'Phe', 'UUA': 'Leu', 'CGA' : 'Arg', 'CGC' : 'Arg', 'CGG' : 'Arg', 'CGU' : 'Arg' }
amino_acids = [genetic_code[codon] for codon in genetic_code]
print(f"All the amino acids are {amino_acids}")
```
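
For comparison, here is a sketch that builds the same list with an ordinary `for` loop and then with the one-line form. It also shows that the one-line form can filter items with a trailing `if`. The variable names are chosen for this example.

```python
genetic_code = {'UUU': 'Phe', 'UUA': 'Leu', 'CGA': 'Arg',
                'CGC': 'Arg', 'CGG': 'Arg', 'CGU': 'Arg'}

# The long way: an ordinary for loop that appends to a list.
amino_acids_loop = []
for codon in genetic_code:
    amino_acids_loop.append(genetic_code[codon])

# The same list built in one line.
amino_acids_short = [genetic_code[codon] for codon in genetic_code]
print(amino_acids_loop == amino_acids_short)  # True

# A trailing if keeps only the items that pass the test.
u_codons = [codon for codon in genetic_code if codon.startswith('U')]
print(u_codons)  # ['UUU', 'UUA']
```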
</details>


[Previous Lesson](Python_Lesson_4.ipynb) | [Next Lesson](Python_Lesson_6.ipynb)
