<h1 id="toctitle">Conditions</h1>
<ul id="toc"/>

## True and False

__Conditions__ are expressions that evaluate to either true or false:

In [12]:
print(3 == 5)
print(3 > 5)
print(3 <= 5)
print(len("ATGC") > 5)
print("GAATTC".count("T") > 1)
print("ATGCTT".startswith("ATG"))
print("ATGCTT".endswith("TTT"))
print("ATGCTT".isupper())
print("ATGCTT".islower())
print("V" in ["V", "W", "L"])

False
False
True
False
True
True
False
True
False
True


There are several different types of true/false conditions in Python:

- equality tests with `==` (and not equals with `!=`)
- numerical comparisons (`>`, `<`, `>=`, `<=`)
- string methods that return true or false (`startswith`, `endswith`, `isupper`, `islower`)
- membership tests with `in` (is a value in a list?)

We can experiment by typing conditions in at the console:

In [13]:
4 != 5

True
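
Each of these conditions produces one of Python's two Boolean values, `True` or `False`, so the result can be stored in a variable and inspected like any other value. A minimal sketch (the variable names `dna`, `is_long` and `has_start_codon` are made up for illustration):

```python
dna = "ATGCTT"

# a condition is an expression, so its result can be stored in a variable
is_long = len(dna) > 5
has_start_codon = dna.startswith("ATG")

print(is_long)
print(has_start_codon)
# the stored result is a value of type bool
print(type(is_long))
```

Both conditions are true for this sequence, since it is six bases long and begins with `ATG`.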

## Using conditions

### `if` statements

The simplest thing we can do with a condition is execute some code if it's true:

In [15]:
expression_level = 125
if expression_level > 100:
    print("gene is highly expressed")

gene is highly expressed


Notice that

- the condition line starts with `if`
- the thing we want to test goes after the `if`
- the line ends with a colon
- just like with loops, the body is indented
- the body can contain multiple lines
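
To illustrate the last point, here is a sketch of an `if` statement with a body of several lines, followed by an unindented line that is not part of the body. The `threshold` and `excess` variables are hypothetical names introduced only for this example:

```python
expression_level = 125
threshold = 100  # hypothetical cutoff, matching the value used in the example above

if expression_level > threshold:
    # all three indented lines belong to the body
    # and run only when the condition is true
    excess = expression_level - threshold
    print("gene is highly expressed")
    print("level exceeds the threshold by " + str(excess))

# this line is not indented, so it runs whether or not the condition is true
print("finished checking")
```

Changing `expression_level` to a value below 100 should leave only the final message.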

A more interesting example:

In [16]:
accs = ['ab56', 'bh84', 'hv76', 'ay93']
for accession in accs:
    if accession.startswith('a'):
        print(accession)

ab56
ay93


Code with conditions often involves multiple levels of indentation. Watch out for `IndentationError` messages and for indentation mistakes.
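
An indentation mistake does not always cause an error: a line at the wrong level is still valid Python, it just belongs to a different block. The sketch below shows how the indentation level decides whether a line is inside the `if` or only inside the `for`. The `matched` and `checked` lists are hypothetical, used only to record which lines ran:

```python
accs = ['ab56', 'bh84', 'hv76', 'ay93']
matched = []   # filled by the line inside the if
checked = []   # filled by the line inside the for only

for accession in accs:
    if accession.startswith('a'):
        # indented under the if: runs only for accessions starting with 'a'
        matched.append(accession)
    # indented under the for only: runs for every accession
    checked.append(accession)

print(matched)
print(checked)
```

Here `matched` ends up with two accessions and `checked` with all four, even though the two `append` lines differ only in their indentation.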

### `else` statements

The `if` examples above are yes/no - either we execute the bit of code, or we do nothing. 

Sometimes we want either/or - two different branches:

In [20]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
    if accession.startswith('a'):
        print(accession  + " starts with a")
    else:
        print(accession  + " doesn't start with a")

ab56 starts with a
bh84 doesn't start with a
hv76 doesn't start with a
ay93 starts with a
ap97 starts with a
bd72 doesn't start with a


The `else` line has no condition of its own, just a colon. The `else` block is indented and runs when the `if` condition is false.
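
Because exactly one of the two branches runs each time, `if`/`else` is a convenient way to sort items into two groups. A small sketch, using made-up expression levels and hypothetical list names `high` and `low`:

```python
expression_levels = [125, 40, 310, 95]  # made-up values for illustration
high = []
low = []

for level in expression_levels:
    if level > 100:
        high.append(level)
    else:
        # runs whenever the condition above is false
        low.append(level)

print(high)
print(low)
```

Every value lands in exactly one of the two lists, never both and never neither.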

### `elif` statements

Sometimes we want multiple branches. We can nest conditions like this:

In [21]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
    if accession.startswith('a'):
        print(accession  + " starts with a")
    else:
        if accession.startswith('b'):
            print(accession  + " starts with b")
        else:
            print(accession  + " starts with something else")

ab56 starts with a
bh84 starts with b
hv76 starts with something else
ay93 starts with a
ap97 starts with a
bd72 starts with b


But each extra option needs an extra level of indentation. Better to use `elif`:

In [22]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
    if accession.startswith('a'):
        print(accession  + " starts with a")
    elif accession.startswith('b'):
        print(accession  + " starts with b")
    else:
        print(accession  + " starts with something else")

ab56 starts with a
bh84 starts with b
hv76 starts with something else
ay93 starts with a
ap97 starts with a
bd72 starts with b


Much easier to read, and we can add as many conditions as we like:

In [24]:
accs = ['ab56', 'eh84', 'hv76', 'ay93', 'zp97', 'cd72']

for accession in accs:
    if accession.startswith('a'):
        print('a!')
    elif accession.startswith('b'):
        print('b!')
    elif accession.startswith('c'):
        print('c!')
    elif accession.startswith('d'):
        print('d!')
    elif accession.startswith('e'):
        print('e!')
    else:
        print('something else!')

a!
e!
something else!
a!
something else!
c!


`else` and `elif` are good when we have _mutually exclusive_ possibilities. If more than one can be true, then use multiple `if` lines:

In [25]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
    if accession.startswith('a'):
        print(accession  + " starts with a")
    if accession.endswith('6'):
        print(accession  + " ends with 6")

ab56 starts with a
ab56 ends with 6
hv76 ends with 6
ay93 starts with a
ap97 starts with a
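
For comparison, here is a sketch of the same two tests written with `elif` instead of a second `if`. An `elif` condition is only tested when the conditions before it were false, so an accession that satisfies both tests is reported once. The `elif_messages` list is a hypothetical name used to collect the results:

```python
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
elif_messages = []

for accession in accs:
    if accession.startswith('a'):
        elif_messages.append(accession + " starts with a")
    elif accession.endswith('6'):
        # only tested when the first condition was false
        elif_messages.append(accession + " ends with 6")

for message in elif_messages:
    print(message)
```

With this version the message `ab56 ends with 6` is never produced, because `ab56` already matched the first branch.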


## Complex conditions

### `and`

We can join together two conditions with `and` to make a complex one that's only true if both of the simple ones are true:

In [26]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for a in accs:
    if a.startswith('a') and a.endswith('3'):
        print("both conditions are true for " + a)

both conditions are true for ay93


### `or`

We can do the same with `or`, and the result will be true if either of the simple conditions is true:

In [27]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for a in accs:
    if a.startswith('a') or a.endswith('3'):
        print("at least one condition is true for " + a)

at least one condition is true for ab56
at least one condition is true for ay93
at least one condition is true for ap97


We can even join together complex conditions in this way:

In [32]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for a in accs:
    if (a.startswith('b') or a.startswith('h')) and a.endswith('4'):
        print("complex condition is true for " + a)

complex condition is true for bh84
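
The parentheses in that condition matter. In Python, `and` binds more tightly than `or`, so leaving them out changes which conditions are grouped together. The sketch below runs both versions side by side; the list names `with_parentheses` and `without_parentheses` are hypothetical:

```python
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
with_parentheses = []
without_parentheses = []

for a in accs:
    # (starts with b OR starts with h) AND ends with 4
    if (a.startswith('b') or a.startswith('h')) and a.endswith('4'):
        with_parentheses.append(a)
    # without parentheses, Python reads this as
    # starts with b OR (starts with h AND ends with 4)
    if a.startswith('b') or a.startswith('h') and a.endswith('4'):
        without_parentheses.append(a)

print(with_parentheses)
print(without_parentheses)
```

`with_parentheses` holds only `bh84`, while `without_parentheses` also picks up `bd72`, which starts with `b` but does not end with `4`.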


Complex conditions can be hard to read. If we put parentheses around the whole thing, then we are allowed to split it up over multiple lines:

In [35]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for a in accs:
    if (
        (a.startswith('b') 
        or a.startswith('h')) 
        and a.endswith('4')
    ):
        print("complex condition is true for " + a)

complex condition is true for bh84
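
Another option for readability, shown here as a sketch rather than a rule, is to store each simple condition in a descriptively named variable and combine the variables in the `if` line. The names `starts_with_b_or_h`, `ends_with_4` and `matches` are made up for this example:

```python
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
matches = []

for a in accs:
    # each simple condition gets a name that says what it tests
    starts_with_b_or_h = a.startswith('b') or a.startswith('h')
    ends_with_4 = a.endswith('4')
    if starts_with_b_or_h and ends_with_4:
        print("complex condition is true for " + a)
        matches.append(a)
```

This tests the same condition as the multi-line version, so it selects the same accession.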


## Exercises

### Processing tabular data

Open the text file called _data.csv_, which contains some made-up data for a number of genes. Each line contains the following fields for a single gene in this order: species name, sequence, gene name, expression level. The fields are separated by commas (hence the name of the file – __csv__ stands for __Comma Separated Values__):

```
Drosophila yakuba,cgcgcgc...gatgc,hdt739,85
```

Think of it as a representation of a table in a spreadsheet – each line is a row, and each field in a line is a column. 
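
As a starting point, here is a sketch of how a single line in this format could be split into its four fields. The line below is invented for illustration and is not taken from _data.csv_; the variable names are also just suggestions:

```python
# a made-up line in the same format as data.csv (not from the real file)
line = "Drosophila melanogaster,atatatcgcg,xyz123,210\n"

# strip the trailing newline, then split on commas to get the four fields
fields = line.rstrip("\n").split(",")
species = fields[0]
sequence = fields[1]
gene_name = fields[2]
# fields are read as strings, so convert before making a numerical comparison
expression_level = int(fields[3])

if expression_level > 200:
    print(gene_name + " from " + species + " is expressed above 200")
```

The same splitting step can go inside a loop over the lines of the file, with the conditions for each exercise tested on the resulting variables.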

Print out the gene names for all genes belonging to Drosophila melanogaster or Drosophila simulans.

Print out the gene names for all genes between 90 and 110 bases long.

Print out the gene names for all genes whose AT content is less than 0.5 and whose expression level is greater than 200.

Print out the gene names for all genes whose name begins with "k" or "h" except those belonging to Drosophila melanogaster.

For each gene, print out a message giving the gene name and saying whether its AT content is high (greater than 0.65), low (less than 0.45) or medium (between 0.45 and 0.65).

### Pairwise distance

Warning: difficult!

Here is a list of DNA sequences:

`['ATTGTACGG', 'AATGAACCG', 'AATGAACCC', 'AATGGGAAT']`

Write a program that calculates and prints, for each pair of sequences, the percentage of identical positions. 

Hint: 

```
if base1 == base2:
    # do something
```

### K-mer counting

Warning: difficult!

Write a program that, given a DNA sequence, will print all the k-mers (e.g. 4-mers) that occur more than n times. 

E.g. with `dna = "ATGCATCATG"`, `k = 2` and `n = 2`, print:

AT 
