!['python logo'](https://www.python.org/static/community_logos/python-logo-generic.svg)
***
<font color=#7a7a7a>This is a Python tutorial. If it gets too boring, go for https://docs.python.org/3/tutorial/ or scroll through http://flowingdata.com/ to get some inspiration for the course</font>
***

## <font color=#4b8bbe>Numbers</font>
Let’s try some simple Python commands. You can run a cell by pressing *Shift + Enter*

In [None]:
# this is just a comment

Expression syntax is straightforward: the operators `+`, `-`, `*` and `/` work just like in most other languages; parentheses `( )` can be used for grouping
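As a quick illustration of grouping, the two expressions below differ only in their parentheses: multiplication and division are evaluated before addition and subtraction. The last three lines are a small extra, showing three more arithmetic operators (`//`, `%` and `**`) that the list above does not mention.

```python
# * and / are evaluated before + and -
ungrouped = 50 - 5 * 6 / 4   # 50 - 7.5
grouped = (50 - 5 * 6) / 4   # 20 / 4

print(ungrouped)  # 42.5
print(grouped)    # 5.0

# Three more arithmetic operators
print(17 // 5)  # floor division: 3
print(17 % 5)   # remainder: 2
print(2 ** 10)  # power: 1024
```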


In [None]:
2 + 2

8 / 5  # division always returns a float

Note that only the result of the last line in a cell is shown. To keep both results, we can store them in variables. The equal sign `=` assigns a value to a variable

In [None]:
total = 2 + 2
quotient = 8 / 5

Now, to see the answers, we print them

In [None]:
print(total)
print(quotient)

We can use these variables now

In [None]:
total - quotient

We can also compare them with `==`, `!=`, `>`, `<`, `>=` and `<=`. The answer is a boolean value: `True` or `False`

In [None]:
total > quotient
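Here are all six comparison operators at work on the two variables from above (redefined so the cell runs on its own). Note the difference between `=`, which assigns, and `==`, which compares. Comparisons can also be chained, as in the last line.

```python
total = 2 + 2
quotient = 8 / 5

print(total == 4)         # True
print(total != quotient)  # True
print(total > quotient)   # True
print(total < quotient)   # False
print(quotient >= 2)      # False
print(total <= 4)         # True
print(1 < quotient < 2)   # chained, same as (1 < quotient) and (quotient < 2): True
```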

If a variable is not defined, trying to use it will give you an error

In [None]:
x  # try to access an undefined variable

For additional arithmetic operations, there is a library called *math*

In [None]:
import math

To see what it can do, let's use some help

In [None]:
help(math)

To use it, just type *math.function(...)*
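For instance, a few functions and one constant from *math*; any other function listed by `help(math)` is called the same way:

```python
import math

print(math.sqrt(16))      # 4.0
print(math.floor(2.7))    # 2
print(math.factorial(5))  # 120
print(math.pi)            # 3.141592653589793
print(math.cos(math.pi))  # -1.0
```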

In [None]:
# try it here

## <font color=#4b8bbe>Strings</font>

In [None]:
empty = '' # single or double quotes can be used to initialize a string 

String methods

In [None]:
dna = 'ACGCACATCAGTATAAGTGCACATATCGATGACGAGAACATGGAATCGTCAGCAGGAGAA'
rna = dna.replace('T', 'U')

In [None]:
rna

In [None]:
dna == rna
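`replace` is only one of many string methods. A few more, shown on a short sequence (`seq` is a hypothetical example, not part of the tutorial data):

```python
seq = 'ATGCGTA'  # hypothetical example sequence

print(seq.count('A'))         # how many times 'A' occurs: 2
print(seq.find('CG'))         # index of the first 'CG': 3
print(seq.lower())            # atgcgta
print(seq.startswith('ATG'))  # True
print(seq.replace('T', 'U'))  # AUGCGUA
```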

In [None]:
stop_codon = 'TAG'
dna += stop_codon # same as dna = dna + stop_codon; + joins (concatenates) strings, but a string and a number cannot be added

In [None]:
# print the DNA sequence to see what it looks like now

In [None]:
n_codons = len(dna) // 3  # // is integer division, so n_codons is an int rather than a float

In [None]:
print('this DNA sequence is made up of ' + str(n_codons) + ' codons') # str() converts a number to a string

In [None]:
dna[0:3]

In [None]:
# take the next two codons here

In [None]:
dna[-3:]
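A slice `s[start:end]` includes the character at `start` but stops *before* `end`, and negative indices count from the end. A sketch on a short hypothetical string:

```python
s = 'ACGTAG'  # hypothetical 6-letter sequence, i.e. two codons

print(s[0:3])   # ACG (positions 0, 1, 2)
print(s[3:6])   # TAG (positions 3, 4, 5)
print(s[-3:])   # TAG (the last three characters)
print(s[-1])    # G (the last character)
print(s[::-1])  # GATGCA (a step of -1 reverses the string)
```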

In [None]:
sentence = 'this is a sentence'
words = sentence.split(' ')

In [None]:
words # why are words in square brackets?

In [None]:
type(words)

## <font color=#4b8bbe>Lists</font>

Lists are used to group together other values. These can be of different types, but usually all the items have the same type

In [None]:
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

In [None]:
days[1] # indexing works just the same as for strings

In [None]:
days.index('Monday')

In [None]:
# print the weekends here

In [None]:
days.insert(3, 'Saturday') # let's have more weekends!
print(days)

In [None]:
# but we don't need that many, do we?
days.pop(3)
print(days)

In [None]:
# what are other ways to remove items from a list?

In [None]:
numbers = [0, 1, 2, 3, 4, 5]

In [None]:
numbers.append(6)
print(numbers)

In [None]:
numbers.extend([7, 9, 8])
print(numbers)

Be careful: these methods change the list in place. Also important:

- `list1 = list2` does not copy anything: both names refer to the same list object, so changes made through one are visible through the other.
- `list3 = list1[:]` creates an independent (shallow) copy.
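A short demonstration of the difference between giving a list a second name and copying it:

```python
list1 = [1, 2, 3]
list2 = list1     # no copy: both names point to the same list
list3 = list1[:]  # a new, independent (shallow) copy

list2.append(4)
print(list1)  # [1, 2, 3, 4], changed through list2
print(list3)  # [1, 2, 3], the copy is unaffected
print(list1 is list2, list1 is list3)  # True False
```

`list1.copy()` and `list(list1)` make the same kind of shallow copy as `list1[:]`.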

In [None]:
sorted(numbers)
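Unlike `append` and `extend`, `sorted()` does not change the list: it returns a new, sorted list. The list method `sort()` sorts in place instead and returns `None`:

```python
nums = [3, 1, 2]

new = sorted(nums)
print(new, nums)  # [1, 2, 3] [3, 1, 2]

result = nums.sort()
print(nums, result)  # [1, 2, 3] None
```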

In [None]:
max(numbers)

A very convenient way to work with lists is to use the *numpy* library

In [None]:
import numpy as np

In [None]:
print('the average of our list is', np.mean(numbers))
print('the standard deviation is', np.std(numbers))
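Note that `np.std` computes the population standard deviation by default (`ddof=0`, dividing by n). For the sample standard deviation (dividing by n - 1), pass `ddof=1`. A check on a small illustrative list:

```python
import numpy as np

values = [2, 4, 4, 4, 5, 5, 7, 9]

print(np.mean(values))         # 5.0
print(np.std(values))          # population std (ddof=0): 2.0
print(np.std(values, ddof=1))  # sample std (ddof=1): about 2.14
```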

## <font color=#4b8bbe>Dictionaries</font>

This is another useful data type. It is best to think of a dictionary as a set of key-value pairs, with the requirement that the keys are unique within one dictionary. Values are looked up by key rather than by position (since Python 3.7, dictionaries do keep their keys in insertion order)

In [None]:
genome_size = {'yeast': 12, 'nematode': 95.5, 'rice': 470, 'chicken': 1000} # estimated total size of genomes in Mb

In [None]:
genome_size['rice']

In [None]:
genome_size.keys()

In [None]:
genome_size.values()
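A few more everyday dictionary operations, sketched on a smaller dictionary built from entries of `genome_size` (`sizes` is just an illustrative name): adding a key, checking whether a key exists, and looking up a key that may be missing. Plain indexing with a missing key, such as `sizes['human']`, raises a `KeyError`; `get` returns `None` or a default instead.

```python
sizes = {'yeast': 12, 'rice': 470}  # genome sizes in Mb, taken from genome_size

sizes['chicken'] = 1000               # add a new key-value pair
print('rice' in sizes)                # True: `in` checks the keys
print(sizes.get('human'))             # None, no KeyError for a missing key
print(sizes.get('human', 'unknown'))  # unknown, a default value can be given
print(list(sizes))                    # ['yeast', 'rice', 'chicken'], insertion order
```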

In [None]:
n_genes = dict(zip(['yeast', 'nematode', 'rice', 'chicken'], [6000, 18000, 51000, 23000])) # estimated number of protein-coding genes

In [None]:
n_genes
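`zip` pairs up items from two lists by position, and `dict` turns such pairs into key-value entries. The same idea, shortened to two species:

```python
species = ['yeast', 'rice']
genes = [6000, 51000]

pairs = list(zip(species, genes))
print(pairs)        # [('yeast', 6000), ('rice', 51000)]
print(dict(pairs))  # {'yeast': 6000, 'rice': 51000}
```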

## <font color=#4b8bbe>First steps towards programming</font>

Of course, we can use Python for more complicated tasks than adding two and two together

#### <font color=#4b8bbe>Conditions</font>

In [None]:
import random
x = random.randint(-100, 100) # here we generate a pseudo-random integer between -100 and 100 (both ends included)

if x > 0:
    print('x is positive')
elif x == 0:
    print('x equals 0')
else:
    print('x is negative')

In [None]:
x
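Because the numbers are pseudo-random, setting a seed with `random.seed` makes them reproducible, which helps when you want to rerun a cell and get the same `x`. The seed value 42 below is arbitrary.

```python
import random

random.seed(42)
first = random.randint(-100, 100)

random.seed(42)  # the same seed again...
second = random.randint(-100, 100)

print(first == second)       # True: ...gives the same "random" number
print(-100 <= first <= 100)  # True: randint includes both ends
```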

#### <font color=#4b8bbe>Loops</font>

In [None]:
for i in range(0, len(days)):
    print('day ' + str(i + 1) + ' is ' + days[i])
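Python's built-in `enumerate` gives the position and the item together, so `range(len(...))` is not needed; `start=1` makes the count begin at 1. The `days` list is redefined here so the example runs on its own.

```python
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# number counts from 1, day is the list item itself
for number, day in enumerate(days, start=1):
    print('day ' + str(number) + ' is ' + day)
```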

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the *items()* method

In [None]:
for k, v in n_genes.items():
    print(str(k) + ' has approximately ' + str(v) + ' genes')

#### <font color=#4b8bbe>List comprehensions</font>

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition

For example, assume we want to create a list of squares, like:

In [None]:
squares = []
for x in range(10):
    squares.append(x ** 2)

print(squares)

Or, equivalently:

In [None]:
squares = [x ** 2 for x in range(10)]

print(squares)
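The same syntax can also filter, which covers the second use mentioned above: making a subsequence of the elements that satisfy a condition. Only the `if` clause is new:

```python
# squares of the even numbers only
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# the equivalent for loop
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x ** 2)
print(even_squares == even_squares_loop)  # True
```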

#### <font color=#4b8bbe>Exercise</font>

Let's translate DNA to protein

In [None]:
codon_table = {'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y', 'TAA': '-', 'TAG': '-', 'TGT': 'C', 'TGC': 'C', 'TGA': '-', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q', 'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}

In [None]:
dna

In [None]:
peptide = ''

In [None]:
for i in range(0, len(dna), 3):
    print(dna[i:i+3])
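`range(start, stop, step)` counts from `start` up to, but not including, `stop` in steps of `step`, so the loop above visits the first letter of every codon. On a hypothetical 9-letter sequence:

```python
seq = 'ATGGCCTAA'  # hypothetical sequence of three codons

starts = list(range(0, len(seq), 3))
print(starts)  # [0, 3, 6]

for i in starts:
    print(i, seq[i:i+3])  # 0 ATG, then 3 GCC, then 6 TAA
```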

In [None]:
for i in range(0, len(dna), 3):
    print(codon_table[dna[i:i+3]])

In [None]:
for i in range(0, len(dna), 3):
    peptide += codon_table[dna[i:i+3]]

In [None]:
peptide

How would you do this with a list comprehension?

In [None]:
# translate the DNA here

## <font color=#4b8bbe>Working with data</font>

Let's read a tab-separated .txt file with some data (*iris.txt* needs to be in the notebook's working folder)

In [None]:
with open('iris.txt') as f:
    lines = [x.strip('\n').split('\t') for x in f.readlines()]

In [None]:
lines

This is a table. To work with tables, there is a *pandas* library 

In [None]:
import pandas as pd

In [None]:
data = pd.DataFrame(lines[1:], columns=lines[0])

In [None]:
data
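The table built from `lines` holds every value as a string, because `split` always returns strings; this matters as soon as you want to compute with or plot the numbers. One alternative is to let pandas read the tab-separated file itself with `pd.read_csv(..., sep='\t')`, which for the real file would be `pd.read_csv('iris.txt', sep='\t')`. To stay self-contained, the sketch below reads a tiny in-memory stand-in whose column names are hypothetical (the headers of the real *iris.txt* may differ).

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.txt: tab-separated, header in the first line
text = 'sepal_length\tsepal_width\tspecies\n5.1\t3.5\tsetosa\n4.9\t3.0\tsetosa\n'

# Splitting by hand, as above: every value stays a string
lines = [row.split('\t') for row in text.strip('\n').split('\n')]
by_hand = pd.DataFrame(lines[1:], columns=lines[0])
print(by_hand.dtypes)  # all three columns hold strings

# read_csv parses the numeric columns as floats
parsed = pd.read_csv(io.StringIO(text), sep='\t')
print(parsed.dtypes)   # sepal_length and sepal_width are float64

# The hand-made table can also be converted afterwards
numeric_cols = ['sepal_length', 'sepal_width']
by_hand[numeric_cols] = by_hand[numeric_cols].astype(float)
print(by_hand['sepal_length'].mean())
```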

In [None]:
data = data.set_index('species')

In [None]:
data.columns

In [None]:
data.values

In [None]:
data.iloc[1:3] # use iloc to select rows by their position (here rows 1 and 2: the end of the slice is excluded)

In [None]:
data.loc['setosa'] # use loc to select by labels
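A small illustrative table, indexed by species like `data`, shows the difference between the two: `iloc` counts row positions from 0 and excludes the end of a slice, while `loc` matches index labels, and a label that occurs more than once returns all matching rows. The column name is hypothetical.

```python
import pandas as pd

# Illustrative table indexed by species
df = pd.DataFrame({'sepal_length': [5.1, 4.9, 7.0]},
                  index=['setosa', 'setosa', 'versicolor'])
df.index.name = 'species'

print(df.iloc[1:3])      # rows at positions 1 and 2 (position 3 is excluded)
print(df.loc['setosa'])  # every row labelled 'setosa' (two rows)
```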

Go here to read more about indexing and selecting data: https://pandas.pydata.org/pandas-docs/stable/indexing.html

Now, let's save the table as a *.csv* file

In [None]:
data.to_csv('iris.csv')
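`to_csv` also writes the index (here the `species` column), so when reading the file back you can restore it with `index_col`; for the file above that would be `pd.read_csv('iris.csv', index_col='species')`. A self-contained sketch with a small illustrative table: calling `to_csv()` without a file name returns the CSV text instead of writing a file, and `io.StringIO` lets `read_csv` read that text.

```python
import io
import pandas as pd

df = pd.DataFrame({'species': ['setosa', 'versicolor'], 'sepal_length': [5.1, 7.0]})
df = df.set_index('species')

csv_text = df.to_csv()  # no path given: returns the CSV as a string
print(csv_text)         # the first column is the species index

# index_col turns the species column back into the index
back = pd.read_csv(io.StringIO(csv_text), index_col='species')
print(back)
```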

## <font color=#4b8bbe>And finally some plots</font>

There are many libraries in Python for plotting data. We will start with the basic one, *matplotlib*

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline

#### <font color=#4b8bbe>y = sin(x)</font>

In [None]:
x = np.linspace(-10, 10, 100) # a numpy array of one hundred evenly spaced floats from -10 to 10

In [None]:
y = [math.sin(i) for i in x] # a list of sin(x) values

In [None]:
plt.plot(x, y);
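*numpy* functions such as `np.sin` work on a whole array at once, so the list comprehension over `math.sin` can be replaced by a single call. A quick check that both give the same values:

```python
import math
import numpy as np

x = np.linspace(-10, 10, 100)      # 100 evenly spaced floats from -10 to 10
y_list = [math.sin(i) for i in x]  # one math.sin call per element
y_array = np.sin(x)                # one call for the whole array

print(np.allclose(y_list, y_array))  # True
```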

#### <font color=#4b8bbe>Normal distribution</font>

In [None]:
np.random.normal() # draws one sample from the standard normal distribution (mean 0, standard deviation 1)

In [None]:
randoms = [] # let's sample more

for i in range(100):
    randoms.append(np.random.normal())
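Instead of calling `np.random.normal()` in a loop, *numpy* can draw all the samples in one call (`np.random.normal(size=100)` would also work). The sketch below uses numpy's newer random generator, `np.random.default_rng`, with an arbitrary seed so the samples are reproducible; `loc` and `scale` set the mean and the standard deviation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)                 # arbitrary seed, for reproducibility
samples = rng.normal(loc=0.0, scale=1.0, size=100)  # 100 draws from N(0, 1)

print(samples.shape)   # (100,)
print(samples.mean())  # close to 0, but not exactly 0
print(samples.std())   # close to 1
```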

In [None]:
plt.hist(randoms, bins=10);

#### <font color=#4b8bbe>The *iris* data set</font>

For plotting *pandas* dataframes there is a nice library called *seaborn*. Take a look here: https://seaborn.pydata.org/generated/seaborn.pairplot.html

In [None]:
# feel free to write here; let's use plt.scatter(x, y) for now