# BTM Beginner Python Notebook

This notebook uses examples from https://github.com/pycam/python-basic, condensed into a one-hour session.

## Printing values

The first bit of python syntax we're going to learn is the <tt>print</tt> function. This function lets us print messages to the user:

In [None]:
print("Hello from python!")

In [None]:
print(34)

In [None]:
print(2 + 3)

You can print multiple expressions if you separate them with commas. Python will insert a space between each element, and a newline at the end of the message:

In [None]:
print("The answer:", 42)
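
As a small sketch of how the space and the newline described above can be changed: the built-in <tt>print</tt> function accepts the optional keyword arguments `sep` and `end`. They are not used elsewhere in this session, so treat this as an aside.

```python
# sep replaces the default single space placed between the items
print("A", "C", "G", "T", sep="-")

# end replaces the default newline placed at the end of the message
print("no newline here", end="")
print(" ... so this continues on the same line")
```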

## Using variables

We can give a name to a value using _variables_. The name is apt because the value stored in a variable can _vary_.

A variable can be assigned to a simple value...

In [None]:
x = 3
print(x)

... or the outcome of a more complex expression.

In [None]:
x = 2 + 2
print(x)

A variable can be called almost whatever you like: the name must start with a letter or an underscore, may contain only letters, digits and underscores (so no spaces), and must not be a reserved word such as `if` or `for`. Ideally it should also be meaningful. You assign a value to a variable with the **`=` operator**. Note that this is different to mathematical equality (which we will come to later...)

You can <tt>print</tt> a variable to see what python thinks its current value is.

In [None]:
serine = "TCA"
print(serine, "codes for serine")
serine = "TCG"
print("as does", serine)

In the interactive interpreter you don't have to <tt>print</tt> everything: if you type a variable name (or just a value), the interpreter will automatically print out what python thinks the value is. Note though that this is not the case if your code is in a file.

In [None]:
3 + 4

In [None]:
x = 5
3 * x

## Simple data types

Python treats different types of data differently. Python has 4 main basic data types. Types are useful to constrain some operations to a certain category of variables. 

### Object type

You can check what type python thinks an expression is with the <tt>type</tt> function. You call it with the name <tt>type</tt> immediately followed by parentheses enclosing the expression you want to check (either a variable or a value), e.g. <tt>type(3)</tt>. (This is the general form for calling functions; we'll see lots more examples of functions later...)

In [None]:
a = 5
print(a, "is of", type(a))

### Integers

Integers represent whole numbers, as you would use when counting items, and can be positive or negative.

In [None]:
i = -7
j = 123
print(i, j)

### Floats

Floating point numbers, often simply referred to as <tt>float</tt>s, are numbers written with a decimal point, e.g. 2.1, 999.998, -0.000004 etc. The value 2.0 would also be interpreted as a floating point number, but the value 2, without the decimal point, will not; it will be interpreted as an integer.

In [None]:
x = 3.14159
y = -42.3
print(x * y)

### Strings

Strings represent text, i.e. "strings" of characters. They can be delimited by single quotes <tt>'</tt> or double quotes <tt>"</tt>, but you have to use the same delimiter at both ends. Unlike some programming languages, such as Perl, there is no difference between the two types of quote, although using one type does allow the other type to appear inside the string as a regular character.

In [None]:
s = "ATGTCGTCTACAACACT"
t = 'Serine'
u = "It's a string with apostrophes"
v = """A string that extends
over multiple lines"""
print(v)

### Booleans

Boolean values represent truth or falsehood, as used in logical operations, for example. Not surprisingly, there are only two values, and in Python they are called <tt>True</tt> and <tt>False</tt>.

In [None]:
a = True
b = False
print(a, b)

### The <tt>None</tt> object

The None object is a special built-in value which can be thought of as **representing nothingness or that something is undefined**. For example, it can be used to indicate that a variable exists, but has not yet been set to anything specific.

In [None]:
z = None
print(z)
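
To tie the data types in this section together, here is a short sketch that applies the <tt>type</tt> function to one value of each kind. The values themselves are arbitrary.

```python
print(type(5))       # <class 'int'>
print(type(5.0))     # <class 'float'>
print(type("5"))     # <class 'str'>
print(type(True))    # <class 'bool'>
print(type(None))    # <class 'NoneType'>
```

Note that `5`, `5.0` and `"5"` look similar but have three different types.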

## Arithmetic

Python supports all the standard arithmetical operations on numerical types, and mostly uses a similar syntax to several other computer languages:

In [None]:
x = 4.5
y = 2

print('x', x, 'y', y)
print('addition x + y =', x + y) 
print('subtraction x - y =', x - y) 
print('multiplication x * y =', x * y) 
print('division x / y =', x / y) 

In [None]:
x = 4.5
y = 2

print('x', x, 'y', y)
print('division x / y =', x / y)
print('floored division x // y =', x // y) 
print('modulus (remainder of x/y) x % y =', x % y) 
print('exponentiation x ** y =', x ** y)
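
The following sketch, using arbitrary values, shows how the type of the result depends on the operator and on the types of the operands:

```python
print(7 / 2)     # 3.5   true division always gives a float
print(4 / 2)     # 2.0   ... even when the result is a whole number
print(7 // 2)    # 3     floored division of two integers gives an integer
print(7 % 2)     # 1     the remainder of 7 divided by 2
print(4.5 // 2)  # 2.0   if either operand is a float, so is the result
print(2 ** 10)   # 1024
```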

There are a few shortcut assignment statements that make modifying variables directly faster to type:

In [None]:
x = 3
x += 1 # equivalent to x = x + 1
x

In [None]:
x = 2
y = 10
y *= x
y

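These shortcut operators are available for all the arithmetic operators (e.g. `-=`, `/=`, `//=`, `%=`, `**=`), but not for the logical operators <tt>and</tt>, <tt>or</tt> and <tt>not</tt>.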
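
A brief sketch of some of the other shortcut assignments, applied one after another to the same variable (the starting value is arbitrary):

```python
x = 10
x -= 3    # same as x = x - 3,   x is now 7
x *= 2    # same as x = x * 2,   x is now 14
x //= 4   # same as x = x // 4,  x is now 3
x **= 2   # same as x = x ** 2,  x is now 9
x %= 5    # same as x = x % 5,   x is now 4
print(x)
```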

## Comments

When you are writing a program it is often convenient to annotate your code to remind you what you intended it to do. In programming these annotations are known as _comments_. You can include a comment in python by prefixing some text with a <tt>#</tt> character. All text following the <tt>#</tt> will then be ignored by the interpreter. You can start a comment on its own line, or you can include it at the end of a line of code.

It is also often useful to temporarily remove some code from a script without deleting it. This is known as _commenting out_ some code.

In [None]:
print("Hi") # this will be ignored
# as will this
print("Bye")
# print("Never seen")

### Quick Exercises

- Assign numerical values to 2 variables, calculate the mean of these two variables and store the result in another variable. Print out the result to the screen.
- Print the type of your result

### Collections of values

As well as the basic data types we introduced above, very commonly you will want to store and operate on collections of values, and python has several _data structures_ that you can use to do this. The general idea is that you can place several items into a single collection and then refer to that collection as a whole. Which one you will use will depend on what problem you are trying to solve.

## Tuples

- Can contain any number of items
- Can contain different types of items
- __Cannot__ be altered once created (they are immutable)
- Items have a defined order

A tuple is created by using round brackets around the items it contains, with commas separating the individual elements.

In [None]:
a = (123, 54, 92) # tuple of 3 integers
b = () # empty tuple
c = ("Ala",) # tuple of a single string (note the trailing ",")
d = (2, 3, False, "Arg", None) # a tuple of mixed types

print(a)
print(b)
print(c)
print(d)

You can of course use variables in tuples and other data structures:

In [None]:
x = 1.2
y = -0.3
z = 0.9
t = (x, y, z)

print(t)

Tuples can be _packed_ and _unpacked_ with a convenient syntax. The number of variables used to unpack the tuple must match the number of elements in the tuple.

In [None]:
t = 2, 3, 4 # tuple packing
print('t is', t)
x, y, z = t # tuple unpacking
print('x is', x)
print('y is', y)
print('z is', z)
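
One common use of packing and unpacking together is swapping the values of two variables without a temporary variable. A minimal sketch, with arbitrary values:

```python
x = "first"
y = "second"

# The right-hand side is packed into a tuple (y, x),
# which is then unpacked into x and y
x, y = y, x

print('x is', x)
print('y is', y)
```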

## Lists

- Can contain any number of items
- Can contain different types of items
- __Can__ be altered once created (they are _mutable_)
- Items have a particular order

Lists are created with square brackets around their items:

In [None]:
a = [1, 3, 9]
b = ["ATG"]
c = []

print(a)
print(b)
print(c)

Lists and tuples can contain other lists and tuples, or any other type of collection:

In [None]:
matrix = [[1, 0], [0, 2]]
print(matrix)

## Manipulating tuples and lists

Once your data is in a list or tuple, python supports a number of ways you can access elements of the list and manipulate the list in useful ways, such as sorting the data.

Tuples and lists can generally be used in very similar ways.

### Index access

You can access individual elements of the collection using their _index_. Note that the first element is at **index 0**. Negative indices count backwards from the end.

In [None]:
t = (123, 54, 92, 87, 33)
x = [123, 54, 92, 87, 33]

print('t is', t)
print('t[0] is', t[0])
print('t[2] is', t[2])

print('x is', x)
print('x[-1] is', x[-1])

### Slices

You can also access a range of items, known as _slices_, from inside lists and tuples using a colon `:` to indicate the beginning and end of the slice inside the square brackets. **Note that the slice notation `[a:b]` includes positions from `a` up to _but not including_ `b`**.

In [None]:
t = (123, 54, 92, 87, 33)
x = [123, 54, 92, 87, 33]
print('t[1:3] is', t[1:3])
print('x[2:] is', x[2:])
print('x[:-1] is', x[:-1])
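
Because the end position is excluded, the slice `[a:b]` contains `b - a` items, and `[:n]` and `[n:]` split a list into two parts with nothing lost or repeated. A small sketch using the same list:

```python
x = [123, 54, 92, 87, 33]

part = x[1:4]
print(part)       # [54, 92, 87]
print(len(part))  # 3, i.e. 4 - 1

print(x[:2])      # [123, 54]
print(x[2:])      # [92, 87, 33]
```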

### Modifying lists

You can alter lists in place, but not tuples.

In [None]:
x = [123, 54, 92, 87, 33]
print(x)
x[2] = 33
print(x)

Tuples _cannot_ be altered once they have been created; if you try to do so, you'll get an error.

In [None]:
t = (123, 54, 92, 87, 33)
print(t)
t[1] = 4
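
If you do need a changed version of a tuple, one possible approach (a sketch, not the only way) is to copy its items into a list, alter the list, and build a new tuple from it. The name `as_list` is just an illustrative choice.

```python
t = (123, 54, 92, 87, 33)

as_list = list(t)    # a mutable copy of the tuple's items
as_list[1] = 4       # lists can be altered in place
t = tuple(as_list)   # a new tuple; the original was never modified

print(t)             # (123, 4, 92, 87, 33)
```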

You can add elements to the end of a list with <tt>append()</tt>

In [None]:
x = [123, 54, 92, 87, 33]
x.append(101)
print(x)

or insert values at a certain position with <tt>insert()</tt>, by supplying the desired position as well as the new value

In [None]:
x = [123, 54, 92, 87, 33]
x.insert(3, 1111)
print(x)

You can remove the first occurrence of a value with <tt>remove()</tt>

In [None]:
x = [123, 54, 92, 87, 33]
x.remove(123)
print(x)
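
Note that <tt>remove()</tt> searches by value, and only removes the first matching item. A short sketch with a list that contains a repeated value:

```python
x = [33, 54, 92, 33]
x.remove(33)   # only the first 33 is removed
print(x)       # [54, 92, 33]
```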

and delete values by index with <tt>del</tt>

In [None]:
x = [123, 54, 92, 87, 33]
print(x)
del x[0]
print(x)

## String manipulations

Strings are a lot like tuples of characters, and individual characters and substrings can be accessed and manipulated using operations similar to those we introduced above.


In [None]:
text = "ATGTCATTTGT"
print(text[0])
print(text[-2])
print(text[0:6])

Just as with tuples, trying to assign a value to an element of a string results in an error:

In [None]:
text = "ATGTCATTTGT"
text[0:2] = "CCC" 
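
Since a string cannot be altered in place, the usual alternative is to build a new string from pieces of the old one. A minimal sketch using slicing and the `+` operator, which joins strings together; `new_text` is an illustrative name.

```python
text = "ATGTCATTTGT"

# Replace the first two characters by joining a new prefix
# to the rest of the original string
new_text = "CCC" + text[2:]

print(new_text)  # CCCGTCATTTGT
print(text)      # ATGTCATTTGT (the original is unchanged)
```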

### Quick Exercises

- Create a list of DNA codons for proline - CCT, CCC, CCA, CCG
- Print the first item, print the last item
- Append the stop codon TAG
- Replace the first element with AUG

### Getting help

In [None]:
help(len)

In [None]:
help(list)

In [None]:
help(list.insert)

In [None]:
help(list.count)

## Dictionaries

Lists are useful in many contexts, but often we have some data that has no inherent order and that we want to access by some useful name rather than an index. For example, as a result of some experiment we may have a set of genes and corresponding expression values. We could put the expression values in a list, but then we'd have to remember which index in the list corresponded to which gene and this would quickly get complicated.

For these situations a _dictionary_ is a very useful data structure.

Dictionaries:

- Contain a mapping of keys to values (like a word and its corresponding definition in a dictionary)
- The keys of a dictionary are unique, i.e. they cannot repeat
- The values of a dictionary can be of any data type
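- The keys of a dictionary must be of an immutable type (e.g. strings, numbers or tuples, but not lists)
- Dictionaries are accessed by key, not by position (since Python 3.7 they remember the order in which entries were inserted, but earlier versions gave no such guarantee)

Each entry is written as `key: value`, with entries separated by commas inside curly brackets: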

In [None]:
dna = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}
print(dna)

You can access values in a dictionary using the key inside square brackets

In [None]:
dna = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}
print("A represents", dna["A"])
print("G represents", dna["G"])

You can check if a key is in a dictionary with the <tt>in</tt> operator, and you can negate this with <tt>not</tt>

In [None]:
dna = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}
"T" in dna

You can introduce new entries in the dictionary by assigning a value with a new key:

In [None]:
dna = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}
dna['Y'] = 'Pyrimidine'
print(dna)

You can change the value for an existing key by reassigning it:

In [None]:
dna = {'A': 'Adenine', 'C': 'Cytosine', 'T': 'Thymine', 'G': 'Guanine', 'Y': 'Pyrimidine'}
dna['Y'] = 'Cytosine or Thymine'
print(dna)

You can delete entries from the dictionary:

In [None]:
dna = {'A': 'Adenine', 'C': 'Cytosine', 'T': 'Thymine', 'G': 'Guanine', 'Y': 'Pyrimidine'}
del dna['Y']
print(dna)
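
The <tt>in</tt> and <tt>not in</tt> tests introduced earlier are a convenient way to confirm that an entry has gone. This sketch also shows that <tt>in</tt> looks at the keys of a dictionary, not its values.

```python
dna = {'A': 'Adenine', 'C': 'Cytosine', 'T': 'Thymine', 'G': 'Guanine', 'Y': 'Pyrimidine'}

print('Y' in dna)        # True
del dna['Y']
print('Y' in dna)        # False
print('Y' not in dna)    # True

print('Thymine' in dna)  # False: 'Thymine' is a value, not a key
```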

You can get a list of all the keys (in insertion order in Python 3.7 and later) using the inbuilt <tt>.keys()</tt> method

In [None]:
dna = {'A': 'Adenine', 'C': 'Cytosine', 'T': 'Thymine', 'G': 'Guanine', 'Y': 'Pyrimidine'}
print(list(dna.keys()))

And equivalently get a list of the values:

In [None]:
dna = {'A': 'Adenine', 'C': 'Cytosine', 'T': 'Thymine', 'G': 'Guanine', 'Y': 'Pyrimidine'}
print(list(dna.values()))

And a list of tuples containing (key, value) pairs:

In [None]:
dna = {'A': 'Adenine', 'C': 'Cytosine', 'T': 'Thymine', 'G': 'Guanine', 'Y': 'Pyrimidine'}
print(list(dna.items()))

## Program control and logic

A program will normally run by executing the stated commands, one after the other in sequential order. Frequently however, you will need the program to deviate from this. There are several ways of diverting from the line-by-line paradigm...

## Conditional execution

### The <tt>if</tt> statement

A conditional <tt>if</tt> statement is used to specify that some block of code should only be executed if some associated test is upheld, i.e. when a conditional expression evaluates to <tt>True</tt>.

In [None]:
x = -3

if x > 0:
  print("Value is positive")

elif x < 0:
  print("Value is negative")

else:
  print("Value is zero")

The general form of writing out such combined conditional statements is as follows:

<pre>
if conditionalExpression1:
    # codeBlock1

elif conditionalExpression2:
    # codeBlock2

elif conditionalExpressionN:
    # codeBlockN
    # ...any number of additional elif statements, then finally:

else:
    # codeBlockE
</pre>

### Comparisons and truth

With conditional execution the question naturally arises as to which expressions are deemed to be true and which false. For the python boolean values <tt>True</tt> and <tt>False</tt> the answer is (hopefully) obvious. Also, the logical states of truth and falsehood that result from conditional checks like “Is x greater than 5?” or “Is y in this list?” are also clear. When comparing values Python has the standard comparison (or relational) operators, some of which we have already seen:

|Operator | Description | Example |
|---------|-------------|---------|
|`==` | equality | `1 == 2  # False` |
|`!=` | inequality | `1 != 2  # True` |
|`<`  | less than | `1 < 2  # True` |
|`<=` | less than or equal to | `2 <= 2  # True` |
|`>`  | greater than | `1 > 2  # False` |
|`>=` | greater than or equal to | `1 >= 1  # True` |

Comparison operations can be combined using the logical operators <tt>and</tt> and <tt>or</tt>, for example to check if a value is within a range.

In [None]:
x = -5

if x > 0 and x < 10:
    print("In range A")
    
elif x < 0 or x > 10:
    print("In range B")
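
Python also lets you chain comparisons, so a range check can be written much as it would be in mathematics. The sketch below, with arbitrary values, shows that the chained form gives the same answer as the version using <tt>and</tt>. It also shows that the boundary values are excluded unless you use `<=`.

```python
x = 5
print(x > 0 and x < 10)  # True
print(0 < x < 10)        # True, the same test written as a chain

x = 10
print(0 < x < 10)        # False: 10 is not less than 10
print(0 < x <= 10)       # True: <= includes the upper boundary
```

In the example above, the values 0 and 10 themselves fall into neither range A nor range B, so nothing is printed for them.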

## Loops

When an operation needs to be repeated multiple times, for example on all of the items in a list, we avoid having to type (or copy and paste) repetitive code by creating a loop. There are two ways of creating loops in Python, the <tt>for</tt> loop and the <tt>while</tt> loop.

## The <tt>for</tt> loop

The <tt>for</tt> loop in Python iterates over each item in a sequence (such as a list or tuple) in the order that they appear in the sequence. What this means is that a variable (<tt>code</tt> in the example below) is set to each item from the sequence of values in turn, and each time this happens the indented block of code is executed again.

In [None]:
codeList = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']

for code in codeList:
    print(code)

A <tt>for</tt> loop can iterate over the individual characters in a string:

In [None]:
dnaSequence = 'ATGGTGTTGCC'

for base in dnaSequence:
    print(base)

And also over the keys of a dictionary: 

In [None]:
rnaMassDict = {"G":345.21, "C":305.18, "A":329.21, "U":302.16}

for x in rnaMassDict:
    print(x, rnaMassDict[x])
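
As an alternative sketch, the <tt>.items()</tt> method shown in the dictionaries section can be combined with tuple unpacking, so that the key and the value are both available in the loop without a separate lookup. The variable names `base` and `mass` are illustrative choices.

```python
rnaMassDict = {"G": 345.21, "C": 305.18, "A": 329.21, "U": 302.16}

# Each item is a (key, value) tuple, unpacked into two variables
for base, mass in rnaMassDict.items():
    print(base, mass)
```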

Any variables that are defined before the loop can be accessed from inside the loop. So for example to calculate the summation of the items in a list of values we could define the total initially to be zero and add each value to the total in the loop:

In [None]:
total = 0
values = [1, 2, 4, 8, 16]

for v in values:
    total = total + v
    print(total)

print(total)

## The <tt>while</tt> loop

In addition to the <tt>for</tt> loop that operates on a collection of items, there is a <tt>while</tt> loop that simply repeats while some statement evaluates to True and stops when it is False. Note that if the tested expression never evaluates to False then you have an “infinite loop”, which is not good.

In this example we generate a series of numbers by doubling a value after each iteration, until a limit is reached: 

In [None]:
value = 0.25
while value < 8:
    value = value * 2
    print(value)

print("final value:", value)

## More looping

### Using `range()`

If you would like to iterate over a numeric sequence then this is possible by combining the `range()` function and a for loop.

In [None]:
print(list(range(10)))

print(list(range(5, 10)))

print(list(range(0, 10, 3)))

print(list(range(7, 2, -2)))

Looping through ranges:

In [None]:
for x in range(8):
    print(x*x)

In [None]:
squares = []
for x in range(8):
    s = x*x
    squares.append(s)
    
print(squares)

### Quick Exercises

- Create an if..elif..else block that compares a variable containing your age to another variable containing another person's age, and prints a statement saying whether you are younger than, older than or the same age as that person.

- Create a list of the bases A, T, C, G
- Create a `for` loop to print every base of the sequence on a new line.

- Create a `for` loop to loop through the numbers 5 to 10, and print each value divided by 10

# Data input and output (I/O)

## Using files

Frequently the data we want to operate on or analyse will be stored in files, so in our programs we need to be able to open files, read through them (perhaps all at once, perhaps not), and then close them. 

We will also frequently want to be able to print output to files rather than always printing out results to the terminal.

Python supports all of these modes of operations on files, and provides a number of useful functions and syntax to make dealing with files straightforward.

### The with statement

In [None]:
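# fileObj will be closed automatically when leaving the with block
with open("data/datafile.txt") as fileObj:
    # enumerate() numbers the lines as we read them, starting from 1
    for i, line in enumerate(fileObj, start=1):
        # strip() removes the newline at the end of each line
        print(i, line.strip())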

### Mode modifiers

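The optional second argument to <tt>open()</tt> is a mode string that says how the file should be opened. A mode string can combine one of the basic modes below with extra modifier characters, which help deal with differences between files on different platforms. For example, `'rb'` reads a file in binary mode, in which end-of-line characters are not translated to the platform-specific setting.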

|Character | Meaning |
|----------|---------|
|`'r'` |	open for reading (default) |
|`'w'` |	open for writing, truncating the file first |
|`'x'` |	open for exclusive creation, failing if the file already exists |
|`'a'` |	open for writing, appending to the end of the file if it exists |
|`'b'` |	binary mode |
|`'t'` |	text mode (default) |
|`'+'` |	open a disk file for updating (reading and writing) |
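
The following sketch shows the `'w'`, `'a'` and `'r'` modes working together. To avoid touching any real files it works in a temporary directory created with the standard `tempfile` module. The file name `example.txt` and the variable names are assumptions made for illustration.

```python
import os
import tempfile

# Work in a temporary directory that is removed again afterwards
with tempfile.TemporaryDirectory() as tmpDir:
    path = os.path.join(tmpDir, "example.txt")  # hypothetical file name

    # 'w' creates the file (or empties it if it already exists)
    with open(path, "w") as fileObj:
        fileObj.write("first line\n")

    # 'a' keeps the existing content and adds to the end
    with open(path, "a") as fileObj:
        fileObj.write("second line\n")

    # 'r' is the default mode, written out here for clarity
    with open(path, "r") as fileObj:
        lines = fileObj.readlines()

print(lines)
```

Opening the same file with `'w'` a second time, instead of `'a'`, would have discarded the first line.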

### More info?

- The full Cambridge course: http://pycam.github.io/ 
- Bio-IT courses: https://bio-it.embl.de/
- EMBL chat - python & R channels
- EPUG & emblr
- Google & StackOverflow
- Useful packages: https://www.scipy.org/
- IDEs: spyder, pycharm, atom etc...