# Table of Contents
* [Introduction](#Introduction-to-Python)
* [Anaconda](#Anaconda)
* [Variables](#Variables)
* [Data types](#Data-types)
* [Assignment](#Assignment)
* [Comments](#Comments)
* [Printing data](#Printing-data)
* [Strings](#Strings)
* [Numeric types](#Numeric-types)
* [Conditional statements](#Python's-conditional-statement:-if---elif---else)
* [Loops](#Python's-loops)
* [Functions](#More-about-Python's-functions)
* [Lists](#Python's-lists)
* [Dictionaries](#Dictionaries)
* [Input Output](#Input/Output)
* [Additional practice](#Additional-practice)
* [Homework](#Homework)

# Introduction to Python

<http://www.python.org>

**Python** is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.


## Python tutorials

* Codecademy: <https://www.codecademy.com/learn/learn-python>
* More tutorials: <https://wiki.python.org/moin/BeginnersGuide/Programmers>

# Developer environments


## Anaconda 

<https://www.anaconda.com/>

* Python distribution for large-scale data processing, predictive analytics, and scientific computing
* Includes many libraries useful for biological data analysis
* Free
* Cross-platform
* Auto-updating
* Includes easy-to-use tools

### Documentation

<https://docs.anaconda.com/anaconda/>

### Download

Please select **Python3 version** for download

<https://www.anaconda.com/download/>

### Installation

Open the downloaded package and follow the instructions on your screen.

### Starting the Anaconda Navigator

Anaconda Navigator is a launcher that allows you to start the various components of the Anaconda package. It can be found in the *bin* directory of your Anaconda installation (`anaconda-navigator`). Depending on your installation, you may also find it in the standard places for applications on your computer, i.e. in the Start menu on Windows systems or in the Applications folder on macOS.

## Jupyter notebooks

<http://jupyter.org/>

Project Jupyter is an open-source web application that allows you to create and share *notebook documents* that contain live code, equations, visualizations and explanatory text. It is a great way to keep track of the programs you write, annotate them with comments (making it easier to follow and communicate what your program does) and - at the same time - write and execute them, generating output that you can view *on-the-fly*.

### Documentation
<http://jupyter-notebook.readthedocs.io/en/latest/>

### Start Jupyter with the Anaconda Navigator

In Anaconda Navigator click on the Jupyter Notebook icon. It opens a file browser window where you can open an existing notebook or create a new one.

Alternatively, if your *path* is set to include the Anaconda binaries, you can simply type at your commandline prompt:

```bash
jupyter notebook
```

### Create a new folder

Let's create a new folder which will contain materials for this course. Find the *New* button in the top right corner. Click it and select the *Folder* option from the drop-down menu. An `Untitled Folder` will be created. Now click the checkbox in front of the folder. On the top left corner click the *Rename* button and enter e.g. `PythonCourse` in the dialog box. Click the *Rename* button in the dialog box. Now click on the new folder to change to that directory. 

### Create a new notebook

Let's create our first Jupyter notebook. Click the *New* button in the top right corner and select the *Python 3* option from the drop-down menu. A new window will be opened in your browser. In the header, click on `Untitled` and type a new name for your notebook, for example `Session1`.

### Notebook cells

The notebook consists of a sequence of cells. A cell is a multiline text input field, and its contents can be executed by pressing Shift-Enter, or by clicking the *Run* button in the toolbar. The execution behavior of a cell is determined by the cell’s type. There are different types of cells, but here we will only use two of them: code cells and markdown cells. The default type is *Code*, but this can be changed with the drop-down menu in the toolbar.

### Document your work

We will use markdown cells for storing longer text information, e.g. the description of the project or detailed information about the following code cell. So let's change the type of the first cell to *Markdown* and type the following into the cell:
```
# My first python notebook
```
Now press <kbd>Shift</kbd>+<kbd>Enter</kbd> or click the *Run* button. This will evaluate the content of the current cell and format it accordingly. It also opens a new cell for us (or moves to the next cell if there already is one). You can also create new cells manually.

### Working with cells

You can add and delete cells using the *Edit* and *Insert* menus from the notebook toolbar.

### Write and execute code

Type in a new *Code* cell:
```python
print("Welcome to Python!")
```
Now evaluate the cell by pressing <kbd>Shift</kbd>+<kbd>Enter</kbd> or clicking the *Run* button. Under the cell you should now see the sentence `Welcome to Python!`. This is the *output* of your code.

In [2]:
print("Welcome to Python!")

Welcome to Python!


***Great!*** We wrote and executed our first python program in this class!

### Saving your notebook

Your notebook is saved automatically every two minutes. Still, do not forget to save it manually by clicking the disc icon in the toolbar after making changes and before exiting the notebook.

### Exiting your notebook

To exit a notebook, simply close the browser tab or window. It is good practice to shut down the kernel when you are done with the notebook; otherwise you will have to do it from the command line. Do not forget to save your work before exiting! Afterwards you can close the Anaconda Navigator.

### More about markdown cells

Markdown (<https://daringfireball.net/projects/markdown/>) is a text formatting syntax that allows you to write using an easy-to-read, easy-to-write plain text format which can then be converted to *pretty*, formatted text by applications that support it (such as Jupyter and many others). It can be used to indicate headers and URLs, emphasize text, and generate lists, among other things (see this text for an example). Note that the single <kbd>#</kbd> character transformed the text in our previous markdown cell into a *level 1* header: this is markdown in action! While learning the ins and outs of markdown is outside the scope of this course, make sure to have a look at this *cheat sheet* to see how it's used: <https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet>. As you can see, it's really quite simple!

## Visual Studio Code

Visual Studio Code (VS Code) is another environment you can use to write and test code. It can be installed from

<https://code.visualstudio.com/>

You can install handy extensions such as the **Python** extension, which adds syntax and error-checking support for Python, and **GitHub Copilot**, an AI code generation tool that analyzes your code context and suggests pieces of code that you may want to type next, saving you time on typing, documenting your code, etc. It is most useful if you know what you want and can evaluate the correctness and efficiency of the suggested code.

You can also work with Jupyter notebooks in VS Code.

# Variables

We use computer programs to process information. Typically, a computer program takes some information as input, carries out a number of operations that use that information and then reports some results to the user. Fundamental requirements for manipulating information are to be able to store, retrieve and change it. An object designed for this purpose is the *variable*.

# Data types

Variables can be of different *types*, reflecting the different kinds of data that we work with and the operations that we typically perform (and that make sense to perform) on these data. Typical data types available in most programming languages are strings, numbers, and objects that can be enumerated (e.g. lists). In Python and many other languages, additional/custom data types can be defined by the user. A typical example would be a `Student` data type that might be handy for handling administrative information about students in a university. 

### Strings and numbers

*Strings* are arbitrarily long sequences of arbitrary characters. Operations on strings that are built into Python include, e.g., scanning a string for the occurrence of *substrings*, splitting it into pieces or concatenating several strings together. Numbers are combinations of digits, and in Python 3 they can more specifically be divided into three distinct data types: *integers*, *floating point numbers*, and *complex numbers*. Standard operations implemented for numeric data types are arithmetic operations such as addition, subtraction and multiplication. Note, however, that the way these operations are defined and implemented depends on the specific type of the number in question.

### Type checking

When trying to execute a command, Python checks the type of the data, and based on that, it decides which implementation to use for the requested operation (multiplication is different for integers and complex numbers!). Python also checks whether the types of variables are consistent with the operations that the user wants to perform on them. If it detects an inconsistency, Python quits execution, giving an informative error message. For example, you can *not* multiply a string with another string as this operation is not defined for two strings (you *can* multiply a single string with an integer though; can you guess what this operation returns?).

In [3]:
"A" * "3"

TypeError: can't multiply sequence by non-int of type 'str'

In [None]:
"A" * 3

### Common data types in Python

* `str`: A string (sequence of characters surrounded by single or double quote symbols) QUOTES!
```python
"ACGT-GGTC"
'-4.56'
"%78*@@@-#R!C"
```

* `int`: An integer (integral number) NO QUOTES!
```python
10
348203032920
```
* `float`: A floating point number (number with integral and fractional part, separated by dot) NO QUOTES!
```python
2.0
3.141592653589793
```

* `bool`: A boolean (variable containing true/false value) NO QUOTES! 
```python
True
False
```

* `list`: A list (comma-separated sequence of objects enclosed in square brackets) They don't need to be of the same kind!
```python
[0, 1, 2, 3, 4]
["A", "C", "T", "G"]
[0, "A", 1, "B"]
```

* `dict`: A dictionary (set of key:value pairs, separated by commas and enclosed in curly brackets)
```python
{1:2.1243, 2:8.98398, 3:-0.0045}
{"John": 30, "Adam": 29}
```
The indicated names for each data type (e.g. `str`) correspond to how they are called inside Python. Note that a few more basic data types are implemented in Python, but for now we will focus only on these common ones.
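
As a quick check, the built-in `type()` function reports the data type of any value. The sketch below applies it to one example value per type from the list above; the variable names are only illustrative.

```python
# Ask Python for the type of one example value per data type
seq_type = type("ACGT-GGTC")
int_type = type(10)
float_type = type(2.0)
bool_type = type(True)
list_type = type([0, "A", 1, "B"])
dict_type = type({"John": 30, "Adam": 29})

print(seq_type)    # <class 'str'>
print(int_type)    # <class 'int'>
print(float_type)  # <class 'float'>
print(bool_type)   # <class 'bool'>
print(list_type)   # <class 'list'>
print(dict_type)   # <class 'dict'>
```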

## Variable naming

Variable names must start with a letter or an underscore (<kbd>_</kbd>) and may contain letters, digits and underscores. Spaces are not allowed in variable names.

Python is case sensitive, so `var1` and `Var1` are two different variables.

* Valid variable names: `x`, `param1`, `file_name`
* Invalid variable names: `1variable`, `max x`, `%&#!`

In [None]:
var1 = 23
Var1 = "A"
print(f'var1={var1} Var1={Var1}')
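
If you are unsure whether a name follows these rules, one option is the string method `str.isidentifier()`. The sketch below applies it to some of the names listed above; the variable names are illustrative. Note that it only checks the naming rules, so reserved words such as `for` pass the check even though they cannot be used as variable names.

```python
# Check candidate variable names against Python's naming rules
valid_name = "file_name".isidentifier()
starts_with_digit = "1variable".isidentifier()
contains_space = "max x".isidentifier()

print(valid_name, starts_with_digit, contains_space)  # True False False
```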

## Variables in Jupyter notebooks

**IMPORTANT**: All variables and functions defined in one cell of a notebook are accessible from other cells of the same notebook. However, for a variable defined within a cell to be accessible, the cell containing it **has to be evaluated**. Moreover, **the order of evaluations matters**. That is, you can move around in the notebook and reevaluate cells, and the variables will follow the order of evaluation, not the order in which the cells occur in the notebook.

# Assignment

Values are stored inside variables with the assignment operator: <kbd>=</kbd>

In your program you use variable names to access the value that has been stored into that variable. Here are some examples of assigning various types of values to variables:

```python
# assignment examples
my_string = "ACTG"
my_int_number = 3
my_float_number = 3.14
my_boolean = True
# reminder: this is not an assignment!
my_variable == 3   # this actually tests whether the value inside `my_variable` equals 3
```

In [None]:
my_variable == 3

In [None]:
my_variable = 5
print(my_variable == 3)
print(my_variable)

Assignments can be more complex. For instance, you can assign an entire expression to a variable (don't forget the order of operations!): 

```python
# math expression
result = 2.4 * 5 / 0.001 + 3.4
```

...or the result of a function evaluation (we will look at that more carefully in a later session, here we use a `built-in` function):

```python
# result of a function evaluation
counts = [3, 4, 5]
my_sum = sum(counts)
```

In [None]:
counts = [3, 4, 5]
my_sum = sum(counts)
print(f'Sum of elements of {counts} is {my_sum}')
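
The reminder about the order of operations applies to any expression on the right-hand side of an assignment. A minimal sketch with illustrative variable names: multiplication and division are evaluated before addition, operators of equal priority are evaluated from left to right, and parentheses override both.

```python
without_parentheses = 2 + 3 * 4    # multiplication first: 2 + 12
with_parentheses = (2 + 3) * 4     # parentheses first: 5 * 4
left_to_right = 100 / 10 / 2       # (100 / 10) / 2

print(without_parentheses)  # 14
print(with_parentheses)     # 20
print(left_to_right)        # 5.0
```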

An expression on the right side of an assignment is always evaluated *before* the assignment. This is why it is perfectly valid (and common!) to use the variable that you are assigning to inside the expression (given that it has been previously defined). For example:

In [None]:
x = 1
print(f'x = {x}')
x = x + 1
print(f'Now x = {x}')

Once defined, variables can be used in your expression/functions. During code execution, Python will substitute the values that are stored in the corresponding variables to compute results:

```python
x = 1.2
y = 3
result = x / y
```

Variables can be reassigned:
```python
x = 1.2
# value of x is 1.2

x = -33.3
# value of x is now -33.3
```

And a special feature of Python is that it allows us to use the same variable name to store different types of objects **at different times**:

In [None]:
x = 1.2
print(f'Value of x is {x}')

x = "String instead of a number"
print(f'Value of x is now {x}')


# Comments

Lines starting with the <kbd>#</kbd> character are interpreted as *comments*. During execution, they are ignored by Python. Comments can thus be used to annotate your code. It is good practice and extremely helpful to use comments to explain what each part of a program is doing. Comments make the code easier to read and understand, especially when reading one's old code or the code of another person.

# Printing data

One of the most basic functions implemented in Python (and many other programming languages) is the **`print()`** *function*. It is used to write text to a user interface, like this:

```python
result = 3 / 2
print(result)
# Output
1.5
```

If you want to write out multiple items at the same time, separate them by commas within the `print()` function. When *printing* them to the screen, they will be separated by spaces, similar to the words of a sentence.

```python
result = 3 / 2
print("My result is", result)
# Output
My result is 1.5

string1 = "My"
string2 = "result"
string3 = "is"
print(string1, string2, string3, result)
# Output
My result is 1.5
```

In [None]:
result = 3 / 2
print("My result is\t", result)
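
The separator that `print()` puts between items can be changed with its optional `sep` parameter. Alternatively, an f-string (already used in earlier cells) builds the whole text first. A short sketch with illustrative variable names:

```python
result = 3 / 2

print("My result is", result)             # My result is 1.5
print("My result is", result, sep=": ")   # My result is: 1.5

# An f-string inserts the value of `result` directly into the text
message = f"My result is {result}"
print(message)                            # My result is 1.5
```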


# Strings

Strings are sequences of characters (letters, digits, spaces, special characters). Strings should be surrounded by quotation marks, which can be either single or double quotes, but not mixtures of the two. While both of them are okay to use, it is good practice to be consistent about which quotation mark type to use. However, it can be useful/convenient to use one type of quote even if you're typically using the other. Can you guess why/when?

```python
string1 = "test string 1"
string2 = 'test string 2'
# wrong
string3 = "this is not good'
string4 = 'this isn't good either'
```

Characters in a string are associated with their positions (*indices*). You can *access* individual characters by their indices. 

**IMPORTANT:** The first element of a string (or list, as you will see later) is found at index `0`.

```python
my_string = "test string"
# print first and third element of the string
print(my_string[0], my_string[2])
# t s
```
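
Because counting starts at `0`, the last character of a string sits at index `len(my_string) - 1`. The sketch below (variable names are illustrative) shows this; asking for an index beyond that raises an `IndexError`.

```python
my_string = "test string"

first_char = my_string[0]
last_index = len(my_string) - 1     # the string has 11 characters
last_char = my_string[last_index]

print(first_char, last_index, last_char)  # t 10 g
```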

## String manipulations

As mentioned above, a data type does not only say that a given piece of data is of a certain kind, it is also associated with a set of operations that can be performed on objects of that kind. These operations are referred to as *methods* of that data type. In Python, to apply a method to an object, we generally use the following syntax: `object_name.method_name`, i.e. the object and method names separated by a dot (<kbd>.</kbd>).

Below we will introduce three commonly used methods implemented for `str` objects. Take a look for a minute at the expression below and make sure you understand it. What is the function name? Which are the parameters? Which parameters are necessary and which are optional?

### Find substring

`str.find(sub[, start[, end]])`

Returns the lowest index in the string `str`, where substring `sub` is found, starting from index `start` and ending (not including) `end`. Returns -1 if sub is not found.

Example:

```python
my_string = "ACTGACTG"
print(my_string.find("TGA"))
# 2
```
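
The optional `start` argument and the `-1` return value can be seen in this small sketch (variable names are illustrative):

```python
my_string = "ACTGACTG"

first_hit = my_string.find("TG")       # search the whole string
second_hit = my_string.find("TG", 3)   # start searching at index 3
no_hit = my_string.find("GGG")         # substring does not occur

print(first_hit, second_hit, no_hit)   # 2 6 -1
```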

### Substitute characters

`str.replace(old, new[, count])`

Returns a copy of the string with all occurrences of substring `old` replaced by `new`. If the optional argument `count` is given, only the first `count` occurrences are replaced.

Example:

```python
my_dna = "ACTGACTG"
print(my_dna)
# ACTGACTG

my_rna = my_dna.replace("T", "U")
print(my_rna)
# ACUGACUG
```
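
A short sketch of the optional `count` argument (variable names are illustrative). Note that `replace` returns a new string and leaves the original unchanged:

```python
my_dna = "ACTGACTG"

first_only = my_dna.replace("T", "U", 1)   # replace only the first "T"

print(first_only)  # ACUGACTG
print(my_dna)      # ACTGACTG (the original string is not modified)
```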

### Concatenate strings

This operation is applied not to one, but to multiple strings. Strings can be joined either with the <kbd>+</kbd> operator or with the `str.join()` method (which looks more complicated but performs better when joining many strings!):

`str.join(sequence)`

Returns a string in which the string elements of `sequence` have been joined by the `str` separator.

Examples:

```python
a = "ACTTCA"
b = "AGGTC"
c = a + b
print("concatenated string:", c)
# concatenated string: ACTTCAAGGTC

c = "".join([a, b])   # Think of what we are doing here!
print("concatenated string:", c)
# concatenated string: ACTTCAAGGTC

c = "-".join([a, b])  # Now we are using "-" as the separator
print("concatenated string:", c)
# concatenated string: ACTTCA-AGGTC
```

In [None]:
"\t".join(["a", "b"])

# Numeric types

In Python, numbers come in multiple flavors, the most commonly used are:

* integers (e.g. 5)
* floating point numbers (e.g. 5.5)

## Mathematical operations

Standard mathematical operations are denoted by the familiar operators:

Expression | Operation
--- | ---
`x + y` | sum of x and y	 
`x - y` | difference of x and y	 
`x * y` | product of x and y	 
`x / y` | division of x by y
`-x`    | negation
`x // y` | integer division of x by y
`x % y` | remainder of x / y
`abs(x)` | absolute value or magnitude of x	
`int(x)` | the integer representation of x 
`float(x)` | the floating point representation of x
`pow(x, y)` | x to the power y
`x ** y` | x to the power y

Example:
```python
x = 6.0
y = 4.1
z = x / (x + y)**3 - (y + y**2) / (x / y)
print(z)
# -14.28267645911243
```
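
The difference between the three division-related operators in the table is easiest to see with integers. A minimal sketch with illustrative variable names:

```python
x = 7
y = 2

true_division = x / y        # always returns a float
integer_division = x // y    # quotient rounded down to a whole number
remainder = x % y            # what is left over after integer division
power = x ** y
truncated = int(3.9)         # int() cuts off the fractional part

print(true_division, integer_division, remainder)  # 3.5 3 1
print(power, truncated)                            # 49 3
```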



# Mixing variables of different types

Executing the following code...
```python
x = "2"   # string
z = 3 + x # add numeric type to string type
```

...will return the following error (or similar):
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

This tells us that Python does not know how to add a string to an integer. It could be that you mean to convert the integer to a string and then concatenate the two strings, or that you mean to convert the string to the number that it specifies and then add the two numbers. You will have to specify precisely what you mean, e.g. by converting the string to an integer first if you mean to add two numbers:

```python
x = int(x) # convert x to type 'int'
z = 3 + x
print(z)
# 5
```

We could also convert it to a floating point number:
```python
x = float(x) # convert x to type 'float'
z = x + 3
print(z)
# 5.0
```

Why is there a difference in the output? What is Python doing here?
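
The other interpretation mentioned above, converting the number to a string and concatenating, can be sketched with the built-in `str()` function (variable names are illustrative):

```python
x = "2"                   # string

as_text = str(3) + x      # convert 3 to "3", then concatenate the strings
as_number = 3 + int(x)    # convert "2" to 2, then add the numbers

print(as_text)    # 32
print(as_number)  # 5
```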

# <span style="color:red"> Exercises: make sure that you are familiar with the concepts illustrated below. </span>

## Printing
* Define variables that hold the following values:
    * your name (string)
    * your age (integer)
    * the string `"My name is"`
    * the string `"I am"`
    * the string `"years old"`
    * the string `"."`
* Use these variables to print a text like this:

```
My name is Methusalem.
I am 969 years old.
```

## Mathematical operations
* Assign numbers `5`, `12`, `3.5` to variables `x`, `y` and `z`.
* Divide `x` by `y` and multiply the result by `z`. Print the result.
* Calculate the value of the following expression $\frac{x^2 - y}{(z + y)^3} - \frac{1}{x-y-z}$. Print the result.

## String manipulations: Substrings
* Assign string "GGATCTTTGAAACCGG" to a variable. 
* Find the position where substring "CTT" starts and assign it to a variable.
* Print the position where the substring occurs in the string.
* Print the character at this position.

## String manipulations: Replace & concatenate
* Create two string variables "GGACTT" and "ATAGATT".
* In the first string replace every "*T*" with a "*G*". In the second string replace every "*A*" with a "*C*"
* Concatenate these strings and print the result.


# Controlling program flow

## Important note: Indentation

One of the most obvious peculiarities of Python is that it enforces strict indentation rules for better readability. In particular: All statements within a dependent **code block** have to be indented relative to the code it depends on. Moreover, all statements within that dependent code block (i.e. within the same **scope**) have to be indented by the same amount! Here's what we mean:
    


<code>flow control statement:
    beginning of code_block dependent on above control flow statement
    ...
    end of code_block dependent on above control flow statement
    
code independent of the above control flow statement</code>

### Example

This is what Python wants:

```py
if weather == "nice":
    print("Go outside and enjoy the sun!")
else:
    print("Go to the spa?")
    changeWeather()
```

If you are messy, Python will reprimand you with an ```IndentationError```:

```py
# Don't do this:
while "I have better things to do...":
print("I don't care about indentation!")
```

```py
# Or this:
 print("I'm")
     print("a little")
    print("chaotic!")
```

## Python's conditional statement: if - elif - else

Conditional statements allow your program to take different paths, dependent on a **condition**. The condition is an expression that must evaluate to type Boolean, i.e. the result of the evaluation can be understood by the computer as either `True` or `False`. Different instructions can then be executed according to whether the condition evaluates to `True` or `False`. The basic structure for this type of flow control structure looks like this:

In [None]:
if 5 > 3:                    # '5 > 3' is the condition
  print("Can you see me?")   # is printed because 5 is bigger than 3

That is, the statement consists of the `if` keyword, the expression whose truth value is to be determined, and the block of instructions to be executed if the conditional expression evaluates to `True`.

But what if we want some instructions to be executed when the condition is `True`, and other instructions when the condition is `False`? In this case, we use `if` together with **`else`**. `else` is a catch-all: it does not come with a condition of its own, but is executed if and only if the condition of the `if` statement evaluates to `False`. For more complex control flow, there is also **`elif`** (short for `else if`), which is very helpful for testing several conditions in sequence without deeply **nesting** them.

Here is how you use **`if`**, **`elif`** and **`else`** together:


```py
if condition_1:
    statement_1      # |
    ...              # |=> code block
    statement_n      # |

elif condition_2:    # |
    code_block       # |
elif condition_3:    # |=> optional! 
    code_block       # |
else:                # |
    code_block       # |
```

### Conditional expressions - remind yourself how these expressions are evaluated.

Conditional expressions evaluate to a Boolean value, `True` or `False`. However, Python more generally assigns truth values to other types of expressions, including variables. For example:
  * numbers: 3, 0, ...        # all numbers except for 0 evaluate to `True`
  * strings: "False", ""      # all strings except the *empty string* "" evaluate to `True`, even the string "False"!
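
These truth values can be inspected with the built-in `bool()` function. A minimal sketch with illustrative variable names:

```python
truth_of_three = bool(3)
truth_of_zero = bool(0)
truth_of_false_string = bool("False")   # a non-empty string
truth_of_empty_string = bool("")

print(truth_of_three, truth_of_zero)                   # True False
print(truth_of_false_string, truth_of_empty_string)    # True False
```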

### Comparison operators

Perhaps the most widely used type of conditional expression is the comparison: Does something equal something else? Is this value bigger than what it should be? For such comparisons, there exists a set of comparison operators:

```py
x is y   # x and y refer to the same object
x == y   # x and y have the same value; this is what you generally want to use
x != y   # x does not have the same value as y
x > y    # x is greater than y
x >= y   # x is greater than or equal to y
x < y    # x is less than y
x <= y   # x is less than or equal to y
```
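
The difference between `is` and `==` shows up with two lists that hold the same values but are separate objects. A small sketch (the list names are illustrative):

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # same values, but a separate list object
list_c = list_a      # a second name for the very same object

same_value = list_a == list_b      # compares the contents
same_object = list_a is list_b     # compares object identity
alias_is_same = list_a is list_c

print(same_value, same_object, alias_is_same)  # True False True
```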

### Logical operators

Conditional expressions can also be further modified/chained with logical operators (recall boolean algebra):

```py
condition_1 or condition_2     # logical "or"; True if either condition_1 *or* condition_2 evaluates to True
condition_1 and condition_2    # logical "and"; True only if condition_1 *and* condition_2 evaluate to True
not condition                  # logical "not"; True only if condition evaluates to False
```
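
A minimal sketch of the three logical operators applied to comparisons (the value of `x` and the variable names are arbitrary choices for illustration):

```python
x = 5

in_range = x > 0 and x < 10       # both comparisons are True
out_of_range = x < 0 or x > 10    # neither comparison is True
not_zero = not x == 0             # 'x == 0' is False, so its negation is True

print(in_range, out_of_range, not_zero)  # True False True
```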

### Let's play around with an example.
To make it more interesting, we will use Python's input() function, which provides the program with whatever the user types at the command line. Note the usage of input():

In [None]:
x = int(input("Please enter an integer: "))   # The input() function asks for user input from the keyboard
if x < 0 or x > 1000000:
    x = 0
    print('Negative and very high values changed to zero')
elif x == 0:
    print('Nada')
elif x == 1:
    print('One')
elif x == 2 or x == 3 or x == 4:
    print('Few')
else:
    print('Many')
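
To try out all branches without typing input each time, the same chain of conditions can be wrapped in a function (functions are covered in more detail later). This is only a sketch: the name `describe_number` is hypothetical, and it returns the label instead of printing it.

```python
def describe_number(x):
    """Return the label that the if/elif/else chain above selects for x."""
    if x < 0 or x > 1000000:
        return 'Negative and very high values changed to zero'
    elif x == 0:
        return 'Nada'
    elif x == 1:
        return 'One'
    elif x == 2 or x == 3 or x == 4:
        return 'Few'
    else:
        return 'Many'

# Only the first condition that is True is executed
for value in [-5, 0, 1, 3, 50]:
    print(value, describe_number(value))
```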

## Python's loops

### The `for` loop

This type of loop is used most commonly when a specific piece of code is to be executed a fixed number of times, e.g. when **iterating** over the items in a list or the key-value pairs in a dictionary.

The basic syntax is:


```py
for name in sequence:
    statement_1
    statement_2
    statement_3
    # ...
    statement_n
```

where `name` is an arbitrary variable name and `sequence` is a list or similar structure that can be iterated over.

#### Examples

In [None]:
# Looping over an explicitly defined list
for i in [0, 1, 2, 3]:
    print(i + 1)

In [None]:
# Or another predefined list
words = ['cat', 'window', 'defenestrate']
for w in words:
    print(w, len(w))    # the len() function returns the length of a string or, when called on a 
                        # list/dictionary, the number of items or key-value pairs

In [None]:
# Looping over an iterable returned by a function
sum_i = 0
for i in range(1, 11):  # range([start,] stop [, step]) generates a sequence of integers from start to stop-1, with step size step (default: 1)
    print(i)            # |=> These lines are within the scope of the for loop
    sum_i = sum_i + i   # |
    
print('sum =', sum_i) # | This line is outside of the for loop scope

In [None]:
# Looping over a sequence of indices in a list
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):    # range(n) with a single argument n generates a sequence from 0 to n-1
    print(i, a[i])

Note that `range(m, n, s)` can also be called with three arguments. It then generates a sequence of integers from `m` to `n-1`, with `s` as the *step size*, e.g.:

```py
seq = range(0,5,2)   # generates 0, 2, 4
```
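
To see which integers a `range()` call produces, it can be converted to a list with `list()`. A short sketch of the one-, two- and three-argument forms, including a negative step (variable names are illustrative):

```python
up_to_five = list(range(5))           # stop only: start defaults to 0
from_one = list(range(1, 4))          # start and stop
every_second = list(range(0, 5, 2))   # start, stop and step
countdown = list(range(10, 0, -3))    # a negative step counts downwards

print(up_to_five)    # [0, 1, 2, 3, 4]
print(from_one)      # [1, 2, 3]
print(every_second)  # [0, 2, 4]
print(countdown)     # [10, 7, 4, 1]
```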

### The `while` loop

The code block in the `while` loop is executed if and as long as the associated condition remains `True` (use it carefully: if the condition never changes while the block of instructions is executed, you end up in an **infinite loop**!).

The basic syntax is:


```py
while condition:
    statement_1
    statement_2
    statement_3
    ...
    statement_n
```

#### Examples

In [None]:
# Print the square of an integer until the user enters `q` or `quit`
user_input = False
while user_input not in ['q', 'quit']:
    if user_input:
        square = int(user_input)**2
        print(square)
    user_input = input("Enter integer or 'q'/'quit' to exit: ")

```py
#DO NOT DO THIS! Infinite loop
while True:
    print("Now try to stop me!")
```
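
In contrast, a `while` loop terminates when its body changes something that the condition depends on. A minimal sketch (the starting value and variable names are arbitrary): `n` is halved in every iteration, so `n > 1` eventually becomes `False`.

```python
n = 100
steps = 0

while n > 1:
    n = n // 2          # n shrinks in every iteration...
    steps = steps + 1   # ...so the condition 'n > 1' eventually fails

print("Halved", steps, "times, final value:", n)  # Halved 6 times, final value: 1
```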

### Loop control with `break` and `continue`

The loop can be exited with the **`break`** keyword, and the program will jump to the first instruction after the loop:

In [None]:
# Exit loop under defined condition
for num in range(21):        # Iterate over sequence of integers from 0 to 20
    print(num)
    if num >= 7:
        print("I'm tired. Give me a rest!")
        break
print("Done!")

One can skip to the next iteration of the loop with the **`continue`** keyword.

### Analyze the following piece of code:

In [None]:
# Selective printing with 'continue'
actions_input = """
# load data
load

# do some processing
process

# apply filter
filter

# plot results
visualize

# done
stop
"""

actions = actions_input.split('\n')     # Split string into list by newline character '\n'
print(actions)                          # Prints list

for action in actions:
    if action == "" or action[0]=='#':    # If list element is empty string or starts with comment character...
        continue                          # ...skip to next iteration
    
    print(action)                         # If you're still around, print the string

## More about Python's functions

Functions are blocks of code that implement specific tasks that need to be repeatedly executed. They make the code more readable, ordered and reusable. Moreover, they reduce the chance of making mistakes when typing the same code multiple times and they make the code more **modular**. Once we have coded and tested a function thoroughly, we shouldn't need to worry about its correctness anymore.

In Python, functions can be **def**ined anywhere in the code, and the `return` statement is used to pass values back from a function:


```py
def my_function(arguments): # parameters can be passed to functions
    """documentation"""     # note the docstring: a string literal with the special triple-quote delimiter
    statement
    # ...                   # additional statements
    return expression
```

In [None]:
def get_square(x):
    """This function returns x^2."""
    return(x**2)

Once a function is defined, it can be **called**:

In [None]:
get_square(4)

A powerful feature of Python is that functions can be assigned to variables, as in the following example:

In [None]:
funcVar = get_square
print(type(funcVar))
print(funcVar(5))

You can access the documentation of a function by calling **`help()`** (also a function) on the function of your interest:

In [None]:
help(funcVar)

Note the last line of text. Do you see where it comes from?

Make sure to call the function with the right number of arguments or else you will get a `TypeError`:

In [None]:
get_square()

Note, however, that not all parameters may require a value. One can use `default` values for parameters, which will be used if these parameters are not specifically set when the function is called.

For example: 

In [None]:
def get_square(x = 1):
    """This function returns x^2."""
    return(x**2)

print("Using default parameter: ", get_square())
print("Over-writing the default parameter: ", get_square(10))

It gets more complicated with default parameters if the function has multiple arguments. Check out these function definitions and usage:

In [None]:
def add_two_numbers(a, b=10):
    return a+b

In [None]:
print("add_two_numbers(1): ", str(add_two_numbers(1)))
print("add_two_numbers(1, 20): ", str(add_two_numbers(1, 20)))
print("add_two_numbers(): ", str(add_two_numbers()))

In [None]:
def add_two_numbers(a=1, b):
    return a+b

In [None]:
def add_two_numbers(a=1, b=10):
    return a+b

In [None]:
print("add_two_numbers(): ", str(add_two_numbers()))
print("add_two_numbers(2): ", str(add_two_numbers(2)))
print("add_two_numbers(2, 20): ", str(add_two_numbers(2, 20)))
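The rule behind these cells is that parameters with a default value must come after parameters without one. The sketch below is one way to check this without interrupting the notebook. Using `compile()` to test the invalid definition is only a device for this illustration, not something you would do in normal code.

```python
def add_two_numbers(a, b=10):
    """Return a + b; b falls back to 10 when it is not supplied."""
    return a + b

# Positional and keyword arguments can both override the default.
with_default = add_two_numbers(1)        # b takes its default value 10
with_keyword = add_two_numbers(1, b=20)  # b is set explicitly by name

# 'a' has no default, so calling without arguments raises a TypeError.
try:
    add_two_numbers()
    missing_argument_error = None
except TypeError as err:
    missing_argument_error = type(err).__name__

# A parameter without a default may not follow one with a default.
# compile() checks the definition without actually defining the function.
bad_definition = "def add_two_numbers(a=1, b):\n    return a + b\n"
try:
    compile(bad_definition, "<example>", "exec")
    definition_error = None
except SyntaxError as err:
    definition_error = type(err).__name__

print(with_default, with_keyword, missing_argument_error, definition_error)
```

Catching the errors with `try`/`except` keeps the cell running, so all four results can be printed together.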

Also note that functions *always* return an object, even if no `return` statement is executed. In that case the **`None`** object is returned, which is frequently used to represent the absence of a value:

In [None]:
def printValue(value, precision=0):
    formatted_value = "{:.{}f}".format(value, precision)
    print(f'Print in function: The value is {formatted_value}')

result = printValue(10.1, 5)
print("Function's return value:", result)

### Recursions
In the body of a function a call to the function that is being defined can be made. 

Functions calling themselves represent a useful (and famous) programming paradigm called recursion.

See this example:

In [None]:
def order_of_magnitude(x):
    if x >= 10:
        return(order_of_magnitude(x / 10) + 1)   # the function calls itself with a smaller value
    else:
        return(1)

In [None]:
print("9:", order_of_magnitude(9))
print("10:", order_of_magnitude(10))
print("1000000:", order_of_magnitude(1000000))
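Every recursive function needs a *base case* that stops the recursion and a *recursive case* that moves towards it. As a rough sketch, here is a variant of the function above next to an equivalent `while` loop. The names `count_digits_recursive` and `count_digits_loop` are made up for this illustration, and both assume a non-negative integer as input.

```python
def count_digits_recursive(x):
    """Count the decimal digits of a non-negative integer recursively."""
    if x < 10:                                   # base case: a single digit
        return 1
    return count_digits_recursive(x // 10) + 1   # recursive case: drop the last digit


def count_digits_loop(x):
    """Same result as above, computed with a while loop."""
    digits = 1
    while x >= 10:
        x = x // 10        # integer division removes the last digit
        digits += 1
    return digits


for number in (9, 10, 1000000):
    print(number, count_digits_recursive(number), count_digits_loop(number))
```

Integer division (`//`) is used here instead of `/` so that the values stay integers throughout.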

### <span style="color:red"> Practice questions </span>

1. Read an integer from the command line (hint: input() function). Double the number if it is positive. Halve the number if it is negative. Print the result.

2. Define a list of at least ten strings. Print every second string.

3. Read numbers from the command line. Add them up until 0 is entered. Afterwards print the sum.

4. Read numbers from the command line. Sum them up if they are positive, ignore them if they are negative and stop when 0 is entered. Use continue and break!

5. Write a function which returns the absolute difference between two numbers a and b: |a - b|. Call the function for five combinations for a and b.

Try to solve at least one of the following, if you can (yes, they are a bit tougher!):
* Write a function that computes and returns the factorial of any positive integer. Use either a `for` loop, a `while` loop, or recursion. Call the function for 13 and print the returned factorial to the console.
* Write a function that decomposes any positive integer into its prime factors and returns them as a list. Call the function for 12341234 and print the list of prime factors to the console.

# Python's lists

The `list` is a very versatile data type. In Python, lists are denoted by square brackets, between which a comma-separated list of values, typically referred to as ***items***, can be placed. 

It is important to remember that in Python, the **items of a list need not be of the same type**.

In [None]:
# Lists
my_list = []                    # empty list
my_list = [1, 2, 3, 4]          # all items of type 'int'
my_list = ['a', 2, 3.6, True]   # all items of different types (str, int, float, bool)
my_list = ['M', 'o', 'n', 't', 'y', ' ', 'P', 'y', 't', 'h', 'o', 'n']   # all items of type 'str'

In [None]:
my_new_list = [1, [2, 3], {"a":2}, 23, 4]

In [None]:
my_new_list[1][1]

## Accessing list items

To access items of a list, append square brackets to the variable holding the list and place an index between the square brackets. This expression will return the list item available at that position, with its specific datatype.

<code>element = my_list[index]  # returns item at position 'index' and assigns it to variable 'element'</code>

**Do not forget:** List indices in Python start at zero, i.e. the first element in the list is at index 0.

In [None]:
my_list

In [None]:
my_list[0]  # returns *first* element

In [None]:
my_list[6]  # returns 7th element

It is also possible to access **multiple items** at the same time, an operation referred to as ***slicing***. This operation has the following syntax:

<code>slice = my_list[start_index:stop_index]</code>

This again returns a list, containing the elements of `my_list` from index `start_index` up to **but not including** index `stop_index`.

In [None]:
my_list

In [None]:
my_list[6:10]  # returns slice of 4 items (10-6), 
#starting from the 7th item and ending with the 10th item

In [None]:
my_list[0:1]  # returns list of length 1

In [None]:
my_list[0:0]  # returns empty list

In [None]:
#negative indices also work, they indicate elements relative to the end of the list
#just to check
print(my_list[-1])
#and now get a slice
my_list[-12:-7]

In [None]:
my_list[3:-3]

In [None]:
my_list[4:3]

In [None]:
my_list[6:-7]

Finally, note that you can also omit the start or end position of a slice (or even both). In that case, the returned slice will start at the first (omitting start index) or end with the last element (omitting the end index). Consequently, omitting both will return a **copy** of the entire list (which is sometimes useful):

In [None]:
my_list[:10]  # returns slice from first to tenth element

In [None]:
my_list[2:]  # returns slice from third to last element

In [None]:
my_list[:]  # returns copy of whole list

**IMPORTANT:** Have a good look at the following scheme, which summarizes the rules for accessing and slicing lists. It is important to remember these rules, as working with lists is very common and wrong use of indices can lead to problems that are difficult to track down!

![session1_list_index.png](attachment:session1_list_index.png)
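As a compact recap of the indexing and slicing rules, the following sketch rebuilds the "Monty Python" list used earlier in this section (here under the assumed name `letters`) and applies each rule once.

```python
letters = list("Monty Python")   # 12 items, indices 0..11 or -12..-1

first = letters[0]               # first item
last = letters[-1]               # last item, same as letters[11]
middle = letters[6:10]           # items at indices 6, 7, 8 and 9
from_end = letters[-12:-7]       # same items as letters[0:5]
empty = letters[4:3]             # start is not before stop: empty list
whole_copy = letters[:]          # new list with the same items

print(first, last, middle, from_end, empty)
print(whole_copy == letters, whole_copy is letters)
```

The last line compares content (`==`) and identity (`is`): the copy has the same items but is a separate list object.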


## Updating elements in Lists

You can update single or multiple elements of lists by assigning a new value to an individual item or list slice:

```py
my_list[index] = value                      # updating one element
my_list[start_index:stop_index] = values    # updating multiple elements
```

In [None]:
# Update one item at a time
my_list = ['B','i','o','l','o','g','y']
my_list[0]='Z'
my_list[1]='o'
print(my_list)

In [None]:
# Update multiple elements at a time
my_list = ['B','i','o','l','o','g','y']
my_list[0:2] = ['G','e']
print(my_list)

## Deleting elements from Lists

To remove a list element, use the **del** statement on the index or indices that you wish to delete.

In [None]:
# Delete multiple elements at a time
my_list = [1,2,3,4,5]
del my_list[1:4]
print(my_list)

## Other list operations
There are several other built-in methods and functions that can be applied to any list. Some of the most useful ones are listed in the following table:

<table align="left" style="width:80%">
  <tr>
    <th>Method</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>my_list.append(x)</code></td>
    <td>add <em>x</em> to the end of <em>my_list</em></td>
  </tr>
  <tr>
    <td><code>my_list.extend(L)</code></td>
    <td>add all items of list <em>L</em> to the end of <em>my_list</em></td>
  </tr>
  <tr>
    <td><code>my_list.insert(i,x)</code></td>
    <td>insert <em>x</em> at position <em>i</em> of <em>my_list</em></td>
  </tr>
  <tr>
    <td><code>my_list.remove(x)</code></td>
    <td>remove from <em>my_list</em> the first item equal to <em>x</em></td>
  </tr>
  <tr>
    <td><code>my_list.count(x)</code></td>
    <td>count the occurrences of <em>x</em> in <em>my_list</em></td>
  </tr>
  <tr>
    <td><code>my_list.sort()</code></td>
    <td>sort <em>my_list</em> in place</td>
  </tr>
  <tr>
    <td><code>len(my_list)</code></td>
    <td>return the number of elements in <em>my_list</em></td>
  </tr>
</table>

In [None]:
my_list = []
my_list.append(1) # element
print(my_list)
my_list.append(1) # duplicates are allowed
print(my_list)
my_list.append([1]) # but pay attention to this! Python allows heterogeneous types in a list.
print(my_list)

In [None]:
my_list = [1, 2, 3]
my_list.extend([4, 5, 6]) #list
print(my_list)

In [None]:
my_list = [1, 2, 4]
my_list.insert(2, 3)
print(my_list)

In [None]:
my_list = [1, 2, 3, 4, 5]
my_list.remove(4)
print(my_list)

In [None]:
my_list = ["PASSED", "FAILED", "PASSED", "PASSED", "FAILED", "PASSED"]
count = my_list.count("PASSED")
print(count)

In [None]:
my_list = [5, 4, 3, 2, 1]
my_list.sort()
print(my_list)

In [None]:
my_list = [1, 2, 3, 4, 5]
result = len(my_list)
print(result)

Note that **strings can also be interpreted as a list of characters** (or strings of length 1). Therefore, some (but not all) of these operations also work on strings:

In [None]:
my_str = "Creative choice of words"
length = len(my_str)
ohs = my_str.count('o')
my_slice = my_str[0] + my_str[10] + my_str[12:14] + my_str[16]
print(length, ohs, my_slice)
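The operations that do *not* carry over to strings are the ones that modify the object in place, because strings are immutable. The sketch below shows what happens when you try, and one possible workaround via a temporary list.

```python
my_str = "Creative choice of words"

# Read-only operations work on strings just as on lists.
print(len(my_str), my_str.count("o"), my_str[0:8])

# In-place changes do not: item assignment raises a TypeError.
try:
    my_str[0] = "c"
    assignment_error = None
except TypeError as err:
    assignment_error = type(err).__name__
print(assignment_error)

# List-only methods such as append() do not exist on strings.
has_append = hasattr(my_str, "append")
print(has_append)

# Workaround: convert to a list, modify it, and join it back together.
chars = list(my_str)
chars[0] = "c"
lowercase_start = "".join(chars)
print(lowercase_start)
```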

### <span style="color:red"> If you need a bit more practice, try these exercises: </span>

**1.** Define a function that receives a list as an argument and replaces the highest value in that list by twice its value.

In [8]:
def multiplyMaxByTwo(my_list):
    return [e if e!=max(my_list) else e*2 for e in my_list]

In [None]:
#example
result = multiplyMaxByTwo([1,2,5,3,2])
print(result)
[1,2,10,3,2]

[1, 2, 10, 3, 2]


**2.** Define a function that receives a list as an argument and deletes the lowest value in that list.

In [10]:
def deleteMin(my_list):
    return [e for e in my_list if e!=min(my_list)]

In [None]:
#example
result = deleteMin([1,2,5,3,2])
print(result)
[2,5,3,2]

[2, 5, 3, 2]


**3.** Define a function that receives two lists as arguments and returns a new list with the sum of the two lists per index. Check if the lists have equal length. In case of different length return an error message.

In [19]:
def sumLists(my_list1,my_list2):
    if len(my_list1)!=len(my_list2): raise ValueError("Input lists must have the same size.")
    return [my_list1[i]+my_list2[i] for i, _ in enumerate(my_list1)]

In [None]:
#example
result = sumLists([1,2,3],[4,5,6])
print(result)
[5,7,9]

[5, 7, 9]



**4.** Define a function that receives a list as an argument and returns a reversed list.

In [21]:
def reverseList(my_list):
    return my_list[::-1]

In [None]:
#example
result = reverseList([1,2,3,4,5,6])
print(result)
[6,5,4,3,2,1]

[6, 5, 4, 3, 2, 1]



**5.** Define a function that receives a list as an argument and returns a list with *only* the even numbers.

In [23]:
def evenNumbers(my_list):
    return [e for e in my_list if e%2==0]

In [None]:
#example
result = evenNumbers([1,2,3,4,5,6])
print(result)
[2,4,6]

[2, 4, 6]



**6.** Define a function that receives a list as an argument and returns a list containing [min,max,sum,average].

In [25]:
def computeStatistics(my_list):
    return [min(my_list), max(my_list), sum(my_list), sum(my_list)/len(my_list)]

In [26]:
#example
result = computeStatistics([1,2,3,4,5,6])
print(result)
[1,6,21,3.5]

[1, 6, 21, 3.5]



# Dictionaries

This class of data container is very useful for storing **"paired data"**. Consider, for example:

* gene or protein names and the corresponding sequence(s)
* gene or protein names and the corresponding annotation(s)
* restriction enzymes and their motifs
* etc

Accessing values in this type of data structure is similar to how we access data in lists, but instead of using an integer index we use a `key`.

```python
enzymes = { 'EcoRI':'GAATTC', 'AvaII':'GG(A|T)CC', 'BisI':'GC[ATGC]GC' }

enzymes = {
    'EcoRI' : 'GAATTC',
    'AvaII' : 'GG(A|T)CC',
    'BisI'  : 'GC[ATGC]GC'
}

print(enzymes['EcoRI'])
```

### Inserting in a dictionary 

Easy, we simply use assignment, e.g.

```python
enzymes['BamHI'] = 'GGATCC'
enzymes['HindIII'] = 'AAGCTT'
```

Let's have a look at what we have in the dictionary so far. 

### Looping through a dictionary

In [None]:
enzymes = {
    'EcoRI' : 'GAATTC',
    'AvaII' : 'GG(A|T)CC',
    'BisI'  : 'GC[ATGC]GC'
}
enzymes['BamHI'] = 'GGATCC'
enzymes['HindIII'] = 'AAGCTT'

for k in enzymes:
    print(k, ':\t', enzymes[k])
    
print("\nOr like this\n")

for k in enzymes.keys():
    print(k, ':\t', enzymes[k])
    
print("\nOr even like this\n")

for (k, v) in enzymes.items():
    print(k, ':\t', v)
    
print("\nOr with the keys sorted alphabetically")

for k in sorted (enzymes.keys()):
     print(k, ':\t', enzymes[k])   

### Removing items from the dictionary

We can do this in several ways, for example with the `del` statement or the `pop()` method:

In [None]:
enzymes.pop('HindIII')
for k in sorted (enzymes.keys()):
     print(k, ':\t', enzymes[k])   

What happens if we try to access a non-existent key-value pair? Python raises a `KeyError`.

There are a few ways to deal with this situation:

Testing whether the key exists before trying to access it:
```python
if 'XbaI' in enzymes:
    print(enzymes['XbaI'])
    
else:
    print("RE does not exist in dict")
```

Or using a default value with another access function:
```python
print(enzymes.get('EcoRI', "NA"))
print(enzymes.get('XbaI', "NA"))
```
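A third option, sketched here with a shortened copy of the enzyme dictionary, is to attempt the access and catch the `KeyError` if it occurs. The variable names `motif` and `missing_key` are only chosen for this example.

```python
enzymes = {'EcoRI': 'GAATTC', 'AvaII': 'GG(A|T)CC', 'BisI': 'GC[ATGC]GC'}

try:
    motif = enzymes['XbaI']          # this key does not exist
except KeyError as err:
    motif = None
    missing_key = err.args[0]        # the key that could not be found
    print("No entry for", missing_key)

# get() without a second argument returns None for a missing key
fallback = enzymes.get('XbaI')
print(motif, fallback)
```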

### Copying a dictionary

This is more tricky. Let's have a look, starting from our enzyme dictionary:

In [None]:
print("Here is the first dictionary")
for k in sorted (enzymes.keys()):
     print(k, ':\t', enzymes[k])   
        
enzymes2 = enzymes
print("\n")
print("This is the copy")
for k in sorted (enzymes2.keys()):
     print(k, ':\t', enzymes2[k])   

Now we change some entries in enzymes2:

In [None]:
enzymes2.pop("AvaII")
enzymes2['XbaI'] = 'TCTAGA'

print("Here's what we got in enzymes2")
for k in sorted (enzymes2.keys()):
     print(k, ':\t', enzymes2[k])   

print("\nAnd the original dictionary is")
for k in sorted (enzymes.keys()):
     print(k, ':\t', enzymes[k])   


The first dictionary has been modified as well! What happened?

This is a more general issue with mutable composite data types in Python, such as dictionaries and lists. When we assigned the dictionary *enzymes* to the variable *enzymes2*, Python did not copy anything; it simply made *enzymes2* refer to the same object in memory as *enzymes*. As a result, any change made through either variable is visible through both.

![session1_dict_copy.png](attachment:session1_dict_copy.png)
This is why we have to take special care when copying composite variables. One way to do this is to use the dictionary's `copy()` method:

In [None]:
enzymes = {
    'EcoRI' : 'GAATTC',
    'AvaII' : 'GG(A|T)CC',
    'BisI'  : 'GC[ATGC]GC',
    'BamHI' : 'GGATCC',
    'HindIII' : 'AAGCTT'
}
enzymes2 = enzymes.copy()

del enzymes2['HindIII']
enzymes2['XbaI'] = 'TCTAGA'

print("Here's what we got in enzymes2")
for k in sorted (enzymes2.keys()):
     print(k, ':\t', enzymes2[k])   

print("\nAnd the original dictionary is")
for k in sorted (enzymes.keys()):
     print(k, ':\t', enzymes[k])   
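To tell an alias from a real copy, you can compare object identity with `is`. This is a small sketch with a two-entry dictionary; the names `alias` and `clone` are made up for the illustration.

```python
enzymes = {'EcoRI': 'GAATTC', 'BamHI': 'GGATCC'}

alias = enzymes            # second name for the same dictionary
clone = enzymes.copy()     # new dictionary with the same key-value pairs

print(alias is enzymes)    # same object
print(clone is enzymes)    # different object
print(clone == enzymes)    # but equal content

alias['XbaI'] = 'TCTAGA'   # also visible through 'enzymes', not through 'clone'
print('XbaI' in enzymes, 'XbaI' in clone)
```

Note that `copy()` makes a *shallow* copy: values that are themselves lists or dictionaries are still shared between the original and the copy. The standard library function `copy.deepcopy()` copies such nested values as well.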

# Input/Output

Input/Output (**I/O** for short) is how we humans communicate with computers. Although we can provide input from the console, as we have done with the `input()` function, input typically comes from files or, more generally, from *data streams* such as databases. Data is *loaded*, i.e. copied (possibly in batches) from the relatively slow permanent storage location (e.g. a local *hard drive*) to your computer's fast *random access memory* (*RAM* for short), where it is processed and eventually copied or written back to a permanent storage location (i.e. the data is *saved*).

Files come in different flavors, or *formats*, typically indicated by their extension, which is a suffix such as `.mp3` at the end of a filename (note: this may be hidden depending on the settings of your operating system). Following convention, we will refer here to different file formats by abbreviations in all capitals, e.g. MP3. Typically the associated file extensions are written in lowercase and appended to a file's name via a <kbd>.</kbd>, as in `.mp3`.

Owing to the essential nature of *I/O* operations, Python has *built-in*, straightforward and easy-to-use functions for reading and writing data, so that we don't have to deal with the nitty-gritty details of shifting data between permanent and temporary memory.

The mechanisms that Python (and virtually all other programming languages) implemented to allow you to read data are in principle open to any kind of data, although many data types are not immediately intelligible by a "human reader". Thus, one can broadly distinguish two main categories of data:
* binary data
* human readable data

### Binary files

Binary files are *machine readable*, but when opened with a text editor, you will only see unintelligible, garbled text. This is e.g. the case for:
* *archive* and *compressed* files, such as TAR (`.tar`), ZIP (`.zip`), RAR (`.rar`) or GZIP (`.gz`)
* compressed or uncompressed data, media or container file types, such as MP3 (`.mp3`), WAV (`.wav`), PDF (`.pdf`) or MKV (`.mkv`)
* compiled programs, i.e. *machine code* or *bytecode* (an optimized representation of a program's code, common in *compiled* programming languages), such as executable (`.exe`) or library (`.lib`) files in Windows

### Human readable files

Human readable (also known as *text*, *plain text* or *flat*) data formats can be opened/viewed and modified with any text editor.

Format | Extension(s) | Description
--- | --- | ---
TXT | `.txt` / also commonly no extension | Plain unstructured text
CSV | `.csv` | Tables and matrices; fields/columns delimited by <kbd>,</kbd>
TSV | `.tsv` / `.tab` | Tables and matrices; fields/columns delimited by <kbd>Tab</kbd>
PY | `.py` | Python code
FASTA | `.fasta` / `.fa` | DNA or protein sequences 
XML | `.xml` | Represent schemas and arbitrary data structures
JSON | `.json` | Represent schemas and arbitrary data structures

**WARNING:** While you can also open text files with Microsoft Word and similar applications, these are word processors and are **not** regarded as text editors. It is recommended to use word processors only when editing their native file formats (e.g. those with extensions `.docx` and `.doc` for Microsoft Word), because due to *autoformatting* etc., writing or editing e.g. code with word processors often has unwanted *side effects*.

Many file formats are also specific subsets or relatives of the above-mentioned formats. For example, the file format for Jupyter Notebooks (`.ipynb`), such as used in this tutorial, is actually a subset of JSON. Similarly, the vector graphics format SVG is in fact a specific subset of XML, while HTML (used for websites) is also very closely related to XML (though not exactly a subset).

Several binary files are also just compressed versions of human readable files (or collections thereof). For example, since ~2007 the file formats of Microsoft Office documents are in fact just ZIP-compressed directories of (mostly) XML files. You can try this out by extracting a file with extension `.docx` and then exploring the resulting files with a text editor. Another similar example is the e-book format `.epub`.

## Opening a file

A file needs to be **open**ed before the system can do something with it.

In [None]:
#use the built-in function open to open the example file
myfile = open('Data/proteins.fasta','r')
print(myfile)

## File object

As you noticed, printing the variable *myfile* **did not** print the content of the file.  
*myfile* is a *file object*, basically the address where the file can be found.  
What you get printing *myfile* is 3 pieces of information:

* the **name** of the opened file
* the access **mode**
* the **encoding** which specifies how the bits in the file are converted into characters

The important bit is the **mode** argument: this specifies your intentions with the opened file.  
The possible modes are easy to guess:

* 'r'  = reading   (**default** one)
* 'w'  = writing
* 'a'  = append
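The following sketch shows the three pieces of information and the three modes in action. It works in a temporary directory created with the standard `tempfile` module so that no real data is touched; the file name `example.txt` and its content are assumptions made for this example.

```python
import os
import shutil
import tempfile

# Throw-away directory for this example (the tutorial itself uses 'Data/').
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")

handle = open(path, "w", encoding="utf-8")   # 'w' creates (or overwrites) the file
print(handle.name == path, handle.mode, handle.encoding)
handle.write("first line\n")
handle.close()

handle = open(path, "a", encoding="utf-8")   # 'a' adds to the end of the file
handle.write("second line\n")
handle.close()

handle = open(path)                          # no mode given: defaults to 'r'
default_mode = handle.mode
content = handle.read()                      # read the whole file as one string
handle.close()

shutil.rmtree(tmp_dir)                       # clean up the temporary directory
print(default_mode, repr(content))
```

Because the second `open()` used `'a'` rather than `'w'`, the first line is still there when the file is read back.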

Let's see how we can actually get the content of the file...

## Reading mode, 'r'

A few of the most important functions are:

**readline()**

Reads one line of the file and automatically moves the pointer (the position where we will read next in the file) to the start of the next line. Let's try this out!

In [None]:
#open the file
myfile = open('Data/proteins.fasta','r')

#get some lines
first = myfile.readline()
second = myfile.readline()

#print them
print('1st line:', first)
print('2nd line:', second)

## Reading a file with a *for* loop

The nice thing about the *file object* is that it is an *iterator*.

An iterator is an object that has a notion of how to move step by step through data. 

In case of a file iterator, this is done line-by-line. This means you can use a file iterator directly in a **for loop** and it will call the function readline() for you.

In [None]:
#open the file
myfile = open('Data/proteins.fasta','r')

#myfile is the iterator that returns a single line 
#and assign it to the variable 'myline'

for myline in myfile:
    print(myline)
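Each line returned by the iterator still ends with its newline character, and `print()` adds another one, so a loop like the one above typically prints an empty line after every line. A minimal sketch of one way around this follows. It uses `io.StringIO` as a stand-in for an opened file (an assumption made so that the example runs without `Data/proteins.fasta`), and the two records are invented.

```python
import io

# Stand-in for an opened text file; the content is invented for illustration.
fake_file = io.StringIO(">seq1 example protein\nMKV\n>seq2 another example\nMAL\n")

clean_lines = []
for line in fake_file:
    clean = line.rstrip("\n")    # drop the trailing newline character
    clean_lines.append(clean)
    print(clean)
```

Alternatively, `print(line, end="")` keeps the line unchanged and stops `print()` from adding its own newline.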

### close()
When you open a file, the operating system hands your program a *file handle*, a limited resource that stays in use until the file is closed. On some systems (for example Windows), a file that is open in one program also cannot be renamed or deleted by other programs. You should therefore always close a file once you are done using it. Let's apply some of these notions.

In [None]:
#open the file
myfile = open('Data/proteins.fasta','r')
print(myfile)

In [None]:
#do something with the file
first = myfile.readline()
print("First line:", first)

#close it
myfile.close()
print(myfile)

Now try to read another line from that file:

In [None]:
second = myfile.readline()
print(second)

As you can see, you get an error (a `ValueError`), because you cannot access data from a closed file.
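If you are unsure whether a file object is still usable, its `closed` attribute tells you. The sketch below creates a small temporary file (name and content are made up), closes it, and then catches the error from a further read.

```python
import os
import tempfile

# Create a small throw-away file to read from.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)                       # we re-open the file by name below
handle = open(path, "w")
handle.write("line 1\nline 2\n")
handle.close()

handle = open(path, "r")
first = handle.readline()
print(handle.closed)               # still open at this point
handle.close()
print(handle.closed)               # now closed

try:
    handle.readline()              # reading from a closed file fails
    read_error = None
except ValueError as err:
    read_error = type(err).__name__
print(read_error)

os.remove(path)                    # clean up the temporary file
```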

## readlines()
Reads the **entire file**, not just one line.  

You can save the file as a *list* where each element is a line of the file.
  
This way you keep the file content in memory and can close the file right away.

However, this is not a good idea for very large files, because the whole content has to fit into memory.

In [None]:
#read the file
myfile = open('Data/proteins.fasta','r')

#save the full file content into a local variable
lines = myfile.readlines()

#you can already close the file now!
myfile.close()

#print the first 3 lines
for line in lines[:3]:
    print(line)

## Let's work a bit with these data

What the file contains is a **fasta**-formatted list of protein sequences. 

Here's how we get the Swissprot ID from the description lines using a few string functions.

### startswith(string)

This function returns `True` when the string object you are calling the function from starts with the string you provide as argument.

### strip()

Returns a copy of the string where all blank characters (space, tab, newline) at the beginning or end of the string have been removed. There are also functions that operate on only one side of the string, i.e. `lstrip()` and `rstrip()`.

**NOTE**: Remember that strings are *immutable* objects, so no function can modify a string in place. Instead, these functions return a modified copy, which you have to assign to a variable if you want to keep it.

### split(separator)

Splits the string at each occurrence of the 'separator', returning the resulting list of substrings.

Separator characters are removed.

In [None]:
# Now let's use these functions
myfile = open('Data/proteins.fasta','r')
myline = myfile.readline()
myline


In [None]:
stripped_line = myline.strip()
stripped_line

In [None]:
# split by whitespaces
words = stripped_line.split()
print(words)

In [None]:
# split by |
ids = words[0].split('|')

# our Swissprot ID is at index 1
print(ids[1])
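The three string functions can be combined into one small helper. This is only a sketch: the function name `get_swissprot_id` is made up, the two example lines are invented rather than taken from `proteins.fasta`, and the code assumes the header layout seen above, with the ID as the second `|`-separated field of the first word.

```python
def get_swissprot_id(line):
    """Return the ID from a FASTA description line, or None for other lines."""
    line = line.strip()                  # remove the trailing newline
    if not line.startswith(">"):         # sequence lines carry no ID
        return None
    first_word = line.split()[0]         # e.g. '>sp|X00000|EXAMPLE_PROT'
    return first_word.split("|")[1]      # the ID is the second field


# Invented lines in the style of the file, for illustration only.
header = ">sp|X00000|EXAMPLE_PROT Made-up protein for illustration\n"
sequence = "MKVLAAGIVGLLLAQ\n"

print(get_swissprot_id(header))
print(get_swissprot_id(sequence))
```

Returning `None` for non-header lines makes the helper easy to use inside a `for` loop over all lines of a file.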

## Writing mode, 'w'
Opening a file in writing mode creates a **new file** (with the name of your choice) on your hard disk.  
If the file already exists, Python will **overwrite** it.

### write()
This function writes a string in the opened file.

In [None]:
#open a file in writing mode
myfile = open('Data/kaos.txt','w')

#and write some text in it
text = "I'm going to try writing a file!"
myfile.write(text)

#close the file
myfile.close()

Let's inspect the content:

In [None]:
!cat Data/kaos.txt

**Note**: only strings can be written to a file!

In [None]:
#open a file in writing mode
myfile = open('Data/kaos.txt','w')

myfile.write(100)

In [None]:
#but we can easily solve this by converting the number to string before writing it
myfile.write(str(100))

myfile.close()

In [None]:
!cat Data/kaos.txt

Remember that the file must be opened with the correct mode, otherwise...

In [None]:
#open a file **not** in writing mode
myfile = open('Data/kaos.txt','r')

#and try to write some text in it
text = 'This is not a smart move.'
myfile.write(text)

## More on close()
When you are done writing, you usually want to save the file.
Apart from releasing the file for use by other programs, this is exactly what close() does.

For reasons of speed, the strings you write to a file are first buffered in RAM.
When you close the file, the buffered content is written to the hard disk.   

Once a file object is closed you need to re-open it in order to read it or modify it.

In [None]:
#open a file in writing mode
myfile = open('Data/kaos.txt','w')

#and write some text in it
line1 = '1) This sentence will be always the first.' 
myfile.write(line1 + '\n') #mind the new line character
line2 = '2) This is just the second line.'
myfile.write(line2 + '\n') #mind the new line character

#close it
myfile.close()

#read it again
myfile = open('Data/kaos.txt','r')
for line in myfile:
    print(line.strip())    

## Newlines
Unfortunately there is no single standard for the newline character; it differs between operating systems:  

* Linux: \\n
* macOS (OS X and later): \\n
* Classic Mac OS (before OS X): \\r
* DOS, Windows: \\r\\n

A general way to get the current system's newline is to import the **os module** and look at `os.linesep`. More on *modules* later.
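In practice you rarely need to handle this yourself: when a file is opened in text mode, Python by default translates the platform's newline to `'\n'` on reading and back again on writing. The short sketch below shows the system newline and how `splitlines()` copes with all three conventions; the example string is made up.

```python
import os

# The newline convention of the system this code runs on.
print(repr(os.linesep))

# splitlines() recognizes all three conventions and removes them.
mixed = "unix line\nwindows line\r\nold mac line\r"
lines = mixed.splitlines()
print(lines)
```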

## <span style="color:red"> with </span> construct

Doing the bookkeeping of opening and closing files can be cumbersome. That's why Python offers the `with` statement, which closes the file automatically when the indented block ends:

In [None]:
with open('Data/kaos.txt','r') as myfile:
    for line in myfile:
        print(line)
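The same construct works for writing. The sketch below checks that the file really is closed after each block; it uses a temporary directory from the `tempfile` module (an assumption made so that the example does not overwrite `Data/kaos.txt`).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "kaos.txt")

    # Writing: the file is flushed and closed when the block ends.
    with open(path, "w") as myfile:
        myfile.write("written inside a with block\n")
    closed_after_write = myfile.closed

    # Reading: no explicit close() needed either.
    with open(path, "r") as myfile:
        lines = [line.strip() for line in myfile]
    closed_after_read = myfile.closed

print(closed_after_write, closed_after_read, lines)
```

The file is closed even if an error occurs inside the block, which is the main reason `with` is generally preferred over calling `close()` by hand.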

## <span style="color:red"> Exercises </span>

1. Use the function *readline()* to print the first 10 lines of the file 'proteins.fasta'.

2. How many proteins are listed in the file 'proteins.fasta'? 

### More on modules in future sessions.

# <span style="color:red"> Additional practice </span>

Here are a few more exercises for practicing the concepts we just reviewed:

* Write a function that accepts two (short) DNA sequences and a name, and prints the FASTA formatted version of the two sequences concatenated together. Include the length of the concatenated DNA sequence in the fasta header.

* Write a function that accepts a DNA sequence as an argument and returns the G/C content of the sequence.

* Write a function that uses the find() method to determine the position(s) of the motif "CTCGA" in the sequence "GTGCCCCTCGAGAGGAGGGCGCGCGCCGCGCGCTCGACGCGATCGGCGCTCAGCGAGCGAGCTCCTCGAAGCGATCCGCGCGCGCT".

* Write a function that uses the replace() method to create the complement of the above sequence.

* Write a function that computes the number of occurrences of each 5-mer in an input sequence and returns the dictionary of counts.