## This jupyter notebook is an interactive crash tutorial in Python

The entries in the notebook are editable. To evaluate a cell, click anywhere in it and press SHIFT+ENTER.
The notebook is divided into various sections, showing different features of the Python language and Jupyter Notebooks.

### 1. **Arithmetic expressions and data types**

Simple arithmetic expressions can be evaluated. Try changing the expressions below using the operators `+`, `-`, `*` and `/`. Expressions can be of different *data types*, including `int` (integer) and `float` (floating point or real number). The result of evaluating an expression can be of the same data type as its arguments or of a different one.

In [None]:
a = 1/3
b = 3 * a
print(b)

In [None]:
5 + 4   # you can write a comment like this, after a "#" (hash) sign

In [None]:
5 // 3 # integer division

In [None]:
5 / 3 # non integer division, the result is a real, or a floating point number

In [None]:
5**3 # exponentiation

In [None]:
5 % 2  # remainder function (modulus)

### 2. **Declaring variables**

A *variable* is a way of storing a value for later use. Variable names can have one or more letters or numbers. A value is assigned to a variable by using the `=` sign. Examples:

In [None]:
a = 3    # the variable "a" has now the value 3

In [None]:
b = 1.345 # the variable "b" has now the value 1.345

In [None]:
distance = a + 2   # Guess what the value of "distance" is

In [None]:
print("The value is: ",2*b+3)

In [None]:
2*b+3

In [None]:
3bas = 2  # variable names cannot start with a number

In [None]:
ba! = 3   # characters such as !#$%&/(){}[] cannot be used in a variable name

### 3. **Variables and their data types** 

Variables can be of different types like `int` (integer), `float` (real number), `str` (string, character chain) and other types we will discuss later. Not all types can be combined in an expression. Float and integer can be combined, but not integer and string.

In [None]:
5 // 3   # the result of this expression is integer

In [None]:
5 / 3 # the result of this expression is float

In [None]:
6 // 2   # Notice the difference between this...

In [None]:
6 / 2    #.... and this

Floats and integers can be added together, the result is float.  

**HANDS-ON:** Guess what the results of adding `int` with `int` or `float` with `float` are, and check by editing the cell below.

In [None]:
3.0 + 13 

In [None]:
"aaa" # strings are represented between '' or ""

In [None]:
"aaa"+"bbb" # strings can be added together (concatenation)

In [None]:
"aaa"+123 # strings cannot be combined with integers

The built-in `str` function converts an `int` or a `float` to a `str`

In [None]:
str(123)  

In [None]:
str(3.1416)

The built-in `float` function converts a `str` or an `int` to a `float`

In [None]:
float("1.333") # the built-in float function converts "str" to "float"

In [None]:
float(123)
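
As a small sketch of how these conversions let us combine types that otherwise cannot be mixed. The variable names are only illustrative, and `int` is another built-in conversion function that is not used above; it is shown here on the assumption that it is useful alongside `str` and `float`:

```python
count = 123
label = "aaa" + str(count)   # convert the integer to a string before concatenating
print(label)

total = float("1.5") + 2     # convert the string to a float before adding
print(total)

whole = int("42")            # int() converts a string of digits to an integer
print(whole + 1)
```

Without the call to `str`, the first expression would fail in the same way as `"aaa"+123` above.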

### 4. **Mathematical functions** 

Various mathematical functions, including `sqrt`, `sin`, `cos`, `log` and `exp` can be used by **importing** them from the `math` library. 

If you try to use the `cos` function now, you will get an error because there is no known definition for it yet. 

In [None]:
cos(0.656)

In order to use the `cos` function, we need to **import** it from the `math` library.

In [None]:
from math import cos

The previous command tells the Python interpreter to fetch (import) the `cos` function from the `math` library. Now let's try again:

In [None]:
cos(0.656)

In [None]:
import scipy

In [None]:
import math
help(math)

**HANDS-ON:** Now try to edit the `from math import cos` and `cos(0.656)` cells above in order to calculate the square root of 2. The square root function is named `sqrt`.

In [None]:
dir()

Instead of importing a specific function, we could also import the entire `math` **module**, like this:

In [None]:
import math

and now let's say we want to assign the `sine` of 0.23 to the variable name "csin"

In [None]:
csin = math.sin(0.23)

In [None]:
print(csin) # The print function is used to write output in nicer ways than simply entering the variable in the cell

Now *every* function available in the `math` library can be called by prepending the word "math." to its name. But what functions are there to call?? We can find them by using the `dir` built-in function:

In [None]:
dir(math)

The output of the `dir` command is a `list` (another data type, see below), containing a `str` for each function or variable in the `math` module. 

So, for instance the decimal logarithm can be calculated using the `log10` function just like this:

In [None]:
math.log10(3)

**HANDS-ON:** Try changing the above cell to compute different math functions.

### 5. **List Objects**

Lists are what the name indicates: ordered collections of objects that can be of the same or different types.

Lists are enclosed in rectangular brackets `[]`.

In [None]:
[1,2,4,6,9] # is a list of integers

In [None]:
[1,"dfdf",3.454] #  is a list of integers, floats and strings

In [None]:
[1, 2, [3, 2]]  # a list can contain as one or more of its elements another list (nested lists)

Let's declare a variable of type `list` 

In [279]:
myList = [2, 3, 5, 7, 11, 13, 17, 19, 21]

A given element of this list can be accessed by index:

In [280]:
myList[3]

7

**IMPORTANT: Notice that indices start at zero (not 1 like in R); that is, in Python the 4th number (7) has index 3 and not 4**

Consider the following **code** (a set of instructions in Python Language):

In [281]:
a = myList[4]
b = a + 5

**HANDS-ON**: What do you think the value of "b" will be ? After making your guess, execute the next cell to find out.

In [282]:
print(b) # This is the Python print function; at the moment it is just the same as typing the variable itself

16


It is also possible to refer to a **range** of values within a list (it's called a "slice" in Python):

In [283]:
myList[4:8]

[11, 13, 17, 19]

The result is a list containing the elements of myList from position 4 to position 7.   
**IMPORTANT NOTE**: notice that the element number 8 is not included, even though it's used as upper value in the range. That's just how Python works, you will have to get used to it!
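
As a minimal sketch of how slice limits behave, using a separate list named `numbers` (an illustrative name) with the same values as `myList`. Leaving out the start or the end of a slice is a standard Python feature that is not covered above:

```python
numbers = [2, 3, 5, 7, 11, 13, 17, 19, 21]  # same values as myList above

middle = numbers[4:8]      # positions 4, 5, 6 and 7; position 8 is excluded
first_three = numbers[:3]  # omitting the start means "from the beginning"
from_six = numbers[6:]     # omitting the end means "up to the last element"

print(middle, first_three, from_six)
print(len(middle))         # the length of a slice is end - start, here 8 - 4
```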

Lists are dynamic: we can add elements to them or remove elements from them after they are created. For that, we need a set of special functions called *list methods*. Let's see how the `append` method works.

In [284]:
myList.append(23)  # methods are separated from the variable name by a "."

In [285]:
print(myList)

[2, 3, 5, 7, 11, 13, 17, 19, 21, 23]


As you see, our list has another element at the end, the number 23. What about removing an element ? Let's do it with the method `pop`

In [286]:
myList.pop()

23

Notice the `pop` method *returned* a value, which was printed as the output of the command. Compare this with the `append` command, which returned nothing. Let's check our list again. 

In [287]:
print(myList)

[2, 3, 5, 7, 11, 13, 17, 19, 21]


The last element was removed, and the list is back to its original form.

**HANDS-ON**: Now, try to guess what's happening here:

In [288]:
a = myList.pop()

**HANDS-ON**: What is the value of "a" ? Try inserting a cell below this one using the `ALT+ENTER` command, and print the value of variable `a` using that cell.

In [289]:
print(myList)

[2, 3, 5, 7, 11, 13, 17, 19]


That's right, variable `a` contains the value 21, i.e. the removed last element of our list. This is what the *return* value is for: something to be assigned to a variable, or printed.  

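This is all very nice, but what if we want to insert or remove an element in any position other than the last ? That's what the `insert` method and the `pop` method with an argument are for (there is also a `remove` method, which removes an element by its value rather than by its position):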

In [290]:
myList.insert(4,100)

In [291]:
myList

[2, 3, 5, 7, 100, 11, 13, 17, 19]

**Explanation:** The value 100 is inserted at index 4, that is, *after* the fourth element; the elements that follow are shifted one position to the right.

In [292]:
a = myList.pop(4)

In [293]:
a

100

Notice the `pop` method now has a number between the round brackets. This number is called an *argument* to the `pop` method. In this case, it indicates the position of the number to be *popped* (i.e. removed). 

In [294]:
myList

[2, 3, 5, 7, 11, 13, 17, 19]
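
The `remove` method deletes an element by its *value*, whereas `pop` deletes it by its *position*. A minimal sketch of the difference, using a separate illustrative list named `numbers` so that `myList` is left untouched:

```python
numbers = [2, 3, 5, 7, 100, 11, 13]

numbers.remove(100)      # remove searches for the value 100 and deletes its first occurrence
print(numbers)

popped = numbers.pop(0)  # pop deletes by position and returns the removed element
print(popped, numbers)
```

Like `append`, the `remove` method returns nothing; only `pop` hands back the removed element.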

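The Python language has several other *list methods*; you can check them at https://www.programiz.com/python-programming/methods/list

We can use the `dir` function to list all the methods available to the `list` type (for example, `dir(myList)`). The `%who` command below is a notebook "magic" command that lists the variables defined so far.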

In [295]:
%who

Prices	 a	 b	 c	 cos	 csin	 distance	 exit	 i	 
j	 math	 myList	 myString	 p	 price	 scipy	 sqrt	 x	 



A very useful built-in Python function that we can use with `list` objects is `len` (for length). It simply returns the number of elements in the list:

In [296]:
len(myList)

8

And by the way, we can also use `len` with strings. Not surprisingly, it returns the number of characters, i.e. the length of a string in characters (notice that spaces are also characters):

In [297]:
myString= "This is a string"
len(myString)

16

In actual fact, string objects are somewhat like lists (with a big difference we'll discuss later) in that they can be *indexed* much in the same way:

In [298]:
myString[3]

's'

In [299]:
myString[8:16]

'a string'

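In Python, we say that data types like `str` and `list` are ***sequences*** because they can be indexed in this way. Sequences are also ***iterables***, meaning that we can go through their elements one by one, as we will see in the next section.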

### 6. **Iteration and Programming**

Python is a programming language. However, it would be a long shot to call what we did so far "programming". Programs are sets of computer instructions that are executed in a specific order depending on a number of conditions. What we did was to print the value of some Python objects, to assign values to variables and to look at the effect of some functions and their return values. That is **not** programming.

But wait... maybe we did something akin to programming, when we wrote cells containing more than one Python instruction. Consider the following cell:

In [300]:
a = 1
b = 2
c = a + b
print(c)

3


The previous cell contains a ***sequence*** of instructions that the Python interpreter executes in turn. First the value 1 is assigned to variable `a`, then the value 2 is assigned to variable `b`, then the sum of variables `a` and `b` is assigned to variable `c`, then the value of `c` is printed. This *"then, then, then, ..."* is what programming is about, *executing series of instructions in sequence*. However, programming would be very impractical (and useless) if we had to **explicitly** write down every instruction we want our program to execute. 

To better appreciate this crucial point, consider the following problem. We have a list of product prices that we have stored in a Python object of type... list, of course:

In [301]:
Prices = [24.04, 12.22, 21.43, 26.11, 11.01, 27.59, 31.15, 18.91, 19.58, 17.66, 17.6, 12.96, 17.26, 4.76, 19.62]

And we want to update the list by adding a 23% VAT to each price. We could go like this:

In [302]:
Prices[0] = Prices[0] * 1.23
Prices[1] = Prices[1] * 1.23
Prices[2] = Prices[2] * 1.23
Prices[3] = Prices[3] * 1.23
# and so on until Prices[14] since there are 15 elements in our list.

Ok, we could do this in a short time. But suppose we had 1000 prices ? There must be a better way to tell the computer to perform those 1000 instructions, without having to *explicitly* type them.

That's where the real programming starts. We need to tell the computer to *iterate over the full list and update each price in turn*. Here's one way we could do it with Python: 

In [303]:
for i in range(15): # there are 15 elements in our price list
    Prices[i] = Prices[i] * 1.23  # this line is an _indented_ block (one "Tab" to the right)

So, what's going on here ? Let's try to break it apart:
1. We have a line starting with `for`. This is the Python statement used to create *`for` loops*. It tells the Python interpreter to assign *each* value in `range(15)` to the variable `i`, one at a time.
2. What is `range(15)`? It's a kind of shorthand for the list `[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]`, so the computer will assign to `i` the values from 0 to 14, each in turn. As with slices, the upper value (15) is not included.
3. For each value of `i`, execute the line inside the loop (notice that the line after the instruction `for` is *indented*, meaning that the line is moved to the right). It will do:   
`i=0, Prices[0] = Prices[0] * 1.23`, *then* `i=1, Prices[1] = Prices[1] * 1.23`, and so on until `i=14`. 
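
As a small sketch of what `range` produces, and of how `len` can replace the hard-coded list size. The list `prices` and the factor 2 are illustrative values, not the tutorial's price list:

```python
print(list(range(5)))         # converting to a list shows the values: they start at 0 and stop before 5

prices = [10.0, 20.0, 30.0]   # illustrative values
for i in range(len(prices)):  # len(prices) avoids hard-coding the number of elements
    prices[i] = prices[i] * 2
print(prices)
```

Using `range(len(prices))` means the loop keeps working if elements are later added to or removed from the list.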

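Let's check that the "Prices" list now contains the updated values (if you also ran the earlier cell that updates the first four prices one by one, those four prices will have had the VAT applied twice):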

In [304]:
print(Prices)

[36.370115999999996, 18.487638, 32.421447, 39.501819, 13.5423, 33.9357, 38.314499999999995, 23.2593, 24.083399999999997, 21.721799999999998, 21.648, 15.940800000000001, 21.2298, 5.8548, 24.1326]


Notice that `range(15)` is like a list (to be precise it is not a list, but let's leave that for now), so it may be that we could use any list on our `for` loop. For instance:

In [305]:
for p in Prices:
    print(p*100)

3637.0116
1848.7638
3242.1447
3950.1818999999996
1354.23
3393.5699999999997
3831.4499999999994
2325.93
2408.3399999999997
2172.18
2164.8
1594.0800000000002
2122.98
585.48
2413.26


The `for` loop assigns each value of "Prices" to the variable "p" in turn, and prints the result of multiplying that value by 100. 

The printed values are hard to read, because there are a lot of digits after the decimal point - they are *unformatted*. We can format them with the Python `format` method:

In [306]:
for price in Prices:
    print("{:20.2f}".format(price*100))

             3637.01
             1848.76
             3242.14
             3950.18
             1354.23
             3393.57
             3831.45
             2325.93
             2408.34
             2172.18
             2164.80
             1594.08
             2122.98
              585.48
             2413.26


The result looks much better. The code `{:20.2f}` indicates that whatever is *inside* the format command should be printed with 2 digits after the decimal point, in a field that is 20 characters wide. This allows us to print nicely formatted and aligned tables. Let's go back to our original price list:

In [307]:
"{} e {}"

'{} e {}'

In [308]:
Prices = [24.04, 12.22, 21.43, 26.11, 11.01, 27.59, 31.15, 18.91, 19.58, 17.66, 17.6, 12.96, 17.26, 4.76, 19.62]

Suppose we want to print a table with two columns, one for the original prices, and the other for the VAT-adjusted prices. We could do:

In [309]:
for p in Prices:
    print("{:7.2f} {:7.2f}".format(p,p*1.23))

  24.04   29.57
  12.22   15.03
  21.43   26.36
  26.11   32.12
  11.01   13.54
  27.59   33.94
  31.15   38.31
  18.91   23.26
  19.58   24.08
  17.66   21.72
  17.60   21.65
  12.96   15.94
  17.26   21.23
   4.76    5.85
  19.62   24.13


Notice that the `format` method accepts two arguments, `p` and `p*1.23`, and the string contains *two* copies of the format code `{:7.2f}` - the two must match in number. 
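
As a minimal sketch of how the field width determines the length of the resulting string. The names `price`, `with_vat` and `row` are illustrative, and the f-string on the last line is an alternative notation (available from Python 3.6) that is not used elsewhere in this tutorial:

```python
price = 24.04
with_vat = price * 1.23

row = "{:7.2f} {:7.2f}".format(price, with_vat)
print(row)
print(len(row))   # two 7-character fields plus one space between them

# An f-string puts the values directly inside the braces
print(f"{price:7.2f} {with_vat:7.2f}")
```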

### 7. **Making decisions: conditional programming**

So now we know how to write programs that repeat statements many times, without us having to type all the stuff the program does. These programs have one important limitation: they will always do the same, no matter what the initial conditions are. Programming becomes really powerful when we introduce *decisions* to be made based on specific circumstances, e.g. input values. 

Let's consider the following piece of code:

In [None]:
from math import sqrt
a = input("Please insert a number in this box: ")
a = float(a)
print("The square root of {} is {}".format(a,sqrt(a)))

First, notice we introduced a new *function*, called `input`. This function prompts the user with a message and waits for some value to be typed, after which the user must press the ENTER key. The inserted value is assigned to the variable `a` as a *string*. This last point is really important, and it explains the need for the next line in the code, `a = float(a)`, which replaces the string by its value as a `float` number (for instance, "3" is replaced with 3.0). Without this line, the call to the `sqrt` function in the next line would produce an error. 

**HANDS-ON:** Run the cell with SHIFT+ENTER, try different values in the box and observe the output. Try removing the line `a = float(a)`, run the program again and notice the error it produces. 

Have you tried to input a negative value, like -2? ... Try it now, and see what happens - "math domain error". This is understandable, because the real-valued square root function is not defined for negative arguments. What could we do to prevent a negative user input from producing this ugly error? That's where *decision* programming comes in. What we want to do is: "IF the user input is negative, then don't try to calculate the sqrt function of that value, but instead produce a warning to the user". We can do this with the appropriately named `if` statement:

In [None]:
from math import sqrt
a = input("Please insert a number in this box: ")
a = float(a)
if a < 0 :
    print("WARNING: Invalid user input. Removing the negative sign.")
    a = -a
print("The square root of {} is {}".format(a,sqrt(a)))

The general form is
```python
if condition :
    statement(s)
```
Meaning that the *statement(s)* will only execute if the condition is **True**. 

Conditions are *logical* expressions whose value can either be *True* or *False*. Some examples:

- a > b (a greater than b)
- a < b (a smaller than b)
- a >= b (a greater or equal to b)
- a <= b (a smaller or equal to b)
- a == b (a is equal to b)
- a != b (a is different from b)

Let's try and evaluate a logical expression in a notebook cell:

In [None]:
5 > 4

5 is indeed greater than 4, so the result value of this cell is "True".

**HANDS-ON**: Try to change the above expression in different ways, producing values that are either "True" or "False".

In fact, we can assign the resulting "True" or "False" value to a variable. Run the following code:

In [None]:
a = 5 > 3
print(a)

The return value of the expression `5 > 3` has been assigned to `a`. That is a new data type, which can only be "True" or "False", and it is called a `bool` (short for "boolean"). 

Check this:

In [None]:
type(a)

The `type` function is a very useful tool to find out the type of a variable or data object. 

In [None]:
type(3)

In [None]:
type(3.1416)

In [None]:
type('this is a string')

Boolean expressions can be combined with the operators `and`, `or` and `not`:

In [None]:
5 > 3 or 2 > 7

The previous expression is True because one of the expressions is true (5 > 3). The `or` operator will produce True in the case of `True or False`. On the other hand, the following expression using `and` will only evaluate to True if **all** individual expressions are true:

In [None]:
5 > 3 and 2 > 7

In this case, since 2 is not greater than 7, the result is `False`.

With logical expressions we can produce very complex conditions based on the values of variables. 

**HANDS-ON:** Look at the cell below and try to guess what the logical result of running it will be:

In [None]:
a = 3
b = 7
c = 2
x = 9.0
((a > b) and not (c > x)) or (x == 9.0)

Often, the `if` instruction is not enough to express the *logic* of our program. Suppose we want to do something **if** a condition is `True`, and do something **else** if the condition is false. The Python `else` can be combined with `if` like this:

```python
if condition :
    statement(s)
else :
    statement(s)
```

Let's see it in action. We will go back to our `sqrt` program and change it slightly:

In [None]:
from math import sqrt
a = input("Please insert a number in this box: ")
a = float(a)
if a >= 0 :
    print("The square root of {} is {}".format(a,sqrt(a)))
else:
    print("Cannot compute the square root of a negative number.")


Now our simple program just refuses to compute the square root of a negative number, rather than changing its sign (this is probably what we would want in a real application).

### 8. **More iterations: while**

So far we know only one way of automatically repeating program statements, the `for` loop. There is another statement that can do the same sort of thing. It's called `while`, and it is used in this way:

```python
while condition :
    statement(s)
```

**HANDS-ON:** Try to guess what the following program does:

In [None]:
i = 10
while i >0 :
    i = i -1 
    print(i)

The variable `i` works as a *counter* allowing us to repeat what is inside the `while` loop as many times as the initial value (10 in this case).

In fact, `while` loops don't have to systematically go through a list of values like a `for` loop. What a `while` does is to keep executing whatever is inside the loop **while** the condition is `True`. If, for some reason, the condition is always `True` then the loop will *never* stop.   
What do you think the following code will do:

```python
while True :
    print("I will never stop!")
```

You guessed it right... it will keep printing "I will never stop!", line after line. That's because the condition in the loop is `True`, and obviously `True` can never be `False`. 

Let's use one of these "always `True`" loops to make our square root program more interactive:

In [None]:
from math import sqrt
while True :
    a = input("Please insert a number in this box: ")
    a = float(a)
    if a >= 0 :
        print("The square root of {} is {}".format(a,sqrt(a)))
    else:
        print("Cannot compute the square root of a negative number.")


In [None]:
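for i in range(3):
    #print(i)
    for j in range(3):
        print(i,j,"One more round!")  # indented so that it runs inside the inner loop

In [None]:
"12343".isnumeric()  # returns True if every character in the string is numeric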

Notice the `*` character in the `In [*]` of the square root cell above (the one with `while True`). This is because the notebook is stuck running our program. If you try to execute the following cell

In [None]:
1+1

... it doesn't work, because the notebook is still running the previous cell. 
**To solve this, click on the "Kernel" entry on the menu bar and select "Interrupt", or click the stop button (square) in the menu bar.** Never mind the ugly messages below our square root, it's just the Python interpreter complaining about being rudely stopped. The output of our previous `1+1` cell should appear now. 

One way to avoid this type of situation is to provide our program with a way of **stopping** on user request. We could, for example, define that if the user types "q" instead of a number, the program will quit. Let's try it:

In [None]:
from math import sqrt
while True :
    a = input("Please insert a number in this box: ")
    if a == "q" :
        print("Goodbye.")
        break 
    a = float(a)
    if a >= 0 :
        print("The square root of {} is {}".format(a,sqrt(a)))
    else:
        print("Cannot compute the square root of a negative number.")


Well, that fixes the stuck (infinite) loop problem. However, there is still a problem with this program. It's very easy to make it crash.

**HANDS-ON:** Can you figure out how to make this program exit with an error message? *Hint: What type of invalid input is not accounted for by our program?*


### 9. **Defining our own functions:**

We have already learned enough statements (`if`, `else`, `for`, `while`) to do some real programming. However, the serious fun starts when we are able to create our own *functions*. So far we have used a few functions that are Python "built-ins" or part of the `math` library. To call the `cos` function, for instance, we do:

In [None]:
from math import cos
print(cos(3))

For built-in functions like `len` we don't even have to import anything:

In [None]:
len("this is a string")

In both cases, the function is a *name* followed by a pair of round brackets (parentheses) enclosing some *argument*. The `cos` function takes a number as argument and *returns* a value, the *cosine* of that number. The `len` function takes a string as argument, and returns its length. What if we could create our own functions? For instance, let's say we wanted to create a function that would add 10 to every number. Supposing that function is called `add_ten`, it would work like this:

```python
In [1]: add_ten(3)
Out[1]: 13
```

Actually, it turns out that we can indeed create our own functions, using the Python keyword `def`. Here is how we can create such a function:

In [None]:
def add_ten(x):
    return 10+x

The `x` variable is our *argument*, and the `return` command indicates what the function should return, in this case it is just our argument plus 10. 

Now that we have defined it, we can use it everywhere.

In [None]:
add_ten(34)

In [None]:
add_ten(34.)

**Note:** Functions can have more than one argument. Let's imagine that we want to create a function that returns the sum of the squares of two numbers:

In [None]:
def sum_squares(a,b) : 
    return a*a+b*b

In [None]:
sum_squares(3,4)
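
Functions can also contain loops and work on lists. As a sketch that combines a function with the `for` loop from section 6 (the function name `add_vat` and its parameters `prices` and `rate` are hypothetical names chosen for illustration):

```python
def add_vat(prices, rate):
    """Return a new list with each price increased by the given rate (0.5 means 50%)."""
    result = []
    for p in prices:
        result.append(p * (1 + rate))  # add the adjusted price to the new list
    return result

print(add_vat([10.0, 20.0], 0.5))
```

Because the function builds and returns a new list, the list passed as argument is left unchanged.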

**HANDS-ON:** Write a function that returns **the sum of all numbers from 1 to n**. (*Hint*: use a `for` loop with a `range` command inside the function.) Write it in the cell below:

In [None]:
def sum_one_to_n(n):
    total = 0
    for i in range(1, n + 1):  # i takes the values 1, 2, ..., n
        total = total + i
    return total

In [None]:
sum_one_to_n(3)

### 10. **Classes and OOP (Object Oriented Programming):**

The most advanced and flexible object type in Python is the *class*. Classes can *encapsulate* data (like numbers, lists or strings) and functions in a single object. This provides a very powerful and flexible way of programming. Unfortunately, it can be a bit daunting for the beginner programmer. However, some understanding of the concept is required if we are to use Python in any practical context, like *accessing software tools, databases and scientific libraries*. What we intend here is a very basic understanding of classes and how they can be used to achieve practical results.

Let's start with a very simple example:


In [None]:
class Rectangle :
    def __init__(self,a,b):
        self.sideA = a
        self.sideB = b
    def area(self):
        return self.sideA*self.sideB

        

What's happening here ?...  

First, we declare a class named `Rectangle`, and inside the class we use the `def` command to define two functions.  

1. The first, called `__init__`, is a special function that gets called *every time* a new object of *type Rectangle* is created. Notice that this function accepts 3 arguments:
    - The first, `self`, is a special argument that we always have to include when declaring *methods* for an object.
    - The variables `a` and `b` are the lengths of the two sides of the rectangle (we will see shortly how these arguments are passed). Inside the `__init__` function, the values of `a` and `b` are assigned to the two variables `self.sideA` and `self.sideB`. This way, the sides of the rectangle are stored in the object.

2. Then we define a second *method*, called `area`. This method will *return the area of the rectangle*, which is simply the product of the length of its sides.

But, how do we use a class ?... We have to create one or more objects of the created type. Creating an object from a class is called ***instantiating*** the class, and each created object is an ***instance***. So for example:

In [None]:
myR = Rectangle(2,3)  # "myR" is an instance of the Rectangle class, with sides of length 2 and 3

In [None]:
myR.area() # if we call the "area" method on the instance object, it returns myR's area

We can create as many instances of our class as we want:

In [None]:
myS = Rectangle(23,11)
myT = Rectangle(10,20)
myU = Rectangle(1,1) # that's a square, which is just a particular case of a rectangle

Now let's print the areas of those rectangles:

In [None]:
print(myS.area(),myT.area(),myU.area())
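
As a minimal sketch showing that each instance stores its own data. The class is repeated so the block runs on its own, and `r1` and `r2` are illustrative names:

```python
class Rectangle:
    def __init__(self, a, b):
        self.sideA = a
        self.sideB = b

    def area(self):
        return self.sideA * self.sideB

r1 = Rectangle(2, 3)
r2 = Rectangle(10, 20)

print(r1.sideA, r1.sideB)    # the data stored in the instance by __init__
r1.sideA = 5                 # changing one instance does not affect the other
print(r1.area(), r2.area())
```

The values stored through `self` can be read or changed with the same "." notation that is used to call methods.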

It is important to realize that all data types in Python are in fact *classes*. Many of these classes have *methods* that can be called exactly in the same way as the *area method* of our *Rectangle class*. For example, `list` objects have many *methods*, one of them being `sort` . Let's see it in action:

In [None]:
myList = [4,2,8,5,7,9,42,1,13]
print("Unsorted: ", myList)
myList.sort()
print("Sorted: ", myList)

The *method* `sort` re-arranged the elements in the list in ascending order. 

Strings also have methods. For example:

In [None]:
myString = "this is a string"
print(myString.upper())   # the "upper" method converts all letters to upper case (CAPITALS)  
print(myString)

**HANDS-ON:** Try to spot one very important difference between the previous example and the one with the list object. (*Hint:* Think about what the final value of our variable is in each case.)

**HANDS-ON:** Go back to our `Rectangle` class example and add a method named `perimeter` that returns the perimeter of a rectangle. Write the new method in the cell below, and test it with a few examples. 

### 11. **Reading and Writing Files**

So far we have seen only the simplest ways to interact with program code: by providing data in input boxes (with the `input` command) and by writing out data to the cell output. If these were the only ways to *input* and *output* data, programming languages wouldn't be very useful - how would we feed a program with the entire gene sequence of an organism, for instance ?... Or how would one save the output of a program consisting of *thousands* of lines ?

Fortunately, programming languages provide the necessary mechanisms to read and write data to and from *files*. A file is just a piece of information stored on a computer - it could be the text of a book, a computer program, a sound file, a video file or many other things. No matter what the actual content is, all files are the same: collections of bytes stored in more or less permanent memory. Each file is referred to by a *name*, which allows us to retrieve its contents when needed. The Python programming language provides various mechanisms to read and write data from files. Let's learn the simplest one, starting with a motivating example. 

Suppose we want to compute a table with the cosines of angles between 0 and 360 degrees. Since the `cos` function expects its argument in radians, we convert each angle with the `radians` function from the `math` library. We could simply do:

In [None]:
from math import cos, radians
for angle in range(360):
    # cos expects radians, so the angle in degrees is converted first
    print("{} {:5.2f}".format(angle, cos(radians(angle))))

That's rather a long list, isn't it ? Not very practical to use in the browser, and what if we want to save it for later ?... We would like to save the result into a *file*. Let's see how one creates files in Python. That's what the `open` command is for:

In [None]:
f = open("myfile.dat","w")

The previous command has just opened the file "myfile.dat" for writing (hence the "w" as second argument in the `open` function). It returns a value, the *file object* (sometimes loosely called a file descriptor), which we assigned to `f`. This variable `f` will work like a handle that we can use to operate on our file. Before we do anything else, go to the Jupyter Hub file manager and list your files. There should be a new file on the list, called "myfile.dat". That's because when we open a file for *writing*, it will be created (if it doesn't exist already). For now, the file is open but also *empty*, because we haven't written anything to it! Let's change that:

In [None]:
f.write("This is the first line of my file.")

In [None]:
f.write("And this is the second line of my file.")

Now that we wrote all we wanted to the file, we need to *close* it, because until then the written data may still be held in a temporary buffer rather than saved in the file. Try opening the "myfile.dat" file in the Jupyter Hub file manager - it will appear empty! We need to call the `close` method on the file object `f`:

In [None]:
f.close()

Now go back and open the file again. The data should be there, but wait... we intended to have each string on a separate line, but instead the two strings are glued together on a single long line. That is because the `f.write()` command outputs exactly what we put between brackets, and does not add any *line break* to the end. Without line breaks, the file will be a long unbroken line of characters. Now, this may or may not be what we want, but most likely it's *not* what we want. Particularly if it's supposed to be a human readable text or table. 

We can fix it by telling the `write` command where our line breaks are - in this case, at the end of each string. Let's write the complete new code on a single cell:

In [None]:
f = open("myfile.dat","w")
f.write("This is the first line of my file.\n")
f.write("And this is the second line of my file.\n")
f.close()
!cat myfile.dat

We added a "\n" at the end of each line. Why?... While there are two characters in "\n", the Python language interpreter reads it as a single special "newline character". Printing (or `write`ing) this character does not output any visible symbol to the screen or file. Instead, it causes a line change (this is similar to the hidden end-of-paragraph symbols in Microsoft Word, for instance).
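
If you want to convince yourself that "\n" really counts as one character, a small sketch like the following may help. The variable names `newline` and `text` are just illustrative choices, not part of the tutorial's files:

```python
newline = "\n"
print(len(newline))        # length of the string that holds only the newline character

text = "first\nsecond\n"   # two short lines, each ending with a newline
print(text.count("\n"))    # how many newline characters the string contains
print(repr(text))          # repr() shows the "\n" explicitly instead of breaking the line
```

The `repr` function shows a string the way you would type it in code, which makes hidden characters such as "\n" visible.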

The `!cat myfile.dat` deserves an explanation, too. If you read through the Jupyter Hub introduction, you will know that "cat" is a command line instruction that dumps the content of a file to the screen. By prepending it with the "!" symbol, we are asking the notebook (this is a Jupyter feature, not part of the Python language itself) to pass it unchanged to the command line environment and run it. You could do the same by opening a terminal window and typing `cat myfile.dat` in there. Go try that now. The strings are each on a separate line now.

**N.B.:** Notice that the original content of "myfile.dat" (the single long line) is entirely replaced with the new content. Every time we open a file for *writing* (using "w" as the second argument to the `open` command), its contents will be erased and replaced with the new data. Be careful with the "w" mode!
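
To see the erasing behaviour in isolation, here is a minimal sketch. It uses a separate, hypothetical file name ("mode_demo.dat") so that "myfile.dat" is left alone. It also uses the "a" (*append*) mode, which this tutorial does not otherwise cover: it is mentioned here only as the usual alternative when you want to keep the existing content. The file is read back with the "r" mode, which is explained right after this example.

```python
# "mode_demo.dat" is a hypothetical file name used only for this sketch
f = open("mode_demo.dat", "w")
f.write("old content\n")
f.close()

f = open("mode_demo.dat", "w")   # opening again with "w" erases "old content"
f.write("new content\n")
f.close()

f = open("mode_demo.dat", "a")   # "a" keeps what is there and writes at the end
f.write("appended line\n")
f.close()

f = open("mode_demo.dat", "r")   # read the file back to check what survived
result = f.read()
f.close()
print(result)
```

Only the text written after the second "w" open should survive, followed by the appended line.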

Now that we have properly written our file, closed it and checked the integrity of its contents, we may try to read it using Python commands. The `open` command comes again to our rescue, but now its second argument will be an "r" (for *read*), like this:

In [None]:
f = open("myfile.dat","r")  # the "r" mode can be used safely, it won't erase the file!

Now let's read the file in one go, using the `read` method:

In [None]:
contents = f.read()  # the .read() method reads the whole file at once.

In [None]:
f.close() # strictly speaking, we did not need to close the file, but it is a good practice...

In [None]:
print(contents)

The output is nicely formatted because the `print` command knows how to interpret the "\n" characters. Let's see what is really in the file, by evaluating `contents` directly in the cell:

In [None]:
contents

We can clearly see the location of each newline character in the file.
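
The open, read and close steps can also be written with Python's `with` statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The sketch below is one possible way to do it. The file name "with_demo.dat" is a hypothetical one chosen for this example, and `readlines` is a file method that returns a list with one string per line:

```python
# "with_demo.dat" is a hypothetical file name used only for this sketch
with open("with_demo.dat", "w") as f:
    f.write("This is the first line of my file.\n")
    f.write("And this is the second line of my file.\n")
# the file is already closed here, no f.close() needed

with open("with_demo.dat", "r") as f:
    lines = f.readlines()   # a list of strings, each still ending in "\n"

print(f.closed)   # tells us whether the file has been closed
print(lines)
```

Many Python programmers prefer this form because it is hard to forget the `close` step, but the explicit `open` / `close` style used in this tutorial works just as well.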

Now let's go back to our cos table problem and rewrite the code to produce a file with the data:

In [None]:
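from math import cos, radians
f = open("cos_table.dat","w")
for angle in range(360):
    # cos() expects its argument in radians, so the angle (in degrees) is converted first
    f.write("{} {:5.2f}\n".format(angle, cos(radians(angle)))) # notice the "\n" at the end of the string!
f.close()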

In [None]:
!cat cos_table.dat

**HANDS-ON:** Write a program to read the "cos_table.dat" file and output its content (don't open it with the "w" mode, or you will erase its contents!)

In [None]:
f = open("cos_table.dat","r")
for a in f.read().splitlines():
    print(a)
f.close()
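
Everything read from a file arrives as text, so if you want to compute with the numbers in the table you have to convert them back. The following is a rough sketch of one way to do that, under a few assumptions: it writes its own small table to a hypothetical file ("cos_small.dat") with only four rows, it treats the angles as degrees (hence the `radians` conversion), and it uses the string method `split` to separate the two columns.

```python
from math import cos, radians

# Write a small two-column table; "cos_small.dat" is a hypothetical file name
f = open("cos_small.dat", "w")
for angle in range(0, 360, 90):
    f.write("{} {:5.2f}\n".format(angle, cos(radians(angle))))
f.close()

# Read the table back and convert each column from text to numbers
angles = []
values = []
f = open("cos_small.dat", "r")
for line in f.read().splitlines():
    columns = line.split()            # split the line at the white space
    angles.append(int(columns[0]))    # first column: the angle, as an integer
    values.append(float(columns[1]))  # second column: the cosine, as a float
f.close()

print(angles)
print(values)
```

Note that the values read back are only as precise as what was written: the file keeps two decimal places, so that is all you get.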

### 12. **Plotting Data**

Plotting data is an activity central to doing science. So, any language with an ambition for being relevant in scientific programming must have a plotting mechanism. Python has several, but we will briefly focus on the one that is by far the most popular: the `matplotlib` library.

To exemplify the use of `matplotlib`, let's first get some data:

In [None]:
# GenBank statistics
Years = [1985,1990,1995,2000,2005,2010,2015]
GenBankEntries = [4954, 35100, 425211, 7077491, 45236251, 120604423, 185019352]

These two lists, `Years` and `GenBankEntries`, will be the `x` and `y` data points in our graph. What we want is to plot `GenBankEntries` as a function of `Years`. This can be done with `matplotlib` in the simplest possible way with just two lines of code:

In [None]:
# Let's import the module matplotlib.pyplot
import matplotlib.pyplot as plt   # this line imports the library and allows it to be called plt (for brevity)
plt.plot(Years, GenBankEntries)

We got a plot, but it is really *raw*. With just a few more lines of code, we can make it look a lot better:

In [None]:
# We first import the matplotlib.pyplot module as "plt"
import matplotlib.pyplot as plt 
plt.title("GenBank Growth")           # set the graph title
plt.xlabel("Years")                   # set the x axis title
plt.ylabel("Number of Entries")       # set the y axis title
plt.plot(Years, GenBankEntries,'ro-') # plot with red lines and circle markers


Notice the argument "ro-" to the `plot` command - "r" is for red, "o" for circle markers and "-" for lines. 

**HANDS-ON:** Guess what happens if 'ro-' is replaced with 'g+--'. Check it. 

This plot should probably have bars rather than points:

In [None]:
# We first import the matplotlib.pyplot module as "plt"
import matplotlib.pyplot as plt 
plt.title("GenBank Growth")         # set the graph title
plt.xlabel("Years")                 # set the x axis title
plt.ylabel("Number of Entries")     # set the y axis title
plt.bar(Years, GenBankEntries, width=3) # plot bars with a width of 3

(The bar width can be adjusted with the "width" parameter. Try it.)

Different colors for lines and points can be used: 

In [None]:
# We first import the matplotlib.pyplot module as "plt"
import matplotlib.pyplot as plt 
plt.title("GenBank Growth")          # set the graph title
plt.xlabel("Years")                  # set the x axis title
plt.ylabel("Number of Entries")      # set the y axis title
plt.plot(Years, GenBankEntries,'g--') # plot with green dashed lines
plt.plot(Years, GenBankEntries,'ro')  # plot with red circle markers
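
The number of entries grows by more than four orders of magnitude between 1985 and 2015, so on a regular (linear) axis the first few points are squashed against zero. One possible refinement, shown here only as a sketch, is to switch the y axis to a logarithmic scale with `plt.yscale("log")`. The data lists are repeated so that the cell can run on its own:

```python
import matplotlib.pyplot as plt

# Same GenBank data as above, repeated so this cell is self-contained
Years = [1985, 1990, 1995, 2000, 2005, 2010, 2015]
GenBankEntries = [4954, 35100, 425211, 7077491, 45236251, 120604423, 185019352]

plt.title("GenBank Growth (log scale)")   # set the graph title
plt.xlabel("Years")                       # set the x axis title
plt.ylabel("Number of Entries")           # set the y axis title
plt.yscale("log")                         # use a logarithmic y axis
plt.plot(Years, GenBankEntries, 'ro-')    # red line with circle markers
```

On a logarithmic axis, equal vertical distances correspond to equal multiplication factors, which makes the early years visible alongside the later ones.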

### 13. **Where to go from here**

I hope this tutorial offered some insight into the basics of Python and its usage. However, we barely scratched the surface of what is possible, and of how to make it work in real-world scenarios. The only way to truly learn a programming language is to use it over and over, until all its ins and outs become second nature. That was not the purpose of this tutorial; the aim was rather to give you a basis for further development, and the ability to grasp simple Python code snippets that will be used in the next classes.