<span style="float:left;">Licence CC BY-NC-ND</span><span style="float:right;">François Rechenmann &amp; Thierry Parmentelat&nbsp;<img src="media/inria-25.png" style="display:inline"></span><br/>

# Some python basics

Now that you know how to use a notebook, let us see some basic notions of python.

**You are reminded that the usual way to read a notebook is to select the first cell, and then to use *Shift+Enter* to navigate down to the end of the notebook**, so that you are sure to evaluate all the code cells.

Our goal here is simply to give you a glimpse of python. Of course, if you are already familiar with the language, you can skip this notebook. Conversely, if you do not know the language at all, be aware that we will introduce all the notions again as we go.

### Magic tricks

At the beginning of each notebook, you will find a code cell like this:

In [None]:
# this is so that we can use print() in python2 like in python3
from __future__ import print_function
# with this, division will behave in python2 like in python3
from __future__ import division

We will discuss this in greater detail at the end of this notebook, but for now, just remember that these *magic formulas* will allow you to write code for both python2 and python3.

### Numbers

It is possible to do all the usual computations in python, like with a pocket calculator:

In [None]:
# on small integers
256 - 27

Note in passing that a line starting with a sharp sign `#` is **ignored**: it is a **comment**.

In [None]:
# or just the same on very large integers
6786897689768976893324534535 * 34535678909876543567890876

In [None]:
# or on floating-point numbers
3.14159 * 2

### The case of division

With our *magic formulas*, division in python behaves as follows.

 * you always get **a float** when doing a simple division with a single slash `/`, even if between 2 integers:

In [None]:
100/8

 * and to obtain the quotient of an integer division, you need to use a double slash `//`, like this:

In [None]:
100//8
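
As a small sketch of how the two operators relate, the cell below also uses the remainder operator `%`, which is not discussed above. It assumes python3, or python2 with the *magic formulas* already evaluated; the variable names are only illustrative.

```python
# assumption: python3, or python2 after evaluating the 'magic formulas' cell
quotient = 100 // 8    # quotient of the integer division
remainder = 100 % 8    # remainder of the integer division
ratio = 100 / 8        # simple division, always a float

print(quotient, remainder, ratio)
# the quotient and the remainder together rebuild the initial number
print(quotient * 8 + remainder)
```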

### Variables and assignment

Of course, as soon as computations must be made, we need to store results in variables. In python, assignment is quite simply done with the equal sign `=`, as shown below:

In [None]:
a = 12
b = 36
c = a * b
print(c)

Or, if we're willing to spend a little effort on presentation:

In [None]:
# it is possible to pass several parameters to print
# they are then all printed out, with a SPACE in between
print("c=", c)

### `print` or not `print`?

You will notice that we have sometimes used `print`, and sometimes not. The logic is as follows: whenever a cell is evaluated, the **last** result is printed by default. This is why we obtain below the value of `c` even though we have not explicitly called `print`:

In [None]:
c

But if I now do:

In [None]:
c = 200

this does not cause anything to be printed. What happens here is that this code fragment, which is called an assignment, does not return any result. In this case we need to call `print` explicitly if we wish to see the result.

In [None]:
c = 300
print(c)

According to this logic, we may have to call `print` when some intermediate results are of interest, or if we wish to improve the way things are shown.

In [None]:
a = 12
print("a=", a)
b = 36
print("b=", b)
c = a + b//2
print("c=", c)

### Strings

Python of course lets you work with strings of characters:

In [None]:
# you can use either double quotes "" 
first_name = "John"
# or single quotes ''
last_name = 'Doe'
print(first_name, last_name)

In [None]:
# to concatenate several strings, we add them with the + sign
full_name = first_name + " " + last_name
print("Full name:", full_name)

### `for` loop on a string

It is very easy to scan a string with a `for` loop:

In [None]:
DNA = "AGCTGTCGCG"
for letter in DNA:
    print('the sequence contains', letter)

### Lists

Lists let you store several results while remembering the order between them:

In [None]:
# A list is built using square brackets []
list1 = [ 1, 2 ]
# you can mix different types of objects in a list
list2 = [ "three", 4.]
print(list2)

In [None]:
# like with strings, lists can be concatenated with a +
list3 = list1 + list2
print(list3)

Finally, you can add items at the end of a list with the `append` method:

In [None]:
# adding string "five" at the end of our list
list3.append("five")
print(list3)

### Conditional test `if`

You can test an expression with an `if` .. `elif` .. `else` statement:

In [None]:
x = 10
if x < 4:
    print("small")
elif x < 20:
    print("medium")
else:
    print("large")

### Functions: `def` and `return`

One way to write code that you can re-use is to define a **function**. For example, instead of the previous cell, we could have defined a function like this:

In [None]:
# we define a function that returns a string
# 'small', 'medium' or 'large' depending on the value of x
def triage(x):
    # it is IMPORTANT to indent the text area 
    # that contains the function body
    if x < 4:
        return "small"
    elif x < 20:
        return "medium"
    else:
        return "large"
    
# if we come back to the first column here, 
# we are effectively ending the function definition for triage

You can notice that after evaluating this cell, **nothing gets printed**. It is the expected behaviour, because we have just **defined** a function, but we have **not called it**.

Now that the function is defined, we can call it:

In [None]:
message = triage(10)
print(message)

Please note here:
  * the `def` keyword that allows us to define the `triage` function,
  * the syntax for calling `triage(10)`, and the fact that we store the result in the `message` variable,
  * the `return` keyword inside the function body, which specifies what the result of `triage` should be, and so what will ultimately be stored in `message` in this case.

As is the case here, it is **often the smartest approach** to design a function that **computes** a result without printing it, and to do the printing outside of the function, only if and when it is necessary. This is because in most cases printing is optional.
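
To illustrate this point, here is a minimal sketch that compares `triage` with a hypothetical variant named `triage_printing`, which is not part of the course and prints its result instead of returning it. The definition of `triage` is repeated so that the cell can be run on its own.

```python
# same function as above, repeated so that this cell is self-contained
def triage(x):
    if x < 4:
        return "small"
    elif x < 20:
        return "medium"
    else:
        return "large"

# hypothetical variant, for illustration only: it prints and returns nothing
def triage_printing(x):
    print(triage(x))

# a returned value can be stored and then compared or combined
label = triage(10)
is_medium = (label == "medium")
print(label, is_medium)

# the printing variant shows the text, but there is nothing left to store
nothing = triage_printing(10)
print(nothing)
```

With the printing variant, the text appears on the screen, but the variable `nothing` only holds the special value `None`, so the result cannot be re-used later on.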

With all this in place, we can now run the same code several times with different values for `x`:

In [None]:
print(triage(0))

In [None]:
print(triage(8))

In [None]:
print(triage(20))

### Syntax and indentation

Those of you who are familiar with languages like C, C++ or Java, and many others actually, may be surprised to notice the total absence of markers like `{` and `}` or `begin/end` pairs to work as delimiters in the code.

In python, the syntax relies a lot, almost exclusively in fact, on code indentation, as you can see in the previous examples; this makes it possible to get rid of such begin or end markers entirely. All that remains is the need to insert a `:` in statements like `if`, `for` and `def`.

Admittedly this choice may sound odd the first time, but you will see very soon how pleasant this syntax turns out to be for both reading and writing python code.
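
As a small sketch of how indentation delimits a block, the cell below counts the letters of a string; the names `count` and `letter` are only illustrative. The indented lines belong to the loop body and run once per letter, while the line that comes back to the first column runs only once, after the loop.

```python
count = 0
for letter in "AGCT":
    # indented: these two lines are the body of the loop
    count = count + 1
    print("inside the loop with", letter)
# back to the first column: the loop is over
print("after the loop, count is", count)
```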

### `for` loop on a list

You can write a `for` loop over things other than strings, in fact over almost any object where that makes sense, and in particular over list objects; we could thus have written the last 3 cells like this:

In [None]:
subjects = [0, 8, 20]
for subject in subjects:
    print(triage(subject))

### Dictionaries (also called maps)

python also provides a very convenient data type, called a dictionary. To keep things as simple as possible, here is how to use them:

In [None]:
# let us create a dictionary to store people's age from their name
age_of = { 'jean' : 12, 'eric' : 25, 'anne' : 48 }
age_of

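This dictionary can thus be thought of as a list of pairs (key ➞ value); here we have 3 keys that are the 3 strings `jean`, `eric` and `anne`. Notice right away that, unlike with lists, you should not rely on the order of the entries: under python-2.7 this structure is **not ordered** (a dictionary only keeps its insertion order from python-3.7 on).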

From a dictionary we can look up the values attached to a key, like this:

In [None]:
# let us look up the value attached to key 'anne'
age_of['anne']

The thing to remember about dictionaries is that the **performance** of such a look-up operation **does not depend on the dictionary size** (or at least not in a linear way). It is the main purpose of a dictionary to be able to store a great deal of data without slowing down their future use. We will have opportunities to talk about this point again later on.
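
One rough way to observe this is to time the same look-up in a small and in a large dictionary with the standard `timeit` module. This is only a sketch: the sizes and the names `small`, `large`, `lookup_small` and `lookup_large` are arbitrary choices, and the timings will vary from one machine to another.

```python
import timeit

# a small dictionary with 10 entries
small = {}
for i in range(10):
    small[i] = i * i

# a large dictionary with one million entries (arbitrary size)
large = {}
for i in range(1000000):
    large[i] = i * i

# the same look-up, on the same key, in each dictionary
def lookup_small():
    return small[5]

def lookup_large():
    return large[5]

# each measure repeats the look-up 100000 times and reports seconds
print("small:", timeit.timeit(lookup_small, number=100000))
print("large:", timeit.timeit(lookup_large, number=100000))
```

The two durations should be of the same order of magnitude, even though one dictionary is 100000 times larger than the other.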

### A variable is not a string

Also note that in our last code cell, the quotes `'` are mandatory. If we removed them, we would get this error:
```
>>> age_of[anne]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-220-a83ba2c48145> in <module>()
      1 # when we remove the ' we get a NameError
----> 2 age_of[anne]

NameError: name 'anne' is not defined
```

That fragment would be valid if we had defined a variable named `anne`. 
It is important to distinguish between 
 * the `anne` **variable** - that in our case is not defined, because we never did anything like `anne = some thing`, 
 * the `'anne'` string, which here is actually present in the dictionary as a key.
    
Which means that the fragment below is valid:

In [None]:
first_name = 'anne'
age_of[first_name]
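
To make the distinction even more visible, here is a deliberately confusing sketch, in which a variable named `anne` holds a different string; this is for illustration only and certainly not a recommended naming practice. The dictionary is defined again so that the cell can be run on its own.

```python
# same dictionary as above, repeated so that this cell is self-contained
age_of = {'jean': 12, 'eric': 25, 'anne': 48}

# a VARIABLE named anne, which holds the STRING 'eric'
anne = 'eric'

# without quotes: the variable is used, so the key looked up is 'eric'
print(age_of[anne])
# with quotes: the key looked up is the string 'anne' itself
print(age_of['anne'])
```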

### An example of a complete function

As a conclusion for this quick introduction, here is a small function that
 * accepts a string `dna` as its incoming parameter; we assume that this string contains only 'A' or 'T',
 * computes (and returns) a **list** with as many elements as `dna` has letters, each of them being 'up' when the corresponding letter is an 'A', or 'down' for a 'T'.

In [None]:
def directions(dna):
    # initialize result
    result = []
    # scan dna
    for letter in dna:
        # is it an A
        if letter == 'A':
            # add to result the string 'up'
            result.append('up')
        # almost the same for T
        elif letter == 'T':
            result.append('down')
        # otherwise show an error message
        else:
            print("unexpected letter", letter)
    # here we have all that we wanted in the 'result' list
    return result

Which gives us on a few inputs:

In [None]:
directions('ATAT')

In [None]:
directions('TTTAAA')
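
As a possible variant, and only as a sketch, the same result can be obtained by storing the letter ➞ direction mapping in a dictionary. The names `directions_with_dict` and `direction_of` are hypothetical, and the `in` operator used to check that a key is present is not covered in this notebook.

```python
def directions_with_dict(dna):
    # the mapping from each expected letter to its direction
    direction_of = {'A': 'up', 'T': 'down'}
    result = []
    for letter in dna:
        # 'in' checks whether the letter is one of the keys
        if letter in direction_of:
            result.append(direction_of[letter])
        # otherwise show an error message, like the original function
        else:
            print("unexpected letter", letter)
    return result

print(directions_with_dict('ATAT'))
```

With this design, supporting another letter only requires adding one entry to the dictionary, rather than one more `elif` branch.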

### Version of python

The code in this course can be run under **either python2.7 or python3**. 

For technical reasons, the underlying infrastructure still runs python-2.7, which means that whenever you run a notebook inside the MOOC your code runs under python-2.7. 

If, however, you plan to use python3 on your own computer, that is perfectly fine, as all the code **can be run as-is under python3** (and in fact, without the *magic formulas*).

There are indeed some incompatibilities between the two versions of python, but we only use rather simple features that behave the same in python-2.7 and in python-3, except for `print` and division; this of course is the purpose of the *magic formulas* that we saw at the beginning of this notebook.

### Performance

The python language is very convenient because it is interpreted: it lets you compute things interactively, study variants, gather averages or other statistical data, and visualize data.

However, you need to remember that for algorithms that require a lot of computation, python can turn out to be rather slow. We will have an opportunity to come back to that in particular about the Needleman and Wunsch algorithm in Week 4, Sequence 9.

Please just remember, at this stage, that the algorithms that we will see in this course are mainly designed for teaching, and that in some cases they are very inefficient compared to what would be achievable using a compiled language like C or C++.

In practice, the trend is to use **libraries of functions** that implement base algorithms written in a **compiled language** of that kind, and that are made **available to python** through *wrappers*, i.e. python functions that call the compiled code. This way we get the advantages of both approaches, namely **great flexibility** for interactive use, while preserving **optimal performance**.
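
To get a feel for the difference, here is a rough sketch that uses the built-in `sum` function as a stand-in for such a library: in CPython, the standard interpreter, `sum` is implemented in C. The helper names `python_sum`, `run_loop` and `run_builtin` are hypothetical, and the timings will vary from one machine to another.

```python
import timeit

numbers = list(range(100000))

# a sum written in plain python, with an explicit loop
def python_sum(values):
    total = 0
    for value in values:
        total = total + value
    return total

def run_loop():
    return python_sum(numbers)

def run_builtin():
    # the built-in sum is compiled code (in CPython)
    return sum(numbers)

# both approaches compute the same result
print(run_loop() == run_builtin())

# each measure runs the computation 20 times and reports seconds
print("python loop :", timeit.timeit(run_loop, number=20))
print("built-in sum:", timeit.timeit(run_builtin, number=20))
```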

### Conclusion

Again, this introduction is an extremely fast overview of the python language (1% maybe..) designed to get us started. You will find a very large number of resources on the Internet if you wish to learn the language in more detail, but this is not at all our focus here.