<br>

# Module 1 - Python basics <a id='0'></a>
--------------------------

<br>

## Python documentation and learning resources

* **Official [python documentation](https://www.python.org/doc/)**: the reference documentation for the language. It also contains some tutorials.
* **[Alternative documentation](https://www.w3schools.com/python/python_reference.asp)**: reference for built-in functions and types (easier to read, and sometimes more complete than the `help()` function). 


<br>
<br>
<br>

## Python basics <a id='0'></a>
---------------------

### Variables <a id='1'></a>
In Python, as in many programming languages, **objects are stored in variables**.
* A value is **assigned** to a variable using the **`=`** sign. 
* **Important:** unlike in mathematics, the `=` sign in python is directional: the variable name must
  always be on the left of the `=`, and the value to assign to the variable on the right.  

  Example:
  ```python
  a = 23    # Valid assignment.
  8 = b     # NOT a valid assignment.
  ```


In Python, variable names must adhere to these restrictions:
* Variable names must be **composed solely of uppercase and lowercase letters** (`A-Z`, `a-z`), 
  **digits** (`0-9`), and the **underscore** character `_`.
* The **first character** of a variable name **cannot be a digit**.
* By convention, **variable names starting with a single or double underscore `_`/`__` are reserved 
  for "special" variables** (class private attributes, "magic" variables).
* Examples:
    * `var_1` is a valid variable name.
    * `1_var` is **not** a valid name (starts with a digit).
    * `var-1` is **not** a valid name (contains the non-authorized character `-`).
    * `__var_1__` is valid, but **should not be used**, with the exception of very specific situations.
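
The naming rules above can be checked programmatically. The sketch below is only an illustration: it relies on the string method `isidentifier()` and on the standard `keyword` module, neither of which is covered in this section. It also rejects reserved words such as `for`, which cannot be used as variable names either. Note that `isidentifier()` additionally accepts non-ASCII letters, which Python 3 permits in names.

```python
import keyword

# Names taken from the examples above, plus the reserved word "for".
candidate_names = ["var_1", "1_var", "var-1", "__var_1__", "for"]

validity = {}
for name in candidate_names:
    # isidentifier() checks the character rules; iskeyword() rejects
    # reserved words of the language.
    validity[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", validity[name])
```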

<br>

**Important tips**:
* Using **explicit variable names makes your code easier to read** for others, and possibly yourself 
  in a not-so-distant future.  
  E.g. `input_file` is a better variable name than `iptf`, even if it is a bit longer.
* **Never use Python built-in names as variable names**, otherwise your variable will shadow (hide) the 
  built-in object, which can then no longer be accessed by that name.  
  E.g. don't call a variable `str`, `int` or `list` (this can be painful to debug).
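
To illustrate the last tip, here is a minimal sketch of what happens when a built-in name is reused. It deliberately does the wrong thing, then restores the built-in through the standard `builtins` module (used here only for the demonstration).

```python
import builtins

list = "ATGC"        # Bad idea: the name `list` now refers to a string.

try:
    list("abc")      # Fails: we are "calling" a string, not the built-in.
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)

list = builtins.list   # Undo the damage by restoring the built-in.
restored = list("abc")
print(restored)        # -> ['a', 'b', 'c']
```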

In [4]:
my_var = 35
var_a = "Hello Python"

print(my_var)
print(var_a)

var_b = var_a
print(var_b)

35
Hello Python
Hello Python




### Code indentation - the importance of white spaces in Python <a id='2'></a>

**Indentation** is the number of **white spaces before the first text element** (on a given line).

```python
    |var_1 = 2    # This line is not indented.
    | var_1 = 2   # This line is indented by 1 space.
     ^
    |  var_1 = 2  # This line is indented by 2 spaces.
     ^^
```

* Unlike in most other languages, where it is just good practice, **indentation is part of the 
  language syntax in Python**. It has a very important meaning: it is used to define a so-called 
  **"code block"** (more on that later in the course).
* When outside of a "code block", there should be no indentation on the line.
* A wrong level of indentation will trigger an `IndentationError`.
* Comments can have any indentation level.
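
The `IndentationError` mentioned above can be observed safely with the built-in `compile()` function, which parses a piece of code given as a string without running it. This is only a demonstration device; the `"<example>"` file name is an arbitrary label.

```python
well_indented = "var_1 = 2"
badly_indented = " var_1 = 2"   # One unexpected leading space.

# Parsing the correctly indented line raises no error.
compile(well_indented, "<example>", "exec")

error_name = None
try:
    compile(badly_indented, "<example>", "exec")
except IndentationError as error:
    error_name = type(error).__name__
    print(error_name, "-", error.msg)
```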

In [9]:
var_c = "atgc"
print(var_c)

atgc


When assigning a variable, the number of white spaces around the `=` operator does not matter. However, the [Python style convention](https://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-statements) is to have **exactly 1 space** on each side of the `=` operator when assigning a variable.


In [12]:
var_c =    "atgc"


### Functions <a id='3'></a>
Another very important concept in Python, as in most programming languages, is that of **functions**:
* Functions are **re-usable blocks of code** that have been given a name and are designed to perform an action.
  How to define your own functions will be covered in Module 2 of this course.
* Functions can be written to perform anything, from the simplest task to the most complex.
* To **call a function**, one uses its name followed by parentheses `()`, which can contain an optional set of 
  **arguments**.
* Values passed to functions are called **arguments**. They can be **mandatory** or **optional** (optional 
  arguments have a default value and are often passed by name, as keyword arguments).

**Example:** to call the "print" function, we type `print()` and pass the text to print inside the `()`.
  

In [13]:
print("This will be printed")
print("This", "will", "be", "printed", sep="--")

This will be printed
This--will--be--printed
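
As a further sketch of mandatory versus optional arguments, the built-in `round()` function takes one mandatory argument (the number) and one optional argument, `ndigits`, which can be passed by position or by name. The variable names are illustrative.

```python
pi_approx = 3.14159

rounded_default = round(pi_approx)             # Optional argument omitted.
rounded_positional = round(pi_approx, 2)       # Optional argument passed by position.
rounded_keyword = round(pi_approx, ndigits=2)  # Optional argument passed by name.

print(rounded_default, rounded_positional, rounded_keyword)  # -> 3 3.14 3.14

# Omitting the mandatory argument is an error.
try:
    round()
except TypeError as error:
    print("TypeError:", error)
```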




### Methods

**Methods** are very similar to functions, except that they **are associated with a specific object type**.  
Since methods are associated with a specific object type, the syntax used to call them is the following:

* `object.method()` - syntax to call a **method of an object**.


<br>

**Example:** calling the `.upper()` method of a string.

In [14]:
test_str = "atgatgc"

upper_case = test_str.upper()

print(upper_case)

ATGATGC


> **Important:** don't forget the `()` when calling a method/function that takes no arguments.
> Otherwise you will not call the method/function, but access the "function object" itself.
>
> ```python
> test_str.upper()  # -> calls the "upper" method on "test_str".
> test_str.upper    # -> returns a method object, without calling it.
> ```
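
The difference between calling a method and merely accessing it can be sketched as follows. The built-in `callable()` function is used here only to show that the un-called attribute is itself an object that can be called later.

```python
test_str = "atgatgc"

called = test_str.upper()      # The method is executed; we get its result.
not_called = test_str.upper    # No parentheses: we get the method object itself.

print(called)                  # -> ATGATGC
print(callable(not_called))    # -> True
print(not_called())            # -> ATGATGC (calling it afterwards still works)
```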


## Object types: simple types <a id='7'></a>

**Everything in python is an object**, and each object is of a specific **type** (the type is the class of the object).

There exist plenty of types (it is even common to define your own new type), but there are a few very common ones - known as **built-in** types - that you ought to know:

* **`bool`**: boolean/logical values, either `True` or `False` (they behave like the integers `1` and `0`).
* **`int`**: integer number.
* **`float`**: floating point number (numbers with a decimal fraction).

To know the type of an object, we can make use of the **`type()`** function.  


In [16]:
print(type(True))
print(type(11))
print(type("hello"))
print(type(4.5))

<class 'bool'>
<class 'int'>
<class 'str'>
<class 'float'>


<br>

**A few comments about types in python:**
* Python is a **dynamically typed** language\*\* (as opposed to **statically typed** 
  languages such as C or C++). This means that variables are declared without a specific type, 
  and the type is determined by the object that is assigned to the variable.  
  This has its advantages (easier and faster to write code) and downsides (e.g. type error bugs can 
  remain hidden for a long time until they are triggered by some unusual input data).

  \*\* Starting with python 3.6, it is possible (as an option) to add **type annotations** for
  variables, but these are not enforced at runtime.
  
  <br>
  
* A corollary is that variables in Python are not restricted to a single type and can be reassigned 
  another type of value at any time.
 
<br>

**Example:** here we successively assign different values and types to the variable `a`.
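
A minimal sketch of such successive reassignments (the values are chosen only for illustration):

```python
a = 11
print(a, type(a))   # -> 11 <class 'int'>

a = 4.5
print(a, type(a))   # -> 4.5 <class 'float'>

a = "hello"
print(a, type(a))   # -> hello <class 'str'>

a = True
print(a, type(a))   # -> True <class 'bool'>
```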

### Type conversion <a id='8'></a>

Converting from one type to another is (often) fairly easy: just use the type name as a function.

**Example:** convert an integer to a float.

In [17]:
a = 43
a = float(a)

print(type(a))

<class 'float'>


**Example:** convert to string

In [18]:
a = str(a)
print(type(a))

<class 'str'>


Converting to string is useful when concatenating a string and a number:

In [19]:
print("this will fail" + 2 + "bad")

TypeError: can only concatenate str (not "int") to str

In [20]:
print("this will work" + str(4) + "sure")

this will work4sure
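
Conversion also works in the other direction, from string to number, but not every conversion succeeds or preserves the value. The sketch below uses illustrative values to show a successful conversion, a conversion that truncates, and one that fails.

```python
converted_int = int("42")       # String to integer.
converted_float = float("3.5")  # String to float.
truncated = int(4.9)            # Float to integer: the decimal part is dropped.

print(converted_int + 1)        # -> 43
print(converted_float * 2)      # -> 7.0
print(truncated)                # -> 4

# A string that does not look like an integer cannot be converted with int().
try:
    int("4.9")
except ValueError as error:
    print("ValueError:", error)
```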


## Operators <a id='10'></a>
Now that we have variables containing objects of a certain **type**, we can begin to manipulate them using **operators**.
<br>

### Arithmetic operators  <a id='11'></a>
You know most of these already:

In [22]:
print(3+7)
print(7.3-2.5)
print(5*2)
print(8/4)
print(5 % 2)

x = 4
print(x**2)

10
4.8
10
2.0
1
16
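
A few details of these operators are worth spelling out. The sketch below also uses the floor division operator `//` and the built-in `divmod()` function, which are not part of the example above and are shown only for comparison with `/` and `%`.

```python
true_division = 7 / 2     # "/" always returns a float.
floor_division = 7 // 2   # "//" rounds the result down to a whole number.
remainder = 7 % 2         # "%" returns the remainder of the division.
power = 2 ** 10           # "**" raises to a power.

print(true_division)      # -> 3.5
print(floor_division)     # -> 3
print(remainder)          # -> 1
print(power)              # -> 1024

# divmod() returns the floor division and the remainder in one call.
print(divmod(7, 2))       # -> (3, 1)
```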



### Comparison operators <a id='12'></a>

These operators return a `bool` value: `True` or `False`.

In [26]:
a = 5

print("is equal to 13.45?", a == 13.45)
print("is at least 5?", a >= 5)
print("is different from 45?", a != 45)

is equal to 13.45? False
is at least 5? True
is different from 45? True


**Warning:** comparisons are type-sensitive: a number is never equal to its string representation, so the following expression evaluates to `False`:

In [28]:
print("is equal to '5'? ", a == "5")

is equal to '5'?  False
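
The sketch below illustrates this with a few more cases. Note that an `int` and a `float` holding the same numerical value do compare equal, whereas ordering comparisons (such as `<`) between a number and a string raise an error.

```python
a = 5

print(a == 5.0)        # -> True  (int and float with the same value)
print(a == "5")        # -> False (a number never equals a string)
print(a == int("5"))   # -> True  (after converting the string to int)
print(str(a) == "5")   # -> True  (after converting the number to str)

# Ordering a number against a string is not allowed at all.
try:
    a < "5"
except TypeError as error:
    print("TypeError:", error)
```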


Boolean values (the result from a comparison) can be:
* **Combined** using `and` or `or`:

    | Combining with `and` | **True** | **False** |
    |----------------------|----------|-----------|
    | **True**             | True     |  False    |
    | **False**            | False    |  False    |
  
    | Combining with `or`  | **True** | **False** |
    |----------------------|----------|-----------|
    | **True**             | True     |  True     |
    | **False**            | True     |  False    |
  
  <br>
  
* **Inverted** using `not`: `True` becomes `False`, and `False` becomes `True`.

In [31]:
print("for and: ", True and (1 + 3 == 4))
print("for or: ", (a**2 > 10) or (a < 3))

for and:  True
for or:  True
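
The `not` operator can be sketched in the same way. The variable names below are illustrative.

```python
a = 5

in_range = (a > 0) and (a < 10)   # Both comparisons are True.
out_of_range = not in_range       # Inverts the value above.
either = (a < 0) or (a == 5)      # Only the second comparison is True.

print(in_range)       # -> True
print(out_of_range)   # -> False
print(either)         # -> True
```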


## Object types: container types <a id='14'></a>

These built-in types are objects that contain other objects:

* **`str`**: string - text.
* **`list`**: **mutable** list of objects (mutable = can be modified after it was created).
* **`tuple`**: **immutable** list of objects (immutable = cannot be modified after it was created).
* **`dict`**: dictionary associating keys with values.

Container objects share some common characteristics, such as:
* They have a dedicated **`[]`** operator that lets the user access one, or several, of the objects they contain.
* The number of objects a container has (its length) can be accessed using the **`len()`** function.
* Container objects are **iterables**: one can iterate over them using e.g. a `for` loop (see Notebook 2 of this course).

**Important:** in python (unlike e.g. in R), **indexing is zero-based**. This means that the first element of a container type object is accessed with `object[0]`, and not `object[1]`.
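
The common characteristics listed above can be sketched on small examples of each container type. The variable names and values are illustrative, and the literal syntax for lists, tuples and dictionaries is used here without further explanation.

```python
text = "ATGC"
bases_list = ["A", "T", "G", "C"]
bases_tuple = ("A", "T", "G", "C")
base_names = {"A": "adenine", "T": "thymine"}

# len() works on all container types.
print(len(text), len(bases_list), len(bases_tuple), len(base_names))  # -> 4 4 4 2

# The [] operator accesses elements; indexing starts at 0.
print(text[0], bases_list[0], bases_tuple[0])   # -> A A A
print(base_names["A"])                          # -> adenine

# A list is mutable, a tuple is not.
bases_list[0] = "U"
print(bases_list)                               # -> ['U', 'T', 'G', 'C']
try:
    bases_tuple[0] = "U"
except TypeError as error:
    print("TypeError:", error)
```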

## Strings <a id='15'></a>
In Python, the **`str`** (string) type is a **sequence of characters** that can be used to represent text of any length.

* Strings can be declared using either **single `'`** or **double `"`** quotes. 

In [32]:
gene_seq = "ATGATGC"

Remember that strings are *containers*: each letter of a string is a separate element.

Each element is associated with an **index**, starting at 0.




In [34]:
print(gene_seq[3])

A
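
To make the index of each letter visible, the sketch below prints every position of `gene_seq` next to its letter. It uses a `for` loop and the built-in `enumerate()` function, which are not introduced in this section.

```python
gene_seq = "ATGATGC"

# enumerate() yields (index, element) pairs, starting at index 0.
for index, letter in enumerate(gene_seq):
    print(index, letter)

indexed_letters = list(enumerate(gene_seq))
```

The line printed for index `3` shows the letter `A`, in agreement with `gene_seq[3]` above.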


* **Accented and special characters** are possible in strings. Special characters are written as escape sequences, for example:
  * **`\t`** = tab
  * **`\n`** = new line

In [35]:
print("geneseq\tATGC")
print("geneseq\nATGC")

geneseq	ATGC
geneseq
ATGC


<br>

### Length of a string <a id='16'></a>
The **`len()`** function can be used on a string to return its length:

In [36]:
print(len(gene_seq))

7


<div class="alert alert-block alert-info">

#### [Additional Material] f-strings

Python [f-strings (formatted string literals)](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) make it easy to create strings that combine one or more variables with some hard-coded characters.

The syntax is simply to:
* Prefix the string with `f`, as in `f"This is an f-string"`.
* Inside an f-string, variable content can be accessed using curly braces, as in
  `f"This is a {variable_name}."`.  
  Here, `{variable_name}` will expand to the content of the variable `variable_name`.

**Examples:**

```python
# Example 1:
first_name = "Alice"
last_name = "Smith"

full_name = f"{first_name} {last_name}"   # Same as: full_name = first_name + " " + last_name
print(f"Her full name is {full_name}.")   # -> Her full name is Alice Smith.

# Example 2:
animal = "cat"
container = "bag"

print(f"The {animal} is out of the {container}!")  # -> The cat is out of the bag!
```
    
</div>
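
Beyond plain variable names, the curly braces of an f-string can also contain expressions and an optional format specification. The sketch below goes slightly beyond the examples above: the `:.2f` specifier (two digits after the decimal point) and the string method `count()` are not covered in this section.

```python
gene_seq = "ATGATGC"

# Count the G and C letters, then compute their fraction of the sequence.
gc_count = gene_seq.count("G") + gene_seq.count("C")
gc_fraction = gc_count / len(gene_seq)

summary = f"{gene_seq} has {len(gene_seq)} bases, GC fraction: {gc_fraction:.2f}"
print(summary)   # -> ATGATGC has 7 bases, GC fraction: 0.43
```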

<br>

### String slicing <a id='18'></a>

Because strings are a type of sequence (a sequence of characters), the different characters of a string can be accessed using the **`[]` operator**, with the index of the desired element(s).  

* Remember that in python, **the index of the first element is `[0]`**.
* Negative indices will access characters starting from the end of the string. E.g. `[-1]` returns the
  last character in the string.
  
  

In [37]:
my_string = "ATGCATTGCATTGCATTTGCATAGC"

print(my_string[-7])

G
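
Negative indices are a shorthand for counting from the end of the string, as the following sketch shows. It also shows that an index beyond the end of the string raises an `IndexError`.

```python
my_string = "ATGCATTGCATTGCATTTGCATAGC"

last_char = my_string[-1]
same_char = my_string[len(my_string) - 1]   # Equivalent positive index.

print(last_char, same_char)                                # -> C C
print(my_string[-7] == my_string[len(my_string) - 7])      # -> True

# Valid indices run from 0 to len(my_string) - 1.
try:
    my_string[len(my_string)]
except IndexError as error:
    print("IndexError:", error)
```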


<br>

Indices can also be used to retrieve several elements at once: this is called a **slice operation** or **slicing**:
* The general syntax of slicing is `[start index: end index (excluded): step]`
* The end index position is **excluded from the slice**.
* The **default step value is 1**. It can be omitted (and usually is).
* If the start index is omitted, the slice implicitly starts at the beginning of the string, e.g. `string[:10]`.
* If the end index is omitted, the slice implicitly runs until the end of the string, e.g. `string[10:]`.

In [38]:
print(my_string[3:6])

CAT


In [39]:
print(my_string[5:])

TTGCATTGCATTTGCATAGC


In [40]:
print(my_string[5::2])

TGATCTTCTG
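
The slicing rules above can be summarized in a short sketch. Because the end index is excluded, two slices that share the same boundary split a string without overlap. Unlike a single out-of-range index, a slice that extends past the end of the string does not raise an error.

```python
my_string = "ATGCATTGCATTGCATTTGCATAGC"

head = my_string[:10]    # From the beginning up to (not including) index 10.
tail = my_string[10:]    # From index 10 to the end.

print(head)                         # -> ATGCATTGCA
print(tail)                         # -> TTGCATTTGCATAGC
print(head + tail == my_string)     # -> True

print(my_string[3:6:1] == my_string[3:6])   # -> True (step 1 is the default)
print(my_string[::3])               # -> ACTAGTGTC (every third character)
print(my_string[20:100])            # -> ATAGC (end index is silently clipped)
```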


**Tip:** you can reverse a sequence (such as a string) by using the **`[::-1]`** slicing operation.

In [41]:
print(my_string[::-1])

CGATACGTTTACGTTACGTTACGTA
