<br>

# Module 2 - Data Structures <a id='0'></a>
--------------------------

## Lists and tuples <a id='20'></a>

Lists are **sequence type** objects that can contain any type of elements (other objects).
* **Lists** are declared by surrounding a comma separated list of objects with **`[]`**.  
  Example: `["This", "is", "a list", "with", 6, "items"]`

<br>

#### Creating a list

In [None]:
my_list = [0,1,2,4,7,12,56]

print(my_list)
print(len(my_list))

* List items can be of **heterogeneous type**.
* List **items can be of any type**: therefore it's possible to nest lists within other lists.

In [None]:
list_a = [1,2,"python","r",[2,"linux"]]
print(list_a)
print(len(list_a))

<br>

#### Creating lists from iterables (e.g. sequences such as lists, tuples, or a range)

* Objects that are *iterables* can be converted to lists using the `list()` function.

In [None]:
list_1 = list(range(21))
print(list_1)

> What are **`range`** objects?  
> `range` objects are sequences of **integer numbers**, e.g. `0, 1, 2, 3, 4, ...`.
>
> By default, a call to `range(x)` creates a sequence of integers from `0` to `x`, `x` excluded.
>
> **Examples**:
> * `range(10)`    -> `0, 1, 2, 3, 4, 5, 6, 7, 8, 9`
> * `range(3, 7)` -> `3, 4, 5, 6`
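
The two examples above can be checked by converting each `range` to a list. This is a minimal sketch; the variable names are illustrative only.

```python
# Convert range objects to lists to display the integers they contain.
first_ten = list(range(10))        # starts at 0 by default; 10 is excluded
three_to_six = list(range(3, 7))   # starts at 3 and stops before 7

print(first_ten)
print(three_to_six)
```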

### Accessing values: list slicing <a id='21'></a>

* Accessing an element (or a range of elements) in a list is done using the **`[]`** operator.
* The **`[]` operator** works in much the same way as with strings, and allows
  **accessing individual objects** from a list, or **slicing** it.

* As with strings, remember that the **end position index is excluded** from the slicing.

In [None]:
my_list = [1,2,"python","linux",3.4,[2,"biology"]]

print(my_list[0])      # Get the 1st item of the list.
print(my_list[2:])     # Get all elements from index 2 (i.e. the 3rd element) to the end of the list.
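
A few more slicing cases, given as a small sketch with an illustrative list (the name `items` is not part of the course material): a slice with an excluded end index, a negative index, and access to an element of a nested list.

```python
# Illustrative list, for demonstration only.
items = [1, 2, "python", "linux", 3.4, [2, "biology"]]

middle = items[1:3]     # elements at index 1 and 2; index 3 is excluded
last = items[-1]        # negative indices count from the end of the list
nested = items[5][1]    # second element of the nested list at index 5

print(middle)
print(last)
print(nested)
```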

<br>

* If we try to access an index that does not exist in the list, an **`IndexError`** is raised.

```py
    my_list = [1, 2, "spam"]
    my_list[3]

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    Input In [26], in <module>
          1 my_list = [1, 2, "spam"]
    ----> 2 my_list[3]

    IndexError: list index out of range
```
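
Unlike indexing, slicing past the end of a list does not raise an error. The sketch below contrasts the two behaviours; `short_list` and the other names are illustrative.

```python
short_list = [1, 2, "spam"]

# Indexing past the end raises an IndexError, which can be caught.
try:
    short_list[3]
    index_error_raised = False
except IndexError:
    index_error_raised = True

# Slicing past the end does not raise: it returns whatever is available.
empty_slice = short_list[3:]
clipped_slice = short_list[1:10]

print(index_error_raised, empty_slice, clipped_slice)
```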

<br>

[Back to ToC](#toc)

### Tuples

Tuples are very similar to lists in that they are also **sequence type** objects that can contain any type of element (other objects).  
* **Tuples** are declared using the syntax **`(value1, value2, ...)`**.
* Values in **tuples** can be accessed and sliced in the same way as lists are.
* The main difference between lists and tuples is that **the values in a tuple cannot be changed**
  once the tuple has been created. This means that we cannot add/remove values from a tuple, nor can
  we modify a value inside it.
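
As a minimal sketch of these points (the tuple `languages` is an illustrative example), reading and slicing work as with lists, while trying to modify an element raises a `TypeError`:

```python
languages = ("python", "r", "linux")

first = languages[0]    # indexing works as with lists
rest = languages[1:]    # slicing returns a new tuple

try:
    languages[0] = "julia"   # tuples do not support item assignment
    assignment_failed = False
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)

print(first, rest)
```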

#### Create a tuple
 * **Important:** if a tuple contains a single element, then the last (and only) element of the tuple must be
   followed by a comma.
 * If the tuple contains multiple elements, then this final comma is not necessary (but allowed).
     ```py
     a_tuple = (value, )    # Correct syntax.
     a_tuple = (value)      # This will NOT create a tuple, but a regular value.
     ```

In [None]:
tuple_1 = ("python","linux")
print(tuple_1)

In [None]:
tuple_2 = ("python",)
print(tuple_2)
print(len(tuple_2))
print(type(tuple_2))
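
To see why the trailing comma matters, the short sketch below compares the two forms. The variable names are illustrative.

```python
not_a_tuple = ("python")    # parentheses only: this is a plain string
one_tuple = ("python",)     # trailing comma: this is a tuple with one element

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__, len(one_tuple))
```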

In [None]:
# Create a tuple from a list.
list_1 = ["a","seq","of","string"]
tuple_a = tuple(list_1)

print(tuple_a)

<br>

#### Creating empty tuples
* Empty tuples can be created with `()` or `tuple()`.
* Note that because tuples cannot be changed after they are created, it is not possible to add elements
  to an empty tuple.

In [None]:
tuple_b = ()
tuple_b = tuple()

print(tuple_b,type(tuple_b),len(tuple_b))

<br>

### When to use `list` or `tuple`? Mutability - an important difference between lists and tuples <a id='22'></a>

* A `list` is **mutable**: it can be extended, reduced, and its elements can be changed.
* A `tuple` is **immutable**: its length is fixed and its elements cannot be changed.

<br>

**Use tuples** when:
  * You need to store a sequence of objects that will not change in your program (fixed length).
  * You want to be sure that a sequence of objects will not be accidentally modified - a
    sort of **write-protection**.
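  * You want to save a little memory: tuples are slightly more memory efficient than lists.
    The exact sizes depend on the Python version and platform.
      ```py
        import sys
        print(sys.getsizeof((1, 2, 3, 4, 5)))  # -> e.g. 80 bytes
        print(sys.getsizeof([1, 2, 3, 4, 5]))  # -> e.g. 96 bytes (a larger value than for the tuple).
      ```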

<br>

**Use lists** when:
  * You need to store a sequence of objects that will be modified over time.
  * You need to have a sequence that can be grown (add elements) or shrunk (remove elements).

<br>

For more details about object mutability in Python, see the **Additional Theory** section at the end of this notebook.

<br>

**Example:** because lists are mutable, we can modify an element in a list (or add/remove an element from a list).

In [None]:
my_l = ["python","r","biology"]

my_l[2] = "genetics"

print(my_l)

In [None]:
my_l = [1,2,5]

my_l[2]= 4

print(my_l)

* Calling the `.append()` method of a list (here `my_l`) adds the specified element at the end of the list:

In [None]:
my_l.append(8)
print(my_l)

<br>

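* Trying to add a string to a list with **`.extend()`** can lead to unexpected results: `.extend()` adds every item
  of an iterable to the list, and a string is an iterable of characters, so each character is added as a separate
  element. With a one-character string such as `"g"`, the result is the same as with `.append()`.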

In [None]:
my_l.extend("g")
print(my_l)
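
The difference between `.append()` and `.extend()` becomes visible with a longer string. The following sketch uses illustrative gene names as list content:

```python
genes_appended = ["BRCA1"]
genes_appended.append("TP53")    # the whole string becomes one element

genes_extended = ["BRCA1"]
genes_extended.extend("TP53")    # the string is iterated: one element per character

genes_wrapped = ["BRCA1"]
genes_wrapped.extend(["TP53"])   # wrapping the string in a list adds it as one element

print(genes_appended)
print(genes_extended)
print(genes_wrapped)
```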

In [None]:
my_liste1 = ["python","r","biology"]

my_liste1 += ["genetics","bioinformatics"]

print(my_liste1)
print(len(my_liste1))

#### `insert()` method

* Adding an element at a **specific position in the list** can be done with the **`insert()`** method.
* In this example, we add an element in second position of `my_liste1`.
* Remember that Python indices start at 0, so inserting before position 1 puts 
  the new object in second position in `my_liste1` (and not in the first).

In [None]:
my_liste1.insert(1,"proteomics")
print(my_liste1)
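
Two edge cases of `insert()`, shown as a small sketch with an illustrative list: index `0` places the element first, and an index beyond the end of the list simply adds the element at the end.

```python
topics = ["python", "r", "biology"]

topics.insert(0, "linux")        # index 0: becomes the new first element
topics.insert(100, "genetics")   # index beyond the end: added at the end

print(topics)
```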

<br>

### Deleting elements in a list

* `list_object.pop(x)`: **deletes** the element at position `x` **and returns it**.
  If no arguments are passed to `pop()`, the last element of the list is removed by default.
* `del list_object[x]`: **deletes** the element at position `x`. A slice can also be deleted, e.g. `del list_object[x:y]`.

*Note:* using the `.pop()` method is generally considered to be more *pythonic* than using `del`.

**Example:** deleting with `del`:

In [None]:
c_list = list(range(10))
print(c_list)

del c_list[-1]
print(c_list)

del c_list[0:4]
print(c_list)

<br>

**Example:** deleting an item with the `pop()` method:

In [None]:
removed = c_list.pop(0)
print(removed)
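
The sketch below, using an illustrative list of numbers, shows `pop()` without argument next to `pop(0)` and `del`. Only `pop()` gives back the removed value.

```python
numbers = [4, 5, 6, 7, 8]

last_value = numbers.pop()     # no argument: removes and returns the last element
first_value = numbers.pop(0)   # removes and returns the element at index 0
del numbers[0]                 # removes the element at index 0 without returning it

print(last_value, first_value, numbers)
```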

When a string is converted to a list with `list()`, the default behavior is that **each letter of the string becomes an element in the list**.

However, we often prefer to create a list that contains each word of the string. For this we use the **`split()`** method of strings:
* The `split()` method is very useful when reading formatted text files.
* By default, it splits on white space (i.e. spaces, tabs, newlines).
* It accepts an optional `sep` argument that allows separation of fields using the specified character (look up `help(str.split)` for details).

In [None]:
quote = "bioinformatics is fun"

words = quote.split()
print(words)
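
For comparison, here is a short sketch of `list()` applied to a string and of `split()` with an explicit separator. The record string is a made-up example of a comma-separated line.

```python
letters = list("fun")            # one element per character

record = "sample_1,liver,42"     # hypothetical comma-separated line
fields = record.split(",")       # split on commas instead of white space

print(letters)
print(fields)
```

Note that the fields returned by `split()` are always strings: here `fields[2]` is `"42"`, not the integer `42`.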

<br>

**To convert a list to a string**, the **`join()`** method can be used - it can be seen as the inverse of `split()`.  
Somewhat counter-intuitively, the `join()` method applies to strings, and takes a list as argument:

In [None]:
quote = " ".join(words)   # The string on which join() is called is used as the separator.
print(quote)
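
The string on which `join()` is called is placed between the elements, so the choice of separator matters. A minimal sketch with illustrative names; note that `join()` only accepts strings, so other types must be converted first:

```python
word_list = ["bioinformatics", "is", "fun"]

with_spaces = " ".join(word_list)     # separator is a single space
with_dashes = "-".join(word_list)     # separator is a dash
no_separator = "".join(word_list)     # empty separator: words are glued together

# Non-string elements must be converted to strings before joining.
numbers_joined = ", ".join(str(n) for n in [1, 2, 3])

print(with_spaces)
print(with_dashes)
print(no_separator)
print(numbers_joined)
```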

<br>

**Tip**: lists can be concatenated with the `+` operator, extended with `+=` (addition assignment) and "multiplied" with `*`:

In [None]:
list_one =["hello",12]
list_two= list_one + [10.2,"45","python"]

print(list_two)

list_one += ["r","linux"]
print(list_one)
 
menu = ["eggs","lemon"]*3
print(menu)

<br>
<br>

[Back to ToC](#toc)

## Dictionaries <a id='26'></a>
Dictionaries, or `dict`, are containers that associate a **key** with a **value**, just like a real world dictionary associates a word with its definition.
* Dictionaries are instantiated with the `{key:value}` or `dict(key=value)` syntax.

    ```python
    color_code = {'blue': 23, 'green': 45, 'red': 8}
    ```
  
* **Keys** must be unique in the dictionary, and must be immutable (more precisely, hashable) objects, typically a `str`.
* **Values** can appear as many times as desired in the dictionary.
* The `[]` operator is used to **select objects from the dictionary**, but **using their key** instead
  of their index. E.g. `color_code[0]` is valid syntax but raises a **`KeyError`**, unless
  there is a key `0` in the dict (which is not the case in our example).
  
    ```python
    color_code['blue']   # returns 23
    color_code['red']    # returns 8
    ``` 
* Dictionaries are **mutable** objects: `key:value` pairs can be added and removed, values can be modified.

**Examples:**

* **Create a dictionary with values** in it.

In [None]:
age ={
    "Maria":23,
    "Victor": 34
}

print(age)

# Alternatively:

student_age = dict(Maria=23,Victor=34)
print(student_age)

<br>

* **Retrieve values** associated with keys.

In [None]:
print(age["Maria"])

<br>

* **Trying to access an element of the `dict` by index or by value is not possible**. The expression below raises
  a **`KeyError`**, because Python looks for the key `23` in the dictionary and it does not exist (`23` is a value, not a key).

In [None]:
age[23]
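
When it is not certain that a key exists, a `KeyError` can be avoided by testing with the `in` operator or by using the `get()` method of `dict`. This is a minimal sketch with an illustrative dictionary named `ages`:

```python
ages = {"Maria": 23, "Victor": 34}

has_maria = "Maria" in ages         # True: "Maria" is a key
has_23 = 23 in ages                 # False: `in` tests keys, not values

unknown_age = ages.get("Paul")      # missing key: returns None instead of raising
default_age = ages.get("Paul", 0)   # missing key: returns the given default value

print(has_maria, has_23, unknown_age, default_age)
```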

* **Adding additional `key:value` pairs** to a dictionary, or modifying the value of an existing key, is as easy as:

In [None]:
age["Elenore"] = 15
print(age)

<br>

* **Modifying the value of an existing key** of a dictionary.

In [None]:
age["Elenore"] += 1  # Shortcut for: age["Elenore"] = age["Elenore"] + 1
print(age)
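
Since dictionaries are mutable, `key:value` pairs can also be removed. The sketch below uses `del` and the `pop()` method of `dict` on an illustrative dictionary:

```python
ages = {"Maria": 23, "Victor": 34, "Elenore": 16}

del ages["Victor"]                  # removes the pair, returns nothing
removed_age = ages.pop("Elenore")   # removes the pair and returns its value

print(ages)
print(removed_age)
```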

<br>

* **Create an empty dictionary**, then **add values** to it.
  * Empty dictionaries can be created with either **`{}`** or **`dict()`**.
    Using `{}` is considered more *pythonic*.
  * To **add a value to a `dict`**, we simply assign it to a new key.

In [None]:
student_age =dict()
student_age = {}
print(student_age)

In [None]:
# Add new key:value pairs to the dict.
student_age["Marco"] =23
student_age["Lisa"]= 13

<br>

We are not restricted to a particular type for keys, nor for values. We can e.g. make a `dict` of lists or `dict` of `dict`.
* In practice, it's best to use dictionaries for storing **homogeneous values** (i.e. you probably don't want
  to store unrelated things in different keys).

In [None]:
print(student_age)
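
As a sketch of a `dict` of lists and a `dict` of `dict` (all names and values below are made up for illustration):

```python
# A dict of lists: each key is associated with a list of values.
samples_by_tissue = {"liver": ["s1", "s2"], "brain": ["s3"]}
samples_by_tissue["brain"].append("s4")   # the list stored under a key can be modified

# A dict of dicts: each key is associated with another dictionary.
students = {
    "Marco": {"age": 23, "course": "python"},
    "Lisa": {"age": 13, "course": "r"},
}
marco_age = students["Marco"]["age"]      # chain [] to reach the inner value

print(samples_by_tissue)
print(marco_age)
```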