[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/How-to-Learn-to-Code/python-class/blob/master/Lesson_2_Control_Structs/Lesson_2_Control_structs_teacher.ipynb)

# Lesson 2 - Data Structures and Control Flows

## Learning objectives: 

Students will learn about various data structures and how to control the flow of their code.

* [Basic Data Structures: `list`, `tuple`, `set`, `dict`](#data_structs)
* [In-Class Exercises: Part 1](#exercises1)
* [Controlling Flow with `if`/`elif`/`else` Statements and `for`/`while` Loops](#control)
* [In-Class Exercises: Part 2](#exercises2)

### Basic Data Structures: `list`, `tuple`, `set`, `dict` <a id='data_structs'></a>

Data structures are containers that hold a collection of related data together. They are particularly helpful when working with experimental data sets. There are four built-in data structures in Python: `list`, `tuple`, `set`, and `dict`. In this lesson, we will learn about each of these data structures.

#### Lists

A `list` is a data structure that holds an ordered collection of items, i.e. you can store a sequence of items. An easy-to-imagine example is a shopping list where you have a list of items to buy. In that list, however, you probably have each item on a separate line, whereas in Python, you put commas in between them, as you'll soon see.

Here are important properties of lists:

* Lists remember the order of items inserted (i.e. lists are **ordered**).
* Items in a list can be accessed using an index.
* Lists can contain any sort of object. For instance, there can be numbers, strings, tuples, and even other lists.
* You can change a list in-place, add new items, and delete or update existing items (i.e. lists are **mutable**).

Let's create a list containing five different genes: *EGFR, KRAS, MYC, RB,* and *TP53*. In Python, a `list` is created by placing elements inside square brackets `[]`, separated by commas:

In [1]:
gene_list = ['EGFR', 'KRAS', 'MYC', 'RB', 'TP53']
print(gene_list)

['EGFR', 'KRAS', 'MYC', 'RB', 'TP53']


We can access items in a `list` using indexing operations. In Python, indices start at 0.

![indexing](https://cdn.programiz.com/sites/tutorial2program/files/python-list-index.png)

Let's try to access the second gene in our `gene_list`:

In [2]:
gene_list[1]

'KRAS'

Python also has 'negative indices', which are convenient when we want to access items counting from the end of the list. For instance, the last item in the list can be accessed with an index of -1. Let's try to access the second-to-last item in `gene_list`:

In [3]:
gene_list[-2]

'RB'

We can also access a range (or **slice**) of items in a list using the slicing operator `:`. Let's try to access the first three items in `gene_list`. Note that the start index is *inclusive*, but the end index is *exclusive*. We will see later when this can be useful.

In [4]:
# Print the first three items
print(gene_list[0:3])

# If starting from the beginning of the list, the starting index can be omitted
print(gene_list[:3])

# Equivalently
# Notice that the starting index is included, but the ending index is not
print([gene_list[0], gene_list[1], gene_list[2]])

['EGFR', 'KRAS', 'MYC']
['EGFR', 'KRAS', 'MYC']
['EGFR', 'KRAS', 'MYC']


What should we do if we want to access the last three items?

In [5]:
# Print the last three items
# Notice that when going to the end of the list, the ending index is omitted
print(gene_list[-3:])

# Equivalently
print([gene_list[-3], gene_list[-2], gene_list[-1]])

['MYC', 'RB', 'TP53']
['MYC', 'RB', 'TP53']


Now suppose we are interested in a new cancer gene, *PTEN*, and want to add it to our gene list. There are a few ways to do this, but one of the simplest is to use a `list` method called `append()`:

In [6]:
gene_list.append('PTEN')
gene_list

['EGFR', 'KRAS', 'MYC', 'RB', 'TP53', 'PTEN']

Oops! Suppose we wanted to add *BRAF* instead of *PTEN*. What should we do? Since lists are **mutable**, we can replace *PTEN* with *BRAF*:

In [7]:
gene_list[-1] = 'BRAF'
gene_list

['EGFR', 'KRAS', 'MYC', 'RB', 'TP53', 'BRAF']

There are other useful `list` methods that we can use to alter and describe lists, including:

- `list.append(x)`: Add an item `x` to the end of the list.
- `list.remove(x)`: Remove the first item in the list whose value is equal to `x`. 
- `list.count(x)`: Return the number of times `x` appears in the list.

Some useful functions for lists include:
- `reversed(l)`: Return an iterator over the items of list `l` in reverse order. Wrap it in `list()` to get a new reversed list; `l` itself is not changed.
- `sorted(l)`: Return a new list with the items of `l` sorted (ascending order by default); `l` itself is not changed.
- `len(l)`: Return the length of list `l`.
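
The methods and functions listed above can be tried out in a short sketch. The list `genes` below is a hypothetical example (not part of the lesson data), with *KRAS* repeated on purpose so that `count()` and `remove()` have something to show.

```python
# Hypothetical example list; 'KRAS' appears twice on purpose
genes = ['MYC', 'KRAS', 'TP53', 'KRAS', 'EGFR']

# count(x): how many times does 'KRAS' appear?
n_kras = genes.count('KRAS')
print(n_kras)

# remove(x): removes only the first matching item, in place
genes.remove('KRAS')
print(genes)

# sorted(l): returns a new sorted list; genes itself is unchanged
sorted_genes = sorted(genes)
print(sorted_genes)

# reversed(l): returns an iterator, so wrap it in list() to see the items
reversed_genes = list(reversed(genes))
print(reversed_genes)

# len(l): number of items currently in the list
print(len(genes))
```

Note that `append()` and `remove()` change the list in place, while `sorted()` and `reversed()` leave the original list as it was.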

#### Tuples

Tuples, like lists, are ordered collections of items, but they have one major difference. Tuples are **immutable**, meaning that we cannot change, add, or remove items after they are created. We can create a `tuple` by placing comma-separated values inside `()`.

In [8]:
gene_tuple = ('EGFR', 'KRAS', 'MYC', 'RB', 'TP53')
print(gene_tuple)

('EGFR', 'KRAS', 'MYC', 'RB', 'TP53')


Similarly to lists, we can use indexing and slicing to access items:

In [9]:
# Access the third item
print(gene_tuple[2])

# Access the last two items
print(gene_tuple[-2:])

MYC
('RB', 'TP53')


Can we replace *KRAS* with *NRAS* in our `gene_tuple`?

In [10]:
gene_tuple[1] = 'NRAS'

TypeError: 'tuple' object does not support item assignment

This raises an error because tuples are **immutable**.
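
If we really do need a version with *NRAS*, one possible workaround is to build a new tuple rather than modify the existing one. The sketch below is only one way to do it; the names `gene_list_copy` and `new_gene_tuple` are made up for this illustration.

```python
gene_tuple = ('EGFR', 'KRAS', 'MYC', 'RB', 'TP53')

# Convert to a list (mutable), make the change, then convert back to a tuple
gene_list_copy = list(gene_tuple)
gene_list_copy[1] = 'NRAS'
new_gene_tuple = tuple(gene_list_copy)

print(new_gene_tuple)
print(gene_tuple)  # the original tuple is unchanged
```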

#### Sets

A Python `set` is an unordered collection of unique items. 

The important properties of sets are:

- Items stored in a `set` aren’t kept in any particular order (i.e. sets are **unordered**).
- Items in a `set` are unique – duplicate items are not allowed.
- Because sets are **unordered**, `set` items cannot be referred to by an index.
- Sets are changeable (i.e. **mutable**).

Sets can be created by placing comma-separated values inside `{}`.
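
As a quick illustration of the properties above, here is a small sketch using made-up gene names. It also shows one detail worth knowing: empty braces `{}` create an empty `dict`, not an empty `set`, so `set()` is used for that.

```python
# Duplicates are dropped when the set is created
genes = {'BRCA1', 'HER2', 'BRCA1'}
print(len(genes))

# Sets are mutable: add() inserts a new item
genes.add('ERCC1')
print(len(genes))

# Adding an item that is already present changes nothing
genes.add('HER2')
print(len(genes))

# {} is an empty dict; use set() for an empty set
empty_set = set()
print(type(empty_set))
print(type({}))
```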

Sets are commonly used for computing mathematical operations such as union, intersection, difference, and symmetric difference.

![set operations](https://www.learnbyexample.org/wp-content/uploads/python/Python-Set-Operatioons.png)

Suppose we have tissue data from two patients, with some genes being upregulated in tumor tissues compared to normal tissues. We would like to know if there are any shared upregulated genes between the patients.

In [11]:
patient1 = {'ABCC1', 'BRCA1', 'BRCA2', 'HER2'}
patient2 = {'BRCA1', 'HER2', 'ERCC1'}

In [12]:
# Intersection of the two sets using the intersection method
print(patient1.intersection(patient2))

# Equivalently, we can use the & operator
print(patient1 & patient2)

{'HER2', 'BRCA1'}
{'HER2', 'BRCA1'}
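
The other set operations mentioned above (union, difference, and symmetric difference) follow the same pattern. The sketch below reuses the two patient sets; the result variable names are chosen just for this example, and `sorted()` is used only so the printed order is predictable.

```python
patient1 = {'ABCC1', 'BRCA1', 'BRCA2', 'HER2'}
patient2 = {'BRCA1', 'HER2', 'ERCC1'}

# Union: genes upregulated in at least one patient
all_genes = patient1 | patient2          # same as patient1.union(patient2)
print(sorted(all_genes))

# Difference: genes upregulated in patient 1 but not in patient 2
only_patient1 = patient1 - patient2      # same as patient1.difference(patient2)
print(sorted(only_patient1))

# Symmetric difference: genes upregulated in exactly one of the two patients
not_shared = patient1 ^ patient2         # same as patient1.symmetric_difference(patient2)
print(sorted(not_shared))
```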


#### Dictionaries

A dictionary (`dict`) is like an address book, where you can find the address of a person by knowing only their name. That is, we associate keys (names) with values (addresses). Note that the key must be unique: you cannot find the correct address if two people have the exact same name. Also, you can use only immutable objects (like strings) for the keys of a dictionary, but you can use either immutable or mutable objects for the values. Note, dictionaries themselves are considered mutable.

Pairs of keys and values are specified in a `dict` by using the notation `{key1: value1, key2: value2}`. Notice that each key is separated from its value by a colon, the pairs themselves are separated by commas, and all of this is enclosed in a pair of curly braces `{}`.

Suppose we wanted to keep track of the ages of patients in a clinical trial:

In [13]:
ages_dict = {'Karen P.': 53, 'Jessica M.': 47, 'David G.': 45, 'Susan K.': 57, 'Eric O.': 50}
print(ages_dict)

{'Karen P.': 53, 'Jessica M.': 47, 'David G.': 45, 'Susan K.': 57, 'Eric O.': 50}


We can access a person's age (value) using their name (key). Let's find out Eric O.'s age:

In [14]:
ages_dict['Eric O.']

50

Suppose a new patient is enrolled in the clinical trial. Her name is Hannah H., and her age is 39. We can add a new item to the dictionary:

In [15]:
ages_dict['Hannah H.'] = 39
print(ages_dict)

{'Karen P.': 53, 'Jessica M.': 47, 'David G.': 45, 'Susan K.': 57, 'Eric O.': 50, 'Hannah H.': 39}
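
A few other dictionary operations are often useful alongside lookup and insertion. The sketch below uses a shortened, standalone copy of the ages dictionary; the variable `unknown_age` is introduced only for this illustration.

```python
ages_dict = {'Karen P.': 53, 'Jessica M.': 47, 'David G.': 45}

# Check whether a key exists before looking it up
print('David G.' in ages_dict)
print('Hannah H.' in ages_dict)

# get() returns a default value instead of raising a KeyError for a missing key
unknown_age = ages_dict.get('Hannah H.', None)
print(unknown_age)

# Assigning to an existing key updates its value (dictionaries are mutable)
ages_dict['Karen P.'] = 54

# del removes a key-value pair
del ages_dict['David G.']

print(ages_dict)
```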


### In-Class Exercises: Part 1 <a id='exercises1'></a>

The following exercises will help you better understand data structures.

1. Make a list named `num_list` that contains the following numbers: 1, 4, 25, 7, 9, 12, 15, 16, and 21.

In [16]:
num_list = [1, 4, 25, 7, 9, 12, 15, 16, 21]

2. Find the following information about `num_list`: length, minimum value, and maximum value. (You may need to search to find functions that can help you.)

In [17]:
# Length
len(num_list)

9

In [18]:
# Minimum value
min(num_list)

1

In [19]:
# Maximum value
max(num_list)

25

3. Make a dictionary named `med_dict` that contains the price of five medications.
>* Lisinopril: $23.07
>* Gabapentin: $86.27
>* Sildenafil: $169.94 
>* Amoxicillin: $17.76
>* Prednisone: $13.81

In [20]:
med_dict = {'Lisinopril': 23.07, 'Gabapentin': 86.27, 'Sildenafil': 169.94, 'Amoxicillin': 17.76, 'Prednisone': 13.81}

4. Use `med_dict` to calculate how much it will cost if a patient is treated with both Lisinopril and Prednisone.

In [21]:
med_dict['Lisinopril'] + med_dict['Prednisone']

36.88

### Controlling Flow with `if`/`elif`/`else` Statements and `for`/`while` Loops <a id='control'></a>

#### `if`/`elif`/`else` Statements

Control flow is required when we want to execute a piece of code only if a certain condition is satisfied.

The `if`/`elif`/`else` statements are used in Python for this decision making. We can use these statements to execute a block of code only when specific conditions are true. Note that `elif` is an abbreviation for "else if". These statements use the following syntax:

```python
if condition1:
    statement1
elif condition2:
    statement2
else:
    statement3
```

![ifelse syntax](https://www.learnbyexample.org/wp-content/uploads/python/Python-elif-Statement-Syntax.png)

Let's think about a clinical trial that is trying to find an appropriate dose. We first treat three patients with dose $x$. If no patients show toxic side effects, we'll increase the dose. If only one patient shows toxicity, we'll treat another three patients to learn more. If more than one patient shows toxicity, we'll stop at that dose. Let's write these conditions in Python. You can change the value of `n_toxic` to see the different behaviors.

In [22]:
n_toxic = 2
if n_toxic == 0:
    print("Increase the dose.")
elif n_toxic == 1:
    print("Treat another three patients.")
else:
    print("Stop!")

Stop!


#### Loops

There are two types of loops in Python: `while` loops and `for` loops. Loops are useful when we want to perform the same task repetitively.

##### `while` Loop

A `while` loop repeats a task for as long as a condition remains true, and stops as soon as the condition becomes false. For instance, suppose we want to keep enrolling new patients into a clinical trial until we have 30 patients in total.

In [23]:
n_patients = 1
while n_patients <= 30:
    print("Number of enrolled patients:", n_patients)
    n_patients += 1

Number of enrolled patients: 1
Number of enrolled patients: 2
Number of enrolled patients: 3
Number of enrolled patients: 4
Number of enrolled patients: 5
Number of enrolled patients: 6
Number of enrolled patients: 7
Number of enrolled patients: 8
Number of enrolled patients: 9
Number of enrolled patients: 10
Number of enrolled patients: 11
Number of enrolled patients: 12
Number of enrolled patients: 13
Number of enrolled patients: 14
Number of enrolled patients: 15
Number of enrolled patients: 16
Number of enrolled patients: 17
Number of enrolled patients: 18
Number of enrolled patients: 19
Number of enrolled patients: 20
Number of enrolled patients: 21
Number of enrolled patients: 22
Number of enrolled patients: 23
Number of enrolled patients: 24
Number of enrolled patients: 25
Number of enrolled patients: 26
Number of enrolled patients: 27
Number of enrolled patients: 28
Number of enrolled patients: 29
Number of enrolled patients: 30


What do you think will happen if we use `while True`?
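
Since the condition `True` never becomes false, such a loop runs forever unless something inside it stops it. One common way to stop it is the `break` statement, which is not covered elsewhere in this lesson and is shown here only as a minimal sketch. The variable `max_patients` is a name introduced for this example.

```python
n_patients = 0
max_patients = 30  # enrollment target, matching the example above

while True:
    n_patients += 1
    # Without this check and break, the loop would never end
    if n_patients >= max_patients:
        break

print("Number of enrolled patients:", n_patients)
```

If you ever start an endless loop by accident in a notebook, you can interrupt the running cell to stop it.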

##### `for` Loop

`for` loops are used for iterating over a sequence of objects, i.e. going through each item in a sequence. Objects that can be used in `for` loops include lists, tuples, dictionaries, sets, and strings. `for` loops have the following syntax:

```python
for var in iterable:
    statement
```


Let's look at an example of a `for` loop:

In [24]:
for i in [0, 1, 2, 3, 4, 5]:
    print(i)

0
1
2
3
4
5


We can also use the `range()` function to do the same thing. The `range()` function can take three parameters: starting value (default is 0), ending value, and step size (default is 1). Like slicing, the start value is inclusive but the end value is exclusive. Note that when an argument or parameter has a default value and you want to use that value, you don't need to explicitly include it.

In [25]:
for i in range(0, 6):
    print(i)

0
1
2
3
4
5


In [26]:
# The starting value can be omitted since there is a default value of 0.
for i in range(6):
    print(i)

0
1
2
3
4
5
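
Two further sketches round out this section: using the step size parameter of `range()`, and looping over iterables other than lists. The names `evens`, `countdown`, `letters`, and `ages` are made up for this illustration.

```python
# range(start, stop, step): every second number from 0 up to (not including) 10
evens = []
for i in range(0, 10, 2):
    evens.append(i)
print(evens)

# A negative step counts down; the stop value is still exclusive
countdown = []
for i in range(5, 0, -1):
    countdown.append(i)
print(countdown)

# Looping over a string visits one character at a time
letters = []
for nuc in 'GATC':
    letters.append(nuc)
print(letters)

# Looping over a dictionary visits its keys
ages = {'Karen P.': 53, 'Jessica M.': 47}
for name in ages:
    print(name, ages[name])
```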


### In-Class Exercises: Part 2 <a id='exercises2'></a>

Before starting the exercises, let's learn a handy operator, `+=`:

In [27]:
a = 0
a = a + 1
print(a)

1


is equivalent to 

In [28]:
a = 0
a += 1
print(a)

1


Note that similarly `-=`, `*=`, and `/=` exist!
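
As a short sketch of these other operators (the starting value 10 is arbitrary):

```python
a = 10
a -= 4   # same as a = a - 4
print(a)

a *= 3   # same as a = a * 3
print(a)

a /= 2   # same as a = a / 2; division with / always produces a float
print(a)
```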

1. One of the most exciting parts about control flow structures is that they can be *nested* to build more complex algorithms. Let's try the complementary oligo sequence task that we discussed in Lesson 1. Solve this problem using `for` and `if`. For an added challenge, try making your answer even shorter with a `for` loop and a dictionary.

As a reminder, we want two new variables, `comp_oligo1` and `comp_oligo2`, that are the complementary DNA sequences of `oligo1` and `oligo2`. (Hint: A <-> T and G <-> C)

In [29]:
# From Lesson 1
oligo1 = 'GCGCTCAAT'
oligo2 = 'TACTAGGCA'

In [30]:
# Using for and if
comp_oligo1 = ''
for nuc in oligo1:
    if nuc == 'A':
        comp_oligo1 += 'T'
    elif nuc == 'T':
        comp_oligo1 += 'A'
    elif nuc == 'C':
        comp_oligo1 += 'G'
    else:
        comp_oligo1 += 'C'
print(comp_oligo1)

CGCGAGTTA


In [31]:
# Using for and dict
comp_oligo1 = ''
comp_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
for nuc in oligo1:
    comp_oligo1 += comp_dict[nuc]
print(comp_oligo1)

CGCGAGTTA
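
The solutions above only build `comp_oligo1`. The same dictionary approach can be applied to `oligo2`, as in this sketch:

```python
oligo2 = 'TACTAGGCA'
comp_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# Build the complement one nucleotide at a time
comp_oligo2 = ''
for nuc in oligo2:
    comp_oligo2 += comp_dict[nuc]
print(comp_oligo2)
```

One difference between the two approaches is worth noting: the `if`/`elif`/`else` version treats any character other than A, T, or C as G, while the dictionary version raises a `KeyError` on an unexpected character.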


2. Let's go back to the dose-finding clinical trial. We have a list of doses that we want to test: `dose_list = [1, 2, 3, 5, 8, 13]`. We want to increase the dose until we hit the maximum tolerated dose (MTD). For simplicity, we will increase the dose when fewer than two of the three patients have toxicity and stop otherwise. The last dose before at least two patients have toxicity will be declared the MTD. Suppose we already know how many patients will have toxicity at each dose: `tox_list = [0, 0, 1, 1, 2, 2]`. Try finding the MTD using a `while` loop.

In [32]:
dose_list = [1, 2, 3, 5, 8, 13]
tox_list = [0, 0, 1, 1, 2, 2]

In [33]:
# Using while loop
i = 0
while tox_list[i] < 2:
    i += 1
mtd = dose_list[i - 1]
print(mtd)

5
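
The solution above works for this `tox_list`, but it relies on two things: some dose eventually reaches two toxicities (otherwise the index runs past the end of the list), and the first dose is tolerated (otherwise `i - 1` is `-1`, which selects the last dose). A more defensive variant is sketched below. How to handle these edge cases is an assumption made for this sketch, not part of the exercise: the MTD is set to `None` when even the first dose is too toxic, and to the highest tested dose when no dose is too toxic.

```python
dose_list = [1, 2, 3, 5, 8, 13]
tox_list = [0, 0, 1, 1, 2, 2]

i = 0
# Stop at the first dose with two or more toxicities, or at the end of the list
while i < len(tox_list) and tox_list[i] < 2:
    i += 1

if i == 0:
    # Assumption: even the lowest dose was too toxic, so no MTD is declared
    mtd = None
else:
    # Last dose that was tolerated (the highest dose if none was too toxic)
    mtd = dose_list[i - 1]

print(mtd)
```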
