In [None]:
# some global settings
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Data Structures
Data structures are the most important element to learn in a programming language. This is where I personally struggled the most when learning Python and R. So, pay close attention here.

What does data structure mean? Data structure refers to the special types of objects meant to contain data organized in a certain way. That's not a very helpful definition. So let's look at some concrete data structures and their examples.

### Lists
A list is an _ordered collection of values_, where each value can appear multiple times. In Python, a list is defined by surrounding comma-separated values with a pair of **square brackets**.

```python
new_list = [] # create an empty list
my_list = [3, 2.44, "green", True] # specify values
# notice how you can mix different data types: here an integer, a float, a string and a boolean
a = list("0123456789") # given a string, the `list()` function splits it into single characters
print(a) # check for yourself what the `list()` function did
```

In [None]:
# test here

You can retrieve specific elements of a list, or assign new values to them, using their index.
```python
my_list[1] # think for a second what this should give you
my_list[0] # how about this?
my_list[2] = "red" # what does this command do?
my_list[4] # what do you get with this? why?
```

In [None]:
# test here

We can retrieve more than one element at a time using the colon (:) operator, also known as slicing.
```python
my_list # this will print the entire list
my_list[0:1] # elements 0 to 1 (non-inclusive)
             # what can you deduce from the output of this command?
```
Now, can you write a command to retrieve the first three elements of my_list?

In [None]:
# test here

Sometimes you will see the colon operator with no number on one or both sides.
```python
my_list[:] # what does this do?
my_list[:3] # how about this?
```

In [None]:
# test here

You can even use negative numbers to index from the end. Try this:
```python
my_list[-2]
```
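
As a recap of the indexing and slicing rules above, here is a small sketch. The list `letters` is a made-up example and not part of the exercises. A slice `start:stop` includes `start` and excludes `stop`, and negative indices count from the end.

```python
# `letters` is a hypothetical example list used only for illustration
letters = ["a", "b", "c", "d", "e"]

first_two = letters[0:2]   # indices 0 and 1; index 2 is excluded
from_two = letters[2:]     # index 2 through the end
last_item = letters[-1]    # negative indices count from the end
last_two = letters[-2:]    # the final two elements

print(first_two)
print(from_two)
print(last_item)
print(last_two)
```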

In [None]:
# test here

Lists come with some useful methods. Type them into a code cell and try them out.
1. append
    ```python
    my_list.append(25)
    print(my_list)
    ```
2. copy
    ```python
    new_list = my_list.copy()
    new_list
    ```
3. clear: clear out all elements in a list!
    ```python
    my_list.clear()
    my_list
    ```
4. count
    ```python
    seq = list("TKAAVVNFT")
    seq.count("V")
    ```

In [None]:
# test here

5. index: return the index corresponding to the first occurrence of an element
    ```python
    seq.index("V")
    ```
6. pop: remove the last element of the list and return it (hence "pop")
    ```python
    seq2 = seq.pop()
    seq
    seq2
    ```
7. sort: sort the elements in place (works for a list of numbers or a list of strings; mixing types that cannot be compared, such as numbers and strings, raises a `TypeError` in Python 3)
    ```python
    a = [1, 5, 2, 42, 14, 132]
    a.sort()
    a
    ```
8. reverse
    ```python
    a.reverse()
    a
    ```
9. del: delete an element or a slice of elements from a list (note that `del` is a statement, not a list method)
    ```python
    del a[2:3]
    a
    ```
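
One reason `copy` is worth knowing: plain assignment does not duplicate a list, it only gives the same list a second name. A minimal sketch, using made-up variable names (`original`, `alias`, `duplicate`):

```python
original = [1, 2, 3]
alias = original             # both names refer to the same list object
duplicate = original.copy()  # a new list holding the same elements

original.append(4)

print(alias)      # follows the change, because it is the same list
print(duplicate)  # unaffected by the append
```

Note that `copy` is shallow: if the list contains other lists, those inner lists are still shared between the original and the copy.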


In [None]:
# test here

### Dictionaries
A dictionary is a collection in which the elements are indexed by _keys_ rather than by position. The principle is the same as in an actual dictionary, where definitions are indexed by words. So when do we use dictionaries in Python? They are useful when the values _do not_ have a natural order and are best looked up by name. In Python, dictionaries are defined by writing key:value pairs separated by commas and surrounding them with **curly brackets**.

In [None]:
# create an empty dictionary
my_dict = {}

# dictionaries can contain many types of data
my_dict = {"a": "test", "b": 3.14, "c": [1,2,3,4]}
my_dict

In [None]:
# some more useful examples
GenomeSize = {"Homo sapiens": 3200.0, "Escherichia coli": 4.6, "Arabidopsis thaliana": 157.0}
# a dictionary is accessed by key, not by position
# (since Python 3.7 it remembers insertion order, but you still cannot index it by number)
GenomeSize

In [None]:
# call a specific key (there is no numbering)
GenomeSize["Arabidopsis thaliana"]

In [None]:
# add a new value using a key not already present
GenomeSize["Saccharomyces cerevisiae"] = 12.1
GenomeSize

In [None]:
# nothing changes if you assign the same value to a key that already exists
GenomeSize["Escherichia coli"] = 4.6
GenomeSize

In [None]:
# can you guess what would happen with the following command?
GenomeSize["Homo sapiens"] = 3201.1
GenomeSize

_Questions_
1. What happens if you assign a key:value pair that already exists in a dictionary?
2. How about when only the key is already present in the dictionary, but with a different value?
3. Why do dictionaries behave this way?

Some methods for dictionaries
1. copy

In [None]:
GS = GenomeSize.copy()

2. clear: you can guess it, try yourself

In [None]:
GenomeSize.clear()

3. get: get the value from a key. If the key is not present, return a default value (why?)

In [None]:
GS.get('Homo sapiens') # when the key exists, this is the same as GS['Homo sapiens']

In [None]:
# however, the get method is especially useful when the key doesn't exist; compare it with plain indexing
GS['Mus musculus']

In [None]:
GS.get("Mus musculus", -10)

4. keys

In [None]:
GS.keys()

5. values

In [None]:
GS.values()

6. pop(KEY): remove the specified key from the dictionary and return the corresponding value

In [None]:
GS.pop("Homo sapiens")
GS

7. update: this can be used to join two dictionaries

In [None]:
D1 = {"a":1, "b":2, "c":3}
D2 = {"a":2, "d":4, "e":5}
D1.update(D2)
D1

_Question_: can you figure out how update() works?
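
Coming back to `get`: a common use of its default value is counting. The sketch below tallies the letters of the sequence used earlier in the list examples. The names `sequence` and `counts` are illustrative assumptions, not part of the exercises.

```python
sequence = "TKAAVVNFT"
counts = {}
for letter in sequence:
    # get returns 0 when the letter has not been seen yet
    counts[letter] = counts.get(letter, 0) + 1

print(counts)
```

Without the default, the first occurrence of each letter would raise a `KeyError`, so the loop would need an explicit check for missing keys.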

### Tuples
A tuple contains a sequence of values of any type. What makes it different from a list is that it is _immutable_, meaning that once created, it cannot be changed. This is useful for defining, for example, constants like pi, so that you don't accidentally change them. A tuple is a special type of "variable" that is designed to _not_ change. Tuples are created by surrounding comma-separated values with **round brackets**.

Try the following to figure out how tuples work
```python
my_tuple = (1, "two", 3)
my_tuple[0]
my_tuple[0] = 33
tt = (1,1,1,1,2,2,4)
tt.count(1)
tt.index(2)
```

In [None]:
# test here

A special use of tuples is as keys in a dictionary.

In [None]:
D3 = {("trial", 62): 4829}
D3.keys()
D3.get(("trial",62),-10)
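
Tuples can serve as dictionary keys because they are immutable (and therefore hashable, as long as their elements are); a list cannot. A minimal sketch, in which the dictionary name and all values are made up for illustration:

```python
# hypothetical measurements indexed by (sample, replicate)
readings = {("control", 1): 0.52, ("control", 2): 0.49, ("treated", 1): 0.81}

print(readings[("treated", 1)])

# a list is mutable, so Python refuses to use it as a key
try:
    readings[["treated", 2]] = 0.78
except TypeError as err:
    print("TypeError:", err)
```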

### Sets
Sets are unordered collections with _no duplicate entries_. They come with special operators for union, intersection and difference. There are two ways to initialize a set: either you can use **curly brackets** around comma-separated values (making it very similar to defining a dictionary), or you can call the function `set` on a list, thereby removing duplicate values. For example:

In [None]:
# create a list
a = [5, 6, 7, 7, 7, 8, 9, 9]
# use the set function on the list
b = set(a)
# duplicate values have been removed
b

In [None]:
c = {3, 4, 5, 6}
# intersection
b & c

In [None]:
# union
b | c

In [None]:
# symmetric difference: in b but not in c, or in c but not in b
b ^ c

Above, we have used the operators &, |, ^ for the intersection, union and symmetric difference of two sets, respectively. These operations are also available as built-in methods. Furthermore, you can test whether a set is a subset (or superset) of another:

In [None]:
s1 = {1, 2, 3, 4}
s2 = {4, 5, 6}
s1.intersection(s2)
s1.union(s2)

In [None]:
# difference
s1.symmetric_difference(s2)
s1.difference(s2)
# can you figure out the difference between these two "difference" functions?

In [None]:
# subset
s1.issubset(s2)
s1.issubset(s1.union(s2))

One last note: it may be confusing to find that calling `a = {}` creates an empty dictionary, and not an empty set! To initialize an empty set, use `a = set()`.
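
You can check this yourself with `type`. A minimal sketch, with variable names chosen only for illustration:

```python
empty_dict = {}     # curly brackets alone give a dictionary
empty_set = set()   # calling set with no arguments gives an empty set

print(type(empty_dict))
print(type(empty_set))
```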