# Collections


In the previous Jupyter Notebook we had a look at variables. Often, however, we need a convenient way to represent a collection of data (think of a series of proteins, a list of all the students, the results of a football league table, etc.). Python offers several very handy built-in types to deal with this. In the following we will learn about *lists*, *tuples* and *dictionaries*.

In [None]:
# run this cell to show a video, use slider to resize it, type Esc-o to hide it
from IPython.display import Video, clear_output; from ipywidgets import interactive, IntSlider
def _play(resize): display(Video(filename="media/ECS780P_Collections_topicSummary.mp4",data="",width=resize))
interactive(_play, resize=IntSlider(min=150, max=900, step=50, value=600, continuous_update=False, readout=False))

# Lists

Lists are the "workhorse" data type in Python, so it is important to understand them well. A list is essentially an ordered collection of items: 

In [None]:
firstList = [1, 2, 3, 4, 5]
print(firstList)

Since each value in Python "knows" its type, there is no danger in putting together different types of values:

In [None]:
another = ["Me", "You", "Them"]
mixAndMatch = ["One", 2, "Three", 4.0]
emptyList = []
print(another, mixAndMatch, emptyList)

### Indexing

You can access individual elements of the list by *indexing*. This is done with the **[ ]** operator:

In [None]:
measures = ['tsp', 'tbsp', 'c', 'lb']
print("Length ", len(measures))
print(measures[0]) # the first element
print(measures[3]) # the last element

Incidentally, the **[ ]** operator also works for strings:

In [None]:
'tbsp'[1]

so that this is valid code:

In [None]:
measures = ['tsp', 'tbsp', 'c', 'lb']
print(measures[0][2]) # third character of the first element of the list

A few other handy tricks:

In [None]:
lst = ['a', 'b', 'c', 'd']
print(lst[-1])  # last element
print(lst[-2])  # second last
print(lst[1:3]) # from element 1 included to 3 excluded
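
Slices are more flexible than the example above suggests: either bound can be omitted, and an optional third number sets the step. The following is a small sketch; the list name `letters` is just an illustrative choice.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = letters[:3]    # start omitted: begins at element 0
from_third = letters[3:]     # stop omitted: runs to the end of the list
every_other = letters[::2]   # step of 2: elements 0, 2 and 4
last_two = letters[-2:]      # negative indices count from the end

print(first_three)
print(from_third)
print(every_other)
print(last_two)
```

A slice always produces a new list, so `letters` itself is left untouched.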

You can, of course, use indexing to modify lists:

In [None]:
lst = ['a', 'b', 'c', 'd', 'e', 'f']
lst[-1] = 'z'
lst[0] = lst[1]
lst[2:4] = ['Wah','Wah']
print(lst)

### Operations on lists

The built-in **dir** function gives you a handy way to list the attributes and operations (i.e. methods) defined for an object. Disregard the entries beginning and ending with ```__```: these are special methods that Python uses internally. Let's try this on a list:

In [None]:
dir([])

**append** and **pop** attach and remove an element from the "tail" of the list, respectively. They can be used to implement a LIFO (last-in first-out) queue, also known as a *stack*.

In [None]:
queue = ['Last', 'In', 'First']
queue.append('Out')
print(queue)
print(queue.pop())
print(queue)
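
As a side note, **pop** also accepts an optional index, so it is not restricted to the tail of the list. A minimal sketch, using a hypothetical `tasks` list:

```python
tasks = ['wash', 'dry', 'fold']

last = tasks.pop()     # no argument: removes and returns the last element
first = tasks.pop(0)   # index 0: removes and returns the first element

print(last)
print(first)
print(tasks)           # only the middle element is left
```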

To append all the elements of another list, use **extend** (this modifies the list in place):

In [None]:
queue.extend(['Second', 'Third'])
print(queue)

Other operations you can use are:

In [None]:
queue.remove('In')
print(queue)

In [None]:
queue.reverse()
print(queue)

In [None]:
queue.sort()
print(queue)

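Note that strings are sorted in *alphabetical order*, with one caveat: the comparison is case-sensitive, so all uppercase letters come before lowercase ones. More details on sorting are available [here](https://wiki.python.org/moin/HowTo/Sorting).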
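
To see how letter case affects the result, here is a small sketch with a hypothetical `words` list. It uses the built-in `sorted` function, which returns a new list instead of sorting in place, and the `key` argument to compare lowercased versions of the strings.

```python
words = ['banana', 'Cherry', 'apple']

default_order = sorted(words)               # uppercase letters sort before lowercase
ignore_case = sorted(words, key=str.lower)  # compare lowercased versions instead

print(default_order)
print(ignore_case)
print(words)  # sorted() leaves the original list unchanged
```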

In [None]:
# run this cell to show a video, use slider to resize it, type Esc-o to hide it
from IPython.display import Video, clear_output; from ipywidgets import interactive, IntSlider
def _play(resize): display(Video(filename="media/ECS780P_Collections_Lists_p1of2.mp4",data="",width=resize))
interactive(_play, resize=IntSlider(min=150, max=900, step=50, value=600, continuous_update=False, readout=False))

### List comprehensions

There is a handy way of defining lists in Python, starting from other lists. This is similar to what is done with sets in mathematics. Consider the following set: $A=\{1,2,3,4,5\}$. You can define $B=\{3x | x\in A\}$ (read 3 times x for x in A), which explicitly means $B=\{3,6,9,12,15\}$. The same is possible with Python lists:

In [None]:
A = list(range(1, 6))  # in Python 3 range is lazy; list() makes the elements visible when printed
print(A)
B = [3*x for x in A]
print(B)

We can also use *conditionals* in comprehensions to pick elements that satisfy a particular property (we will cover *conditionals* in detail, later on):

In [None]:
even = [x for x in A if x%2==0]
odd = [x for x in A if x not in even]
print(even)
print(odd)

This can be used to operate on all elements of a list:

In [None]:
netTerms = ['aRPANET','aSCII','aSIC','aSP']
netTerms = [x.replace('a','A') for x in netTerms]
print(netTerms)
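
Filtering and transforming can also be combined in a single comprehension. A minimal sketch, with the illustrative names `numbers` and `even_squares`:

```python
numbers = list(range(1, 11))  # 1 to 10

# keep only the even numbers, then square each of them
even_squares = [x * x for x in numbers if x % 2 == 0]

print(even_squares)
```

The condition after `if` is applied first; the expression before `for` is evaluated only for the elements that pass it.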

### A common pitfall

Be warned that the name of a list is just a reference to the memory area where the list elements are stored. Thus, assigning a list to a second name does not create another list: both names refer to the very same object. This makes assignment very efficient, but it can lead to some surprises:

In [None]:
a = ['We', 'Are', 'All', 'Unique']
b = a
print(b)
b.insert(2,'Not')
print(b)
print(a) # Oops! a has been modified too

*If you do want to copy all elements one by one*, you can use slicing (**a[:]**) to build a new list containing all the elements of list **a** and assign the resulting list to **b**:

In [None]:
a = ['This', 'Is', 'All']
b = a[:] # this makes all the difference
b[1] = "Isn't" # have to use double quotes since string contains a single quote
print(a)
print(b)
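
The `is` operator tells you whether two names refer to the same object, which makes the difference between an alias and a copy easy to check. The sketch below also shows a caveat: a slice copy is *shallow*, so lists nested inside the list are still shared. `copy.deepcopy` from the standard library is one way around that; the variable names are illustrative.

```python
import copy

a = ['This', 'Is', 'All']
alias = a         # a second name for the same list
duplicate = a[:]  # a new list with the same elements

print(alias is a)       # True: same object
print(duplicate is a)   # False: different object
print(duplicate == a)   # True: equal contents

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, same inner lists
deep = copy.deepcopy(nested)   # inner lists are copied as well

shallow[0].append(99)          # also visible through nested
print(nested)
print(deep)
```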

In [None]:
# run this cell to show a video, use slider to resize it, type Esc-o to hide it
from IPython.display import Video, clear_output; from ipywidgets import interactive, IntSlider
def _play(resize): display(Video(filename="media/ECS780P_Collections_Lists_p2of2.mp4",data="",width=resize))
interactive(_play, resize=IntSlider(min=150, max=900, step=50, value=600, continuous_update=False, readout=False))

# Tuples

Tuples are **immutable** lists. They are more efficient than lists and can be used as keys for dictionaries (see below); with the obvious modifications, they can be used like lists.

In [None]:
fileExts = ('.txt','.csv','.pdf','.ppt') # we are unlikely to change this
print(fileExts[2])
print(fileExts.index('.ppt'))
fileExts[0] = 'Spam' # hmmm... this raises a TypeError

Tuples do not support item assignment, hence the error we see above. Notice also that the methods that change elements are missing:

In [None]:
dir(())

This other example is slightly tricky:

In [None]:
a = (1,2,3)
b = a
a = a+(4,5)
print(a)
print(b)

In this example we are not changing the **tuple**: we are actually creating a new tuple (1,2,3,4,5) and assigning it to **a** instead of the old one, while **b** still refers to the original (1,2,3).
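
One way to convince yourself of this is to compare identities with the `is` operator before and after the concatenation. A minimal sketch; `same_before` and `same_after` are illustrative names:

```python
a = (1, 2, 3)
b = a
same_before = a is b   # both names refer to the same tuple

a = a + (4, 5)         # builds a new tuple and rebinds the name a
same_after = a is b    # a now refers to a different object

print(same_before, same_after)
print(a)
print(b)
```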

In [None]:
# run this cell to show a video, use slider to resize it, type Esc-o to hide it
from IPython.display import Video, clear_output; from ipywidgets import interactive, IntSlider
def _play(resize): display(Video(filename="media/ECS780P_Collections_Tuples.mp4",data="",width=resize))
interactive(_play, resize=IntSlider(min=150, max=900, step=50, value=600, continuous_update=False, readout=False))

# Dictionaries

You can think of lists and tuples as a series of variables indexed by an integer. **Dictionaries** are series of variables indexed by a *key*, which can be any immutable (more precisely, hashable) object; typically, but by no means always, a string:

In [None]:
aminoAcids = {'Ala':'Alanine', 'Cys': 'Cysteine', 'Pro': 'Proline', 'Leu': 'Leucine'}
print(aminoAcids['Leu'])
aminoAcids['His'] = 'Histidine' # adds a new key/value pair
print(aminoAcids) # since Python 3.7, dictionaries keep their insertion order

You can add items one by one to an empty dictionary **{ }**:

In [None]:
emptyDict = {}
print(len(emptyDict))
emptyDict['Foo'] = 'Bar'
print(len(emptyDict))
print(emptyDict)
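
Keys do not have to be strings. As mentioned in the section on tuples, a tuple can serve as a key, whereas a list cannot, because keys must be immutable. The sketch below uses a hypothetical `grid` dictionary, and a `try`/`except` block (not covered yet) so that the cell keeps running after the error.

```python
grid = {}
grid[(0, 0)] = 'origin'     # a tuple works as a key
grid[(1, 2)] = 'treasure'
print(grid[(1, 2)])

try:
    grid[[3, 4]] = 'trap'   # a list is mutable, so it is rejected as a key
except TypeError as err:
    print('Cannot use a list as a key:', err)

print(len(grid))            # the failed assignment added nothing
```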

### Keys and values

You can display the **keys** and **values** contained in a dictionary using the corresponding methods:

In [None]:
print(aminoAcids.keys())
print(aminoAcids.values())

Or you can have them all together:

In [None]:
aminoAcids.items()

Trying to look up a nonexistent key raises a **KeyError**:

In [None]:
aminoAcids['Ser']

So it may be better to check first with **in**, which evaluates to True or False (such tests are used in *conditionals*, more on this later):

In [None]:
'Ser' in aminoAcids

Or you can play it safe:

In [None]:
print(aminoAcids.get('Ala','Oops...'))
print(aminoAcids.get('Ser','Oops...'))

You can use **del** to delete items (or you can **pop** a key from the dictionary, which removes the item and returns its value):

In [None]:
del aminoAcids['Ala']
print(aminoAcids.pop('Pro'))
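
Like **get**, **pop** accepts an optional default that is returned when the key is missing, instead of raising an error. A minimal sketch with a hypothetical `stock` dictionary:

```python
stock = {'apples': 3, 'pears': 5}

removed = stock.pop('apples')        # key exists: removed, value returned
missing = stock.pop('plums', None)   # key absent: default returned, no KeyError

print(removed)
print(missing)
print(stock)
```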

We can ```update``` a dictionary with values from another:

In [None]:
more = dict(Ala='Alanine', Pro='Proline')
print(more)
aminoAcids.update(more)
print(aminoAcids)
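
Keep in mind that **update** overwrites the value of any key that is already present. A small sketch, using the illustrative dictionaries `base` and `extra`:

```python
base = {'Ala': 'alanine', 'Cys': 'Cysteine'}
extra = {'Ala': 'Alanine', 'Gly': 'Glycine'}

base.update(extra)   # 'Ala' is overwritten, 'Gly' is added, 'Cys' is untouched

print(base)
print(extra)         # the argument itself is not modified
```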

And finally, let's turn this around:

In [None]:
k = aminoAcids.keys()
v = aminoAcids.values()
l = list(zip(v,k))
print(v)
print(k)
print(l)
symbol = dict(l)
print(symbol)
print(symbol['Cysteine'])

Note that turning a dictionary "on its head" in this way is not always possible without losing data, as two or more keys can have the same value (in which case only one of them survives) - nor is it necessarily useful, but it's a good exercise nevertheless!
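
To illustrate what happens when values repeat, here is a sketch with a hypothetical `codons` dictionary in which two keys share a value. It uses a *dictionary comprehension* (the dictionary counterpart of a list comprehension) for the naive inversion, and a `for` loop with `setdefault` as one possible way to keep every key.

```python
codons = {'GCU': 'Ala', 'GCC': 'Ala', 'UGU': 'Cys'}

# naive inversion: for a repeated value, the later key overwrites the earlier one
inverted = {value: key for key, value in codons.items()}
print(inverted)

# alternative: collect all the keys that share a value in a list
grouped = {}
for codon, amino in codons.items():
    grouped.setdefault(amino, []).append(codon)
print(grouped)
```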

In [None]:
# run this cell to show a video, use slider to resize it, type Esc-o to hide it
from IPython.display import Video, clear_output; from ipywidgets import interactive, IntSlider
def _play(resize): display(Video(filename="media/ECS780P_Collections_Dictionaries.mp4",data="",width=resize))
interactive(_play, resize=IntSlider(min=150, max=900, step=50, value=600, continuous_update=False, readout=False))