# Introduction

In the last unit we looked at basic data types and at variables that mostly hold a single value, although we did note that the ```str``` data type can be thought of as a sequence of single characters. In this unit we'll look at objects that can contain more than one value. There are three basic objects of this kind in python: lists, dictionaries and tuples (rhymes with couples). These are the workhorses of almost all real python scripts and programmes, so familiarity with them is important if you want to do anything serious. In addition, understanding them will help you grasp loops, another important programming concept.

# Lists

Lists are a really important object type in python. They are containers for more than one value, can contain different object types, can be changed, can be subset (just like strings) and perhaps most importantly each value in a list can be addressed independently. So how do we set up a list? We have two available methods. We can call the ```list()``` function or we can use square bracket notation (not to be confused with sub-setting).

In [2]:
lst1 = list(('a', 1, 'list content', 45, 23, 19))
lst2 = ['fire', 'at', 'will', 15, 45, [13.2, 12.6, 'a nested list']]

print lst1
print type(lst1)
print lst2

['a', 1, 'list content', 45, 23, 19]
<type 'list'>
['fire', 'at', 'will', 15, 45, [13.2, 12.6, 'a nested list']]


A couple of things to note here. First, note the double parentheses when we set up a list using the ```list()``` function: the outer pair belongs to the function call and the inner pair groups the values into the single argument that ```list()``` expects. Secondly, it's a really bad idea to call your own list ```list```. The name ```list``` is one of python's [built-in functions](https://docs.python.org/2/library/functions.html) and is used to, eh... create lists. It is not a reserved keyword, so python will let you assign to it without complaint, but from then on the name ```list``` points at your variable and the built-in function is hidden. This is bad - so don't do it. The same applies to the other built-in names (```str```, ```dict```, ```len``` and so on). Python's true [reserved keywords](https://docs.python.org/2/reference/lexical_analysis.html#keywords) (```if```, ```for```, ```while``` and friends) are a separate group: trying to use one of those as a variable name is a syntax error.
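
To make the danger concrete, here is a minimal sketch of what happens when a variable is named `list`. The shadowing is kept inside a hypothetical function, `shadow_demo`, so that the damage stays local to the function; the function name and the sample values are made up for illustration, and `try`/`except` has not been covered in this unit.

```python
def shadow_demo():
    # Inside this function the name `list` now points at our data,
    # hiding the built-in list() function.
    list = ['a', 'b', 'c']
    try:
        # Normally this would build a list from the string...
        list('xyz')
    except TypeError as err:
        # ...but our variable is a list object, and lists cannot be called.
        return str(err)

message = shadow_demo()
print(message)

# Outside the function the built-in is untouched and still works.
letters = list('xyz')
```

The message python gives here complains that a list object is not callable, which is a good clue that a built-in name has been overwritten.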

In addition note that the variable ```lst2``` above is a list that contains another list! This list within a list is termed a **nested list** and is perfectly legal because lists can store any type of object.

## Subsetting lists

As mentioned above we can access the individual values in a ```list```. To do this we use exactly the same sub-setting notation we used when we were slicing up ```strings```.

In [4]:
print lst1[1]
print lst1[0:4]
print lst2[5]

1
['a', 1, 'list content', 45]
[13.2, 12.6, 'a nested list']
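
Because `lst2[5]` is itself a list, a second pair of square brackets reaches inside it. The short sketch below reuses `lst2` from above; the names `inner`, `first_inner`, `last_inner` and `word` are just illustrative.

```python
lst2 = ['fire', 'at', 'will', 15, 45, [13.2, 12.6, 'a nested list']]

inner = lst2[5]           # the nested list itself
first_inner = lst2[5][0]  # first value inside the nested list
last_inner = lst2[5][2]   # the string stored inside the nested list
word = lst2[5][2][2:8]    # slicing that string gives 'nested'

print(first_inner)
print(word)
```

Each pair of brackets is applied to whatever the previous pair returned, reading from left to right.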


When we subset a list and ask for several values to be returned (e.g. ```print lst1[0:4]```) we get back another list. Note also that asking for element 5 of ```lst2``` (the sixth item, because of zero indexing) returned the nested list object. One thing not mentioned in the previous introduction to subsetting (when we looked at strings) was that we can also have a third number in our square brackets that changes the 'step size' as we go through a sequence. The sub-setting syntax looks like ```[start:end:step]```. For example if we wanted every second element in a list we could subset in the following way.

In [20]:
lst = [1,2,3,4,5,6,7,8,9,10]
lst[1::2]

[2, 4, 6, 8, 10]

Let's break this down. We start at element 1 (remember that this is the second element in the list because of zero indexing), we don't specify any end point so we get everything up until the end of the list and finally we put in a step value of 2 after a second colon. So we extract the values 2, 4, 6, 8 and 10.
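
A couple of further variations on the `[start:end:step]` pattern may help it sink in. This is only a sketch; the variable names `odds` and `middle` are illustrative.

```python
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# No start and no end: begin at element 0 and take every second value.
odds = lst[::2]       # [1, 3, 5, 7, 9]

# Start at element 2, stop before element 8, step by 3 (elements 2 and 5).
middle = lst[2:8:3]   # [3, 6]

print(odds)
print(middle)
```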

## Copying lists

As well as being generally useful for extracting parts of lists the sub-setting notation is also useful for copying lists. Recall in the previous section we talked about how a variable name is a label pointing to a particular place in memory where a particular object is stored. So if we create a list, attach a variable name to it and then attach another variable name to the same list (say ```lst_copy```) we have to remember that we haven't actually copied our list - we just have two names pointing at the same object. Thus if we make changes to the copy **we will change the original as well**. Let's see an example.

In [1]:
lst = [2,'a',6,'d','twerk']
lst_copy = lst

print lst 
print lst_copy

[2, 'a', 6, 'd', 'twerk']
[2, 'a', 6, 'd', 'twerk']


In [2]:
lst_copy[4] = 'daffy' # make a change to the 'copy'

print lst_copy
print lst

[2, 'a', 6, 'd', 'daffy']
[2, 'a', 6, 'd', 'daffy']


Absolutely the only thing to take away from this example is that we "copied" a list, edited the copy (replacing 'twerk' with 'daffy') but the *original* list changed. If you want your original list to stay as it was (say you want to keep the original data) then you have to make an actual copy of your list. So how do you do that?

Well the subsetting notation comes to the rescue. When you subset a list you get a new list back so you can create a subset that's your whole original list and assign that to a new variable. Recall that in the subsetting notation we use a colon (:) to indicate a range of values we want. If we leave off the first number we get everything from element [0] to (but not including) the number after the colon; if we leave off the second number we get everything from the first number to the end of the sequence. **But** if we leave off both numbers and only subset with a colon we get everything from start to finish. This seems a little clumsy and (perhaps) opaque but it works very well.

In [3]:
lst = [2,'a',6,'d','twerk']
lst_copy = lst[:] # make an actual copy

print lst 
print lst_copy

[2, 'a', 6, 'd', 'twerk']
[2, 'a', 6, 'd', 'twerk']


In [4]:
lst_copy[4] = 'daffy' # make a change

print lst_copy
print lst

[2, 'a', 6, 'd', 'daffy']
[2, 'a', 6, 'd', 'twerk']


In this instance because we actually copied our original list the original is unchanged by edits on the copy. Whew!
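
If you ever want to check whether two names point at the same object, python's `is` operator will tell you. It has not been introduced in this unit, so treat the following as a small sketch; the names `alias` and `real_copy` are illustrative.

```python
lst = [2, 'a', 6, 'd', 'twerk']

alias = lst         # a second name for the SAME list
real_copy = lst[:]  # a new list with the same contents

same_object = alias is lst           # True: two labels, one object
different_object = real_copy is lst  # False: a separate object
equal_contents = real_copy == lst    # True: the values still match

print(same_object)
print(different_object)
print(equal_contents)
```

Here `==` compares the contents of the two lists while `is` asks whether they are literally the same object in memory.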

## Reversing a list

The double colon notation ```[::]```, combined with a negative step value, is one useful way to reverse a list.

In [21]:
lst = [1,2,3,4,5,6,7,8,9,10]
lst[::-1]

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

Here we took the whole list (no start value and no end value for the sub-setting) and we used a step value of -1 to go through the list backwards. This probably seems counter-intuitive but the more you do it the more natural it seems. We'll see a ```method``` (recall these use the dot notation) for list reversal below as well.

## Operations and methods for lists

As you can see above lists are able to contain several different types of objects including other lists. Using the ```dir``` function returns the ```methods``` we can use on lists (try it).

### Sorting a list

One common operation on lists is sorting. There are two ways of doing this. The first is to use the ```sorted()``` function. This works on any object that can be sorted. This function doesn't change the original object at all - it only shows you the original object sorted (by default into ascending order).

In [10]:
lst = [56,23,10,'a' ,'D', 'L', 'v']
sorted(lst)

[10, 23, 56, 'D', 'L', 'a', 'v']

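Python (and other programming languages) will sort text alphabetically but they regard capital letters as having lower values than lowercase letters, which is why ```'D'``` and ```'L'``` come before ```'a'```. The numbers come first because the Python 2 used in this unit orders numbers ahead of strings (Python 3 refuses to compare the two and raises a ```TypeError```). Because the ```sorted()``` function doesn't change the original list you can use it to assign a sorted list to a new variable safe in the knowledge that the original list will be unchanged.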

In [11]:
lst = [56,23,10,'a' ,'D', 'L', 'v']
lst2 = sorted(lst)

print lst
print lst2

[56, 23, 10, 'a', 'D', 'L', 'v']
[10, 23, 56, 'D', 'L', 'a', 'v']


The ```sorted``` function takes additional ```arguments``` as well as the sequence to sort. One of these is the ```reverse``` argument which by default is set to the boolean value ```False```. If you set this to ```True``` the list is sorted in descending order. We'll discuss boolean values in detail in the next unit.

In [15]:
lst = [56,23,10,'a' ,'D', 'L', 'v']
lst2 = sorted(lst, reverse=True)

print lst
print lst2

[56, 23, 10, 'a', 'D', 'L', 'v']
['v', 'a', 'L', 'D', 56, 23, 10]


The other way to sort lists (as you probably spotted if you looked at the output of the ```dir()``` function for a list) is the ```sort``` method. Because this is a method and not a function it is used via the dot notation. There are some important differences between the ```sorted()``` function and the ```sort``` method. First the ```sort``` method only works on lists. Secondly the ```sort``` method actually changes the original list and it doesn't return any value. What does this mean? Well it means you cannot use this method and assign the result to a new variable; nothing is returned so there's nothing for the new variable to point at. In the python world we say that the ```sort``` method sorts the list **in place**. For this and other reasons it's generally better to stick to the ```sorted()``` function. I know this is confusing but I'm not responsible for this!

In [8]:
lst = [56,23,10,'a' ,'D', 'L', 'v']
print lst

lst2 = lst.sort()
print lst2

[56, 23, 10, 'a', 'D', 'L', 'v']
None
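
The sketch below puts the two approaches side by side. The lists contain only numbers, and the variable names (`scores`, `returned`, `original`, `ordered`) are made up for illustration.

```python
# The sort method: changes the list in place and returns None.
scores = [56, 23, 10, 42]
returned = scores.sort()
print(scores)    # the list itself is now in ascending order
print(returned)  # None - there is nothing to assign

# The sorted function: leaves the original alone and returns a new list.
original = [56, 23, 10, 42]
ordered = sorted(original)
print(original)  # unchanged
print(ordered)   # a new, sorted list
```

So if you do use the `sort` method, call it on its own line and carry on using the original variable name.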


One other method for lists that's relevant here is the ```reverse``` method. It does what it says and reverses the list. Note that like the ```sort``` method this happens in place so the original list is changed (reversed in this case). Similar to above no value is returned so if you want to make a new list that's your old list reversed use the ```[::-1]``` syntax and assign that to a variable.

In [5]:
lst = [56,23,10,'a' ,'D', 'L', 'v']
print lst
lst.reverse() # reverses original list
print lst

lst = [56,23,10,'a' ,'D', 'L', 'v']
lst_rev = lst[::-1] # assign to a new variable
print lst_rev

[56, 23, 10, 'a', 'D', 'L', 'v']
['v', 'L', 'D', 'a', 10, 23, 56]
['v', 'L', 'D', 'a', 10, 23, 56]


### Adding and removing with lists

There are several ways to add or remove values from a list. The ```append()``` method, as you might have guessed, adds a value (element) to the end of a list. This introduces a key feature of lists - you can change the contents. Lists are said to be **mutable**.

In [7]:
lst = ['a', 'b', 'c', 1, 2, 3]
print lst
lst.append('appended data')
print lst

['a', 'b', 'c', 1, 2, 3]
['a', 'b', 'c', 1, 2, 3, 'appended data']


The ```extend()``` method takes another list as an argument and sticks the elements of that list to the end of the target list. Note that these elements get put in as values and not as a nested list.

In [8]:
lst2 = ['orange', 'mango', 'banana']
lst.extend(lst2)
print lst

['a', 'b', 'c', 1, 2, 3, 'appended data', 'orange', 'mango', 'banana']
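
The difference between `append()` and `extend()` is easiest to see when both are given a list. A minimal sketch, with illustrative variable names:

```python
appended = ['a', 'b']
appended.append(['c', 'd'])  # the whole list goes in as ONE nested element
print(appended)              # ['a', 'b', ['c', 'd']]

extended = ['a', 'b']
extended.extend(['c', 'd'])  # each element is added separately
print(extended)              # ['a', 'b', 'c', 'd']

print(len(appended))         # 3
print(len(extended))         # 4
```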


The ```pop()``` method can take the index of a list member (i.e. the position in the list where that value is) and remove and return the value. This means the list no longer contains the value but you get the value back and you can assign it to a variable and do something with it. If you don't give ```pop()``` an index it removes and returns the last element in the target list.

In [9]:
print lst # where we started
fruit = lst.pop()
number = lst.pop(5)
print fruit
print number
print lst # where we ended

['a', 'b', 'c', 1, 2, 3, 'appended data', 'orange', 'mango', 'banana']
banana
3
['a', 'b', 'c', 1, 2, 'appended data', 'orange', 'mango']


Note here how the target list (```lst```) has lost the elements ```banana``` and ```3```. They've been ```pop```ped off!

What about times when you know an element is in the list but you don't know the index? You can use the ```remove()``` method for that. Note that unlike ```pop()``` this does not return the value so you cannot assign the removed value to a variable.

In [28]:
lst.remove('appended data')
print lst

['a', 'b', 'c', 1, 2, 'orange', 'mango']
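
Two details of `remove()` are worth seeing in action: it only removes the first matching value, and it raises a `ValueError` if the value isn't in the list at all. The sketch below uses a made-up list of fruit, and `try`/`except` (not yet covered in this unit) to catch the error.

```python
fruits = ['orange', 'mango', 'orange']

fruits.remove('orange')  # only the FIRST 'orange' is removed
print(fruits)            # ['mango', 'orange']

try:
    fruits.remove('kiwi')  # 'kiwi' was never in the list
    kiwi_missing = False
except ValueError:
    kiwi_missing = True

print(kiwi_missing)
```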


You can also find the index of an element using the ```index()``` method. You hopefully used this in the homework from the last unit.

In [10]:
lst.index('orange')

5

Finally you might want to remove a bunch of things from a list and you can do this with the ```del``` statement and a ```slice``` of your list. Below we use ```del``` to remove elements 0, 1, 2 & 3 from the variable ```lst```.

In [11]:
print lst
del lst[:4]
print lst

['a', 'b', 'c', 1, 2, 'orange', 'mango']
[2, 'orange', 'mango']


## Putting it all together!

The [```range```](https://docs.python.org/2/library/functions.html#range) function takes a number (n) and returns a list of numbers starting at 0 and going to n-1. Using this function assign a list of the numbers 0 to 10 to a variable. Use subsetting with a step of 2 to extract the even numbers. Check that the 4th element in this new list modulo 2 is zero. Remember that python counts the first object in a list as object zero. Finally sort the new list of even numbers in reverse order (you might want to recap the difference between the ```sort()``` method and the ```sorted``` function).

Bonus - Create a string variable. What do you get if you use the ```list``` function on that variable? Look up the [```join()```](http://www.tutorialspoint.com/python/string_join.htm) method for strings and use that to put your string back together. For this last bit remember the list is the *sequence* you want to join together! 

I will warn you now the ```join``` method looks completely unintuitive! Think of it as a sequence of instructions from left to right - 'use this character to *join* the stuff in this object together into a ```string```'.
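
As a hint for reading the syntax, here is a small sketch of `join()` on a made-up list of words (not the string from the exercise). The string before the dot is the glue; the list in the parentheses is the sequence being glued.

```python
words = ['lists', 'are', 'mutable']

sentence = ' '.join(words)          # glue the words together with spaces
dashed = '-'.join(['a', 'b', 'c'])  # glue the letters together with dashes

print(sentence)  # lists are mutable
print(dashed)    # a-b-c
```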

## Summary

We met lists in this part. These are data structures that can contain any kind of object, including other (nested) lists. They can be created with either square bracket notation or the ```list``` function. Lists are **mutable** in that the content of lists can be changed. They can be sliced using the same colon [:] notation we met when we discussed strings. We also introduced the notation for taking steps other than 1 through a sequence i.e. [1:8:2] - step from element 1 up to (but not including) element 8 by 2 (elements 1,3,5,7). There are a number of useful methods for acting on lists (e.g. ```pop```, ```append```, ```extend```) and we'll see more on these when we consider loops.

# Dictionaries

Like lists, ```dictionaries``` in python can contain objects of different types. However dictionaries have one unique feature that makes them quite distinct from lists. In a list the index of each element has to be an integer. In dictionaries the indices can be (almost) any type of object.

Dictionaries are essentially a **mapping** between a set of indices (keys) and a set of values - they are composed of key-value pairs. Each key-value pair in a dictionary is an item in that dictionary. That's all well and good but what does it mean and how do you create dictionaries? Rather like lists we have more than one method - one uses the ```dict``` function, one uses curly brackets (note the difference with lists) and one other (which we'll talk about in the ```tuples``` section below) uses lists of key-value pairs.

In [7]:
my_first_dictionary = {}
my_second_dictionary = dict()

print my_first_dictionary
print my_second_dictionary

{}
{}


Notice that the output of these two ```print``` statements is exactly the same - both variables are empty dictionaries and are denoted by curly brackets. To add items to a dictionary you have to define the key and the value. You can define the key using square brackets on the left hand side of an assignment whilst the value is defined on the right hand side. Alternatively you can assign several key-value pairs at once using ```key: value``` notation inside curly brackets.

In [8]:
my_first_dictionary['key0'] = 'value one'
my_first_dictionary['key1'] = 'value two'

print my_first_dictionary

my_second_dictionary = {'zero': 1, 'one': 2, 'two': 3}

print my_second_dictionary

{'key1': 'value two', 'key0': 'value one'}
{'zero': 1, 'two': 3, 'one': 2}


Notice that our dictionary has been printed with its key-value pairs separated by a colon i.e. ```{key: value}``` - which is one of the input formats we used. In the ```my_first_dictionary``` variable ```'key0'``` maps to the value ```'value one'``` whilst in the ```my_second_dictionary``` variable ```'zero'``` maps to the value 1. Note that the key and value do not have to be the same type of object. We placed the key ```'zero'``` (type ```str```) into our dictionary with the value 1 (an ```int```).

One point to recognise is that the order of the dictionary is not fixed (unlike a list). Look at how ```my_second_dictionary``` was created compared to the order when it was printed out. Whilst you can think of a list as an ordered sequence (the zeroth element is always the zeroth element) dictionaries don't obey that paradigm. You can put key-value pairs into a dictionary and they can come out in any order - but always key-value intact. (This is how the Python 2 used in this unit behaves; from Python 3.7 onwards dictionaries remember the order in which items were added, but you still look things up by key and not by position.) You can think of a dictionary as more like a bag you can't see into, but you know what the things in the bag are called and so you can get them out. A bit like Hermione Granger's [magic bag](http://harrypotter.wikia.com/wiki/Hermione_Granger%27s_beaded_handbag).

Instead of pulling things out of the dictionary by position we pull them out by key.

In [9]:
print my_first_dictionary['key0']

value one


One obvious consequence of the fact that dictionaries are indexed by keys is that the keys in a dictionary have to be unique. If you reuse a key in a dictionary then the old value in the key-value pair is forgotten and the new value used instead. This is a recipe for trouble (since it's hard enough to keep track of variables) so it's best avoided. See the example below. 

In [10]:
print my_second_dictionary['zero']

my_second_dictionary['zero'] = 'different value'

print my_second_dictionary['zero']

1
different value


Also the keys have to be **immutable** objects i.e. they cannot be changed. For this reason lists cannot function as dictionary keys since lists can be changed by e.g. ```append()``` or ```pop()```.
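
The sketch below shows the rule in action with a hypothetical dictionary called `grid`. A tuple (an immutable sequence that we meet properly later in this unit) is accepted as a key, while a list is rejected with a `TypeError`. `try`/`except` has not been covered yet; it is used here only to catch the error.

```python
grid = {}

# A tuple cannot be changed, so it is a legal key.
grid[(0, 1)] = 'tuple key works'

try:
    # A list can be changed, so python refuses to use it as a key.
    grid[[0, 1]] = 'list key'
    reason = None
except TypeError as err:
    reason = str(err)

print(grid)
print(reason)
```

The error message describes the list as unhashable, which is python's way of saying it cannot be used as a dictionary key.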

If a key isn't in a dictionary you get an error called an exception and specifically a ```KeyError``` - again it's helpful to recognise that this specific error message is telling you something meaningful about what's gone wrong. ```KeyError``` suggests a problem with the key you're indexing the dictionary with.

In [39]:
print my_second_dictionary['five']

KeyError: 'five'
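
If a missing key is a real possibility there are a couple of ways to avoid the script stopping. The sketch below uses the dictionary `get()` method, which returns a fallback value of your choosing instead of raising an error, and a `try`/`except` block (not yet covered in this unit). The dictionary `month_days` and the other names are made up for illustration.

```python
month_days = {'jan': 31, 'feb': 28, 'mar': 31}

# get() returns the value if the key exists, otherwise the fallback.
known = month_days.get('feb', 'unknown')     # 28
fallback = month_days.get('apr', 'unknown')  # 'unknown'

# Alternatively catch the KeyError and carry on.
try:
    value = month_days['apr']
except KeyError as err:
    value = None
    missing_key = err.args[0]  # the key that could not be found

print(known)
print(fallback)
print(missing_key)
```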

Using the ```len``` function on a dictionary returns the number of key-value pairs and you can also check whether a **```key```** is in the dictionary using the ```in``` keyword. This returns a boolean value (i.e. True or False). If you want to see the values you can use the ```values()``` method on the dictionary.

In [11]:
print len(my_second_dictionary)
print 'zero' in my_second_dictionary # is 'zero' in the dictionary?

vals = my_second_dictionary.values() # assign just the values to a variable
print 3 in vals
print vals # vals is a list object

3
True
True
['different value', 3, 2]


The ```keys()``` method works just like the ```values()``` method but returns a list of the keys. These will be in an arbitrary order but you can sort them with the ```sorted``` function.

In [1]:
new_dict = {'A': 1, 'B': 2, 'c': 3, 'd': 4}
print new_dict.keys()
print sorted(new_dict.keys())

['A', 'c', 'B', 'd']
['A', 'B', 'c', 'd']


## Putting it together

A date of the form 8-MAR-85 includes the name of the month, which could be translated to a number. Write a script that will translate from the name of a month to the number using a ```dict```.

Hint: In your script create a ```dict``` suitable for decoding month names to numbers (e.g. MAR would be 3). Use the string ```split()``` method to create a ```list``` from your ```string``` and use the ```dict``` to look up the number for the month in the input data using list indexing. Finally use string formatting to print an informative message relating the month name and number.

## Putting it all together 2

Another exercise here

## Summary

Dictionaries are the python datatype for holding key-value pairs. These can be very useful in a range of problems and we'll see that specifically in the homework for this week. You can initialise a dictionary with either the ```dict``` function or with curly brackets (e.g. ```dct = {}```). The key and value in each pair are separated by a colon (:). Unlike lists, which are indexed by integers, dictionaries are indexed by key. The index syntax is the same: you simply enter the key in square brackets. Indexing a dictionary gives you back the value associated with that key. The ```keys()``` and ```values()``` methods give you back the keys and values of a dictionary respectively and you can use the ```in``` keyword to test for the presence of a specific key in a dictionary.

# Tuples

Tuples are similar to lists with one important difference. We've seen above how elements in a list can be changed. This is **not** true of tuples - tuples are **immutable**. Otherwise tuples can, like lists, contain a mixture of object types and they can be indexed and sliced just like lists. They have no ```sort``` method (that would change them in place) but you can pass a tuple to the ```sorted()``` function, which gives you back a new sorted list. Tuples are essentially a comma separated group of items. We can create a new tuple in a couple of ways. Firstly we can simply set a variable as a comma-separated group of items in parentheses. Secondly we can use the ```tuple()``` function.

In [12]:
tup = (1,2,'a','v')
print tup

tup2 = tuple() # initialise an empty tuple
print tup2

(1, 2, 'a', 'v')
()
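
Here is a short sketch of what immutability means in practice, using a made-up tuple of numbers. Reading from a tuple works exactly as it does for a list, but trying to assign to one of its elements raises a `TypeError` (caught here with `try`/`except`, which has not been covered in this unit).

```python
tup = (3, 1, 2)

first_two = tup[:2]     # slicing works and gives back a tuple: (3, 1)
in_order = sorted(tup)  # sorted() works but gives back a LIST: [1, 2, 3]

try:
    tup[0] = 99         # tuples cannot be changed...
    problem = None
except TypeError as err:
    problem = str(err)  # ...so python raises a TypeError

print(first_two)
print(in_order)
print(problem)
print(tup)              # still (3, 1, 2)
```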


One important point to note is that if you want to set up a tuple with just one element you have to type a comma after the single element - otherwise python treats the parentheses as ordinary grouping brackets and you get the element itself (e.g. an ```int```) rather than a tuple. So note the trailing comma below.

In [None]:
single_tup = (12,)
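
You can check the effect of the trailing comma with the `type()` function. In this sketch the name `not_a_tuple` is illustrative.

```python
not_a_tuple = (12)  # just the number 12 wrapped in brackets
single_tup = (12,)  # a tuple containing one element

print(type(not_a_tuple))  # an int
print(type(single_tup))   # a tuple
print(len(single_tup))    # 1
```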

## The ```zip``` function

Sometimes we have two list objects which could quite naturally form key-value pairs for a ```dictionary```. One example might be separate lists of month names and month numbers as seen in the exercise for the 'Dictionary' section. In that exercise you probably, rather laboriously, typed the key-value pairs for the ```dict``` mapping month names to numbers. If we had two lists it would be really useful if we could join corresponding values in each list and then generate the key-value pairs. One python function that's really handy for exactly this operation is [```zip()```](https://docs.python.org/2/library/functions.html#zip). The documentation for the ```zip()``` function says:
>This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. 

From this we know that ```zip()``` gives you back a ```list``` of ```tuples``` and that each tuple contains corresponding members from ```iterable``` arguments that are given to the function. What this means in vaguely plain English is that if you give ```zip()``` two lists then it will give you back a single list of tuples each of which contains corresponding members of your two lists. As ever an example will make this easier to understand.

In [1]:
lst1 = ['a', 'b', 'c', 'd']
lst2 = [1, 2, 3, 4]

zip_lst = zip(lst1, lst2)
print zip_lst

[('a', 1), ('b', 2), ('c', 3), ('d', 4)]


The first ```tuple``` in our ```list``` of ```tuples``` (```zip_lst```) contains the first element from each of the input lists, the second tuple contains the second elements and so on. So ```zip()``` has 'zipped up' the two input lists. You can do the same with more than two lists as well. If the input lists are different lengths ```zip()``` will only go as far as it can. So let's get back to the point and see a quick way of creating a ```dict``` object.

In [2]:
lst1 = ['a', 'b', 'c', 'd']
lst2 = [1, 2, 3, 4]

dct = dict(zip(lst1, lst2))
print dct

{'a': 1, 'c': 3, 'b': 2, 'd': 4}


Fantastic!
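
Returning to two points made above, the sketch below zips three lists, one of which is shorter than the others. The list names are made up for illustration. The call to `list()` around `zip()` is harmless in the Python 2 used in this unit (where `zip()` already returns a list) and is there so that the example also runs in Python 3.

```python
names = ['a', 'b', 'c', 'd']
numbers = [1, 2, 3, 4]
greek = ['alpha', 'beta', 'gamma']  # one element shorter

# zip() stops as soon as the shortest list runs out.
triples = list(zip(names, numbers, greek))
print(triples)

# Each tuple holds the corresponding element from every input list.
first = triples[0]
print(first)
```

The fourth elements (`'d'` and `4`) are silently dropped because `greek` has no fourth element to pair them with.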

## Putting it together

The following text shows the triplet codons and the amino acids they code for. Paste these into two lists and use the ```zip()``` function to create a ```dict``` that will translate from DNA to AA. Use this to find the AA for the triplets TCC, ATG, ATC, CTC and GAG. Print these out.

## Summary

In this section you learned about ```tuples```. These are similar to lists but the contents of a ```tuple``` are fixed (unlike list contents). Tuples are enclosed by parentheses and the items in a ```tuple``` are separated by commas. The ```zip()``` function is a handy function that will take two (or more) lists and create ```tuples``` from the corresponding elements of each list. This can be a very quick way of creating a dictionary.

## Homework

This week's homework is similar to last week's. We again have the sequence for human PPARG in FASTA format. As you did last week you'll need to create a string variable from this. We also have the DNA triplets and the amino acids they code for. You can paste these into two lists and create a translation dictionary from them that will allow you to look up an amino acid given the DNA triplet that codes for that amino acid.

Using these data write a script that will look for the start codon and one of the stop codons ('TAG'). Print an informative message with the position of these two codons. Using these positions extract the proposed actual coding sequence (be careful with slicing positions here). Use the modulo operator to check whether this proposed coding sequence is in frame i.e. is its length a whole number of 3-base codons?

Print out the first 3 and last 3 codons. Finally use your newly created translation dictionary to look up the first and last amino acids. Again be careful with the indices for slicing.

The output from your script should be something like:

```The coding sequence begins at xxx and ends at xxx.```

```The first 3 codons are xxxxxxxxx.```

```The last 3 codons are xxxxxxxxx.```

```The first amino acid is x.```

```The last amino acid is x.```