# Lecture 02: Lists, Tuples and Dictionaries

## Chapters
Chapter 2: List Data: Working with Ordered Data <br>
Chapter 3: Structured Data: Working with Structured Data <br>
Author: Jurre Hageman

## Working with lists

Lists are mutable, ordered collections.<br>
They have the following properties:



- Lists are sequences of objects

- In other languages, similar structures are called sequences, vectors or arrays

- They contain ordered series of objects: characters, strings, numbers, or any other Python object

- They are quite similar to strings in many ways but with one important exception: they are mutable

- Lists are created using square brackets `[]`, with elements separated by commas `,`

Here is an example of a Python list:

In [1]:
molecules = ['DNA', 'RNA', 'Protein']
print(type(molecules))

<class 'list'>


As you can see, the data type is a list.

In a list, you can combine items of any data type:

In [8]:
a_lot_of_types = [True, 1, 1.0, "string", ('strings', 'in', 'tuple'), ['strings', 'in', 'list']]

for i in a_lot_of_types: # walk through all items using a for loop
    print(type(i))

<class 'bool'>
<class 'int'>
<class 'float'>
<class 'str'>
<class 'tuple'>
<class 'list'>


They have an order. Like any other sequence in Python, lists have a 0-based index. You can select items by **indexing**:

In [9]:
print(molecules[0]) # first item
print(molecules[2]) # third item

DNA
Protein


Lists are dynamic: you can **add**, **delete** and **change** items:

In [10]:
names = ['Jan']
print(1, names)

names.append('Piet') # append item to the end
print(2, names)

names = names + ['Truus', 'Lieske', 'Kim', 'Janneke'] # concatenates two lists
print(3, names)

names.pop(0) # delete first item
print(4, names)

names.remove('Truus') # remove item
print(5, names)

del names[2] # remove by index
print(6, names)

names[0] = 'Pieter' # replace item
print(7, names)

1 ['Jan']
2 ['Jan', 'Piet']
3 ['Jan', 'Piet', 'Truus', 'Lieske', 'Kim', 'Janneke']
4 ['Piet', 'Truus', 'Lieske', 'Kim', 'Janneke']
5 ['Piet', 'Lieske', 'Kim', 'Janneke']
6 ['Piet', 'Lieske', 'Janneke']
7 ['Pieter', 'Lieske', 'Janneke']


Like strings (or any other sequence), you can slice a list. <br><br>
Slicing works the same way as on strings: <br>
`object[start:stop:step]`

In [16]:
names = names + ['Mies', 'Roos', 'Gert']

print(names)
print(names[0:2])
print(names[::-1]) # slice that reverses the order

# catch slice in variable:
new_list = names[4:] # note that a slice ALWAYS returns the same data type
print(new_list) 

['Pieter', 'Lieske', 'Janneke', 'Mies', 'Roos', 'Gert', 'Mies', 'Roos', 'Gert']
['Pieter', 'Lieske']
['Gert', 'Roos', 'Mies', 'Gert', 'Roos', 'Mies', 'Janneke', 'Lieske', 'Pieter']
['Roos', 'Gert', 'Mies', 'Roos', 'Gert']
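
The three parts of `object[start:stop:step]` can be combined in several ways. The following is a small sketch using a made-up list called `letters` (not part of the lecture data) to show what each part does:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']  # hypothetical sample data

print(1, letters[1:4])     # start at index 1, stop before index 4
print(2, letters[:3])      # omitted start means "from the beginning"
print(3, letters[::2])     # step of 2 takes every second item
print(4, letters[-2:])     # negative start counts from the end
print(5, letters[4:1:-1])  # negative step walks backwards
```

The `stop` index is never included in the result, which is why `letters[1:4]` returns three items.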


`list.append()` is a list-specific method. To see all methods available on lists, use `dir()`:

In [12]:
print(dir(list))

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
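
As an illustration, here is a short sketch that tries a few of the methods from this listing that have not been used so far: `extend`, `insert`, `sort` and `reverse`. The list `bases` is made-up sample data.

```python
bases = ['G', 'A', 'T']    # hypothetical sample data

bases.extend(['C', 'A'])   # add every item of another list to the end
bases.insert(0, 'N')       # insert 'N' before index 0
print(1, bases)

bases.sort()               # sorts the list in place and returns None
print(2, bases)

bases.reverse()            # reverses the list in place and returns None
print(3, bases)
```

Note that `sort` and `reverse` change the list itself. If you need a sorted copy and want to keep the original, the built-in `sorted()` function can be used instead.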


You can make nested lists (lists of lists), which can represent a table:

In [13]:
table = [[1, 2, 3],[4, 5, 6],[7, 8, 9]]

To show that this actually represents a table:

In [14]:
for row in table:
    print(row)

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]


To get the item from the second column and the second row:

In [15]:
print(table[1][1])

5
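
Rows and columns are not equally easy to extract from a nested list. A minimal sketch, reusing the same 3x3 `table` values, of how a full row and a full column could be collected:

```python
table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

second_row = table[1]        # a row is simply one of the inner lists

second_column = []
for row in table:            # a column has to be collected row by row
    second_column.append(row[1])

print(second_row)
print(second_column)
```

The first index always selects the row (the inner list) and the second index selects the item within that row.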


Some useful list methods are `count` and `index`:

In [17]:
print(1, names)
names.append('Lieske')

print(2, names)
print(3, names.count('Lieske'))
print(4, names.index('Janneke'))

1 ['Pieter', 'Lieske', 'Janneke', 'Mies', 'Roos', 'Gert', 'Mies', 'Roos', 'Gert']
2 ['Pieter', 'Lieske', 'Janneke', 'Mies', 'Roos', 'Gert', 'Mies', 'Roos', 'Gert', 'Lieske']
3 2
4 2


There are also some useful built-in functions that work on lists (and on other collections):

In [18]:
numbers = [12, 14, 7, 9]

print(1, min(numbers))
print(2, max(numbers))
print(3, len(numbers))
print(4, sum(numbers))


1 7
2 14
3 4
4 42


### Warning: making a copy of a list

The following code will NOT make a copy of a list:

In [19]:
my_list = ['DNA', 'RNA']
print(1, my_list)

my_copy = my_list # This is not a copy but a reference to the same object in memory
print(2, my_copy)

my_copy.append('Protein')

print(3, my_list)
print(4, my_copy)
print(5, my_list == my_copy) # the content of the list is the same
print(6, my_list is my_copy) # but it is also the same object in memory!

1 ['DNA', 'RNA']
2 ['DNA', 'RNA']
3 ['DNA', 'RNA', 'Protein']
4 ['DNA', 'RNA', 'Protein']
5 True
6 True


To make a real copy, either slice the list or use the `copy` method:

In [20]:
my_real_copy = my_list[:] # slices all the objects from my_list
my_second_real_copy = my_list.copy()

print(1, my_list == my_real_copy) # the content of the list is the same
print(2, my_list is my_real_copy) # but now it is NOT the same object in memory!

my_real_copy.append('Lipid')
my_second_real_copy.append('Carbohydrate')

print(3, my_list)
print(4, my_real_copy)
print(5, my_second_real_copy)

1 True
2 False
3 ['DNA', 'RNA', 'Protein']
4 ['DNA', 'RNA', 'Protein', 'Lipid']
5 ['DNA', 'RNA', 'Protein', 'Carbohydrate']
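
One caveat applies to nested lists: both slicing and `copy()` make a *shallow* copy, so the inner lists are still shared. The sketch below uses a made-up nested list and the standard library function `copy.deepcopy` to show the difference:

```python
import copy

table = [[1, 2], [3, 4]]          # hypothetical nested list
shallow = table.copy()            # new outer list, but the same inner lists
deep = copy.deepcopy(table)       # new outer list and new inner lists

table[0].append(99)               # modify an inner list of the original

print(1, shallow[0])              # the shallow copy sees the change
print(2, deep[0])                 # the deep copy does not
print(3, shallow[0] is table[0])  # the inner list is the same object
```

For flat lists of strings or numbers, as in the examples above, a shallow copy is sufficient.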


### From list to string

You can turn a list of strings into a single string using the `''.join()` method (which is technically a string method):

In [21]:
print(1, my_list)

stringified_list = ''.join(my_list)

print(2, stringified_list)

stringified_list_commas = ','.join(my_list) # use , as a separator

print(3, stringified_list_commas)

1 ['DNA', 'RNA', 'Protein']
2 DNARNAProtein
3 DNA,RNA,Protein
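
Note that `join` only accepts strings. If the list holds other types, such as numbers, each item has to be converted first. A small sketch, assuming a list of integers:

```python
numbers = [12, 14, 7, 9]

# ', '.join(numbers) would raise a TypeError because the items are ints
as_strings = []
for number in numbers:
    as_strings.append(str(number))  # convert each item to a string first

joined = ', '.join(as_strings)
print(joined)
```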


### String concatenation versus ''.join(list) 

You have probably already done some string concatenation. While this is fine for a small number of repetitions, it is a poor choice when you need a massive number of concatenations. The reason is that strings are immutable: each concatenation creates a new string object and copies the old content into it. This approach is therefore heavy on memory and much slower:

In [22]:
import time

start = time.time()
my_dna = 'A'

for i in range(100000):
    my_dna += 'A' #do not do this
    
end = time.time()

print('elapsed time:', (end - start) * 1000 , 'ms')

elapsed time: 28.32317352294922 ms


Now using the list approach:

In [23]:
import time

start = time.time()
my_dna = []

for i in range(100000):
    my_dna.append('A')

''.join(my_dna) #this line stringifies the list 

end = time.time()
print('elapsed time:', (end - start) * 1000 , 'ms')

elapsed time: 12.303829193115234 ms


As you can see, the second approach is considerably faster.
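
A single measurement with `time.time()` can be noisy. As an alternative sketch, the standard library module `timeit` can repeat each approach several times. The function names `concat_strings` and `join_list` are made up for this illustration, and the actual timings will differ per machine and Python version:

```python
import timeit

def concat_strings(n=100000):
    """Build a string by repeated concatenation."""
    my_dna = ''
    for i in range(n):
        my_dna += 'A'
    return my_dna

def join_list(n=100000):
    """Build a string by collecting parts in a list and joining once."""
    parts = []
    for i in range(n):
        parts.append('A')
    return ''.join(parts)

runs = 10
t_concat = timeit.timeit(concat_strings, number=runs)  # total seconds for all runs
t_join = timeit.timeit(join_list, number=runs)

print('concatenation:', t_concat / runs * 1000, 'ms per run')
print('join:', t_join / runs * 1000, 'ms per run')
```

Both functions return the same string, so only the way it is built differs.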

## Tuples

Tuples are like lists but immutable. Like lists, tuples are ordered collections of objects. As they are immutable, tuples have fewer methods:

In [24]:
print(dir(tuple))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']


Note that the only "normal" methods are count and index. Tuples are created using the () notation.

In [25]:
my_tuple = ('DNA', 'Protein', 'RNA')

Indexing and slicing work as they do for lists:

In [26]:
print(my_tuple[1])
print(my_tuple[::-1]) # Note that a slice will always return the same datatype

Protein
('RNA', 'Protein', 'DNA')


However, modifying a tuple raises an error:

In [27]:
my_tuple[1] = 'Carbohydrate'

TypeError: 'tuple' object does not support item assignment

The use of tuples might seem obscure at first, but later you will discover that they are very handy. Because they are immutable, they are well suited as argument 'lists' for functions, for writing data to databases and as keys in dictionaries (see later). Immutability guarantees that the items of a tuple cannot be replaced, so your data will arrive safe and unmodified!
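
Two of these uses can be sketched briefly. The example below uses a made-up dictionary `counts` (dictionaries are explained in the next section) with tuples as keys, and shows tuple unpacking, which assigns the items of a tuple to separate names:

```python
# Hypothetical example data: a (position, base) pair as dictionary key
counts = {}
counts[(1, 'A')] = 10       # a tuple is immutable, so it can be used as a key
counts[(2, 'G')] = 4
print(counts[(1, 'A')])

# counts[[1, 'A']] = 10     # a list as key raises TypeError: unhashable type: 'list'

position, base = (2, 'G')   # tuple unpacking assigns both names at once
print(position, base)
```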

## Dictionaries

Dictionaries are tables of key/value pairs. Unlike lists, they hold associative data: you look up a value by its key, not by its position. A lookup only goes from a key to a value, not the other way around. A dictionary is written using the `{}` characters:

In [28]:
comp_bases = {'A': 'T', 'C':'G'}

You can access a value by providing the key:

In [29]:
print(comp_bases['A'])

T
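
Because a lookup only goes from key to value, finding the key that belongs to a given value requires walking over all pairs. A minimal sketch using the `items()` method of dictionaries (the names `wanted` and `found_key` are chosen for this illustration):

```python
comp_bases = {'A': 'T', 'C': 'G'}

wanted = 'G'
found_key = None
for key, value in comp_bases.items():  # items() yields (key, value) tuples
    if value == wanted:
        found_key = key

print(found_key)
```

This is much slower than a normal lookup for large dictionaries, so it pays to choose the keys according to how you want to search the data.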


To add items:

In [30]:
comp_bases['T'] = 'A' # this way
comp_bases.update({'G':'C'}) # or this way

print(comp_bases)

{'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}


To delete items:

In [31]:
del comp_bases['A']
del comp_bases['T']

print(comp_bases)

{'C': 'G', 'G': 'C'}


Asking for a key that does not exist raises an error:

In [32]:
print(comp_bases['Q'])

KeyError: 'Q'

You can prevent this using the `dict.get()` method: 

In [33]:
print(comp_bases.get('Q')) # returns None if a key is not found and prevents an error

None
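
The `get` method also accepts a second argument that is returned instead of `None` when the key is missing, and the `in` operator tests whether a key is present. A short sketch (the fallback value `'N'` is an arbitrary choice for this example):

```python
comp_bases = {'C': 'G', 'G': 'C'}

print(1, comp_bases.get('Q', 'N'))  # the second argument is the fallback value
print(2, comp_bases.get('C', 'N'))  # existing keys return their own value
print(3, 'Q' in comp_bases)         # membership test on the keys
```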


Keys need to be unique. Importantly, note that dictionaries are not sorted: since Python 3.7 they keep the order in which items were inserted, but a dictionary cannot be sorted in place. Using the `sorted` function, you can get a sorted list of the keys:

In [34]:
codon_table = {'ttt': 'F', 'ttc': 'F', 'tta': 'L', 'ttg': 'L', 'tct': 'S', 'tcc': 'S', 'tca': 'S', 'tcg': 'S', 'tat': 'Y', 'tac': 'Y', 'taa': '*', 'tag': '*', 'tgt': 'C', 'tgc': 'C', 'tga': '*', 'tgg': 'W', 'ctt': 'L', 'ctc': 'L', 'cta': 'L', 'ctg': 'L', 'cct': 'P', 'ccc': 'P', 'cca': 'P', 'ccg': 'P', 'cat': 'H', 'cac': 'H', 'caa': 'Q', 'cag': 'Q', 'cgt': 'R', 'cgc': 'R', 'cga': 'R', 'cgg': 'R', 'att': 'I', 'atc': 'I', 'ata': 'I', 'atg': 'M', 'act': 'T', 'acc': 'T', 'aca': 'T', 'acg': 'T', 'aat': 'N', 'aac': 'N', 'aaa': 'K', 'aag': 'K', 'agt': 'S', 'agc': 'S', 'aga': 'R', 'agg': 'R', 'gtt': 'V', 'gtc': 'V', 'gta': 'V', 'gtg': 'V', 'gct': 'A', 'gcc': 'A', 'gca': 'A', 'gcg': 'A', 'gat': 'D', 'gac': 'D', 'gaa': 'E', 'gag': 'E', 'ggt': 'G', 'ggc': 'G', 'gga': 'G', 'ggg': 'G'}

for codon in sorted(codon_table):
    print(codon, ':', codon_table[codon], end = ', ')

aaa : K, aac : N, aag : K, aat : N, aca : T, acc : T, acg : T, act : T, aga : R, agc : S, agg : R, agt : S, ata : I, atc : I, atg : M, att : I, caa : Q, cac : H, cag : Q, cat : H, cca : P, ccc : P, ccg : P, cct : P, cga : R, cgc : R, cgg : R, cgt : R, cta : L, ctc : L, ctg : L, ctt : L, gaa : E, gac : D, gag : E, gat : D, gca : A, gcc : A, gcg : A, gct : A, gga : G, ggc : G, ggg : G, ggt : G, gta : V, gtc : V, gtg : V, gtt : V, taa : *, tac : Y, tag : *, tat : Y, tca : S, tcc : S, tcg : S, tct : S, tga : *, tgc : C, tgg : W, tgt : C, tta : L, ttc : F, ttg : L, ttt : F, 
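
To make the difference between insertion order and sorted order concrete, here is a sketch with a small made-up subset of the codon table, assuming Python 3.7 or later:

```python
codons = {'ttt': 'F', 'atg': 'M', 'gaa': 'E'}  # small hypothetical subset

print(1, list(codons))            # keys in insertion order
print(2, sorted(codons))          # a new, sorted list of the keys
print(3, sorted(codons.items()))  # a sorted list of (key, value) tuples
print(4, codons)                  # the dictionary itself is unchanged
```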

## Exercise:

Now we come to the final exercise: write a program that reads a DNA sequence from the user. You can get user input with:

In [None]:
dna_seq = input("please provide a seq: ")

Now write code to print the following to the screen:

- The original sequence in upper case
- The reverse string in upper case
- The complement string in upper case
- The reverse-complement string in upper case.

Make sure you use a stringified list instead of string concatenation.<br>
Also make use of a dictionary for complementary bases.

## END