In [1]:
import jupman
jupman.init()

# Practical 3

In this practical we will work with lists and tuples.

## Slides

The slides of the introduction can be found here: [Intro](docs/Practical3.pdf)

## Lists

Python lists are **ordered** collections of objects. The elements are typically homogeneous (all of the same type), but a list can also hold non-homogeneous data. Lists are **mutable objects**. The elements of the collection are written between square brackets **[]** and separated by commas.

We can use the function print to print the content of lists. Some examples of list definitions follow:

In [2]:
my_first_list = [1,2,3] 
print("first:" , my_first_list)

my_second_list = [1,2,3,1,3] #elements can appear several times
print("second: ", my_second_list)

fruits = ["apple", "pear", "peach", "strawberry", "cherry"] #elements can be strings
print("fruits:", fruits)

an_empty_list = []
print("empty:" , an_empty_list)

another_empty_list = list()
print("another empty:", another_empty_list)

a_list_containing_other_lists = [[1,2], [3,4,5,6]] #elements can be other lists
print("list of lists:", a_list_containing_other_lists)

my_final_example = [my_first_list, a_list_containing_other_lists]
print("a list of lists of lists:", my_final_example)



first: [1, 2, 3]
second:  [1, 2, 3, 1, 3]
fruits: ['apple', 'pear', 'peach', 'strawberry', 'cherry']
empty: []
another empty: []
list of lists: [[1, 2], [3, 4, 5, 6]]
a list of lists of lists: [[1, 2, 3], [[1, 2], [3, 4, 5, 6]]]
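The examples above all use homogeneous elements. As a minimal sketch of the non-homogeneous case (the variable name `mixed` and its values are made up for illustration), a single list can hold integers, strings, floats, other lists and booleans at the same time:

```python
# A list mixing values of different types (illustrative values only)
mixed = [1, "apple", 3.14, [2, 3], True]
print("mixed:", mixed)

# type() shows that each position holds a different kind of object
print(type(mixed[0]), type(mixed[1]), type(mixed[3]))
```

Mixed lists are legal, but some operations (sorting, for instance) expect the elements to be comparable with each other.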


### Operators for lists

Python provides several operators to handle lists. The following behave as they do on strings (**remember that, as in strings, the first position is 0!**):

![](img/pract3/operators1.png)

While this operator requires that the whole tested object is present as a single element of the list:

![](img/pract3/operators2.png)

and 

![](img/pract3/operators3.png)

can also change the corresponding value of the list (**lists are mutable objects**).

Some examples follow.

In [78]:
A = [1, 2, 3 ]
B = [1, 2, 3, 1, 2]

print("A is a ", type(A))

print(A, " has length: ", len(A))
print("A[0]: ", A[0], " A[1]:", A[1], " A[-1]:", A[-1])

print(B, " has length: ", len(B))
print("Is A equal to B?", A == B)

C = A + [1, 4]
print(C)
print("Is C equal to B?", B == C)
D = [1, 2, 3]*8 
print(D)

E = D[12:18] #slicing
print(E)
print("Is A*2 equal to E?", A*2 == E)


A is a  <class 'list'>
[1, 2, 3]  has length:  3
A[0]:  1  A[1]: 2  A[-1]: 3
[1, 2, 3, 1, 2]  has length:  5
Is A equal to B? False
[1, 2, 3, 1, 4]
Is C equal to B? False
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 2, 3, 1, 2, 3]
Is A*2 equal to E? True


In [4]:
A = [1, 2, 3, 4, 5, 6]
B = [1, 3, 5]
print("A:", A)
print("B:", B)

print("Is B in A?", B in A)
A[5] = [1,3,5] #we can replace an element (here the last one) with another object, even a list
print(A)
print("A has length:", len(A))
print("Is now B in A?", B in A)

A: [1, 2, 3, 4, 5, 6]
B: [1, 3, 5]
Is B in A? False
[1, 2, 3, 4, 5, [1, 3, 5]]
A has length: 6
Is now B in A? True
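The `in` operator only checks whether the tested object is itself an element of the list. To ask instead whether every element of `B` appears somewhere in `A`, one possible sketch uses the built-in `all` with a generator expression (a construct that may not have been covered yet in the course; the name `every_element_found` is made up for this example):

```python
A = [1, 2, 3, 4, 5, 6]
B = [1, 3, 5]

# B as a whole is not an element of A
print("Is the list B an element of A?", B in A)

# Test the elements of B one by one instead
every_element_found = all(x in A for x in B)
print("Are all the elements of B in A?", every_element_found)
```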


When indexing, do not exceed the list boundaries, or you will receive a ```list index out of range``` error:

In [5]:
A = [1, 2, 3, 4, 5, 6]
print("A has length:", len(A))

print("First element:", A[0])
print("7th-element: ", A[6])

A has length: 6
First element: 1


IndexError: list index out of range
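Note that the error above comes from accessing a single position (`A[6]`). A slice that runs past the end of the list does not raise an error: it is simply truncated. A short sketch (the names `tail` and `empty` are chosen for illustration):

```python
A = [1, 2, 3, 4, 5, 6]

# The slice stops at the end of the list, no IndexError is raised
tail = A[4:10]
print("A[4:10]:", tail)

# A slice entirely outside the list is just an empty list
empty = A[10:20]
print("A[10:20]:", empty)
```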

**Example**:
Consider the matrix $M = \begin{matrix}1 & 2 & 3\\ 1 & 2 & 1\\ 1 & 1 & 3\end{matrix}$ and the vector $v=[10, 5, 10]^T$. 
What is the matrix-vector product $M*v$? $$\begin{matrix}1 & 2 & 3\\ 1 & 2 & 1\\ 1 & 1 & 3\end{matrix}*[10,5,10]^T = [50, 30, 45]^T$$

In [6]:
M = [[1, 2, 3], [1, 2, 1], [1, 1, 3]]
v = [10, 5, 10]
prod = [0, 0 ,0] #at the beginning the product is the null vector

prod[0]=M[0][0]*v[0] + M[0][1]*v[1] + M[0][2]*v[2]
prod[1]=M[1][0]*v[0] + M[1][1]*v[1] + M[1][2]*v[2]
prod[2]=M[2][0]*v[0] + M[2][1]*v[1] + M[2][2]*v[2]

print("M: ", M)
print("v: ", v)
print("M*v: ", prod)



M:  [[1, 2, 3], [1, 2, 1], [1, 1, 3]]
v:  [10, 5, 10]
M*v:  [50, 30, 45]
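The three assignments above repeat the same pattern once per row. As a sketch that assumes `for` loops and `range` (which may be introduced only later in the course), the same product can be computed for a matrix of any size:

```python
M = [[1, 2, 3], [1, 2, 1], [1, 1, 3]]
v = [10, 5, 10]

prod = []
for row in M:
    # dot product between the current row of M and v
    total = 0
    for j in range(len(v)):
        total += row[j] * v[j]
    prod.append(total)

print("M*v: ", prod)
```

This version assumes that every row of `M` has as many elements as `v`.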


### Methods of the class list

The class list has some methods to operate on it. Recall from the lecture the following methods:

![](img/pract3/list_methods.png)

Note that lists are **mutable objects** and therefore virtually all the previous methods (except *count*) do not return a value: they **modify** the list in place. Some usage examples follow:

In [36]:
#A numeric list
A = [1, 2, 3]
print(A)
A.append(72) #appends one and only one object
print(A)
A.extend([1, 5, 124, 99]) #adds all these objects, one after the other.
print(A)
A.reverse()
print(A)
A.sort()
print(A)
print("Min value: ", A[0]) # In this simple case, could have used min(A)
print("Max value: ", A[-1]) #In this simple case, could have used max(A)
print("Number 1 appears:", A.count(1), " times")
print("While number 837: ", A.count(837))

print("\nDone with numbers, let's go strings...\n")
#A string list
fruits = ["apple", "banana", "pineapple", "cherry","pear", "almond", "orange"]
#Let's get a reverse lexicographic order:
print(fruits)
fruits.sort()
fruits.reverse()
print(fruits)
fruits.remove("banana")
print(fruits)
fruits.insert(5, "wild apple") #put wild apple after apple.
print(fruits)

[1, 2, 3]
[1, 2, 3, 72]
[1, 2, 3, 72, 1, 5, 124, 99]
[99, 124, 5, 1, 72, 3, 2, 1]
[1, 1, 2, 3, 5, 72, 99, 124]
Min value:  1
Max value:  124
Number 1 appears: 2  times
While number 837:  0

Done with numbers, let's go strings...

['apple', 'banana', 'pineapple', 'cherry', 'pear', 'almond', 'orange']
['pineapple', 'pear', 'orange', 'cherry', 'banana', 'apple', 'almond']
['pineapple', 'pear', 'orange', 'cherry', 'apple', 'almond']
['pineapple', 'pear', 'orange', 'cherry', 'apple', 'wild apple', 'almond']
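Because methods such as `sort` modify the list in place and return `None`, assigning their result to a variable is a common mistake. The sketch below shows the pitfall and the built-in `sorted` function, which returns a new sorted list and leaves the original untouched (the variable names are illustrative):

```python
A = [3, 1, 2]
result = A.sort()   # sorts A in place; the method itself returns None
print("result:", result)
print("A:", A)

B = [3, 1, 2]
C = sorted(B)       # builds a new sorted list, B is not modified
print("B:", B)
print("C:", C)
```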


<div class="alert alert-warning">
**Some things to remember** 

1. append and extend work quite differently:

In [38]:
A = [1, 2, 3]

A.extend([4, 5])
print(A)
B = [1, 2, 3]
B.append([4,5])
print(B)

[1, 2, 3, 4, 5]
[1, 2, 3, [4, 5]]


2. To remove an object it must exist:

In [39]:
A = [1,2,3]
A.remove(2)
print(A)
A.remove(7)

[1, 3]


ValueError: list.remove(x): x not in list

3. To sort a list, its elements must be comparable with each other (e.g. all numbers or all strings)!

In [42]:
A = [4,3, 1,7, 2]
print(A)
A.sort()
print(A)
A.append("banana")
A.sort()
print(A)


[4, 3, 1, 7, 2]
[1, 2, 3, 4, 7]


TypeError: unorderable types: str() < int()

</div>
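One way to avoid the `ValueError` raised by `remove` is to test for membership first. This is only a sketch: it relies on the `if` statement, which may not have been covered yet, and the variable `value` is introduced just for this example.

```python
A = [1, 2, 3]
value = 7

# Remove the value only if it is actually in the list
if value in A:
    A.remove(value)
else:
    print(value, "is not in the list, nothing removed")

print(A)
```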

<div class="alert alert-info">

**Important to remember:** 

Lists are **mutable objects** and this has some consequences!
Lists hold references to objects rather than the objects themselves, so a change made to a mutable element is visible through every list that refers to it.

Take a look at the following examples:

In [61]:
l1 = [1, 2]
l2 = [4, 3]
LL = [l1, l2]

print("LL:", LL)
l1.append(7)

print("l1:", l1)
print("LL now: ", LL)

LL[0][1] = -1
print("LL now:" , LL)
print("l1 now", l1)
#but a position of LL can also be made to point to a different object, without affecting l1.
LL[0] = 100
print("LL now:", LL)
print("l1 now", l1)


LL: [[1, 2], [4, 3]]
l1: [1, 2, 7]
LL now:  [[1, 2, 7], [4, 3]]
LL now: [[1, -1, 7], [4, 3]]
l1 now [1, -1, 7]
LL now: [100, [4, 3]]
l1 now [1, -1, 7]


**Important for making copies**:

In [66]:
A = ["hi", "there"]
B = A 
print("A:", A)
print("B:", B)
A.extend(["from", "python"])
print("A now: ", A)
print("B now: ", B)

print("\n---- copy example -------")
#Let's make a distinct copy of A.
C = A[:] #all the elements of A have been copied in C
print("C:", C)
A[3] = "java"
print("A now:", A)
print("C now:", C)

print("\n---- be careful though -------")
#Watch out though that...
D = [A, A]
E = D[:]
print("D:", D)
print("E:", E)

D[0][0] = "hello"
print("D now:", D)
print("E now", E)

A: ['hi', 'there']
B: ['hi', 'there']
A now:  ['hi', 'there', 'from', 'python']
B now:  ['hi', 'there', 'from', 'python']

---- copy example -------
C: ['hi', 'there', 'from', 'python']
A now: ['hi', 'there', 'from', 'java']
C now: ['hi', 'there', 'from', 'python']

---- be careful though -------
D: [['hi', 'there', 'from', 'java'], ['hi', 'there', 'from', 'java']]
E: [['hi', 'there', 'from', 'java'], ['hi', 'there', 'from', 'java']]
D now: [['hello', 'there', 'from', 'java'], ['hello', 'there', 'from', 'java']]
E now [['hello', 'there', 'from', 'java'], ['hello', 'there', 'from', 'java']]
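In the last case the slice `D[:]` copies only the outer list: `D` and `E` still share the same inner lists. When fully independent copies are needed, the standard library function `copy.deepcopy` copies the nested objects too. A minimal sketch (the name `F` is made up for this example):

```python
import copy

A = ["hi", "there"]
D = [A, A]
E = D[:]              # shallow copy: the inner lists are shared with D
F = copy.deepcopy(D)  # deep copy: the inner lists are copied as well

D[0][0] = "hello"
print("D now:", D)
print("E now:", E)    # changed too, it shares the inner lists with D
print("F now:", F)    # unchanged
```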


**Equality and identity**

In [69]:
A = [1, 2, 3]
B = A
C = [1, 2, 3]
print("Is A equal to B?", A == B)
print("Is A actually B?", A is B)
print("Is A equal to C?", A == C)
print("Is A actually C?", A is C)
#in fact:
print("A's id:", id(A))
print("B's id:", id(B))
print("C's id:", id(C))


Is A equal to B? True
Is A actually B? True
Is A equal to C? True
Is A actually C? False
A's id: 140380958998088
B's id: 140380958998088
C's id: 140380958970312





</div>

### From strings to lists, the ```split``` method

Strings have a method *split* that splits the string at every occurrence of a given separator (a character or a longer substring) and returns the pieces as a list.

**Example** 
Recall the protein seen in the previous practical: 

chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""

How can we split it into several lines?

In [54]:
chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""

lines = chain_a.split('\n')
print("Original sequence:")
print( chain_a, "\n") #some spacing to keep things clear
print("line by line:")
print("1st line:" ,lines[0])
print("2nd line:" ,lines[1])
print("3rd line:" ,lines[2])
print("4th line:" ,lines[3])
print("5th line:" ,lines[4])
print("6th line:" ,lines[5])

print("Split the 1st line in correspondence to FRL:\n",lines[0].split("FRL"))

Original sequence:
SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT 

line by line:
1st line: SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
2nd line: FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
3rd line: RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
4th line: HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
5th line: IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
6th line: EPHHELPPGSTKRALPNNT
Split the 1st line in correspondence to FRL:
 ['SSSVPSQKTYQGSYG', 'GFLHSGTAKSVTCTYSPALNKM']


**Note that in the last instruction the substring FRL has disappeared from the result (as happened to the newline in the first split)**.
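The reverse operation is the string method `join`, which glues the elements of a list of strings together, placing the string it is called on between them. The sketch below uses only the first 20 residues of the protein, broken arbitrarily into two lines to keep the example short (`short_chain` and the other names are made up for illustration):

```python
# First 20 residues of chain_a, arbitrarily broken into two lines
short_chain = "SSSVPSQKTY\nQGSYGFRLGF"

lines = short_chain.split("\n")
single = "".join(lines)        # join with the empty string: newlines are gone
print("Single line:", single)

pieces = single.split("FRL")
restored = "FRL".join(pieces)  # putting the separator back restores the string
print("Pieces:", pieces)
print("Restored:", restored)
```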

## Tuples



## Exercises

1. The variant calling format ([VCF](http://www.internationalgenome.org/wiki/Analysis/vcf4.0/)) is a format to represent genomic variants. Fields are separated by tabs (\\t in Python) and each line represents one variant. The first 5 fields report the chromosome (chr), the position (pos), the name of the variant (name), the reference allele (REF) and the alternative allele (ALT). Assume you have a variable VCF defined as: 

VCF = """MDC000001.124\\t7112\\tFB_AFFY_0000024\\tG\\tA
MDC000002.328\\t941\\tFB_AFFY_0000144\\tC\\tT
MDC000004.272\\t2015\\tFB_AFFY_0000222\\tG\\tA"""

Store these three variants in a list. Represent every variant as a list keeping the 5 fields separate. Finally, print each variant changing its format:  "name|chr|pos|REF/ALT".

<div class="tggle" onclick="toggleVisibility('ex1');">Show/Hide Solution</div>
<div id="ex1" style="display:none;">

In [77]:
VCF="""MDC000001.124\t7112\tFB_AFFY_0000024\tG\tA
MDC000002.328\t941\tFB_AFFY_0000144\tC\tT
MDC000004.272\t2015\tFB_AFFY_0000222\tG\tA"""

variants = VCF.split('\n')

variants[0] = variants[0].split('\t')
variants[1] = variants[1].split('\t')
variants[2] = variants[2].split('\t')

print(variants, "\n")

info = variants[0]
print(info[2] + "|" + info[0] + "|" + info[1] + "|" + info[3] +"/" + info[4])
info = variants[1]
print(info[2] + "|" + info[0] + "|" + info[1] + "|" + info[3] +"/" + info[4])
info = variants[2]
print(info[2] + "|" + info[0] + "|" + info[1] + "|" + info[3] +"/" + info[4])

[['MDC000001.124', '7112', 'FB_AFFY_0000024', 'G', 'A'], ['MDC000002.328', '941', 'FB_AFFY_0000144', 'C', 'T'], ['MDC000004.272', '2015', 'FB_AFFY_0000222', 'G', 'A']] 

FB_AFFY_0000024|MDC000001.124|7112|G/A
FB_AFFY_0000144|MDC000002.328|941|C/T
FB_AFFY_0000222|MDC000004.272|2015|G/A
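The solution above handles the three variants one at a time. As an alternative sketch that assumes `for` loops (possibly introduced only in a later practical), the same steps work unchanged for any number of lines; the list `formatted` is introduced here just to collect the output strings:

```python
VCF = """MDC000001.124\t7112\tFB_AFFY_0000024\tG\tA
MDC000002.328\t941\tFB_AFFY_0000144\tC\tT
MDC000004.272\t2015\tFB_AFFY_0000222\tG\tA"""

# One list of 5 fields per variant
variants = []
for line in VCF.split("\n"):
    variants.append(line.split("\t"))

# Build the "name|chr|pos|REF/ALT" representation of each variant
formatted = []
for info in variants:
    formatted.append(info[2] + "|" + info[0] + "|" + info[1] + "|" + info[3] + "/" + info[4])

for text in formatted:
    print(text)
```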


</div>