# Practical 5

In this practical we will add some information on loops and introduce dictionaries.

## Slides

The slides of the introduction can be found here: [Intro](docs/Practical5.pdf)

## More on loops
As seen in the previous practical and in the lecture, there are three different ways of execution flow:

![](img/pract4/structured_programming.png)

We have already seen the *if* statement and the *for* and *while* loops, with their variants. The code block of each of these statements is defined by its *indentation*.


### Ternary operator
In some cases it is handy to be able to initialize a variable depending on the value of another one.

**Example**:
The discount rate applied to a purchase depends on the amount of the sale. Create a variable *discount* and set its value to 0 if the variable *amount* is at most 100 euros, and to 10% if it is higher.

In [1]:
amount = 110
discount = 0

if(amount >100):
    discount = 0.1
else:
    discount = 0 # not necessary

print("Total amount:", amount, "discount:", discount)


Total amount: 110 discount: 0.1


The previous code can be written more concisely as:

In [2]:
amount = 110
discount = 0.1 if amount > 100 else 0
print("Total amount:", amount, "discount:", discount)

Total amount: 110 discount: 0.1


The basic syntax of the ternary operator is:

```
variable = value if condition else other_value
```

meaning that the *variable* is initialized to *value* if the *condition* holds, otherwise to *other_value*.
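
As a small sketch of the same syntax applied to several values, the conditional expression can be evaluated inside a loop. The list `amounts` below is made up purely for illustration; the rule is the one used above (10% only when the amount is strictly greater than 100).

```python
# Hypothetical purchase amounts, used only for illustration
amounts = [50, 100, 110, 250]

discounts = []
for amount in amounts:
    # same rule as above: 10% only if amount is strictly greater than 100
    discount = 0.1 if amount > 100 else 0
    discounts.append(discount)
    print("Total amount:", amount, "discount:", discount)
```

Note that an amount of exactly 100 falls in the *else* branch, because the condition uses `>` and not `>=`.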

Python also allows several statements on the same line, separated by a ";":

In [4]:
a = 10; b = a + 1; c = b +2
print(a,b,c)

10 11 13


<div class="alert alert-warning">

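**Note:** Although the ternary operator and in-line operations are sometimes handy and less verbose than the explicit form, they should be used sparingly: the PEP 8 style guide discourages placing several statements on the same line, and long or nested conditional expressions quickly become hard to read.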

</div>

### Break and continue

Sometimes it is useful to skip an entire iteration of a loop or end the loop before its supposed end.
This can be achieved with two different statements:  **continue** and **break**.

#### Continue statement
Within a **for** or **while** loop, **continue** makes the interpreter skip that iteration and move to the next. 

**Example:**
Print all the odd numbers from 1 to 20.

In [9]:
#Two equivalent ways
#1. Testing remainder == 1
for i in range(21):
    if(i % 2 == 1):
        print(i, end = " ")

print("")

#2. Skipping if remainder == 0 in for
for i in range(21):
    if(i % 2 == 0):
        continue
    print(i, end = " ")

1 3 5 7 9 11 13 15 17 19 
1 3 5 7 9 11 13 15 17 19 

**continue** can also be used within while loops, but we need to be careful to update the loop variable before reaching the continue statement, otherwise we will get stuck in a never-ending loop.

**Example:**
Print all the odd numbers from 1 to 20.

In [None]:
#Wrong code:
i = 0
while (i < 21):
    if(i % 2 == 0):
        continue
    print(i, end = " ")
    i = i + 1 # NEVER EXECUTED IF i % 2 == 0!!!!

A possible correct solution using while:

In [10]:
i = 0
while(i < 20):
    i = i + 1        #the variable is updated no matter what
    if(i % 2 == 0 ):
        continue
    print(i, end = " ")

1 3 5 7 9 11 13 15 17 19 

#### Break statement
Within a **for** or **while** loop, **break** makes the interpreter exit the loop and continue with the sequential execution. This is useful when the task can be completed without reaching the end of the loop.

**Example:**
Given the following list of integers [1,5,6,4,7,1,2,3,7] print them until a number already printed is found. 

In [13]:
L = [1,5,6,4,7,1,2,3,7]
found = []
for i in L:
    if(i in found):
        break
        
    found.append(i)
    print(i, end = " ")

1 5 6 4 7 

**Example:**
Pick random numbers between 1 and 50 and count how many picks it takes to randomly choose number 27. Limit the number of random picks to 40 (i.e. if 40 picks have been done and 27 has not been found, exit anyway with a message).

In [23]:
import random

iterations = 0
while(iterations < 40):
    pick = random.randint(1,50)
    iterations += 1      #count the pick just made
    if(pick == 27):
        break

if(pick == 27):
    print("27 found in ", iterations, "iterations")
else:
    print("Sorry number 27 was never found!")

Sorry number 27 was never found!


An alternative way without using the break statement makes use of a *flag* variable (a variable that makes the loop end as soon as its value changes):

In [38]:
import random
found = False # This is called a flag
iterations = 0
while(iterations < 40 and not found): #the flag is used to exit 
    pick = random.randint(1,50)
    iterations += 1      #count the pick just made
    if(pick == 27):
        found = True     #update the flag, the loop ends at the next check

if(found):
    print("27 found in ", iterations, "iterations")
else:
    print("Sorry number 27 was never found!")

Sorry number 27 was never found!


### List comprehension
List comprehension is a quick way of creating a list. The resulting list is normally obtained by applying a function or a method to the elements of another list that **remains unchanged**.

The basic syntax is:

```
new_list = [ some_function (x) for x in start_list]
```
or
```
new_list = [ x.some_method() for x in start_list]
```

List comprehension can also be used to filter elements of a list and produce another list as sublist of the first one (**remember that the original list is not changed**).

In this case the syntax is:

```
new_list = [ some_function (x) for x in start_list if condition]
```
or
```
new_list = [ x.some_method() for x in start_list if condition]
```

where the element x in start_list becomes part of new_list if and only if the condition holds True.

Let's see some examples:

**Example:**
Given a list of strings ["hi", "there", "from", "python"] create a list with the length of the corresponding element (i.e. the one with the same index).

In [39]:
elems = ["hi", "there", "from", "python"]

newList = [len(x) for x in elems]

for i in range(0,len(elems)):
    print(elems[i], " has length ", newList[i])

hi  has length  2
there  has length  5
from  has length  4
python  has length  6


**Example:**
Given a list of strings ["dog", "cat", "rabbit", "guinea pig", "hamster", "canary", "goldfish"] create a list with the elements starting with a "c" or "g".

In [41]:
pets = ["dog", "cat", "rabbit", "guinea pig", "hamster", "canary", "goldfish"]

cg_pets = [x for x in pets if (x.startswith("c") or x.startswith("g"))]

print(pets)
print(cg_pets)

['dog', 'cat', 'rabbit', 'guinea pig', 'hamster', 'canary', 'goldfish']
['cat', 'guinea pig', 'canary', 'goldfish']
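
To make the filtering syntax more concrete, the sketch below rewrites the comprehension of the last example as an explicit loop and checks that the two give the same result. The name `cg_pets_loop` is introduced here only for illustration.

```python
pets = ["dog", "cat", "rabbit", "guinea pig", "hamster", "canary", "goldfish"]

# explicit loop doing the same work as the comprehension below
cg_pets_loop = []
for x in pets:
    if x.startswith("c") or x.startswith("g"):
        cg_pets_loop.append(x)

# the list comprehension from the example above
cg_pets = [x for x in pets if (x.startswith("c") or x.startswith("g"))]

print(cg_pets_loop == cg_pets)      # the two lists have the same elements
print(len(pets), "pets in the original list, which is unchanged")
```

The comprehension is simply a more compact way of writing the loop: in both cases a new list is built and `pets` is left untouched.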


## Dictionaries

A **dictionary** is a map between one object, the **key**, and another object, the **value**. Dictionaries are **mutable objects** that contain a collection of mappings *key*-->*value*; elements are accessed by key and not by position.
Dictionaries are defined using curly braces **{key1 : value1, key2 : value2}**, with **:** separating each key from its value.

Some examples on how to define dictionaries follow:

In [47]:
first_dict = {"one" : 1, "two": 2, "three" : 3, "four" : 4} 
print("First:", first_dict)

empty_dict = dict()
print("Empty:",empty_dict)

second_dict = {1 : "one", 2 : "two", "three" :3 } #BAD IDEA BUT POSSIBLE!!!
print(second_dict)

third_dict = dict(zip(["one","two","three","four"],[1,2,3,4]))
print(third_dict)
print(first_dict == third_dict)

First: {'three': 3, 'one': 1, 'four': 4, 'two': 2}
Empty: {}
{1: 'one', 2: 'two', 'three': 3}
{'three': 3, 'one': 1, 'four': 4, 'two': 2}
True


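Note that values are retrieved by key, not by position, so programs should not rely on the order in which the keys are printed. In the output above, produced with an older interpreter, the insertion order is not preserved; since Python 3.7 dictionaries keep their keys in insertion order. Moreover, keys and values can be of different types (e.g. keys strings and values integers). 
An interesting case is *third_dict*, where the function *zip* followed by *dict* is used to map the elements of the first list (the keys) to the values in the second.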
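
As a rough sketch of what happens in that last case, the two steps can be separated. The names `keys`, `values` and `pairs` are used here only for illustration; `list()` is applied to the result of *zip* just to make the pairs visible.

```python
keys = ["one", "two", "three", "four"]
values = [1, 2, 3, 4]

# zip pairs up the elements having the same index in the two lists
pairs = list(zip(keys, values))
print(pairs)

# dict turns each (key, value) pair into an entry of the dictionary
third_dict = dict(pairs)
print(third_dict["three"])
```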

Note that the keys of one dictionary can be of **different types** (as in *second_dict*), even though this is normally a bad idea. The only requirement for the **keys** is that they **must be immutable objects** (more precisely, *hashable* ones). Trying to use a mutable object such as a list as a key makes the interpreter raise the error **TypeError: unhashable type**. Finally, keys must be unique: we cannot associate more than one value with the same key.

In [49]:
a = (1,2,3) #a,b are tuples: hence immutable
b = (1,3,5)

my_dict = {a : 6, b : 9 }
print(my_dict)

c = [1,2,3] #c,d are lists: hence mutable
d = [1,3,5]

dict2 = {c : 6, d : 9}
print(dict2)

{(1, 3, 5): 9, (1, 2, 3): 6}


TypeError: unhashable type: 'list'
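
The uniqueness of the keys can be illustrated with a short sketch. The dictionary `ages` and its content are invented for this example: assigning a value to a key that already exists does not create a second entry, it replaces the old value.

```python
# Hypothetical data, used only for illustration
ages = {"anna": 30, "luca": 25}

ages["anna"] = 31          # "anna" is already a key: the old value is replaced
print(len(ages))           # still two entries
print(ages["anna"])

# a key repeated in the definition does not raise an error:
# the last value simply wins
dup = {"a": 1, "a": 2}
print(dup)
```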

### Functions working on dictionaries

As for the other data types, python provides several operators that can be applied to dictionaries. The following operators are available and basically work as they do for lists, with one exception: the operator **in** checks whether the specified object is present among the **keys**.

![](img/pract5/operators.png)

Some usage examples follow:

In [55]:
myDict = {"one" : 1, "two" : 2, "twentyfive" : 25}

print(myDict)
myDict["ten"] = 10
myDict["twenty"] = 20
print(myDict)
print("The dictionary has ", len(myDict), " elements")
print("The value of \"ten\" is:", myDict["ten"])
print("The value of \"two\" is:", myDict["two"])

print("Is \"twentyfive\" in dictionary?", "twentyfive" in myDict)
print("Is \"seven\" in dictionary?", "seven" in myDict)

{'one': 1, 'twentyfive': 25, 'two': 2}
{'one': 1, 'twentyfive': 25, 'ten': 10, 'two': 2, 'twenty': 20}
The dictionary has  5  elements
The value of "ten" is: 10
The value of "two" is: 2
Is "twentyfive" in dictionary? True
Is "seven" in dictionary? False
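
One practical consequence, sketched below under the assumption that we do not know in advance whether a key is present: accessing a missing key with the square brackets raises a `KeyError`, so the **in** operator is a convenient guard. The sketch also uses the `del` statement to remove an entry, which goes slightly beyond the example above.

```python
myDict = {"one" : 1, "two" : 2, "twentyfive" : 25}

key = "seven"
if key in myDict:
    print("The value of", key, "is:", myDict[key])
else:
    # myDict["seven"] here would raise KeyError: 'seven'
    print(key, "is not a key of the dictionary")

del myDict["two"]                 # removes the key "two" and its value
print("two" in myDict, len(myDict))
```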


### Dictionary methods 

Recalling what we saw in the lecture, the following methods are available for dictionaries:

![](img/pract5/methods.png)

These methods are specific to dictionaries and can be used to loop through their elements.
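
A minimal sketch of how some of these methods are typically used follows. It assumes that `keys()`, `values()`, `items()` and `get()` are among the methods listed in the table (they are standard dictionary methods); the variable names are chosen only for illustration.

```python
myDict = {"one" : 1, "two" : 2, "twentyfive" : 25}

# looping over the keys only
for k in myDict.keys():
    print("key:", k)

# looping over key and value at the same time
for k, v in myDict.items():
    print(k, "-->", v)

# values() gives access to the values alone
total = sum(myDict.values())
print("Sum of the values:", total)

# sorted() returns a new list with the keys in alphabetical order
sorted_keys = sorted(myDict.keys())
print(sorted_keys)

# get() returns a default (here 0) when the key is missing, instead of an error
missing = myDict.get("seven", 0)
print(missing)
```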

<div class="alert alert-warning">

**ERRATUM:** ```dict.keys()``` returns a ```dict_keys``` object not a list. To cast it to list, we need to call ```list(dict.keys())```.

</div>


**Example**
Given the protein sequence below, store in a dictionary all the amino acids present and count how many times they appear. Finally print out the stats (e.g. how many distinct amino acids are present, the most frequent, the least frequent and the frequency of all of them **in alphabetical order**).

```
>sp|P00517|KAPCA_BOVIN cAMP-dependent protein kinase catalytic subunit alpha OS=Bos taurus GN=PRKACA PE=1 SV=3
MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPAQNTAHLDQFERIKTLGTGSFGRVML
VKHMETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMV
MEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGY
IQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFF
ADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFAT
TDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFSEF
```

In [70]:
protein = """MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPAQNTAHLDQFERIKTLGTGSFGRVML
VKHMETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMV
MEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGY
IQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFF
ADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFAT
TDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFSEF"""

protein = protein.replace("\n","")

print(protein)

amino_acids = dict()

for a in protein:
    if( a in amino_acids):
        amino_acids[a] = amino_acids[a] + 1
    else:
        amino_acids[a] = 1


num_aminos = len(amino_acids)

print("The number of amino-acids present is ", num_aminos)
#let's get all aminoacids
#and sort them alphabetically
a_keys = list(amino_acids.keys())

a_keys.sort()

mostF = {"frequency" : 0, "aminoacid" : "-"}
#start above the sequence length, an upper bound for any frequency
leastF = {"frequency" : len(protein) + 1, "aminoacid" : "-"}

for a in a_keys:
    freq = amino_acids[a]
    if(mostF["frequency"] < freq):
        mostF["frequency"] = freq
        mostF["aminoacid"] = a
        
    if(leastF["frequency"] > freq):
        leastF["frequency"] = freq
        leastF["aminoacid"] = a    
    print(a, " is present", freq, "times")

print("Aminoacid", leastF["aminoacid"], "has the lowest frequency (",leastF["frequency"],")")
print("Aminoacid", mostF["aminoacid"], "has the highest frequency (",mostF["frequency"],")")

MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPAQNTAHLDQFERIKTLGTGSFGRVMLVKHMETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFSEF
The number of amino-acids present is  20
A  is present 23 times
C  is present 2 times
D  is present 18 times
E  is present 27 times
F  is present 25 times
G  is present 22 times
H  is present 9 times
I  is present 21 times
K  is present 34 times
L  is present 32 times
M  is present 8 times
N  is present 17 times
P  is present 14 times
Q  is present 14 times
R  is present 15 times
S  is present 16 times
T  is present 14 times
V  is present 20 times
W  is present 6 times
Y  is present 14 times
Aminoacid C has the lowest frequency ( 2 )
Aminoacid K has the highest frequency ( 34 )
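
The counting step can be written a bit more compactly. The sketch below is only one possible alternative, not the reference solution: it assumes the `get()` method with a default value, and uses the built-in functions `max` and `min` with a `key` argument. To keep it short it works on the first 19 residues of the protein above.

```python
# first 19 residues of the protein above, used to keep the example short
sequence = "MGNAAAAKKGSEQESVKEF"

counts = dict()
for a in sequence:
    # get() returns 0 the first time an amino acid is seen
    counts[a] = counts.get(a, 0) + 1

# max/min go through the keys and compare them by their count
most = max(counts, key=counts.get)
least = min(counts, key=counts.get)

for a in sorted(counts):
    print(a, " is present", counts[a], "times")

print("Most frequent:", most, "(", counts[most], ")")
print("Least frequent:", least, "(", counts[least], ")")
```

When several amino acids share the same count, `max` and `min` return only one of them, so the explicit loop shown above remains the better choice if all the ties need to be reported.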


## Exercises

1. Given the following two lists of integers: 
[1, 13, 22, 7, 43, 81, 77, 12, 15,21, 84,100] and [44,32,7, 100, 81, 13, 1, 21, 71]:

    1. Sort the two lists.
    2. Create a third list as intersection of the two lists (i.e. an element is in the intersection if it is present in both lists).  
    3. Print the three lists.

<div class="tggle" onclick="toggleVisibility('ex1');">Show/Hide Solution</div>
<div id="ex1" style="display:none;">

In [44]:
L1 = [1, 13, 22, 7, 43, 81, 77, 12, 15,21, 84,100]
L2 = [44,32,7, 100, 81, 13, 1, 21, 71]

L1.sort()
L2.sort()
intersection = [x for x in L1 if x in L2]

print("L1:    ", L1)
print("L2:    ", L2)
print("inters:", intersection)

L1:     [1, 7, 12, 13, 15, 21, 22, 43, 77, 81, 84, 100]
L2:     [1, 7, 13, 21, 32, 44, 71, 81, 100]
inters: [1, 7, 13, 21, 81, 100]


</div>

2. Given the following list:

geneCorr = [["G1C2W9", "G1C2Q7", 0.2], ["G1C2W9", "G1C2Q4", 0.9], 
["Q6NMS1", "G1C2W9", 0.8],["G1C2W9", "Q6NMS1",0.4], ["G1C2Q7", "G1C2Q4",0.76]]

where each list ["gene1", "gene2", corr] represents a correlation between *gene1* and *gene2* with correlation *corr*, create another list containing only the elements having a high correlation (i.e. > 0.75). Print this list.

Expected result:
```
[['G1C2W9', 'G1C2Q4', 0.9], ['Q6NMS1', 'G1C2W9', 0.8], ['G1C2Q7', 'G1C2Q4', 0.76]]
```

<div class="tggle" onclick="toggleVisibility('ex2');">Show/Hide Solution</div>
<div id="ex2" style="display:none;">

In [79]:
geneCorr = [["G1C2W9", "G1C2Q7", 0.2], ["G1C2W9", "G1C2Q4", 0.9], ["Q6NMS1", "G1C2W9", 0.8],
            ["G1C2W9", "Q6NMS1",0.4], ["G1C2Q7", "G1C2Q4",0.76]]

highlyCorr = [x for x in geneCorr if x[2] > 0.75]

print(geneCorr, "\n")
print(highlyCorr)

[['G1C2W9', 'G1C2Q7', 0.2], ['G1C2W9', 'G1C2Q4', 0.9], ['Q6NMS1', 'G1C2W9', 0.8], ['G1C2W9', 'Q6NMS1', 0.4], ['G1C2Q7', 'G1C2Q4', 0.76]] 

[['G1C2W9', 'G1C2Q4', 0.9], ['Q6NMS1', 'G1C2W9', 0.8], ['G1C2Q7', 'G1C2Q4', 0.76]]


</div>
3. Given the following sequence of DNA:

DNA = "GATTACATATATCAGTACAGATATATACGCGCGGGCTTACTATTAAAAACCCC"
    
    1. Create a dictionary reporting the frequency of each base (i.e. key is the base and value is the frequency).
    2. Create a dictionary representing an index of all possible dimers (i.e. 2 bases, 16 dimers in total): AA, AT, AC, AG, TA, TT, TC, TG, ... . In this case, keys of the dictionary are dimers and values are lists with all possible starting positions of the dimer.  
    3. Print the DNA string.
    4. Print the frequency of each base.
    5. Print all the positions of the dimer "AT".

The expected result is:
```
sequence: GATTACATATATCAGTACAGATATATACGCGCGGGCTTACTATTAAAAACCCC
G has frequency: 0.1509433962264151
C has frequency: 0.22641509433962265
A has frequency: 0.3584905660377358
T has frequency: 0.2641509433962264
{'GG': [32, 33], 'TC': [11], 'GT': [14], 'CA': [5, 12, 17], 'TT': [2, 36, 42], 'CG': [27, 29, 31], 'TA': [3, 7, 9, 15, 21, 23, 25, 37, 40, 43], 'AG': [13, 18], 'GA': [0, 19], 'CT': [35, 39], 'GC': [28, 30, 34], 'AT': [1, 6, 8, 10, 20, 22, 24, 41], 'CC': [49, 50, 51], 'AA': [44, 45, 46, 47], 'AC': [4, 16, 26, 38, 48]} 

Dimer AT is found at: [1, 6, 8, 10, 20, 22, 24, 41]
```

<div class="tggle" onclick="toggleVisibility('ex3');">Show/Hide Solution</div>
<div id="ex3" style="display:none;">

In [90]:
DNA = "GATTACATATATCAGTACAGATATATACGCGCGGGCTTACTATTAAAAACCCC"

n = len(DNA)

baseFreq = {"A" : DNA.count("A")/n, "T" : DNA.count("T")/n, 
            "C": DNA.count("C")/n, "G" : DNA.count("G")/n }

dimersDict ={}

print("sequence:", DNA)

for base in baseFreq:
    print(base, "has frequency:", baseFreq[base])
    
for ind in range(len(DNA) -1 ): #need -1 because at each iteration I get the dimer DNA[ind:ind+2]
    dimer = DNA[ind:ind+2]
    if(dimer in dimersDict):
        dimersDict[dimer].append(ind)
    else:
        dimersDict[dimer] = [ind]
print(dimersDict, "\n")
print("Dimer AT is found at:", dimersDict["AT"])

sequence: GATTACATATATCAGTACAGATATATACGCGCGGGCTTACTATTAAAAACCCC
G has frequency: 0.1509433962264151
C has frequency: 0.22641509433962265
A has frequency: 0.3584905660377358
T has frequency: 0.2641509433962264
{'GG': [32, 33], 'TC': [11], 'GT': [14], 'CA': [5, 12, 17], 'TT': [2, 36, 42], 'CG': [27, 29, 31], 'TA': [3, 7, 9, 15, 21, 23, 25, 37, 40, 43], 'AG': [13, 18], 'GA': [0, 19], 'CT': [35, 39], 'GC': [28, 30, 34], 'AT': [1, 6, 8, 10, 20, 22, 24, 41], 'CC': [49, 50, 51], 'AA': [44, 45, 46, 47], 'AC': [4, 16, 26, 38, 48]} 

Dimer AT is found at: [1, 6, 8, 10, 20, 22, 24, 41]


</div>

4. Given the following table, reporting molecular weights for each amino acid, store them in a dictionary where the key is the one letter code and the value is the molecular weight (e.g. {"A" : 89, "R" : 174}).

![](img/pract5/molecular_weights.png)

Write a python script to answer the following questions:
    1. What is the average molecular weight of an amino acid?
    2. What is the total molecular weight and number of aminoacids of the P53 peptide GSRAHSSHLKSKKGQSTSRHK?
    3. What is the total molecular weight and number of aminoacids of the peptide YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF?

In [62]:
mws = {"A" : 89, "R" : 174, "N" : 132, "D" : 133, "B": 133, "C": 121, "Q": 146, 
       "E": 147, "Z": 147, "G": 75, "H": 155, "I": 131, "L" : 131,
      "K" : 146, "M" : 149, "F" : 165, "P" : 115, "S" : 105, "T" : 119, 
       "W" : 204, "Y": 181, "V" : 117
      }
avgW = 0
for amino in mws:
    avgW = avgW + mws[amino]

avgW = avgW/len(mws)
print("The average molecular weight of amino acids is:", avgW)

P53amino = "GSRAHSSHLKSKKGQSTSRHK"

totW = 0
for amino in P53amino:
    totW += mws[amino]
print("Peptide ", P53amino, "has",len(P53amino), "amino acids and a total mw of",totW, "Da")

totW = 0 #reusing same variable name
peptide = "YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF"
for amino in peptide:
    totW += mws[amino]
    
print("Peptide ", peptide, "has",len(peptide), "amino acids and a total mw of",totW, "Da")

The average molecular weight of amino acids is: 137.04545454545453
Peptide  GSRAHSSHLKSKKGQSTSRHK has 21 amino acids and a total mw of 2662 Da
Peptide  YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF has 36 amino acids and a total mw of 5076 Da
