<small><small><i>
Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.
</i></small></small>

## Dictionaries

Dictionaries are mappings between keys and the values stored against them. Lists and tuples are accessed by integer position. Dictionary elements are accessed by key instead. You can also think of a dictionary as a set in which something is stored against every element of the set.

To define an empty dictionary, assign `{}` or `dict()` to a variable:

In [6]:
d = dict()# or equivalently d={}
print(type(d))
d['abc'] = 3
d[4] = "A string"
print(d)

<class 'dict'>
{'abc': 3, 4: 'A string'}


As the output above suggests, dictionaries can also be defined directly with the `{ key : value }` syntax. The following dictionary has three elements:

In [4]:
d = { 1: 'One', 2 : 'Two', 100 : 'Hundred'}
len(d)
d

{1: 'One', 2: 'Two', 100: 'Hundred'}

You can now access 'One' through its key, `1`: `d[1]` returns `'One'`. You can also replace the value stored under a key by assigning to that key:

In [5]:
d = { 1: 'One', 2 : 'Two', 100 : 'Hundred'}
d[1]=['three','four']   # replace the value stored under key 1
d

{1: ['three', 'four'], 2: 'Two', 100: 'Hundred'}

In [8]:
d[2][1].upper()

'W'
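Looking up a key that is not in the dictionary raises a `KeyError`. The following is a minimal sketch of the usual ways to guard against that. It uses the same three-element dictionary as above. The missing key `3` is just an illustrative choice.

```python
d = {1: 'One', 2: 'Two', 100: 'Hundred'}

# Direct indexing works for keys that exist
print(d[2])                  # Two

# A missing key raises KeyError
try:
    d[3]
except KeyError as err:
    print('missing key:', err)   # missing key: 3

# get() returns None (or a chosen default) instead of raising
print(d.get(3))              # None
print(d.get(3, 'Unknown'))   # Unknown
```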

There are a number of alternative ways of specifying a dictionary, including as a list of `(key,value)` tuples. Two related lists can be merged to form a dictionary. To illustrate this, we start with two lists and use the **zip()** function to form a list of tuples from them:

In [15]:
names = ['Two', 'One', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]
[ (name,number) for name,number in zip(names,numbers)] # create (name,number) pairs

[('Two', 1), ('One', 2), ('Three', 3), ('Four', 4), ('Five', 5)]

In [14]:
d1 = {}
for name,number in zip(names,numbers):
    d1[name] = number
d1

{'Two': 1, 'One': 2, 'Three': 3, 'Four': 4, 'Five': 5}

In [1]:
# alternative approaches to the above program
# option 01
names = ['Two', 'One', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]
my_dict = {}
for i in range(len(names)):
    print(i)
    my_dict[names[i]] = numbers[i]
#option 02
dict_a = {}
for i,name in enumerate(names):
    dict_a[name] = numbers[i]
    print(dict_a)


0
1
2
3
4
{'Two': 1}
{'Two': 1, 'One': 2}
{'Two': 1, 'One': 2, 'Three': 3}
{'Two': 1, 'One': 2, 'Three': 3, 'Four': 4}
{'Two': 1, 'One': 2, 'Three': 3, 'Four': 4, 'Five': 5}


In [2]:
# goal: the odd numbers between 6 and 36
# first, type out every integer from 6 to 36 by hand
lista=[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36]
lista

[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]

In [71]:
#more of the automation: range(a, b) yields a, a+1, ..., b-1, so b=37 makes 36 the last number
a=6
b=37
for n in range(a,b):
    print(n)

6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36


In [22]:
lista[1:36:2]

[7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35]
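The slice above starts at index 1 (the value 7) and takes every second element, which gives the odd numbers. As a sketch, `range()` with a step of 2 produces the same numbers directly, without first building the full list:

```python
lista = list(range(6, 37))   # the integers 6..36, as in the cell above

# Slicing from index 1 with step 2 picks every second element: the odd numbers
odds_by_slice = lista[1::2]

# range() can produce the same odd numbers directly: start 7, stop 37 (exclusive), step 2
odds_by_range = list(range(7, 37, 2))

print(odds_by_slice == odds_by_range)  # True
```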

Now we can create a dictionary that maps the name to the number as follows.

In [54]:
a1 = dict((name,number) for name,number in zip(names,numbers))
print(a1)

{'Two': 1, 'One': 2, 'Three': 3, 'Four': 4, 'Five': 5}
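`dict()` accepts any iterable of `(key, value)` pairs, and `zip()` already yields such pairs. The generator expression above is therefore optional. A shorter equivalent:

```python
names = ['Two', 'One', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]

# zip() pairs the lists element by element; dict() turns the pairs into entries
a1_short = dict(zip(names, numbers))
print(a1_short)  # {'Two': 1, 'One': 2, 'Three': 3, 'Four': 4, 'Five': 5}
```

Note that `zip()` stops at the end of the shorter list. If the two lists have different lengths, the extra items are silently dropped.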


In older versions of Python, a dictionary's ordering did not follow the order in which elements were added. It used its own internal (hash-based) ordering, so it was unsafe to assume any ordering when iterating over a dictionary. This has changed in recent versions:

"Dictionaries are **insertion ordered[1]**. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature."
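A small sketch of what "insertion ordered" means in practice, assuming Python 3.7 or later. The amino-acid keys are only illustrative.

```python
# Requires Python 3.7+, where insertion order is a language guarantee
codes = {}
codes['G'] = 'Glycine'
codes['A'] = 'Alanine'
codes['W'] = 'Tryptophan'
print(list(codes))          # ['G', 'A', 'W']

# Re-assigning an existing key keeps its original position
codes['G'] = 'Gly'
print(list(codes))          # ['G', 'A', 'W']

# Deleting a key and adding it again moves it to the end
del codes['G']
codes['G'] = 'Glycine'
print(list(codes))          # ['A', 'W', 'G']
```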

By using tuples as keys, we can make a dictionary behave like a sparse matrix:

In [6]:
matrix={ (0,1): 3.5, (2,17): 0.1}
matrix[2,2] = matrix[0,1] + matrix[2,17]
print(matrix)

{(0, 1): 3.5, (2, 17): 0.1, (2, 2): 3.6}
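In a sparse matrix, positions that were never set are usually treated as zero. The sketch below does this with `dict.get()` and a default value. The helper name `entry` is hypothetical, introduced only for this example.

```python
matrix = {(0, 1): 3.5, (2, 17): 0.1}
matrix[2, 2] = matrix[0, 1] + matrix[2, 17]

def entry(m, row, col):
    """Hypothetical helper: return m[row, col], or 0.0 if that position was never set."""
    return m.get((row, col), 0.0)

print(entry(matrix, 2, 2))   # 3.6
print(entry(matrix, 5, 5))   # 0.0

# Only the explicitly stored entries take up space
print(len(matrix))           # 3
```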


A dictionary can also be built with a loop-style definition, known as a dictionary comprehension:

In [9]:
a2 = { name : len(name) for name in names}
print(a2)

{'Two': 3, 'One': 3, 'Three': 5, 'Four': 4, 'Five': 4}
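A dictionary comprehension can also take an `if` clause to filter which items become entries. The length threshold of 3 below is an arbitrary choice for illustration.

```python
names = ['Two', 'One', 'Three', 'Four', 'Five']

# Only names longer than 3 characters are kept
long_names = {name: len(name) for name in names if len(name) > 3}
print(long_names)  # {'Three': 5, 'Four': 4, 'Five': 4}
```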


### Built-in Functions

The **len()** function returns the number of entries, and the **in** operator tests whether a key is present:

In [22]:
print("a1 has",len(a1),"elements")
print("One is in a1 is",'One' in a1,", but Zero in a1 is", 'Zero' in a1)

a1 has 5 elements
One is in a1 is True , but Zero in a1 is False
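Note that `in` on a dictionary only looks at the keys. To search the values, test against `values()` explicitly:

```python
a1 = {'Two': 1, 'One': 2, 'Three': 3, 'Four': 4, 'Five': 5}

print('Three' in a1)        # True  -> 'Three' is a key
print(3 in a1)              # False -> 3 is a value, not a key
print(3 in a1.values())     # True  -> search the values explicitly
```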


The **clear()** method removes all elements from a dictionary:

In [23]:
a2.clear()
print(a2)

{}


The **values()** method returns all the values stored in the dictionary. Strictly speaking, the result is not a list but a view object. You can iterate over it just like a list, or use it to construct a list, tuple or any other collection:

In [48]:
a2=a1.values()

In [56]:
list(a2)

[1, 2, 3, 4, 5]

In [50]:
[ v for v in a1.values() ]

[1, 2, 3, 4, 5]
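Because `values()` returns a view rather than a copy, it reflects later changes to the dictionary. A list built from it is a snapshot that does not. A minimal sketch:

```python
a1 = {'Two': 1, 'One': 2, 'Three': 3}
vals = a1.values()          # a view, not a copy
snapshot = list(vals)       # a list copy taken now

a1['Six'] = 6
print(list(vals))           # [1, 2, 3, 6] -> the view sees the new entry
print(snapshot)             # [1, 2, 3]    -> the list copy does not
```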

The **keys()** method returns all the keys of the dictionary, again as a view object. Collecting them into a set discards their order:

In [29]:
{ k for k in a1.keys() }

{'Five', 'Four', 'One', 'Three', 'Two'}

The **items()** method returns all the entries of the dictionary, each as a `(key, value)` tuple. These are the same pairs that the zip() function produced earlier. Since Python 3.7 they come back in insertion order:

In [53]:
a1.items()

dict_items([('Two', 1), ('One', 2), ('Three', 3), ('Four', 4), ('Five', 5)])

In [46]:
", ".join( "%s = %d" % (name,val) for name,val in a1.items())

'Two = 1, One = 2, Three = 3, Four = 4, Five = 5'

In [85]:
# the same result written out as an explicit loop:
# collect each "name = value" string in a list, then join them
# (alternatively, print each pair directly with print(name, val, sep=' = '))
s=[]
for name, val in a1.items():
    s.append('%s = %d' % (name,val))
', '.join(s)

'Two = 1, One = 2, Three = 3, Four = 4, Five = 5'
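As a further alternative to `%`-formatting, f-strings (Python 3.6+) give the same result in a form many find easier to read:

```python
a1 = {'Two': 1, 'One': 2, 'Three': 3, 'Four': 4, 'Five': 5}

# each f-string substitutes the current name and value
summary = ", ".join(f"{name} = {val}" for name, val in a1.items())
print(summary)  # Two = 1, One = 2, Three = 3, Four = 4, Five = 5
```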

The **pop()** method removes the entry with the given key and returns its value, which can be assigned to a new variable. Only the value is returned, not the key, since you already supplied the key when calling pop().

In [43]:
val = a1.pop('Four')
print(a1)
print("Removed",val)

{'Two': 1, 'One': 2, 'Three': 3, 'Five': 5}
Removed 4
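As the docstring of `pop` states, passing a default avoids the `KeyError` for a missing key. The related `popitem()` method removes the most recently inserted entry (guaranteed LIFO since Python 3.7). A short sketch, starting from the dictionary left after removing 'Four':

```python
a1 = {'Two': 1, 'One': 2, 'Three': 3, 'Five': 5}

# With a default, popping a missing key returns the default and changes nothing
removed = a1.pop('Four', None)
print(removed)     # None
print(len(a1))     # 4

# popitem() removes and returns the last inserted (key, value) pair
last = a1.popitem()
print(last)        # ('Five', 5)
```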


In [46]:
a1['Two'] = ''   # assigning to an existing key replaces its value; the key keeps its position

In [53]:
?a1.pop

Docstring:
D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised
Type:      builtin_function_or_method


In [52]:
for key in a1.keys():
    print(key, a1[key])

Two
One 2
Three 3
Five 5


In [181]:
# To convert the following sequence into amino acids:
# AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA
# step one, find the reverse complement of the sequence above
# step two, replace T with U to get the mRNA sequence
# step three, translate the resulting sequence into amino acids (protein)

# step 01: reverse, then complement. Lower-case placeholders stop a base
# from being converted twice (e.g. A -> T -> A)
dna1='AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
reversed_dna1 = dna1[::-1]
rev_comp_dna1 = reversed_dna1.replace('A','t').replace('G','c').replace('T','a').replace('C','g').upper()

#step 02: transcribe by replacing T with U
mRNA = rev_comp_dna1.replace('T','U')
print(mRNA)

#step 03
 # first we obtain the codon table.

genetic_code = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L", 
        "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G", }
# then read the mRNA three bases (one codon) at a time and look each codon up
aa_seq=[]
for i in range(0, len(mRNA), 3):
    codon = mRNA[i:i+3]
    aa_seq.append(genetic_code[codon])
print(''.join(aa_seq))

CUUUUAUAUUACGCCUCCGGAGCCCUAUAUAGCCGCCUCGGGAUUUUU
LLYYASGALYSRLGIF
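The loop above assumes the mRNA contains no stop codon and that its length is a multiple of three, which happens to hold for this sequence. The sketch below relaxes both assumptions. The helper name `translate` and the tiny `mini_code` table are hypothetical, illustrative only. In practice you would pass the full `genetic_code` dictionary from the cell above.

```python
def translate(mrna, code):
    """Hypothetical helper: translate mRNA codon by codon using the codon table `code`."""
    protein = []
    # Step through complete codons only; a trailing 1-2 bases are ignored
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        aa = code[mrna[i:i + 3]]
        if aa == 'STOP':
            break  # translation ends at the first stop codon
        protein.append(aa)
    return ''.join(protein)

# A tiny subset of the codon table, just for this demonstration
mini_code = {'AUG': 'M', 'UUU': 'F', 'GGC': 'G', 'UAA': 'STOP'}

print(translate('AUGUUUGGCUAAGGC', mini_code))  # MFG  (stops at UAA)
print(translate('AUGUU', mini_code))            # M    (incomplete codon ignored)
```

Called as `translate(mRNA, genetic_code)` with the notebook's variables, it gives the same `LLYYASGALYSRLGIF` as the loop above.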

In [124]:
print(mRNA)
listmR=[mRNA]

CUUUUAUAUUACGCCUCCGGAGCCCUAUAUAGCCGCCUCGGGAUUUUU


In [154]:
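# Pitfall: listmR = [mRNA] is a list holding ONE element (the whole string),
# so slicing the list does not split the sequence into codons
print(listmR[0:3])
print(listmR[3:6])

['CUUUUAUAUUACGCCUCCGGAGCCCUAUAUAGCCGCCUCGGGAUUUUU']
[]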


In [164]:
for i in range (0, len(mRNA), 3):
    codon=mRNA[i:i+3]   # slice the string itself, three bases at a time
    print(codon)

CUU
UUA
UAU
UAC
GCC
UCC
GGA
GCC
CUA
UAU
AGC
CGC
CUC
GGG
AUU
UUU
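The same codon split can be collected into a list with a list comprehension. This is handy when the codons are needed later rather than just printed:

```python
mRNA = 'CUUUUAUAUUACGCCUCCGGAGCCCUAUAUAGCCGCCUCGGGAUUUUU'

# one slice of three bases for every start position 0, 3, 6, ...
codons = [mRNA[i:i + 3] for i in range(0, len(mRNA), 3)]
print(len(codons))   # 16
print(codons[:3])    # ['CUU', 'UUA', 'UAU']
```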


## Exercise

- Using strings, lists, tuples and dictionaries concepts, find the reverse complement of AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA


Algorithm:
1. Store the DNA in a string
    - Use it as a string
    - Convert it to a list
2. Reverse the dna string
    - reverse method on lists
    - Slice methods on lists
    - use negative indexing (slicing) on a string or list
3. Complement:
    - For a string, we can use replace
    - Use conditionals to fill an empty list
    - use a DNA complement dictionary
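One more approach, not among the options listed above, uses `str.maketrans()` with `str.translate()` from the standard library. It replaces every base in a single pass, so no lower-case placeholder trick is needed to avoid replacing a base twice. A sketch for comparison:

```python
dna = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'

# map A->T, C->G, G->C, T->A in one translation table
complement_table = str.maketrans('ACGT', 'TGCA')

# complement every base, then reverse with slicing
rev_comp = dna.translate(complement_table)[::-1]
print(rev_comp)  # CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT
```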
    

### Using Strings

In [62]:
dna = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'

In [79]:
#reversed_dna = ''.join([dna[-i] for i in range(1,len(dna)+1)])

In [6]:
reversed_dna = dna[::-1]


In [7]:
# lower-case placeholders stop a base from being converted twice (e.g. A -> T -> A)
reversed_dna = reversed_dna.replace('A','t').replace('C','g').replace('G','c').replace('T','a')

In [8]:
reversed_dna.upper()

'CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT'

### Using Conditionals

In [91]:
comp_dna = []
for nuc in dna:
    if nuc == 'A':
        comp_dna.append('T')
    elif nuc == 'T':
        comp_dna.append('A')
    elif nuc == 'C':
        comp_dna.append('G')
    elif nuc == 'G':
        comp_dna.append('C')
    else:
        comp_dna.append(nuc)

In [94]:
comp_dna.reverse()   # reverse in place: complement + reverse = reverse complement

In [96]:
''.join(comp_dna)

'CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT'

In [101]:
revCompDNA = ''
for nuc in comp_dna:
    revCompDNA = revCompDNA + nuc

In [102]:
revCompDNA

'CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT'

### Using the dictionary

In [107]:
dna

'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'

In [92]:
dna = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
rev_dna_dict = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
comp_dna = ''
for nuc in dna:
    comp_dna = rev_dna_dict[nuc] + comp_dna
print(comp_dna)

CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT
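Prepending to a string creates a new string for every base. That is fine for short sequences but grows slowly for long ones. A sketch of the same dictionary approach that walks the sequence backwards with `reversed()` and builds the result once with `join()`:

```python
dna = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
rev_dna_dict = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

# reversed() visits the bases from the end; each is looked up in the dictionary
comp_dna = ''.join(rev_dna_dict[nuc] for nuc in reversed(dna))
print(comp_dna)  # CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT
```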


In [7]:
### Exercise three, book 04: find the reverse complement of the given sequence, based on the following algorithm.
# Algorithm:

# 01 Store the DNA in a string
    # Use it as a string
    # Convert it to a list
# 02 Reverse the DNA string
    # reverse method on lists
    # Slice methods on lists
    # use negative indexing (slicing) on a string or list
# 03 Complement:
    # For a string, we can use replace
    # Use conditionals to fill an empty list
    # use a DNA complement dictionary
seq1='AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
#part01: list(seq1) splits the string into single bases ([seq1] would not)
seq1_list=list(seq1)
print(seq1_list[:5])
#part02
reversed_seq1=seq1[::-1]
#part03 using string methods (lower-case placeholders avoid double replacement)
seq2=reversed_seq1.replace('A','t').replace('T','a').replace('G','c').replace('C','g')
Cseq2=seq2.upper()
#part03 using conditionals to fill the empty list
comp_seq1=[]
for X in seq1:
    if X =='A':
        comp_seq1.append('T')
    elif X =='T':
        comp_seq1.append('A')
    elif X =='G':
        comp_seq1.append('C')
    elif X =='C':
        comp_seq1.append('G')
    else:
        comp_seq1.append(X)
comp_seq1.reverse()   # reverse once, after the loop has finished
#part03 using a dictionary
comp_dict = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
dict_seq1 = ''
for X in seq1:
    dict_seq1 = comp_dict[X] + dict_seq1   # prepending also reverses the order

['A', 'A', 'A', 'A', 'A']


In [8]:
''.join(comp_seq1)

'CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT'

In [103]:
print(Cseq2)

CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT


In [100]:
###function01
seq1='AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
#seq1_1='GTGCCAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAGGA', for function checking.

def reverse_comp1(dseq):
    """Reverse complement using slicing and chained str.replace()."""
    reversed_seq1=dseq[::-1]
    seq2=reversed_seq1.replace('A','t').replace('T','a').replace('G','c').replace('C','g')
    return seq2.upper()   # return the result (not print()), so callers can reuse it


In [101]:
print(reverse_comp1(seq1))

CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT


In [95]:
###function02
seq1='AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
# seq1_1='GTGCCAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAGGA'
# Bw='ATATGGCT'
def reverse_comp2(dseq2):
    comp_seq1=[]   # created inside the function, so repeated calls start empty
    for X in dseq2:
        if X == 'G':
            comp_seq1.append('C')
        elif X == 'T':
            comp_seq1.append('A')
        elif X == 'A':
            comp_seq1.append('T')
        elif X == 'C':
            comp_seq1.append('G')
        else:
            comp_seq1.append(X)
    comp_seq1.reverse()
    return ''.join(comp_seq1)



In [96]:
print(reverse_comp2(seq1))

CTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTTTT


In [98]:
###function03
seq1='AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAG'
seq1_1='GTGCCAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAAGGA'
Bw='ATATGGCT'
def reverse_comp3(dseq3):
    rev_dna_dict = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
    comp_seq2 = ''
    for base in dseq3:
        comp_seq2 = rev_dna_dict[base] + comp_seq2   # prepend: builds the reversed complement
    return comp_seq2


In [103]:
print(reverse_comp3(seq1_1))

TCCTTTTATATTACGCCTCCGGAGCCCTATATAGCCGCCTCGGGATTGGCAC
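The dictionary-based functions above raise a `KeyError` on lower-case input or on ambiguous bases such as `N`. The sketch below tolerates both, under the assumption that unknown symbols should simply be carried through unchanged. The name `reverse_comp_safe` is hypothetical.

```python
def reverse_comp_safe(seq):
    """Hypothetical variant: reverse complement that accepts lower case and unknown symbols."""
    comp = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    # upper() normalises case; get(base, base) leaves unknown symbols (e.g. N) unchanged
    return ''.join(comp.get(base, base) for base in reversed(seq.upper()))

print(reverse_comp_safe('atatggct'))   # AGCCATAT
print(reverse_comp_safe('ACGTN'))      # NACGT
```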
