## dict and set datatypes and the `in` operator

Dictionaries are an important collection type in Python. Each item in a dictionary is a **key-value pair**: the key is coupled to its value and is used to look that value up. Each key must be unique, so it can occur only once in a dictionary. Dictionaries are created with curly braces `{}`. A key and its value are separated by a colon `:`, as in `key:value`, and commas `,` are placed between the key-value pairs.

An example (for bioinformaticians):

In [1]:
bases = {'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}  # compact notation

# also allowed:

bases = {'a': 'adenine',
         't': 'thymine',
         'c': 'cytosine',
         'g': 'guanine'}

print(bases)

{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}


To get the value that belongs to a key, we use indexing. However, instead of a number, the index is the key whose value we want to retrieve:

In [2]:
print(bases['a'])

adenine


If you use a key that does not exist, you will get an error:

In [3]:
print(bases['q']) # will result in a KeyError

KeyError: 'q'

You can avoid this error with the `dict.get` method, which returns `None` when the key does not exist:

In [4]:
print(bases.get('a'))
print(bases.get('q'))

adenine
None
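
`dict.get` also accepts an optional second argument that is returned instead of `None` when the key is missing. A minimal sketch; the fallback text `'unknown base'` is only an illustrative choice:

```python
bases = {'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}

# The second argument is the fallback returned for a missing key.
known = bases.get('a', 'unknown base')
missing = bases.get('q', 'unknown base')

print(known)    # adenine
print(missing)  # unknown base

# get() never modifies the dictionary: it still holds four items.
print(len(bases))  # 4
```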


You can remove an item using the `del` keyword:

In [5]:
del bases['a']
print(bases)

{'t': 'thymine', 'c': 'cytosine', 'g': 'guanine'}
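
As an alternative to `del`, the `pop()` method removes a key and hands back its value in one step. The sketch below rebuilds `bases` so that it runs on its own:

```python
bases = {'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}

# pop() removes the key and returns the value that belonged to it.
removed = bases.pop('a')
print(removed)  # adenine
print(bases)    # {'t': 'thymine', 'c': 'cytosine', 'g': 'guanine'}

# With a default as second argument, pop() does not raise a KeyError
# when the key is missing.
nothing = bases.pop('q', None)
print(nothing)  # None
```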


And you can add an item as follows:

In [6]:
bases['a'] = 'adenine'
print(bases)

{'t': 'thymine', 'c': 'cytosine', 'g': 'guanine', 'a': 'adenine'}


If you want to add multiple items you can use the `update()` method of the dict datatype:

In [7]:
bases = {'a': 'adenine', 't': 'thymine'}
print(bases)

bases.update({'c': 'cytosine', 'g': 'guanine'})
print(bases)

{'a': 'adenine', 't': 'thymine'}
{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}
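
Note that `update()` does more than add new keys: if a key already exists, its value is replaced. A small sketch, using a separate dictionary named `example` (a name chosen here for illustration) so that `bases` is left untouched:

```python
example = {'a': 'adenine', 't': 'thymine'}

# 't' already exists, so its value is replaced; 'u' is new, so it is added.
example.update({'t': 'THYMINE', 'u': 'uracil'})
print(example)  # {'a': 'adenine', 't': 'THYMINE', 'u': 'uracil'}
```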


You might have wondered whether you can work the other way around and retrieve a key from its value. A dictionary offers no direct lookup by value. Instead, you can build a new dictionary in which the keys and values are swapped:

In [9]:
print(bases)
bases_switched = {}  # create an empty dict
for key, value in bases.items():
    bases_switched[value] = key  # the old value becomes the new key
print(bases_switched)

{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}
{'adenine': 'a', 'thymine': 't', 'cytosine': 'c', 'guanine': 'g'}


Duplicate keys cannot exist, so if several keys share the same value, swapping keys and values keeps only one of them and you lose the others.
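
To make this concrete, here is a sketch with a small made-up dictionary (`codon_to_amino` is not part of the examples above) in which two codons map to the same amino acid. It uses a dict comprehension as a compact form of the swapping loop, and then shows one possible way to keep every key by collecting them in a list:

```python
# Illustrative data: 'ttt' and 'ttc' both code for phenylalanine.
codon_to_amino = {'ttt': 'phenylalanine', 'ttc': 'phenylalanine', 'atg': 'methionine'}

# A plain swap keeps only the last codon seen for each amino acid.
swapped = {amino: codon for codon, amino in codon_to_amino.items()}
print(swapped)  # {'phenylalanine': 'ttc', 'methionine': 'atg'}

# Collecting the old keys in a list per value preserves all of them.
grouped = {}
for codon, amino in codon_to_amino.items():
    grouped.setdefault(amino, []).append(codon)
print(grouped)  # {'phenylalanine': ['ttt', 'ttc'], 'methionine': ['atg']}
```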

> Python dictionaries preserve insertion order. This has been a guaranteed language feature since Python 3.7, so older material that describes dictionaries as unordered is out of date.
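
Ordered here means insertion order, not sorted order. A short sketch of what that implies, using a throwaway dictionary named `ordered` (a name chosen for this illustration):

```python
ordered = {}
ordered['g'] = 'guanine'
ordered['a'] = 'adenine'
ordered['t'] = 'thymine'

# Keys come back in the order they were inserted, not alphabetically.
print(list(ordered))  # ['g', 'a', 't']

# Re-assigning an existing key keeps its original position.
ordered['g'] = 'GUANINE'
print(list(ordered))  # ['g', 'a', 't']

# Deleting a key and adding it again moves it to the end.
del ordered['g']
ordered['g'] = 'guanine'
print(list(ordered))  # ['a', 't', 'g']
```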

## Sets
A set is like a dictionary without values: it is a collection of unique elements. Sets are unordered, so the order in which the elements are printed can vary. An example:

In [10]:
bases_dna = {"a", "t", "c", "g"}
bases_rna = {"a", "u", "c", "g"}

You can use `set` to find out which bases occur in a DNA sequence, for example to check whether all four are present:

In [11]:
dna = "atttgggaatg"
print(set(dna))

{'t', 'g', 'a'}
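
The printed set has no `'c'`, so not all bases are present in this sequence. The check itself can also be left to Python. A minimal sketch using the `issubset()` and `difference()` set methods; the variable names `found`, `all_present`, `missing` and `only_valid` are chosen for illustration:

```python
bases_dna = {"a", "t", "c", "g"}
dna = "atttgggaatg"

found = set(dna)

# True only if every DNA base occurs at least once in the sequence.
all_present = bases_dna.issubset(found)
print(all_present)  # False

# The difference shows which bases are missing.
missing = bases_dna.difference(found)
print(missing)  # {'c'}

# True only if the sequence contains no characters other than a, t, c and g.
only_valid = found.issubset(bases_dna)
print(only_valid)  # True
```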


You can use set methods to get the union, intersection and differences. Have a look at the Venn diagram that shows some possible logical relations between different sets (source: https://en.wikipedia.org/wiki/Venn_diagram):
![Venn](figs/venn.png)

In [12]:
print(bases_dna.intersection(bases_rna)) # returns the set of elements that are present in both sets
print(bases_dna.union(bases_rna)) # returns the set of elements present in either set
# dir(bases_dna)  # uncomment to list all attributes and methods of a set
print(bases_dna.difference(bases_rna)) # returns the set of elements present in bases_dna, but not bases_rna
print(bases_rna.difference(bases_dna)) # returns the set of elements present in bases_rna, but not bases_dna

print(bases_dna.symmetric_difference(bases_rna)) # returns the set of elements present in either set, but not in both

{'g', 'a', 'c'}
{'u', 'g', 'a', 'c', 't'}
{'t'}
{'u'}
{'u', 't'}
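
The same operations are also available as operators, which many people find quicker to type. One difference: the methods accept any iterable as argument, whereas the operators require both sides to be sets. A sketch that sorts each result only to get a reproducible print order:

```python
bases_dna = {"a", "t", "c", "g"}
bases_rna = {"a", "u", "c", "g"}

shared = bases_dna & bases_rna      # same as intersection()
combined = bases_dna | bases_rna    # same as union()
dna_only = bases_dna - bases_rna    # same as difference()
not_shared = bases_dna ^ bases_rna  # same as symmetric_difference()

# sorted() returns a list in alphabetical order, because sets are unordered.
print(sorted(shared))      # ['a', 'c', 'g']
print(sorted(combined))    # ['a', 'c', 'g', 't', 'u']
print(sorted(dna_only))    # ['t']
print(sorted(not_shared))  # ['t', 'u']
```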


## The `in` operator
The `in` keyword in Python is used in two different contexts.

The first is the `for` loop (as we've already seen), where it is used to iterate over the values of a collection.

`codons = ['atg', 'ggg', 'ttt']
for codon in codons:
    print(codon)`

The other use of the `in` keyword is to test whether an element is present in a collection. The `in` operator returns `True` if the element is found and `False` otherwise. For a dictionary, the test looks at the keys; for a string, it also matches substrings.

Example:

In [13]:
x = 'ATGC'
print('A' in x)

True


In [14]:
d = {1:'a', 2:'b', 3:'c'}
print(2 in d)

True


In [15]:
l = [1,2,3,4,5]
print(6 in l)

False
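
A few more cases that often cause confusion, as a short sketch: membership in a dictionary only considers the keys, `not in` is the negated test, and on strings `in` looks for a contiguous substring.

```python
d = {1: 'a', 2: 'b', 3: 'c'}

print(2 in d)             # True: 2 is a key
print('b' in d)           # False: 'b' is a value, not a key
print('b' in d.values())  # True: search the values explicitly
print(4 not in d)         # True: `not in` negates the test

# On strings, `in` tests for a contiguous substring.
print('TG' in 'ATGC')     # True
print('GT' in 'ATGC')     # False
```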
