# PLM5 - Dictionaries, functions and imports

## Dictionaries

When we have a collection of (key, value) pairs (a structure also known as an associative array or mapping), we could store it in a list of lists:

In [1]:
[['A', 'purine'], ['C', 'pyrimidine']]

[['A', 'purine'], ['C', 'pyrimidine']]

In [2]:
bases = [['A', 'purine'], ['G', 'purine'], ['C', 'pyrimidine'], ['T', 'pyrimidine']]
print(bases[0])
print(bases[0][0])

['A', 'purine']
A


In [3]:
print(bases[0][0] + bases[1][0] + bases[2][0])

AGC


But there is a better option than a list of lists. We can convert a list of this form to a dictionary using *dict()*:

In [4]:
dict(bases)

{'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine'}

In the preceding output, notice that we have *pairs* of *keys* and *values*. A *key* and its *value* are separated by ':', whereas *pairs* are separated by ','.

In [5]:
base_type = dict(bases)

In [6]:
type(base_type)

dict

But typically we will construct a dictionary from scratch:

In [7]:
base_types = {'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine'}
print(base_types)

{'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine'}


We can now look for the type of any base using its *key*:

In [8]:
base_type['A']

'purine'

However, it is only possible to look up *keys*, not *values*:

In [9]:
base_type['purine']

KeyError: 'purine'
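When a key might be missing, the dictionary method *get()* returns a fallback instead of raising a KeyError. The following is a minimal sketch; the dictionary is rebuilt so the block runs on its own, and the fallback string 'unknown' is an arbitrary choice made for illustration.

```python
# Rebuild the dictionary so this block is self-contained
base_type = {'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine'}

# get() returns the value when the key exists
known = base_type.get('A')

# For a missing key, get() returns None instead of raising KeyError
missing = base_type.get('purine')

# A second argument sets the fallback ('unknown' is an illustrative choice)
with_default = base_type.get('N', 'unknown')

print(known, missing, with_default)
```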

We can add new mappings to a dictionary:

In [10]:
base_type['U'] = '???'
print(base_type)

{'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine', 'U': '???'}


The same syntax used to add new mappings is also used to modify the *values* of existing *pairs*:

In [11]:
base_type['U'] = 'pyrimidine'
print(base_type)

{'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine', 'U': 'pyrimidine'}


Want to get only the keys?

In [12]:
base_type.keys()

dict_keys(['A', 'G', 'C', 'T', 'U'])

Only the values?

In [13]:
base_type.values()

dict_values(['purine', 'purine', 'pyrimidine', 'pyrimidine', 'pyrimidine'])

Dictionaries support *membership operators*:

In [14]:
'A' in base_type

True

But they only work with *keys*, not with *values*:

In [15]:
'pyrimidine' in base_type

False

If we want to look for *values*, we can search in the collection of values returned by *values()*:

In [16]:
'pyrimidine' in base_type.values()

True

What if we want to join all the bases stored as keys into a single str?

In [17]:
''.join(base_type.keys())

'AGCTU'

We can iterate over the keys, over the values, or over both at once:

In [18]:
for base in base_type:
    print(base)

A
G
C
T
U


In [19]:
for basetype in base_type.values():
    print(basetype)

purine
purine
pyrimidine
pyrimidine
pyrimidine


In [20]:
for base, basetype in base_type.items():
    print(base, basetype)

A purine
G purine
C pyrimidine
T pyrimidine
U pyrimidine
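Iterating over *items()* also gives a way to search by value, which direct lookup does not allow. This is a small sketch under the assumption that we want every base of a given type; the variable names `kind` and `pyrimidines` are introduced here only for illustration.

```python
# Rebuild the dictionary so this block is self-contained
base_type = {'A': 'purine', 'G': 'purine', 'C': 'pyrimidine',
             'T': 'pyrimidine', 'U': 'pyrimidine'}

# Keep every key whose value matches the type we are looking for
pyrimidines = [base for base, kind in base_type.items() if kind == 'pyrimidine']

print(pyrimidines)
```

Because several keys can share the same value, a search by value naturally returns a list rather than a single key.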


To remove a *pair* from a dictionary we can use *del*:

In [21]:
del base_type['U']
base_type

{'A': 'purine', 'G': 'purine', 'C': 'pyrimidine', 'T': 'pyrimidine'}

## Functions

We already know *len(), sum(), max()* ... which are *built-in functions* that come with Python. The list of *built-in* functions can be obtained with the following expression (the output also includes other built-in objects, such as exception types):

In [22]:
dir(__builtins__)

['ArithmeticError',
 'AssertionError',
 'AttributeError',
 'BaseException',
 'BlockingIOError',
 'BrokenPipeError',
 'BufferError',
 'ChildProcessError',
 'ConnectionAbortedError',
 'ConnectionError',
 'ConnectionRefusedError',
 'ConnectionResetError',
 'EOFError',
 'Ellipsis',
 'EnvironmentError',
 'Exception',
 'False',
 'FileExistsError',
 'FileNotFoundError',
 'FloatingPointError',
 'GeneratorExit',
 'IOError',
 'ImportError',
 'IndentationError',
 'IndexError',
 'InterruptedError',
 'IsADirectoryError',
 'KeyError',
 'KeyboardInterrupt',
 'LookupError',
 'MemoryError',
 'ModuleNotFoundError',
 'NameError',
 'None',
 'NotADirectoryError',
 'NotImplemented',
 'NotImplementedError',
 'OSError',
 'OverflowError',
 'PermissionError',
 'ProcessLookupError',
 'RecursionError',
 'ReferenceError',
 'RuntimeError',
 'StopAsyncIteration',
 'StopIteration',
 'SyntaxError',
 'SystemError',
 'SystemExit',
 'TabError',
 'TimeoutError',
 'True',
 'TypeError',
 'UnboundLocalError',
 'UnicodeDecode

However, Python comes with other functions (without installing any package). The collection of functions and packages that comes with Python is called the **standard library**.

Let's use one of these functions (*Counter*, which strictly speaking is a class), which is part of the package **collections**. In order to load it into memory we need to **import** it. A similar import statement is also required to use functions that come in user-installed packages.

In [23]:
from collections import Counter

In [24]:
Counter

collections.Counter

*Counter* provides counts for iterables

In [25]:
help(Counter)

Help on class Counter in module collections:

class Counter(builtins.dict)
 |  Counter(*args, **kwds)
 |  
 |  Dict subclass for counting hashable items.  Sometimes called a bag
 |  or multiset.  Elements are stored as dictionary keys and their counts
 |  are stored as dictionary values.
 |  
 |  >>> c = Counter('abcdeabcdabcaba')  # count elements from a string
 |  
 |  >>> c.most_common(3)                # three most common elements
 |  [('a', 5), ('b', 4), ('c', 3)]
 |  >>> sorted(c)                       # list all unique elements
 |  ['a', 'b', 'c', 'd', 'e']
 |  >>> ''.join(sorted(c.elements()))   # list elements with repetitions
 |  'aaaaabbbbcccdde'
 |  >>> sum(c.values())                 # total of all counts
 |  15
 |  
 |  >>> c['a']                          # count of letter 'a'
 |  5
 |  >>> for elem in 'shazam':           # update counts from an iterable
 |  ...     c[elem] += 1                # by adding 1 to each element's count
 |  >>> c['a']                          #

such as **strs**

In [26]:
Counter('AGCTGATCAAAAGGCTAC')

Counter({'A': 7, 'G': 4, 'C': 4, 'T': 3})

or **lists**

In [27]:
Counter(['A', 'G', 'C', 'T', 'G', 'A', 'T', 'C', 'A', 'A', 'A', 'A', 'G'])

Counter({'A': 6, 'G': 3, 'C': 2, 'T': 2})
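Since a *Counter* is a kind of dictionary, the dictionary operations from the first section apply to it as well. The sketch below reuses the sequence from the example above; it only relies on key lookup and on the *most_common()* method shown in the help text.

```python
from collections import Counter

seq = 'AGCTGATCAAAAGGCTAC'
counts = Counter(seq)

# Look up a count by key, as with any dictionary
a_count = counts['A']

# Unlike a plain dict, a missing key gives 0 instead of raising KeyError
n_count = counts['N']

# most_common(1) returns a list with the most frequent item and its count
top = counts.most_common(1)

print(a_count, n_count, top)
```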

## Imports

In [28]:
from collections import Counter

The previous expression is only one way of loading functions/packages. This way is typically used when we are only interested in one function from a certain package. Let's remove the *Counter* function from our session.

In [29]:
del Counter

When we plan to use several functions from the same package we typically use another syntax:

In [30]:
import collections

But notice that with this type of *import*, Counter is not directly available:

In [31]:
Counter

NameError: name 'Counter' is not defined

The function has to be called as follows (name_of_the_package.function):

In [32]:
collections.Counter

collections.Counter

In [33]:
collections.Counter('AGCTGATCAAAAGGCTAC')

Counter({'A': 7, 'G': 4, 'C': 4, 'T': 3})

If you think *collections* is too long a name, you can define an *alias*:

In [34]:
import collections as col
col.Counter('AGCTGATCAAAAGGCTAC')

Counter({'A': 7, 'G': 4, 'C': 4, 'T': 3})

You will occasionally also see imports of the form:

In [35]:
from collections import *

This will import all functions (and other objects) of *collections* and make them available without the *collections.* prefix. This way is discouraged because as programs become large it is difficult to know where functions come from.
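One concrete reason to keep the prefix is that two packages can define functions with the same name. As an illustration (the modules **math** and **cmath** are not used elsewhere in this lesson and were chosen only because both ship with Python and both define *sqrt*):

```python
import math
import cmath

# Both modules define sqrt, but they behave differently.
# The prefix makes it explicit which one is being called.
real_root = math.sqrt(16)       # works with real numbers, returns a float
complex_root = cmath.sqrt(-16)  # works with complex numbers

print(real_root)
print(complex_root)
```

With `from math import *` followed by `from cmath import *`, the second *sqrt* would silently replace the first one.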

## Create our own functions

We can define a function using **def**. The function takes as *arguments* the variables in parentheses and gives as *output* the value in the *return statement*:

In [37]:
def dna2rna(dna):
    rna = dna.replace('T', 'U')
    return rna

In [38]:
dna2rna('AGCTGATCA')

'AGCUGAUCA'
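A function can take more than one argument, separated by commas. The following sketch uses a hypothetical function, `base_fraction`, that is not part of the original lesson; it returns the fraction of a sequence made up of a given base.

```python
def base_fraction(seq, base):
    # Count how many times base occurs and divide by the sequence length
    fraction = seq.count(base) / len(seq)
    return fraction

# The arguments are passed in the same order as in the definition
print(base_fraction('AAGT', 'A'))
```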

Functions should have a *docstring* that tells what they do:

In [44]:
def dna2rna(dna):
    """Converts dna to rna
    str -> str"""
    rna = dna.replace('T', 'U')
    return rna

The docstring is what help() displays when we call it on the function:

In [40]:
help(dna2rna)

Help on function dna2rna in module __main__:

dna2rna(dna)
    Converts dna to rna
    str -> str



We can print the result of a function

In [41]:
my_seq = 'ATGAGGATA'
print(dna2rna(my_seq))

AUGAGGAUA


Or assign it to a variable (and, eventually, print it)

In [42]:
my_output = dna2rna(my_seq)
print(my_output)

AUGAGGAUA


Notice that outside a function we do not have access to the variables defined inside it:

In [45]:
print(rna)

NameError: name 'rna' is not defined
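The only way to get a value out of a function is through its *return statement*. The sketch below repeats the function so that it runs on its own, and uses *try/except* (not covered in this lesson) only to show the error without stopping the program.

```python
def dna2rna(dna):
    """Converts dna to rna
    str -> str"""
    rna = dna.replace('T', 'U')  # rna exists only while the function runs
    return rna

# The returned value is kept by assigning it to a variable of our own
result = dna2rna('ATGAGGATA')
print(result)

# The name rna itself was never created outside the function
try:
    print(rna)
except NameError as error:
    print('Not accessible:', error)
```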