# Demonstrating how the EpistasisMap object works.

This notebook demonstrates how the `EpistasisMap` object, the core of the epistasis model, works. Its responsibilities are to hold the genotype-phenotype map, to efficiently map parameters from the Epistasis Models, and to remain stable against memory leaks.

In [1]:
import numpy as np
import itertools as it

We need a method for generating the genotype space used in this example. It takes two sequences that differ at every site and generates the binary sequence space between them: every genotype in which each site carries either the first sequence's residue or the second's.

In [2]:
def generate_sequence_space(wildtype, mutant):
    """Generate the binary sequence space between a wildtype and a mutant sequence.

    At every site, each genotype carries either the wildtype residue ('0')
    or the mutant residue ('1').
    """
    if len(wildtype) != len(mutant):
        raise ValueError("wildtype and mutant sequences must be the same length.")

    # All binary strings of length L, in sorted order ('00000', '00001', ...)
    binaries = sorted("".join(s) for s in it.product("01", repeat=len(wildtype)))
    sequence_space = []
    for b in binaries:
        # Take the wildtype residue where the bit is '0' and the mutant residue where it is '1'
        sequence = [wildtype[i] if bit == "0" else mutant[i] for i, bit in enumerate(b)]
        sequence_space.append("".join(sequence))
    return sequence_space
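For reference, here is a minimal alternative sketch. Because each site has exactly two allowed residues, the same space can be built by taking the Cartesian product of the per-site residue pairs. `generate_sequence_space_compact` is a hypothetical name used only here and is not part of the `epistasis` package.

```python
import itertools as it


def generate_sequence_space_compact(wildtype, mutant):
    """Enumerate every genotype that mixes wildtype and mutant residues site by site."""
    if len(wildtype) != len(mutant):
        raise ValueError("wildtype and mutant sequences must be the same length.")
    # zip pairs up the two allowed residues at each site, e.g. ('A', 'V');
    # it.product then varies the last site fastest and lists the wildtype
    # residue first, which matches the sorted binary order ('00...0' first).
    return ["".join(residues) for residues in it.product(*zip(wildtype, mutant))]


print(generate_sequence_space_compact("AA", "VV"))
```

Because `itertools.product` lists the wildtype residue first at every site, the genotypes come out in the same order as the sorted binary strings used in `generate_sequence_space`.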

Use the method to generate a toy genotype-to-phenotype map.

In [3]:
seq1 = 'AAAAA'
seq2 = 'VVVVV'
genotypes = generate_sequence_space(seq1, seq2)
phenotypes = np.random.rand(len(genotypes))

Let's import the `EpistasisMap` object. This is the core of the `epistasis` module; it contains the internal mapping for all the Epistasis Models.

In [4]:
from epistasis.core.em import EpistasisMap

Read the documentation for the object.

In [5]:
EpistasisMap?

First, we'll create an instance of the object and populate it with the sequence space we created. 

In [6]:
em = EpistasisMap()
em.genotypes = genotypes
em.phenotypes = phenotypes
em.order = 5
em.wildtype = 'AAAAA'

Once populated, the object automatically builds all possible epistatic interactions up to the specified order.

In [7]:
em.interaction_genotypes

['A0V',
 'A1V',
 'A2V',
 'A3V',
 'A4V',
 'A5V',
 'A1V,A2V',
 'A1V,A3V',
 'A1V,A4V',
 'A1V,A5V',
 'A2V,A3V',
 'A2V,A4V',
 'A2V,A5V',
 'A3V,A4V',
 'A3V,A5V',
 'A4V,A5V',
 'A1V,A2V,A3V',
 'A1V,A2V,A4V',
 'A1V,A2V,A5V',
 'A1V,A3V,A4V',
 'A1V,A3V,A5V',
 'A1V,A4V,A5V',
 'A2V,A3V,A4V',
 'A2V,A3V,A5V',
 'A2V,A4V,A5V',
 'A3V,A4V,A5V',
 'A1V,A2V,A3V,A4V',
 'A1V,A2V,A3V,A5V',
 'A1V,A2V,A4V,A5V',
 'A1V,A3V,A4V,A5V',
 'A2V,A3V,A4V,A5V',
 'A1V,A2V,A3V,A4V,A5V']
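The list above has one entry per subset of mutated sites: 5 single-site terms, 10 pairs, 10 triples, 5 quadruples and 1 quintuple, plus the leading `'A0V'` entry. We read that entry as the zeroth-order (reference) term, but this is an assumption. Together these give 1 + 5 + 10 + 10 + 5 + 1 = 32 = 2^5 terms, one per genotype. The following is a minimal sketch of how the non-reference labels can be enumerated with `itertools.combinations`. `interaction_labels` is a hypothetical helper, not the library's implementation.

```python
import itertools as it
from math import comb


def interaction_labels(wildtype, mutant, order):
    """Label every interaction of 1..order mutated sites, e.g. 'A1V,A3V'.

    Sites are numbered from 1 to mirror the printed labels (an assumption).
    """
    sites = range(1, len(wildtype) + 1)
    labels = []
    for k in range(1, order + 1):
        # combinations() yields site sets in lexicographic order: (1, 2), (1, 3), ...
        for combo in it.combinations(sites, k):
            labels.append(",".join(f"{wildtype[i - 1]}{i}{mutant[i - 1]}" for i in combo))
    return labels


labels = interaction_labels("AAAAA", "VVVVV", order=5)
expected = sum(comb(5, k) for k in range(1, 6))  # 5 + 10 + 10 + 5 + 1
print(len(labels), expected)  # 31 31
print(labels[:7])
```

Lowering `order` simply drops the higher-order subsets. For example, `order=2` keeps only the 5 single-site and 10 pairwise labels.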

## How do I use EpistasisMap?

The main purpose of an `EpistasisMap` is to efficiently map genotypes to phenotypes, mutations, epistatic interactions, experimental errors, epistatic uncertainty, and so on. The main problem this object solves is storing these maps efficiently in memory, especially when the space gets *large* (a binary space with L sites contains 2^L genotypes).

### Different types of Maps
This object builds all maps on the fly rather than storing them in memory, since dictionaries can have high memory costs.
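We have not checked exactly how `EpistasisMap` implements this. The following is a minimal sketch of the general pattern, using a hypothetical `LazyGenotypeMap` class: the object keeps only the raw arrays, and a read-only `property` builds a dictionary each time it is accessed.

```python
import numpy as np


class LazyGenotypeMap:
    """Hypothetical toy class: stores arrays and builds dict views only on request."""

    def __init__(self, genotypes, phenotypes):
        # Only the raw arrays are kept; no dictionary is stored on the instance.
        self.genotypes = np.asarray(genotypes)
        self.phenotypes = np.asarray(phenotypes, dtype=float)

    @property
    def geno2pheno(self):
        # A fresh dict is built on every access and can be garbage-collected
        # as soon as the caller drops its reference to it.
        return dict(zip(self.genotypes.tolist(), self.phenotypes.tolist()))


toy = LazyGenotypeMap(["AA", "AV", "VA", "VV"], [0.1, 0.4, 0.3, 0.9])
print(toy.geno2pheno["VA"])  # 0.3
print("geno2pheno" in vars(toy))  # False: nothing is cached on the instance
```

The trade-off is time for memory. Each access rebuilds the dictionary, so in a loop it is cheaper to bind the result to a local variable once than to access the property repeatedly.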

#### Genotype-Binary Map Representation

In [8]:
em.geno2binary

{'AAAAA': '00000',
 'AAAAV': '00001',
 'AAAVA': '00010',
 'AAAVV': '00011',
 'AAVAA': '00100',
 'AAVAV': '00101',
 'AAVVA': '00110',
 'AAVVV': '00111',
 'AVAAA': '01000',
 'AVAAV': '01001',
 'AVAVA': '01010',
 'AVAVV': '01011',
 'AVVAA': '01100',
 'AVVAV': '01101',
 'AVVVA': '01110',
 'AVVVV': '01111',
 'VAAAA': '10000',
 'VAAAV': '10001',
 'VAAVA': '10010',
 'VAAVV': '10011',
 'VAVAA': '10100',
 'VAVAV': '10101',
 'VAVVA': '10110',
 'VAVVV': '10111',
 'VVAAA': '11000',
 'VVAAV': '11001',
 'VVAVA': '11010',
 'VVAVV': '11011',
 'VVVAA': '11100',
 'VVVAV': '11101',
 'VVVVA': '11110',
 'VVVVV': '11111'}
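For a binary space, the table above can be reproduced by comparing each genotype to the wildtype site by site: `'0'` where the residue matches and `'1'` where it differs. This is a sketch of that encoding with a hypothetical helper, `genotype_to_binary`. It assumes exactly two possible residues per site and is not necessarily how `EpistasisMap` computes `geno2binary`.

```python
def genotype_to_binary(genotype, wildtype):
    """Encode each site as '0' if it matches the wildtype residue, else '1'."""
    if len(genotype) != len(wildtype):
        raise ValueError("genotype and wildtype must be the same length.")
    return "".join("0" if g == w else "1" for g, w in zip(genotype, wildtype))


genotypes = ["AAAAA", "AAAAV", "AVAVA", "VVVVV"]
geno2binary = {g: genotype_to_binary(g, "AAAAA") for g in genotypes}
print(geno2binary)
```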

#### Epistatic interaction-to-value Map

Assuming you've run an epistasis model, you can access this mapping:

``` 
em.key2value
```

#### Epistatic interaction (genotype)-to-value Map

Assuming you've run an epistasis model, you can access this mapping:

``` 
em.genotype2value
```
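The docstring is the authority on what `genotype2value` contains. As an illustration only, here is one way such an interaction-to-value map could be produced. The sketch fits a full-order epistasis model using a 0/1 ("local") encoding, in which each coefficient applies to genotypes carrying all of its mutations, and then pairs the fitted coefficients with interaction labels. The function name `fit_full_order_local`, the `"intercept"` label and the choice of encoding are all assumptions, not the `epistasis` package's implementation.

```python
import itertools as it

import numpy as np


def fit_full_order_local(genotypes, phenotypes, wildtype, mutant):
    """Toy full-order epistasis fit with a 0/1 ('local') encoding; illustrative only."""
    n_sites = len(wildtype)
    # Every subset of sites, from the empty set (intercept) up to all sites.
    subsets = [c for k in range(n_sites + 1) for c in it.combinations(range(n_sites), k)]
    # Design matrix: X[g, s] = 1 if every site in subset s is mutated in genotype g.
    X = np.array(
        [[float(all(g[i] != wildtype[i] for i in s)) for s in subsets] for g in genotypes]
    )
    # With all 2**n_sites genotypes and 2**n_sites terms, X is square and invertible.
    values = np.linalg.solve(X, np.asarray(phenotypes, dtype=float))
    labels = [
        ",".join(f"{wildtype[i]}{i + 1}{mutant[i]}" for i in s) if s else "intercept"
        for s in subsets
    ]
    return dict(zip(labels, values))


genotype2value = fit_full_order_local(["AA", "AV", "VA", "VV"], [1.0, 1.5, 2.0, 3.0], "AA", "VV")
for label, value in genotype2value.items():
    print(label, round(float(value), 3))
```

Because the model has as many terms as genotypes, it reproduces the phenotypes exactly. Each phenotype equals the sum of the coefficients for every subset of its mutated sites.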

#### And many more! 

Read the docstring (`EpistasisMap?`) for more information.