# Demonstrating how the EpistasisMap object works.

This notebook demonstrates how the `EpistasisMap` object, the core of the epistasis model, works. Its responsibility is to hold the genotype-phenotype map, efficiently map parameters from the epistasis models, and be stable against memory leaks.

In [1]:
import numpy as np

We need a method for generating a genotype space that we'll use in this example. This method will take two sequences that differ at all sites and generate a binary sequence space between them. 

In [2]:
from epistasis.core.utils import generate_binary_space
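
To make the idea of a binary sequence space concrete, here is a minimal sketch of how such a space could be enumerated. The function `binary_space_sketch` is a hypothetical stand-in written for illustration; it is not the source of `generate_binary_space`, whose implementation may differ.

```python
from itertools import product


def binary_space_sketch(wildtype, mutant):
    """Hypothetical illustration: list every sequence that picks, at each
    site, either the wildtype letter or the mutant letter."""
    if len(wildtype) != len(mutant):
        raise ValueError("sequences must have the same length")
    if any(w == m for w, m in zip(wildtype, mutant)):
        raise ValueError("sequences must differ at every site")
    # One (wildtype, mutant) pair of choices per site.
    site_choices = list(zip(wildtype, mutant))
    return ["".join(combo) for combo in product(*site_choices)]


# Three sites give 2**3 = 8 genotypes.
toy_space = binary_space_sketch("AAA", "VVV")
print(toy_space)
```

For two sequences of length `n` that differ everywhere, the space contains `2**n` genotypes, so the five-site example in this notebook has 32.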

Use the method to generate a toy genotype-to-phenotype map.

In [3]:
seq1 = 'AAAAA'
seq2 = 'VVVVV'
genotypes = generate_binary_space(seq1, seq2)
phenotypes = np.random.rand(len(genotypes))

Let's import the `EpistasisMap` object. This is the core of the `epistasis` module; it contains the internal mapping for all the epistasis models.

In [4]:
from epistasis.core.em import EpistasisMap

Read the documentation for the object.

In [5]:
EpistasisMap?

First, we'll create an instance of the object and populate it with the sequence space we created. 

In [6]:
em = EpistasisMap()
em.genotypes = genotypes
em.phenotypes = phenotypes
em.order = 5
em.wildtype = 'AAAAA'

This object automatically builds all possible epistatic interactions to the specified order on initialization.

In [7]:
em.interaction_genotypes

['A0V',
 'A1V',
 'A2V',
 'A3V',
 'A4V',
 'A5V',
 'A1V,A2V',
 'A1V,A3V',
 'A1V,A4V',
 'A1V,A5V',
 'A2V,A3V',
 'A2V,A4V',
 'A2V,A5V',
 'A3V,A4V',
 'A3V,A5V',
 'A4V,A5V',
 'A1V,A2V,A3V',
 'A1V,A2V,A4V',
 'A1V,A2V,A5V',
 'A1V,A3V,A4V',
 'A1V,A3V,A5V',
 'A1V,A4V,A5V',
 'A2V,A3V,A4V',
 'A2V,A3V,A5V',
 'A2V,A4V,A5V',
 'A3V,A4V,A5V',
 'A1V,A2V,A3V,A4V',
 'A1V,A2V,A3V,A5V',
 'A1V,A2V,A4V,A5V',
 'A1V,A3V,A4V,A5V',
 'A2V,A3V,A4V,A5V',
 'A1V,A2V,A3V,A4V,A5V']
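
The enumeration of interactions up to a given order can be sketched with `itertools.combinations`. This is only an illustration: the helper `interaction_labels_sketch` and its 1-based label format are assumptions modeled on the output above, and it does not reproduce that list exactly (for instance, it emits no `'A0V'` entry).

```python
from itertools import combinations


def interaction_labels_sketch(wildtype, mutant, order):
    """Hypothetical illustration: label every combination of up to
    `order` mutations, e.g. 'A1V,A2V' for the pair at sites 1 and 2."""
    # One single-mutation label per site, numbered from 1 (an assumption).
    sites = [
        "{}{}{}".format(w, i, m)
        for i, (w, m) in enumerate(zip(wildtype, mutant), start=1)
    ]
    labels = []
    for k in range(1, order + 1):
        for combo in combinations(sites, k):
            labels.append(",".join(combo))
    return labels


labels = interaction_labels_sketch("AAAAA", "VVVVV", order=5)
print(len(labels))
```

With five sites and `order=5`, the number of labels is the sum of the binomial coefficients for k = 1 to 5, which is 31.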

## Example of more complex mapping

Now, let's set a different wildtype as a reference. Notice how the new binary representation maps.

In [8]:
em = EpistasisMap()
em.genotypes = genotypes
em.phenotypes = phenotypes
em.order = 5
em.wildtype = 'AVVAA'

In [9]:
em.geno2pheno

OrderedDict([('AAAAA', 0.95197445099250222), ('AAAAV', 0.26502209616206762), ('AAAVA', 0.15097394179854584), ('AAAVV', 0.56813948256747204), ('AAVAA', 0.47978625353393145), ('AAVAV', 0.73449449836357128), ('AAVVA', 0.4212844484583611), ('AAVVV', 0.91719025609703064), ('AVAAA', 0.78911774610496999), ('AVAAV', 0.55564153459889565), ('AVAVA', 0.61886246107478404), ('AVAVV', 0.90143345070915393), ('AVVAA', 0.77151033723779561), ('AVVAV', 0.12314094825239397), ('AVVVA', 0.16012168950754058), ('AVVVV', 0.12246777528751795), ('VAAAA', 0.22635385242389316), ('VAAAV', 0.67313700408136401), ('VAAVA', 0.0077845891650307752), ('VAAVV', 0.85332822910702244), ('VAVAA', 0.11222984014148152), ('VAVAV', 0.2162935871759134), ('VAVVA', 0.78165887922536859), ('VAVVV', 0.33229155202276561), ('VVAAA', 0.78984482716469839), ('VVAAV', 0.96730866006953664), ('VVAVA', 0.80124020546413843), ('VVAVV', 0.37281726361883172), ('VVVAA', 0.22293571678406809), ('VVVAV', 0.99002300370315477), ('VVVVA', 0.162338259968385

In [10]:
em.geno2binary

{'AAAAA': '01100',
 'AAAAV': '01101',
 'AAAVA': '01110',
 'AAAVV': '01111',
 'AAVAA': '01000',
 'AAVAV': '01001',
 'AAVVA': '01010',
 'AAVVV': '01011',
 'AVAAA': '00100',
 'AVAAV': '00101',
 'AVAVA': '00110',
 'AVAVV': '00111',
 'AVVAA': '00000',
 'AVVAV': '00001',
 'AVVVA': '00010',
 'AVVVV': '00011',
 'VAAAA': '11100',
 'VAAAV': '11101',
 'VAAVA': '11110',
 'VAAVV': '11111',
 'VAVAA': '11000',
 'VAVAV': '11001',
 'VAVVA': '11010',
 'VAVVV': '11011',
 'VVAAA': '10100',
 'VVAAV': '10101',
 'VVAVA': '10110',
 'VVAVV': '10111',
 'VVVAA': '10000',
 'VVVAV': '10001',
 'VVVVA': '10010',
 'VVVVV': '10011'}
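
The encoding shown above is consistent with a simple rule: a site is `0` where the genotype matches the wildtype and `1` where it differs. A minimal sketch of that rule follows; `to_binary_sketch` is a hypothetical helper, not the library's own code.

```python
def to_binary_sketch(genotype, wildtype):
    """Hypothetical illustration: '0' where a site matches the wildtype,
    '1' where it differs."""
    if len(genotype) != len(wildtype):
        raise ValueError("genotype and wildtype must have the same length")
    return "".join("0" if g == w else "1" for g, w in zip(genotype, wildtype))


# Values checked against the notebook output for wildtype 'AVVAA'.
print(to_binary_sketch("AAAAA", "AVVAA"))
print(to_binary_sketch("VVVVV", "AVVAA"))
```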

The map correctly maps genotype to binary to phenotype. 

In [11]:
print(em.phenotypes)
print(em.bit_phenotypes)
em.bit2pheno

[ 0.95197445  0.2650221   0.15097394  0.56813948  0.47978625  0.7344945
  0.42128445  0.91719026  0.78911775  0.55564153  0.61886246  0.90143345
  0.77151034  0.12314095  0.16012169  0.12246778  0.22635385  0.673137
  0.00778459  0.85332823  0.11222984  0.21629359  0.78165888  0.33229155
  0.78984483  0.96730866  0.80124021  0.37281726  0.22293572  0.990023
  0.16233826  0.78632395]
[ 0.77151034  0.12314095  0.16012169  0.12246778  0.78911775  0.55564153
  0.61886246  0.90143345  0.47978625  0.7344945   0.42128445  0.91719026
  0.95197445  0.2650221   0.15097394  0.56813948  0.22293572  0.990023
  0.16233826  0.78632395  0.78984483  0.96730866  0.80124021  0.37281726
  0.11222984  0.21629359  0.78165888  0.33229155  0.22635385  0.673137
  0.00778459  0.85332823]


OrderedDict([('00000', 0.77151033723779561), ('00001', 0.12314094825239397), ('00010', 0.16012168950754058), ('00011', 0.12246777528751795), ('00100', 0.78911774610496999), ('00101', 0.55564153459889565), ('00110', 0.61886246107478404), ('00111', 0.90143345070915393), ('01000', 0.47978625353393145), ('01001', 0.73449449836357128), ('01010', 0.4212844484583611), ('01011', 0.91719025609703064), ('01100', 0.95197445099250222), ('01101', 0.26502209616206762), ('01110', 0.15097394179854584), ('01111', 0.56813948256747204), ('10000', 0.22293571678406809), ('10001', 0.99002300370315477), ('10010', 0.1623382599683858), ('10011', 0.78632394912663561), ('10100', 0.78984482716469839), ('10101', 0.96730866006953664), ('10110', 0.80124020546413843), ('10111', 0.37281726361883172), ('11000', 0.11222984014148152), ('11001', 0.2162935871759134), ('11010', 0.78165887922536859), ('11011', 0.33229155202276561), ('11100', 0.22635385242389316), ('11101', 0.67313700408136401), ('11110', 0.007784589165030775
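
In the output above, `bit_phenotypes` holds the same values as `phenotypes`, reordered so that they follow the sorted binary keys. The sketch below reproduces that reordering on a two-site toy example; the helper name and the toy data are assumptions made for illustration.

```python
from itertools import product


def to_binary(genotype, wildtype):
    # '0' where a site matches the wildtype, '1' where it differs.
    return "".join("0" if g == w else "1" for g, w in zip(genotype, wildtype))


toy_wildtype = "AV"
toy_genotypes = ["".join(c) for c in product("AV", repeat=2)]  # AA, AV, VA, VV
toy_phenotypes = [0.1, 0.2, 0.3, 0.4]  # made-up values

# Pair each phenotype with its binary key, then sort by the key.
pairs = sorted(
    (to_binary(g, toy_wildtype), p)
    for g, p in zip(toy_genotypes, toy_phenotypes)
)
toy_bit_phenotypes = [p for _, p in pairs]
print(pairs)
```

The wildtype `AV` maps to `00`, so its phenotype comes first in the reordered list, just as the phenotype of `AVVAA` comes first in `em.bit_phenotypes`.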

## How do I use EpistasisMap?

The main reason for an `EpistasisMap` is to efficiently map genotypes to phenotypes, to mutations, to epistatic interactions, to experimental errors, to epistatic uncertainty, etc. The main issue this object solves is storing these maps efficiently in memory, especially when the space gets *large*.

### Different types of Maps
This object builds all maps on the fly to avoid storing them in memory (since dictionaries can have high memory costs).
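
One common way to get this behavior in Python is to store only the underlying sequences and expose each map as a property that assembles a dictionary on access. The class below is a sketch of that pattern under that assumption; `LazyMapSketch` is hypothetical and is not how `EpistasisMap` is necessarily implemented.

```python
class LazyMapSketch(object):
    """Hypothetical illustration: keep genotypes and phenotypes as
    parallel lists and build dictionaries only when requested."""

    def __init__(self, genotypes, phenotypes):
        self._genotypes = list(genotypes)
        self._phenotypes = list(phenotypes)

    @property
    def geno2pheno(self):
        # A fresh dict is built on every access; nothing is cached.
        return dict(zip(self._genotypes, self._phenotypes))


lazy_map = LazyMapSketch(["AA", "AV"], [0.1, 0.2])
print(lazy_map.geno2pheno)
```

The trade-off is that each access pays the cost of rebuilding the dictionary, in exchange for not holding several large dictionaries alive at once.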

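#### Genotype-Binary Map Representation

The output below comes from the first example, where `AAAAA` is the wildtype, so every `A` maps to `0` and every `V` maps to `1`.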

In [8]:
em.geno2binary

{'AAAAA': '00000',
 'AAAAV': '00001',
 'AAAVA': '00010',
 'AAAVV': '00011',
 'AAVAA': '00100',
 'AAVAV': '00101',
 'AAVVA': '00110',
 'AAVVV': '00111',
 'AVAAA': '01000',
 'AVAAV': '01001',
 'AVAVA': '01010',
 'AVAVV': '01011',
 'AVVAA': '01100',
 'AVVAV': '01101',
 'AVVVA': '01110',
 'AVVVV': '01111',
 'VAAAA': '10000',
 'VAAAV': '10001',
 'VAAVA': '10010',
 'VAAVV': '10011',
 'VAVAA': '10100',
 'VAVAV': '10101',
 'VAVVA': '10110',
 'VAVVV': '10111',
 'VVAAA': '11000',
 'VVAAV': '11001',
 'VVAVA': '11010',
 'VVAVV': '11011',
 'VVVAA': '11100',
 'VVVAV': '11101',
 'VVVVA': '11110',
 'VVVVV': '11111'}

#### Epistatic interaction-to-value Map

Assuming you've run an epistasis model, you can call this mapping:

``` 
em.key2value
```

#### Epistatic interaction (genotype)-to-value Map

Assuming you've run an epistasis model, you can call this mapping:

``` 
em.genotype2value
```

#### And many more! 

Read the docstring (`EpistasisMap?`) for more information.