# How to build a genotype-phenotype map (a.k.a. sequence space) from protein lattice models

This notebook demonstrates how to use Jesse Bloom's **protein lattice model package**, [latticeproteins](), to build a genotype-phenotype map. In this case, the phenotypes are estimated from the protein's ability to fold and bind a ligand. You must have `latticeproteins` installed as a dependency for this package.

We'll begin by importing the parts of his package that we need.

In [12]:
import os
from latticeproteins.conformations import Conformations
from latticeproteins.fitness import Fitness

In `latticeproteins`'s `conformations` module, we can build the ensemble of all possible conformations for sequences of the same length.

In [13]:
length = 6
database_dir = os.path.join(os.getcwd(), "database")
c = Conformations(length, database_dir)
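
To get a feel for what that ensemble contains, here is a minimal sketch (not the `latticeproteins` implementation) that enumerates every self-avoiding walk of a given length on a 2D square lattice. The function name `self_avoiding_walks` and the move letters are our own choices for illustration, and the sketch counts rotations and reflections of a walk as distinct, which the package may well not do.

```python
# Unit steps on a 2D square lattice (names chosen for this sketch).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def self_avoiding_walks(length):
    """Yield each self-avoiding walk of `length` residues as a move string."""

    def extend(path, moves):
        # A chain of `length` residues occupies `length` lattice sites.
        if len(path) == length:
            yield "".join(moves)
            return
        x, y = path[-1]
        for name, (dx, dy) in MOVES.items():
            nxt = (x + dx, y + dy)
            # Self-avoidance: never step onto an occupied site.
            if nxt not in path:
                yield from extend(path + [nxt], moves + [name])

    yield from extend([(0, 0)], [])


walks = list(self_avoiding_walks(6))
print(len(walks))  # 284
```

The count grows quickly with chain length, which is presumably why the package stores its conformations in a database directory rather than rebuilding them each time.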

We can simulate fitness in these models by binding a ligand to the native structure of the lattice protein. The fitness module from `latticeproteins` calculates fitness for each protein bound to a ligand. If the protein doesn't fold, the fitness is 0.

In [14]:
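# Create a ligand: its sequence and its conformation as a string of lattice moves
ligand = 'IIIIII'
ligandconf = 'LUUUR'
stabcutoff = 0  # stability cutoff passed along with the ligand
Ligand = (ligand, ligandconf, stabcutoff)
T = 0.9  # temperature
fitness = Fitness(T, c, dGdependence='negstability', targets=None, ligand=Ligand)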
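
To make "fold" a little more concrete, here is a rough sketch of how a folding stability can be derived from conformation energies with a Boltzmann distribution. This is a common treatment for lattice models, not a copy of what `latticeproteins` does internally. The function names `folding_stability` and `fraction_folded` and the energy values are made up for illustration, and energies are assumed to be in units where the Boltzmann constant is 1.

```python
import math


def folding_stability(energies, temperature):
    """Free energy of folding (dG) for the lowest-energy conformation.

    Negative values mean the native conformation is favored over the rest.
    """
    native = min(energies)
    # Partition function over every conformation.
    z_total = sum(math.exp(-e / temperature) for e in energies)
    # Everything except (one copy of) the native conformation is "unfolded".
    z_unfolded = z_total - math.exp(-native / temperature)
    return native + temperature * math.log(z_unfolded)


def fraction_folded(energies, temperature):
    """Equilibrium probability of occupying the native conformation."""
    dG = folding_stability(energies, temperature)
    return 1.0 / (1.0 + math.exp(dG / temperature))


# Hypothetical energies for four conformations of one sequence.
energies = [-5.0, -2.0, -1.0, 0.0]
T = 0.9
print(folding_stability(energies, T))
print(fraction_folded(energies, T))
```

Under this picture, a stability cutoff of 0 would separate sequences whose native state is favored (dG below 0) from those that spend most of their time unfolded.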

Here comes the new stuff...

We'll import the `LatticeFitnessSpace` object, which will build a sequence space between two starting sequences that differ at all sites.

In [15]:
from latticegpm.space import LatticeFitnessSpace
from latticegpm.utils import search_fitness_landscape

First, we need to find two sequences that have a non-zero fitness and differ at all sites! `search_fitness_landscape` does exactly that.

In [16]:
wildtype, mutant = search_fitness_landscape(fitness, 100000)
print("Wildtype sequence: " + wildtype)
print("Mutant sequence: " + mutant)

Wildtype sequence: REKIDC
Mutant sequence: WNMATW
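
If you are curious what such a search might look like, here is a minimal sketch based on random sampling. It is not the `latticegpm` implementation: `random_search`, `differ_at_all_sites`, and `toy_fitness` are hypothetical names, and the toy fitness simply rewards hydrophobic residues so that the example runs without any lattice model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def differ_at_all_sites(a, b):
    """True if two equal-length sequences share no residue at any position."""
    return len(a) == len(b) and all(x != y for x, y in zip(a, b))


def random_search(fitness_fn, length, max_tries, rng):
    """Sample random sequences until two fit ones differ at every site."""

    def sample_fit():
        # Draw random sequences until one has non-zero fitness.
        for _ in range(max_tries):
            seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            if fitness_fn(seq) > 0:
                return seq
        raise RuntimeError("no fit sequence found")

    first = sample_fit()
    for _ in range(max_tries):
        second = sample_fit()
        if differ_at_all_sites(first, second):
            return first, second
    raise RuntimeError("no fully divergent pair found")


def toy_fitness(seq):
    """Stand-in fitness: the fraction of hydrophobic residues."""
    return sum(res in "AILMFVW" for res in seq) / len(seq)


wt, mut = random_search(toy_fitness, length=6, max_tries=1000, rng=random.Random(0))
print(wt, mut)
```

With a real lattice-model fitness most random sequences will not fold, which is presumably why the call above allows as many as 100000 attempts.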


Now, we'll build a sequence space between these two sequences with the `LatticeFitnessSpace` object and print out some example nodes in this space.

In [17]:
# Create an instance of LatticeFitnessSpace
sequence_space = LatticeFitnessSpace(wildtype, mutant, fitness)
# Print the first 10 sequences, drawn together with the bound ligand
sequence_space.print_sequences(sequence_space.sequences[0:10], with_ligand=True)

* * * * * *
           
* * E-K * *
    | |    
* * R I * *
      |    
* i C-D i *
  |     |  
* i-i-i-i *
           
* * * * * *
* * * * * *
           
* * E-K * *
    | |    
* * R I * *
      |    
* i W-D i *
  |     |  
* i-i-i-i *
           
* * * * * *
* * * * * *
           
* * E-K * *
    | |    
* * R I * *
      |    
* i C-T i *
  |     |  
* i-i-i-i *
           
* * * * * *
* * * * * *
           
* * E-K * *
    | |    
* * R I * *
      |    
* i W-T i *
  |     |  
* i-i-i-i *
           
* * * * * *
* * * * * *
           
* * * i-i *
        |  
* E-K-A i *
  |   | |  
* R C-D i *
        |  
* * * i-i *
           
* * * * * *
* * * * * *
           
* * * i-i *
        |  
* E-K-A i *
  |   | |  
* R W-D i *
        |  
* * * i-i *
           
* * * * * *
* * * * * *
           
* * * i-i *
        |  
* E-K-A i *
  |   | |  
* R C-T i *
        |  
* * * i-i *
           
* * * * * *
* * * * * *
           
* * * i-i *
        |  
* E-K-A i *
  |   | |  
* R 
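
For intuition about what the space contains, here is a small sketch that enumerates every sequence "between" two endpoints: at each site a sequence carries either the wildtype or the mutant residue, giving 2^L combinations for length L. The helper name `binary_sequence_space` is our own, and we are only assuming that `LatticeFitnessSpace` builds something equivalent.

```python
from itertools import product


def binary_sequence_space(wildtype, mutant):
    """List all sequences that take the wildtype or mutant residue at each site."""
    if len(wildtype) != len(mutant):
        raise ValueError("sequences must have the same length")
    # Pair up the two possible residues at each site, then take every combination.
    sites = zip(wildtype, mutant)
    return ["".join(combo) for combo in product(*sites)]


space = binary_sequence_space("REKIDC", "WNMATW")
print(len(space))  # 64
print(space[:4])   # ['REKIDC', 'REKIDW', 'REKITC', 'REKITW']
```

The first few entries line up with the first few structures printed above, though the ordering used by `latticegpm` is not something this sketch guarantees.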

We can access all sequences in this space, along with their fitnesses, through the following properties.

In [18]:
genotypes = sequence_space.sequences
phenotypes = sequence_space.fitnesses
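
From here, one simple way to work with the map is to pair each genotype with its phenotype. The sketch below assumes that `sequences` and `fitnesses` are returned in the same order, which we have not verified, and it uses invented fitness values in place of real output.

```python
# Placeholder data: the sequences appear in the example above, but the
# fitness values are invented purely for illustration.
genotypes = ["REKIDC", "REKIDW", "REKITC", "REKITW"]
phenotypes = [0.8, 0.0, 0.5, 0.3]

# Genotype-phenotype map as a plain dictionary.
gpm = dict(zip(genotypes, phenotypes))

# Keep only sequences with non-zero fitness.
viable = {seq: fit for seq, fit in gpm.items() if fit > 0}

# Genotype with the highest fitness.
best = max(gpm, key=gpm.get)
print(best, gpm[best])
```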