## Expansion Procedure

In [1]:
import numpy as np
import scipy as sp
import string
import random

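def rand_str(size=6, chars=string.ascii_uppercase):  # append string.digits to chars to include digits
    """Return a random string of `size` characters drawn from `chars` (used as placeholder data)."""
    return ''.join(random.choice(chars) for _ in range(size))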

In [2]:
# The expansion procedure begins when a molecule query is entered.


In [3]:
# Compute all possible transformations for a given query,
# associating each transformation with a method (enzyme, predicted promiscuous or canonical).
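One way to hold "a transformation plus its method" in memory is a small record type. The sketch below assumes one reading of the comment above: each transformation is tied to an enzyme whose activity is either canonical or predicted promiscuous. `Activity`, `Transformation`, `group_by_activity` and the `enzyme_1`/`enzyme_2` labels are all hypothetical names, not part of the pipeline yet.

```python
from dataclasses import dataclass
from enum import Enum


class Activity(Enum):
    """Assumed activity labels, taken from the cell comment above."""
    CANONICAL = "canonical"
    PREDICTED_PROMISCUOUS = "predicted promiscuous"


@dataclass(frozen=True)
class Transformation:
    """Hypothetical record: a proposed precursor, its enzyme, that enzyme's activity type, and a score."""
    precursor: str
    enzyme: str
    activity: Activity
    probability: float


def group_by_activity(transformations):
    """Bucket transformations by activity type, keeping their input order."""
    groups = {activity: [] for activity in Activity}
    for t in transformations:
        groups[t.activity].append(t)
    return groups


candidates = [
    Transformation("A", "enzyme_1", Activity.CANONICAL, 0.6),
    Transformation("B", "enzyme_2", Activity.PREDICTED_PROMISCUOUS, 0.3),
    Transformation("C", "enzyme_1", Activity.PREDICTED_PROMISCUOUS, 0.1),
]
groups = group_by_activity(candidates)
print({a.value: [t.precursor for t in ts] for a, ts in groups.items()})
# {'canonical': ['A'], 'predicted promiscuous': ['B', 'C']}
```

A frozen dataclass keeps records hashable, so they could later serve as dictionary keys or tree-node children.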

In [4]:
def expansion_procedure(mol, k: int, n: int) -> list:
    """
    This function receives a molecule object and feeds it into compute_policy().
    The policy model returns a probability distribution over all possible transformations (T_1 to T_N).
    This function then keeps only the k most probable transformations for the input molecule.
    These yield the reactants needed to make the input, so react_transformations can generate
    the set of k complete reactions.

    For each reaction in the set, a reaction prediction is performed with compute_feasibility,
    which returns a probability score for each reaction.
    Improbable reactions are then filtered out, leaving a ranked_precursors list of length n that contains
    admissible transformations and their corresponding reactants. These can be considered the set of allowed moves
    for a given state (product).

    Example:
        $ python __currently_nonexistent__.py or ``defunct example``

    Args:
        mol (obj): ECFP4-encoded molecule
        k (int): Number of most probable transformations to keep (Segler et al. used 50, to be confirmed)
        n (int): Number of precursors to return (Segler et al. used 2, to be confirmed)

    Returns:
        ranked_precursors (list): A ranked list of length n of predicted precursor structures

    Todo:
        Tests, complete underlying pipeline functions
    """
    all_transformations = compute_policy(mol)  # ranked (precursor, probability) tuples

    # Apply the k most probable transformations, generating k reactions
    k_reactions = react_transformations(mol, all_transformations[:k])

    # Score each reaction's feasibility and keep the n best precursors
    ranked_precursors = compute_feasibility(n, k_reactions)

    return ranked_precursors
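To see the data flow of expansion_procedure end to end before the real models exist, here is a self-contained toy version. `toy_policy`, `toy_react`, `toy_feasibility` and `toy_expansion` are hypothetical stand-ins with hard-coded scores, assumed only for illustration; they mirror the shapes used above: (precursor, probability) pairs in, (product, reactant, enzyme) tuples in the middle, a list of n precursors out.

```python
def toy_policy(mol):
    """Stand-in for compute_policy: fixed (precursor, probability) pairs, best first."""
    pairs = [("P1", 0.5), ("P2", 0.3), ("P3", 0.15), ("P4", 0.05)]
    return sorted(pairs, key=lambda t: t[1], reverse=True)


def toy_react(mol, k_transformations):
    """Stand-in for react_transformations: build (product, reactant, enzyme) tuples."""
    return [(mol, precursor, None) for precursor, _ in k_transformations]


def toy_feasibility(reaction):
    """Stand-in for feasibility_model: a fixed lookup instead of a trained classifier."""
    scores = {"P1": 0.2, "P2": 0.9, "P3": 0.7, "P4": 0.99}
    return scores[reaction[1]]


def toy_expansion(mol, k, n):
    top_k = toy_policy(mol)[:k]                                     # k most probable transformations
    reactions = toy_react(mol, top_k)                               # k candidate reactions
    ranked = sorted(reactions, key=toy_feasibility, reverse=True)   # most feasible first
    return [reaction[1] for reaction in ranked[:n]]                 # precursors of the n best reactions


print(toy_expansion("target", k=3, n=2))  # ['P2', 'P3']
```

Note that P4 never appears with k=3 even though its feasibility is highest: the policy's top-k cut happens before the feasibility model sees anything.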

The first model (the compute policy) guides the search in promising directions by proposing a restricted number of automatically extracted transformations: only the k most probable are kept.
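Keeping only the k most probable transformations does not require a full sort. As a sketch, Python's `heapq.nlargest` is documented as equivalent to `sorted(iterable, key=key, reverse=True)[:n]` while avoiding sorting the whole list; the random scores below are placeholder data, and k = 50 is the value noted in the expansion_procedure docstring.

```python
import heapq
import random

random.seed(0)
scores = [(f"T{i}", random.random()) for i in range(200)]  # placeholder (transformation, probability) pairs

k = 50
top_by_sort = sorted(scores, key=lambda t: t[1], reverse=True)[:k]  # full sort, then slice
top_by_heap = heapq.nlargest(k, scores, key=lambda t: t[1])          # heap-based top-k selection

print(top_by_sort == top_by_heap)  # True
```

For a few hundred candidates the difference is negligible; it matters only if the policy scores a very large rule set.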

In [5]:
def compute_policy(mol) -> list:
    """
    This function receives a mol object (an ECFP4-encoded molecule).
    It computes the entire set of biochemical transformations for the molecule and their associated probabilities,
    returning them as a ranked list of tuples with structure (precursor (obj), probability (float)).

    If a neural network is used, it belongs here, since this is a graph-based multiclass prediction problem.

    Example:
        $ python __currently_nonexistent__.py or ``defunct example``

    Args:
        mol (obj): ECFP4-encoded molecule

    Returns:
        all_transformations (list): A ranked list of (precursor, probability) tuples, from most to least probable,
        giving the molecular precursors of the possible transformations of the input mol.

    Todo:
        Decide which ML method to use; change the loop and precursor_probability to reflect this
    """
    all_transformations = []

    # Placeholder: a random number (1 to 199) of random single-letter precursors with random scores
    for _ in range(np.random.randint(1, 200)):
        precursor_probability = (rand_str(size=1), np.random.random())  # (precursor, probability)
        all_transformations.append(precursor_probability)

    all_transformations = sorted(all_transformations, key=lambda t: t[1], reverse=True)  # sort by probability

    return all_transformations
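The placeholder draws independent random scores, so they do not form a probability distribution. A multiclass policy is commonly given a softmax output layer; assuming that choice, the sketch below turns raw scores (logits) into probabilities that sum to 1 and returns them in the same ranked (precursor, probability) format. `softmax` and `ranked_transformations` are hypothetical helpers.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax: turn raw scores into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shifting by a constant leaves the result unchanged but avoids overflow
    e = np.exp(z)
    return e / e.sum()


def ranked_transformations(precursors, logits):
    """Pair each precursor with its softmax probability, most probable first."""
    probs = softmax(logits)
    pairs = list(zip(precursors, probs.tolist()))
    return sorted(pairs, key=lambda t: t[1], reverse=True)


ranked = ranked_transformations(["A", "B", "C"], [2.0, 0.5, 1.0])
print(ranked)
```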

Rule extraction associates each reaction, and thus each product, with a transformation rule. This lets us train models as policies that predict the best transformations for a given product, in other words, the best reactions with which to make it.
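Once every product carries a rule label, training the policy becomes ordinary multiclass classification. The sketch below turns hypothetical (product, rule_id) pairs into (product, class_id) examples; dropping rarely seen rules with a minimum-count cut is an assumed, though common, preprocessing choice, and all names and data are illustrative only.

```python
from collections import Counter

# Hypothetical output of rule extraction: (product, rule_id) pairs.
extracted = [
    ("prod1", "rule_a"),
    ("prod2", "rule_b"),
    ("prod3", "rule_a"),
    ("prod4", "rule_c"),
    ("prod5", "rule_b"),
]


def build_label_index(pairs, min_count=1):
    """Map each rule seen at least min_count times to an integer class id."""
    counts = Counter(rule for _, rule in pairs)
    rules = sorted(r for r, c in counts.items() if c >= min_count)
    return {rule: i for i, rule in enumerate(rules)}


def to_training_examples(pairs, label_index):
    """Keep (product, class_id) examples whose rule survived the frequency cut."""
    return [(product, label_index[rule]) for product, rule in pairs if rule in label_index]


label_index = build_label_index(extracted, min_count=2)
examples = to_training_examples(extracted, label_index)
print(label_index)  # {'rule_a': 0, 'rule_b': 1}
print(examples)     # [('prod1', 0), ('prod2', 1), ('prod3', 0), ('prod5', 1)]
```

In practice the product strings would be replaced by their ECFP4 encodings before being fed to the model.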

In [6]:
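def react_transformations(mol, k_transformations: list) -> list:
    """
    This function retrieves and builds full reactions for the input molecule from its k most probable transformations.

    Example:
        $ python __currently_nonexistent__.py or ``defunct example``

    Args:
        mol (obj): ECFP4-encoded product molecule
        k_transformations (list): A ranked list of (precursor, probability) tuples for mol

    Returns:
        k_reactions (list): A ranked list of k reaction tuples (product, reactant, enzyme)

    Todo:
        Implement the function; the body below is only a placeholder
    """
    # Placeholder: pair the product with each precursor; no enzyme is assigned yet
    k_reactions = [(mol, precursor, None) for precursor, _ in k_transformations]

    return k_reactions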
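A slightly fuller placeholder could attach an enzyme to each reaction while preserving the policy's ranking. In the sketch below, `enzyme_for` is a hypothetical lookup from precursor to the enzyme proposed for that step; how such a lookup would be built is not specified in this notebook, so treat it as an assumption.

```python
def build_reactions(product, k_transformations, enzyme_for):
    """
    Turn ranked (precursor, probability) pairs into (product, reactant, enzyme) tuples.
    enzyme_for is a hypothetical precursor -> enzyme lookup; unknown precursors get None.
    The list comprehension keeps the input ranking unchanged.
    """
    return [(product, precursor, enzyme_for.get(precursor)) for precursor, _ in k_transformations]


k_transformations = [("B", 0.8), ("C", 0.15)]
enzymes = {"B": "enzyme_1"}
print(build_reactions("D", k_transformations, enzymes))
# [('D', 'B', 'enzyme_1'), ('D', 'C', None)]
```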

A second model then predicts whether the proposed reactions are actually feasible (in scope). 
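Filtering out improbable reactions and then ranking the survivors can be done in one pass. This sketch assumes a fixed probability cut-off of 0.5 (an arbitrary default, not a tuned value) and reuses the (product, reactant, enzyme) tuples from the quick test below; `filter_and_rank` is a hypothetical helper.

```python
def filter_and_rank(reactions, scores, n, threshold=0.5):
    """
    Drop reactions scored below threshold, then return the n most feasible
    as (reaction, score) pairs. Fewer than n pairs come back if few pass the cut.
    """
    scored = [(r, s) for r, s in zip(reactions, scores) if s >= threshold]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:n]


reactions = [("d", "b", "a"), ("c", "d", "g"), ("b", "c", "f")]  # (product, reactant, enzyme)
print(filter_and_rank(reactions, [0.3, 0.1, 0.7], n=2))
# [(('b', 'c', 'f'), 0.7)]
```

The threshold makes the "in scope" decision explicit; with `threshold=0.0` this reduces to plain top-n ranking.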

In [112]:
def compute_feasibility(n: int, k_reactions: list) -> list:
    """
    This function filters a list of reactions generated from computed chemical transformations.
    It removes the least feasible reactions and returns a list of length n.

    Example:
        $ python __currently_nonexistent__.py or ``defunct example``

    Args:
        n (int): Number of best reactions to keep
        k_reactions (list): A list of (product, reactant, enzyme) reaction tuples to filter

    Returns:
        ranked_precursors (list): A ranked list of length n of predicted precursor structures

    Todo:
        Replace the placeholder feasibility_model with a trained model
    """
    feasibilities = [feasibility_model(reaction) for reaction in k_reactions]

    calculated_feasibilities = list(zip(k_reactions, feasibilities))  # pair each reaction with its score
    calculated_feasibilities = sorted(calculated_feasibilities, key=lambda t: t[1], reverse=True)

    # Take the reactant (index 1 of each reaction tuple) of the n most feasible reactions.
    # Slicing never raises IndexError; fewer than n items are returned if k_reactions is shorter.
    ranked_precursors = [reaction[1] for reaction, _ in calculated_feasibilities[:n]]

    return ranked_precursors  # another function needs to build children from this ranked list?

Quick test

In [111]:
k_reactions = [('d','b','a'),('c','d','g'),('b','c','f')] #(product, reactant, enzyme)

feasibilities = [0.3,0.1,0.7]

calculated_feasibilities =  list(zip(k_reactions, feasibilities)) #combine lists

calculated_feasibilities =  sorted(calculated_feasibilities,key=lambda t: t[1], reverse=True)

calculated_feasibilities[0][0][1]  # reactant of the most feasible reaction

'c'
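The chained indexing in this quick test returns a single reactant ('c') no matter what n is. Taking the first n entries and pulling index 1 (the reactant) from each reaction tuple returns the n best precursors instead; `n = 2` is chosen here only for illustration.

```python
k_reactions = [('d', 'b', 'a'), ('c', 'd', 'g'), ('b', 'c', 'f')]  # (product, reactant, enzyme)
feasibilities = [0.3, 0.1, 0.7]

calculated = sorted(zip(k_reactions, feasibilities), key=lambda t: t[1], reverse=True)

n = 2
best_single = calculated[0][0][1]                       # reactant of the single best reaction
top_n = [reaction[1] for reaction, _ in calculated[:n]] # reactants of the n best reactions
print(best_single, top_n)  # c ['c', 'b']
```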

In [42]:
def feasibility_model(reaction: tuple) -> float:
    """
    This function computes the feasibility (from 0 to 1) of a given reaction tuple
    of structure (product, reactant, enzyme).

    This is a BINARY classification problem (logistic regression is one candidate), so it is VERY important
    to train on failed reactions. Could the model used in compute_policy be augmented with negative data?

    Example:
        $ python __currently_nonexistent__.py or ``defunct example``

    Args:
        reaction (tuple): A triplet of format (product, reactant, enzyme)

    Returns:
        reaction_probability (float): The predicted probability (0 to 1) that the reaction is feasible

    Todo:
        Determine the ML method to use and update the function accordingly
    """
    reaction_probability = np.random.random()  # placeholder for fancy_calculation(reaction), maybe logistic regression

    return reaction_probability
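Since feasibility is framed as binary classification, here is a minimal logistic-regression sketch trained with batch gradient descent in NumPy. It assumes reactions have already been turned into fixed-length feature vectors (how to featurise (product, reactant, enzyme) is left open) and uses synthetic Gaussian data: label 1 stands for reported reactions and label 0 for failed or negative examples. `train_logistic` and `feasibility` are hypothetical names, not the notebook's eventual method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean binary cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad = p - y                    # derivative of the loss w.r.t. each example's logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def feasibility(x, w, b):
    """Predicted probability (0 to 1) that a featurised reaction is feasible."""
    return float(sigmoid(x @ w + b))


# Synthetic stand-in data: 1 = reported reaction, 0 = failed/negative example.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(50, 4))
X_neg = rng.normal(loc=-1.0, size=(50, 4))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), np.zeros(50)])

w, b = train_logistic(X, y)
print(feasibility(np.ones(4), w, b), feasibility(-np.ones(4), w, b))
```

Without the negative half of the data the classifier would have nothing to separate, which is why training on failed reactions matters.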

Finally, to estimate the position value, transformations are sampled from a third model during the rollout phase.
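During a rollout, transformations are sampled rather than taken greedily. The sketch below draws each step with probability proportional to the rollout model's scores using NumPy's `Generator.choice`. It simplifies the search by following a single precursor per step, and `policy` and `is_terminal` are hypothetical callables (for example, `is_terminal` might check whether a compound is assumed to be available).

```python
import numpy as np


def sample_transformation(transformations, rng):
    """Draw one (precursor, probability) pair with chance proportional to its probability."""
    probs = np.array([p for _, p in transformations], dtype=float)
    probs = probs / probs.sum()  # renormalise in case the scores do not sum to 1
    idx = rng.choice(len(transformations), p=probs)
    return transformations[idx]


def rollout(mol, policy, is_terminal, rng, max_depth=5):
    """
    Repeatedly sample a precursor from the rollout policy until a terminal state
    or max_depth is reached. Returns the sampled path of precursors.
    """
    path = []
    state = mol
    for _ in range(max_depth):
        if is_terminal(state):
            break
        state, _ = sample_transformation(policy(state), rng)
        path.append(state)
    return path


rng = np.random.default_rng(0)
toy_policy = lambda s: [("A", 0.7), ("B", 0.3)]  # hypothetical rollout model
print(rollout("target", toy_policy, lambda s: s in {"A", "B"}, rng))
```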

In [None]:
# TODO: quickly rank the remaining transformations

In [52]:
feasibility_model(2)

0.6333014021122939

In [72]:
np.random.rand()

0.7056744621298212