# ML Pipeline Code was built for the initial model training.
ML Pipeline Code was built for the initial model training detailed in "Integrated knowledge mining, genome-scale modeling, and machine learning for predicting *Yarrowia lipolytica* bioproduction".

### Part 2/4:
* Part 1: Imports the data, performs initial formatting, and splits the data into three sets for training, validation, and testing.
* Part 2: FBA feature generation is completed; script entitled "ML_pipeline_JC_part2"
* Part 3: Feature encoding is completed; script entitled "ML_pipeline_part3"
* Part 4: Machine learning model training is completed; script entitled "ML_pipeline_part4"
    
### Inputs:
* Pickle file: Train&ValidateData_part1.pickle or TESTData_part1.pickle, produced by Part 1.
* Data encoding file: the publication's supplemental file 'Supplemental Excel File 2- DataCharateristics & Encoding.xlsx'.

### Outputs:    

* A pickle data file named "Train&ValidateData_part2.pickle" or "TESTData_part2.pickle", saved at the end of the script.
    
#### Additional Info: This script can also be used for non-*Yarrowia lipolytica* GSMs and has been tested on *Rhodosporidium toruloides* and *Cutaneotrichosporon oleaginosus*.

### Note: 
The TEST and Train&Validate datasets must be run through this script separately, by changing the input file in the code cell below (cell 16). Running the script is memory intensive.
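The note above amounts to a two-way switch between input and output file names. A minimal sketch, assuming the run mode is chosen with a single boolean; the flag `RUN_TEST` and the helper `part2_io_paths` are hypothetical, and only the four file names come from this notebook:

```python
def part2_io_paths(run_test: bool) -> tuple[str, str]:
    """Return (Part 1 input pickle, Part 2 output pickle) for one run mode."""
    if run_test:
        return "TESTData_part1.pickle", "TESTData_part2.pickle"
    return "Train&ValidateData_part1.pickle", "Train&ValidateData_part2.pickle"


RUN_TEST = False  # hypothetical flag: set True for the held-out TEST run
in_path, out_path = part2_io_paths(RUN_TEST)
print(in_path, "->", out_path)
```

Keeping both names behind one flag means the input and output cannot get out of sync between the two runs.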

In [1]:
# This cell initializes the packages needed for the data import and FBA
import cobra
from cobra.flux_analysis import single_gene_deletion, single_reaction_deletion, double_gene_deletion,double_reaction_deletion
from cobra import Reaction, Metabolite, Model
from cobra.flux_analysis.loopless import add_loopless, loopless_solution
import pandas as pd
import pickle
from collections import defaultdict
import warnings
import numpy as np

In [2]:
####USER DEFINED OPTIONS. 0 for no, 1 for yes. During model training and testing, yes was used for all options.
#Do you want to perform GENETIC KNOCKOUTS?
KO_option = 1

#Do you want to perform OVEREXPRESSIONS
OE_option = 1

#Do you want to perform PRODUCT-BASED SIMULATIONS?
Product_option = 1

#OVEREXPRESSION and PRODUCT OPTIONS
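##default options = [0.05,10,1,0.75,0.02,0.5]. Assumed order: epsilon[i] is passed as ep<i> to performGeneOE/maximizeProduct.
# epsilon[0]: fractional change in flux for overexpressed reactions (0.05 = 5%)
# epsilon[1]: new bounds (+/-) for a reaction whose flux and bounds are all 0
# epsilon[2]: new bound for a reaction with 0 flux where only one bound is 0
# epsilon[3]: fraction of the default biomass flux at which biomass is fixed in product simulations
# epsilon[4]: referenced only in commented-out code in maximizeProduct
# epsilon[5]: new bound for a reaction with 0 flux where neither bound is 0
epsilon = [0.05,10,1,0.75,0.02,0.5]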
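A short sketch of how these options could be unpacked and what the first and fourth values do numerically. The position-to-name mapping (`epsilon[i]` as `ep<i>`) is inferred from the argument names of `performGeneOE` and `maximizeProduct`, and the default biomass value is hypothetical:

```python
import math

epsilon = [0.05, 10, 1, 0.75, 0.02, 0.5]

# Assumed mapping: list position i -> argument ep<i> of performGeneOE / maximizeProduct.
ep0, ep1, ep2, ep3, ep4, ep5 = epsilon

# Overexpression: a reaction with positive prior flux v gets lower_bound = v + v * ep0.
v = 2.0
oe_lower_bound = v + v * ep0
assert math.isclose(oe_lower_bound, 2.1)

# Product simulations: biomass is fixed at ep3 times the default biomass flux.
default_biomass = 0.4  # hypothetical default growth rate
fixed_biomass = default_biomass * ep3
print(f"OE lower bound: {oe_lower_bound:.3f}, fixed biomass: {fixed_biomass:.3f}")
```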

## The following cells define the functions used in this pipeline.

In [3]:
def metabolite_flux_balance(metabolite, solution):
    '''
    Return a vector of reaction fluxes scaled by the stoichiometric coefficient.

    Parameters
    ----------
    metabolite : cobra.Metabolite
        The metabolite whose fluxes are to be investigated.
    solution : cobra.Solution
        Solution with flux values.

    Returns
    -------
    pandas.Series
        A vector with fluxes of reactions that consume or produce the given
        metabolite scaled by the corresponding stoichiometric coefficients. The
        reaction identifiers are given by the index.
    '''
    rxn_ids = list()
    adj_flux = list()
    for rxn in metabolite.reactions:
        coef = rxn.get_coefficient(metabolite)
        rxn_ids.append(rxn.id)
        adj_flux.append(coef * solution.fluxes[rxn.id])
    return pd.Series(data=adj_flux, index=rxn_ids, dtype=float, name="reaction")
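`metabolite_flux_balance` multiplies each reaction's flux by the metabolite's stoichiometric coefficient in that reaction. At an FBA steady state (S v = 0) these scaled fluxes sum to zero. A dependency-free sketch of the same arithmetic on a hypothetical toy network, with plain dictionaries standing in for the cobra objects:

```python
def scaled_fluxes(coefficients, fluxes):
    """Return {reaction ID: coefficient * flux} for one metabolite.

    coefficients: {reaction ID: stoichiometric coefficient of the metabolite}
    fluxes: {reaction ID: flux from an FBA solution}
    """
    return {rxn: coef * fluxes[rxn] for rxn, coef in coefficients.items()}


# Hypothetical metabolite A: produced by R1 (+1), consumed by R2 (-1) and R3 (-2).
coefs_A = {"R1": 1.0, "R2": -1.0, "R3": -2.0}
fluxes = {"R1": 10.0, "R2": 4.0, "R3": 3.0}

balance = scaled_fluxes(coefs_A, fluxes)
print(balance)               # {'R1': 10.0, 'R2': -4.0, 'R3': -6.0}
print(sum(balance.values())) # 0.0, so A is mass-balanced
```

Positive entries are production and negative entries consumption, which is how the output of `metabolite_flux_balance` can be read as well.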

## Ensure the data encoding file ('Supplemental Excel File 2- DataCharateristics & Encoding.xlsx') is in the working directory.

In [15]:
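#Create dictionaries mapping common gene names to Genome-Scale-Model (GSM) gene names (with a flag for whether each gene is in each GSM)
##and central-carbon metabolite names to the metabolite IDs of several GSMs.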
def createGeneDict():
    """
    Return gene and metabolite dictionaries for performing model modifications

    Returns
    -------
    defaultdict
        a gene dict for generic gene names (GND1) to annotated gene names (YALI0B15598g)
    defaultdict
        a metabolite dict relating metabolite names (ATP) to the GEM annotation for that metabolite (atp[c]). The metabolite dict is needed for adding the product reaction.

    Requirements
    -------
    External excel file that contains the metabolites needed in the pseudo reaction and the resulting metabolite designations in the GEM and the generic gene names and model gene names.
    """

    # importExcelTemplateData(fileName,sheetName)
    productInfo = pd.ExcelFile('Supplemental Excel File 2- DataCharateristics & Encoding.xlsx').parse('Encoding')

    df = pd.DataFrame()
    df['bname'] = productInfo.bname
    df['traditionalName'] = productInfo.traditionalName
    df['iYLI647'] = productInfo.in_iYLI647
    df['iMK735'] = productInfo.in_iMK735
    df['iYali4'] = productInfo.in_iYali4
    df['iNL895'] = productInfo.in_iNL895
    df['iYL_2.0'] = productInfo['in_iYL_2.0']

    df = df.T
    df = df.rename(columns=df.loc['traditionalName'])
    df = df.drop('traditionalName')#,axis=0)
    geneDict = df.to_dict()


    df2 = pd.DataFrame()
    df2['CCM'] = productInfo['Central Carbon']
    df2['iYL_2.0'] = productInfo['iYL_2metabolites']
    df2['iYLI647'] = productInfo['iYLI647metabolites']
    df2['iNL895'] = productInfo['iNL895metabolites']
    df2['iMK735'] = productInfo['iMK735metabolites']
    df2['iYali4'] = productInfo['iYali4metabolites']
    df2['iRhtoC'] = productInfo['iRhtoC_metabolites']
    df2['Coleaginosus'] = productInfo['Coleaginous_metabolites']



    df2 = df2.T
    df2.rename(columns=df2.loc['CCM'],inplace=True)
    df2.drop('CCM',inplace=True)
    fbaModelMetaboliteDict = df2.to_dict()

    return(geneDict,fbaModelMetaboliteDict)
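After the transpose, `rename`, and `to_dict()` steps, `geneDict` is keyed by the common gene name, and each value maps `'bname'` and each GSM name to that row's entries (`fbaModelMetaboliteDict` has the same shape, keyed by the 'Central Carbon' name). A stdlib sketch of the resulting structure; `build_gene_dict` is a hypothetical name, and the example row is made up apart from the GND1 to YALI0B15598g pair from the docstring:

```python
# Hypothetical rows of the 'Encoding' sheet: the GND1/YALI0B15598g pair comes from the
# createGeneDict docstring, the 0/1 membership flags are made up.
rows = [
    {"traditionalName": "GND1", "bname": "YALI0B15598g", "in_iYLI647": 1,
     "in_iMK735": 1, "in_iYali4": 0, "in_iNL895": 1, "in_iYL_2.0": 1},
]

# Spreadsheet column -> key in geneDict, mirroring the df[...] assignments above.
COLUMN_TO_KEY = {"bname": "bname", "in_iYLI647": "iYLI647", "in_iMK735": "iMK735",
                 "in_iYali4": "iYali4", "in_iNL895": "iNL895", "in_iYL_2.0": "iYL_2.0"}


def build_gene_dict(rows):
    """Return {traditionalName: {'bname': model gene ID, <GSM name>: 0 or 1, ...}}."""
    return {row["traditionalName"]: {key: row[col] for col, key in COLUMN_TO_KEY.items()}
            for row in rows}


geneDict = build_gene_dict(rows)
print(geneDict["GND1"]["bname"])        # YALI0B15598g
print(geneDict["GND1"]["iYali4"] == 1)  # False: flagged as absent in this made-up row
```

This is the shape the later functions rely on when they test `geneDict[gene][GSM]==1` and read `geneDict[gene]['bname']`.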


In [5]:

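#Generate the gene-protein-reaction (GPR) dictionary for the Genome-Scale-Model in use

def generateOEGeneGPR(GSM,model):
    """
    Returns the reactions associated with each gene for a particular GSM.

    Parameters
    ----------
    GSM : str
        The name of the GSM currently being modified
    model : cobra.Model
        Loaded GSM.


    Returns
    -------
    defaultdict
        Genes and all associated reactions in the GSM. For the GSMs covered by geneDict (iMK735, iYali4, iYLI647, iYL_2.0, iNL895), keys are common gene names; for any other GSM, keys are model gene IDs.

    Uses the global geneDict created by createGeneDict().
    """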
    GPR_dict=defaultdict(list)
    if ((GSM=='iMK735') | (GSM=='iYali4') | (GSM=='iYLI647')| (GSM=='iYL_2.0')| (GSM=='iNL895')):
        for x in geneDict.keys():
            if (geneDict[x][GSM]==1):
                tempGene = model.genes.get_by_id(geneDict[x]['bname'])
                rxn_list=[]
                for reaction in tempGene.reactions:
                    temp_dict={}
                    temp_dict['mets']=[x.id for x in reaction.metabolites]
                    temp_dict['mets_coefs']=[x for x in reaction.get_coefficients(reaction.metabolites)]
                    temp_dict['lower_bound']=reaction.lower_bound
                    temp_dict['upper_bound']=reaction.upper_bound
                    temp_dict['id']=reaction.id
                    temp_dict['name']=reaction.name
                    temp_dict['subsystem']=reaction.subsystem
                    temp_dict['gpr']=reaction.gene_reaction_rule
                    rxn_list.append(temp_dict)
                    GPR_dict[x]=rxn_list
    else:
        print('No geneDict for ',GSM)
        for tempGene in model.genes:
            tempGene2 = model.genes.get_by_id(tempGene.id)
            rxn_list=[]
            for reaction in tempGene2.reactions:
                temp_dict={}
                temp_dict['mets']=[x.id for x in reaction.metabolites]
                temp_dict['mets_coefs']=[x for x in reaction.get_coefficients(reaction.metabolites)]
                temp_dict['lower_bound']=reaction.lower_bound
                temp_dict['upper_bound']=reaction.upper_bound
                temp_dict['id']=reaction.id
                temp_dict['name']=reaction.name
                temp_dict['subsystem']=reaction.subsystem
                temp_dict['gpr']=reaction.gene_reaction_rule
                rxn_list.append(temp_dict)
                GPR_dict[tempGene.id]=rxn_list
    return(GPR_dict)
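Both branches of `generateOEGeneGPR` store the same eight fields per reaction. A sketch of that record as one helper, using a hypothetical `ToyReaction` dataclass in place of `cobra.Reaction`; with a real cobra reaction, `reaction.metabolites` is a dict keyed by `Metabolite` objects, so the IDs would come from `m.id` as in the original. The GND values are illustrative only:

```python
from dataclasses import dataclass, field


@dataclass
class ToyReaction:
    """Hypothetical stand-in for cobra.Reaction with only the attributes read below."""
    id: str
    name: str
    subsystem: str
    gene_reaction_rule: str
    lower_bound: float
    upper_bound: float
    metabolites: dict = field(default_factory=dict)  # metabolite ID -> coefficient


def reaction_record(reaction):
    """Snapshot one reaction in the layout generateOEGeneGPR stores per gene."""
    return {
        "mets": list(reaction.metabolites),
        "mets_coefs": list(reaction.metabolites.values()),
        "lower_bound": reaction.lower_bound,
        "upper_bound": reaction.upper_bound,
        "id": reaction.id,
        "name": reaction.name,
        "subsystem": reaction.subsystem,
        "gpr": reaction.gene_reaction_rule,
    }


# Illustrative values only.
gnd = ToyReaction(id="GND", name="phosphogluconate dehydrogenase",
                  subsystem="Pentose phosphate pathway", gene_reaction_rule="YALI0B15598g",
                  lower_bound=0.0, upper_bound=1000.0,
                  metabolites={"6pgc_c": -1.0, "nadp_c": -1.0, "co2_c": 1.0,
                               "nadph_c": 1.0, "ru5p__D_c": 1.0})
GPR_dict = {"GND1": [reaction_record(gnd)]}
print(GPR_dict["GND1"][0]["mets"])
```

Factoring the record into one helper would let both branches of the original share a single definition of the fields.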



In [6]:
#Simulate default Genome-scale-model flux with biomass as objective function
def defaultObjFunction(dGSM):
    """
    Load the GSM, set the glucose uptake bounds (iRhtoC and Coleaginosus only; the other GSMs keep the bounds stored in their corrected .mat files), set biomass as the objective function, and simulate.

    Parameters
    ----------
    dGSM: str
        Name of the GSM to be loaded into cobra

    Returns
    -------
    model: cobra.Model
        The cobra GSM
    defaultFlux.objective_value: float
        Default biomass growth at the specified glucose uptake rate.
    defaultFlux: cobra.Solution
        Default flux solution optimized for biomass at the specified glucose uptake rate.

    Requirements
    -------
    .mat files for each GSM (loaded with cobra.io.load_matlab_model);
    corrected .mat GSMs are provided in the supplemental information.
    """

    if (dGSM=='iRhtoC'):
        defaultObj = 'BIOMASS_RT_CLIM'
        model = cobra.io.load_matlab_model(dGSM+'.mat')
        model.reactions.get_by_id('EX_glc__D_e').bounds = (-10,-10)


    elif (dGSM=='Coleaginosus'):
        defaultObj = 'Biomass_nitrogen_abundant'
        # defaultObj = 'Biomass_nitrogen_deletion'
        model = cobra.io.load_matlab_model('AF_7_iNP636_Coleaginosus_ATCC20509_corr.mat')
        model.reactions.get_by_id('r_51_exchange').bounds = (-10,-10)


    else:
        defaultObj = 'biomass_C'
        model = cobra.io.load_matlab_model(dGSM+'_corr.mat')
        
    model.objective = model.reactions.get_by_id(defaultObj)
    defaultFlux = model.optimize()
    return(model, defaultFlux.objective_value,defaultFlux)
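The branches in `defaultObjFunction` differ only in three values per model. A sketch of the same settings as a lookup table; the names `GSM_CONFIG`, `DEFAULT_CONFIG`, and `gsm_settings` are hypothetical, while the file names and reaction IDs are copied from the function above:

```python
# GSM name -> (model file, biomass objective ID, glucose exchange ID or None).
# None keeps the glucose bounds stored in the .mat file.
GSM_CONFIG = {
    "iRhtoC": ("iRhtoC.mat", "BIOMASS_RT_CLIM", "EX_glc__D_e"),
    "Coleaginosus": ("AF_7_iNP636_Coleaginosus_ATCC20509_corr.mat",
                     "Biomass_nitrogen_abundant", "r_51_exchange"),
}
DEFAULT_CONFIG = ("{gsm}_corr.mat", "biomass_C", None)  # all other GSMs
GLUCOSE_UPTAKE = (-10, -10)  # fixed glucose uptake bounds used above


def gsm_settings(gsm):
    """Return (model file, objective ID, glucose exchange ID or None) for a GSM name."""
    model_file, objective, glucose_rxn = GSM_CONFIG.get(gsm, DEFAULT_CONFIG)
    return model_file.format(gsm=gsm), objective, glucose_rxn


# With cobra installed, the settings would be applied as in defaultObjFunction:
#   model = cobra.io.load_matlab_model(model_file)
#   if glucose_rxn is not None:
#       model.reactions.get_by_id(glucose_rxn).bounds = GLUCOSE_UPTAKE
#   model.objective = objective
#   solution = model.optimize()
print(gsm_settings("iYL_2.0"))
```

Adding another organism would then mean adding one dictionary entry rather than another `elif` branch.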

In [7]:
def search(gene_list, gene):

    """
    Search the genome-scale model's gene list for an engineered gene of interest.

    Parameters
    ----------
    gene_list: list
        IDs of all genes in the genome-scale model.
    gene: str
        The engineered gene name, entered either as the cobra model gene ID or as a generic name provided in geneDict (see createGeneDict function).

    Returns
    -------
    (bool, str or None)
        (True, matching gene ID) if the gene is in gene_list, otherwise (False, None).
    """
    # Returning (False, None) on a miss also avoids an UnboundLocalError when gene_list is empty.
    for temp in gene_list:
        if temp == gene:
            return True, temp
    return False, None

In [8]:
def searchGeneDict(gene,GSM,GPR_dict,gene_list):
    """
    Extract the gene-reaction records from the selected genome-scale model for a particular gene.

    Parameters
    ----------
    gene: str
        The engineered gene name, entered either as the cobra model gene ID or as a generic name provided in geneDict (see createGeneDict function).
    GSM: str
        Genome-scale model name.
    GPR_dict: defaultdict
        Gene-protein-reaction dictionary, see generateOEGeneGPR.
    gene_list: list
        IDs of all genes in the genome-scale model.

    Returns
    -------
    The gene-associated reaction records; False if the engineered gene of interest is not in the GSM; or None if the first check fails without raising (for example, no records for the name in GPR_dict, or a geneDict flag for this GSM that is not 1).
    """
    try:
        #Check if there is externally provided generic name-genome scale gene name dict
        #Check if the generic name is the externally provided dictionary
        if (GPR_dict[gene] and geneDict[gene][GSM]==1):
            return(GPR_dict[gene])
    except Exception as e:
        # Search for the engineered gene of interest within the genome-scale model.
        # temp1 = True or False (whether the gene is in the model)
        # temp2 = matching gene ID in gene_list (meaningful only when temp1 is True)
        temp1,temp2 = search(gene_list,gene)
        if (temp1==True):
            return(GPR_dict[temp2])
        else:
            return(False)
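The comments in `searchGeneDict` describe a two-step lookup: first by common name through `geneDict`, then by model gene ID. A simplified sketch of that intended order with plain dicts; `lookup_gene_reactions` is a hypothetical name, and the sketch is not a line-for-line equivalent. Because the original indexes a `defaultdict`, a missing name yields `[]`, which short-circuits the `and` and returns `None` before the fallback is reached, and the lookup also inserts the missing key. Using `in` checks avoids both effects:

```python
def lookup_gene_reactions(gene, gsm, gene_dict, gpr_dict, model_gene_ids):
    """Return the reaction records for `gene`, or None if it cannot be placed in the GSM."""
    # Step 1: common name (e.g. 'GND1') that geneDict flags as present in this GSM.
    if gene_dict.get(gene, {}).get(gsm) == 1 and gene in gpr_dict:
        return gpr_dict[gene]
    # Step 2: the name is already a model gene ID (e.g. 'YALI0B15598g').
    if gene in model_gene_ids and gene in gpr_dict:
        return gpr_dict[gene]
    # Step 3: not found; the caller simulates without this gene.
    return None


gene_dict = {"GND1": {"bname": "YALI0B15598g", "iYL_2.0": 1}}
gpr_by_name = {"GND1": [{"id": "GND"}]}
gpr_by_id = {"YALI0B15598g": [{"id": "GND"}]}
print(lookup_gene_reactions("GND1", "iYL_2.0", gene_dict, gpr_by_name, set()))
print(lookup_gene_reactions("YALI0B15598g", "iYL_2.0", {}, gpr_by_id, {"YALI0B15598g"}))
```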


In [9]:
def performGeneKOs(modelKO,GSM,genesKO,geneMO):
    """
    Performs GSM gene knock-outs.

    Parameters
    ----------
    modelKO: cobra.Model
        Model on which to act
    GSM: str
        Name of the GSM that is being acted on
    genesKO: list
        Vector of 1 or 0 (as int or str) indicating whether each gene in geneMO is knocked out
    geneMO: list
        Vector of gene names that are being genetically modified

    Returns
    -------
    Modified GSM with the corresponding genetic knock-outs.

    Uses the global geneDict created by createGeneDict().
    """
    gene_list=[z.id for z in modelKO.genes]

    for i,KO in enumerate(genesKO):
        isKO = KO in ('1', 1)  # knockout flags may be stored as strings or integers
        try:
            if isKO and geneDict[geneMO[i]][GSM]==1:
                try:
                    cobra.manipulation.delete_model_genes(modelKO,(pd.Series(geneDict[geneMO[i]]['bname'])))
                except Exception as e:
                    print(repr(e),geneMO[i],GSM)

        except Exception as e:
            # geneMO[i] is not a common name in geneDict; try it as a model gene ID.
            temp1,temp2 = search(gene_list,geneMO[i])
            if (isKO and temp1==True):
                try:
                    cobra.manipulation.delete_model_genes(modelKO,(pd.Series(temp2)))
                except Exception as e:
                    print(repr(e),geneMO[i],GSM)
            else:
                print(geneMO[i],'not in GSM, no KO modification performed')
    return(modelKO)
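A sketch of how `performGeneKOs` selects deletion targets, assuming knockout flags may arrive as either strings or integers. The helper name `genes_to_knock_out` and the second gene ID are hypothetical; the actual deletion is left to `cobra.manipulation.delete_model_genes` as in the function above:

```python
def genes_to_knock_out(flags, gene_names, gene_dict, gsm, model_gene_ids):
    """Return the model gene IDs that would be deleted for one construct (sketch)."""
    targets = []
    for flag, name in zip(flags, gene_names):
        if flag not in ("1", 1):
            continue  # gene is modified but not knocked out
        entry = gene_dict.get(name)
        if entry is not None and entry.get(gsm) == 1:
            targets.append(entry["bname"])  # common name -> model gene ID
        elif name in model_gene_ids:
            targets.append(name)  # name is already a model gene ID
        # otherwise the gene is absent from this GSM and is skipped
    return targets


gene_dict = {"GND1": {"bname": "YALI0B15598g", "iYL_2.0": 1}}
model_gene_ids = {"YALI0B15598g", "YALI0HYPOTHETICALg"}  # second ID is made up
targets = genes_to_knock_out(["1", 0, 1, "1"],
                             ["GND1", "YALI0HYPOTHETICALg", "YALI0HYPOTHETICALg", "UNKNOWN1"],
                             gene_dict, "iYL_2.0", model_gene_ids)
print(targets)
```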

In [10]:
def performGeneOE(tempOEModel,GSM,genesOE,genesMO,hetGenes,tempKOSol,GPR_dict,ep0,OE_f,ep1,ep2,ep5,f1a,f2a,f3a,f4a,f5a,f6a):
    """
    Performs GSM model OE.

    Parameters
    ----------
    tempOEModel: cobra.Model
        Model on which to act
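    GSM: str
        Name of GSM that is being acted on
    genesOE: list
        Vector of 1 or 0 corresponding to whether the list of genes are overexpressed
    genesMO: list
        Vector of gene names that are being genetically modified
    hetGenes:
        Not used within this function.
    tempKOSol: cobra.Solution
        Flux solution for the GSM before overexpression is implemented, optimized for biomass growth
    GPR_dict: defaultdict
        Gene-reaction rules for each gene modified.
    ep0: float
        Fractional increase in flux magnitude for overexpressed reactions (e.g., 0.05 = 5%)
    OE_f: int
        Count of number of times the overexpression results in infeasible flux solutions
    ep1: float
        New bounds (-ep1, ep1) if the reaction's bounds in the non-overexpressed model are both 0. Used only in the commented-out zero-flux block.
    ep2: float
        New reaction bound if the non-overexpressed solution has zero flux and a lower or upper bound of 0. Used only in the commented-out zero-flux block.
    ep5: float
        New bound magnitude tried if the non-overexpressed solution has zero flux and neither bound is 0. Used only in the commented-out zero-flux block.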
    f1a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution of 0 with bounds set to 0.
    f2a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution of 0 with an upper bound set to 0.
    f3a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution of 0 with a lower bound set to 0.
    f4a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution that was negative.
    f5a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution that was positive.
    f6a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution with a 0, and fluxes were reset to original bounds (i.e, no resulting modifications).


    Returns
    -------
    tempOEModel: cobra.Model
        Model with the resulting overexpression implemented.
    OE_f: int
        Count of number of times the overexpression results in infeasible flux solutions
    f1a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution of 0 with bounds set to 0.
    f2a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution of 0 with an upper bound set to 0.
    f3a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution of 0 with a lower bound set to 0.
    f4a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution that was negative.
    f5a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution that was positive.
    f6a: int
        Count of number of times model fails to overexpress a reaction that had a prior flux solution with a 0, and fluxes were reset to original bounds (i.e, no resulting modifications).

    """
    gene_list=[z.id for z in tempOEModel.genes]  # IDs of all genes in the model being modified

    for i,OE in enumerate(genesOE):

        GPR_dict_list=None
        try:
            if (GPR_dict[genesMO[i]] and geneDict[genesMO[i]][GSM]==1):
                GPR_dict_list = GPR_dict[genesMO[i]]
        except Exception as e:

            if(OE=='1' and GPR_dict_list==None): #| (OE==1 and GPR_dict_list==None):
                GPR_dict_list = searchGeneDict(genesMO[i],GSM,GPR_dict,gene_list)# print(i,genesMO,OE)


        if isinstance(GPR_dict_list, list):  # searchGeneDict returns None or False when the gene is not found
            for rxn in GPR_dict_list:
                lower = tempOEModel.reactions.get_by_id(rxn['id']).lower_bound
                upper = tempOEModel.reactions.get_by_id(rxn['id']).upper_bound
                rxnKOFlux = tempKOSol.fluxes[rxn['id']]
                if rxnKOFlux>0:
                    brk=0
                    tempOEModel.reactions.get_by_id(rxn['id']).lower_bound = (rxnKOFlux+((rxnKOFlux)*ep0))
                    tempeps0 = ep0

                    while (tempOEModel.optimize().status!='optimal') and (brk<1):
                        tempeps0 = tempeps0/2
                        tempOEModel.reactions.get_by_id(rxn['id']).lower_bound = (rxnKOFlux+((rxnKOFlux)*tempeps0))
                        if tempeps0<1e-5:
                            tempOEModel.reactions.get_by_id(rxn['id']).lower_bound = rxnKOFlux
                            f5a+=1
                            brk=2
                            break


                elif rxnKOFlux<0:
                    brk=0
                    tempOEModel.reactions.get_by_id(rxn['id']).upper_bound = (rxnKOFlux+((rxnKOFlux)*ep0))
                    tempeps0 = ep0
                    while (tempOEModel.optimize().status!='optimal') and (brk<1):
                        tempeps0 = tempeps0/2
                        tempOEModel.reactions.get_by_id(rxn['id']).upper_bound = (rxnKOFlux+((rxnKOFlux)*tempeps0))
                        if tempeps0<1e-5:
                            f4a+=1
                            brk=2
                            tempOEModel.reactions.get_by_id(rxn['id']).upper_bound = rxnKOFlux
                            break

                # Uncomment the block below to also modify reactions whose prior flux is 0 (treated as a separate case).
                # '''
                # else:
                #     brk=0
                #     if (tempOEModel.reactions.get_by_id(rxn['id']).lower_bound==0 and tempOEModel.reactions.get_by_id(rxn['id']).upper_bound==0):
                #         tempOEModel.reactions.get_by_id(rxn['id']).bounds = (-ep1,ep1)
                #         # print(tempOEModel.reactions.get_by_id(rxn['id']).bounds,genesMO[i])
                #         if (tempOEModel.optimize().status!='optimal'):
                #             # print('1fail')
                #             f1a+=1
                #             tempOEModel.reactions.get_by_id(rxn['id']).bounds = (0,0)
                #
                #     elif (tempOEModel.reactions.get_by_id(rxn['id']).lower_bound==0):
                #
                #         tempOEModel.reactions.get_by_id(rxn['id']).lower_bound=ep2
                #         # if (tempOEModel.optimize().status=='optimal'):
                #         #     print('2success!')
                #         tempeps2 = ep2
                #         # if (tempOEModel.optimize().status!='optimal'):
                #         #     # print('2fail')
                #         while ((tempOEModel.optimize().status!='optimal') and (brk<1)):
                #             # print('2fail',genesMO[i])
                #             tempeps2 = tempeps2/10
                #             tempOEModel.reactions.get_by_id(rxn['id']).lower_bound=tempeps2
                #             # print('2fail',OE)
                #
                #             # print(tempeps2,OE)
                #
                #             if tempeps2<1e-5:#was -9
                #                 tempOEModel.reactions.get_by_id(rxn['id']).lower_bound=0
                #                 # print('total failure 2a')
                #                 # print(i,OE)
                #                 brk = 2
                #                 f2a+=1
                #                 break
                #             # if (tempOEModel.optimize().status=='optimal'):
                #         # print('2a success',tempeps2)
                #         # print(tempOEModel.reactions.get_by_id(rxn['id']).bounds,genesMO[i])
                #     elif (tempOEModel.reactions.get_by_id(rxn['id']).upper_bound==0):
                #
                #         tempOEModel.reactions.get_by_id(rxn['id']).upper_bound=-ep2
                #         tempeps2 = ep2
                #         # if (tempOEModel.optimize().status!='optimal'):
                #         #     print('3fail')
                #         while (tempOEModel.optimize().status!='optimal') and (brk<1):
                #             # print('3fail')
                #             tempeps2 = tempeps2/10
                #             tempOEModel.reactions.get_by_id(rxn['id']).upper_bound=-tempeps2
                #             if tempeps2<1e-5:#was -9
                #                 tempOEModel.reactions.get_by_id(rxn['id']).upper_bound=0
                #                 # print('total failure 3a')
                #                 brk=2
                #                 f3a+=1
                #                 break
                #
                #     else:
                #         lower = tempOEModel.reactions.get_by_id(rxn['id']).lower_bound
                #         upper = tempOEModel.reactions.get_by_id(rxn['id']).upper_bound
                #         tempeps5 = ep5
                #         tempOEModel.reactions.get_by_id(rxn['id']).lower_bound=tempeps5
                #         # if tempOEModel.optimize().status!='optimal)':
                #             # print('6 failure')
                #         brk2=0
                #         brk=0
                #         while (tempOEModel.optimize().status!='optimal') and brk<1:
                #             tempOEModel.reactions.get_by_id(rxn['id']).lower_bound=tempeps5
                #             tempeps5 = tempeps5/10
                #             if (tempeps5 < 1e-5) and (tempOEModel.optimize().status!='optimal'): #was-8
                #                 tempOEModel.reactions.get_by_id(rxn['id']).lower_bound=lower
                #                 tempeps5 = ep5
                #                 brk = 1
                #                 # print('one failed 6')
                #                 while (tempOEModel.optimize().status!='optimal') and brk2<1:
                #                     tempOEModel.reactions.get_by_id(rxn['id']).upper_bound=-tempeps5
                #                     tempeps5 = tempeps5/10
                #                     if (tempeps5 < 1e-5) and (tempOEModel.optimize().status!='optimal'):#was -8
                #                         tempOEModel.reactions.get_by_id(rxn['id']).upper_bound=upper
                #                         # print('both failed 6')
                #                         brk2 = 1
                #                         f6a+=1
                #                         break
                #     '''

        else:
            print('Gene:',genesMO[i],'not in Genome scale model, OE simulation performed without accounting for gene')
    tempOESol = tempOEModel.optimize()
    if tempOESol.status!='optimal':
        OE_f+=1
    return(tempOEModel,OE_f,f1a,f2a,f3a,f4a,f5a,f6a)
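For each reaction of an overexpressed gene, `performGeneOE` pushes the bound past the prior flux by a factor `1 + ep0` and halves the step until the model is feasible again, giving up once the step falls below 1e-5. A solver-free sketch of that search, where a hypothetical `is_feasible(bound)` callback stands in for `tempOEModel.optimize().status == 'optimal'`:

```python
def tighten_bound(flux, ep0, is_feasible, min_step=1e-5):
    """Sketch of performGeneOE's step-back search for one reaction.

    For positive flux the result is used as the new lower bound, for negative
    flux as the new upper bound; either way it equals flux * (1 + step), so the
    flux magnitude is forced up. Returns (bound, gave_up).
    """
    step = ep0
    bound = flux + flux * step
    while not is_feasible(bound):
        step /= 2
        bound = flux + flux * step
        if step < min_step:
            # Give up and pin the bound at the prior flux (counted in f5a / f4a).
            return flux, True
    return bound, False


# Hypothetical feasibility rule: the model tolerates a lower bound of at most 10.2.
bound, gave_up = tighten_bound(10.0, 0.05, lambda b: b <= 10.2)
print(bound, gave_up)
```

Reactions with zero prior flux fall through both branches and are left unchanged unless the commented-out block is restored.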

In [11]:
#product flux
def maximizeProduct(model,defaultBioObj,ep3,ep4,fbaModelMetaboliteDict,dataPoint,counterProductFailTemp,gsm,prod_f,isRbflvOption):
    """
    Adds the pseudo-reaction simulating product flux to the GSM, fixes the biomass flux at a set fraction of its default value, and optimizes for the pseudo-reaction.

    Parameters
    ----------
    model: cobra.Model
        Model on which to act (a copy is modified; the input model is left unchanged)
    defaultBioObj: float
        Prior model biomass objective value (before addition of the pseudo-reaction)
    ep3: float
        Fraction of defaultBioObj at which the biomass flux is fixed; lowered in steps of 0.05 if the problem is infeasible
    ep4: float
        Not used in the current implementation (referenced only in commented-out code)
    fbaModelMetaboliteDict: defaultdict
        Dictionary mapping names of metabolites to the model metabolite names (e.g., ATP to atp[c])
        Generated in "createGeneDict()" function
    dataPoint: int
        Index of the database construct (row label in FBATrainData)
    counterProductFailTemp: int
        Count of number of times the pseudo product reaction results in infeasible flux solution and a resulting decrease in the biomass constraint
    gsm: str
        The name of the GSM being modified
    prod_f: int
        Count of number of times that the product pseudo-reaction resulted in an infeasible flux solution with 0 biomass flux.
    isRbflvOption: int
        0 or 1 indicating whether the product is riboflavin; when it is not 0, ADP is not added as a product of the pseudo-reaction.

    Returns
    -------
    finalProductFluxSolnTemp: cobra.Solution
        Model flux solution from the resulting genetic engineering and product reaction implementation.
    counterProductFailTemp: int
        Count of number of times the pseudo product reaction results in infeasible flux solution and a resulting decrease in the biomass constraint
    prod_f: int
        Count of number of times that the product pseudo-reaction resulted in an infeasible flux solution with 0 biomass flux.

    Reads the ATP cost, NAD(P)H cost, and precursor requirements for dataPoint from the global DataFrame FBATrainData.
    """

    modelP=model.copy()
    # Fix the biomass flux at ep3 x default biomass. Both bounds are set together so
    # cobra never sees lower_bound > upper_bound mid-update. (ep4 is currently unused.)
    if (gsm=='iRhtoC'):
        modelP.reactions.get_by_id('BIOMASS_RT_CLIM').bounds = (defaultBioObj*ep3, defaultBioObj*ep3)

    elif (gsm=='Coleaginosus'):
        modelP.reactions.get_by_id('Biomass_nitrogen_abundant').bounds = (defaultBioObj*ep3, defaultBioObj*ep3)

    else:
        modelP.reactions.get_by_id('biomass_C').bounds = (defaultBioObj*ep3, defaultBioObj*ep3)

    stoichNADPH=round(FBATrainData.nadh_nadph_cost.loc[dataPoint])
    stoichATP=round(FBATrainData.atp_cost.loc[dataPoint])
    
    prec = FBATrainData.loc[dataPoint].central_carbon_precursor.strip().split(';')

    reaction__product = Reaction('Prdt_r')
    reaction__product.name = 'Prdt_r'
    reaction__product.subsystem = 'Exchange'
    reaction__product.lower_bound = 0
    reaction__product.upper_bound = 1000
    prdt_m = Metabolite('prdt_m', formula = '', name = 'Prdt_m', compartment = 'cy')

    stoichprecursor={}

    modelP.add_reactions([reaction__product])

    #adds energy and cofactors (NADPH only)
    if isRbflvOption==0:
        reaction__product.add_metabolites({
        prdt_m: 1.0,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['ATP'][gsm].strip('\'"')).id: -stoichATP,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['NADPH'][gsm].strip('\'"')).id: -stoichNADPH,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['NADP'][gsm].strip('\'"')).id : stoichNADPH,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['ADP'][gsm].strip('\'"')).id : stoichATP
        })
    else:
        reaction__product.add_metabolites({
        prdt_m: 1.0,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['ATP'][gsm].strip('\'"')).id: -stoichATP,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['NADPH'][gsm].strip('\'"')).id: -stoichNADPH,
        modelP.metabolites.get_by_id(fbaModelMetaboliteDict['NADP'][gsm].strip('\'"')).id : stoichNADPH,
        })

    if isinstance(FBATrainData.loc[dataPoint].precursor_required,str):
        stoichprecursor=FBATrainData.loc[dataPoint].precursor_required.strip().split(';')
    else:
        stoichprecursor[0]=FBATrainData.loc[dataPoint].precursor_required
    for i,j in enumerate(prec):
        met = fbaModelMetaboliteDict[j][gsm].strip('\'"')
        reaction__product.add_metabolites({modelP.metabolites.get_by_id(met).id: -round(float(stoichprecursor[i]))})
        if j=='Acetyl-CoA':
            # Acetyl-CoA consumption releases free coenzyme A
            reaction__product.add_metabolites({modelP.metabolites.get_by_id(fbaModelMetaboliteDict['CoenzymeA'][gsm].strip('\'"')).id: round(float(stoichprecursor[i]))})

    demand = modelP.add_boundary(modelP.metabolites.prdt_m,type="demand")

    modelP.objective = 'Prdt_r'
    finalProductFluxSolnTemp = modelP.optimize()


    # If the product problem is infeasible, lower the fixed biomass fraction by 0.05 and
    # re-solve until a feasible solution is found or the fraction drops below 0.
    c=1
    while finalProductFluxSolnTemp.status!='optimal':
        c+=1
        ep3-=.05
        # Set both bounds together: lowering upper_bound first would put it below lower_bound.
        if (gsm=='iRhtoC'):
            modelP.reactions.get_by_id('BIOMASS_RT_CLIM').bounds = (defaultBioObj*ep3, defaultBioObj*ep3)

        elif (gsm=='Coleaginosus'):
            modelP.reactions.get_by_id('Biomass_nitrogen_abundant').bounds = (defaultBioObj*ep3, defaultBioObj*ep3)

        else:
            modelP.reactions.get_by_id('biomass_C').bounds = (defaultBioObj*ep3, defaultBioObj*ep3)

        finalProductFluxSolnTemp = modelP.optimize()

        if ep3<0:
            counterProductFailTemp+=1
            prod_f+=1
            break

    return(finalProductFluxSolnTemp,counterProductFailTemp,prod_f)
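Two pieces of `maximizeProduct` can be checked without a solver: the stoichiometry of the `Prdt_r` pseudo-reaction and the biomass step-down. A sketch of both, assuming generic metabolite names instead of model IDs and a hypothetical `is_feasible(fraction)` callback in place of `modelP.optimize()`; the step-down re-solves after every decrement, which is what the `while` loop appears intended to do. Coefficients accumulate when a precursor repeats, matching cobra's default `combine=True` in `Reaction.add_metabolites`:

```python
def product_stoichiometry(atp_cost, nadph_cost, precursors, is_riboflavin=False):
    """Coefficients of the 'Prdt_r' pseudo-reaction keyed by generic metabolite name.

    precursors: (precursor name, required amount) pairs; amounts may be strings.
    """
    coefs = {
        "prdt_m": 1.0,
        "ATP": -round(atp_cost),
        "NADPH": -round(nadph_cost),
        "NADP": round(nadph_cost),
    }
    if not is_riboflavin:
        coefs["ADP"] = round(atp_cost)  # ADP is returned for non-riboflavin products only
    for name, amount in precursors:
        n = round(float(amount))
        coefs[name] = coefs.get(name, 0) - n  # precursor consumed
        if name == "Acetyl-CoA":
            coefs["CoenzymeA"] = coefs.get("CoenzymeA", 0) + n  # free CoA released
    return coefs


def relax_biomass(ep3, is_feasible, step=0.05):
    """Lower the fixed biomass fraction until the product problem is feasible.

    Returns (fraction, failed); failed is True once the fraction drops below 0,
    the case counted by prod_f and counterProductFailTemp.
    """
    fraction = ep3
    while not is_feasible(fraction):
        fraction -= step
        if fraction < 0:
            return fraction, True
    return fraction, False


coefs = product_stoichiometry(2.4, 1.6, [("Acetyl-CoA", "3")])
print(coefs)
fraction, failed = relax_biomass(0.75, lambda f: f <= 0.62)
print(round(fraction, 2), failed)
```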


In [12]:
def FBAFeatureExtraction(featureModelSoln,GSM):
    """
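    Extracts key fluxes from the model solution with the pseudo-product reaction and genetic modifications implemented.

    Parameters
    ----------
    featureModelSoln: cobra.Solution
        Flux solution generated with the appropriate pseudo-product reaction and genetic modifications implemented.
    GSM: str
        Name of the GSM.

    Returns
    -------
    float
        Flux values for the following features:
        EMP2 (glycolysis, GAPDH flux / 2)
        PPP2 (pentose phosphate pathway)
        TCA2 (TCA cycle, citrate synthase flux)
        NADPH2 (NADPH production)
        ATP2 (ATP production)
        NADH2 (NADH production)
        PrdtFlux2 (product pseudo-reaction)
        bio2 (biomass)
        O2 (oxygen exchange)
        Glc (glucose exchange)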
    """
    ratio =[1,1,1,1,1,1,1]

    if (GSM=='iNL895'):
        
        bio2 = featureModelSoln.fluxes['biomass_C']
        EMP2 = featureModelSoln.fluxes['r_0525']/2 #GAPDH; PFK = r_0859; FBP(a) = r_0484;
        PPP2 = featureModelSoln.fluxes['r_0862'] # GND , RPE, RPI = r_0964,r_0963
        TCA2 = featureModelSoln.fluxes['r_0328'] # CS
        NADPH2 = featureModelSoln.fluxes['r_0862']+featureModelSoln.fluxes['r_0501']+featureModelSoln.fluxes['r_0631']+featureModelSoln.fluxes['r_0630']+featureModelSoln.fluxes['r_0191']+featureModelSoln.fluxes['r_0261']+featureModelSoln.fluxes['r_0262']+featureModelSoln.fluxes['r_0913']

        if (featureModelSoln.fluxes['r_0744']>0):
            NADPH2 = NADPH2 + featureModelSoln.fluxes['r_0744']


        NADH2 = featureModelSoln.fluxes['r_0525']+featureModelSoln.fluxes['r_0940']+featureModelSoln.fluxes['r_0689']+featureModelSoln.fluxes['r_0864']+featureModelSoln.fluxes['r_0706']+featureModelSoln.fluxes['r_0538']

        ATP2 = featureModelSoln.fluxes['r_0941']+featureModelSoln.fluxes['r_0246']
        if (featureModelSoln.fluxes['r_0865']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['r_0865']
        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']
        O2 = featureModelSoln.fluxes['r_128_exchange'] 
        Glc = featureModelSoln.fluxes['r_51_exchange']

    elif (GSM=='iMK735'):
        bio2 = featureModelSoln.fluxes['biomass_C']
        EMP2 = featureModelSoln.fluxes['GAPD']/2 # PGI; PFK
        PPP2 = featureModelSoln.fluxes['GND'] # GND 
        TCA2 = featureModelSoln.fluxes['CSm'] # FUMm

        NADH2 = featureModelSoln.fluxes['GAPD']+featureModelSoln.fluxes['PDHm']+featureModelSoln.fluxes['PGCD']+featureModelSoln.fluxes['MDHm']+featureModelSoln.fluxes['PDHcm']+featureModelSoln.fluxes['ICDHxm']
        ATP2 = featureModelSoln.fluxes['ATPS3m']+featureModelSoln.fluxes['PYK']
        if (featureModelSoln.fluxes['PGK']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['PGK']
        if (featureModelSoln.fluxes['SUCOASm']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['SUCOASm']
        NADPH2 = featureModelSoln.fluxes['GND']+featureModelSoln.fluxes['G6PDH2']+featureModelSoln.fluxes['ICDHy']+featureModelSoln.fluxes['ICDHym']+featureModelSoln.fluxes['SSALy']+featureModelSoln.fluxes['C3STDH2']+featureModelSoln.fluxes['PPND2']+featureModelSoln.fluxes['C3STDH1']
        if (featureModelSoln.fluxes['MTHFDm']>0):
            NADPH2 = NADPH2 + featureModelSoln.fluxes['MTHFDm']
        if (featureModelSoln.fluxes['MTHFD']>0):
            NADPH2 = NADPH2 + featureModelSoln.fluxes['MTHFD']

        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']
        O2 = featureModelSoln.fluxes['EX_o2(e)']
        Glc = featureModelSoln.fluxes['EX_glc(e)']

    elif (GSM=='iYali4'):
        bio2 = featureModelSoln.fluxes['biomass_C']
        EMP2 = featureModelSoln.fluxes['486']/2 # GAPDH
        PPP2 = abs(featureModelSoln.fluxes['982'])+abs(featureModelSoln.fluxes['984']) #no flux through GND; #889 for growth on 10 mmol glucose
        TCA2 = featureModelSoln.fluxes['300']#300=CS 
        NADPH2 = featureModelSoln.fluxes['659']+featureModelSoln.fluxes['732'] +featureModelSoln.fluxes['939']+featureModelSoln.fluxes['889']+featureModelSoln.fluxes['234']+featureModelSoln.fluxes['466']+featureModelSoln.fluxes['235']#718 is a malic enzyme (NADP... KO)
        if (featureModelSoln.fluxes['2131']>0):
            NADPH2 = NADPH2 +featureModelSoln.fluxes['2131']
        if (featureModelSoln.fluxes['732']>0):
            NADPH2 = NADPH2 +featureModelSoln.fluxes['732']
        NADH2 = featureModelSoln.fluxes['713']+featureModelSoln.fluxes['486']+featureModelSoln.fluxes['961']+featureModelSoln.fluxes['505']+featureModelSoln.fluxes['891']+featureModelSoln.fluxes['165']
        ATP2 = featureModelSoln.fluxes['226']+featureModelSoln.fluxes['962']
        if (featureModelSoln.fluxes['1022']>0):
            ATP2 = ATP2 + featureModelSoln.fluxes['1022']
        if (featureModelSoln.fluxes['yli0039']>0):
            ATP2 = ATP2 + featureModelSoln.fluxes['yli0039']
        if (featureModelSoln.fluxes['892']>0):
            ATP2 = ATP2 + featureModelSoln.fluxes['892']
        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']
        O2 = featureModelSoln.fluxes['1992']
        Glc = featureModelSoln.fluxes['1714']

    elif (GSM=='iYL_2.0'):
        bio2 = featureModelSoln.fluxes['biomass_C']
        #KO R1379 (NADH requiring) GND
        EMP2 = featureModelSoln.fluxes['R0367']/2 #GAPDH 
        PPP2 = featureModelSoln.fluxes['R0483'] #GND: 
        TCA2 = featureModelSoln.fluxes['R0780'] # #CS ->R0780
        NADPH2 = featureModelSoln.fluxes['R0483']+featureModelSoln.fluxes['R0490']+featureModelSoln.fluxes['R1461']+featureModelSoln.fluxes['R0568']+featureModelSoln.fluxes['R1505']+featureModelSoln.fluxes['R0742']#maybe delete no rxn name/enzymes

        if (featureModelSoln.fluxes['R0454']>0):
            NADPH2 = NADPH2+featureModelSoln.fluxes['R0454']
        NADH2 = featureModelSoln.fluxes['R0367']+featureModelSoln.fluxes['R1433']+featureModelSoln.fluxes['R0361']+featureModelSoln.fluxes['R0285']+featureModelSoln.fluxes['R0778']#R0778--> no name
        ATP2 = featureModelSoln.fluxes['R0382']+featureModelSoln.fluxes['ATPm']
        if (featureModelSoln.fluxes['R0369']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['R0369']
        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']
        O2 = featureModelSoln.fluxes['R1204']
        Glc = featureModelSoln.fluxes['R1294']

    elif (GSM=='iYLI647'):
        bio2 = featureModelSoln.fluxes['biomass_C']
        EMP2 = featureModelSoln.fluxes['GAPD']/2 # FBA,PFK
        PPP2 = featureModelSoln.fluxes['GND'] # GND 646
        TCA2 = featureModelSoln.fluxes['CSm'] # FUMm
        NADPH2 = featureModelSoln.fluxes['GND']+featureModelSoln.fluxes['G6PDH2']+featureModelSoln.fluxes['ICDHy']+featureModelSoln.fluxes['ICDHym']
        if (featureModelSoln.fluxes['MTHFDm']>0):
            NADPH2 = NADPH2 + featureModelSoln.fluxes['MTHFDm']
        if (featureModelSoln.fluxes['MTHFD']>0):
            NADPH2 = NADPH2 + featureModelSoln.fluxes['MTHFD']
        NADH2 = featureModelSoln.fluxes['GAPD']+featureModelSoln.fluxes['PDHm']+featureModelSoln.fluxes['PGCD']+featureModelSoln.fluxes['MDHm']+featureModelSoln.fluxes['ICDHxm']+featureModelSoln.fluxes['PDHm']
        ATP2 = featureModelSoln.fluxes['ATPS3m']+featureModelSoln.fluxes['PYK']
        if (featureModelSoln.fluxes['PGK']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['PGK']
        if (featureModelSoln.fluxes['SUCOASm']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['SUCOASm']
        if (featureModelSoln.fluxes['FACOAL140']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['FACOAL140']
        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']
        O2 = featureModelSoln.fluxes['EX_o2(e)']
        Glc = featureModelSoln.fluxes['EX_glc(e)']


    elif (GSM=='iRhtoC'):
        #correcting factors: b = iYLI647 model, a = iRhtoC
        b = [58.34689466586959,13.957598903648,5.34306538257682,3.1634429463870,6.29776446133316,-14.9890947997967,1.1398166174414]
        a = [79.4568002191417,12.2557128210244,5.49512443402363,10.4126080222197,7.35528490447832,-25.0946831319152,0.762068068059932]
        
        ratio = np.array(b)/np.array(a)  # element-wise; plain Python lists do not support "/"

        bio2 = featureModelSoln.fluxes['BIOMASS_RT_CLIM']
        EMP2 = featureModelSoln.fluxes['GAPD_c']/2 #
        TCA2 = featureModelSoln.fluxes['CS_m'] # 
        PPP2 = featureModelSoln.fluxes['GND_c'] # 


        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']

        O2 = featureModelSoln.fluxes['EX_o2_e']
        Glc = featureModelSoln.fluxes['EX_glc__D_e']


        NADPH2 = featureModelSoln.fluxes['GND_c']+featureModelSoln.fluxes['G6PDH2i_c']+featureModelSoln.fluxes['GLYCDy_c']+featureModelSoln.fluxes['PPND2_c']+featureModelSoln.fluxes['ICDHyi_m']+featureModelSoln.fluxes['C3STDH2_c']+featureModelSoln.fluxes['SSALy_c']+featureModelSoln.fluxes['NADHK1_c']
        if (featureModelSoln.fluxes['MTHFD_c']>0):
            NADPH2 = NADPH2+featureModelSoln.fluxes['MTHFD_c']
        if (featureModelSoln.fluxes['MTHFD_m']>0):
            NADPH2 = NADPH2+featureModelSoln.fluxes['MTHFD_m']

        NADH2 = []

        ATP2 = featureModelSoln.fluxes['ATPS_m']+featureModelSoln.fluxes['PGK_c']+featureModelSoln.fluxes['PYK_c']+featureModelSoln.fluxes['SUCOAS_m']

        if (featureModelSoln.fluxes['FTHFL_m']<0):
            ATP2 = ATP2 - featureModelSoln.fluxes['FTHFL_m']


    elif (GSM=='Coleaginosus'):
        #correcting factors: b = iYLI647 model, a = Coleaginosus
        b = [58.34689466586959,13.957598903648,5.34306538257682,3.1634429463870,6.29776446133316,-14.9890947997967,1.1398166174414]
        a = [55.111098031936024,23.3388653805091,0.019541971214015,0.044508103441065,9.94040764057248,-0.473220950832563,0.051106054947008]
        ratio = np.array(b)/np.array(a)  # element-wise; plain Python lists do not support "/"
        bio2 = featureModelSoln.fluxes['Biomass_nitrogen_abundant']
        EMP2 = featureModelSoln.fluxes['r_0525']/(2) #
        TCA2 = featureModelSoln.fluxes['r_0328'] # 
        PPP2 = featureModelSoln.fluxes['r_0963'] # 
        O2 = featureModelSoln.fluxes['r_128_exchange']
        Glc = featureModelSoln.fluxes['r_51_exchange']

        Precursors2 = {}
        PrdtFlux2 = featureModelSoln.fluxes['Prdt_r']
        ATP2 = featureModelSoln.fluxes['r_0246']


        if (featureModelSoln.fluxes['r_0857']<0): #
            ATP2 = ATP2 - featureModelSoln.fluxes['r_0857'] 
        if (featureModelSoln.fluxes['r_0865']>0): #
            ATP2 = ATP2 + featureModelSoln.fluxes['r_0865'] 
        if (featureModelSoln.fluxes['r_1006']>0): #
            ATP2 = ATP2 - featureModelSoln.fluxes['r_1006'] 
        NADH2 = []

        NADPH2 = featureModelSoln.fluxes['r_0191']+featureModelSoln.fluxes['r_0630']+featureModelSoln.fluxes['r_0352']+featureModelSoln.fluxes['r_1004']


        if (featureModelSoln.fluxes['r_0631']>0): 
            NADPH2 = NADPH2 + featureModelSoln.fluxes['r_0631']

    return(EMP2/ratio[4],PPP2/ratio[2],TCA2/ratio[3],NADPH2/ratio[1],ATP2/ratio[0],NADH2,PrdtFlux2,bio2/ratio[6],O2/ratio[5],Glc)
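Every branch of `FBAFeatureExtraction` follows the same pattern: it reads fluxes by reaction ID from `featureModelSoln.fluxes` (in cobrapy, a pandas Series indexed by reaction ID), sums a fixed list of producing reactions, adds some reactions only when their flux is positive, and, for the non-Yarrowia models, divides each feature by the matching entry of `ratio = b/a`. The sketch below reproduces that pattern on a hand-made flux Series. The reaction IDs (`R_A` to `R_D`), flux values, and the names `summed_feature` and `FEATURE_RATIO_INDEX` are hypothetical; only `b` and `a` are copied from the iRhtoC branch.

```python
import numpy as np
import pandas as pd

# Hypothetical flux vector standing in for featureModelSoln.fluxes
fluxes = pd.Series({'R_A': 2.0, 'R_B': 1.5, 'R_C': -0.5, 'R_D': 0.8})

def summed_feature(fluxes, always, if_positive=()):
    """Sum the fluxes in `always`; add each reaction in `if_positive`
    only when its flux is > 0 (the direction that produces the cofactor)."""
    total = sum(fluxes[r] for r in always)
    for r in if_positive:
        if fluxes[r] > 0:
            total += fluxes[r]
    return total

# R_C is negative, so only R_D is added: 2.0 + 1.5 + 0.8
nadph = summed_feature(fluxes, always=['R_A', 'R_B'], if_positive=['R_C', 'R_D'])

# Position of each feature in the correction vectors, as used in the return statement
FEATURE_RATIO_INDEX = {'ATP': 0, 'NADPH': 1, 'PPP': 2, 'TCA': 3,
                       'EMP': 4, 'O2': 5, 'biomass': 6}

# b and a copied from the iRhtoC branch
b = [58.34689466586959, 13.957598903648, 5.34306538257682, 3.1634429463870,
     6.29776446133316, -14.9890947997967, 1.1398166174414]
a = [79.4568002191417, 12.2557128210244, 5.49512443402363, 10.4126080222197,
     7.35528490447832, -25.0946831319152, 0.762068068059932]

# Convert to arrays first: b / a on plain lists raises TypeError
ratio = np.array(b) / np.array(a)

corrected_nadph = nadph / ratio[FEATURE_RATIO_INDEX['NADPH']]
print(nadph, corrected_nadph)
```

Keeping the index mapping in one dictionary makes it harder to divide a feature by the wrong correction factor when a new GSM branch is added.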

## Code.
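The next cell reads each strain's genetic modifications from semicolon-separated text columns (`genes_modified_updated`, `gene_deletion`, `gene_overexpression`, `heterologous_gene`) and splits them with `.strip().split(';')`. A minimal sketch of pairing two of these fields to list the overexpressed genes follows. It assumes, without confirmation from the source, that `gene_overexpression` holds one `0`/`1` flag per entry of `genes_modified_updated`; the example strings and the helper names `split_field` and `overexpressed_genes` are hypothetical. Unlike the pipeline, the helper also strips whitespace around each token.

```python
# Hypothetical row values; the real columns come from the Part 1 pickle
genes_modified_updated = 'DGA1; ACC1 ;MFE1'
gene_overexpression = '1;1;0'

def split_field(text):
    """Split a semicolon-separated cell into whitespace-stripped tokens."""
    return [token.strip() for token in text.strip().split(';')]

def overexpressed_genes(modified, oe_flags):
    """Return genes whose matching flag is '1' (assumes parallel lists)."""
    genes, flags = split_field(modified), split_field(oe_flags)
    if len(genes) != len(flags):
        raise ValueError('gene and flag lists have different lengths')
    return [g for g, f in zip(genes, flags) if f == '1']

print(overexpressed_genes(genes_modified_updated, gene_overexpression))
```

Checking that the two lists have the same length catches rows where the annotation and the flags disagree before they reach the model simulation.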

In [16]:
#fluxesToExtract
EMP,PPP,TCA,NADPH,NADH,ATP,PrdtFlux,PrdtYield,bio,O2uptake,Glcuptake = {},{},{},{},{},{},{},{},{},{},{}


FBATrainData = pd.DataFrame()

# Select the input data set; Train & Validate and TEST must be run separately.
# Train & validate data
inputFile = 'Train&ValidateData_part1.pickle'
# Test data
# inputFile = 'TESTData_part1.pickle'
with open(inputFile, 'rb') as f:
    Data = pickle.load(f)
FBATrainData = Data[0]


geneDict,fbaModelMetaboliteDict = createGeneDict()
counterProductFail=0
workingData2 = pd.DataFrame()
output=pd.DataFrame()
FBA_models = ['iYLI647'] #['iNL895','iMK735','iYL_2.0','iYali4','Coleaginosus','iRhtoC']

for GSM in FBA_models:
    prod_fail = 0
    OE_fail=0
    defaultModel, defaultObj, defaultFluxSol = defaultObjFunction(GSM)
    print(defaultObj,GSM)
    ##generates GPR dict for each gene in model 
    GPR_dict = generateOEGeneGPR(GSM,defaultModel)

    #counters for the number of instances examined, whether OE or KO fails, and where the failure occurred
    counter=0
    counterOEFail=0
    counterKOFail=0
    fail1 = 0
    fail2 = 0
    fail3 = 0
    fail4 = 0
    fail5 = 0
    fail6 = 0

    for dataPoint in FBATrainData.index:

        counter+=1
        modelKO = defaultModel.copy()


        ############### Determine if KO, GE instances, perform model simulation ############################
        #Are there gene Knock-outs?
        if (FBATrainData.loc[dataPoint].number_genes_deleted!=0 and KO_option==1):
            #get gene KO data
            tempGenesModified = FBATrainData.genes_modified_updated[dataPoint].strip().split(';')
            tempKO = FBATrainData.gene_deletion[dataPoint].strip().split(';')

            #perform model KO
            modelKO = performGeneKOs(modelKO,GSM,tempKO,tempGenesModified)
            tempKOSol = modelKO.optimize()

            #Did the model produce an infeasible solution? Yes - revert to the default solution
            if tempKOSol.status!='optimal':
                print('geneKO growth failed')
                sim_grw_flag=0
                tempKOSol = defaultFluxSol
                defaultKOBioObj = defaultFluxSol.objective_value
                forPrdtModel = defaultModel.copy()
            else:
                defaultKOBioObj = tempKOSol.objective_value
                forPrdtModel = modelKO.copy()

            #Are there also gene overexpressions? (after KO)
            if (FBATrainData.loc[dataPoint].number_native_genes_overexp!=0 and OE_option==1):
                #get gene overexpression data, heterologous genes
                tempGenesOE = FBATrainData.loc[dataPoint].gene_overexpression.strip().split(';')
                tempHetGenes = FBATrainData.loc[dataPoint].heterologous_gene.strip().split(';')

                #perform model overexpression
                modelKO,OE_fail,fail1,fail2,fail3,fail4,fail5,fail6 = performGeneOE(modelKO,GSM,tempGenesOE,tempGenesModified,tempHetGenes,tempKOSol,GPR_dict,epsilon[0],OE_fail,epsilon[1],epsilon[2],epsilon[5],fail1,fail2,fail3,fail4,fail5,fail6)

                #perform OE FBA analysis with Biomass as objective
                tempOESol = modelKO.optimize()

                #Did the model produce an infeasible solution? Yes-Keep default (KO) soln
                if tempOESol.status!='optimal':
                    counterKOFail+=1
                    print('KO & OE optimizing failed:',counterKOFail, 'OE genes', [g for g, oe in zip(tempGenesModified, tempGenesOE) if oe.strip()=='1'],dataPoint)
                else:
                    tempKOSol = tempOESol
                    #if infeasible, keep KO copy only, else take new model
                    forPrdtModel = modelKO.copy()

                tempOEdefaultBiomass = tempKOSol.objective_value
                #tempKOSol now holds the KO (+OE) solution; the product
                #objective is applied later in maximizeProduct()
            centCarbPrecursor = FBATrainData.loc[dataPoint].central_carbon_precursor.strip().split(';')

            cobra.manipulation.undelete_model_genes(defaultModel)

            dataPointFBASol = tempKOSol

        #There are no genetic Knock-outs, but are there gene overexpressions?
        elif (FBATrainData.loc[dataPoint].number_native_genes_overexp!=0 and OE_option==1):
            #Get gene overexpression data
            tempGenesModified = FBATrainData.loc[dataPoint].genes_modified_updated.strip().split(';')

            # tempGenesModified
            tempGenesOE = FBATrainData.loc[dataPoint].gene_overexpression.strip().split(';')
            tempHetGenes = FBATrainData.loc[dataPoint].heterologous_gene.strip().split(';')
            if dataPoint==343:
                tempGenesModified=tempGenesModified[0:3]
                tempGenesOE=tempGenesOE[0:3]


            modelOE,OE_fail,fail1,fail2,fail3,fail4,fail5,fail6 = performGeneOE(modelKO,GSM,tempGenesOE,tempGenesModified,tempHetGenes,defaultFluxSol,GPR_dict,epsilon[0],OE_fail,epsilon[1],epsilon[2],epsilon[5],fail1,fail2,fail3,fail4,fail5,fail6)
            tempOESol = modelOE.optimize()

            #Did the model produce an infeasible solution? Yes-revert to default soln
            if tempOESol.status!='optimal':
                counterOEFail+=1
                print('OE optimizing failed:',counterOEFail,[g for g, oe in zip(tempGenesModified, tempGenesOE) if oe.strip()=='1'], dataPoint)
                tempOESol = defaultFluxSol
                forPrdtModel = defaultModel.copy()
            else:
                forPrdtModel = modelOE.copy()

            tempOEdefaultBiomass = tempOESol.objective_value

            dataPointFBASol = tempOESol

        #There are no genetic modifications
        else:
            dataPointFBASol = noGeneticMOSol = defaultFluxSol
            forPrdtModel = defaultModel.copy()

        if FBATrainData.loc[dataPoint].product_name == 'Riboflavin':
            isRbflv=1
        else:
            isRbflv=0


        if Product_option == 1:
            finalProdFluxSoln,counterProductFail,prod_fail = maximizeProduct(forPrdtModel,dataPointFBASol.objective_value,epsilon[3],epsilon[4],fbaModelMetaboliteDict,dataPoint,counterProductFail,GSM,prod_fail,isRbflv)

            EMP[dataPoint], PPP[dataPoint], TCA[dataPoint], NADPH[dataPoint], ATP[dataPoint], NADH[dataPoint], PrdtFlux[dataPoint],bio[dataPoint],O2uptake[dataPoint],Glcuptake[dataPoint] = FBAFeatureExtraction(finalProdFluxSoln,GSM)
        else:
            EMP[dataPoint], PPP[dataPoint], TCA[dataPoint], NADPH[dataPoint], ATP[dataPoint], NADH[dataPoint],PrdtFlux[dataPoint],bio[dataPoint],O2uptake[dataPoint],Glcuptake[dataPoint] = FBAFeatureExtraction(dataPointFBASol,GSM)
        PrdtYield[dataPoint] = PrdtFlux[dataPoint]*FBATrainData.loc[dataPoint].mw_Lipids/1000

        #Precursors[dataPoint]

        if (counter%50)==0:
            print(counter)
    print(OE_fail,'OE failures')
    print(prod_fail,'Prod failures')
    print(fail1,fail2,fail3,fail4,fail5,fail6,'failure cases 1-6')

    workingData2['EMP_'+GSM]=pd.Series(EMP)
    workingData2['PPP_'+GSM]=pd.Series(PPP)
    workingData2['TCA_'+GSM]=pd.Series(TCA)
    workingData2['NADPH_'+GSM]=pd.Series(NADPH)
    workingData2['ATP_'+GSM]=pd.Series(ATP)
    workingData2['NADH_'+GSM]=pd.Series(NADH)
    workingData2['PrdtFlux_'+GSM]=pd.Series(PrdtFlux)
    workingData2['PrdtYield_'+GSM]=pd.Series(PrdtYield)
    workingData2['Biomass_'+GSM]=pd.Series(bio)
    workingData2['O2Uptake_'+GSM]=pd.Series(O2uptake)
    workingData2['GlcUptake_'+GSM]=pd.Series(Glcuptake)




1.1398166174414033 iYLI647
Gene: LUP1 not in Genome scale model, OE simulation performed without accounting for gene
Gene: CYP716A180 not in Genome scale model, OE simulation performed without accounting for gene
Gene: ljCPR not in Genome scale model, OE simulation performed without accounting for gene
Gene: aaLIS not in Genome scale model, OE simulation performed without accounting for gene
Gene: rtTAL not in Genome scale model, OE simulation performed without accounting for gene
Gene: pc4CL not in Genome scale model, OE simulation performed without accounting for gene
Gene: phCHS not in Genome scale model, OE simulation performed without accounting for gene
Gene: msCHI not in Genome scale model, OE simulation performed without accounting for gene
Gene: ecACS not in Genome scale model, OE simulation performed without accounting for gene
Gene: xdCrtI not in Genome scale model, OE simulation performed without accounting for gene
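In the loop above, the product-yield feature is computed from the simulated product flux as `PrdtFlux * mw_Lipids / 1000`. Assuming the usual COBRA flux units of mmol/gDW/h and a molecular weight in g/mol (the pipeline does not state the units), dividing by 1000 converts mmol to mol, so the result is a mass rate in g/gDW/h. A small worked example with made-up numbers; `product_yield` is a hypothetical helper:

```python
def product_yield(prdt_flux_mmol_per_gdw_h, mw_g_per_mol):
    """Convert a product flux (mmol/gDW/h) to a mass rate (g/gDW/h)."""
    return prdt_flux_mmol_per_gdw_h * mw_g_per_mol / 1000

# Hypothetical values: 0.05 mmol/gDW/h of a product with MW 872 g/mol
rate = product_yield(0.05, 872.0)
print(round(rate, 4))  # 0.0436
```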

In [17]:
#Concat the FBA data ("workingData2") and the database info ("FBATrainData")
T = workingData2.copy()
# T.drop('Product_titer(g/L)',inplace=True,axis=1)
# T.drop('product_name',inplace=True,axis=1)
output = FBATrainData.copy()
output = pd.concat([output,T],axis=1)
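`pd.concat([...], axis=1)` joins the FBA feature columns to the original records by index label, not by position. That works here because every feature dictionary is keyed by `dataPoint`, the index of `FBATrainData`, so each `pd.Series(...)` carries the same labels. A small sketch with invented toy frames shows the alignment, including a row that has no FBA result and therefore receives `NaN`:

```python
import pandas as pd

records = pd.DataFrame({'product_name': ['Lipid', 'Riboflavin', 'Lycopene']},
                       index=[10, 11, 12])

# Feature dict keyed by the same index labels, inserted out of order and missing 12
emp = {11: 0.9, 10: 1.2}
features = pd.DataFrame({'EMP_iYLI647': pd.Series(emp)})

# Rows are matched on the index; label 12 has no feature value
merged = pd.concat([records, features], axis=1)
print(merged)
```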

In [18]:
#Save as pickle file (the output file must match the input file selected in cell 16):
## Training data.
with open('Train&ValidateData_part2.pickle', 'wb') as f:
    pickle.dump([output], f)
    
## Testing data.
# with open('TESTData_part2.pickle', 'wb') as f:
#     pickle.dump([output], f)
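Parts 1 and 2 both store the data as a one-element list holding a DataFrame (`pickle.dump([output], f)`, read back with `Data[0]`), and the Train & Validate and TEST sets are processed in separate runs. One way to keep the input and output file names from drifting apart is to derive both from a single switch. The `PART2_FILES` mapping, the `run_split` variable, and the helpers `save_part` and `load_part` below are hypothetical and not part of the pipeline:

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical switch: 'train' for Train & Validate, 'test' for TEST
PART2_FILES = {
    'train': ('Train&ValidateData_part1.pickle', 'Train&ValidateData_part2.pickle'),
    'test': ('TESTData_part1.pickle', 'TESTData_part2.pickle'),
}

def save_part(df, path):
    """Store a DataFrame the way the pipeline does: wrapped in a one-element list."""
    with open(path, 'wb') as f:
        pickle.dump([df], f)

def load_part(path):
    """Read a pipeline pickle and unwrap the DataFrame (the Data[0] step)."""
    with open(path, 'rb') as f:
        return pickle.load(f)[0]

run_split = 'test'
input_file, output_file = PART2_FILES[run_split]

# Demonstrate the round trip in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, output_file)
    df = pd.DataFrame({'EMP_iYLI647': [1.2, 0.9]})
    save_part(df, path)
    restored = load_part(path)

print(input_file, '->', output_file, restored.equals(df))
```

Changing `run_split` then switches both file names at once, which avoids writing Train & Validate features into the TEST output file.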