# * Get target reactions from reference GEMs
GEMs: genome-scale metabolic models

**1. Download the genome files of the target organism and its reference strains from NCBI.**

For example:
- Target organism: _Pseudomonas putida_ S12 (Accn. No.: NZ_CP009974.1)
- Well-characterized model: _Pseudomonas putida_ KT2440 (Accn. No.: NC_002947.4)
- High-quality genome annotation: _Pseudomonas aeruginosa_ PAO1 (Accn. No.: NC_002516.2)


**2. Find orthologs of each strain's CDSs using pairwise BLASTP alignment (standalone BLAST+) and EDGAR. (Optional: merge the orthologous data.)**

- Integrate the BLASTP-based and EDGAR-based orthologous data (priority: EDGAR > BLASTP).
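The priority rule can be sketched in a few lines of Python. This is a minimal illustration, assuming both sources have already been reduced to dicts that map a target-organism locus tag to a list of reference locus tags; the function name `merge_orthologs` and all locus tags are hypothetical.

```python
def merge_orthologs(blastp, edgar):
    """Union of both ortholog dicts; where both report a target gene, EDGAR wins."""
    merged = {}
    merged.update(blastp)  # start from the BLASTP hits
    merged.update(edgar)   # EDGAR entries overwrite BLASTP entries with the same key
    return merged


# Hypothetical data: target locus tag -> reference locus tags
blastp_hits = {"tgt_0001": ["PP_0001"], "tgt_0002": ["PP_0002"]}
edgar_hits = {"tgt_0002": ["PP_0003"], "tgt_0003": ["PP_0004"]}
print(merge_orthologs(blastp_hits, edgar_hits))
# {'tgt_0001': ['PP_0001'], 'tgt_0002': ['PP_0003'], 'tgt_0003': ['PP_0004']}
```

Updating with BLASTP first and EDGAR second is exactly what gives EDGAR precedence: later updates replace earlier values for the same key.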


**3. Download the genome-scale metabolic model (GEM) of each reference strain from a model database such as BiGG Models.**

For example:
- _P. putida_ KT2440: iJN1462
- _P. aeruginosa_ PAO1: iPAE1146


**4. If a reference GEM uses old locus tags, change the locus tags in the orthologous data to the corresponding old locus tags of that GEM.**

For example, iJN1462 uses old locus tags.
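A minimal sketch of this conversion, assuming the orthologous data is a dict of target locus tag -> list of reference (RefSeq) locus tags, and that a new-to-old mapping has already been read from the `/old_locus_tag` qualifiers of the reference genome. `to_old_locus_tags` and all tags are hypothetical. Reference tags without an old locus tag are dropped, and target genes left with no reference tag are removed.

```python
def to_old_locus_tags(orthologs, new_to_old):
    """Rewrite reference locus tags to old locus tags; drop tags without a mapping."""
    converted = {}
    for target_tag, reference_tags in orthologs.items():
        old_tags = [new_to_old[tag] for tag in reference_tags if tag in new_to_old]
        if old_tags:  # keep the target gene only if at least one reference tag was mapped
            converted[target_tag] = old_tags
    return converted


# Hypothetical RefSeq -> old locus tag mapping and ortholog data
new_to_old = {"PP_RS00005": "PP_0001", "PP_RS00010": "PP_0002"}
orthologs = {
    "tgt_0001": ["PP_RS00005"],
    "tgt_0002": ["PP_RS00010", "PP_RS99999"],  # second tag has no old locus tag
    "tgt_0003": ["PP_RS77777"],                # nothing can be mapped
}
print(to_old_locus_tags(orthologs, new_to_old))
# {'tgt_0001': ['PP_0001'], 'tgt_0002': ['PP_0002']}
```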


**5. Input the reference GEMs together with the orthologous data and get reactions from the reference GEMs.**

- If there are multiple reference models, give priority to the more thoroughly curated model.
- Select the objective function of the draft model from the objective functions of the reference GEMs.
- Reactions are taken from the reference GEMs together with their Gene-Protein-Reaction (GPR) relationships.

For example, if gene 'a' is associated with reaction 'A' in a reference GEM and 'a' has an ortholog in the target organism, 'A' is added to the target GEM.

- (optional) Keep genes that must be retained even if they have no ortholog.

For example, the virtual gene of a spontaneous reaction.

- (optional) Set the default oxygen uptake rate for the draft model.
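How a GPR relationship decides whether a reaction is transferred can be sketched as a small boolean evaluator. This is an illustration, not the COBRApy implementation: it assumes GPR strings use lowercase `and`/`or` with parentheses and gene IDs that are valid Python identifiers (e.g. `PP_0001`), and it keeps reactions that have no GPR at all. `gpr_is_active` is a hypothetical name.

```python
import ast


def _eval_gpr(node, present_genes):
    """Recursively evaluate a parsed GPR expression."""
    if isinstance(node, ast.BoolOp):
        results = [_eval_gpr(value, present_genes) for value in node.values]
        # 'and' = enzyme complex (all subunits needed), 'or' = isozymes (any one suffices)
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Name):
        return node.id in present_genes
    raise ValueError("unsupported GPR element: " + ast.dump(node))


def gpr_is_active(rule, present_genes):
    """Return True if the GPR rule can be satisfied by present_genes."""
    if not rule.strip():
        return True  # no GPR (e.g. exchange reactions): keep the reaction
    tree = ast.parse(rule, mode="eval")
    return _eval_gpr(tree.body, present_genes)


# Reference genes that have an ortholog in the target organism (hypothetical)
present = {"PP_0001", "PP_0003"}
print(gpr_is_active("PP_0001 and PP_0002", present))                # False: a complex subunit is missing
print(gpr_is_active("PP_0001 or PP_0002", present))                 # True: one isozyme is enough
print(gpr_is_active("(PP_0002 or PP_0003) and PP_0001", present))   # True
```

In the notebook itself, this filtering is delegated to `cobra.manipulation.delete.remove_genes`, which removes the non-orthologous genes together with the reactions that can no longer be catalyzed.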



# * Dependent packages
 : COBRApy, Biopython, OpenPyXL

In [None]:
# import packages

import openpyxl
import cobra
from Bio import SeqIO
import os
import copy
from string import ascii_uppercase

# Letters A-Z, used to name up to 26 dynamic variables (one per EDGAR column)
apb = ascii_uppercase

print (apb)
print (len(apb))

for i in apb: globals()[i] = {} # Create one empty dict per letter (A-Z) as a global variable.

# set PATH
path = os.path.abspath(os.path.join(os.getcwd(), '..'))
print (path)

# 1. Prepare the files

**1) Genome files**

**2) Reference GEMs associated with genome files**
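The code below reads every input from `<project>/input` and writes results to `<project>/output`, where `<project>` is the parent of the notebook's working directory (`path`). A small pre-flight check along these lines can catch missing files early; `check_inputs` is a hypothetical helper, and the file names in the commented usage are the ones from the example run.

```python
import os


def check_inputs(project_dir, file_names):
    """Return the file names missing from <project_dir>/input and make sure <project_dir>/output exists."""
    input_dir = os.path.join(project_dir, "input")
    os.makedirs(os.path.join(project_dir, "output"), exist_ok=True)  # SBML/Excel results are written here
    return [name for name in file_names
            if not os.path.isfile(os.path.join(input_dir, name))]


# Usage with the file names of the example run (path = parent of the working directory):
# missing = check_inputs(path, ['NZ_CP009974.1.gb', 'NC_002947.4.gb', 'edgar_ortho.xlsx',
#                               'blastp_ortho.xlsx', 'iJN1463.xml', 'iPAE1146.xml'])
# print(missing or 'all input files found')
```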

# 2. Find orthologs

**1) Pairwise BLASTP alignment with standalone BLAST+**

- Criteria: query coverage > 90% and identity > 90%
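The filter and the grouping of multiple hits per query can be sketched as below. This is a simplified stand-alone version of what `blast_orthologous` does, assuming rows with the four columns Query_ID, Subject_ID, Query_coverage, Identity; such a table could, for example, come from BLAST+ tabular output (`-outfmt "6 qseqid sseqid qcovs pident"` with `-max_target_seqs 5`). `filter_blastp` and the IDs are hypothetical.

```python
def filter_blastp(rows, min_cov=90.0, min_ident=90.0):
    """Keep hits with query coverage > min_cov and identity > min_ident, grouped by query."""
    hits = {}
    for query_id, subject_id, query_cov, identity in rows:
        if float(query_cov) > min_cov and float(identity) > min_ident:
            hits.setdefault(query_id, []).append(subject_id)
    return hits


rows = [
    ("tgt_0001", "PP_0001", 100, 98.5),
    ("tgt_0001", "PP_0002", 95, 91.0),  # second hit for the same query (paralog)
    ("tgt_0002", "PP_0010", 85, 99.0),  # fails the coverage cut-off
    ("tgt_0003", "PP_0020", 99, 90.0),  # identity must be strictly above 90
]
print(filter_blastp(rows))
# {'tgt_0001': ['PP_0001', 'PP_0002']}
```

Unlike the notebook, which stores a single hit as a string and several hits as a list, this sketch always stores lists, which avoids type checks further downstream.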

**2) Using EDGAR (https://edgar3.computational.bio.uni-giessen.de/cgi-bin/edgar_login.cgi?cookie_test=1)**

- Take the union of the two orthologous datasets (priority: EDGAR > BLAST+).
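The EDGAR parsing in `edgar_orthologous` treats each cell as comma-separated text whose first field is the locus tag, with paralogs separated by `,IDENTICAL PARALOGS:,` and `-` marking a missing ortholog. The sketch below restates that logic for a single row; the cell layout is inferred from the notebook's parsing code rather than from EDGAR documentation, and the function names and example cell contents are hypothetical.

```python
PARALOG_SEP = ",IDENTICAL PARALOGS:,"


def edgar_cell_to_tags(cell):
    """Split one EDGAR cell into locus tags (assumed layout, see above)."""
    cell = cell.replace(", ,psos,", ",")  # same clean-up as in edgar_orthologous
    tags = [part.split(",")[0].strip() for part in cell.split(PARALOG_SEP)]
    return [tag for tag in tags if tag and tag != "-"]


def edgar_row_to_orthologs(target_cell, reference_cell):
    """Map every target locus tag in a row to the reference locus tags of the same row."""
    references = edgar_cell_to_tags(reference_cell)
    if not references:
        return {}  # no ortholog in this reference genome
    return {target: references for target in edgar_cell_to_tags(target_cell)}


row_target = "tgt_0001,hypothetical protein,IDENTICAL PARALOGS:,tgt_0002,hypothetical protein"
row_reference = "PP_0001,hypothetical protein"
print(edgar_row_to_orthologs(row_target, row_reference))
# {'tgt_0001': ['PP_0001'], 'tgt_0002': ['PP_0001']}
```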

In [None]:
# Function that extracts the locus tag of CDSs from the genome file into a list
def extract_cds_lt_in_genome(x):  # x : genome file PATH
    
    print ('<RUNNING>\nextract locus tag of CDSs from genome({0})\n: Extracting CDS for target organism...\n'.format(x))
    
    file = next(SeqIO.parse(x, "genbank"))

    feat = file.features

    genome = []
    for i in feat:
        if i.qualifiers.get("pseudo") == [""]:  # Exclude pseudogene
            pass
        elif i.type == 'CDS':  # Extract CDS type only
            genome.append(i)

    cds_lt = []
    for i in genome:  # Extract locus tag from each CDS type
        cds_lt.extend(i.qualifiers['locus_tag'])

    
    print ('<DONE>\nextract_cds_lt_in_genome({0})\n'.format(x))
    
    
    return cds_lt


'''
Both the EDGAR and BLASTP files must be converted to '.xlsx' before they are input.

BLASTP file columns must be in this order: Query_ID, Subject_ID, Query_coverage, Identity
Query_ID : locus tag of the target organism
Subject_ID : locus tag of the reference organism
Query_coverage : query coverage (%)
Identity : percent identity

All BLASTP results are assembled into one file, one sheet per result ('n' = BLASTP result number; sheet 'n' holds blastp_result'n').

BLASTP criteria : query coverage > 90% and identity > 90%

BLASTP maximum target sequences : 5
'''

# Convert EDGAR orthologous file to dictionary type (considering paralog)
def edgar_orthologous (x):  # x : EDGAR file path

    print ('<RUNNING>\nedgar_orthologous({0})\n: Extracting EDGAR orthologous file...\n'.format(x))
    
    edgar_wb = openpyxl.load_workbook(x)  # load EDGAR EXCEL file
    sheet_names = edgar_wb.sheetnames  # Sheet list in EDGAR file
    edgar = edgar_wb[sheet_names[0]]  # activated sheet with EDGAR orthologous data
    
    # For EDGAR 3.0 data, keep the next line (it deletes the first row).
    edgar.delete_rows(1)
    
    max_row = edgar.max_row # max_row of EDGAR file
    
    max_col = edgar.max_column  # fallback if the header row has no empty cell
    for i, j in enumerate(edgar['1']):  # number of filled columns in the header row
        if not j.value:
            max_col = i
            break
    

    # col_{} (list type)
    # : create dynamic variable and insert value
    for i in range(max_col):
        globals()['pre_col_{}'.format(apb[i])] = edgar[apb[i]]
        globals()['col_{}'.format(apb[i])] = []
        for j in range(max_row):
            # Empty cells (None) become '' so that str methods do not fail
            globals()['col_{}'.format(apb[i])].append(str(globals()['pre_col_{}'.format(apb[i])][j].value or '').replace(', ,psos,', ','))
    

    cut_row = col_A.index(' - , -')  # determining upper limit index of row

    # Keep only the ortholog rows: drop the header row and everything from the ' - , -' row onward
    for i in range(max_col): globals()['col_{}'.format(apb[i])] = globals()['col_{}'.format(apb[i])][1:cut_row]

    # edgar_A_vs_{} (dict type)
    # : create dynamic variables that map each target locus tag to its orthologous locus tag(s) (considering paralogs)
    for i in range(max_col):
        if max_col - i - 1 == 0:
            break

        a = globals()['edgar_A_vs_{}'.format(apb[i + 1])] = {}
        for j in range(len(col_A)):  # col_A is the target organism
            b = globals()['col_{}'.format(apb[i + 1])]
            if (',IDENTICAL PARALOGS:,' in col_A[j]) and (',IDENTICAL PARALOGS:,' not in b[j]):  # if col_A element is paralog O, and col_{} element is paralog X, same value (col_{} element) for multiple str keys (col_A element)
                lst1 = col_A[j].split(',IDENTICAL PARALOGS:,')
                for para1 in lst1: a[para1.split(',')[0]] = b[j].split(',')[0].strip()

            elif (',IDENTICAL PARALOGS:,' not in col_A[j]) and (',IDENTICAL PARALOGS:,' in b[j]):  # if col_A element is paralog X, and col_{} element is paralog O, list value (col_{} element) on one str key (col_A element)
                lst2 = b[j].split(',IDENTICAL PARALOGS:,')
                lst3 = []
                for para2 in lst2: lst3.append(para2.split(',')[0].strip())
                a[col_A[j].split(',')[0]] = lst3

            elif (',IDENTICAL PARALOGS:,' in col_A[j]) and (',IDENTICAL PARALOGS:,' in b[j]):  # if col_A element is paralog O, and col_{} element is paralog O, same list value (col_{} element) for multiple str keys (col_A element)
                lst1 = col_A[j].split(',IDENTICAL PARALOGS:,')
                lst2 = b[j].split(',IDENTICAL PARALOGS:,')
                lst3 = []
                for para2 in lst2: lst3.append(para2.split(',')[0].strip())
                for para1 in lst1: a[para1.split(',')[0]] = lst3

            else: a[col_A[j].split(',')[0]] = b[j].split(',')[0].strip()  # if col_A element is paralog X, and col_{} element is paralog X, one str value (col_{} element) on one str key (col_A element)


    for i in range(max_col-1):  # remove keys that are not CDS locus tags of the target genome (uses the global target_cds_lt)
        a = globals()['edgar_A_vs_{}'.format(apb[i + 1])]
        a_keys = list(a.keys())
        for j in a_keys:
            if j not in target_cds_lt:
                del a[j]


    result = []
    for i in range(max_col-1):  # combine in result (list type)
        a = globals()['edgar_A_vs_{}'.format(apb[i + 1])]
        result.append(a)

    
    print ('<DONE>\nedgar_orthologous({0})\n'.format(x))

    return result


# Convert BLAST+ orthologous file to dictionary type (considering paralog)
def blast_orthologous (x):  # x : BLAST+ file path
    
    print ('<RUNNING>\nblast_orthologous({0})\n: Extracting BLAST+ orthologous file...\n'.format(x))
    
    blast_wb = openpyxl.load_workbook(x)
    sheet_names = blast_wb.sheetnames

    serial = 1
    for sheet in sheet_names:  # Output BLAST results sequentially and process them
        blast = blast_wb[sheet]

        q_id_pre = blast['A']  # Change Query_ID, Subject_ID, Query_coverage, Identity to list format
        q_id_pre = q_id_pre[1:]
        q_id = []
        for i in q_id_pre: q_id.append(i.value.split(',')[0])
        s_id_pre = blast['B']
        s_id_pre = s_id_pre[1:]
        s_id = []
        for i in s_id_pre: s_id.append(i.value.split(',')[0])
        q_cov_pre = blast['C']
        q_cov_pre = q_cov_pre[1:]
        q_cov = []
        for i in q_cov_pre: q_cov.append(i.value)
        ident_pre = blast['D']
        ident_pre = ident_pre[1:]
        ident = []
        for i in ident_pre: ident.append(i.value)

        index_del = []  # indices of hits that fail query_coverage > 90 and identity > 90
        for i in range(len(q_id)):
            if (float(q_cov[i]) > 90) and (float(ident[i]) > 90): pass
            else: index_del.append(i)
        
        # Delete from the highest index down: 'del' shifts the remaining elements, so deleting in ascending order would skip entries.
        index_del.sort(reverse=True)

        for i in index_del:
            del q_id[i]
            del s_id[i]
            del q_cov[i]
            del ident[i]

        q_id_count = {}  # number of hits for each query ID
        for i in set(q_id): q_id_count[i] = q_id.count(i)

        # blast_sheet_{} (dict)
        # : dynamic variable mapping each query (target organism) to its subject hit(s) in the reference organism (considering paralogs)
        a = globals()['blast_sheet_{}'.format(serial)] = {}
        for i in set(q_id):
            if q_id_count[i] == 1: a[i] = s_id[q_id.index(i)]
            else:
                lst = []
                for j in range(q_id.index(i), q_id.index(i) + q_id_count[i]): lst.append(s_id[j])
                a[i] = lst

        serial += 1

        
    print ('<DONE>\nblast_orthologous({0})\n'.format(x))
        
        

# Final orthologous data trimming function
# : If both EDGAR and BLAST+ orthologous data are available, this function integrates them (priority : EDGAR > BLAST+).
# : If only one orthologous dataset is available, this function simply trims it.
def make_total_orthologous (file_name_ortho_edgar, file_name_ortho_blastp):
    
    print ('<RUNNING>\nmake_total_orthologous()\n: Final trimming of orthologous data...\n')
    
    edgar_wb = openpyxl.load_workbook('{0}/input/{1}'.format(path, file_name_ortho_edgar))
    edgar_sheet_names = edgar_wb.sheetnames
    edgar = edgar_wb[edgar_sheet_names[0]]
    
    # For EDGAR 3.0 data, keep the next line (it deletes the first row).
    edgar.delete_rows(1)

    edgar_sample_num = edgar.max_column  # fallback if the header row has no empty cell
    for i, j in enumerate(edgar['1']):  # number of filled columns in the header row
        if not j.value:
            edgar_sample_num = i
            break

    blast_wb = openpyxl.load_workbook('{0}/input/{1}'.format(path, file_name_ortho_blastp))
    blast_sheet_names = blast_wb.sheetnames
    blast_sample_num = len(blast_sheet_names)  # The number of pairwise-alignment on BLAST+


    # Collect the names of the EDGAR and BLAST+ dynamic variables as strings
    edgar_result_id = []
    blast_result_id = []
    for i in range(edgar_sample_num-1): edgar_result_id.append('edgar_A_vs_{}'.format(apb[i + 1]))
    for i in range(blast_sample_num): blast_result_id.append('blast_sheet_{}'.format(i+1))

    result_id = list(set(edgar_result_id) | set(blast_result_id))
    result_id.sort()

    
    # generate a complete list of keys for the locus_tag of the target organism
    total_keys = set()
    for i in result_id:
        a = globals()[i]
        a_keys = set(a.keys())
        total_keys = total_keys | a_keys
    total_keys = list(total_keys)


    # Combine EDGAR and BLAST+ into one (priority : EDGAR > BLAST+)
    total_orthologous = {}
    edgar_orthologous_result = {}
    blast_orthologous_result = {}
    for i in total_keys:
        lst_bla = []
        lst_ed = []
        for j in result_id:
            a = globals()[j]
            if i not in a:
                continue
            hits = [a[i]] if isinstance(a[i], str) else a[i]  # single hit (str) or paralogs (list)
            if 'blast_sheet_' in j:
                lst_bla.extend(hits)
            elif 'edgar_A_vs_' in j:
                lst_ed.extend(hits)
        lst_bla = list(set(lst_bla))
        lst_bla.sort()
        if '-' in lst_bla:
            lst_bla.remove('-')
        blast_orthologous_result[i] = lst_bla
        if lst_bla == []:
            del blast_orthologous_result[i]
        lst_ed = list(set(lst_ed))
        lst_ed.sort()
        if '-' in lst_ed:
            lst_ed.remove('-')
        edgar_orthologous_result[i] = lst_ed
        if lst_ed == []:
            del edgar_orthologous_result[i]

    # put BLAST+ in total_orthologous (empty dict) first and then overwrite EDGAR.
    # (If the dict of EDGAR and BLAST+ has the same key (target organism locus tag), select EDGAR.)
    total_orthologous.update(blast_orthologous_result)
    total_orthologous.update(edgar_orthologous_result)
    
    print ('<DONE>\nmake_total_orthologous()\n')


    return total_orthologous

# 3. Model reconstruction

**1) Add reactions to draft model from reference models along with Gene-Protein-Reaction (GPR) relationships**
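After the reactions are copied, the draft model still carries the reference gene IDs, and `rename` rewrites them to target locus tags, joining paralogs with `or`. The sketch below shows the same idea on plain strings, assuming orthologs are stored as target locus tag -> list of reference locus tags. It substitutes whole gene IDs in a single regex pass, so an ID such as `R1` cannot match inside `R12` and no gene is renamed twice. `build_rename_map`, `rename_gpr`, and the tags are hypothetical.

```python
import re


def build_rename_map(total_orthologous):
    """Invert target -> [reference] into reference -> GPR fragment of target locus tags."""
    inverse = {}
    for target, references in total_orthologous.items():
        for reference in references:
            inverse.setdefault(reference, []).append(target)
    return {ref: targets[0] if len(targets) == 1 else "(" + " or ".join(targets) + ")"
            for ref, targets in inverse.items()}


def rename_gpr(rule, rename_map):
    """Replace whole reference gene IDs in a GPR string in a single pass."""
    if not rename_map:
        return rule
    alternatives = "|".join(re.escape(gene) for gene in rename_map)
    # Lookarounds require the ID not to be preceded/followed by a word character or '.'
    pattern = re.compile(r"(?<![\w.])(" + alternatives + r")(?![\w.])")
    return pattern.sub(lambda m: rename_map[m.group(1)], rule)


orthologs = {"tgt_0001": ["PP_0001"], "tgt_0002": ["PP_0002"], "tgt_0003": ["PP_0002"]}
rename_map = build_rename_map(orthologs)
print(rename_gpr("PP_0001 and PP_0002", rename_map))
# tgt_0001 and (tgt_0002 or tgt_0003)
```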

In [None]:
# Load the reference models and combine the reactions that correspond to the orthologs.
# args: list of SBML file names in {path}/input, in order of priority (e.g. ['A.xml', 'B.xml'])
# file_name_model_objective: file name of the model whose objective function the draft model should use
# gene_remain_lst (optional): list of genes to keep even if they have no ortholog (e.g., virtual gene of a spontaneous reaction)
# our (optional): oxygen uptake rate of the draft model
def make_draft_model (args, file_name_model_objective, gene_remain_lst = False, our = False):
    
    print ('<RUNNING>\nmake_draft_model([{0}])\n: Generating draft model from reference model...\n'.format(', '.join(args)))            
    
    draft_model = cobra.Model('draft_model')  # Create an empty model
    biomass_rxns = {}
    model_path_lst = []
    add_rxns_num_lst = []
    for models in args:
        model_ori = cobra.io.read_sbml_model('{0}/input/{1}'.format(path, models))
        model = model_ori.copy()
        model_path_lst.append('{0}/input/{1}'.format(path, models))
        
        if models == file_name_model_objective:
            model_id_objective = model.id

        model_genes = [] 
        for i in model.genes: model_genes.append(i.id)

        com_genes = []  # genes shared by the model and the orthologous data
        for k,v in total_orthologous.items():
            for j in v:
                if j in model_genes:
                    com_genes.append(j)
        remove_genes = list(set(model_genes) - set(com_genes))  # remove_genes: model genes without an ortholog in the target organism
        remove_genes.sort()
        
        
        # (optional) Drop the elements of gene_remain_lst from remove_genes so that they stay in the draft model.
        # A new list is built because calling remove() while iterating over the same list skips elements.
        if gene_remain_lst:
            remove_genes = [gene for gene in remove_genes if gene not in gene_remain_lst]
        
        
        # Remove remove_genes from the model (reactions that can no longer be catalyzed according to their GPR are removed as well).
        cobra.manipulation.delete.remove_genes(model, remove_genes)


        # Identify the objective (e.g. biomass) reaction of this reference model.
        # linear_reaction_coefficients returns {reaction: coefficient}; a single objective reaction is assumed.
        objective_rxn = next(iter(cobra.util.solver.linear_reaction_coefficients(model))).copy()
        objective_id = objective_rxn.id

        # Add the remaining reactions, except the objective, to the draft model.
        # Reactions whose IDs already exist in the draft model (from a higher-priority model) are ignored by COBRApy.
        add_rxns = [rxn for rxn in model.reactions if rxn.id != objective_id]
        draft_model.add_reactions(add_rxns)

        # Count the objective reaction only for the model that supplies the draft objective
        if models == file_name_model_objective:
            add_rxns_num_lst.append(len(add_rxns) + 1)
        else:
            add_rxns_num_lst.append(len(add_rxns))

        
        # Save the objective function of each reference model as a dict type (key : model ID, value : objective function)
        biomass_rxns[model.id] = objective_rxn


    # Flush pending 'Ignoring reaction ...' warnings (stderr) before printing the summary below.
    import sys
    sys.stderr.flush()


    # Set the objective function of the draft model
    draft_model.add_reactions([biomass_rxns[model_id_objective]])
    draft_model.objective = biomass_rxns[model_id_objective].id
    objective_function_id = biomass_rxns[model_id_objective].id

    print ('draft_model_objective_function: ' + objective_function_id)


    # (optional) Set the upper limit of the oxygen uptake rate in the model
    if our:
        medium = draft_model.medium
        medium['EX_o2_e'] = our
        draft_model.medium = medium
        print (draft_model.medium)
        print ('EX_o2_e:', abs(draft_model.reactions.get_by_id('EX_o2_e').lower_bound))


    print('=' * 20 + '\n' + '{}'.format(draft_model.id) + '\n' + '=' * 20)
    print('reactions:', len(draft_model.reactions))
    print('genes:', len(draft_model.genes))
    print('metabolites:', len(draft_model.metabolites))
    
    
    print ('Add reactions number')
    for i in range(len(args)):
        print ('Add reactions from {0}: {1}'.format(args[i], add_rxns_num_lst[i]))

    
    print ('<DONE>\nmake_draft_model({0})\n'.format(', '.join(args)))
    

    return draft_model


# Change the gene locus tag of the draft model to target organism
def rename (model, total_orthologous):  # model : target model
    
    print ('<RUNNING>\nrename({0}, total_orthologous)\n: Changing the gene ID of the draft model to locus tag of the target organism...\n'.format(model))
    
    # Generating a list of genes in a model.
    model_genes = []
    model_genes_id = []
    for i in model.genes:
        model_genes.append(i)
        model_genes_id.append(i.id)

    # Find the paralog and set it according to gene_reaction_rule format
    rename_dict = {}
    for i in model_genes_id:
        para_list = []
        for k,v in total_orthologous.items():
            for j in v:
                if j == i:
                    para_list.append(k)
        if len(para_list) > 1:
            rename_dict[i] = ['(' + ' or '.join(para_list) + ')', para_list]  # rename_dict[i][0] = gene_reaction_rule format, rename_dict[i][1] = gene_list
        elif len(para_list) == 1:
            rename_dict[i] = [para_list[0], para_list]


    import re

    # Change gene IDs of the model to locus tags of the target organism
    for i in model.reactions:
        gene_id_lst = [g.id for g in i.genes]
        for j in gene_id_lst:
            if j in rename_dict:
                new_gene_id_lst = [g.id for g in i.genes]
                # Skip if the target locus tags are already in this GPR (avoids renaming twice)
                if len(set(rename_dict[j][1]) & set(new_gene_id_lst)) == 0:
                    # Match whole gene IDs only, so that e.g. 'PP_001' does not match inside 'PP_0012'
                    pattern = r'(?<![\w.])' + re.escape(j) + r'(?![\w.])'
                    i.gene_reaction_rule = re.sub(pattern, lambda m: rename_dict[j][0], i.gene_reaction_rule)

    
    print ('<DONE>\nrename({0}, total_orthologous)\n'.format(model))
    
    
    return model

# * Optional Functions
- change_locus_tag
 : The EDGAR server uses RefSeq genome records. If a reference model is based on the GenBank record (old locus tags), the RefSeq locus tags in the orthologous data must be replaced with the GenBank (old) locus tags.
 
- match_lt_old_new (sub-function of change_locus_tag)
 : Function that extracts CDSs with both locus tag and old locus tag from genome file and matches locus tag with old locus tag.

- output_excel_sbml_file
 : Function that outputs the model in Excel and SBML format
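The locus-tag matching done by `match_lt_old_new` can also be written as a loop over all records of a GenBank file, which covers genomes with more than one record (the notebook reads only the first record via `next(SeqIO.parse(...))`). The sketch relies only on each feature having `.type` and `.qualifiers`, as Biopython `SeqFeature` objects do; to stay runnable without a genome file, it is demonstrated on stand-in objects. `new_to_old_locus_tags` and the tags are hypothetical.

```python
from types import SimpleNamespace


def new_to_old_locus_tags(features):
    """Map locus_tag -> old_locus_tag for non-pseudo CDS features."""
    mapping = {}
    for feat in features:
        if feat.type != "CDS" or "pseudo" in feat.qualifiers:
            continue  # skip other feature types and pseudogenes
        if "locus_tag" in feat.qualifiers and "old_locus_tag" in feat.qualifiers:
            mapping[feat.qualifiers["locus_tag"][0]] = feat.qualifiers["old_locus_tag"][0]
    return mapping


# With Biopython, the features of every record could be collected like this:
# from Bio import SeqIO
# features = [f for rec in SeqIO.parse("NC_002947.4.gb", "genbank") for f in rec.features]

# Stand-in features for illustration (hypothetical tags)
features = [
    SimpleNamespace(type="gene", qualifiers={"locus_tag": ["PP_RS00005"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["PP_RS00005"], "old_locus_tag": ["PP_0001"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["PP_RS00010"], "old_locus_tag": ["PP_0002"], "pseudo": [""]}),
]
print(new_to_old_locus_tags(features))
# {'PP_RS00005': 'PP_0001'}
```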

In [None]:
# Function that extracts CDSs with both locus tag and old locus tag from genome file and matches locus tag with old locus tag
def match_lt_old_new(x):  # x : genome file path. (e.g. A.gb)
    
    print ('<RUNNING>\nmatch_lt_old_new({0})\n: Matching locus tag and old locus tag from CDS type of genome...\n'.format(x))
    
    file = next(SeqIO.parse(x, "genbank"))

    feat = file.features

    genome = []
    for i in feat:
        if i.qualifiers.get("pseudo") == [""]:
            pass
        elif i.type == 'CDS':
            genome.append(i)

    cds_old_lt = {}
    for i in genome:
        if 'old_locus_tag' in i.qualifiers:
            cds_old_lt[i.qualifiers['locus_tag'][0]] = i.qualifiers['old_locus_tag'][0]

    
    print ('<DONE>\nmatch_lt_old_new({0})\n'.format(x))
    
            
    return cds_old_lt



# Function that changes the locus tag of a gene to an old locus tag
# args: list of RefSeq genome files (in {path}/input) built from the same sequencing data as the models' GenBank genomes
# file_name_ortho_edgar: file name of the EDGAR orthologous data
def change_locus_tag (args, file_name_ortho_edgar):
    
    print ('<RUNNING>\nchange_locus_tag({0})\n: Changing to locus tag for the model of the genome...\n'.format(args))
    
    edgar_wb = openpyxl.load_workbook('{0}/input/{1}'.format(path, file_name_ortho_edgar))
    sheet_names = edgar_wb.sheetnames
    edgar = edgar_wb[sheet_names[0]]

    # For EDGAR 3.0 data, keep the next line (it deletes the first row).
    edgar.delete_rows(1)
    
    
    max_col = edgar.max_column  # fallback if the header row has no empty cell
    for i, j in enumerate(edgar['1']):  # number of filled columns in the header row
        if not j.value:
            max_col = i
            break

    for genome_file in args:
        cds_old_lt = match_lt_old_new('{0}/input/{1}'.format(path, genome_file))
        cds_old_lt_keys = list(cds_old_lt.keys())
        for i,j in enumerate(cds_old_lt_keys[0]):
            try:
                int(j)
                num_index = i
                break
            except:
                pass
        target_chr = cds_old_lt_keys[0][: num_index]  # locus tag prefix before the first digit
        
        print ('target_chr:', target_chr)

        a_name_index = 0
        for i in range(max_col-1):
            a = globals()['edgar_A_vs_{}'.format(apb[i + 1])]
            a_values = list(a.values())
            
            for h,j in enumerate(a_values):
                if isinstance(j, str) and j != '-':  # first single (non-paralog) ortholog
                    a_values_index = h
                    break
            
            for k,j in enumerate(a_values[a_values_index]):
                try:
                    int(j)
                    num_index = k
                    break
                except:
                    pass
            a_chr = a_values[a_values_index][: num_index]
            
            print ('a_chr:', a_chr)
            
            if a_chr == target_chr:
                
                print ('chrs:', a_chr, target_chr)
                
                a_name_index = i
                break
        
        print (a_name_index)

        a_copy = globals()['edgar_A_vs_{}'.format(apb[a_name_index + 1])].copy()
        globals()['edgar_A_vs_{}'.format(apb[a_name_index + 1])].clear()
        target_dict = globals()['edgar_A_vs_{}'.format(apb[a_name_index + 1])]  # the dict that was cleared above
        for k, v in a_copy.items():
            if isinstance(v, str):  # single ortholog
                if v in cds_old_lt:
                    target_dict[k] = cds_old_lt[v]
            else:  # paralogs (list)
                lst = [cds_old_lt[j] for j in v if j in cds_old_lt]
                if len(lst) == 1:
                    target_dict[k] = lst[0]  # the single tag that has an old locus tag
                elif len(lst) > 1:
                    target_dict[k] = lst
    
    
    print ('<DONE>\nchange_locus_tag({0})\n'.format(args))



        
# Function to output the model to SBML and EXCEL format
def output_excel_sbml_file(model, model_id):  # model : target model, model_id : Storage name and model name of the output file of the model
    
    print ('<RUNNING>\noutput_excel_sbml_file({0}, {1})\n: Outputting model to Excel and SBML format...\n'.format(model, model_id))
    
    model.id = model_id
    
    cobra.io.write_sbml_model(model, '{0}/output/{1}.xml'.format(path, model.id))
    
    model_rxn_info = {}
    model_rxn_id = []
    for i in model.reactions:
        model_rxn_id.append(i.id)
        model_rxn_info[i.id] = [i.id , i.name, i.reaction, i.gene_reaction_rule, i.lower_bound, i.upper_bound, i.objective_coefficient, i.subsystem]

    model_mt_info = {}
    model_mt_id = []
    for i in model.metabolites:
        model_mt_id.append(i.id)
        comp = ''
        if i.compartment == 'c':
            comp = 'Cytosol'
        elif i.compartment == 'p':
            comp = 'Periplasm'
        elif i.compartment == 'e':
            comp = 'Extracellular'

        model_mt_info[i.id] = [i.id , i.name, i.formula, i.charge, comp]

    header_rxn = ['Reaction ID', 'Description', 'Reaction', 'Gene-protein-reaction (GPR) rules', 'Lower bound', 'Upper bound', 'Objective', 'Subsystem']

    header_mt = ['Metabolite ID', 'Description', 'Charged formula', 'Charge', 'Compartment']


    write_wb = openpyxl.Workbook()
    write_ws_1 = write_wb.create_sheet('Reaction List')
    write_ws_2 = write_wb.create_sheet('Metabolite List')
    write_wb.remove(write_wb['Sheet'])

    write_ws_1 = write_wb['Reaction List']
    write_ws_1.append(header_rxn)
    for i in model_rxn_id:
        write_ws_1.append(model_rxn_info[i])

    write_ws_2 = write_wb['Metabolite List']
    write_ws_2.append(header_mt)
    for i in model_mt_id:
        write_ws_2.append(model_mt_info[i])

    write_wb.save('{0}/output/{1}.xlsx'.format(path, model.id))
    
    
    print ('<DONE>\noutput_excel_sbml_file({0}, {1})\n'.format(model, model_id))

# * Running Script
**- precondition**
 - The genome file of target organism must exist.
 - At least one reference model must exist.
 - Orthologous data must exist that compare the target organism's genome with the genomes on which the reference models were built.

# Example
: make draft model

In [None]:
### trimming orthologous file

# target_organism_genome_file
file_name_target_gb = 'NZ_CP009974.1.gb'
target_cds_lt = extract_cds_lt_in_genome ('{0}/input/{1}'.format(path, file_name_target_gb))

# edgar_orthologous file
file_name_ortho_edgar = 'edgar_ortho.xlsx'
edgar_orthologous ('{0}/input/{1}'.format(path, file_name_ortho_edgar))

# blastp_orthologous file
file_name_ortho_blastp = 'blastp_ortho.xlsx'
blast_orthologous ('{0}/input/{1}'.format(path, file_name_ortho_blastp))

# (optional) Change the locus tag of orthologous data to the old locus tag of the model
file_name_change_gb_lst = ['NC_002947.4.gb']
change_locus_tag(file_name_change_gb_lst, file_name_ortho_edgar)

# orthologous_file_total_trimming
total_orthologous = make_total_orthologous(file_name_ortho_edgar, file_name_ortho_blastp)  # integrate the EDGAR and BLAST+ data into one (priority : EDGAR > BLAST+)


### reconstructing model file

# input Reference models
file_name_model_lst = ['iJN1463.xml', 'iPAE1146.xml'] # write model file names in order of highest priority.
file_name_model_objective = 'iJN1463.xml'
gene_remain_lst = ['PP_s0001','SPONTANEOUS','PA2366']
our = 18.5
draft_model = make_draft_model(file_name_model_lst, file_name_model_objective, gene_remain_lst, our)

# rename in draft_model
draft_model = rename(draft_model, total_orthologous)

# (optional) export draft_model
output_excel_sbml_file (draft_model, model_id = 'draft_model')