# Balance Metabolic Models

With Frowin's scripts, I was able to obtain stoichiometrically consistent models; the next step is to balance the charges.
I use the BIGG database to compare its information with the model information and automatically replace a metabolite's charge and formula with the BIGG values when they seem more reasonable.
After this automated step, the remaining unbalanced reactions need to be curated manually.
For a few reactions, it also helps to compare them with the corresponding reaction in another well-curated model (here I used the *E. coli* model iML1515) to see how it was balanced there.
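To make "balanced" concrete: a reaction is mass-balanced when, for every element, the stoichiometry-weighted atom counts sum to zero, and charge-balanced when the stoichiometry-weighted charges sum to zero. The sketch below computes this imbalance for a reaction given as plain dictionaries. It is a simplified illustration of the kind of result cobra's `check_mass_balance` reports, not its implementation; `parse_formula` and `reaction_imbalance` are hypothetical helpers, and the ATP hydrolysis formulas/charges are BIGG-style values used only as an example.

```python
import re
from collections import defaultdict

# element symbol followed by an optional count, e.g. "C10" or "H"
ELEMENT_PATTERN = re.compile(r"([A-Z][a-z]*)(\d*)")


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C10H12N5O13P3'."""
    counts = defaultdict(int)
    for element, number in ELEMENT_PATTERN.findall(formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def reaction_imbalance(stoichiometry, formulas, charges):
    """Sum coefficient * atoms and coefficient * charge over all metabolites.

    stoichiometry: {met_id: coefficient} (negative = substrate, positive = product).
    Returns only the non-zero entries; an empty dict means the reaction is balanced.
    """
    imbalance = defaultdict(float)
    for met_id, coeff in stoichiometry.items():
        for element, count in parse_formula(formulas[met_id]).items():
            imbalance[element] += coeff * count
        imbalance["charge"] += coeff * charges[met_id]
    return {key: value for key, value in imbalance.items() if value != 0}


# ATP hydrolysis: atp + h2o -> adp + pi + h
formulas = {"atp_c": "C10H12N5O13P3", "h2o_c": "H2O", "adp_c": "C10H12N5O10P2", "pi_c": "HO4P", "h_c": "H"}
charges = {"atp_c": -4, "h2o_c": 0, "adp_c": -3, "pi_c": -2, "h_c": 1}
atp_hydrolysis = {"atp_c": -1, "h2o_c": -1, "adp_c": 1, "pi_c": 1, "h_c": 1}

print(reaction_imbalance(atp_hydrolysis, formulas, charges))  # {}

# without the proton, the product side lacks one H and one positive charge
without_h = {met: coeff for met, coeff in atp_hydrolysis.items() if met != "h_c"}
print(reaction_imbalance(without_h, formulas, charges))  # {'H': -1.0, 'charge': -1.0}
```

This is also why `h_c` tops the list of metabolites in unbalanced reactions further down: a missing or surplus proton breaks both the H count and the charge at once.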

## Imports & Paths

In [1]:
import os
import csv
from collections import Counter
from cobra.io import read_sbml_model, write_sbml_model
from cobra.manipulation.validate import check_mass_balance
import requests
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed
from macaw.main import run_all_tests
import ast
import matplotlib.pyplot as plt

In [2]:
# path to xml files
models_path = "../Models/02_mass_balance/"

In [3]:
# import models after mass-balancing through Frowins scripts
models = {}
for model_name in (f for f in os.listdir(models_path) if f.endswith(".xml")):
    model = read_sbml_model(os.path.join(models_path, model_name))
    model.solver = "cplex"
    models[model_name[:3]] = model  # the first three letters of the xml filename are used as the model name

models = {key: models[key] for key in sorted(models.keys())}  # sort the dictionary alphabetically (AA1...AA7), because os.listdir() returns the files in arbitrary order

# this is specifically for my 7 models, to access them more easily; all models are stored in the "models" dict and can be accessed e.g. through models["AA1"], depending on what you set as the model name
AA1, AA2, AA3, AA4, AA5, AA6, AA7 = [models[f"AA{i}"] for i in range(1, 8)]

Restricted license - for non-production use only - expires 2026-11-23


In [7]:
AA3.reactions.get_by_id("GLCRD")

KeyError: 'GLCRD'

## Functions

In [4]:
def get_objective_value(model):
    print(f"value of objective for {model} is {model.optimize().objective_value}")

In [5]:
# checks the mass and charge balance for every reaction in a model
def check_balance(model, print_results=True):
    unbalanced_reactions = check_mass_balance(model)
    if print_results:
        print(f"There are {len(unbalanced_reactions)} unbalanced reactions in {model}")
    return unbalanced_reactions

In [6]:
# returns a pandas dataframe with metabolite info for a specific cobra model, including bigg_id, model_id, formula and charge
# NOTE: bigg_id could be wrong (i.e. not the real ID on the BIGG website), because it is derived from model_id by removing the "_compartment" suffix
def extract_met_info_model(model):
    met_infos = []

    for met in model.metabolites:
        met_infos.append({
            "bigg_id": met.id.rsplit("_", 1)[0],  # strip compartment so that it matches the actual BIGG ID that also doesn't have compartments (e.g., glc__D_c to glc__D)
            "model_id": met.id,
            "model_formula": met.formula,
            "model_charge": met.charge
        })

    met_infos = pd.DataFrame(met_infos)
    return met_infos
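The NOTE about `bigg_id` matters for IDs that do not end in a compartment suffix: `rsplit("_", 1)` always removes the last `_`-separated part, so an ID such as `glc__D` (without compartment) would become `glc_`. A minimal sketch of a more defensive variant, assuming the compartment ID of each metabolite is known; `strip_compartment` is a hypothetical helper, not part of cobra.

```python
def strip_compartment(met_id, compartment):
    """Remove a trailing '_<compartment>' from a metabolite ID, if present.

    Hypothetical helper: unlike met_id.rsplit("_", 1)[0], it leaves IDs
    without a matching compartment suffix untouched.
    """
    suffix = f"_{compartment}"
    if compartment and met_id.endswith(suffix):
        return met_id[: -len(suffix)]
    return met_id


print("glc__D_c".rsplit("_", 1)[0])        # glc__D
print("glc__D".rsplit("_", 1)[0])          # glc_  (part of the ID is lost)
print(strip_compartment("glc__D_c", "c"))  # glc__D
print(strip_compartment("glc__D", "c"))    # glc__D
```

Inside `extract_met_info_model`, this could be called as `strip_compartment(met.id, met.compartment)`, since every cobra `Metabolite` has a `compartment` attribute.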

In [7]:
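# returns a pandas dataframe with metabolite info from the model and from BIGG and compares formula and charge state
# uses the global df_bigg_met (the BIGG metabolite table loaded further below) unless another table is passed as bigg_mets
def compare_bigg_modelMets(model_mets, list_unbalanced_mets, bigg_mets=None):
    if bigg_mets is None:
        bigg_mets = df_bigg_met
    # Merge on BiGG ID
    merged = model_mets.merge(bigg_mets, on="bigg_id", how="left")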

    merged["charge_match"] = merged.apply(
        lambda row: row["model_charge"] in row["charges"] if isinstance(row["charges"], list) else False,
        axis=1
    )

    merged["formula_match"] = merged.apply(
        lambda row: row["model_formula"] in row["formulas"] if isinstance(row["formulas"], (list, set)) else False, axis=1
    )

    # adds another column to check if the metabolites are part of an unbalanced reaction (false = not part of unbalanced reactions, true = part of unbalanced reaction(s))
    merged['unbalanced'] = merged['model_id'].isin(list_unbalanced_mets)
    # merged['unbalanced'] = merged['model_id'].isin(list_unbalanced_mets).astype(int) (instead of true/false with 1/0)

    return merged

In [8]:
# filters the merged df to only show the rows (i.e. metabolites) where model info and BIGG info do NOT match
def get_mismatches_after_merge(df_merge):
    mismatches = df_merge.loc[~df_merge["formula_match"] | ~df_merge["charge_match"]]
    mismatches = mismatches[["model_id", "bigg_id", "model_charge", "charges", "model_formula", "formulas", "charge_match", "formula_match", "unbalanced"]]

    return mismatches

In [9]:
# returns a confusion matrix showing how many metabolites have mismatching info about charge state and/or formula
# this function takes either one df from one model or the dict in which the merged dfs of all models are saved
def get_confmat_charge_formula(df_merge):
    if isinstance(df_merge, pd.DataFrame):
        conf_matrix = df_merge.groupby(["charge_match", "formula_match"]).size().reset_index(name="count")
        print(conf_matrix)
    elif isinstance(df_merge, dict):
        # all four True/False combinations, so that a model lacking one combination still lines up (count 0)
        index = pd.MultiIndex.from_product([[False, True], [False, True]], names=["charge_match", "formula_match"])
        conf_matrix = pd.DataFrame(index=index)
        for key, item in df_merge.items():
            name = key.split("_")[0]  # e.g. "AA1_merged" -> "AA1"
            conf_matrix[name] = item.groupby(["charge_match", "formula_match"]).size().reindex(index, fill_value=0)
        print(conf_matrix.reset_index())

In [4]:
def get_rxn(model, rxn_id, print_mass=False, print_GPR=False):
    rxn = model.reactions.get_by_id(rxn_id)
    charges = {met.id: met.charge for met in rxn.metabolites}
    masses = {met.id: met.formula for met in rxn.metabolites}
    gpr = rxn.gene_reaction_rule
    if print_mass and print_GPR:
        print(rxn, charges, masses, gpr)
    elif print_mass:
        print(rxn, charges, masses)
    elif print_GPR:
        print(rxn, charges)
        print(gpr)
    else:
        print(rxn, charges)

In [5]:
def get_met(model, met_id):
    met = model.metabolites.get_by_id(met_id)
    rxns = {rxn.id: rxn.reaction for rxn in met.reactions}
    print(f"{met.name} ({met.formula})")
    print(rxns)

## Evaluate current state of models regarding charge balance

In [13]:
# check the objective value; the values are far higher than what is feasible in vivo, which also shows that the models are not working correctly yet
for model in models.values():
    get_objective_value(model)

value of objective for AA1 is 66.94193425569041
value of objective for AA2 is 58.481001782528594
value of objective for AA3 is 43.436128611698244
value of objective for AA4 is 102.68616306092056
value of objective for AA5 is 43.93218694400248
value of objective for AA6 is 65.18134873230218
value of objective for AA7 is 50.884800170810124


### Get all charge-unbalanced reactions for all models

In [11]:
# dictionary to store unbalanced reactions
unbalanced_reactions_dict = {}

# models is the dict in which all models read with read_sbml_model() are stored
for name, model in models.items():
    unbalanced_reactions = check_balance(model)
    unbalanced_reactions_dict[name] = unbalanced_reactions

# these numbers match the numbers of charge-unbalanced reactions in the memote report

There are 445 unbalanced reactions in AA1
There are 473 unbalanced reactions in AA2
There are 372 unbalanced reactions in AA3
There are 410 unbalanced reactions in AA4
There are 360 unbalanced reactions in AA5
There are 451 unbalanced reactions in AA6
There are 452 unbalanced reactions in AA7


In [12]:
# We know how many unbalanced reactions each model has on its own, but what is the overlap?
unique_reactions = set()

# Loop through all models and collect reaction IDs
for model_name, unbalanced_reactions in unbalanced_reactions_dict.items():
    # Add the reaction IDs to the set (a set only keeps unique elements)
    unique_reactions.update(reaction.id for reaction in unbalanced_reactions.keys())

# this is a list of all the reaction IDs that are charge unbalanced throughout all models
unique_reaction_ids = list(unique_reactions)

print("There are {0} charge unbalanced reactions throughout all models.".format(len(unique_reaction_ids)))
# print(unique_reaction_ids)


There are 808 charge unbalanced reactions throughout all models.
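The cell above collects the union, i.e. every reaction that is unbalanced in at least one model. The overlap in the narrower sense would be the intersection (reactions unbalanced in every model), and a per-reaction count shows how widespread each problem is. A minimal sketch with made-up membership data; with the real data, the input could be built as `{name: {r.id for r in rxns} for name, rxns in unbalanced_reactions_dict.items()}`.

```python
from collections import Counter

# hypothetical example data: model name -> IDs of unbalanced reactions
unbalanced_ids = {
    "AA1": {"ACKr", "PFK", "TKT1"},
    "AA2": {"ACKr", "PFK", "GLUDy"},
    "AA3": {"ACKr", "TKT1", "GLUDy"},
}

union = set().union(*unbalanced_ids.values())              # unbalanced in at least one model
intersection = set.intersection(*unbalanced_ids.values())  # unbalanced in every model

# in how many models is each reaction unbalanced?
n_models = Counter(rxn for ids in unbalanced_ids.values() for rxn in ids)

print(sorted(union))         # ['ACKr', 'GLUDy', 'PFK', 'TKT1']
print(sorted(intersection))  # ['ACKr']
print(n_models["PFK"])       # 2
```

Reactions with a high count in `n_models` are good candidates to fix first, because one curation decision then applies to several models.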


In [13]:
# we know the unbalanced reactions, but which metabolites are part of them?
# go through all unbalanced (unique) reactions and get all participating metabolites
metabolite_counter_compartment = Counter()
metabolite_counter_name = Counter()
seen_reactions = set()  # Track reactions that were already counted

for model in models.values():
    for rxn_id in unique_reaction_ids:
        if rxn_id in model.reactions and rxn_id not in seen_reactions:
            reaction = model.reactions.get_by_id(rxn_id)
            for metabolite in reaction.metabolites:
                metabolite_counter_compartment[metabolite.id] += 1  # this is compartment specific, e.g. h2o_c and h2o_p are different metabolites
                metabolite_counter_name[metabolite.name] += 1  # h2o is only counted once not dependent on compartment
            seen_reactions.add(rxn_id)  # Mark this reaction as counted


In [None]:
# Write to CSV: metabolite ID and how many times this metabolite is part of an unbalanced reaction
with open("metabolite_counts.csv", mode="w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["metabolite_id (compartment specific)", "count"])
    for met_id, count in metabolite_counter_compartment.items():
        writer.writerow([met_id, count])

In [338]:
# these are the numbers of unique metabolites that are part of unbalanced reactions

# compartment specific, e.g. h2o_c and h2o_p are counted separately
print(len(metabolite_counter_compartment))
# compartment independent (by metabolite name), i.e. h2o is counted only once
print(len(metabolite_counter_name))

990
880


In [32]:
# get the number of metabolites that are part of only one unbalanced reaction, and check which metabolites are part of the most unbalanced reactions
filtered = {m: v for m, v in metabolite_counter_compartment.items() if v == 1}
print(len(filtered))

print(metabolite_counter_compartment.most_common(5))


434
[('h_c', 417), ('h2o_c', 268), ('atp_c', 138), ('coa_c', 125), ('ppi_c', 103)]


In [33]:
%matplotlib notebook

# Count how many keys have each count (i.e. histogram of values)
count_distribution = Counter(metabolite_counter_compartment.values())

# Plot
plt.bar(count_distribution.keys(), count_distribution.values())
plt.xlabel('Number of unbalanced reactions a metabolite is part of')
plt.ylabel('Number of metabolites')
plt.title('Distribution of metabolite occurrences in unbalanced reactions')
plt.show()



In [13]:
# the numbers show how often each metabolite is part of an unbalanced reaction
metabolite_counter_compartment

Counter({'h_c': 417,
         'h2o_c': 268,
         'atp_c': 138,
         'coa_c': 125,
         'ppi_c': 103,
         'pi_c': 87,
         'amp_c': 85,
         'adp_c': 62,
         'h2o_p': 51,
         'co2_c': 50,
         'nadh_c': 48,
         'nad_c': 46,
         'h_p': 44,
         'pyr_c': 39,
         'nadph_c': 37,
         'nadp_c': 34,
         'fad_c': 34,
         'fadh2_c': 34,
         'ACP_c': 31,
         'o2_c': 28,
         'cmp_c': 26,
         'g3p_c': 21,
         'glyc3p_c': 19,
         'glu__L_c': 19,
         'fe2_c': 19,
         'pep_c': 17,
         'nh4_c': 17,
         'pi_p': 14,
         'accoa_c': 13,
         'r5p_c': 13,
         'f6p_c': 12,
         'gly_c': 12,
         'fmn_c': 11,
         'ctp_c': 10,
         '2dr1p_c': 10,
         'akg_c': 10,
         'ser__L_c': 10,
         'dhap_c': 9,
         'gtp_c': 8,
         'pppi_c': 8,
         'asp__L_c': 8,
         'ala__L_c': 8,
         'uacgam_c': 8,
         'thmpp_c': 8,
         

In [14]:
# all metabolites from unbalanced reactions
# this list is used later on, e.g. by compare_bigg_modelMets() to flag metabolites that are part of unbalanced reactions
unbalanced_mets = list(metabolite_counter_compartment.keys())

## BIGG

In [14]:
import requests
import pandas as pd
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed

# Get list of all universal reactions
base_url = "http://bigg.ucsd.edu/api/v2/"
list_url = base_url + "universal/reactions"
response = requests.get(list_url)

# check if request is going through
if response.status_code != 200:
    raise Exception("Failed to fetch reaction list")

reactions = response.json()["results"]
print(f"Found {len(reactions)} reactions. Fetching details...")

# Function that fetches specific information for one reaction
def fetch_reaction_details(reaction):
    bigg_id = reaction.get("bigg_id", "")
    name = reaction.get("name", "")
    url = f"{base_url}universal/reactions/{bigg_id}"
    try:
        r = requests.get(url, timeout=15)
        if r.status_code == 200:
            data = r.json()
            metabolites = data.get("metabolites", [])

            # Safe quoting
            safe_name = str(name).replace('"', "'")

            return {
                "bigg_id": bigg_id,
                "name": f'"{safe_name}"',
                "metabolites": str(metabolites)
            }
    except Exception as e:
        print(f"Error with {bigg_id}: {e}")

    # Fallback if request fails
    safe_name = str(name).replace('"', "'")
    return {
        "bigg_id": bigg_id,
        "name": f'"{safe_name}"',
        "metabolites": "[]"
    }

# Use ThreadPoolExecutor to parallelize requests
results = []
with ThreadPoolExecutor(max_workers=35) as executor:
    futures = [executor.submit(fetch_reaction_details, rxn) for rxn in reactions]
    for i, future in enumerate(as_completed(futures)):
        results.append(future.result())
        if i % 500 == 0:
            print(f"{i}/{len(reactions)} done...")

Found 28302 reactions. Fetching details...
0/28302 done...
500/28302 done...
1000/28302 done...
1500/28302 done...
2000/28302 done...
2500/28302 done...
3000/28302 done...
3500/28302 done...
4000/28302 done...
4500/28302 done...
5000/28302 done...
5500/28302 done...
6000/28302 done...
6500/28302 done...
7000/28302 done...
7500/28302 done...
8000/28302 done...
8500/28302 done...
9000/28302 done...
9500/28302 done...
10000/28302 done...
10500/28302 done...
11000/28302 done...
11500/28302 done...
12000/28302 done...
12500/28302 done...
13000/28302 done...
13500/28302 done...
14000/28302 done...
14500/28302 done...
15000/28302 done...
15500/28302 done...
16000/28302 done...
16500/28302 done...
17000/28302 done...
17500/28302 done...
18000/28302 done...
18500/28302 done...
19000/28302 done...
19500/28302 done...
20000/28302 done...
20500/28302 done...
21000/28302 done...
21500/28302 done...
22000/28302 done...
22500/28302 done...
23000/28302 done...
23500/28302 done...
24000/28302 done...
2
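`fetch_reaction_details` (and `fetch_metabolite_details` below) falls back to an empty result as soon as one request fails, so a transient error during the roughly 28,000 parallel requests silently produces a reaction without metabolites. A minimal sketch of a generic retry wrapper with exponential backoff; `with_retries` is a hypothetical helper, and the number of attempts and the delays are arbitrary assumptions. The demo uses a fake flaky function instead of a real HTTP call so that it runs offline.

```python
import time


def with_retries(func, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs) and retry on any exception.

    Waits base_delay, 2 * base_delay, ... seconds between attempts and
    re-raises the last exception if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# demo: a fake request that fails twice before it succeeds
calls = {"n": 0}


def flaky_request(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return f"response from {url}"


result = with_retries(flaky_request, "http://example.org", base_delay=0.01)
print(result)      # response from http://example.org
print(calls["n"])  # 3
```

Note that `requests.get` does not raise on HTTP error codes, so to also retry on e.g. a 500 response, the wrapped function would need to call `r.raise_for_status()` before returning.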

In [None]:
df_bigg_rea = pd.DataFrame(results)

# fetch_reaction_details() stored the metabolite list as a string, so convert it back into a list of dicts first
df_bigg_rea["metabolites"] = df_bigg_rea["metabolites"].apply(ast.literal_eval)

# add an equation column to the BIGG reaction df that contains the reaction as {metabolite_compartment: stoichiometry},
# i.e. in a format that we can directly use to overwrite the model reaction
df_bigg_rea["equation"] = df_bigg_rea["metabolites"].apply(
    lambda mets: {f'{met["bigg_id"]}_{met["compartment_bigg_id"]}': met["stoichiometry"] for met in mets}
)

# Save to CSV
# df_bigg_rea.to_csv("../bigg_reactions_complete.csv", index=False, quoting=csv.QUOTE_MINIMAL)
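The reaction table relies on a round trip: the metabolite lists are stored as strings (`str(...)`) and turned back into Python objects with `ast.literal_eval`, after which each entry is mapped to a model-style ID `<bigg_id>_<compartment>`. A minimal sketch for one reaction, using a hand-written, simplified BIGG-style metabolite list that only contains the keys used here (`bigg_id`, `compartment_bigg_id`, `stoichiometry`).

```python
import ast

# simplified BIGG-style entry for one reaction (hand-written example, not fetched from the API)
metabolites = [
    {"bigg_id": "atp", "compartment_bigg_id": "c", "stoichiometry": -1.0},
    {"bigg_id": "h2o", "compartment_bigg_id": "c", "stoichiometry": -1.0},
    {"bigg_id": "adp", "compartment_bigg_id": "c", "stoichiometry": 1.0},
]

# stored in the CSV as a string ...
stored = str(metabolites)
# ... and converted back into a list of dicts when the CSV is read
restored = ast.literal_eval(stored)

# model-style IDs: <bigg_id>_<compartment> -> stoichiometric coefficient
equation = {f'{m["bigg_id"]}_{m["compartment_bigg_id"]}': m["stoichiometry"] for m in restored}
print(equation)  # {'atp_c': -1.0, 'h2o_c': -1.0, 'adp_c': 1.0}
```

`ast.literal_eval` only evaluates literals (lists, dicts, strings, numbers), which makes it a safer choice than `eval` for reading these columns back.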

In [15]:
df_bigg_rea = pd.read_csv("../bigg_reactions_complete.csv", quotechar='"', usecols=['bigg_id', 'name', 'metabolites', 'equation'])
df_bigg_rea["metabolites"] = df_bigg_rea["metabolites"].apply(ast.literal_eval)
df_bigg_rea["equation"] = df_bigg_rea["equation"].apply(ast.literal_eval)

In [1]:
### Code by Chat-GPT ###
# this downloads metabolite information for all BIGG metabolites with their bigg ID, name, formulae and charge and saves it to a csv file
# there are 9088 metabolites
# only needs to be executed ONCE to get the csv, in the next step, we'll read that csv again and turn it into a df

import requests
import pandas as pd
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed

# Get list of all universal metabolites
base_url = "http://bigg.ucsd.edu/api/v2/"
list_url = base_url + "universal/metabolites"
response = requests.get(list_url)

# check if request is going through
if response.status_code != 200:
    raise Exception("Failed to fetch metabolite list")

metabolites = response.json()["results"]
print(f"Found {len(metabolites)} metabolites. Fetching details...")

# function that fetches specific information for one metabolite, i.e. BIGG ID, name, formulas and charges
def fetch_metabolite_details(met):
    bigg_id = met.get("bigg_id", "")
    name = met.get("name", "")
    url = f"{base_url}universal/metabolites/{bigg_id}"
    try:
        r = requests.get(url, timeout=10)
        if r.status_code == 200:
            data = r.json()  # converts JSON response to a dictionary (data)
            formulae = data.get("formulae", [])  # if formula not available, use empty list []
            charges = data.get("charges", [])

            # Safe quoting for CSV
            safe_name = str(name).replace('"', "'")
            name = f'"{safe_name}"'

            return {
                "bigg_id": bigg_id,
                "name": name,
                "formulas": str(formulae),
                "charges": str(charges)
            }
    except Exception as e:
        print(f"Error with {bigg_id}: {e}")

    safe_name = str(name).replace('"', "'")
    name = f'"{safe_name}"'
    return {
        "bigg_id": bigg_id,
        "name": name,
        "formulas": "[]",
        "charges": "[]"
    }


# Use ThreadPoolExecutor to parallelise requests
results = []
with ThreadPoolExecutor(max_workers=25) as executor:
    futures = [executor.submit(fetch_metabolite_details, met) for met in metabolites]
    for i, future in enumerate(as_completed(futures)):
        results.append(future.result())
        if i % 500 == 0:
            print(f"{i}/{len(metabolites)} done...")

# Save to CSV
df = pd.DataFrame(results)
df.to_csv("../bigg_metabolites_complete.csv", index=False, quoting=csv.QUOTE_MINIMAL)
print("Saved to bigg_metabolites_complete.csv")


Found 9088 metabolites. Fetching details...
0/9088 done...
500/9088 done...
1000/9088 done...
1500/9088 done...
2000/9088 done...
2500/9088 done...
3000/9088 done...
3500/9088 done...
4000/9088 done...
4500/9088 done...
5000/9088 done...
5500/9088 done...
6000/9088 done...
6500/9088 done...
7000/9088 done...
7500/9088 done...
8000/9088 done...
8500/9088 done...
9000/9088 done...
Saved to bigg_metabolites_complete.csv


In [16]:
# Read previously created CSV with all BIGG metabolites
df_bigg_met = pd.read_csv("../bigg_metabolites_complete.csv", quotechar='"')

# Convert stringified lists back to real lists
df_bigg_met["formulas"] = df_bigg_met["formulas"].apply(ast.literal_eval)
df_bigg_met["charges"] = df_bigg_met["charges"].apply(ast.literal_eval)


In [22]:
df_bigg_met

,bigg_id,name,formulas,charges
0,10fthf6glu,"""10-formyltetrahydrofolate-[Glu](6)""",[C45H51N12O22],[-7]
1,10fthf,"""10-Formyltetrahydrofolate""",[C20H21N7O7],[-2]
2,10fthfglu__L,"""10-Formyltetrahydrofolyl L-glutamate""",[C25H28N8O10],[]
3,10fthf5glu,"""10-formyltetrahydrofolate-[Glu](5)""",[C40H45N11O19],[-6]
4,10m3ouACP,"""10-methyl-3-oxo-undecanoyl-ACP""",[C23H41N2O9PRS],[0]
...,...,...,...,...
9083,zymstest_SC,"""Zymosterol ester yeast specific C1694H2993O101""",[C1694H2993O101],[0]
9084,xylu__L,"""L-Xylulose""",[C5H10O5],[0]
9085,zymst,"""Zymosterol C27H44O""",[C27H44O],[0]
9086,zymstnl,"""5alpha-cholest-8-en-3beta-ol""",[C27H46O],[0]


In [21]:
df_bigg_rea

,bigg_id,name,metabolites,equation
0,10FTHF7GLUtm,"""7-glutamyl-10FTHF transport, mitochondrial""","[{'bigg_id': '10fthf7glu', 'name': '10-formylt...","{'10fthf7glu_c': 1.0, '10fthf7glu_m': -1.0}"
1,11DOCRTSLtm,"""11-deoxycortisol intracellular transport""","[{'bigg_id': '11docrtsl', 'name': '11docrtsl c...","{'11docrtsl_c': -1.0, '11docrtsl_m': 1.0}"
2,11DOCRTSTRNtr,"""11-deoxycorticosterone intracellular transport""","[{'bigg_id': '11docrtstrn', 'name': '11-Deoxyc...","{'11docrtstrn_c': -1.0, '11docrtstrn_r': 1.0}"
3,10FTHF5GLUtm,"""5-glutamyl-10FTHF transport, mitochondrial""","[{'bigg_id': '10fthf5glu', 'name': '10-formylt...","{'10fthf5glu_c': 1.0, '10fthf5glu_m': -1.0}"
4,10FTHF7GLUtl,"""7-glutamyl-10FTHF transport, lysosomal""","[{'bigg_id': '10fthf7glu', 'name': '10-formylt...","{'10fthf7glu_c': -1.0, '10fthf7glu_l': 1.0}"
...,...,...,...,...
28297,ZYMSTESTH_SC,"""Zymosterol ester hydrolase yeast specific""","[{'bigg_id': 'h', 'name': 'H+', 'compartment_b...","{'h_c': 1.0, 'h2o_c': -1.0, 'hdca_c': 0.02, 'h..."
28298,ZYMSTt,"""Zymosterol reversible transport""","[{'bigg_id': 'zymst', 'name': 'Zymosterol C27H...","{'zymst_c': 1.0, 'zymst_e': -1.0}"
28299,ZYMSTR,"""Zymosterol reductase""","[{'bigg_id': 'h', 'name': 'H+', 'compartment_b...","{'h_c': -1.0, 'nadp_c': 1.0, 'nadph_c': -1.0, ..."
28300,ZN2tpp,"""Zinc transport in via permease (no H+)""","[{'bigg_id': 'zn2', 'name': 'Zinc', 'compartme...","{'zn2_c': 1.0, 'zn2_p': -1.0}"


In [17]:
# get metabolite info for all 7 models (i.e. formula and charge state) and save the 7 df's in a dict
model_mets = {f"AA{i}_mets": extract_met_info_model(models[f"AA{i}"]) for i in range(1, 8)}

In [20]:
model_mets["AA1_mets"]

,bigg_id,model_id,model_formula,model_charge
0,10fthf,10fthf_c,C20H21N7O7,-2
1,12dgr120,12dgr120_c,C27H52O5,0
2,12dgr120,12dgr120_p,C27H52O5,0
3,12dgr140,12dgr140_c,C31H60O5,0
4,12dgr140,12dgr140_p,C31H60O5,0
...,...,...,...,...
1527,xylb,xylb_e,C10H18O9,0
1528,xylu__D,xylu__D_c,C5H10O5,0
1529,zn2,zn2_c,Zn,2
1530,zn2,zn2_e,Zn,2


In [18]:
# merge the metabolite info from the models with the BIGG info; this creates 2 columns that show whether charge/formula info match between model and BIGG
# saves all 7 dfs in the dict model_merged; for easier access, the individual dfs are also assigned to variables (e.g. AA1_merged) that refer to the same objects as the dict
model_merged = {f"AA{i}_merged": compare_bigg_modelMets(model_mets[f"AA{i}_mets"], unbalanced_mets) for i in range(1, 8)}
AA1_merged, AA2_merged, AA3_merged, AA4_merged, AA5_merged, AA6_merged, AA7_merged = [model_merged[f"AA{i}_merged"] for i in range(1, 8)]

In [78]:
AA2_merged

,bigg_id,model_id,model_formula,model_charge,name,formulas,charges,charge_match,formula_match,unbalanced
0,10fthf,10fthf_c,C20H21N7O7,-2,"""10-Formyltetrahydrofolate""",[C20H21N7O7],[-2],True,True,True
1,12dgr120,12dgr120_c,C27H52O5,0,"""1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)""",[C27H52O5],[0],True,True,True
2,12dgr120,12dgr120_p,C27H52O5,0,"""1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)""",[C27H52O5],[0],True,True,True
3,12dgr140,12dgr140_c,C31H60O5,0,"""1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C1...",[C31H60O5],[0],True,True,True
4,12dgr140,12dgr140_p,C31H60O5,0,"""1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C1...",[C31H60O5],[0],True,True,True
...,...,...,...,...,...,...,...,...,...,...
1862,xylu__L,xylu__L_e,C5H10O5,0,"""L-Xylulose""",[C5H10O5],[0],True,True,False
1863,xylu__L,xylu__L_p,C5H10O5,0,"""L-Xylulose""",[C5H10O5],[0],True,True,False
1864,zn2,zn2_c,Zn,2,"""Zinc""",[Zn],[2],True,True,False
1865,zn2,zn2_e,Zn,2,"""Zinc""",[Zn],[2],True,True,False


In [19]:
# extracts only the rows where charge and/or formula don't match the BIGG info and saves them in a dict with the seven dfs
model_mismatch = {f"AA{i}_mismatch": get_mismatches_after_merge(model_merged[f"AA{i}_merged"]) for i in range(1, 8)}

In [156]:
model_mismatch["AA1_mismatch"]

,model_id,bigg_id,model_charge,charges,model_formula,formulas,charge_match,formula_match,unbalanced
28,1btol_c,1btol,0,[],C4H10O,[C4H10O],False,True,False
32,1p2cbxl_c,1p2cbxl,0,[-1],C5H6NO2,[C5H6NO2],False,True,True
35,23ddhb_c,23ddhb,0,[-1],C7H7O4,[C7H7O4],False,True,True
46,2agpe160_c,2agpe160,0,[0],C21H44NO7P,[C21H44NO7P1],True,False,True
47,2agpe160_p,2agpe160,0,[0],C21H44NO7P,[C21H44NO7P1],True,False,False
...,...,...,...,...,...,...,...,...,...
1507,val__D_p,val__D,0,[0],C5H9NO2,[C5H11NO2],True,False,False
1522,xyl3_c,xyl3,0,[],C15H26O13,[C15H26O13],False,True,False
1523,xyl3_e,xyl3,0,[],C15H26O13,[C15H26O13],False,True,False
1526,xylb_c,xylb,0,[],C10H18O9,[C10H18O9],False,True,False


In [22]:
# confusion matrix to show for how many metabolites there are differences in the infos between current model and bigg
get_confmat_charge_formula(AA1_merged)

   charge_match  formula_match  count
0         False          False    178
1         False           True    198
2          True          False    103
3          True           True   1053


In [20]:
# function also takes dict with all the 7 df's and creates one big confusion matrix
get_confmat_charge_formula(model_merged)

  charge_match forumla_match   AA1   AA2   AA3   AA4   AA5   AA6   AA7
0        False         False   178   233   149    96   133   195   189
1        False          True   198   258   235   181   176   238   230
2         True         False   103   103    96   112    95   103    74
3         True          True  1053  1273  1028  1377  1038  1266  1183


## Combine BIGG mismatches and unbalanced reactions

In [98]:
# of all metabolites in AA1, 816 (53%) are not part of any unbalanced reaction and 716 are part of unbalanced reactions
AA1_merged['unbalanced'].value_counts()

unbalanced
False    816
True     716
Name: count, dtype: int64

In [99]:
# if we only look at metabolites whose BIGG info and model info do NOT match, 319 of 479 metabolites (66.6%) are part of unbalanced reactions and only 160 (33.4%) are not part of any unbalanced reaction
model_mismatch["AA1_mismatch"]['unbalanced'].value_counts()

unbalanced
True     319
False    160
Name: count, dtype: int64

In [100]:
combo_counts = AA1_merged.groupby(['unbalanced', 'charge_match', 'formula_match']).size().reset_index(name='count')
print(combo_counts)
# "False True True" is the optimal case, i.e. the metabolite is not part of any unbalanced reaction and its info matches BIGG

   unbalanced  charge_match  formula_match  count
0       False         False          False     59
1       False         False           True     56
2       False          True          False     45
3       False          True           True    656
4        True         False          False    119
5        True         False           True    142
6        True          True          False     58
7        True          True           True    397


In [91]:
AA6_merged['unbalanced'].value_counts()

unbalanced
0    1026
1     776
Name: count, dtype: int64

In [92]:
model_mismatch["AA6_mismatch"]['unbalanced'].value_counts()

unbalanced
1    343
0    193
Name: count, dtype: int64

In [101]:
combo_counts = AA6_merged.groupby(['unbalanced', 'charge_match', 'formula_match']).size().reset_index(name='count')
print(combo_counts)

   unbalanced  charge_match  formula_match  count
0       False         False          False     83
1       False         False           True     63
2       False          True          False     47
3       False          True           True    833
4        True         False          False    112
5        True         False           True    175
6        True          True          False     56
7        True          True           True    433
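Many of these charge/formula mismatches are probably protonation-state differences: the same compound written with one proton more or less differs by one H and by one unit of charge. Also, `formula_match` compares formula strings, so "C21H44NO7P" (model) and "C21H44NO7P1" (BIGG) for `2agpe160` appear to count as a mismatch although they describe the same composition. A minimal sketch that compares element counts instead; `element_counts` and `is_protonation_variant` are hypothetical helpers, and the rule (only H differs, and the H difference equals the charge difference) is an assumption, not a BIGG convention.

```python
import re
from collections import Counter


def element_counts(formula):
    """Return a Counter of element counts for a simple formula such as 'C5H6NO2'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_protonation_variant(formula_a, charge_a, formula_b, charge_b):
    """True if two (formula, charge) pairs differ only by whole protons.

    All elements except H must have the same count, and the difference in
    H atoms must equal the difference in charge (one extra H+ adds one H and +1).
    """
    a, b = element_counts(formula_a), element_counts(formula_b)
    heavy_elements = (set(a) | set(b)) - {"H"}
    if any(a[element] != b[element] for element in heavy_elements):
        return False
    return a["H"] - b["H"] == charge_a - charge_b


# acetate (C2H3O2, -1) vs. acetic acid (C2H4O2, 0): one proton apart
print(is_protonation_variant("C2H3O2", -1, "C2H4O2", 0))    # True
# val__D_p from the AA1 mismatch table: model C5H9NO2 (0) vs. BIGG C5H11NO2 (0)
print(is_protonation_variant("C5H9NO2", 0, "C5H11NO2", 0))  # False
# 2agpe160: different strings, same element counts
print(element_counts("C21H44NO7P") == element_counts("C21H44NO7P1"))  # True
```

A row such as `1p2cbxl_c` (same formula, model charge 0 vs. BIGG charge -1) also returns False here; in that case the model's charge itself does not fit its formula.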


## Overwrite model with BIGG information
We want to find out whether the information in BIGG helps with the large number of charge-unbalanced reactions in our models.
That means that for metabolites (and thus reactions) where the BIGG information differs from our model information, we can try to overwrite the model information with the BIGG information.

At the moment, I only overwrite model info with BIGG info if BIGG gives exactly one charge/formula; if there are multiple possible charge states/formulas, the metabolite needs manual curation. The intention is also to only overwrite a metabolite when it is part of an unbalanced reaction; note, however, that `overwrite_with_BIGG_metabolites` does not check the `unbalanced` column yet, so it currently overwrites every metabolite with unambiguous BIGG info.
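The overwrite rules described above can be written as one small decision function, which makes them easy to check on single metabolites. A minimal sketch on plain Python values; `choose_bigg_values` is hypothetical, and it includes both the "only metabolites in unbalanced reactions" condition stated above and the skip for formulas with generic R/X groups that the notebook's overwrite function uses.

```python
def choose_bigg_values(model_charge, model_formula, bigg_charges, bigg_formulas, unbalanced):
    """Return the (charge, formula) to use for one metabolite.

    Hypothetical helper. BIGG values are only taken if the metabolite is part of
    an unbalanced reaction and BIGG offers exactly one option; otherwise the
    model values are kept. Formulas with generic groups (R, X) are never copied.
    """
    charge, formula = model_charge, model_formula
    if not unbalanced:
        return charge, formula
    if len(bigg_charges) == 1:
        charge = int(bigg_charges[0])
    # a single BIGG formula is only trusted if BIGG also lists a charge for the metabolite
    if len(bigg_formulas) == 1 and bigg_charges:
        candidate = bigg_formulas[0]
        if "R" not in candidate and "X" not in candidate:
            formula = candidate
    return charge, formula


# first two calls use values from the AA1 mismatch table, the third is made up
print(choose_bigg_values(0, "C5H6NO2", [-1], ["C5H6NO2"], unbalanced=True))    # (-1, 'C5H6NO2')
print(choose_bigg_values(0, "C4H10O", [], ["C4H10O"], unbalanced=False))       # (0, 'C4H10O')
print(choose_bigg_values(0, "C5H9NO2", [0, 1], ["C5H11NO2"], unbalanced=True))  # (0, 'C5H11NO2')
```

Keeping the rules in a pure function like this would also make it straightforward to add the `unbalanced` check to the model-level overwrite.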

### Create copies of our models
With these copies, I can introduce changes and then compare them to the original models.

In [20]:
# read the xml files a second time to create "copies" of our models that we can curate and compare to the original models
# creating an actual copy (a new object with deepcopy()) takes a very long time, which is why I use read_sbml_model() again

models_path = "../Models/02_mass_balance/"  # use this path if you want to apply all the following changes to the models
# models_path = "../Models/03_charge_balance/"  # use this path if you only need the models at the current state of curation (you can't apply the BIGG/manual changes to these models, because metabolites/reactions were deleted and some IDs no longer exist)
models_curation = {}
for model_name in (f for f in os.listdir(models_path) if f.endswith(".xml")):
    model = read_sbml_model(os.path.join(models_path, model_name))
    model.solver = "cplex"
    name = f"{model_name[:3]}_curate"
    models_curation[name] = model

models_curation = {key: models_curation[key] for key in sorted(models_curation.keys())}  # sorts the dictionary alphabetically
AA1_curate, AA2_curate, AA3_curate, AA4_curate, AA5_curate, AA6_curate, AA7_curate = [models_curation[f"AA{i}_curate"] for i in range(1, 8)]

### Use BIGG metabolite info to overwrite model metabolite info
Only overwrite when the BIGG info is unambiguous.

In [22]:
# Overwrite model metabolite info with BIGG info if the BIGG info is unambiguous, i.e. there is only one charge state/formula
def overwrite_with_BIGG_metabolites(model, merged_df):
    n_unbalanced = len(check_balance(model, print_results=False))
    for row in merged_df.itertuples(index=False):
        met = model.metabolites.get_by_id(row.model_id)
        # metabolites without a BIGG entry have NaN instead of a list after the left merge
        charges = row.charges if isinstance(row.charges, list) else []
        formulas = row.formulas if isinstance(row.formulas, list) else []

        # check if there is only one charge state
        if len(charges) == 1:
            met.charge = int(charges[0])
        elif met.charge is not None:
            # only needed to get the right datatype (int) for the charge, because apparently it got mixed up and the model could not be saved otherwise
            met.charge = int(met.charge)

        # check if there is only one formula (and at least one charge state, because otherwise the model formula could be right)
        # formulas with generic groups (R, X) are not copied
        if len(formulas) == 1 and len(charges) != 0:
            if "X" not in formulas[0] and "R" not in formulas[0]:
                met.formula = formulas[0]

    n_unbalanced_update = len(check_balance(model, print_results=False))
    print(f'{model.id}: There were {n_unbalanced} unbalanced reactions before and now there are {n_unbalanced_update} after overwriting metabolite info with BIGG data.')

In [23]:
for i in range(1, 8):
    overwrite_with_BIGG_metabolites(models_curation[f"AA{i}_curate"], model_merged[f"AA{i}_merged"])

AA1: There were 445 unbalanced reactions before and now there are 191 after overwriting metabolite info with BIGG data.
AA2: There were 473 unbalanced reactions before and now there are 212 after overwriting metabolite info with BIGG data.
AA3: There were 372 unbalanced reactions before and now there are 157 after overwriting metabolite info with BIGG data.
AA4: There were 410 unbalanced reactions before and now there are 167 after overwriting metabolite info with BIGG data.
AA5: There were 360 unbalanced reactions before and now there are 170 after overwriting metabolite info with BIGG data.
AA6: There were 451 unbalanced reactions before and now there are 205 after overwriting metabolite info with BIGG data.
AA7: There were 452 unbalanced reactions before and now there are 184 after overwriting metabolite info with BIGG data.
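The decision rule inside `overwrite_with_BIGG_metabolites` can be pulled out into a small pure-Python function, which makes it easy to check on its own without loading a model. This is a sketch: `choose_bigg_values` is a hypothetical helper (not part of the notebook or of cobrapy), it assumes that the BIGG charges and formulas for one metabolite arrive as lists of candidate values (as in the `charges` and `formulas` columns of `merged_df`), and the inputs below are made up for illustration.

```python
def choose_bigg_values(model_charge, model_formula, bigg_charges, bigg_formulas):
    """Return the (charge, formula) to keep for one metabolite (hypothetical helper).

    - the BIGG charge is taken only if BIGG lists exactly one charge state;
    - the BIGG formula is taken only if BIGG lists exactly one formula, at least
      one charge state, and the formula contains no generic R/X groups.
    """
    charge = int(bigg_charges[0]) if len(bigg_charges) == 1 else int(model_charge)

    formula = model_formula
    if len(bigg_formulas) == 1 and len(bigg_charges) != 0:
        candidate = bigg_formulas[0]
        if "X" not in candidate and "R" not in candidate:
            formula = candidate
    return charge, formula


# made-up inputs, for illustration only
print(choose_bigg_values(0, "C3H6O3", [-1], ["C3H5O3"]))     # (-1, 'C3H5O3')
print(choose_bigg_values(0, "C3H6O3", [-1, 0], ["C3H5O3"]))  # (0, 'C3H5O3'): two charge states, model charge kept
print(choose_bigg_values(0, "C3H6O3", [-1], ["C3H5O3R"]))    # (-1, 'C3H6O3'): generic formula rejected
```

Note that the plain substring test for `R` would also reject formulas containing elements such as Rb or Ru; for the metabolites in these models that is unlikely to matter.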


### Use BIGG reactions to overwrite model reactions info

In [24]:
def overwrite_with_BIGG_reactions(model):
    unbalanced_rxns = check_balance(model, print_results=False)
    unbalanced_rxns = [r.id for r in unbalanced_rxns]

    for rxn in unbalanced_rxns:
        # look up the BIGG equation ({metabolite id: coefficient}) for this reaction id
        matches = df_bigg_rea.loc[df_bigg_rea["bigg_id"] == rxn, "equation"]
        if matches.empty:
            continue  # no BIGG entry with this id; leave the reaction for manual curation

        new_react = matches.iloc[0]
        new_mets_dict = {model.metabolites.get_by_id(met_id): coeff for met_id, coeff in new_react.items()}

        # replace the complete stoichiometry with the BIGG one
        reaction = model.reactions.get_by_id(rxn)
        reaction.subtract_metabolites(reaction.metabolites)
        reaction.add_metabolites(new_mets_dict)

    unbalanced_rxns_after = check_balance(model, print_results=False)
    unbalanced_rxns_after = [r.id for r in unbalanced_rxns_after]

    print(f'{model.id}: There were {len(unbalanced_rxns)} unbalanced reactions before and now there are {len(unbalanced_rxns_after)} after overwriting reaction info with BIGG data.')

In [25]:
overwrite_with_BIGG_reactions(AA1_curate)
overwrite_with_BIGG_reactions(AA2_curate)
overwrite_with_BIGG_reactions(AA3_curate)
overwrite_with_BIGG_reactions(AA4_curate)
overwrite_with_BIGG_reactions(AA5_curate)
overwrite_with_BIGG_reactions(AA6_curate)
overwrite_with_BIGG_reactions(AA7_curate)

AA1: There were 191 unbalanced reactions before and now there are 185 after overwriting reaction info with BIGG data.
AA2: There were 212 unbalanced reactions before and now there are 207 after overwriting reaction info with BIGG data.
AA3: There were 157 unbalanced reactions before and now there are 152 after overwriting reaction info with BIGG data.
AA4: There were 167 unbalanced reactions before and now there are 163 after overwriting reaction info with BIGG data.
AA5: There were 170 unbalanced reactions before and now there are 164 after overwriting reaction info with BIGG data.
AA6: There were 205 unbalanced reactions before and now there are 200 after overwriting reaction info with BIGG data.
AA7: There were 184 unbalanced reactions before and now there are 179 after overwriting reaction info with BIGG data.
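A reaction counts as unbalanced when the coefficient-weighted sum of atoms or charges over all its metabolites is not zero. The sketch below computes that sum in pure Python, similar in spirit to cobrapy's `Reaction.check_mass_balance()` but not a copy of it; `parse_formula` and `imbalance` are hypothetical helpers, and the parser assumes plain formulas without brackets or generic R/X groups. The fumarate hydration (fumarate + H2O -> L-malate) serves as test data.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a plain formula such as 'C4H2O4' or 'H2O3As'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoichiometry, formulas, charges):
    """Return the non-zero element and charge totals (products minus substrates)."""
    totals = Counter()
    for met, coeff in stoichiometry.items():
        for element, n in parse_formula(formulas[met]).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charges[met]
    return {key: value for key, value in totals.items() if value != 0}


# fumarate + H2O -> L-malate (BIGG-style ids, formulas and charges)
formulas = {"fum_c": "C4H2O4", "h2o_c": "H2O", "mal__L_c": "C4H4O5"}
charges = {"fum_c": -2, "h2o_c": 0, "mal__L_c": -2}

balanced = {"fum_c": -1.0, "h2o_c": -1.0, "mal__L_c": 1.0}
missing_water = {"fum_c": -1.0, "mal__L_c": 1.0}

print(imbalance(balanced, formulas, charges))       # {}
print(imbalance(missing_water, formulas, charges))  # {'H': 2.0, 'O': 1.0}
```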


### Manual changes
These changes were made for the AA1 model during manual curation. Here they are applied to the other models as well, wherever those models contain the affected metabolites and reactions.
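Because each fix is applied only if the model actually contains the metabolite or reaction, the same curation can be run on all seven models. One way to keep such fixes manageable is to store them as data and report which ones were skipped per model. A minimal sketch of that idea, assuming the fixes are kept as `(metabolite id, attribute, value)` tuples; `MANUAL_FIXES` and `apply_fixes` are hypothetical names, and a plain dict with illustrative starting values stands in for a cobra model so the example runs without cobrapy:

```python
# Hypothetical table of manual metabolite fixes: (metabolite id, attribute, new value)
MANUAL_FIXES = [
    ("2ameph_c", "formula", "C2H7NO3P"),
    ("2dhphaccoa_c", "charge", -4),
    ("3sala_c", "charge", -1),
]


def apply_fixes(metabolites, fixes):
    """Apply fixes to a {met_id: {"charge": ..., "formula": ...}} mapping.

    Fixes for metabolites that the model does not contain are skipped and returned,
    so it is visible per model which curation steps did not apply.
    """
    applied, skipped = [], []
    for met_id, attribute, value in fixes:
        if met_id in metabolites:
            metabolites[met_id][attribute] = value
            applied.append(met_id)
        else:
            skipped.append(met_id)
    return applied, skipped


# stand-in for one model's metabolites (illustrative starting values)
model_mets = {
    "2ameph_c": {"charge": -1, "formula": "C2H8NO3P"},
    "3sala_c": {"charge": 0, "formula": "C3H6NO4S"},
}
applied, skipped = apply_fixes(model_mets, MANUAL_FIXES)
print(applied)  # ['2ameph_c', '3sala_c']
print(skipped)  # ['2dhphaccoa_c']
```

With a real cobra model the loop body would instead call `setattr(model.metabolites.get_by_id(met_id), attribute, value)`, guarded by `met_id in model.metabolites` as in `overwrite_charge` and `overwrite_formula`.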

In [26]:
def overwrite_charge(model, met_id, new_charge):
    if met_id in model.metabolites:
        model.metabolites.get_by_id(met_id).charge = new_charge

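Several charges set with `overwrite_charge` during manual curation are derived by comparing the model formula with a neutral reference form from PubChem (see the comments for `acadl_c` and `hmbpp_c` in `overwrite_manual`): every hydrogen missing relative to the neutral form lowers the charge by one. A sketch of that arithmetic, assuming the two formulas differ only in hydrogen; `charge_from_protonation` is a hypothetical helper and the reference formulas are the ones quoted in those comments:

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a plain formula such as 'C5H8O8P2'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def charge_from_protonation(reference_formula, model_formula, reference_charge=0):
    """Charge implied by model_formula, given a reference form with a known charge.

    Assumes the two formulas differ only in hydrogen, i.e. only by (de)protonation:
    each hydrogen fewer than the reference lowers the charge by one.
    """
    ref = parse_formula(reference_formula)
    mod = parse_formula(model_formula)
    for element in (set(ref) | set(mod)) - {"H"}:
        if ref[element] != mod[element]:
            raise ValueError(f"formulas differ in {element}, not only in hydrogen")
    return reference_charge + mod["H"] - ref["H"]


print(charge_from_protonation("C5H12O8P2", "C5H8O8P2"))      # -4, as for hmbpp_c
print(charge_from_protonation("C12H16N5O8P", "C12H14N5O8P"))  # -2, as for acadl_c
```

The same arithmetic explains the paired formula and charge edits, e.g. for `apoACP_c`: going from charge +1 to 0 removes one H, and `charge_from_protonation("C373H583N94O136S2", "C373H582N94O136S2", reference_charge=1)` gives 0.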
In [27]:
def overwrite_formula(model, met_id, new_formula):
    if met_id in model.metabolites:
        model.metabolites.get_by_id(met_id).formula = new_formula

In [28]:
def overwrite_reaction(model, rxn_id, new_rxn_dict):
    if rxn_id in model.reactions:
        rxn = model.reactions.get_by_id(rxn_id)
        rxn.subtract_metabolites(rxn.metabolites)
        rxn.add_metabolites(new_rxn_dict)

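Many of the manual reaction edits in `overwrite_manual` add, remove, or move `h_c`. Adjusting protons is only a sound fix when the reaction is off by the same number of hydrogens as of charges and all other elements balance. A sketch of that check, assuming plain formulas and a products-minus-substrates sign convention; `suggest_proton_fix` is a hypothetical helper, and the malate dehydrogenase data use BIGG-style formulas and charges as test data only:

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a plain formula such as 'C4H4O5'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def suggest_proton_fix(stoichiometry, formulas, charges, proton="h_c"):
    """Return the proton coefficient that balances the reaction, or None.

    Totals are products minus substrates. Changing the proton coefficient by d
    changes both the H total and the charge total by d, so protons can only fix
    a reaction whose H and charge totals are equal and whose other elements balance.
    """
    totals = Counter()
    for met, coeff in stoichiometry.items():
        for element, n in parse_formula(formulas[met]).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charges[met]
    off = {key: value for key, value in totals.items() if value != 0}

    current = stoichiometry.get(proton, 0.0)
    if not off:
        return current  # already balanced
    if set(off) == {"H", "charge"} and off["H"] == off["charge"]:
        return current - off["H"]
    return None  # something other than protons is wrong


# malate dehydrogenase with BIGG-style formulas and charges (test data only)
formulas = {"mal__L_c": "C4H4O5", "nad_c": "C21H26N7O14P2", "oaa_c": "C4H2O5",
            "nadh_c": "C21H27N7O14P2", "h_c": "H"}
charges = {"mal__L_c": -2, "nad_c": -1, "oaa_c": -2, "nadh_c": -2, "h_c": 1}

without_h = {"mal__L_c": -1.0, "nad_c": -1.0, "oaa_c": 1.0, "nadh_c": 1.0}
h_on_wrong_side = {"mal__L_c": -1.0, "nad_c": -1.0, "h_c": -1.0, "oaa_c": 1.0, "nadh_c": 1.0}

print(suggest_proton_fix(without_h, formulas, charges))        # 1.0: add one h_c as product
print(suggest_proton_fix(h_on_wrong_side, formulas, charges))  # 1.0: move h_c to the product side
```

A return value of `None` means the stoichiometry itself needs a closer look rather than a proton tweak.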
In [29]:
def delete_metabolite(model, met_id):
    if met_id in model.metabolites:

        if len(model.metabolites.get_by_id(met_id).reactions) == 0:
            met = model.metabolites.get_by_id(met_id)
            model.metabolites.remove(met)

        else:
            print(f'metabolite {met_id} cannot be deleted from {model.id} because of reaction(s): {model.metabolites.get_by_id(met_id).reactions}')


In [30]:
def delete_reaction(model, rxn_id):
    if rxn_id in model.reactions:
        rxn = model.reactions.get_by_id(rxn_id)
        model.remove_reactions([rxn])

In [69]:
def delete_duplicate_reaction(model, rxn_old, rxn_new, gpr=False):
    # removes rxn_old and keeps rxn_new (only if both are in the model)
    if rxn_old in model.reactions and rxn_new in model.reactions:
        rxn_o = model.reactions.get_by_id(rxn_old)
        rxn_n = model.reactions.get_by_id(rxn_new)

        if gpr:  # combine the GPRs, i.e. keep the longer one if they overlap or one of them is empty
            gpr_o = rxn_o.gene_reaction_rule
            gpr_n = rxn_n.gene_reaction_rule
            if gpr_n in gpr_o or gpr_o in gpr_n:  # check if the GPRs overlap; an empty GPR is contained in every GPR
                rxn_n.gene_reaction_rule = gpr_n if len(gpr_n) >= len(gpr_o) else gpr_o
            else:
                return  # if the GPRs don't overlap, we return and don't delete the reaction

        model.remove_reactions([rxn_o])

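The GPR handling in `delete_duplicate_reaction` boils down to one rule: if one rule string is contained in the other (an empty rule is contained in every string), the longer one is kept; otherwise both reactions are kept. The same rule as a standalone function, so it can be checked without a model; `merge_gpr` is a hypothetical name and the gene IDs are placeholders:

```python
def merge_gpr(gpr_old, gpr_new):
    """Return the GPR to keep on the surviving reaction, or None to keep both reactions.

    Same substring rule as delete_duplicate_reaction: if one rule is contained in the
    other (an empty rule is contained in every rule), the longer one wins.
    """
    if gpr_new in gpr_old or gpr_old in gpr_new:
        return gpr_new if len(gpr_new) >= len(gpr_old) else gpr_old
    return None


print(merge_gpr("", "g1"))          # g1
print(merge_gpr("g1", "g1 or g2"))  # g1 or g2
print(merge_gpr("g1", "g2"))        # None: the GPRs do not overlap
```

Because the comparison is on strings, a GPR such as `g1` also counts as contained in `g10`. Comparing the sets of gene IDs (for example `{g.id for g in rxn.genes}` in cobrapy) would be stricter, at the cost of ignoring the and/or structure.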
In [32]:
def delete_lonely_reaction(model, rxn_id):
    # Unlike delete_reaction above, which always deletes the reaction, this function is meant for reactions
    # that are only deleted because all participating metabolites occur in this reaction alone.
    # To be on the safe side, the reaction is only deleted if ALL its metabolites are unique to it;
    # those metabolites are then removed as well.
    if rxn_id in model.reactions:
        rxn = model.reactions.get_by_id(rxn_id)
        mets = rxn.metabolites
        unique_mets = [met for met in mets if len(met.reactions) == 1]  # metabolites used only by this reaction

        if len(unique_mets) == len(mets):
            model.remove_reactions([rxn])
            for met in unique_mets:
                delete_metabolite(model, met.id)
            print(f"Reaction '{rxn_id}' and all its metabolites were removed.")
        elif unique_mets:
            print(f"Reaction '{rxn_id}' has some unique metabolites:")
            for met in unique_mets:
                print(f"   - {met.id}")
            print("   Please check manually before deleting.")

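To find candidates for `delete_lonely_reaction` in the first place, one can count in how many reactions each metabolite occurs and keep the reactions whose metabolites all have a count of one. A sketch with a plain `{reaction id: [metabolite ids]}` mapping standing in for a model; `find_lonely_reactions` is a hypothetical helper and the toy network is made up:

```python
from collections import Counter


def find_lonely_reactions(reactions):
    """Return ids of reactions whose metabolites all occur in no other reaction.

    `reactions` maps reaction id -> list of metabolite ids, a stand-in for a cobra
    model, where rxn.metabolites and met.reactions carry the same information.
    """
    usage = Counter(met for mets in reactions.values() for met in set(mets))
    return [rxn_id for rxn_id, mets in reactions.items()
            if all(usage[met] == 1 for met in mets)]


# made-up toy network: R3 is disconnected from the rest
toy = {"R1": ["a", "b"], "R2": ["b", "c"], "R3": ["x", "y"]}
print(find_lonely_reactions(toy))  # ['R3']
```

In cobrapy the same information is available from `rxn.metabolites` and `met.reactions`, which is what `delete_lonely_reaction` checks for a single reaction.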
In [70]:
def overwrite_manual(model):
    # first all changes to metabolites, i.e. charges and formulas
    # afterwards changes for reactions, i.e. changing stoichiometry, replacing/deleting metabolites (especially H)
    # last deletions (mostly reactions if duplicate but also metabolites)
    # every category is alphabetically sorted

    # first: metabolites
    overwrite_formula(model, "2ameph_p", "C2H7NO3P") # og = C2H8NO3P, charge was changed from 0 to -1 automatically with bigg and formula now also needed to be changed
    overwrite_formula(model, "2ameph_e", "C2H7NO3P")
    overwrite_formula(model, "2ameph_c", "C2H7NO3P")
    overwrite_charge(model, "2dhphaccoa_c", -4) #og = 0; according to seed https://modelseed.org/biochem/compounds/cpd16740
    overwrite_charge(model, "2mpdhl_c", -1)
    overwrite_charge(model, "23dhbzs3_c", -1) # og = 0, -1 alternative in BIGG
    overwrite_formula(model, "3hsa_c", "C19H24O3")
    overwrite_charge(model, "3sala_c", -1) # og = 0
    overwrite_formula(model, "3sala_c", "C3H6NO4S")
    overwrite_formula(model, "34dhsa_c", "C19H24O4")
    overwrite_charge(model, "4cml_c", -2)
    overwrite_formula(model, "4cml_c", "C7H4O6")
    overwrite_charge(model, "4hoxpac_c", -1) # og = 0, -1 alternative in BIGG
    overwrite_charge(model, "4hoxpac_e", -1)
    overwrite_charge(model, "4hoxpac_p", -1)
    overwrite_formula(model, "49dsha_c", "C19H23O6")
    overwrite_charge(model, "49dsha_c", -1)
    overwrite_charge(model, "5aizc_c", -3)
    overwrite_formula(model, "5ohhipcoa_c", "C34H50N7O19P3S")
    overwrite_charge(model, "5ohhipcoa_c", -4)
    overwrite_charge(model, "6pgg_c", -2) # og = 0, -2 alternative in BIGG and in accordance with E. coli
    overwrite_formula(model, "9ohadd_c", "C19H24O3")

    overwrite_charge(model, "aad_c", -2)
    overwrite_formula(model, "abg4_c", "C12H12N2O5") # C12H11N2O5; https://biocyc.org/compound?orgid=META&id=CPD0-889
    overwrite_charge(model, "abg4_c", -2) # og = 0
    overwrite_formula(model, "abg4_e", "C12H12N2O5")
    overwrite_charge(model, "abg4_e", -2)
    overwrite_formula(model, "acadl_c", "C12H14N5O8P") # og = C12H14N5O8P; https://pubchem.ncbi.nlm.nih.gov/compound/440867 with h16 charge = 0, we have -1 charge
    overwrite_charge(model, "acadl_c", -2) # https://pubchem.ncbi.nlm.nih.gov/compound/440867 with h16 is charge = 0 and we have H14, so we need -2
    overwrite_charge(model, "ACP_c", 0)
    overwrite_charge(model, "actACP_c", -1)
    overwrite_charge(model, "acysbmn_e", -1) # og = 0, -1 according to metacyc with same formula https://metacyc.org/compound?orgid=META&id=CPD1G-185
    overwrite_charge(model, "acysbmn_c", -1) # og = 0, -1
    overwrite_charge(model, "ah6p__D_c", -2) # og = 0; must be -2 because f6p is also -2 and the two can be directly converted into each other
    overwrite_charge(model, "air_c", -2)
    overwrite_charge(model, "amacald_c", 1)
    overwrite_formula(model, "amacald_c", "C2H6NO")
    overwrite_formula(model, "andrs14dn317dn_c", "C19H24O2")
    overwrite_formula(model, "apoACP_c", "C373H582N94O136S2") # og = C373H583N94O136S2; charge was changed from 1 to 0 and now the amount of H also reflects that
    overwrite_charge(model, "aso3_c", -1)
    overwrite_charge(model, "aso3_e", -1)
    overwrite_charge(model, "aso3_p", -1)
    overwrite_formula(model, "aso3_c", "H2O3As")
    overwrite_formula(model, "aso3_e", "H2O3As")
    overwrite_formula(model, "aso3_p", "H2O3As")
    overwrite_formula(model, "aso4_c", "HO4As")
    overwrite_formula(model, "aso4_e", "HO4As")
    overwrite_formula(model, "aso4_p", "HO4As")

    overwrite_charge(model, "bmn_c", 2) # og = 0, to balance BMNMSHS (bmn is just imported for this reaction)
    overwrite_charge(model, "bmn_e", 2)
    overwrite_charge(model, "but2eACP_c", -1)

    overwrite_charge(model, "CCbuttc_c", -3) # og = -3; to balance reaction with 4cml_c
    overwrite_formula(model, "CCbuttc_c", "C7H3O6") # C7H3O6; to reflect charge change
    overwrite_formula(model, "cchol_c", "C27H42O3")
    overwrite_charge(model, "cdigmp_c", -2)
    overwrite_formula(model, "cholc3coa_c", "C43H66N7O18P3S")
    overwrite_formula(model, "cholc5coa_c", "C45H70N7O18P3S")
    overwrite_formula(model, "cholc8coa_c", "C48H76N7O18P3S")
    overwrite_formula(model, "cholenec3coa_c", "C43H64N7O18P3S")
    overwrite_formula(model, "cholenec5coa_c", "C45H68N7O18P3S")
    overwrite_formula(model, "cholenec8coa_c", "C48H74N7O18P3S")

    overwrite_formula(model, "decoa_c", "C31H48N7O17P3S")
    overwrite_charge(model, "dgal6p_c", -2)
    overwrite_charge(model, "dmlgnc_c", -1) # og = 0; https://modelseed.org/biochem/compounds/cpd15951
    overwrite_charge(model, "dtbt_c", -1)

    overwrite_charge(model, "fad_c", -2)
    overwrite_charge(model, "fad_e", -2)
    overwrite_charge(model, "fad_p", -2)
    overwrite_charge(model, "fe3dhbzs3_c", 3) # og=0, alternative in bigg
    overwrite_formula(model, "fe3dhbzs3_c", "C30FeH29N3O16") # og = C30FeH28N3O16; H29 is used in BIGG and in the E. coli model
    overwrite_formula(model, "fe3dhbzs3_e", "C30FeH29N3O16")
    overwrite_formula(model, "fe3dhbzs3_p", "C30FeH29N3O16")
    overwrite_formula(model, "feoxam_c", "C25H46FeN6O8") # formula change according to bigg and ecoli
    overwrite_formula(model, "feoxam_e", "C25H46FeN6O8")
    overwrite_formula(model, "feoxam_p", "C25H46FeN6O8")
    overwrite_charge(model, "ficytc_c", 1)
    overwrite_charge(model, "fmcbtt_c", 2) # og = 0 but it has fe2 in it
    overwrite_charge(model, "fmn_c", -2)
    overwrite_formula(model, "fmn_c", "C17H19N4O9P")
    overwrite_charge(model, "fmn_e", -2)
    overwrite_formula(model, "fmn_e", "C17H19N4O9P")
    overwrite_charge(model, "fmn_p", -2)
    overwrite_formula(model, "fmn_p", "C17H19N4O9P")
    overwrite_charge(model, "focytc_c", 1)
    overwrite_charge(model, "fpram_c", -1)
    overwrite_formula(model, "fpram_c", "C8H15N3O8P")

    overwrite_charge(model, "g3p_c", -2)
    overwrite_charge(model, "g6p_A_c", -2)
    overwrite_formula(model, "galam6p_c", "C6H13NO8P") # https://biocyc.org/compound?orgid=META&id=D-GALACTOSAMINE-6-PHOSPHATE
    overwrite_charge(model, "galam6p_c", -1)
    overwrite_charge(model, "galam_p", +1) #og = 0 https://metacyc.org/compound?orgid=META&id=GALACTOSAMINE
    overwrite_formula(model, "galam_p", "C6H14NO5")
    overwrite_charge(model, "galam_e", +1)
    overwrite_formula(model, "galam_e", "C6H14NO5")
    overwrite_charge(model, "gcvHL_ADPr_c", -1)
    overwrite_formula(model, "gcvHL_ADPr_c", "C23H36N6O21P4S2")
    overwrite_charge(model, "gcvHL_nhLA_c", 0)
    overwrite_formula(model, "gcvHL_nhLA_c", "C8H16NO8P2S2")
    overwrite_charge(model, "gdptp_c", -7)
    overwrite_charge(model, "glutrna_c", -3)
    overwrite_formula(model, "glycogen_c", "C6H10O5")
    overwrite_charge(model, "gly_pro__L_c", 1)
    overwrite_formula(model, "gly_pro__L_c", "C7H13N2O3")
    overwrite_charge(model, "gly_pro__L_e", 1)
    overwrite_formula(model, "gly_pro__L_e", "C7H13N2O3")
    overwrite_formula(model, "gly_tyr_c", "C11H14N2O4")
    overwrite_formula(model, "gly_phe_c", "C11H14N2O3")
    overwrite_formula(model, "gly_leu_c", "C8H16N2O3")
    overwrite_formula(model, "gly_cys_c", "C5H10N2O3S")

    overwrite_formula(model, "hchol_c", "C27H44O2")
    overwrite_formula(model, "hcholc8coa_c", "C48H76N7O19P3S")
    overwrite_formula(model, "hcholc5coa_c", "C45H70N7O19P3S")
    overwrite_formula(model, "hcholc3coa_c", "C43H66N7O19P3S")
    overwrite_charge(model, "hethmpp_c", -2)
    overwrite_charge(model, "hemeO_c", -2)
    overwrite_formula(model, "hia_c", "C11H16O4")
    overwrite_formula(model, "hia_e", "C11H16O4")
    overwrite_formula(model, "hip_c", "C13H17O4")
    overwrite_charge(model, "hip_c", -1)
    overwrite_formula(model, "hipcoa_c", "C34H48N7O19P3S")
    overwrite_charge(model, "hipcoa_c", -4)
    overwrite_formula(model, "hipecoa_c", "C34H48N7O19P3S")
    overwrite_charge(model, "hipecoa_c", -4)
    overwrite_formula(model, "hipohcoa_c", "C34H50N7O20P3S")
    overwrite_charge(model, "hipohcoa_c", -4)
    overwrite_formula(model, "hipocoa_c", "C34H48N7O20P3S")
    overwrite_charge(model, "hipocoa_c", -4)
    overwrite_charge(model, "hmbpp_c", -4) # pubchem C5H12O8P2 with charge 0; model=C5H8O8P2, so charge must be -4

    overwrite_charge(model, "istfrnA_e", -2)
    overwrite_formula(model, "istfrnA_e", "C17FeH19N2O14")
    overwrite_charge(model, "istfrnB_e", +1)
    overwrite_formula(model, "istfrnB_e", "C16FeH22N2O11")

    overwrite_charge(model, "lysglugly_c", 0)
    overwrite_charge(model, "lysglugly_e", 0)

    overwrite_charge(model, "man6pglyc_c", -3) # og = 0; alternative in bigg and in accordance with ecoli
    overwrite_charge(model, "mbhn_c", -1) # og = 0; https://modelseed.org/biochem/compounds/cpd15971
    overwrite_formula(model, "mcbtt_c", "C47H77N5O10") # was wrongly overwritten by a false bigg formula = [C43H71N5O10], metacyc also has the original one that was in the model
    overwrite_charge(model, "mcbtt_c", 0)
    overwrite_charge(model, "met_L_ala__L_c", -1)
    overwrite_charge(model, "met_L_ala__L_e", -1)
    overwrite_formula(model, "met_L_ala__L_c", "C8H15N2O3S")
    overwrite_formula(model, "met_L_ala__L_e", "C8H15N2O3S")
    overwrite_charge(model, "mhpglu_c", -4)
    overwrite_charge(model, "mi3p__D_c", -2) # og = 0, -2 according to bigg

    overwrite_formula(model, "Nforglu_c", "C6H7NO5")

    overwrite_charge(model, "ocACP_c", 0) # og = -1, 0 alternative in BIGG and in accordance with charge = 0 of ACP
    overwrite_formula(model, "ochol_c", "C27H42O2")
    overwrite_formula(model, "ocholc8coa_c", "C48H74N7O19P3S")
    overwrite_formula(model, "ocholc5coa_c", "C45H68N7O19P3S")
    overwrite_charge(model, "ocdcaACP_c", 0) # og = -1, 0 alternative in BIGG and in accordance with charge = 0 of ACP

    overwrite_charge(model, "phdcacoa_c", -4) # og = 0, but it is a CoA ester
    overwrite_charge(model, "phdca_c", -1) # og = 0 but https://modelseed.org/biochem/compounds/cpd16013
    overwrite_charge(model, "phdca_e", -1)
    overwrite_charge(model, "ppad_c", -2)
    overwrite_charge(model, "ptd1ino160_c", -1)
    overwrite_charge(model, "pqqh2_c", -3)
    overwrite_charge(model, "pqqh2_p", -3)
    overwrite_charge(model, "ppgpp_c", -6)
    overwrite_charge(model, "prepphth_c", -1) # og = 0; https://modelseed.org/biochem/compounds/cpd16028
    overwrite_charge(model, "prohisglu_c", -1) # og = -2; tripeptide pro-his-glu: only glu carries a -1 charge, the other two are neutral
    overwrite_charge(model, "prohisglu_e", -1)

    overwrite_formula(model, "ribflv_c", "C17H20N4O6")
    overwrite_formula(model, "ribflv_e", "C17H20N4O6")

    # Salmochelin fixes (initial fixes were already made by Frowin in the notebook that applies the mass-balance function)
    overwrite_formula(model, "salchsx_c", "C16H20NO11") # og C16H21NO11; https://pubchem.ncbi.nlm.nih.gov/compound/135397946
    overwrite_formula(model, "salchsx_e", "C16H20NO11")
    overwrite_formula(model, "salchsx_p", "C16H20NO11")
    overwrite_charge(model, "salchs2fe_c", 3) # to match salchs4fe
    overwrite_charge(model, "salchs2fe_p", 3)
    overwrite_charge(model, "salchs2fe_e", 3)
    #----- more S
    overwrite_charge(model, "salc_e", -1) # og = 0, -1 alternative in BIGG
    overwrite_charge(model, "salc_c", -1)
    overwrite_charge(model, "scl_c", -7) # og = 0, -7 according to bigg
    overwrite_charge(model, "scys__L_c", -1)
    overwrite_charge(model, "ssaltpp_c", -3) # og = 0; BIGG only lists -3 or -2, not 0
    overwrite_charge(model, "stfrnA_e", -5)
    overwrite_formula(model, "stfrnA_e", "C17H19N2O14")
    overwrite_charge(model, "stfrnA_c", -5)
    overwrite_formula(model, "stfrnA_c", "C17H19N2O14")
    overwrite_charge(model, "stfrnB_e", -2)
    overwrite_formula(model, "stfrnB_e", "C16H22N2O11")
    overwrite_charge(model, "stfrnB_c", -2)
    overwrite_formula(model, "stfrnB_c", "C16H22N2O11")

    overwrite_charge(model, "tag6p__D_c", -2)
    overwrite_charge(model, "tagdp__D_c", -4)
    overwrite_charge(model, "tamocta_c", -1) # og = 0; https://modelseed.org/biochem/compounds/cpd16038
    overwrite_charge(model, "tmhexc_c", -1) # og = 0; https://modelseed.org/biochem/compounds/cpd16050

    overwrite_charge(model, "udpacgal_c", -2) # og = 0, -2 alternative in BIGG
    overwrite_charge(model, "udpacgal_p", -2) # og = 0; -2 alternative in bigg and in accordance with ecoli
    overwrite_charge(model, "udpacgal_e", -2)

    overwrite_charge(model, "vacc_c", -1)
    overwrite_charge(model, "vacc_p", -1)
    overwrite_charge(model, "vacc_e", -1)

    overwrite_charge(model, "xylan4_c", -1) # og = 0; no charge given in BIGG, but -1 fits the equations
    overwrite_charge(model, "xylan4_e", -1)


    # second: reactions
    overwrite_reaction(model, "3HPAOX", # H was removed from this reaction
                       {"3hoxpac_c": -1.0,
                        "nadh_c": -1.0,
                        "o2_c": -1.0,
                        "34dhpha_c": 1.0,
                        "h2o_c": 1.0,
                        "nad_c": 1.0})

    overwrite_reaction(model, "3SALATAi", # this and ASPA2 are duplicate reactions (the only difference is an H); reaction was curated according to MetaCyc: https://biocyc.org/reaction?orgid=META&id=3-SULFINOALANINE-AMINOTRANSFERASE-RXN
                       {"3sala_c": -1.0,
                        "akg_c": -1.0,
                        "3snpyr_c": 1.0,
                        "glu__L_c": 1.0})

    overwrite_reaction(model, "ACOAM", # H was removed
                       {"ac_c": -1.0,
                        "atp_c": -1.0,
                        "acadl_c": 1.0,
                        "ppi_c": 1.0})

    overwrite_reaction(model, "ACPS1",
                       {"apoACP_c": -1.0,
                        "coa_c": -1.0,
                        "ACP_c": 1.0,
                        "pap_c": 1.0})

    overwrite_reaction(model, "ACPpds",
                       {"ACP_c": -1.0,
                        "h2o_c": -1.0,
                        "apoACP_c": 1.0,
                        "h_c": 2.0,
                        "pan4p_c": 1.0})

    overwrite_reaction(model, "ALDD31_1",
                       {"gly_c": 1.0,
                        "h_c": 2.0,
                        "h2o_c": -1.0,
                        "nad_c": -1.0,
                        "nadh_c": 1.0,
                        "amacald_c": -1})

    overwrite_reaction(model, "ASR",
                       {"aso4_c": -1.0,
                        "gthrd_c": -2.0,
                        "h_c": -1.0,
                        "aso3_c": 1.0,
                        "gthox_c": 1.0,
                        "h2o_c": 1.0})

    # https://biocyc.org/reaction?orgid=META&id=RXN-10737
    overwrite_reaction(model, "ASR2",
                       {"aso4_c": -1.0,
                        "trdrd_c": -1.0,
                        "h_c": -1.0,
                        "aso3_c": 1.0,
                        "h2o_c": 1.0,
                        "trdox_c": 1.0})

    # was overwritten by BIGG; before that, H was part of the reaction, which also makes it identical to BKDC (present e.g. in AA1)
    overwrite_reaction(model, "AT_MBD2",
                       {"dhlam_c": -1.0,
                        "ibcoa_c": -1.0,
                        "2mpdhl_c": 1.0,
                        "coa_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "BEF",
                       {"betald_c": -1.0,
                        "fad_c": -1.0,
                        "h2o_c": -1.0,
                        "fadh2_c": 1.0,
                        "glyb_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "CMLDC", # https://modelseed.org/biochem/reactions/rxn02483
                       {"4cml_c": -1.0,
                        "h_c": -1.0, # H moved from the product to the substrate side
                        "5odhf2a_c": 1.0,
                        "co2_c": 1.0})
    if "4CMLCL_kt" not in model.reactions and "CMLDC" in model.reactions:
        rxn = model.reactions.get_by_id("CMLDC")
        rxn.id = "4CMLCL_kt"

    overwrite_reaction(model, "DACL", # https://biocyc.org/reaction?orgid=META&id=RXN0-5040 H was removed
                       {"abg4_c": -1.0,
                        "h2o_c": -1.0,
                        "4abz_c": 1.0,
                        "glu__D_c": 1.0})

    overwrite_reaction(model, "DHBZS2H",
                       {"23dhbzs2_c": -1.0,
                        "h2o_c": -1.0,
                        "h_c": 2.0, # added to balance the reaction
                        "23dhbzs_c": 2.0})

    # https://biocyc.org/reaction?orgid=META&id=RXN-14477
    overwrite_reaction(model, "ENTERH",
                       {"enter_c": -1.0,
                        "h2o_c": -1.0,
                        "23dhbzs3_c": 1.0,
                        "h_c": 1.0}) # h was added

    overwrite_reaction(model, "FADD3",
                       {"atp_c": -1.0,
                        "coa_c": -1.0,
                        "hip_c": -1.0,
                        "hipcoa_c": 1.0,
                        "ppi_c": 1.0,
                        "amp_c": 1.0})

    overwrite_reaction(model, "FE3DHBZS3R",
                       {"fe3dhbzs3_c": -2.0,
                        "nadph_c": -1.0,
                        "23dhbzs3_c": 2.0,
                        "fe2_c": 2.0,
                        "h_c": 3.0,
                        "nadp_c": 1.0})

    overwrite_reaction(model, "FEDHBZS3R1",
                       {"fe3dhbzs3_c": -2.0,
                        "fadh2_c": -1.0,
                        "23dhbzs3_c": 2.0,
                        "fe2_c": 2.0,
                        "h_c": 4.0,
                        "fad_c": 1.0})

    overwrite_reaction(model, "FEDHBZS3R2",
                       {"fe3dhbzs3_c": -2.0,
                        "fmnh2_c": -1.0,
                        "23dhbzs3_c": 2.0,
                        "fe2_c": 2.0,
                        "h_c": 4.0,
                        "fmn_c": 1.0})

    overwrite_reaction(model, "FEDHBZS3R3",
                       {"fe3dhbzs3_c": -2.0,
                        "rbflvrd_c": -1.0,
                        "23dhbzs3_c": 2.0,
                        "fe2_c": 2.0,
                        "h_c": 4.0,
                        "ribflv_c": 1.0})

    overwrite_reaction(model, "FNOR",
                       {"fdxrd_c": -2.0,
                        "h_c": -1.0,
                        "nadp_c": -1.0,
                        "fdxox_c": 2.0, # replaces fdxo_2_2_c
                        "nadph_c": 1.0})

    overwrite_reaction(model, "FORGLUIH2",
                       {"forglu_c": -1.0,
                        "h2o_c": -1.0,
                        "Nforglu_c": 1.0,
                        "nh4_c": 1.0})

    # https://biocyc.org/reaction?orgid=META&id=1.18.1.2-RXN change of stoichiometry
    overwrite_reaction(model, "FPRA",
                       {"fdxrd_c": -2.0,
                        "h_c": -1.0,
                        "nadp_c": -1.0,
                        "fdxox_c": 2.0,
                        "nadph_c": 1.0})

    # https://modelseed.org/biochem/reactions/rxn28276 (still a charge imbalance, but the mass balance is correct)
    overwrite_reaction(model, "GCDH",
                       {"glutcoa_c": -1.0,
                        "b2coa_c": 1.0,
                        "h_c": 1.0,
                        "co2_c": 1.0})

    # https://metacyc.org/reaction?orgid=META&id=GLUTAMATE-SYNTHASE-FERREDOXIN-RXN#
    overwrite_reaction(model, "GLMS_syn",
                       {"fdxrd_c": -2.0,
                        "akg_c": -1.0,
                        "gln__L_c": -1.0,
                        "h_c": -2.0,
                        "glu__L_c": 2.0,
                        "fdxox_c": 2.0}) # replaces fdxo_2_2_c because we need +2 charge

    overwrite_reaction(model, "GLUTRS_3",
                       {"atp_c": -1.0,
                        "glu__L_c": -1.0,
                        "trnaglu_c": -1.0,
                        "amp_c": 1.0,
                        "glutrna_c": 1.0,
                        "ppi_c": 1.0})

    overwrite_reaction(model, "GLYCS_I",
                       {"gthrd_c": -1.0,
                        "mthgxl_c": -1.0,
                        "lgt__S_c": 1.0}) # og = lgt_s_c; they are duplicates

    overwrite_reaction(model, "GLYCS_II",
                       {"h2o_c": -1.0,
                        "lgt__S_c": -1.0, # og = lgt_s_c; they are duplicates
                        "gthrd_c": 1.0,
                        "h_c": 1.0,
                        "lac__L_c": 1.0})

    overwrite_reaction(model, "GLYTYRabc",
                       {"atp_c": -1.0,
                        "gly_tyr_e": -1.0,
                        "h2o_c": -1.0,
                        "adp_c": 1.0,
                        "gly_tyr_c": 1.0,
                        "pi_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "GLYLEUtr",
                       {"atp_c": -1.0,
                        "gly_leu_e": -1.0,
                        "h2o_c": -1.0,
                        "adp_c": 1.0,
                        "gly_leu_c": 1.0,
                        "pi_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "GLYPHEtr",
                       {"atp_c": -1.0,
                        "gly_phe_e": -1.0,
                        "h2o_c": -1.0,
                        "adp_c": 1.0,
                        "gly_phe_c": 1.0,
                        "pi_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "GLYCYSabc",
                       {"atp_c": -1.0,
                        "gly_cys_e": -1.0,
                        "h2o_c": -1.0,
                        "adp_c": 1.0,
                        "gly_cys_c": 1.0,
                        "pi_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "GTPDPK_1",
                       {"atp_c": -1.0,
                        "gtp_c": -1.0,
                        "amp_c": 1.0,
                        "gdptp_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "HSAC",
                       {"34dhsa_c": -1.0,
                        "o2_c": -1.0,
                        "49dsha_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "MECDPDH3_syn",
                       {"2mecdp_c": -1.0,
                        "fdxrd_c": -2.0,
                        "h_c": -1.0,
                        "fdxox_c": 2.0, # replaces fdxo_2_2_c
                        "h2mb4p_c": 1.0,
                        "h2o_c": 1.0})
    overwrite_reaction(model, "MECDPDH4E", # AA3 only has this reaction and not MECDPDH3_syn (in the other models the two are duplicates); to make the models easier to compare, the reaction is renamed below
                       {"2mecdp_c": -1.0,
                        "fdxrd_c": -2.0,
                        "h_c": -1.0,
                        "fdxox_c": 2.0, # replaces fdxo_2_2_c
                        "h2mb4p_c": 1.0,
                        "h2o_c": 1.0})
    if "MECDPDH3_syn" not in model.reactions and "MECDPDH4E" in model.reactions:
        rxn = model.reactions.get_by_id("MECDPDH4E")
        rxn.id = "MECDPDH3_syn"

    overwrite_reaction(model, "MS_1",
                       {"hcys__L_c": -1.0,
                        "mhpglu_c": -1.0,
                        "hpglu_c": 1.0,
                        "met__L_c": 1.0})

    # this reaction still has a charge imbalance, but the fix removes the mass imbalance
    overwrite_reaction(model, "NMO",
                       {"etha_c": -1.0,
                        "fmnh2_c": -1.0,
                        "o2_c": -1.0,
                        "acald_c": 1.0,
                        "fmn_c": 1.0,
                        "no2_c": 1.0,
                        "h_c": 6.0  # H as in the original reaction; it was lost when the reaction was overwritten with BIGG info
                        })

    overwrite_reaction(model, "OOR3r", # https://biocyc.org/reaction?orgid=META&id=2-OXOGLUTARATE-SYNTHASE-RXN; https://www.genome.jp/dbget-bin/www_bget?ec:1.2.7.3 EC number was given on BIGG page for that reaction but bigg was a bit off
                       {"akg_c": -1.0,
                        "coa_c": -1.0,
                        "fdxox_c": -2.0,
                        "succoa_c": 1.0,
                        "co2_c": 1.0,
                        "h_c": 1.0,
                        "fdxrd_c": 2.0})

    overwrite_reaction(model, "PACPT_1",
                       {"amp_c": 1.0,
                        "coa_c": -1.0,
                        "ppcoa_c": 1.0,
                        "ppad_c": -1.0})

    # https://modelseed.org/biochem/reactions/rxn13395: our scl is only -7 (in SEED it is -8), so we need only one H
    overwrite_reaction(model, "PC2DHG",
                       {"dscl_c": -1.0,
                        "nadp_c": -1.0,
                        "nadph_c": 1.0,
                        "scl_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "PCADYOX", # https://modelseed.org/biochem/reactions/rxn01192
                       {"34dhbz_c": -1.0,
                        "o2_c": -1.0,
                        "CCbuttc_c": 1.0,
                        "h_c": 2.0}) # H was added

    overwrite_reaction(model, "POR_syn",
                       {"fdxox_c": -2.0, # replaces fdxo_2_2_c
                        "coa_c": -1.0,
                        "pyr_c": -1.0,
                        "accoa_c": 1.0,
                        "co2_c": 1.0,
                        "h_c": 1.0,
                        "fdxrd_c": 2.0})

    overwrite_reaction(model, "PRAIS",
                       {"atp_c": -1.0,
                        "fpram_c": -1.0,
                        "adp_c": 1.0,
                        "air_c": 1.0,
                        "pi_c": 1.0,
                        "h_c": 2.0})

    overwrite_reaction(model, "QSDH",
                       {"pqq_c": -1.0,
                        "skm_c": -1.0,
                        "3dhsk_c": 1.0,
                        "pqqh2_c": 1.0})

    # Salmochelin fixes (initial fixes were already made by Frowin in the notebook that applies the mass-balance function)
    overwrite_reaction(model, "SALCHS1H",
                       {"h2o_c": -1.0,
                        "salchs1_c": -1.0,
                        "23dhbzs_c": 1.0,
                        "salchsx_c": 1.0,
                        "h_c": 2.0})
    overwrite_reaction(model, "SALCHS2H",
                       {"h2o_c": -1.0,
                        "salchs2_c": -1.0,
                        "salchs1_c": 1.0,
                        "salchsx_c": 1.0,
                        "h_c": 1.0})

    overwrite_reaction(model, "SMIA1",
                       {"fe3_e": -1.0,
                        "stfrnA_e": -1.0,
                        "istfrnA_e": 1.0})

    overwrite_reaction(model, "SMIA1abc",
                       {"atp_c": -1.0,
                        "h2o_c": -1.0,
                        "istfrnB_e": -1.0,
                        "adp_c": 1.0,
                        "fe3_c": 1.0,
                        "h_c": 1.0,
                        "pi_c": 1.0,
                        "stfrnB_c": 1.0})

    overwrite_reaction(model, "SMIA2abc",
                       {"atp_c": -1.0,
                        "h2o_c": -1.0,
                        "istfrnA_e": -1.0,
                        "adp_c": 1.0,
                        "fe3_c": 1.0,
                        "h_c": 1.0,
                        "pi_c": 1.0,
                        "stfrnA_c": 1.0})

    overwrite_reaction(model, "SMIB1",
                       {"fe3_e": -1.0,
                        "stfrnB_e": -1.0,
                        "istfrnB_e": 1.0})

    overwrite_reaction(model, "STAS",
                       {"atp_c": -2.0,
                        "cit_c": -2.0,
                        "orn_c": -1.0,
                        "amp_c": 2.0,
                        "ppi_c": 2.0,
                        "stfrnA_c": 1.0,
                        "h_c": 2.0}) # H changed sides (now a product), similar to https://biocyc.org/reaction?orgid=META&id=RXN-19521, although that reaction is still different

    overwrite_reaction(model, "T6PK",
                       {"atp_c": -1.0,
                        "tag6p__D_c": -1.0,
                        "adp_c": 1.0,
                        "tagdp__D_c": 1.0,
                        "h_c": 1.0})

    # https://modelseed.org/biochem/reactions/rxn10816
    overwrite_reaction(model, "THZSN_1",
                       {"cys__L_c": -1.0,
                        "dxyl_c": -1.0,
                        "fdxox_c": -1.0, # instead of fdx_2_2_c
                        "tyr__L_c": -1.0,
                        "4hba_c": 1.0,
                        "4mhetz_c": 1.0,
                        "co2_c": 1.0,
                        "fdxrd_c": 1.0,
                        "h2o_c": 1.0,
                        "h_c": 2.0, # 2 instead of 1;
                        "nh4_c": 1.0,
                        "pyr_c": 1.0})

    # deletions
    # delete_duplicate_reaction(model, "ACPS1_1", "ACPS1", gpr=True) # cannot delete because the GPRs don't overlap
    # delete_duplicate_reaction(model, "ASPA2", "3SALATAi", gpr=True) # cannot delete because the GPRs don't overlap
    delete_duplicate_reaction(model, "CMLDC", "4CMLCL_kt", gpr=True)
    delete_duplicate_reaction(model, "CO2FO", "COCO2", gpr=True)
    delete_duplicate_reaction(model, "FMNAT_1", "FMNAT") # same gpr
    delete_duplicate_reaction(model, "GAPD_1", "GAPD", gpr=True) # the GAPD_1 GPR is contained in the GAPD GPR
    delete_duplicate_reaction(model, "GTPDPK_1", "GTPDPK", gpr=True) # whether the GPRs match depends on the model
    delete_duplicate_reaction(model, "HMEDS", "MECDPDH3_syn") # both have no gpr
    delete_duplicate_reaction(model, "MECDPDH4E", "MECDPDH3_syn") # same gpr
    delete_duplicate_reaction(model, "PRFGCL", "PRAIS", gpr=True) # same gpr
    delete_duplicate_reaction(model, "RBFK_1", "RBFK") # same gpr

    if "PRFGS" in model.reactions and "PRFGS_1" in model.reactions: # the GPRs overlap but each has unique genes, so merge PRFGS_1's unique part into PRFGS before deleting PRFGS_1
        model.reactions.get_by_id("PRFGS").gene_reaction_rule += " or (WP_079211777_1 and WP_079211778_1)"
        delete_reaction(model, "PRFGS_1")

    # delete PENAM and its three metabolites only if PENAM is the sole reaction using each of them
    # PENAM: h2o_p + peng_p <=> 6apa_p + pac_p
    penam_mets = ("peng_p", "6apa_p", "pac_p")
    if all(met_id in model.metabolites for met_id in penam_mets):
        if all({rxn.id for rxn in model.metabolites.get_by_id(met_id).reactions} == {"PENAM"} for met_id in penam_mets):
            delete_reaction(model, "PENAM")
            for met_id in penam_mets:
                delete_metabolite(model, met_id)

    delete_metabolite(model, "fdxo_2_2_c")
    delete_metabolite(model, "lgt_s_c")
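Several of the manual fixes above (PC2DHG, STAS, THZSN_1) come down to changing the proton coefficient until the net charge closes. A minimal sketch of that arithmetic on plain dictionaries, assuming a proton charge of +1; `charge_imbalance` and `protons_to_balance` are hypothetical helpers, not part of cobra or of this notebook. The example uses ATP hydrolysis with BiGG-style charges (atp -4, adp -3, pi -2):

```python
def charge_imbalance(stoichiometry, charges):
    """Net charge of products minus reactants (0 means charge-balanced)."""
    return sum(coef * charges[met] for met, coef in stoichiometry.items())


def protons_to_balance(stoichiometry, charges, proton="h_c"):
    """Coefficient the proton needs for the reaction to be charge-balanced.

    Hypothetical helper: assumes the proton has charge +1 and that changing
    its coefficient is chemically justified (hydrogen atoms must balance too).
    """
    current = stoichiometry.get(proton, 0)
    return current - charge_imbalance(stoichiometry, charges)


# illustrative charges in the style of BiGG
charges = {"atp_c": -4, "h2o_c": 0, "adp_c": -3, "pi_c": -2, "h_c": 1}

# ATP hydrolysis written without the released proton
atp_hydrolysis = {"atp_c": -1, "h2o_c": -1, "adp_c": 1, "pi_c": 1}

print(charge_imbalance(atp_hydrolysis, charges))    # -1
print(protons_to_balance(atp_hydrolysis, charges))  # 1
```

A proton adjustment only fixes the reaction if the hydrogen count then matches as well, so the formulas should be checked alongside the charge.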

In [71]:
overwrite_manual(AA1_curate)
overwrite_manual(AA2_curate)
overwrite_manual(AA3_curate)
overwrite_manual(AA4_curate)
overwrite_manual(AA5_curate)
overwrite_manual(AA6_curate)
overwrite_manual(AA7_curate)

metabolite peng_p cannot be deleted from AA4 because of reaction(s): frozenset({<Reaction PENGt1 at 0x7f55bde58940>})
metabolite 6apa_p cannot be deleted from AA4 because of reaction(s): frozenset({<Reaction 6APAt1 at 0x7f55be2a7880>})
metabolite pac_p cannot be deleted from AA4 because of reaction(s): frozenset({<Reaction PACt1 at 0x7f55bddee140>})
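The PENAM clean-up relies on an exclusivity test: a metabolite may only be removed together with a reaction if no other reaction uses it. In AA4, `peng_p`, `6apa_p` and `pac_p` are still used by transport reactions (PENGt1, 6APAt1, PACt1), so they cannot be deleted. A sketch of that test on plain data, assuming a hypothetical mapping of reaction IDs to metabolite IDs (in cobra, e.g. `{m.id for m in rxn.metabolites}`); the toy transport reaction is truncated to its periplasmic metabolite:

```python
def exclusive_metabolites(reactions, rxn_id):
    """Metabolites of `rxn_id` that no other reaction uses."""
    used_elsewhere = set()
    for other_id, mets in reactions.items():
        if other_id != rxn_id:
            used_elsewhere |= mets
    return reactions[rxn_id] - used_elsewhere


# toy data, not taken from the real models
reactions = {
    "PENAM": {"h2o_p", "peng_p", "6apa_p", "pac_p"},
    "H2Otpp": {"h2o_p", "h2o_c"},
    "PENGt1": {"peng_p"},  # other participants omitted for brevity
}

penam_mets = {"peng_p", "6apa_p", "pac_p"}
exclusive = exclusive_metabolites(reactions, "PENAM")
print(sorted(exclusive))        # ['6apa_p', 'pac_p']
print(penam_mets <= exclusive)  # False: keep PENAM and its metabolites
```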


In [73]:
check_balance(AA1_curate)
check_balance(AA2_curate)
check_balance(AA3_curate)
check_balance(AA4_curate)
check_balance(AA5_curate)
check_balance(AA6_curate)
check_balance(AA7_curate)

There are 3 unbalanced reactions in AA1
There are 3 unbalanced reactions in AA2
There are 9 unbalanced reactions in AA3
There are 2 unbalanced reactions in AA4
There are 3 unbalanced reactions in AA5
There are 3 unbalanced reactions in AA6
There are 3 unbalanced reactions in AA7


{<Reaction DHNAOT at 0x7f55c51e15d0>: {'charge': 2.0},
 <Reaction NMO at 0x7f55c0db6e00>: {'charge': 4.0},
 <Reaction SUCD2 at 0x7f55bfb5b3d0>: {'charge': -2.0}}

In [78]:
# save all curated models as files (so we can generate a MEMOTE report for each)
for model_name, model in models_curation.items():
    path = f"../Models/03_charge_balance/{model_name[:3]}_curated.xml"
    write_sbml_model(model, path)

-----------------------------------

### Go through unbalanced reactions

In [79]:
for model in models.values():
    if "4hoxpac_c" in model.metabolites:
        print(model.id)

AA2
AA4
AA6


In [128]:
AA2_cur_mets = extract_met_info_model(AA2_curate)
AA2_cur_merged = compare_bigg_modelMets(AA2_cur_mets, unbalanced_mets)

In [21]:
# collect the remaining imbalances of all curated models (reaction ID -> imbalance), skipping AGPATr_BS and G3POA_BS
inbalanced_rxns = {}
for model in models_curation.values():
    current_imbalances = check_mass_balance(model)
    for rxn, imbalance in current_imbalances.items():
        if rxn.id not in ("AGPATr_BS", "G3POA_BS"):
            inbalanced_rxns[rxn.id] = imbalance

In [22]:
print(len(inbalanced_rxns))
inbalanced_rxns

13


{'AMMQT8': {'charge': -2.0},
 'BTS2': {'charge': 2.0},
 'NMO': {'charge': 4.0},
 'GCDH': {'charge': 2.0},
 'THZSN_1': {'charge': 1.0},
 'DHNAOT': {'charge': 2.0},
 'HDECH': {'charge': 2.0},
 'LIPO1S24_BS': {'charge': 2160.0},
 'LIPO2S24_BS': {'charge': 2160.0},
 'LIPO3S24_BS': {'charge': 2160.0},
 'LIPO4S24_BS': {'charge': 2160.0},
 'MECDPDH': {'charge': -2.0},
 'SUCD2': {'charge': -2.0}}

In [25]:
for rxn in inbalanced_rxns.keys():
    for model in models_curation.values():
        if rxn in model.reactions:
            print(rxn,"is in", model.id)

AMMQT8 is in AA1
AMMQT8 is in AA2
AMMQT8 is in AA6
BTS2 is in AA1
NMO is in AA1
NMO is in AA3
NMO is in AA5
NMO is in AA6
NMO is in AA7
GCDH is in AA2
GCDH is in AA4
GCDH is in AA6
THZSN_1 is in AA2
DHNAOT is in AA3
DHNAOT is in AA7
HDECH is in AA3
HDECH is in AA4
HDECH is in AA5
LIPO1S24_BS is in AA3
LIPO2S24_BS is in AA3
LIPO3S24_BS is in AA3
LIPO4S24_BS is in AA3
MECDPDH is in AA5
SUCD2 is in AA7


In [54]:
get_rxn(AA1_curate, "NMO", print_mass=True)

NMO: etha_c + fmnh2_c + o2_c --> acald_c + fmn_c + 6.0 h_c + no2_c {'etha_c': 1, 'fmnh2_c': -2, 'o2_c': 0, 'acald_c': 0, 'fmn_c': -2, 'no2_c': -1, 'h_c': 1} {'etha_c': 'C2H8NO', 'fmnh2_c': 'C17H21N4O9P', 'o2_c': 'O2', 'acald_c': 'C2H4O', 'fmn_c': 'C17H19N4O9P', 'no2_c': 'NO2', 'h_c': 'H'}
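The NMO line above contains everything needed to locate the imbalance. A small sketch that recomputes it from the printed formulas and charges (products minus reactants, the sign convention of the reported `{'charge': 4.0}`). The formula parser is a simplification that assumes bracket-free formulas and is not cobra's own: all elements cancel and only a charge imbalance of +4 remains.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a bracket-free formula such as 'C17H21N4O9P'."""
    atoms = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(number) if number else 1
    return atoms


def imbalance(stoichiometry, formulas, charges):
    """Products minus reactants for every element and for charge; zeros dropped."""
    total = Counter()
    for met, coef in stoichiometry.items():
        for element, count in parse_formula(formulas[met]).items():
            total[element] += coef * count
        total["charge"] += coef * charges[met]
    return {key: value for key, value in total.items() if value != 0}


# NMO as printed above (reactants negative, products positive)
nmo = {"etha_c": -1, "fmnh2_c": -1, "o2_c": -1,
       "acald_c": 1, "fmn_c": 1, "h_c": 6, "no2_c": 1}
formulas = {"etha_c": "C2H8NO", "fmnh2_c": "C17H21N4O9P", "o2_c": "O2",
            "acald_c": "C2H4O", "fmn_c": "C17H19N4O9P", "no2_c": "NO2", "h_c": "H"}
charges = {"etha_c": 1, "fmnh2_c": -2, "o2_c": 0,
           "acald_c": 0, "fmn_c": -2, "no2_c": -1, "h_c": 1}

print(imbalance(nmo, formulas, charges))  # {'charge': 4}
```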


In [65]:
get_met(AA7_curate, "etha_c")

Ethanolamine (C2H8NO)
{'ETHAAL': 'etha_c --> acald_c + nh4_c', 'ETHAt2pp': 'etha_p + h_p --> etha_c + h_c', 'NMO': 'etha_c + fmnh2_c + o2_c --> acald_c + fmn_c + 6.0 h_c + no2_c', 'ETHAt': 'etha_e <=> etha_c', 'GPDDA2': 'g3pe_c + h2o_c --> etha_c + glyc3p_c + h_c'}


In [64]:
get_met(AA7_curate, "etha_p")

Ethanolamine (C2H8NO)
{'ETHAtex': 'etha_e <=> etha_p', 'ETHAt2pp': 'etha_p + h_p --> etha_c + h_c'}


In [326]:
# one metabolite (3sala_c) and one reaction (PCADYOX) were changed twice, in different ways, during manual curation,
# so check which models contain them to decide which change is right
for model in models_curation.values():
    # if "3sala_c" in model.metabolites:
    #     print("3sala: ", model.id)
    if "PCADYOX" in model.reactions:
        print("PCADYOX: ", model.id)

PCADYOX:  AA1
PCADYOX:  AA2
PCADYOX:  AA4
PCADYOX:  AA5
PCADYOX:  AA6
PCADYOX:  AA7


## Compare with already curated models

Check against a well-curated model, e.g. the Gram-negative E. coli model iML1515: look up its reactions, metabolites, and pathways, and where our models share them, compare how they are balanced there.
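Besides reaction IDs, the charges of shared metabolites can be compared directly. A sketch assuming each model is reduced to a `{metabolite ID: charge}` dictionary (in cobra, e.g. `{m.id: m.charge for m in model.metabolites}`); `charge_differences` is a hypothetical helper and the values below are made up for illustration:

```python
def charge_differences(model_charges, reference_charges):
    """Shared metabolite IDs whose charge differs: ID -> (model, reference)."""
    shared = sorted(model_charges.keys() & reference_charges.keys())
    return {met: (model_charges[met], reference_charges[met])
            for met in shared
            if model_charges[met] != reference_charges[met]}


# hypothetical charges, not taken from the real models
ours = {"met_a_c": -2, "met_b_c": 0, "met_c_c": -1}
reference = {"met_a_c": -2, "met_b_c": 1, "met_d_c": 0}

print(charge_differences(ours, reference))  # {'met_b_c': (0, 1)}
```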

In [3]:
ecoli = read_sbml_model("../Models/iML1515.xml")
ecoli.solver = 'cplex'

Restricted license - for non-production use only - expires 2026-11-23


In [139]:
# check which unbalanced reactions of one of our models also occur in the E. coli model; for those, the balanced E. coli reaction can serve as a template for balancing ours
def compare_with_ecoli(model):
    not_ecoli = []
    in_ecoli = []

    unfinished_business = check_balance(model, print_results=False)

    for rxn in unfinished_business:
        if rxn.id in ecoli.reactions:
            in_ecoli.append(rxn.id)
        else:
            not_ecoli.append(rxn.id)

    print(f"The following {len(not_ecoli)} reactions are NOT in E. coli model:\n{not_ecoli}\n"
          f"The following {len(in_ecoli)} reactions are IN E. coli model:\n{in_ecoli}")
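The split inside `compare_with_ecoli` is a set intersection and a set difference. A sketch with plain ID collections (`split_by_reference` is a hypothetical helper and the IDs are toy values; with cobra, `{r.id for r in ecoli.reactions}` would give the reference set):

```python
def split_by_reference(reaction_ids, reference_ids):
    """Return (IDs found in the reference, IDs missing from it), both sorted."""
    ids, reference = set(reaction_ids), set(reference_ids)
    return sorted(ids & reference), sorted(ids - reference)


unbalanced = ["DHNAOT", "NMO", "GAPD"]
reference_reactions = ["GAPD", "PGK", "ENO"]  # toy stand-in for ecoli.reactions

in_ref, not_in_ref = split_by_reference(unbalanced, reference_reactions)
print(in_ref)      # ['GAPD']
print(not_in_ref)  # ['DHNAOT', 'NMO']
```

Matching by ID only works where both models use the same (BiGG) namespace; a reaction stored under a different ID counts as missing.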

In [140]:
compare_with_ecoli(AA7_curate)

The following 6 reactions are NOT in E. coli model:
['DHNAOT', 'NMO', 'OOR3r', 'POR_syn', 'STAS', 'SUCD2']
The following 0 reactions are IN E. coli model:
[]


In [16]:
ecoli.metabolites.query("coa")

[<Metabolite 3ohdcoa_c at 0x778b9cfea380>,
 <Metabolite 3hodcoa_c at 0x778b9cfebbb0>,
 <Metabolite hdcoa_c at 0x778b9cfeaf80>,
 <Metabolite tdecoa_c at 0x778b9d048850>,
 <Metabolite oxalcoa_c at 0x778b9d048f70>,
 <Metabolite phaccoa_c at 0x778b9d049930>,
 <Metabolite odecoa_c at 0x778b9d049cc0>,
 <Metabolite accoa_c at 0x778b9d049f00>,
 <Metabolite occoa_c at 0x778b9d04a020>,
 <Metabolite 3hbcoa_c at 0x778b9d04a620>,
 <Metabolite ddcacoa_c at 0x778b9d04ad40>,
 <Metabolite malcoame_c at 0x778b9d04b400>,
 <Metabolite 3hddcoa_c at 0x778b9d04b910>,
 <Metabolite sbzcoa_c at 0x778b9d048130>,
 <Metabolite crnDcoa_c at 0x778b9cf387c0>,
 <Metabolite 3hbzcoa_c at 0x778b9cf38df0>,
 <Metabolite 3oocoa_c at 0x778b9cf39060>,
 <Metabolite 2tpr3dpcoa_c at 0x778b9cf394e0>,
 <Metabolite btcoa_c at 0x778b9cf3a020>,
 <Metabolite tdcoa_c at 0x778b9cf3a0b0>,
 <Metabolite 3otdcoa_c at 0x778b9cf3a890>,
 <Metabolite 3hhdcoa_c at 0x778b9cf3a8f0>,
 <Metabolite hx2coa_c at 0x778b9cf3a9b0>,
 <Metabolite 3ohcoa_c a

## MACAW

In [9]:
import os
models_path = "Models/charge_balance"
models_curation = {}
for model_name in (f for f in os.listdir(models_path) if f.endswith(".xml")):
    model = read_sbml_model(f"{models_path}/{model_name}")
    model.solver = "cplex"
    name = f"{model_name[:3]}_curate"
    models_curation[name] = model

models_curation = {key: models_curation[key] for key in sorted(models_curation.keys())}  # sorts the dictionary alphabetically
AA1_curate, AA2_curate, AA3_curate, AA4_curate, AA5_curate, AA6_curate, AA7_curate = [models_curation[f"AA{i}_curate"] for i in range(1, 8)]

Restricted license - for non-production use only - expires 2026-11-23


In [6]:
redox_pairs = [
    # NAD(H), NADP(H), FAD(H2), and FMN(H2)
    ('nad_c', 'nadh_c'), ('nadp_c', 'nadph_c'), ('fad_c', 'fadh2_c'),
    ('fmn_c', 'fmnh2_c'),
    # ubiquinone-8 and ubiquinol-8
    ('q8_c', 'q8h2_c'),
    # menaquinone-8 and menaquinol-8
    ('mqn8_c', 'mql8_c'),
    # riboflavin and reduced riboflavin
    ('ribflv_c', 'rbflvrd_c'),
    # glutathione
    ('gthox_c', 'gthrd_c'),
    # glutaredoxin
    ('grxox_c', 'grxrd_c'),
    # thioredoxin
    ('trdox_c', 'trdrd_c'),
    # oxygen and hydrogen peroxide
    ('o2_c', 'h2o2_c')]

# protons in all compartments
proton_ids = ['h_c', 'h_e', 'h_p']

# mono- and diphosphate in all compartments
pi_ids = ["pi_c", "pi_p", "pi_e"]
ppi_ids = ["ppi_c", "ppi_p", "ppi_e"]
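Before passing these lists to MACAW, it may be worth checking that every ID actually exists in the model being tested, since a typo or a compartment the model lacks could otherwise be hard to spot. A sketch assuming the model's metabolite IDs are available as a set (e.g. `{m.id for m in AA7_curate.metabolites}`); `missing_ids` is a hypothetical helper and the toy ID set is made up:

```python
def missing_ids(model_met_ids, redox_pairs, *id_lists):
    """IDs referenced in redox_pairs or any extra ID list that the model lacks."""
    wanted = {met for pair in redox_pairs for met in pair}
    for ids in id_lists:
        wanted.update(ids)
    return sorted(wanted - set(model_met_ids))


# toy model without a periplasm
toy_met_ids = {"nad_c", "nadh_c", "h_c", "h_e", "pi_c", "pi_e"}
print(missing_ids(toy_met_ids, [("nad_c", "nadh_c")],
                  ["h_c", "h_e", "h_p"], ["pi_c", "pi_p", "pi_e"]))
# ['h_p', 'pi_p']
```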

In [3]:
from macaw.main import run_all_tests

# (test_results, edge_list) = run_all_tests(AA7_curate)

(test_results, edge_list) = run_all_tests(AA7_curate, redox_pairs=redox_pairs, proton_ids=proton_ids, diphosphate_met_ids=ppi_ids, phosphate_met_ids=pi_ids)

name = f'Reports/Macaw_after_first_charge_balance/{AA7_curate.id}_macaw_results.csv'
test_results.to_csv(name, index=False)

Starting dead-end test...
 - Found 23 dead-end metabolites.
 - Found 21 reactions incapable of sustaining steady-state fluxes in either direction due to these dead-ends.
 - Found 362 reversible reactions that can only carry steady-state fluxes in a single direction due to dead-ends.
Starting duplicate test...
 - Skipping redox duplicates because no redox_pairs and/or proton_ids were provided.
 - Found 116 reactions that were some type of duplicate:
   - 20 were completely identical to at least one other reaction.
   - 38 involve the same metabolites but go in the opposite direction or have the opposite reversibility as at least one other reaction.
   - 90 involve the same metabolites but with different coefficients as at least one other reaction.
Skipping diphosphate test because IDs for mono- and diphosphate ions were not provided.
Starting loop test...
 - Found 220 reactions involved in infinite loops.
Starting dilution test...
 - Found 131 metabolites for which adding a dilution con

In [5]:
from macaw.main import duplicate_test
(duplicate_results, duplicate_edges) = duplicate_test(AA1_curate)

Starting duplicate test...
 - Skipping redox duplicates because no redox_pairs and/or proton_ids were provided.
 - Found 102 reactions that were some type of duplicate:
   - 26 were completely identical to at least one other reaction.
   - 46 involve the same metabolites but go in the opposite direction or have the opposite reversibility as at least one other reaction.
   - 70 involve the same metabolites but with different coefficients as at least one other reaction.
