# Balance Metabolic Models

With Frowin's scripts I obtained stoichiometrically consistent models; the next step is mainly to balance the charges.

Work plan Martina:
* Check that all metabolites have a chemical formula (this is part of the Memote report and holds for all of them).
* Then tackle the mass imbalances (if there are any); these were removed with Frowin's scripts.
* Charge imbalances
* This is mainly about finding the correct reaction pathways in the various databases (BiGG, MetaCyc, KEGG, etc.). This is the part of the protocol that also discusses proton imbalances.
* The goal is that calling .check_mass_balance() on a reaction (in COBRApy) returns an empty result (an empty dict, i.e. an imbalance of zero), which means that the reaction has no net charge.
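
The target state can be illustrated without loading a model: a reaction is charge balanced when the stoichiometry-weighted sum of its metabolite charges is zero. The sketch below uses a hypothetical helper `charge_imbalance` and ATP hydrolysis with the charges commonly listed on BiGG as illustrative input (neither is taken from the AA models). In COBRApy, `Reaction.check_mass_balance()` reports the same quantity under the key `'charge'`.

```python
# Hypothetical helper: stoichiometry-weighted charge sum of one reaction.
def charge_imbalance(stoichiometry, charges):
    """Return the net charge produced by a reaction (0 means charge balanced).

    stoichiometry: metabolite id -> coefficient (negative = consumed)
    charges:       metabolite id -> integer charge
    """
    return sum(coeff * charges[met] for met, coeff in stoichiometry.items())


# Illustrative input: ATP hydrolysis, atp + h2o --> adp + pi + h
atp_hydrolysis = {"atp_c": -1, "h2o_c": -1, "adp_c": 1, "pi_c": 1, "h_c": 1}
charges = {"atp_c": -4, "h2o_c": 0, "adp_c": -3, "pi_c": -2, "h_c": 1}

balanced = charge_imbalance(atp_hydrolysis, charges)

# Leaving out the proton is the typical "proton imbalance" case.
without_proton = {m: c for m, c in atp_hydrolysis.items() if m != "h_c"}
unbalanced = charge_imbalance(without_proton, charges)

print(balanced, unbalanced)  # 0 -1
```

A missing or surplus proton therefore shows up as a charge imbalance of exactly one unit, which is why `h_c` is the first metabolite to inspect in most unbalanced reactions.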

## Imports & Paths

In [1]:
import os
import csv
from collections import Counter
from cobra.io import read_sbml_model
from cobra.manipulation.validate import check_mass_balance
import requests
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed
import ast
import matplotlib.pyplot as plt

In [2]:
models_path = "Models/mass_balance/"

In [135]:
# import the models after mass balancing with Frowin's scripts
models = {}
for model_name in (f for f in os.listdir(models_path) if f.endswith(".xml")):
    model = read_sbml_model(os.path.join(models_path, model_name))
    model.solver = "cplex"
    models[model_name[:3]] = model

# os.listdir() returns the files in arbitrary order, so sort the dictionary alphabetically (AA1...AA7)
models = {key: models[key] for key in sorted(models)}
AA1, AA2, AA3, AA4, AA5, AA6, AA7 = [models[f"AA{i}"] for i in range(1, 8)]
# model_list = ["AA{i}" for i in range(1, 8)]

## Functions

In [4]:
def get_objective_value(model):
    print(f"value of objective for {model} is {model.optimize().objective_value}")

In [5]:
# checks the charge balance for every reaction in a model (if there are mass unbalanced reactions, these will also show up in the results)
def check_charge_balance(model):
    unbalanced_reactions = check_mass_balance(model)
    print("There are {0} charge unbalanced reactions in {1}".format(len(unbalanced_reactions), model) )
    return unbalanced_reactions

We use the check_mass_balance() function from cobra to check for charge (!) unbalanced reactions.
The function reports both mass and charge unbalanced reactions. However, the mass unbalanced reactions were already eliminated in the previous steps (with Frowin's scripts); only AA3 still has 2 mass unbalanced reactions (these are also charge unbalanced, so they may need more attention in general).
So although this function checks both, the results here are, apart from those two reactions, only charge unbalanced reactions.
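
Because the function mixes both kinds of imbalance, it can help to separate them explicitly. The following is a minimal sketch with a hypothetical helper `split_imbalances`; it assumes the imbalances are available as a plain dict keyed by reaction ID (the cobra function keys its result by `Reaction` objects, so `reaction.id` would have to be used first). The sample values are copied from outputs shown in this notebook.

```python
# Hypothetical helper: separate pure charge imbalances from element (mass) imbalances.
def split_imbalances(imbalances):
    charge_only, mass_related = {}, {}
    for rxn_id, imbalance in imbalances.items():
        # every key except "charge" is a chemical element
        elements = {k: v for k, v in imbalance.items() if k != "charge"}
        if elements:
            mass_related[rxn_id] = imbalance
        else:
            charge_only[rxn_id] = imbalance
    return charge_only, mass_related


# Sample values copied from outputs in this notebook
sample = {
    "1P2CBXLCYCL": {"charge": 1.0},
    "SORD_D": {"charge": -2.0, "H": -2.0},
    "RBFSb": {"H": 2.0},
}
charge_only, mass_related = split_imbalances(sample)
print(sorted(charge_only))   # ['1P2CBXLCYCL']
print(sorted(mass_related))  # ['RBFSb', 'SORD_D']
```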

In [6]:
# returns a pandas dataframe with metabolite info for a specific cobra model that includes: bigg_id, model_id, formula and charge
# NOTE: bigg_id could be wrong (i.e. not the real id on the website) because it only takes the model_id and removes the _compartment
def extract_met_info_model(model):
    met_infos = []

    for met in model.metabolites:
        met_infos.append({
            "bigg_id": met.id.rsplit("_", 1)[0],  # strip compartment so that it matches the actual BIGG ID that also doesn't have compartments (e.g., glc__D_c to glc__D)
            "model_id": met.id,
            "model_formula": met.formula,
            "model_charge": met.charge
        })

    met_infos = pd.DataFrame(met_infos)
    return met_infos
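
The caveat in the NOTE above can be made concrete. `rsplit("_", 1)` removes whatever follows the last underscore, even if it is not a compartment. The sketch below shows a slightly more defensive variant; the helper `strip_compartment` and the compartment set are assumptions based on the suffixes that appear in this notebook (`_c`, `_p`, `_e`).

```python
# Assumption: these are the only compartment suffixes used in the models.
KNOWN_COMPARTMENTS = {"c", "p", "e"}


def strip_compartment(met_id, compartments=KNOWN_COMPARTMENTS):
    """Remove a trailing '_<compartment>' only if the suffix is a known compartment."""
    base, sep, suffix = met_id.rpartition("_")
    if sep and suffix in compartments:
        return base
    return met_id  # no recognised compartment suffix: leave the ID untouched


print(strip_compartment("glc__D_c"))    # glc__D
print(strip_compartment("12dgr120_p"))  # 12dgr120
print(strip_compartment("glc__D"))      # glc__D
print("glc__D".rsplit("_", 1)[0])       # glc_  (what the plain rsplit would return)
```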

In [93]:
# returns a pandas dataframe with metabolite info from the model and from BiGG and compares formula and charge state
# NOTE: relies on the global df_bigg, which is created in the BIGG section below
def compare_bigg_modelMets(model_mets, list_unbalanced_mets):
    # merge on BiGG ID (the way compartments are stripped can be tuned if needed)
    merged = model_mets.merge(df_bigg, on="bigg_id", how="left")

    merged["charge_match"] = merged.apply(
        lambda row: row["model_charge"] in row["charges"] if isinstance(row["charges"], list) else False,
        axis=1
    )

    merged["formula_match"] = merged.apply(
        lambda row: row["model_formula"] in row["formulas"] if isinstance(row["formulas"], (list, set)) else False, axis=1
    )

    # adds another column that flags whether the metabolite is part of an unbalanced reaction (False = not part of any, True = part of at least one)
    merged['unbalanced'] = merged['model_id'].isin(list_unbalanced_mets)
    # merged['unbalanced'] = merged['model_id'].isin(list_unbalanced_mets).astype(int)  # (1/0 instead of True/False)

    return merged
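
One consequence of the match logic above is that a metabolite without any BiGG charge (an empty list, as for `1btol_c` in the mismatch table further down) is flagged `False`, just like a real disagreement. The sketch below shows one possible way to keep the two situations apart; `value_matches` is a hypothetical helper, not part of the notebook's functions.

```python
import math


def value_matches(model_value, bigg_values):
    """Return True/False for a comparison, or None when BiGG offers nothing to compare."""
    if not isinstance(bigg_values, (list, set, tuple)) or len(bigg_values) == 0:
        return None  # NaN from the left merge, or an empty BiGG entry
    return model_value in bigg_values


print(value_matches(0, [-1]))      # False: real mismatch
print(value_matches(-2, [-2]))     # True
print(value_matches(0, []))        # None: BiGG lists no charge
print(value_matches(0, math.nan))  # None: metabolite not found on BiGG
```

With a three-valued result, the "missing on BiGG" rows could be counted separately instead of inflating the number of mismatches.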

In [73]:
# filters the merged df to only show the rows (aka metabolites) where model info and bigg info do NOT match
def get_mismatches_after_merge(df_merge):
    mismatches = df_merge.loc[(df_merge['formula_match'] == False) | (df_merge['charge_match'] == False)]
    mismatches = mismatches[["model_id", "bigg_id", "model_charge", "charges", "model_formula", "formulas", "charge_match", "formula_match", "unbalanced"]]

    return mismatches

In [72]:
# returns confusion matrix showing how many mismatching info about charge state and/or formula there is
# this function either takes one df from one model as an input or the merged dict where all models are saved
def get_confmat_charge_formula(df_merge):
    if isinstance(df_merge, pd.DataFrame):
        conf_matrix = df_merge.groupby(["charge_match", "formula_match"]).size().reset_index(name='count')
        print(conf_matrix)
    elif isinstance(df_merge, dict):
        conf_matrix = {
            "charge_match": ["False", "False", "True", "True"],
            "forumla_match": ["False", "True", "False", "True"],
        }
        conf_matrix = pd.DataFrame(conf_matrix)
        for i, item in enumerate(df_merge.values()):
            conf_matrix_model = item.groupby(["charge_match", "formula_match"]).size().reset_index(name='count')
            name = f"AA{i+1}"
            conf_matrix.insert(i+2, name, conf_matrix_model["count"])
        print(conf_matrix)
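
The dict branch above lines up the per-model counts by row position, which only works if every model produces all four True/False combinations. A minimal sketch of a count that fills missing combinations with 0 is shown below; `count_match_combinations` is a hypothetical helper and the input rows are toy data.

```python
from collections import Counter
from itertools import product


def count_match_combinations(rows):
    """rows: iterable of (charge_match, formula_match) boolean pairs."""
    counts = Counter(rows)
    # a Counter returns 0 for missing keys, so every combination gets a value
    return {combo: counts[combo] for combo in product([False, True], repeat=2)}


# Toy input in which the (True, False) combination never occurs
rows = [(False, False), (True, True), (True, True), (False, True)]
table = count_match_combinations(rows)
for combo, n in table.items():
    print(combo, n)
```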

## Evaluate current state of models regarding charge balance

In [231]:
met_durch = 0
react_durch = 0
for model in models.values():
    #print(model)
    print("metabolites", len(model.metabolites))
    met_durch +=  len(model.metabolites)
    react_durch += len(model.reactions)
    print("reactions", len(model.reactions))
print(met_durch/7)
print(react_durch/7)

metabolites 1532
reactions 2307
metabolites 1867
reactions 2810
metabolites 1508
reactions 2263
metabolites 1766
reactions 2812
metabolites 1442
reactions 2141
metabolites 1802
reactions 2748
metabolites 1676
reactions 2565
1656.142857142857
2520.8571428571427


In [8]:
# check the flux through the objective; the values are not feasible in vivo, which also shows that the models are not working correctly yet
for model in models.values():
    get_objective_value(model)

value of objective for AA1 is 66.94193425569041
value of objective for AA2 is 58.481001782528594
value of objective for AA3 is 43.436128611698244
value of objective for AA4 is 102.68616306092056
value of objective for AA5 is 43.93218694400248
value of objective for AA6 is 65.18134873230218
value of objective for AA7 is 50.884800170810124


In [120]:
# check reactions that may need curation because they are demands/sinks; are these supposed to exist?
print("demands", AA1.demands)
print("sinks", AA1.sinks)
# print("exchanges", AA1.exchanges)

demands []
sinks [<Reaction sink_2ohph_c at 0x7cf0d019aa40>, <Reaction sink_hemeO_c at 0x7cf0d01244c0>, <Reaction sink_mobd_c at 0x7cf0d019a410>]


In [15]:
AA1.optimize().fluxes["sink_2ohph_c"]

np.float64(-0.0)

### get all charge unbalanced reactions for all models

In [16]:
# dictionary to store unbalanced reactions
unbalanced_reactions_dict = {}

# models is the dict that stores all models that were imported with read_sbml_model()
for name, model in models.items():
    unbalanced_reactions = check_charge_balance(model)
    unbalanced_reactions_dict[name] = unbalanced_reactions

# these numbers are in accordance with the numbers for "charge balance" reactions in the memote report

There are 445 charge unbalanced reactions in AA1
There are 473 charge unbalanced reactions in AA2
There are 372 charge unbalanced reactions in AA3
There are 410 charge unbalanced reactions in AA4
There are 360 charge unbalanced reactions in AA5
There are 451 charge unbalanced reactions in AA6
There are 452 charge unbalanced reactions in AA7


In [17]:
# We know how many unbalanced reactions each model has on their own but what is the overlap?
unique_reactions = set()

# Loop through all models and collect reaction IDs
for model_name, unbalanced_reactions in unbalanced_reactions_dict.items():
    # add the reaction IDs to the set (like a mathematical set, a Python set only holds unique elements)
    unique_reactions.update(reaction.id for reaction in unbalanced_reactions.keys())

# this is a list of all reaction IDs that are charge unbalanced in at least one model
unique_reaction_ids = list(unique_reactions)

print("There are {0} charge unbalanced reactions throughout all models.".format(len(unique_reaction_ids)))
# print(unique_reaction_ids)


There are 808 charge unbalanced reactions throughout all models.
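
The number above is the union over all models. To answer the overlap question more directly, the intersection and a per-reaction model count can be computed in the same way. The sketch below uses toy sets; the assignment of reaction IDs to models is invented for illustration and does not reflect the real models.

```python
from collections import Counter

# Toy stand-in for unbalanced_reactions_dict: model name -> set of reaction IDs
unbalanced_ids = {
    "AA1": {"DAAD7", "RBFSb", "FMNRx"},
    "AA2": {"DAAD7", "FMNRx"},
    "AA3": {"DAAD7", "PROR"},
}

# unbalanced in at least one model (what the cell above counts)
union = set().union(*unbalanced_ids.values())
# unbalanced in every model
shared_by_all = set.intersection(*unbalanced_ids.values())
# in how many models is each reaction unbalanced?
models_per_reaction = Counter(r for ids in unbalanced_ids.values() for r in ids)

print(len(union))                          # 4
print(sorted(shared_by_all))               # ['DAAD7']
print(models_per_reaction.most_common(1))  # [('DAAD7', 3)]
```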


In [69]:
# get reactions for specific model
reaction_names_aa1 = unbalanced_reactions_dict['AA1'].keys()
reaction_names_list = [reaction.id for reaction in reaction_names_aa1]
print(reaction_names_list)

['1P2CBXLCYCL', '1P2CBXLR', '23CTI1', '23CTI2', '23DK5MPPISO', '24DECOAR', '2AACLPGT160', '2AACLPGT181', '2AACLPPEAT160', '2AACLPPEAT181', '2ACLMM', '2OH3K5MPPISO', '3HAACOAT140', '3HAD40', '3OAR40', '4CMLCL_kt', 'A6PAG', 'AACPS1', 'AACPS3', 'AACPS5', 'AADa', 'AADb', 'AALDH', 'ACACT5r_1', 'ACACT6r_1', 'ACHBS', 'ACLS_a', 'ACLS_d', 'ACM6PH', 'ACMANApts', 'ACOAD10f', 'ACOAD11f', 'ACOAD12f', 'ACOAD13f', 'ACOAD14f', 'ACOAD15f', 'ACOAD16f', 'ACOAD17f', 'ACOAD18f', 'ACOAD19f', 'ACOAD20f', 'ACOAD21f', 'ACOAD23f', 'ACOAD25f', 'ACOAD26f', 'ACOAD29f', 'ACOAD3', 'ACOAD30f', 'ACOAD31f', 'ACOAD34f', 'ACOAD3f', 'ACOAD4_1', 'ACOAD6', 'ACOAD6f', 'ACP1_FMN', 'ACPPAT140', 'ACPPAT160', 'ACPPAT181', 'ACSERHS', 'ACSPHAC100', 'ACSPHAC101', 'ACSPHAC120', 'ACSPHAC121', 'ACSPHAC121d6', 'ACSPHAC140', 'ACSPHAC141', 'ACSPHAC141d5', 'ACSPHAC142', 'ACSPHAC160', 'ACSPHAC40', 'ACSPHAC50', 'ACSPHAC60', 'ACSPHAC70', 'ACSPHAC80', 'ACSPHAC90', 'ACSPHACP100', 'ACSPHACP40', 'ACSPHACP50', 'ACSPHACP60', 'ACSPHACP70', 'ACSPHAC

In [18]:
# we know the unbalanced reactions, but which metabolites are part of these?
# go through all unbalanced (unique) reactions and get all participating metabolites
metabolite_counter_compartment = Counter()
metabolite_counter_name = Counter()
seen_reactions = set()  # Track reactions that were already counted

for model in models.values():
    for rxn_id in unique_reaction_ids:
        if rxn_id in model.reactions and rxn_id not in seen_reactions:
            reaction = model.reactions.get_by_id(rxn_id)
            for metabolite in reaction.metabolites:
                metabolite_counter_compartment[metabolite.id] += 1  # this is compartment specific, e.g. h2o_c and h2o_p are different metabolites
                metabolite_counter_name[metabolite.name] += 1  # counted by name, so h2o is counted once regardless of the compartment
            seen_reactions.add(rxn_id)  # Mark this reaction as counted


In [None]:
# Write to CSV: metabolite ID and how many times was this metabolite part of an unbalanced reaction
with open("metabolite_counts.csv", mode="w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["metabolite_id (compartment specific)", "count"])
    for met_name, count in metabolite_counter_compartment.items():
        writer.writerow([met_name, count])

In [19]:
# number of unique metabolites that are part of unbalanced reactions

# compartment specific, e.g. h2o_c and h2o_p are counted separately
print(len(metabolite_counter_compartment))
# counted by name, so h2o only exists once
print(len(metabolite_counter_name))

990
880


In [32]:
# get the number of metabolites that are part of only one unbalanced reaction and check which metabolites are part of the most reactions

filtered = {m: v for m, v in metabolite_counter_compartment.items() if v == 1}
print(len(filtered))

print(metabolite_counter_compartment.most_common(5))


434
[('h_c', 417), ('h2o_c', 268), ('atp_c', 138), ('coa_c', 125), ('ppi_c', 103)]


In [33]:
%matplotlib notebook

# Step 1: Count how many keys have each count (i.e. histogram of values)
count_distribution = Counter(metabolite_counter_compartment.values())

# Step 2: Plot
plt.bar(count_distribution.keys(), count_distribution.values())
plt.xlabel('Amount of Reactions a Metabolite is Part of')
plt.ylabel('Number of Metabolites with that count')
plt.title('Distribution of Metabolite occurrences in unbalanced reactions')
plt.show()


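[Figure: bar chart "Distribution of Metabolite occurrences in unbalanced reactions"; x-axis: Amount of Reactions a Metabolite is Part of; y-axis: Number of Metabolites with that count]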

In [13]:
# the numbers show how often a metabolite is part of an unbalanced reaction
metabolite_counter_compartment

Counter({'h_c': 417,
         'h2o_c': 268,
         'atp_c': 138,
         'coa_c': 125,
         'ppi_c': 103,
         'pi_c': 87,
         'amp_c': 85,
         'adp_c': 62,
         'h2o_p': 51,
         'co2_c': 50,
         'nadh_c': 48,
         'nad_c': 46,
         'h_p': 44,
         'pyr_c': 39,
         'nadph_c': 37,
         'nadp_c': 34,
         'fad_c': 34,
         'fadh2_c': 34,
         'ACP_c': 31,
         'o2_c': 28,
         'cmp_c': 26,
         'g3p_c': 21,
         'glyc3p_c': 19,
         'fe2_c': 19,
         'glu__L_c': 19,
         'nh4_c': 17,
         'pep_c': 17,
         'pi_p': 14,
         'r5p_c': 13,
         'accoa_c': 13,
         'f6p_c': 12,
         'gly_c': 12,
         'fmn_c': 11,
         'ser__L_c': 10,
         'ctp_c': 10,
         'akg_c': 10,
         '2dr1p_c': 10,
         'dhap_c': 9,
         'pppi_c': 8,
         'gtp_c': 8,
         'asp__L_c': 8,
         'uacgam_c': 8,
         'thmpp_c': 8,
         'ala__L_c': 8,
         

In [None]:
# all metabolites from unbalanced reactions
# this list is very important later on for some of the functions
unbalanced_mets = list(metabolite_counter_compartment.keys())

## BIGG

In [35]:
### Code by ChatGPT ###
# downloads the information for all BiGG metabolites (BiGG ID, name, formulae and charges) and saves it to a csv file
# there are 9088 metabolites (this is the right number, also according to the BiGG website)
# only needs to be executed ONCE to create the csv; in the next step we read that csv again and turn it into a df

# Get list of all universal metabolites
base_url = "http://bigg.ucsd.edu/api/v2/"
list_url = base_url + "universal/metabolites"
response = requests.get(list_url)

# check if request is going through
if response.status_code != 200:
    raise Exception("Failed to fetch metabolite list")

metabolites = response.json()["results"]
print(f"Found {len(metabolites)} metabolites. Fetching details...")

# because this whole cell is ChatGPT code, I did not extract the function and move it to the Functions section
# function that fetches the information for one metabolite, i.e. BiGG ID, name, formulas and charges
def fetch_metabolite_details(met):
    bigg_id = met.get("bigg_id", "")
    name = met.get("name", "")
    url = f"{base_url}universal/metabolites/{bigg_id}"
    try:
        r = requests.get(url, timeout=10)
        if r.status_code == 200:
            data = r.json()  # converts JSON response to a dictionary (data)
            formulae = data.get("formulae", [])  # if formula not available, use empty list []
            charges = data.get("charges", [])

            # Safe quoting for CSV
            safe_name = str(name).replace('"', "'")
            name = f'"{safe_name}"'

            return {
                "bigg_id": bigg_id,
                "name": name,
                "formulas": str(formulae),
                "charges": str(charges)
            }
    except Exception as e:
        print(f"Error with {bigg_id}: {e}")

    safe_name = str(name).replace('"', "'")
    name = f'"{safe_name}"'
    return {
        "bigg_id": bigg_id,
        "name": name,
        "formulas": "[]",
        "charges": "[]"
    }


# Use ThreadPoolExecutor to parallelize requests
results = []
with ThreadPoolExecutor(max_workers=25) as executor:
    futures = [executor.submit(fetch_metabolite_details, met) for met in metabolites]
    for i, future in enumerate(as_completed(futures)):
        results.append(future.result())
        if i % 500 == 0:
            print(f"{i}/{len(metabolites)} done...")

# Save to CSV
df = pd.DataFrame(results)
df.to_csv("bigg_metabolites_complete.csv", index=False, quoting=csv.QUOTE_MINIMAL)
print("Saved to bigg_metabolites_complete.csv")


Found 9088 metabolites. Fetching details...
0/9088 done...
500/9088 done...
1000/9088 done...
1500/9088 done...
2000/9088 done...
2500/9088 done...
3000/9088 done...
3500/9088 done...
4000/9088 done...
4500/9088 done...
5000/9088 done...
5500/9088 done...
6000/9088 done...
6500/9088 done...
7000/9088 done...
7500/9088 done...
8000/9088 done...
8500/9088 done...
9000/9088 done...
Saved to bigg_metabolites_complete.csv


In [36]:
# Read previously created CSV with all BIGG metabolites
df_bigg = pd.read_csv("bigg_metabolites_complete.csv", quotechar='"')

# Convert stringified lists back to real lists
df_bigg["formulas"] = df_bigg["formulas"].apply(ast.literal_eval)
df_bigg["charges"] = df_bigg["charges"].apply(ast.literal_eval)
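
The reason for the `ast.literal_eval` step is that a CSV file can only hold text, so the lists written with `str(...)` come back as strings such as `"[-2]"`. The round trip can be sketched with the standard library alone (an in-memory buffer stands in for the file; the two rows use values shown in the BiGG table of this notebook):

```python
import ast
import csv
import io

rows = [
    {"bigg_id": "10fthf", "formulas": str(["C20H21N7O7"]), "charges": str([-2])},
    {"bigg_id": "xylb", "formulas": str(["C10H18O9"]), "charges": str([])},
]

# write the rows to an in-memory CSV "file"
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["bigg_id", "formulas", "charges"])
writer.writeheader()
writer.writerows(rows)

# read them back: every field is a string until literal_eval parses it
buffer.seek(0)
restored = []
for row in csv.DictReader(buffer):
    raw_charges = row["charges"]
    row["formulas"] = ast.literal_eval(row["formulas"])
    row["charges"] = ast.literal_eval(raw_charges)
    restored.append(row)

print(type(raw_charges).__name__)                        # str
print(restored[0]["charges"], restored[1]["charges"])    # [-2] []
```

`ast.literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for data that was downloaded from a web API.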


In [37]:
df_bigg

Unnamed: 0,bigg_id,name,formulas,charges
0,10fthf6glu,"""10-formyltetrahydrofolate-[Glu](6)""",[C45H51N12O22],[-7]
1,10fthf,"""10-Formyltetrahydrofolate""",[C20H21N7O7],[-2]
2,10fthfglu__L,"""10-Formyltetrahydrofolyl L-glutamate""",[C25H28N8O10],[]
3,10fthf5glu,"""10-formyltetrahydrofolate-[Glu](5)""",[C40H45N11O19],[-6]
4,10m3ouACP,"""10-methyl-3-oxo-undecanoyl-ACP""",[C23H41N2O9PRS],[0]
...,...,...,...,...
9083,zymstest_SC,"""Zymosterol ester yeast specific C1694H2993O101""",[C1694H2993O101],[0]
9084,xylu__L,"""L-Xylulose""",[C5H10O5],[0]
9085,zymst,"""Zymosterol C27H44O""",[C27H44O],[0]
9086,zymstnl,"""5alpha-cholest-8-en-3beta-ol""",[C27H46O],[0]


In [152]:
# get metabolite info for all 7 models (i.e. formula and charge state) and save the 7 df's in a dict
model_mets = {f"AA{i}_mets": extract_met_info_model(models[f"AA{i}"]) for i in range(1, 8)}

In [213]:
model_mets["AA1_mets"]

Unnamed: 0,bigg_id,model_id,model_formula,model_charge
0,10fthf,10fthf_c,C20H21N7O7,-2
1,12dgr120,12dgr120_c,C27H52O5,0
2,12dgr120,12dgr120_p,C27H52O5,0
3,12dgr140,12dgr140_c,C31H60O5,0
4,12dgr140,12dgr140_p,C31H60O5,0
...,...,...,...,...
1527,xylb,xylb_e,C10H18O9,0
1528,xylu__D,xylu__D_c,C5H10O5,0
1529,zn2,zn2_c,Zn,2
1530,zn2,zn2_e,Zn,2


In [153]:
# merge the metabolite info from the models with the BiGG info; creates 2 columns that show whether charge/formula match between model and BiGG
# all 7 df's are saved in the dict model_merged; for easier access, the individual df's are also bound to names (e.g. AA1_merged), which refer to the same objects as the dict entries
model_merged = {f"AA{i}_merged": compare_bigg_modelMets(model_mets[f"AA{i}_mets"], unbalanced_mets) for i in range(1, 8)}
AA1_merged, AA2_merged, AA3_merged, AA4_merged, AA5_merged, AA6_merged, AA7_merged = [model_merged[f"AA{i}_merged"] for i in range(1, 8)]

In [154]:
AA1_merged

Unnamed: 0,bigg_id,model_id,model_formula,model_charge,name,formulas,charges,charge_match,formula_match,unbalanced
0,10fthf,10fthf_c,C20H21N7O7,-2,"""10-Formyltetrahydrofolate""",[C20H21N7O7],[-2],True,True,True
1,12dgr120,12dgr120_c,C27H52O5,0,"""1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)""",[C27H52O5],[0],True,True,True
2,12dgr120,12dgr120_p,C27H52O5,0,"""1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)""",[C27H52O5],[0],True,True,True
3,12dgr140,12dgr140_c,C31H60O5,0,"""1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C1...",[C31H60O5],[0],True,True,True
4,12dgr140,12dgr140_p,C31H60O5,0,"""1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C1...",[C31H60O5],[0],True,True,True
...,...,...,...,...,...,...,...,...,...,...
1527,xylb,xylb_e,C10H18O9,0,"""Xylobiose""",[C10H18O9],[],False,True,False
1528,xylu__D,xylu__D_c,C5H10O5,0,"""D-Xylulose""",[C5H10O5],[0],True,True,False
1529,zn2,zn2_c,Zn,2,"""Zinc""",[Zn],[2],True,True,False
1530,zn2,zn2_e,Zn,2,"""Zinc""",[Zn],[2],True,True,False


In [155]:
# extracts just the rows where charge and/or formula dont match with bigg info and saves them into a dict with the seven df's
model_mismatch = {f"AA{i}_mismatch": get_mismatches_after_merge(model_merged[f"AA{i}_merged"]) for i in range(1, 8)}

In [156]:
model_mismatch["AA1_mismatch"]

Unnamed: 0,model_id,bigg_id,model_charge,charges,model_formula,formulas,charge_match,formula_match,unbalanced
28,1btol_c,1btol,0,[],C4H10O,[C4H10O],False,True,False
32,1p2cbxl_c,1p2cbxl,0,[-1],C5H6NO2,[C5H6NO2],False,True,True
35,23ddhb_c,23ddhb,0,[-1],C7H7O4,[C7H7O4],False,True,True
46,2agpe160_c,2agpe160,0,[0],C21H44NO7P,[C21H44NO7P1],True,False,True
47,2agpe160_p,2agpe160,0,[0],C21H44NO7P,[C21H44NO7P1],True,False,False
...,...,...,...,...,...,...,...,...,...
1507,val__D_p,val__D,0,[0],C5H9NO2,[C5H11NO2],True,False,False
1522,xyl3_c,xyl3,0,[],C15H26O13,[C15H26O13],False,True,False
1523,xyl3_e,xyl3,0,[],C15H26O13,[C15H26O13],False,True,False
1526,xylb_c,xylb,0,[],C10H18O9,[C10H18O9],False,True,False


In [157]:
# confusion matrix to show for how many metabolites there are differences in the infos between current model and bigg
get_confmat_charge_formula(AA1_merged)

   charge_match  formula_match  count
0         False          False    178
1         False           True    198
2          True          False    103
3          True           True   1053


In [158]:
# function also takes dict with all the 7 df's and creates one big confusion matrix
get_confmat_charge_formula(model_merged)

  charge_match forumla_match   AA1   AA2   AA3   AA4   AA5   AA6   AA7
0        False         False   178   233   149    96   133   195   189
1        False          True   198   258   235   181   176   238   230
2         True         False   103   103    96   112    95   103    74
3         True          True  1053  1273  1028  1377  1038  1266  1183


In [None]:
# to-do
# in general: look at duplicate metabolites
# check the BiGG info only against the unbalanced reactions
# actually apply the BiGG info / overwrite the model info and run memote

In [None]:
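# if you later actually want to make changes to the model, e.g. try out other charges, it is best to do so inside
# with model:
#   ...
# this way the model is not overwritten and you first have a work-in-progress version to test whether the changes are actually good
# NOTE: as far as I can tell, the context manager only reverts tracked changes (e.g. bounds, objective, added reactions);
# plain attributes such as met.charge are probably not reset, so working on a copy of the model is the safer choice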

## Combine Bigg Mismatches and unbalanced reactions

In [98]:
# of all 1532 metabolites in AA1, 816 (53%) are not part of any unbalanced reaction and 716 are part of at least one unbalanced reaction
AA1_merged['unbalanced'].value_counts()

unbalanced
False    816
True     716
Name: count, dtype: int64

In [99]:
# looking only at metabolites where BiGG info and model info do NOT match, 319 of 479 metabolites (66.6%) are part of unbalanced reactions and only 33.4% are not
model_mismatch["AA1_mismatch"]['unbalanced'].value_counts()

unbalanced
True     319
False    160
Name: count, dtype: int64
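
The percentages quoted in the comments can be reproduced from the two `value_counts()` outputs above. The short calculation below also adds the complementary group (metabolites whose info matches BiGG), obtained by subtraction; `share` is a hypothetical helper.

```python
def share(part, total):
    """Percentage of part in total, rounded to one decimal."""
    return round(100 * part / total, 1)


# counts taken from the two value_counts() outputs for AA1
all_unbalanced, all_total = 716, 816 + 716
mismatch_unbalanced, mismatch_total = 319, 319 + 160

# complementary group: metabolites whose charge and formula both match BiGG
match_unbalanced = all_unbalanced - mismatch_unbalanced
match_total = all_total - mismatch_total

print(share(all_unbalanced, all_total))            # 46.7
print(share(mismatch_unbalanced, mismatch_total))  # 66.6
print(share(match_unbalanced, match_total))        # 37.7
```

So metabolites that disagree with BiGG occur in unbalanced reactions noticeably more often (66.6%) than metabolites that agree with it (37.7%).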

In [100]:
combo_counts = AA1_merged.groupby(['unbalanced', 'charge_match', 'formula_match']).size().reset_index(name='count')
print(combo_counts)
# "False True True" is the optimal case, i.e. the metabolite info matches BiGG and the metabolite is not part of any unbalanced reaction

   unbalanced  charge_match  formula_match  count
0       False         False          False     59
1       False         False           True     56
2       False          True          False     45
3       False          True           True    656
4        True         False          False    119
5        True         False           True    142
6        True          True          False     58
7        True          True           True    397


In [91]:
AA6_merged['unbalanced'].value_counts()

unbalanced
0    1026
1     776
Name: count, dtype: int64

In [92]:
model_mismatch["AA6_mismatch"]['unbalanced'].value_counts()

unbalanced
1    343
0    193
Name: count, dtype: int64

In [101]:
combo_counts = AA6_merged.groupby(['unbalanced', 'charge_match', 'formula_match']).size().reset_index(name='count')
print(combo_counts)

   unbalanced  charge_match  formula_match  count
0       False         False          False     83
1       False         False           True     63
2       False          True          False     47
3       False          True           True    833
4        True         False          False    112
5        True         False           True    175
6        True          True          False     56
7        True          True           True    433


## Overwrite model with BIGG information
We want to test whether the information on BiGG helps to reduce the large number of charge unbalanced reactions in our models.
That means: for metabolites where the BiGG information differs from our model information, we can try to overwrite the model information with the BiGG information.

There are multiple cases that we can try out: \
(i) only overwrite info for metabolites that are part of unbalanced reactions \
(ii) use the BiGG info throughout

Some metabolites have multiple possible charge states and formulas on BiGG (so at the moment such a metabolite may be flagged as a match, although another charge state would work better). \
(i.2) and (ii.2) change charge states when there are multiple
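
The cases can be expressed as one small decision rule per metabolite. The sketch below is only an illustration under the assumption that a BiGG value is used when it is unambiguous; `choose_charge` and its parameters are hypothetical names, and the handling of multiple charge states (cases i.2 and ii.2) is deliberately left as "keep the model value".

```python
def choose_charge(model_charge, bigg_charges, in_unbalanced, only_unbalanced=True):
    """Return the charge to use for one metabolite.

    only_unbalanced=True  -> case (i): touch only metabolites in unbalanced reactions
    only_unbalanced=False -> case (ii): use BiGG wherever it is unambiguous
    """
    if only_unbalanced and not in_unbalanced:
        return model_charge
    if len(bigg_charges) == 1:
        return bigg_charges[0]
    # no BiGG charge, or several candidates (cases i.2 / ii.2 need a separate choice)
    return model_charge


print(choose_charge(0, [-1], in_unbalanced=True))                          # -1
print(choose_charge(0, [-1], in_unbalanced=False))                         # 0
print(choose_charge(0, [-1], in_unbalanced=False, only_unbalanced=False))  # -1
print(choose_charge(0, [-1, -2], in_unbalanced=True))                      # 0
```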

In [112]:
AA1_merged

Unnamed: 0,bigg_id,model_id,model_formula,model_charge,name,formulas,charges,charge_match,formula_match,unbalanced
0,10fthf,10fthf_c,C20H21N7O7,-2,"""10-Formyltetrahydrofolate""",[C20H21N7O7],[-2],True,True,True
1,12dgr120,12dgr120_c,C27H52O5,0,"""1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)""",[C27H52O5],[0],True,True,True
2,12dgr120,12dgr120_p,C27H52O5,0,"""1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)""",[C27H52O5],[0],True,True,True
3,12dgr140,12dgr140_c,C31H60O5,0,"""1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C1...",[C31H60O5],[0],True,True,True
4,12dgr140,12dgr140_p,C31H60O5,0,"""1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C1...",[C31H60O5],[0],True,True,True
...,...,...,...,...,...,...,...,...,...,...
1527,xylb,xylb_e,C10H18O9,0,"""Xylobiose""",[C10H18O9],[],False,True,False
1528,xylu__D,xylu__D_c,C5H10O5,0,"""D-Xylulose""",[C5H10O5],[0],True,True,False
1529,zn2,zn2_c,Zn,2,"""Zinc""",[Zn],[2],True,True,False
1530,zn2,zn2_e,Zn,2,"""Zinc""",[Zn],[2],True,True,False


In [113]:
AA1.metabolites.get_by_id("1p2cbxl_c")

0,1
Metabolite identifier,1p2cbxl_c
Name,1-Pyrroline-2-carboxylate
Memory address,0x7492ff124280
Formula,C5H6NO2
Compartment,C_c
In 3 reaction(s),"1P2CBXLR, DAAD7, 1P2CBXLCYCL"


In [114]:
reaction = AA1.reactions.get_by_id("DAAD7")
print(reaction)
charges = {met.id: met.charge for met in reaction.metabolites}
for met, charge in charges.items():
    print(f"{met}: {charge}")


DAAD7: fad_c + pro__D_c --> 1p2cbxl_c + fadh2_c
fad_c: 0
pro__D_c: 0
1p2cbxl_c: 0
fadh2_c: 0


In [118]:
AA1_merged["charges"][37][0]

-1

In [121]:
if not AA1_merged["charges"][27]:
    print("empty")

In [168]:
if len(AA1_merged["charges"][37]) > 1:
    print("multiple")

multiple


In [134]:
import copy

AA1_copy = copy.deepcopy(AA1)

In [211]:
# manual changes
AA1_copy.metabolites.get_by_id("fad_c").charge = -2
# ribflv_c

In [222]:
counter_mulcharges = 0  # metabolites with more than one charge state on BiGG
for _, row in AA1_merged.iterrows():
    met = AA1_copy.metabolites.get_by_id(row["model_id"])
    # overwrite the charge only if BiGG lists exactly one charge state; otherwise keep the model value
    if len(row["charges"]) == 1:
        met.charge = int(row["charges"][0])
    elif len(row["charges"]) > 1:
        counter_mulcharges += 1
    # same rule for the formula
    if len(row["formulas"]) == 1:
        met.formula = row["formulas"][0]
check_charge_balance(AA1_copy)
get_objective_value(AA1_copy)

There are 158 charge unbalanced reactions in AA1
value of objective for AA1 is 66.94193425569044


In [223]:
counter_mulcharges

153

In [213]:
from cobra.io import write_sbml_model

write_sbml_model(AA1_copy, 'AA1_bigg.xml')

In [170]:
check_charge_balance(AA1)

There are 445 charge unbalanced reactions in AA1


{<Reaction 1P2CBXLCYCL at 0x74918d0510f0>: {'charge': 1.0},
 <Reaction 1P2CBXLR at 0x74918d051ba0>: {'charge': -1.0},
 <Reaction 23CTI1 at 0x74918d051de0>: {'charge': -3.0},
 <Reaction 23CTI2 at 0x74918d0511e0>: {'charge': -4.0},
 <Reaction 23DK5MPPISO at 0x74918d052170>: {'charge': 2.0},
 <Reaction 24DECOAR at 0x74918d052260>: {'charge': -4.0},
 <Reaction 2AACLPGT160 at 0x74918d0521d0>: {'charge': 2.0},
 <Reaction 2AACLPGT181 at 0x74918d052530>: {'charge': 1.0},
 <Reaction 2AACLPPEAT160 at 0x74918d0527a0>: {'charge': 1.0},
 <Reaction 2AACLPPEAT181 at 0x74918d052950>: {'charge': 1.0},
 <Reaction 2ACLMM at 0x74918d052b00>: {'charge': 1.0},
 <Reaction 2OH3K5MPPISO at 0x74918d053d30>: {'charge': -2.0},
 <Reaction 3HAACOAT140 at 0x74918d053dc0>: {'charge': 1.0},
 <Reaction 3HAD40 at 0x74918d053f40>: {'charge': 1.0},
 <Reaction 3OAR40 at 0x74918d08cc10>: {'charge': -1.0},
 <Reaction 4CMLCL_kt at 0x74918d08d9c0>: {'charge': -2.0},
 <Reaction A6PAG at 0x74918d08f6a0>: {'charge': 2.0},
 <React

In [124]:
len(list(AA1.reactions))

2307

In [139]:
AA1.reactions.get_by_id("RNTR4").metabolites

{<Metabolite trdrd_c at 0x74918d417df0>: -1.0,
 <Metabolite utp_c at 0x74918d270910>: -1.0,
 <Metabolite dutp_c at 0x7492e5272860>: 1.0,
 <Metabolite h2o_c at 0x74918d5b0fd0>: 1.0,
 <Metabolite trdox_c at 0x74918d417dc0>: 1.0}

In [126]:
AA1.metabolites.get_by_id("12dgr181_p").formula

'C39H72O5'

In [144]:
AA1.metabolites.get_by_id("1p2cbxl_c").charge

0

In [143]:
AA1_copy.metabolites.get_by_id("1p2cbxl_c").charge

-1

In [146]:
AA1_copy_mets = extract_met_info_model(AA1_copy)
AA1_copy_merged = compare_bigg_modelMets(AA1_copy_mets, unbalanced_mets)

In [198]:
mass_imbalanced = ["SORD_D","FDMO","ACOAD21f","ACOAD20f","FDMO3_1","ALDD31_1","ACOAD25f","SALCHS4FER2","ACOAD29f","RBFSb","ACOAD30f","ACOAD31f","PROR","ACOAD34f","KAT23","MECDPDH3_syn","MECDPDH4E","FE3DCITR3","CO2FO","FMNRx","ACOAD10f","AMPEP11","AMPEP14","ENTCS","DAAD11","DAAD12","DAAD3","DAAD7","AADa","AADb","AALDH","ASR","DB4PS","FMNAT","ACOAD19f","FMNRy","FMNRx2","ACOAD12f","ACOAD11f","ACOAD15f","ACOAD13f","ACOAD16f","FDMO4_1","ACOAD17f","ACOAD14f","ACOAD18f","FDMO4","PHER","PHET","NMO"]


In [217]:
for r in mass_imbalanced:
    print(r, AA1_copy.reactions.get_by_id(r).check_mass_balance())

SORD_D {'charge': -2.0, 'H': -2.0}
FDMO {'charge': 2.0, 'H': 2.0}
ACOAD21f {'charge': -2.0, 'H': -2.0}
ACOAD20f {'charge': -2.0, 'H': -2.0}
FDMO3_1 {'charge': 2.0, 'H': 2.0}
ALDD31_1 {'charge': 1.0, 'H': 1.0}
ACOAD25f {'charge': -1.0, 'H': -1.0}
SALCHS4FER2 {'charge': 2.0, 'H': 2.0}
ACOAD29f {'charge': -2.0, 'H': -2.0}
RBFSb {'H': 2.0}
ACOAD30f {'charge': -1.0, 'H': -1.0}
ACOAD31f {'charge': 2.0, 'H': 2.0}
PROR {'charge': 1.0, 'H': 1.0}
ACOAD34f {'charge': -2.0, 'H': -2.0}
KAT23 {'H': 1.0}
MECDPDH3_syn {'charge': -4.0, 'X': 2.0}
MECDPDH4E {'charge': -3.0, 'X': 1.0}
FE3DCITR3 {'charge': 2.0, 'H': 2.0}
CO2FO {'charge': -3.0, 'X': 1.0}
FMNRx {'charge': -2.0, 'H': -2.0}
ACOAD10f {'charge': -2.0, 'H': -2.0}
AMPEP11 {'charge': 1.0, 'H': 1.0}
AMPEP14 {'charge': -1.0, 'H': -1.0}
ENTCS {'charge': 1.0, 'H': 1.0}
DAAD11 {'charge': -2.0, 'H': -2.0}
DAAD12 {'charge': -2.0, 'H': -2.0}
DAAD3 {'charge': -2.0, 'H': -2.0}
DAAD7 {'charge': -1.0, 'H': -1.0}
AADa {'charge': -2.0, 'H': -2.0}
AADb {'charge':

In [227]:
AA1_copy.metabolites.get_by_id("ribflv_c")

0,1
Metabolite identifier,ribflv_c
Name,Riboflavin C17H20N4O6
Memory address,0x7492a3d81b40
Formula,C17H22N4O6
Compartment,C_c
In 4 reaction(s),"Growth, RBFK, RBFSb, ACP1_FMN"


In [225]:
rxn = AA1_copy.reactions.get_by_id("RBFSb")
charges = {met.id: met.charge for met in rxn.metabolites}
masses = {met.id: met.formula for met in rxn.metabolites}
print(rxn, charges, masses)

RBFSb: 2.0 dmlz_c --> 4r5au_c + ribflv_c {'dmlz_c': 0, '4r5au_c': 0, 'ribflv_c': 0} {'dmlz_c': 'C13H18N4O6', '4r5au_c': 'C9H16N4O6', 'ribflv_c': 'C17H22N4O6'}
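
The `{'H': 2.0}` imbalance of RBFSb can be reproduced by hand from the formulas printed above. The following sketch uses only the standard library. `parse_formula` and `element_balance` are hypothetical helper names, and the parser assumes simple formulas without brackets. It also tries the formula that appears in the metabolite name (C17H20N4O6) instead of the stored one (C17H22N4O6). Whether that is the right correction should still be checked against BiGG.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn e.g. 'C13H18N4O6' into Counter({'H': 18, 'C': 13, ...})."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_balance(stoichiometry, formulas):
    """Sum coefficient * atom count per element (products minus reactants)."""
    balance = Counter()
    for met_id, coeff in stoichiometry.items():
        for element, n in parse_formula(formulas[met_id]).items():
            balance[element] += coeff * n
    return {el: v for el, v in balance.items() if v != 0}


# RBFSb: 2.0 dmlz_c --> 4r5au_c + ribflv_c
stoich = {"dmlz_c": -2.0, "4r5au_c": 1.0, "ribflv_c": 1.0}
formulas = {
    "dmlz_c": "C13H18N4O6",
    "4r5au_c": "C9H16N4O6",
    "ribflv_c": "C17H22N4O6",  # formula currently stored in the model
}
print(element_balance(stoich, formulas))  # {'H': 2.0}

# Assumption: use the formula given in the metabolite name instead
formulas_fixed = dict(formulas, ribflv_c="C17H20N4O6")
print(element_balance(stoich, formulas_fixed))  # {}
```

The two surplus hydrogens come entirely from the riboflavin formula: 2 × 18 = 36 H on the left versus 16 + 22 = 38 H on the right.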


In [224]:
AA1_copy.reactions.get_by_id("RBFSb")

Reaction identifier,RBFSb
Name,Riboflavin synthase
Memory address,0x7491ac9aa320
Stoichiometry,"2.0 dmlz_c --> 4r5au_c + ribflv_c  2.0 6,7-Dimethyl-8-(1-D-ribityl)lumazine --> 4-(1-D-Ribitylamino)-5-aminouracil + Riboflavin C17H20N4O6"
GPR,WP_079220613_1 or WP_079220615_1
Lower bound,0.0
Upper bound,1000.0


In [None]:
["ACOAD1f","ACOAD1fr","SORD_D","PPGPPDP","ACOAD23f","GTPDPDP","GTPDPK_1","FDMO","ACOAD21f","GTPDPK","ACOAD20f","QSDH","ACOAD22f","FDMO3_1","SUCD4","ACOAD24f","ALDD31_1","ACOAD25f","ACOAD26f","ACOAD27f","PPNDH2","TPI","TGBPA","SALCHS4FER2","ACOAD28f","TRPS1","QUIDH","TALA","ACOAD29f","LPLIPAL1E181d11pp","TRPS3","ACOAD2f","RBFK","PRASCSi","VCOAD","BKDC","BKDA2","4CMLCL_kt","BTS2","BTS4","DMALRED","DLYSAD","ilvg","BZSCD","DXPS","PACPT_1","ACOAD30f","PRFGS","ACOAD32f","PROD2","GLUTCOADHc","PROR","GLUTRS","GLUTRR","ACOAD33f","ACOAD34f","ACOAD3f","ACOAD4f","ACOAD5f","MECDPDH3_syn","U23GAAT","GLYCYSAP","ACOAD8f","MECDPDH4E","UAGAAT","GLYCYSabc","ACOAD6f","ACOAD7f","SCYSSL","FE3DCITR3","GLYLEUAP","AIRC1","GLYLEUtr","AIRC2","GALpts","AIRC3","GLYPHEAP","GLYTYRAP","GLYPHEtr","PLCD","GLYTYRabc","23CTI1","GAPD","PLIPA1E181d11pp","CO2FO","FACOAL181d11tpp","2AACLPGT160","PLIPA1G181d11pp","GDPTPDP","2AACLPGT181","2AACLPPEAT160","FBA","GCVHADPr","FACOAL1812","SLCYSS","2AACLPPEAT181","FE3PYOVDDR2","FMNRx","SMIA1","ACOAD10f","ACLS_a","SMIA2abc","GCVHRADPr","SMIB1","UT6PT","APAT_1","AMMQT8","AMPEP11","PGI1c","MB2CFO","AMPEP14","AMPEP16","CT6PT","G3PAT160","AMPMS2","MBCOAi","AGPAT181","G3PAT181","AHMMPS_1","CYTCAA3pp","MUCCY_kt","CYO1_KT","AGPAT160","HEMEOS","ACPPAT181","ACP1_FMN","ACPPAT140","ACPPAT160","ACPS1_1","3HAACOAT140","3HAD40","3HACPH","ENTCS","3OAR40","3HOXTPP","EDA","HMR_0260","DAAD","DAAD11","DAAD12","DAAD2","DAAD3","DAAD4","A6PAG","GDPDPK","DAAD5","DAAD7","DAAD8","AACPS1","LPLIPAL1G181d11pp","TKT2","AACPS3","DDPGALA","AACPS5","AADa","CDGUNPD","MS_1","AADb","CDPPT160","AALDH","BEF","DB4PS","DGUNC","TKT1","ACOAD19f","ACLS_d","FMNRy","FMNRx2","ACOAD12f","FORGLUIH2","ACOAD15f","ACOAD13f","ACOAD16f","FDMO4_1","NFORGLUAH2","ACOAD17f","ACOAD14f","ACOAD18f","FDMO4","GLTPD","ACPpds","PHER","PFK_2","PHET","NMO"]

In [None]:
["BKDC","ACOAD30f","PRFGS","ACOAD31f","PROR","ACOAD34f","BKDA2","BTS2","BTS4","DXPS","PACPT_1","GLUTRS","GLUTRR","AIRC1","AIRC2","AIRC3","23CTI1","FACOAL181d11tpp","2AACLPGT160","GLYCYSabc","2AACLPGT181","2AACLPPEAT160","FBA","FACOAL1812","2AACLPPEAT181","MECDPDH3_syn","GLYCYSAP","MECDPDH4E","ACOAD10f","ACLS_a","SCYSSL","FE3DCITR3","GLYLEUAP","GLYLEUtr","GALpts","GLYPHEAP","GLYTYRAP","GLYPHEtr","PLCD","GLYTYRabc","GAPD","PLIPA1E181d11pp","CO2FO","GDPTPDP","PLIPA1G181d11pp","G3PAT160","AGPAT181","GCVHADPr","G3PAT181","SLCYSS","FE3PYOVDDR2","FMNRx","U23GAAT","SMIA1","UAGAAT","AGPAT160","SMIB1","SMIA2abc","AHMMPS_1","GCVHRADPr","UT6PT","APAT_1","A6PAG","AMMQT8","ACPPAT181","ACP1_FMN","AACPS1","ACPPAT140","AMPEP11","PGI1c","ACPPAT160","AMPEP14","AACPS3","AMPEP16","ACPS1_1","3HAACOAT140","CT6PT","DDPGALA","AACPS5","AADa","3HAD40","AADb","3HACPH","AALDH","AMPMS2","3OAR40","3HOXTPP","DGUNC","ilvg","CYTCAA3pp","MUCCY_kt","CYO1_KT","GDPDPK","HEMEOS","LPLIPAL1G181d11pp","CDGUNPD","CDPPT160","ENTCS","FMNAT","EDA","HMR_0260","ACOAD19f","DAAD11","ACLS_d","FMNRy","DAAD12","DAAD3","FMNRx2","ACOAD12f","DAAD7","ACOAD11f","FORGLUIH2","ACOAD15f","TKT2","ACOAD13f","ACOAD16f","FDMO4_1","ACOAD17f","ACOAD14f","MS_1","ACOAD18f","DB4PS","FDMO4","TKT1","GLTPD","ACPpds","FDMO","ACOAD21f","ACOAD20f","FDMO3_1","NFORGLUAH2","ACOAD25f","ACOAD29f","PHER","PFK_2","PHET","NMO","SORD_D","PPGPPDP","GTPDPDP","GTPDPK_1","GTPDPK","4CMLCL_kt","QSDH","ALDD31_1","TPI","TRPS1","TGBPA","QUIDH","TALA","SALCHS4FER2","LPLIPAL1E181d11pp","PRASCSi","TRPS3","RBFK"]


In [215]:
rxn = AA1_copy.reactions.get_by_id("ACOAD30f")
charges = {met.id: met.charge for met in rxn.metabolites}
print(charges)

{'fad_c': -2, 'hdd4coa_c': -4, 'h_c': 1, 'fadh2_c': -2, 'hdd4_2_coa_c': -4}


In [214]:
# h_c has to be removed here (BiGG does not list it for this reaction either)
AA1_copy.reactions.get_by_id("ACOAD30f")

Reaction identifier,ACOAD30f
Name,Acyl CoA dehydrogenase cis hexadec 4 enoyl CoA
Memory address,0x7492b8f3fca0
Stoichiometry,fad_c + h_c + hdd4coa_c <=> fadh2_c + hdd4_2_coa_c  Flavin adenine dinucleotide oxidized + H+ + Cis hexadec 4 enoyl CoA <=> Flavin adenine dinucleotide reduced + Trans cis hexadeca 2 4 dienoyl CoA
GPR,WP_079222834_1
Lower bound,-1000.0
Upper bound,1000.0
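
To see why the proton is the problem in ACOAD30f, the charge balance can be recomputed from the charges and the stoichiometry shown above. This is a small standalone sketch (`charge_balance` is a hypothetical helper, not a cobrapy function) that evaluates the reaction once as it is and once without `h_c`.

```python
def charge_balance(stoichiometry, charges):
    """Sum coefficient * charge over all metabolites (products minus reactants)."""
    return sum(coeff * charges[met_id] for met_id, coeff in stoichiometry.items())


# Charges as printed for ACOAD30f
charges = {"fad_c": -2, "hdd4coa_c": -4, "h_c": 1, "fadh2_c": -2, "hdd4_2_coa_c": -4}

# fad_c + h_c + hdd4coa_c <=> fadh2_c + hdd4_2_coa_c
stoich = {"fad_c": -1, "h_c": -1, "hdd4coa_c": -1, "fadh2_c": 1, "hdd4_2_coa_c": 1}
print(charge_balance(stoich, charges))  # -1

# Same reaction without the proton
stoich_without_h = {m: c for m, c in stoich.items() if m != "h_c"}
print(charge_balance(stoich_without_h, charges))  # 0
```

In the model itself, something like `rxn.add_metabolites({AA1_copy.metabolites.get_by_id("h_c"): 1})` should cancel the existing coefficient of -1. This was not run here, so `rxn.check_mass_balance()` should be called afterwards to confirm that the result is empty.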


## Compare with already curated models

Idea from Martina: take a well-curated model of a Gram-negative organism (e.g. E. coli), look up the reactions, metabolites, and pathways there, and compare them with our models wherever both contain them.
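
A comparison like this could start from plain dictionaries of stoichiometries, as in the sketch below. `compare_reactions` is a hypothetical helper, and the `reference` entries are made-up illustrations, not values read from a real curated model. With cobrapy, such dictionaries could be built as `{m.id: c for m, c in rxn.metabolites.items()}` for each reaction.

```python
def compare_reactions(ours, reference):
    """Report, per reaction ID, how our stoichiometry differs from the reference."""
    report = {}
    for rxn_id, stoich in ours.items():
        if rxn_id not in reference:
            report[rxn_id] = "not in reference"
            continue
        ref = reference[rxn_id]
        diff = {}
        for met_id in sorted(set(stoich) | set(ref)):
            if stoich.get(met_id, 0) != ref.get(met_id, 0):
                # (coefficient in our model, coefficient in the reference)
                diff[met_id] = (stoich.get(met_id, 0), ref.get(met_id, 0))
        report[rxn_id] = diff if diff else "identical"
    return report


# Our model: stoichiometries as shown earlier in this notebook
ours = {
    "ACOAD30f": {"fad_c": -1, "h_c": -1, "hdd4coa_c": -1,
                 "fadh2_c": 1, "hdd4_2_coa_c": 1},
    "RBFSb": {"dmlz_c": -2, "4r5au_c": 1, "ribflv_c": 1},
    "NMO": {},  # placeholder, stoichiometry not shown above
}

# Illustrative reference only (assumed, not taken from a real model)
reference = {
    "ACOAD30f": {"fad_c": -1, "hdd4coa_c": -1, "fadh2_c": 1, "hdd4_2_coa_c": 1},
    "RBFSb": {"dmlz_c": -2, "4r5au_c": 1, "ribflv_c": 1},
}

for rxn_id, result in compare_reactions(ours, reference).items():
    print(rxn_id, result)
```

Reactions that differ only in `h_c` are then easy to pick out as candidates for proton corrections.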