# 1. Model Summary

This notebook loads a downloaded metabolic model and prints a summary of its key properties, such as the number of genes, metabolites, and reactions. It serves as a quick check to ensure the model is loaded correctly and to get a high-level overview before performing analyses.

The suggested exercises at the end also include a simple histogram of the number of metabolites per reaction, which gives a first look at the *matplotlib* package.

**NOTE**: We'll use the <span style="color:blue">iML1515</span> model to explain what we see, but you can choose any model you want and draw the same kinds of conclusions on your own.

In [1]:
```python
import cobra
import os
import sys
import pandas as pd
import matplotlib.pyplot as plt
```

In [2]:
```python
def get_available_models(model_dir="../models"):
    """Scan model_dir for .json model files.

    Returns a dict mapping menu numbers ("1", "2", ...) to model IDs
    (file names without the .json extension), in alphabetical order.
    """
    if not os.path.isdir(model_dir):
        return {}
    # Sort the files so the menu numbering is consistent between runs
    sorted_files = sorted(f for f in os.listdir(model_dir) if f.endswith(".json"))
    return {
        # splitext only strips the final extension, unlike str.replace
        str(i + 1): os.path.splitext(f)[0]
        for i, f in enumerate(sorted_files)
    }
```

The function **get_available_models** scans the model directory for *.json* files and returns a dictionary that maps a menu number (`"1"`, `"2"`, ...) to each model ID. The files are sorted alphabetically, so the numbering stays the same between runs.
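To see what the returned dictionary looks like without downloading anything, the sketch below recreates the function and runs it on a temporary folder. The folder holds two empty placeholder files named after models plus one non-JSON file that should be ignored. These file names are illustrative only.

```python
import os
import tempfile


def get_available_models(model_dir="../models"):
    """Map menu numbers ("1", "2", ...) to model IDs found as .json files."""
    if not os.path.isdir(model_dir):
        return {}
    sorted_files = sorted(f for f in os.listdir(model_dir) if f.endswith(".json"))
    return {str(i + 1): os.path.splitext(f)[0] for i, f in enumerate(sorted_files)}


# Hypothetical directory contents, created in a temporary folder for the demo
with tempfile.TemporaryDirectory() as tmp:
    for name in ["iML1515.json", "e_coli_core.json", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    menu = get_available_models(tmp)

print(menu)  # {'1': 'e_coli_core', '2': 'iML1515'}
```

The `.txt` file is skipped, and the alphabetical sort puts `e_coli_core` before `iML1515`. A missing directory yields an empty dictionary, which the next cell treats as "no models found".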

The block below uses this function to list the available models and ask which one to load.

In [3]:
```python
AVAILABLE_MODELS = get_available_models()

if not AVAILABLE_MODELS:
    print("❌ No models found in the 'models/' directory.", file=sys.stderr)
    print("Please run the '0_download_model.py' script first.", file=sys.stderr)
else:
    print("Please choose which model to summarize:")
    for key, value in AVAILABLE_MODELS.items():
        print(f"  {key}: {value}")

    choice = input("Enter the number of your choice: ")
    model_id = AVAILABLE_MODELS.get(choice)

    if not model_id:
        print("❌ Invalid choice.", file=sys.stderr)
    else:
        model_path = f"../models/{model_id}.json"
        print(f"\n📖 Loading model from {model_path}...")
        try:
            model = cobra.io.load_json_model(model_path)
            print(f"✅ {model_id} loaded successfully.\n")
        except Exception as e:
            print(f"❌ Failed to load model. Error: {e}", file=sys.stderr)
```


```text
Please choose which model to summarize:
  1: iML1515

📖 Loading model from ../models/iML1515.json...
✅ iML1515 loaded successfully.
```



Once the model is loaded, we can take a first general look at its content: the number of reactions, metabolites, and genes.

We will also print the first six reactions, metabolites, and genes.


In [4]:
```python
# Print a summary of the model loaded in the previous cell
try:
    print(f"\n--- Summary for {model_id} ---")
    print(model)  # str(model) is the model ID
    print("\n")
    print(f"Reactions:   {len(model.reactions)}")
    print(f"Metabolites: {len(model.metabolites)}")
    print(f"Genes:       {len(model.genes)}")

    # Print the first 6 reactions, metabolites, and genes
    print("\nFirst 6 Reactions:")
    for reaction in model.reactions[:6]:
        print(reaction)
    print("\nFirst 6 Metabolites:")
    for metabolite in model.metabolites[:6]:
        print(metabolite)
    print("\nFirst 6 Genes:")
    for gene in model.genes[:6]:
        print(gene)
except Exception as e:
    # A NameError here usually means no model was loaded in the previous cell
    print(f"❌ Failed to summarize model. Error: {e}", file=sys.stderr)
```


```text
--- Summary for iML1515 ---
iML1515


Reactions:   2712
Metabolites: 1877
Genes:       1516

First 6 Reactions:
CYTDK2: cytd_c + gtp_c --> cmp_c + gdp_c + h_c
XPPT: prpp_c + xan_c --> ppi_c + xmp_c
HXPRT: hxan_c + prpp_c --> imp_c + ppi_c
NDPK5: atp_c + dgdp_c <=> adp_c + dgtp_c
SHK3Dr: 3dhsk_c + h_c + nadph_c <=> nadp_c + skm_c
NDPK6: atp_c + dudp_c <=> adp_c + dutp_c

First 6 Metabolites:
octapb_c
cysi__L_e
dhap_c
prbatp_c
10fthf_c
btal_c

First 6 Genes:
b2551
b0870
b3368
b2436
b0008
b3500
```


## Observations
Genome-scale metabolic models (GEMs) such as iML1515 are manually curated from decades of biochemical data, and their reactions are supported by literature evidence and genome annotation.

These counts tell us how comprehensive the model is and help us validate later analyses (e.g., by revealing that a key gene or metabolite is missing).

### Reactions
In the iML1515 model, the first reaction listed is:

<code>CYTDK2: cytd_c + gtp_c --> cmp_c + gdp_c + h_c </code>

This is a nucleotide salvage reaction: cytidine kinase phosphorylates cytidine to CMP, using GTP as the phosphate donor.

Another reaction to note is:

<code> NDPK5: atp_c + dgdp_c <=> adp_c + dgtp_c </code>

This reaction is catalyzed by nucleoside diphosphate kinase, which transfers the terminal phosphate of ATP to dGDP to form dGTP and helps balance the dNTP pools.

In the reaction strings, the arrow indicates the direction(s) in which the reaction can carry flux: <span style="color:orange"><=></span> marks a reversible reaction, while <span style="color:orange">--></span> marks an irreversible reaction that runs only forward.

Likewise, the suffix <span style="color:orange">\*_c </span> marks a cytosolic metabolite and <span style="color:orange">\*_e</span> an extracellular one; iML1515 also uses `_p` for the periplasm.



### Metabolites
Learning to read metabolite IDs helps you follow pathways. When an ID is not self-explanatory, check the metabolite's full name in the model or look it up in the literature.

<code>cysi__L_e</code> is L-cystine, an amino acid formed from two disulfide-linked cysteines, located in the extracellular space.
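The naming pattern can be decoded mechanically. In BiGG-style IDs, the compartment follows the last single underscore, and a double underscore stands in for a hyphen from older IDs (e.g. `glu-L`), most often in front of a stereochemistry tag such as `L` or `D`. The helper below is a hypothetical, convention-based heuristic rather than an official parser, and the compartment table only covers the three compartments mentioned in this notebook.

```python
# Assumed compartment labels for iML1515-style IDs
COMPARTMENTS = {"c": "cytosol", "e": "extracellular space", "p": "periplasm"}


def parse_bigg_metabolite_id(met_id):
    """Split a BiGG-style metabolite ID into (base, stereo_tag, compartment)."""
    # The compartment is whatever follows the last underscore
    base, comp = met_id.rsplit("_", 1)
    # A double underscore, if present, separates a tag such as L or D
    stereo = None
    if "__" in base:
        base, stereo = base.split("__", 1)
    # Unknown compartment codes are returned unchanged
    return base, stereo, COMPARTMENTS.get(comp, comp)


for met_id in ["cysi__L_e", "dhap_c", "10fthf_c"]:
    print(met_id, parse_bigg_metabolite_id(met_id))
```

The model itself remains the authoritative source: `model.metabolites.get_by_id("cysi__L_e").name` and `.compartment` give the curated name and compartment.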



### Genes
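Genes matter in flux balance analysis (FBA). Each reaction carries a gene-protein-reaction (GPR) rule, and a gene knockout is simulated by disabling every reaction whose rule can no longer be satisfied without that gene. A reaction catalyzed by isozymes (genes joined by `or`) stays active as long as one isozyme remains. Gene IDs also let us map experimental transcriptomics or proteomics data onto the model.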
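The knockout logic above boils down to evaluating a Boolean GPR rule with the deleted genes set to false. The hypothetical helper below does this with Python's `ast` module. It assumes gene IDs are valid Python identifiers (true for `b0008`-style IDs) and that rules use lowercase `and`/`or`. For real analyses, COBRApy already handles this, for example in `cobra.flux_analysis.single_gene_deletion`.

```python
import ast


def gpr_is_active(rule, knocked_out):
    """Evaluate a GPR rule such as "b0001 and (b0002 or b0003)".

    Returns False if the rule cannot be satisfied once the genes in
    `knocked_out` are removed. An empty rule means no gene is required.
    """
    if not rule.strip():
        return True

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            # "and" = enzyme complex (all subunits needed), "or" = isozymes (any one suffices)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id not in knocked_out
        raise ValueError(f"Unsupported GPR element: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval").body)


rule = "b0001 and (b0002 or b0003)"
print(gpr_is_active(rule, {"b0002"}))           # True: isozyme b0003 remains
print(gpr_is_active(rule, {"b0002", "b0003"}))  # False: both isozymes removed
```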

### 🦾 Extra

📘 Suggested Exercise

In your notebook, add cells that:

1. List all compartments with `model.compartments`.
2. Count reversible vs irreversible reactions.
3. Plot a histogram of how many metabolites are involved per reaction.
4. Print the objective function and explain what it optimizes.