<a href="https://colab.research.google.com/github/hariszaf/metabolic_toy_model/blob/main/Antony2025/participantProjects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Finding Optimal Ecological Partners For Gut Colonization** (Julian Tap / Lu Chen)

\\

# **Impact of methanol on anaerobic digestion** (Ariane Bize)

\\

# **Understanding the communities in the anode and cathode of bioelectrosynthesis reactors** (Louise Rigaud / Wenbo Sui)

### **Basic Setup**

Set up **Gurobi**, **COBRApy**, **ModelSEEDpy**, and **DNNGIOR**. For details, see the [environment setup notebook](https://colab.research.google.com/github/hariszaf/metabolic_toy_model/blob/main/Antony2025/preparingYourEnvironment.ipynb).

In [1]:
import os
def create_gurobi_license():
    license_content = (
        "# Gurobi WLS license file\n"
        "# Your credentials are private and should not be shared or copied to public repositories.\n"
        "# Visit https://license.gurobi.com/manager/doc/overview for more information.\n"
        "WLSACCESSID=1fedf73b-9471-4da8-bdc7-2aaacf2e30f3\n"
        "WLSSECRET=3bc7d209-a4ec-4195-98be-4b254f181512\n"
        "LICENSEID=940603"
    )
    with open("/content/licenses/gurobi.lic", "w") as f:
        f.write(license_content)
    print("License file created at /content/licenses/gurobi.lic")



# Create directory for the license
os.makedirs("/content/licenses", exist_ok=True)

# Generate the license file
create_gurobi_license()

# Tell Gurobi where the license file is (environment variable, not PATH)
os.environ['GRB_LICENSE_FILE'] = '/content/licenses/gurobi.lic'

License file created at /content/licenses/gurobi.lic
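
To keep credentials out of the notebook itself, one option is to read them from environment variables and write the license file at runtime. The sketch below is a minimal illustration, not part of the original workflow: the helper name `write_gurobi_license` is hypothetical, and it assumes the three values are exported under the same names as the keys in the license file (`WLSACCESSID`, `WLSSECRET`, `LICENSEID`).

```python
import os
from pathlib import Path


def write_gurobi_license(path, env=os.environ):
    """Write a Gurobi WLS license file from environment variables.

    Assumption: the credentials are stored under the same names as the
    keys used in the license file (WLSACCESSID, WLSSECRET, LICENSEID).
    """
    keys = ("WLSACCESSID", "WLSSECRET", "LICENSEID")
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError(f"Missing license variables: {', '.join(missing)}")

    path = Path(path)
    # Create the parent directory if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(f"{k}={env[k]}" for k in keys) + "\n")
    return path


# Hypothetical usage in Colab:
# lic = write_gurobi_license("/content/licenses/gurobi.lic")
# os.environ["GRB_LICENSE_FILE"] = str(lic)
```

With this approach the notebook can be shared publicly while each participant supplies their own license values.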


In [2]:
# @title
!pip install gurobipy
!pip install cobra

!pip install pyrodigal
!pip install biopython

!pip install dnngior --no-deps

!git clone https://github.com/ModelSEED/ModelSEEDpy
!pip install ModelSEEDpy/.

!git clone https://github.com/hariszaf/metabolic_toy_model.git

Collecting gurobipy
  Downloading gurobipy-12.0.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (16 kB)
Downloading gurobipy-12.0.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (14.4 MB)
Installing collected packages: gurobipy
Successfully installed gurobipy-12.0.1
Collecting cobra
  Downloading cobra-0.29.1-py2.py3-none-any.whl.metadata (9.3 kB)
Collecting appdirs~=1.4 (from cobra)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting depinfo~=2.2 (from cobra)
  Downloading depinfo-2.2.0-py3-none-any.whl.metadata (3.8 kB)
Collecting diskcache~=5.0 (from cobra)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting optlang~=1.8 (from cobra)
  Downloading optlang-1.8.3-py2.py3-none-any.whl.metadata (8.2 kB)
Collecting python-libsbml~=5.19 (from cobra)
  Downloading python_libsbml-5.20.

In [3]:
# @title
import gurobipy
from gurobipy import Model
model = Model("test")
print("Gurobi is working!", "\U0001F600")

# Check that COBRApy works by running FBA on the bundled "textbook" model
import cobra
from cobra.io import load_model
model = load_model("textbook")
solution = model.optimize()
print(f"flux balance analysis solution is {solution.objective_value}")
print("COBRApy is working", "\U0001F600")

Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value 940603
Academic license 940603 - for non-commercial use only - registered to da___@gmail.com
Gurobi is working! 😀
flux balance analysis solution is 0.8739215069684305
COBRApy is working 😀


### **Build the *Bifidobacterium* model**

We follow the same steps as yesterday and this morning:

\\

1) Download the protein FASTA from Ensembl Genomes (EMBL-EBI);

\\

2) Annotate the genome with RAST;

\\

3) Use ModelSEEDpy to build a draft model;

\\

4) Use DNNGIOR to gap-fill the draft model.

#### **Get the *Bifidobacterium* genome**

In [4]:
import gzip
import shutil
import urllib.request  # "import urllib" alone does not guarantee that urllib.request is loaded

url = "https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_100_collection/bifidobacterium_adolescentis_atcc_15703_gca_000010425/pep/Bifidobacterium_adolescentis_atcc_15703_gca_000010425.ASM1042v1.pep.all.fa.gz"
gz_id = "Bifidobacterium adolescentis_atcc_15703.fa.gz"
fast_id = "Bifidobacterium adolescentis_atcc_15703.fa"

urllib.request.urlretrieve(url, gz_id)
with gzip.open(gz_id, 'rb') as f_in:
    with open(fast_id, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

os.remove(gz_id)
!ls -la

total 1016
drwxr-xr-x  1 root root    4096 Mar 12 21:46  .
drwxr-xr-x  1 root root    4096 Mar 12 21:42  ..
-rw-r--r--  1 root root 1011415 Mar 12 21:46 'Bifidobacterium adolescentis_atcc_15703.fa'
drwxr-xr-x  4 root root    4096 Mar 11 13:29  .config
drwxr-xr-x  2 root root    4096 Mar 12 21:44  licenses
drwxr-xr-x  7 root root    4096 Mar 12 21:46  metabolic_toy_model
drwxr-xr-x 10 root root    4096 Mar 12 21:45  ModelSEEDpy
drwxr-xr-x  1 root root    4096 Mar 11 13:29  sample_data


#### **Annotate and build the draft model**

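We make a small modification to the `reconstruct_draft_model` function: an `is_dna_fasta` flag tells it whether the input is a nucleotide FASTA. If it is, ORFs are first predicted with pyrodigal and translated to proteins; otherwise the protein FASTA is annotated directly.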

In [5]:
from Bio import SeqIO
import pyrodigal

import modelseedpy
from modelseedpy.core.msgenome import MSGenome

from modelseedpy.core.rast_client import RastClient
rast = RastClient()

from modelseedpy import MSBuilder


def reconstruct_draft_model(model_id, input_fasta_file, output_model_sbml, is_dna_fasta = False):

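  if is_dna_fasta:
    # SeqIO.read expects exactly one record (a single contig or chromosome)
    record = SeqIO.read(input_fasta_file, "fasta")
    dna_seq = str(record.seq)  # Convert the sequence to a plain string

    # Train pyrodigal on this genome, then predict its genes
    gene_finder = pyrodigal.GeneFinder()
    gene_finder.train(dna_seq)
    genes = gene_finder.find_genes(dna_seq)

    # Write the translated ORFs to a protein FASTA next to the input file
    with open(input_fasta_file + ".pep.fa", "w") as f:
      genes.write_translations(f, sequence_id="seqXYZ")

    # From here on, work with the predicted proteins
    input_fasta_file = input_fasta_file + ".pep.fa"

  # Load the protein FASTA and annotate it with RAST
  genome = MSGenome.from_fasta(input_fasta_file, split = ' ')
  rast.annotate_genome(genome)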

  base_model = MSBuilder.build_metabolic_model(model_id = model_id,
                                             genome   = genome,
                                             index    = "0",
                                             classic_biomass = True,
                                             gapfill_model   = False,
                                             gapfill_media   = None,
                                             annotate_with_rast = True,
                                             allow_all_non_grp_reactions = True
                                            )

  cobra.io.write_sbml_model(cobra_model = base_model, filename = output_model_sbml)

  return base_model

model_id = 'Bifidobacterium_adolescentis_atcc_15703'
input_protein_fasta = 'Bifidobacterium adolescentis_atcc_15703.fa'
output_model_sbml = 'bifidobacterium_adolescentis_atcc_15703_draft.sbml'

draftModel = reconstruct_draft_model(model_id, input_protein_fasta, output_model_sbml)

!ls -la

modelseedpy 0.4.0




total 2844
drwxr-xr-x  1 root root    4096 Mar 12 21:46  .
drwxr-xr-x  1 root root    4096 Mar 12 21:42  ..
-rw-r--r--  1 root root 1871606 Mar 12 21:46  bifidobacterium_adolescentis_atcc_15703_draft.sbml
-rw-r--r--  1 root root 1011415 Mar 12 21:46 'Bifidobacterium adolescentis_atcc_15703.fa'
drwxr-xr-x  4 root root    4096 Mar 11 13:29  .config
drwxr-xr-x  2 root root    4096 Mar 12 21:44  licenses
drwxr-xr-x  7 root root    4096 Mar 12 21:46  metabolic_toy_model
drwxr-xr-x 10 root root    4096 Mar 12 21:45  ModelSEEDpy
drwxr-xr-x  1 root root    4096 Mar 11 13:29  sample_data
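
The `is_dna_fasta` flag above has to be set by hand. As a minimal sketch, one could guess it from the file content by checking what fraction of residues are nucleotide letters. The function name `looks_like_dna`, the 0.9 threshold, and the 10,000-residue sample size are illustrative assumptions, not part of ModelSEEDpy or pyrodigal.

```python
def looks_like_dna(fasta_text, threshold=0.9, max_residues=10_000):
    """Guess whether FASTA text holds nucleotide sequences.

    Heuristic (an assumption, not a standard): if at least `threshold`
    of the first `max_residues` letters are A, C, G, T, U or N, treat
    the file as DNA/RNA; otherwise treat it as protein.
    """
    nucleotides = set("ACGTUN")
    seen = 0
    dna_like = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA headers
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            if not ch.isalpha():
                continue
            seen += 1
            dna_like += ch in nucleotides
            if seen >= max_residues:
                break
        if seen >= max_residues:
            break
    return seen > 0 and dna_like / seen >= threshold


# Hypothetical usage:
# with open(input_protein_fasta) as fh:
#     is_dna = looks_like_dna(fh.read())
```

Short or low-complexity protein sequences could in principle fool such a heuristic, so an explicit flag remains the safer default.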


In [6]:
draftModel.optimize()

Unnamed: 0,fluxes,reduced_costs
rxn02201_c0,0.0,0.0
rxn00836_c0,0.0,0.0
rxn00364_c0,0.0,0.0
rxn03408_c0,0.0,-0.0
rxn05250_c0,0.0,0.0
...,...,...
EX_cpd03453_e0,0.0,-0.0
EX_cpd03726_e0,0.0,-0.0
bio1,0.0,0.0
SK_cpd11416_c0,0.0,0.0
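
In the output above, `bio1` (the biomass reaction in this draft) carries zero flux, so the draft model cannot grow yet; this is why gap-filling follows. A small sketch of how that check could be automated is shown below. The helper `can_grow` and its tolerance are hypothetical; it relies only on the `status` and `objective_value` attributes of the solution object returned by `optimize()`.

```python
def can_grow(solution, tol=1e-6):
    """Return True if an FBA solution is optimal with objective above `tol`.

    `solution` is expected to expose `status` and `objective_value`,
    as the object returned by COBRApy's `model.optimize()` does.
    """
    status = getattr(solution, "status", None)
    value = getattr(solution, "objective_value", None)
    return bool(status == "optimal" and value is not None and value > tol)


# Hypothetical usage:
# if not can_grow(draftModel.optimize()):
#     print("Draft model does not grow; gap-filling is needed.")
```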


#### **Gapfill the model**

In [21]:
from dnngior.gapfill_class import Gapfill
from dnngior.NN_Predictor import NN

gapfilledModel = Gapfill(draftModel = "bifidobacterium_adolescentis_atcc_15703_draft.sbml")

#save model

cobra.io.write_sbml_model(cobra_model = gapfilledModel.gapfilledModel, filename = "bifidobacterium_adolescentis_atcc_15703_gapfilled.sbml")

Gap-filling database =  ModelSEED


ERROR:cobra.io.sbml:No objective coefficients in model. Unclear what should be optimized


#reactions not found in NN-keys:  65 / 726
Flux through biomass reaction is 1.00000000
Flux through biomass reaction is 1.00000000


condition is currently:  79156
condition is currently:  39578
condition is currently:  19789
condition is currently:  9894
condition is currently:  4947
condition is currently:  2473
condition is currently:  1236
condition is currently:  618
condition is currently:  309
condition is currently:  154
condition is currently:  77
condition is currently:  38
condition is currently:  19
condition is currently:  9
condition is currently:  4
condition is currently:  2
condition is currently:  1


Objective value is 0.061653.
Read LP format model from file /tmp/tmp9o64j9l2.lp
Reading time = 0.01 seconds
: 785 rows, 1614 columns, 7118 nonzeros
NN gapfilling added 81 new reactions
The NN gapfilled model, comes with 807 reactions and 785 metabolites
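
The `condition is currently` values in the log above are consistent with repeated integer halving of the starting value (79156, 39578, 19789, ...), which suggests a bisection-style search. This is an observation from the log only, not a statement about how DNNGIOR works internally. The sketch below reproduces the sequence; `halving_sequence` is a hypothetical helper.

```python
def halving_sequence(start):
    """Return start, start // 2, start // 4, ... down to 1."""
    values = []
    while start >= 1:
        values.append(start)
        start //= 2  # integer (floor) division, as the logged values suggest
    return values


# Starting value taken from the first logged line
print(halving_sequence(79156))
```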


#### **Reconstruct models for the other species**

The genomes come from this [data set](https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/O4OJA2); the metadata table is included in the course repository.

In [11]:
import pandas as pd

genomes = pd.read_csv("metabolic_toy_model/files/metasimfood_genomic_data.csv", sep = ';')
genomes

Unnamed: 0,SPECIES_AND_STRAIN_NAME,TAXONOMY,ASSEMBLY_ACC_NUMBER,BIOPROJECT_ID,SAMPLE_ID,CHROMOSOME_ACC_NUMBER,GENOME_COVERAGE,GC_PERCENT,NB_REPLICONS,SIZE_CHROMOSOME_bp,SIZE_PLASMID_01_bp,SIZE_PLASMID_02_bp,SIZE_PLASMID_03_bp,SIZE_PLASMID_04_bp,SIZE_PLASMID_05_bp,SIZE_PLASMID_06_bp,SIZE_PLASMID_07_bp,SIZE_PLASMID_08_bp,SIZE_PLASMID_09_bp,SIZE_PLASMID_10_bp
0,Bacillus_pumilus_CIRM-BIA2784,k__Bacteria|p__Bacillota|c__Bacilli|o__Bacilla...,GCA_964063375,PRJEB74198,ERS18598044,OZ061327-OZ061328,330x,41.5,2,3753338,41278 (cluster_002),,,,,,,,,
1,Enterococcus_gilvus_CIRM-BIA2700,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964062525,PRJEB74198,ERS18598045,OZ061221-OZ061222,260x,42.0,2,2825867,889036 (cluster_002),,,,,,,,,
2,Lacticaseibacillus_paracasei_CIRM-BIA2373,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964065205,PRJEB74198,ERS18598046,OZ061570-OZ061573,440x,46.5,4,3088163,51021 (cluster_002),28481 (cluster_003),13402 (cluster_006),,,,,,,
3,Lactiplantibacillus_pentosus_CNRZ1547,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964063425,PRJEB74198,ERS18598050,OZ061355-OZ061363,215x,46.0,9,3624812,67590 (cluster_002),54536 (cluster_003),58048 (cluster_005),33433 (cluster_006),33043 (cluster_009),22967 (cluster_007),14246 (cluster_008),3310 (cluster_017),,
4,Lactiplantibacillus_plantarum_ATCC14431,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964084915,PRJEB74198,ERS18598049,OZ064365-OZ064366,419x,44.5,2,3207663,9254 (cluster_004),,,,,,,,,
5,Lactiplantibacillus_plantarum_CIRM-BIA2443,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964065245,PRJEB74198,ERS18598047,OZ061574-OZ061579,288x,44.5,6,3163810,40752 (cluster_002),33642 (cluster_003),24389 (cluster_004),10848 (cluster_005),8796 (cluster_006),,,,,
6,Lactiplantibacillus_plantarum_CIRM-BIA2453,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964065445,PRJEB74198,ERS18598048,OZ061611-OZ061617,770x,44.5,7,3259765,52958 (cluster_007),39456 (cluster_009),36215 (cluster_013),26258 (cluster_017),16539 (cluster_024),8581 (cluster_027),,,,
7,Latilactobacillus_curvatus_CIRM-BIA2781,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964063355,PRJEB74198,ERS18598051,OZ061323-OZ061326,165x,42.0,4,2062891,43624 (cluster_002),22323 (cluster_003),12652 (cluster_004),,,,,,,
8,Latilactobacillus_curvatus_J116,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964063385,PRJEB74198,ERS18598052,OZ061332-OZ061335,1000x,42.0,4,1928854,58746 (cluster_002),12649 (cluster_003),10908 (cluster_004),,,,,,,
9,Latilactobacillus_sakei_CIRM-BIA1912,k__Bacteria|p__Bacillota|c__Bacilli|o__Lactoba...,GCA_964063755,PRJEB74198,ERS18598053,OZ061411-OZ061415,1200x,41.5,5,1907497,31814 (cluster_002),11126 (cluster_003),12662 (cluster_004),12470 (cluster_005),,,,,,


In [19]:
import gzip
import shutil
import urllib.request  # "import urllib" alone does not guarantee that urllib.request is loaded

def download_fasta(url, gz_id, output):
  urllib.request.urlretrieve(url, gz_id)
  with gzip.open(gz_id, 'rb') as f_in:
      with open(output, 'wb') as f_out:
          shutil.copyfileobj(f_in, f_out)
  os.remove(gz_id)

!mkdir food_bac_genomes

download_fasta("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/964/063/375/GCF_964063375.1_Bacillus_pumilus_CIRM-BIA2784/GCF_964063375.1_Bacillus_pumilus_CIRM-BIA2784_protein.faa.gz",
               "GCF_964063375.1_Bacillus_pumilus_CIRM-BIA2784_protein.faa.gz",
               "food_bac_genomes/GCF_964063375.1_Bacillus_pumilus_CIRM-BIA2784_protein.faa")

download_fasta("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/964/062/525/GCF_964062525.1_Enterococcus_gilvus_CIRM-BIA2700/GCF_964062525.1_Enterococcus_gilvus_CIRM-BIA2700_protein.faa.gz",
               "GCF_964062525.1_Enterococcus_gilvus_CIRM-BIA2700_protein.faa.gz",
               "food_bac_genomes/GCF_964062525.1_Enterococcus_gilvus_CIRM-BIA2700_protein.faa")


download_fasta("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/964/065/205/GCF_964065205.1_Lacticaseibacillus_paracasei_CIRM-BIA2373/GCF_964065205.1_Lacticaseibacillus_paracasei_CIRM-BIA2373_protein.faa.gz",
               "GCF_964065205.1_Lacticaseibacillus_paracasei_CIRM-BIA2373_protein.faa.gz",
               "food_bac_genomes/GCF_964065205.1_Lacticaseibacillus_paracasei_CIRM-BIA2373_protein.faa")




!ls food_bac_genomes -la

total 3748
drwxr-xr-x 2 root root    4096 Mar 12 22:09 .
drwxr-xr-x 1 root root    4096 Mar 12 22:09 ..
-rw-r--r-- 1 root root 1342664 Mar 12 22:09 GCF_964062525.1_Enterococcus_gilvus_CIRM-BIA2700_protein.faa
-rw-r--r-- 1 root root 1385571 Mar 12 22:09 GCF_964063375.1_Bacillus_pumilus_CIRM-BIA2784_protein.faa
-rw-r--r-- 1 root root 1095910 Mar 12 22:09 GCF_964065205.1_Lacticaseibacillus_paracasei_CIRM-BIA2373_protein.faa
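
The three download URLs above follow the same pattern, so they could be built from the metadata table instead of being typed by hand. The sketch below is hypothetical: it assumes that each `GCA_` accession in `ASSEMBLY_ACC_NUMBER` has a RefSeq (`GCF_`) counterpart at version 1 and that the assembly name equals `SPECIES_AND_STRAIN_NAME`, which holds for the three URLs used here but is not guaranteed for every genome.

```python
NCBI_GENOMES = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def refseq_protein_url(assembly_acc, assembly_name, version=1):
    """Build the NCBI FTP URL of a RefSeq protein FASTA.

    Assumptions: a GCF_ record exists for the accession, and the
    assembly name matches the name used in the directory on the server.
    """
    _, digits = assembly_acc.split("_")
    digits = digits.split(".")[0]  # drop a version suffix if present
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"Unexpected accession format: {assembly_acc}")

    stem = f"GCF_{digits}.{version}_{assembly_name}"
    # The server splits the nine digits into three directory levels
    triplets = "/".join(digits[i:i + 3] for i in range(0, 9, 3))
    return f"{NCBI_GENOMES}/GCF/{triplets}/{stem}/{stem}_protein.faa.gz"


print(refseq_protein_url("GCA_964063375", "Bacillus_pumilus_CIRM-BIA2784"))
```

Looping over the rows of the metadata table with this helper and `download_fasta` would fetch the remaining genomes, provided the assumptions hold for them.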


#### **Reconstruct Draft and Gapfilled Models**

In [22]:
!mkdir food_bac_draft_models
!mkdir food_bac_gapfilled_models


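# Use a new name so the `genomes` DataFrame loaded earlier is not overwritten
genome_files = [f for f in os.listdir("food_bac_genomes") if f.endswith("_protein.faa")]

for genome_file in genome_files:
  # Keep the file extension out of the model ID; only the file names get ".sbml"
  model_id = genome_file.replace("_protein.faa", "")
  input_protein_fasta = os.path.join("food_bac_genomes", genome_file)
  draft_model_sbml = os.path.join("food_bac_draft_models", model_id + ".sbml")
  gapfilled_model_sbml = os.path.join("food_bac_gapfilled_models", model_id + ".sbml")

  # Build the draft model, gap-fill it, and save the gap-filled version
  draftModel = reconstruct_draft_model(model_id, input_protein_fasta, draft_model_sbml)
  gapfilledModel = Gapfill(draftModel = draft_model_sbml)
  cobra.io.write_sbml_model(gapfilledModel.gapfilledModel, filename = gapfilled_model_sbml)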

ERROR:cobra.io.sbml:'' is not a valid SBML 'SId'.
ERROR:cobra.io.sbml:'' is not a valid SBML 'SId'.


Gap-filling database =  ModelSEED


ERROR:cobra.io.sbml:No objective coefficients in model. Unclear what should be optimized


#reactions not found in NN-keys:  99 / 1007
Flux through biomass reaction is 1.00000000
Flux through biomass reaction is 1.00000000


condition is currently:  78616
condition is currently:  39308
condition is currently:  19654
condition is currently:  9827
condition is currently:  4913
condition is currently:  2456
condition is currently:  1228
condition is currently:  614
condition is currently:  307
condition is currently:  153
condition is currently:  76
condition is currently:  38
condition is currently:  19
condition is currently:  9
condition is currently:  4
condition is currently:  2
condition is currently:  1


Objective value is 0.063950.
Read LP format model from file /tmp/tmpkrg3atj8.lp
Reading time = 0.02 seconds
: 1005 rows, 2144 columns, 9264 nonzeros
NN gapfilling added 65 new reactions
The NN gapfilled model, comes with 1072 reactions and 1005 metabolites


ERROR:cobra.io.sbml:'' is not a valid SBML 'SId'.
ERROR:cobra.io.sbml:'' is not a valid SBML 'SId'.


Gap-filling database =  ModelSEED


ERROR:cobra.io.sbml:No objective coefficients in model. Unclear what should be optimized


#reactions not found in NN-keys:  83 / 873
Flux through biomass reaction is 1.00000000
Flux through biomass reaction is 1.00000000


condition is currently:  78866
condition is currently:  39433
condition is currently:  19716
condition is currently:  9858
condition is currently:  4929
condition is currently:  2464
condition is currently:  1232
condition is currently:  616
condition is currently:  308
condition is currently:  154
condition is currently:  77
condition is currently:  38
condition is currently:  19
condition is currently:  9
condition is currently:  4
condition is currently:  2
condition is currently:  1


Objective value is 0.050221.
Read LP format model from file /tmp/tmptolc_7ji.lp
Reading time = 0.01 seconds
: 904 rows, 1880 columns, 8178 nonzeros
NN gapfilling added 67 new reactions
The NN gapfilled model, comes with 940 reactions and 904 metabolites


ERROR:cobra.io.sbml:'' is not a valid SBML 'SId'.
ERROR:cobra.io.sbml:'' is not a valid SBML 'SId'.


Gap-filling database =  ModelSEED


ERROR:cobra.io.sbml:No objective coefficients in model. Unclear what should be optimized


#reactions not found in NN-keys:  98 / 1219
Flux through biomass reaction is 1.00000000
Flux through biomass reaction is 1.00000000


condition is currently:  78277
condition is currently:  39138
condition is currently:  19569
condition is currently:  9784
condition is currently:  4892
condition is currently:  2446
condition is currently:  1223
condition is currently:  611
condition is currently:  305
condition is currently:  152
condition is currently:  76
condition is currently:  38
condition is currently:  19
condition is currently:  9
condition is currently:  4
condition is currently:  2
condition is currently:  1


Objective value is 0.071163.
Read LP format model from file /tmp/tmpoteqf8g8.lp
Reading time = 0.01 seconds
: 1168 rows, 2536 columns, 11322 nonzeros
NN gapfilling added 49 new reactions
The NN gapfilled model, comes with 1268 reactions and 1168 metabolites
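
The gap-filling output above is long, and the numbers of interest are spread across it. As a minimal sketch, assuming the printed output has been captured as a single string, the summary lines can be extracted with regular expressions. The helper `summarize_gapfill_log` is hypothetical, and the patterns match the message wording shown in this notebook, which may differ in other DNNGIOR versions.

```python
import re

# Patterns copied from the messages printed in the output above
ADDED = re.compile(r"NN gapfilling added (\d+) new reactions")
TOTAL = re.compile(r"comes with (\d+) reactions and (\d+) metabolites")


def summarize_gapfill_log(log_text):
    """Return one dict per gap-filled model found in the log text."""
    added = [int(n) for n in ADDED.findall(log_text)]
    totals = [(int(r), int(m)) for r, m in TOTAL.findall(log_text)]
    return [
        {"added": a, "reactions": r, "metabolites": m}
        for a, (r, m) in zip(added, totals)
    ]


example_log = (
    "NN gapfilling added 65 new reactions\n"
    "The NN gapfilled model, comes with 1072 reactions and 1005 metabolites\n"
    "NN gapfilling added 67 new reactions\n"
    "The NN gapfilled model, comes with 940 reactions and 904 metabolites\n"
)
for row in summarize_gapfill_log(example_log):
    print(row)
```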


In [None]:
ls -la