<a href="https://colab.research.google.com/github/hariszaf/metabolic_toy_model/blob/main/Antony2025/reconstructingDraftGSMMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Reconstructing Draft Genome-Scale Metabolic Models**

### **Basic setup**

Set up **Gurobi** and **COBRApy**. For details, see the [setting up your environment](https://colab.research.google.com/github/hariszaf/metabolic_toy_model/blob/main/Antony2025/preparingYourEnvironment.ipynb) notebook.

In [None]:
# @title
import os
def create_gurobi_license():
    license_content = (
        "# Gurobi WLS license file\n"
        "# Your credentials are private and should not be shared or copied to public repositories.\n"
        "# Visit https://license.gurobi.com/manager/doc/overview for more information.\n"
        # Replace the placeholders with the values from your own Gurobi WLS license
        "WLSACCESSID=<your-WLS-access-id>\n"
        "WLSSECRET=<your-WLS-secret>\n"
        "LICENSEID=<your-license-id>"
    )
    with open("/content/licenses/gurobi.lic", "w") as f:
        f.write(license_content)
    print("License file created at /content/licenses/gurobi.lic")



# Create directory for the license
os.makedirs("/content/licenses", exist_ok=True)

# Generate the license file
create_gurobi_license()

# Tell Gurobi where to find the license file
os.environ['GRB_LICENSE_FILE'] = '/content/licenses/gurobi.lic'

License file created at /content/licenses/gurobi.lic


In [None]:
# @title
!pip install gurobipy
!pip install cobra

Collecting gurobipy
  Downloading gurobipy-12.0.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (16 kB)
Downloading gurobipy-12.0.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (14.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.4/14.4 MB 49.1 MB/s eta 0:00:00
Installing collected packages: gurobipy
Successfully installed gurobipy-12.0.1
Collecting cobra
  Downloading cobra-0.29.1-py2.py3-none-any.whl.metadata (9.3 kB)
Collecting appdirs~=1.4 (from cobra)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting depinfo~=2.2 (from cobra)
  Downloading depinfo-2.2.0-py3-none-any.whl.metadata (3.8 kB)
Collecting diskcache~=5.0 (from cobra)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting optlang~=1.8 (from cobra)
  Downloading optlang-1.8.3-py2.py3-none-any.whl.metadata (8.2 kB)
Collecting python-libsbml~=5.19 (from cobra)
  Downloading python_libsbml-5.20.

In [None]:
# @title
import gurobipy
from gurobipy import Model
model = Model("test")
print("Gurobi is working!", "\U0001F600")

# Check that COBRApy can load and solve a model
import cobra
from cobra.io import load_model
model = load_model("textbook")
solution = model.optimize()
print(f"flux balance analysis solution is {solution.objective_value}")
print("COBRApy is working", "\U0001F600")

Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value 940603
Academic license 940603 - for non-commercial use only - registered to da___@gmail.com
Gurobi is working! 😀
flux balance analysis solution is 0.8739215069684305
COBRApy is working 😀


### **Reconstructing Draft Genome-Scale Metabolic Models**

Draft models are incomplete models that contain only genome-based evidence. Without gap-filling, they are not capable of producing biomass.

To reconstruct draft models, we will use the [ModelSEED](https://academic.oup.com/nar/article/49/D1/D575/5912569?login=true) pipeline.

ModelSEED is also available through its [web interface](https://modelseed.org/).


### **Installing ModelSEED**

If you are working on your own machine, first activate the workshop conda environment:

```bash
conda activate gsmmWorkshop
```

### **Clone the ModelSEEDpy repository**

In [None]:
!git clone https://github.com/ModelSEED/ModelSEEDpy

Cloning into 'ModelSEEDpy'...
remote: Enumerating objects: 4227, done.
remote: Counting objects: 100% (1354/1354), done.
remote: Compressing objects: 100% (263/263), done.
remote: Total 4227 (delta 1188), reused 1100 (delta 1091), pack-reused 2873 (from 2)
Receiving objects: 100% (4227/4227), 8.44 MiB | 13.25 MiB/s, done.
Resolving deltas: 100% (2969/2969), done.


### **Install it**

In [None]:
!pip install ModelSEEDpy/.

Processing ./ModelSEEDpy
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting scikit-learn==1.2.0 (from ModelSEEDpy==0.4.0)
  Downloading scikit_learn-1.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting chemicals>=1.0.13 (from ModelSEEDpy==0.4.0)
  Downloading chemicals-1.3.2-py3-none-any.whl.metadata (12 kB)
Collecting chemw>=0.3.2 (from ModelSEEDpy==0.4.0)
  Downloading ChemW-0.3.5.tar.gz (467 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 467.6/467.6 kB 6.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting fluids>=1.1.0 (from chemicals>=1.0.13->ModelSEEDpy==0.4.0)
  Downloading fluids-1.1.0-py3-none-any.whl.metadata (7.2 kB)
Collecting pubchempy (from chemw>=0.3.2->ModelSEEDpy==0.4.0)
  Downloading PubChemPy-1.0.4.tar.gz (29 kB)
  Prepar

### **Working with genomes**

To reconstruct models, we first need genome sequences. We can use the genomes from EMBL, [ENSEMBL Bacteria](https://bacteria.ensembl.org/index.html), or another public genomic database.

[Here](https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/species_EnsemblBacteria.txt) is a list of all the genomes available.

With this list we can perform queries, such as retrieving all the genomes from a given taxon.

We also need a package to manipulate NCBI taxonomies: [taxoniq](https://github.com/taxoniq/taxoniq).


### **Parsing taxonomies**

In [None]:
!pip install taxoniq



### **Creating a genomes list**

Before reconstructing models, let's perform the following tasks:


1. Make a Python dictionary mapping each ENSEMBL genome to its taxonomic ranks and to the URL of its peptide FASTA file (the genome-encoded proteins):

`genomes[genome_id] = [{rank: scientific_name}, peptide_fasta_url]`


2. Use our list to find a genome that interests us. For example, we will use a *Bifidobacterium adolescentis* genome;


3. Download all genomes belonging to a specific genus. For example, we will use all *Shewanella* genomes;


4. Get one representative genome per phylum.

#### **ENSEMBL Genome Dictionary:**

Download the list of genomes available in Ensembl Bacteria:

In [None]:
import urllib.request
urllib.request.urlretrieve("https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/species_EnsemblBacteria.txt", "species_EnsemblBacteria.txt")

!ls -la

total 6340
drwxr-xr-x  1 root root    4096 Mar 12 09:50 .
drwxr-xr-x  1 root root    4096 Mar 12 09:42 ..
drwxr-xr-x  4 root root    4096 Mar 10 13:30 .config
drwxr-xr-x  2 root root    4096 Mar 12 09:43 licenses
drwxr-xr-x 10 root root    4096 Mar 12 09:45 ModelSEEDpy
drwxr-xr-x  1 root root    4096 Mar 10 13:30 sample_data
-rw-r--r--  1 root root 6465180 Mar 12 09:50 species_EnsemblBacteria.txt


Make a function that generates the download URL of each genome's peptide FASTA file:

In [None]:
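import taxoniq
import os


def getProteinFast(l):
    """Return the URL of the peptide FASTA for one row `l` of species_EnsemblBacteria.txt.

    Columns used: 1 = species (production name), 4 = assembly name,
    13 = core database name, whose second '_'-separated token is the collection number.
    """
    p1 = "https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/"
    p2 = "bacteria" + "_" + l[13].split("_")[1] + "_" + "collection" + "/"
    p3 = l[1] + "/pep/"
    # File names use underscores instead of spaces in the assembly name
    st = l[4].replace(" ", "_")
    # File name: capitalised species name + assembly name + ".pep.all.fa.gz"
    p4 = l[1][0].upper() + l[1][1:] + "." + st + ".pep.all.fa.gz"

    return p1 + p2 + p3 + p4

genomes = {}
with open('species_EnsemblBacteria.txt') as f:
    f.readline()  # skip the header line
    for line in f:
        a = line.strip().split('\t')
        try:
            # a[3] is the NCBI taxonomy ID; taxoniq resolves it to the ranked lineage
            taxonomy = taxoniq.Taxon(int(a[3]))
            genomes[a[3] + "_" + a[1]] = [{rank.rank.name: rank.scientific_name for rank in taxonomy.ranked_lineage}, getProteinFast(a)]

        except KeyError:
            # Taxonomy ID unknown to taxoniq: skip this genome
            pass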

genomes

{'123820__actinobacillus_rossii_gca_900444965': [{'species': '[Actinobacillus] rossii',
   'family': 'Pasteurellaceae',
   'order': 'Pasteurellales',
   'class': 'Gammaproteobacteria',
   'phylum': 'Pseudomonadota',
   'superkingdom': 'Bacteria'},
  'https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_85_collection/_actinobacillus_rossii_gca_900444965/pep/_actinobacillus_rossii_gca_900444965.46338_E01.pep.all.fa.gz'],
 '1935204__arcobacter_porcinus_gca_001695265': [{'species': 'Arcobacter porcinus',
   'genus': 'Arcobacter',
   'family': 'Arcobacteraceae',
   'order': 'Campylobacterales',
   'class': 'Epsilonproteobacteria',
   'phylum': 'Campylobacterota',
   'superkingdom': 'Bacteria'},
  'https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_15_collection/_arcobacter_porcinus_gca_001695265/pep/_arcobacter_porcinus_gca_001695265.ASM169526v1.pep.all.fa.gz'],
 '1394__bacillus_caldolyticus_gca_003595605': [{'species': '[Bacillus] caldolyticus',
  

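`getProteinFast` relies on fixed column positions (1, 4 and 13) in `species_EnsemblBacteria.txt`. A sketch of the same URL construction that looks the columns up by header name instead could look like this. It assumes the header line starts with `#` and contains columns named `species`, `assembly` and `core_db`; those names are our assumption about the current table layout, so check them against the file before relying on this.

```python
import csv

ENSEMBL_FASTA_ROOT = "https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/"


def read_species_table(handle):
    """Yield one dict per row of species_EnsemblBacteria.txt, keyed by header name.

    The leading '#' of the header line is stripped, so the first column is 'name'.
    """
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    yield from csv.DictReader(handle, fieldnames=header, delimiter="\t", quoting=csv.QUOTE_NONE)


def pep_fasta_url(row):
    """Build the peptide FASTA URL for one row.

    ASSUMED column names: 'species', 'assembly', 'core_db'
    (indices 1, 4 and 13 in getProteinFast).
    """
    species = row["species"]
    collection = row["core_db"].split("_")[1]     # second '_' token = collection number
    assembly = row["assembly"].replace(" ", "_")  # file names use '_' instead of spaces
    file_name = f"{species[0].upper()}{species[1:]}.{assembly}.pep.all.fa.gz"
    return f"{ENSEMBL_FASTA_ROOT}bacteria_{collection}_collection/{species}/pep/{file_name}"
```

With the real file this would be used as `urls = [pep_fasta_url(row) for row in read_species_table(f)]` inside `with open('species_EnsemblBacteria.txt') as f:`. Looking columns up by name keeps working if Ensembl ever inserts or reorders columns.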
#### **Retrieve a genome**

In [None]:
for genome in genomes:
    if "Bifidobacterium adolescentis" in genomes[genome][0].values():
        print(genome, f" url: {genomes[genome][1]}")

367928_bifidobacterium_adolescentis_atcc_15703_gca_000010425  url: https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_100_collection/bifidobacterium_adolescentis_atcc_15703_gca_000010425/pep/Bifidobacterium_adolescentis_atcc_15703_gca_000010425.ASM1042v1.pep.all.fa.gz
1263057_bifidobacterium_adolescentis_cag_119_gca_000435235  url: https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_123_collection/bifidobacterium_adolescentis_cag_119_gca_000435235/pep/Bifidobacterium_adolescentis_cag_119_gca_000435235.MGS119.pep.all.fa.gz
1680_bifidobacterium_adolescentis_gca_000737885  url: https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_122_collection/bifidobacterium_adolescentis_gca_000737885/pep/Bifidobacterium_adolescentis_gca_000737885.ASM73788v1.pep.all.fa.gz
1680_bifidobacterium_adolescentis_gca_000817995  url: https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_126_collection/bifidobacterium_adolescentis_g

Let's pick the first one in the list and download its peptide FASTA.
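Instead of copying the URL by hand, the first hit can be taken from `genomes` directly. A minimal sketch; the helper name `first_genome_of_species` is ours, it matches the `species` rank exactly (rather than any rank, as the search above does), and "first" means first in the order of `species_EnsemblBacteria.txt`, because Python dictionaries keep insertion order.

```python
def first_genome_of_species(genomes, species):
    """Return (genome_key, fasta_url) of the first entry whose 'species' rank equals `species`.

    `genomes` maps a key to [lineage_dict, url]; returns None when nothing matches.
    """
    for key, (lineage, url) in genomes.items():
        if lineage.get("species") == species:
            return key, url
    return None
```

Usage: `genome_key, url = first_genome_of_species(genomes, "Bifidobacterium adolescentis")`.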

In [None]:
import gzip
import shutil

# Peptide FASTA of the first B. adolescentis genome listed above (ATCC 15703)
url = "https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/current/fasta/bacteria_100_collection/bifidobacterium_adolescentis_atcc_15703_gca_000010425/pep/Bifidobacterium_adolescentis_atcc_15703_gca_000010425.ASM1042v1.pep.all.fa.gz"
gz_id = "Bifidobacterium adolescentis_atcc_15703.fa.gz"   # compressed download
fast_id = "Bifidobacterium adolescentis_atcc_15703.fa"    # decompressed protein FASTA

# Download the archive, decompress it, then delete the archive
urllib.request.urlretrieve(url, gz_id)
with gzip.open(gz_id, 'rb') as f_in:
    with open(fast_id, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

os.remove(gz_id)
!ls -la

total 7328
drwxr-xr-x  1 root root    4096 Mar 12 09:51  .
drwxr-xr-x  1 root root    4096 Mar 12 09:42  ..
-rw-r--r--  1 root root 1011415 Mar 12 09:51 'Bifidobacterium adolescentis_atcc_15703.fa'
drwxr-xr-x  4 root root    4096 Mar 10 13:30  .config
drwxr-xr-x  2 root root    4096 Mar 12 09:43  licenses
drwxr-xr-x 10 root root    4096 Mar 12 09:45  ModelSEEDpy
drwxr-xr-x  1 root root    4096 Mar 10 13:30  sample_data
-rw-r--r--  1 root root 6465180 Mar 12 09:50  species_EnsemblBacteria.txt
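The same download-then-decompress steps are repeated for the *Shewanella* and per-phylum downloads below. A small helper can factor them out; this is our own sketch (the name `download_and_extract` is hypothetical), using only the standard library calls the notebook already relies on. It also removes the `.gz` file if decompression fails.

```python
import gzip
import os
import shutil
import urllib.request


def download_and_extract(url, fasta_path):
    """Download the gzipped FASTA at `url` and save it decompressed as `fasta_path`.

    The temporary .gz file is deleted afterwards, even if decompression fails.
    """
    gz_path = fasta_path + ".gz"
    urllib.request.urlretrieve(url, gz_path)
    try:
        # Stream-decompress so a large file is never held in memory at once
        with gzip.open(gz_path, "rb") as f_in, open(fasta_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    finally:
        os.remove(gz_path)
    return fasta_path
```

For example, `download_and_extract(genomes[genome][1], os.path.join(root, genome + ".fa"))` could replace the body of the *Shewanella* loop.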


#### **Download all Shewanella genomes**

In [None]:
# Get all genomes whose key contains "shewanella"
# (note: this substring test also matches Alishewanella and Parashewanella)
root = 'shewanella_genomes'
os.makedirs(root, exist_ok=True)
for genome in genomes:
    if "shewanella" in genome:
        url = genomes[genome][1]
        download_path = os.path.join(root, genome + ".fa.gz")
        extracted_path = os.path.join(root, genome + ".fa")

        # Download the archive, decompress it, then delete the archive
        urllib.request.urlretrieve(url, download_path)

        with gzip.open(download_path, 'rb') as f_in:
            with open(extracted_path, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
        print(f"Extracted file saved as {extracted_path}")
        os.remove(download_path)

Extracted file saved as shewanella_genomes/1197174_alishewanella_aestuarii_b11_gca_000280055.fa
Extracted file saved as shewanella_genomes/1195246_alishewanella_agri_bl06_gca_000272005.fa
Extracted file saved as shewanella_genomes/1129374_alishewanella_jeotgali_kctc_22429_gca_000245735.fa
Extracted file saved as shewanella_genomes/1856684_alishewanella_sp_hh_zs_gca_001704375.fa
Extracted file saved as shewanella_genomes/1651088_alishewanella_sp_wh16_1_gca_001441695.fa
Extracted file saved as shewanella_genomes/2338552_parashewanella_curva_gca_003675895.fa
Extracted file saved as shewanella_genomes/342950_parashewanella_spongiae_gca_003676335.fa
Extracted file saved as shewanella_genomes/38313_shewanella_algae_gca_001870495.fa
Extracted file saved as shewanella_genomes/38313_shewanella_algae_gca_003721455.fa
Extracted file saved as shewanella_genomes/326297_shewanella_amazonensis_sb2b_gca_000015245.fa
Extracted file saved 

In [None]:
!ls shewanella_genomes -la

total 157052
drwxr-xr-x 2 root root    4096 Mar 12 09:54 .
drwxr-xr-x 1 root root    4096 Mar 12 09:52 ..
-rw-r--r-- 1 root root 1997075 Mar 12 09:53 1129374_alishewanella_jeotgali_kctc_22429_gca_000245735.fa
-rw-r--r-- 1 root root 1833504 Mar 12 09:53 1195246_alishewanella_agri_bl06_gca_000272005.fa
-rw-r--r-- 1 root root 1887283 Mar 12 09:53 1197174_alishewanella_aestuarii_b11_gca_000280055.fa
-rw-r--r-- 1 root root 2390792 Mar 12 09:54 1353536_shewanella_decolorationis_s12_gca_000485795.fa
-rw-r--r-- 1 root root 2380861 Mar 12 09:54 150120_shewanella_livingstonensis_gca_003855395.fa
-rw-r--r-- 1 root root 2146702 Mar 12 09:54 1515746_shewanella_mangrovi_gca_000753795.fa
-rw-r--r-- 1 root root 2214196 Mar 12 09:54 1521167_shewanella_sp_cp20_gca_000832025.fa
-rw-r--r-- 1 root root 1818930 Mar 12 09:53 1651088_alishewanella_sp_wh16_1_gca_001441695.fa
-rw-r--r-- 1 root root 2439746 Mar 12 09:54 1723761_shewanella_sp_p1_14_1_gca_001401775.fa
-rw-r--r-- 1 root root 1850072 Mar 12 09:53 18
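The substring test `"shewanella" in genome` also matches *Alishewanella* and *Parashewanella* genomes, as the output above shows. If you only want the genus *Shewanella*, a sketch that filters on the taxonomy dictionary instead (the helper name is ours; genomes whose lineage lacks the requested rank are skipped):

```python
def genomes_in_taxon(genomes, rank, name):
    """Return the keys of `genomes` whose lineage has exactly `name` at `rank`.

    `genomes` maps a key to [lineage_dict, url], e.g. rank='genus', name='Shewanella'.
    """
    return [key for key, (lineage, _url) in genomes.items() if lineage.get(rank) == name]
```

Usage: `genomes_in_taxon(genomes, "genus", "Shewanella")`. The same helper works for any rank stored by taxoniq, such as `"family"` or `"phylum"`.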

#### **Get one genome per Phylum**

In [None]:
root = 'one_per_phylum'
os.makedirs(root, exist_ok=True)

# Keep the first genome listed for each phylum (genomes without a phylum rank are skipped)
phyla = {}
for genome in genomes:
    lineage = genomes[genome][0]
    if 'phylum' in lineage and lineage['phylum'] not in phyla:
        phyla[lineage['phylum']] = genomes[genome][1]

# Download and decompress one peptide FASTA per phylum, named after the phylum
for phylum in phyla:
    url = phyla[phylum]
    download_path = os.path.join(root, phylum + ".fa.gz")
    extracted_path = os.path.join(root, phylum + ".fa")

    urllib.request.urlretrieve(url, download_path)

    with gzip.open(download_path, 'rb') as f_in:
        with open(extracted_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f"Extracted file saved as {extracted_path}")
    os.remove(download_path)

Extracted file saved as one_per_phylum/Pseudomonadota.fa
Extracted file saved as one_per_phylum/Campylobacterota.fa
Extracted file saved as one_per_phylum/Bacillota.fa
Extracted file saved as one_per_phylum/Actinomycetota.fa
Extracted file saved as one_per_phylum/Bacteroidota.fa
Extracted file saved as one_per_phylum/Cyanobacteriota.fa
Extracted file saved as one_per_phylum/Abditibacteriota.fa
Extracted file saved as one_per_phylum/Synergistota.fa
Extracted file saved as one_per_phylum/Mycoplasmatota.fa
Extracted file saved as one_per_phylum/Thermoproteota.fa
Extracted file saved as one_per_phylum/Acidobacteriota.fa
Extracted file saved as one_per_phylum/Candidatus Thermoplasmatota.fa
Extracted file saved as one_per_phylum/Verrucomicrobiota.fa
Extracted file saved as one_per_phylum/Balneolota.fa
Extracted file saved as one_per_phylum/Spirochaetota.fa
Extracted file saved as one_per_phylum/Planctomycetota.fa
Extracted file saved as one_per_phylum/Chloroflexota.fa
Extracted file saved as

In [None]:
!ls one_per_phylum -la

total 149008
drwxr-xr-x 2 root root    4096 Mar 12 09:56  .
drwxr-xr-x 1 root root    4096 Mar 12 09:55  ..
-rw-r--r-- 1 root root 1806824 Mar 12 09:55  Abditibacteriota.fa
-rw-r--r-- 1 root root 2135288 Mar 12 09:55  Acidobacteriota.fa
-rw-r--r-- 1 root root 1737686 Mar 12 09:55  Actinomycetota.fa
-rw-r--r-- 1 root root  812151 Mar 12 09:55  Aquificota.fa
-rw-r--r-- 1 root root 2471041 Mar 12 09:55  Armatimonadota.fa
-rw-r--r-- 1 root root  815722 Mar 12 09:55  Atribacterota.fa
-rw-r--r-- 1 root root 1841704 Mar 12 09:55  Bacillota.fa
-rw-r--r-- 1 root root 2699614 Mar 12 09:55  Bacteroidota.fa
-rw-r--r-- 1 root root 2598729 Mar 12 09:55  Balneolota.fa
-rw-r--r-- 1 root root 1960020 Mar 12 09:55  Bdellovibrionota.fa
-rw-r--r-- 1 root root 2360752 Mar 12 09:55  Calditrichota.fa
-rw-r--r-- 1 root root 1112372 Mar 12 09:55  Campylobacterota.fa
-rw-r--r-- 1 root root  477198 Mar 12 09:55 'Candidatus Aenigmatarchaeota.fa'
-rw-r--r-- 1 root root  938091 Mar 12 09:55 'Candidatus Aerophobetes
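The cell above keeps whichever genome of each phylum comes first in the Ensembl table and stores only its URL, so a file such as `Bacillota.fa` no longer records which genome it came from. A variant that keeps the genome key and picks the alphabetically first key per phylum (a reproducible but otherwise arbitrary choice of ours; the helper name is hypothetical):

```python
def one_per_rank(genomes, rank="phylum"):
    """Map each taxon at `rank` to one genome key: the alphabetically first one.

    `genomes` maps a key to [lineage_dict, url]; genomes without `rank` are skipped.
    """
    chosen = {}
    for key in sorted(genomes):
        lineage = genomes[key][0]
        if rank in lineage:
            chosen.setdefault(lineage[rank], key)  # keep only the first key seen
    return chosen
```

The URL is then still one lookup away, e.g. `genomes[one_per_rank(genomes)["Bacillota"]][1]`, and the genome key can be used as the file name.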

### **Annotate Genome**

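Before building a draft model, we need to annotate the genome with RAST (Rapid Annotation using Subsystem Technology), which assigns functional roles to the proteins; ModelSEED then maps these roles to reactions.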

Let's do this with our *Bifidobacterium adolescentis* genome.

In [None]:
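import modelseedpy
from modelseedpy.core.msgenome import MSGenome
from modelseedpy.core.rast_client import RastClient

# Client for the RAST annotation service
rast = RastClient()

# Load the protein FASTA downloaded above as an MSGenome (one feature per protein)
genome_file = 'Bifidobacterium adolescentis_atcc_15703.fa'
genome = MSGenome.from_fasta(genome_file)

# Send the proteins to RAST for functional annotation
rast.annotate_genome(genome)

# Print the description attached to each feature
for i in genome.features:
    print(i.description)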

modelseedpy 0.4.0
pep primary_assembly:ASM1042v1:Chromosome:1409893:1411368:-1 gene:ENSB:kHZFqF8PIjXT33T transcript:ENSB:kHZFqF8PIjXT33T gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:glyQS description:Glycine--tRNA ligase
pep primary_assembly:ASM1042v1:Chromosome:1466099:1467130:-1 gene:ENSB:Tm6rwQNVVHU3SdI transcript:ENSB:Tm6rwQNVVHU3SdI gene_biotype:protein_coding transcript_biotype:protein_coding
pep primary_assembly:ASM1042v1:Chromosome:1446503:1447351:1 gene:ENSB:gM2kMa1Ch4Rit5g transcript:ENSB:gM2kMa1Ch4Rit5g gene_biotype:protein_coding transcript_biotype:protein_coding
pep primary_assembly:ASM1042v1:Chromosome:1644742:1645815:-1 gene:ENSB:eVxU_HxqZ0iTFAY transcript:ENSB:eVxU_HxqZ0iTFAY gene_biotype:protein_coding transcript_biotype:protein_coding
pep primary_assembly:ASM1042v1:Chromosome:1977861:1978715:-1 gene:ENSB:yRT9eVHV9QgPx55 transcript:ENSB:yRT9eVHV9QgPx55 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:melD_3 descript
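The descriptions printed above still carry the original Ensembl peptide header fields (coordinates, gene and transcript IDs, biotypes, gene symbol and free-text description). A small parser, our own sketch based only on the headers shown above, turns one such description into a dictionary, for example to compare the Ensembl gene symbols with the RAST annotations.

```python
def parse_ensembl_pep_header(header):
    """Split an Ensembl peptide description into a dict of its 'key:value' fields.

    Expects the text as printed above (without a leading sequence ID).
    'description:' is treated as the last field because its value may contain spaces.
    """
    head, sep, description = header.partition(" description:")
    tokens = head.split()
    fields = {"seq_type": tokens[0]}  # e.g. "pep"
    for token in tokens[1:]:
        key, _, value = token.partition(":")  # split at the first ':' only
        fields[key] = value
    if sep:
        fields["description"] = description
    return fields
```

Usage: `[parse_ensembl_pep_header(f.description).get("gene_symbol") for f in genome.features]` lists the Ensembl gene symbols (or `None` where there is none).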

#### **Reconstruct Draft Model**



To reconstruct the draft genome-scale metabolic model we use the `build_metabolic_model` function of `ModelSEEDpy`'s `MSBuilder`. The input to the function is a RAST-annotated genome. Setting `gapfill_model = False` skips gap-filling, so the result remains a draft.

In [None]:
from modelseedpy import MSBuilder

model_id = 'Bifidobacterium adolescentis_atcc_15703'

base_model = MSBuilder.build_metabolic_model(model_id = model_id,
                                             genome   = genome,
                                             index    = "0",
                                             classic_biomass = True,
                                             gapfill_model   = False,
                                             gapfill_media   = None,
                                             annotate_with_rast = True,
                                             allow_all_non_grp_reactions = True
                                            )



Running flux balance analysis confirms that the model is a draft: the biomass reaction `bio1` carries no flux.

In [None]:
base_model.optimize()



| reaction | fluxes | reduced_costs |
|---|---|---|
| rxn02201_c0 | 0.0 | 0.0 |
| rxn00836_c0 | 0.0 | 0.0 |
| rxn00364_c0 | 0.0 | 0.0 |
| rxn03408_c0 | 0.0 | -0.0 |
| rxn05250_c0 | 0.0 | 0.0 |
| ... | ... | ... |
| EX_cpd03453_e0 | 0.0 | -0.0 |
| EX_cpd03726_e0 | 0.0 | -0.0 |
| bio1 | 0.0 | 0.0 |
| SK_cpd11416_c0 | 0.0 | 0.0 |


We save the draft model in SBML format:

In [None]:
model_name = "Bifidobacterium adolescentis_atcc_15703.sbml"
cobra.io.write_sbml_model(cobra_model = base_model, filename = model_name)

!ls -la

total 9164
drwxr-xr-x  1 root root    4096 Mar 12 09:58  .
drwxr-xr-x  1 root root    4096 Mar 12 09:42  ..
-rw-r--r--  1 root root 1011415 Mar 12 09:51 'Bifidobacterium adolescentis_atcc_15703.fa'
-rw-r--r--  1 root root 1871507 Mar 12 09:58 'Bifidobacterium adolescentis_atcc_15703.sbml'
drwxr-xr-x  4 root root    4096 Mar 10 13:30  .config
drwxr-xr-x  2 root root    4096 Mar 12 09:43  licenses
drwxr-xr-x 10 root root    4096 Mar 12 09:45  ModelSEEDpy
drwxr-xr-x  2 root root    4096 Mar 12 09:56  one_per_phylum
drwxr-xr-x  1 root root    4096 Mar 10 13:30  sample_data
drwxr-xr-x  2 root root    4096 Mar 12 09:54  shewanella_genomes
-rw-r--r--  1 root root 6465180 Mar 12 09:50  species_EnsemblBacteria.txt


### **Batch reconstruction**

Let's reconstruct one draft model per phylum. First, we write a function that reconstructs a draft model from a protein FASTA file.

In [None]:
def reconstruct_draft_model(model_id, input_protein_fasta, output_model_sbml):
    """Annotate a protein FASTA with RAST, build a draft (not gap-filled) model
    and write it to `output_model_sbml`. Uses the `rast` client created earlier."""
    genome = MSGenome.from_fasta(input_protein_fasta, split=' ')
    rast.annotate_genome(genome)

    base_model = MSBuilder.build_metabolic_model(model_id=model_id,
                                                 genome=genome,
                                                 index="0",
                                                 classic_biomass=True,
                                                 gapfill_model=False,
                                                 gapfill_media=None,
                                                 annotate_with_rast=True,
                                                 allow_all_non_grp_reactions=True
                                                 )

    cobra.io.write_sbml_model(cobra_model=base_model, filename=output_model_sbml)

    return base_model


Now let's run the function on the genomes in the `one_per_phylum` folder and write the outputs to the folder `one_per_phylum_models`.

We are only going to reconstruct three models to avoid overloading the RAST server.

In [None]:
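root = "one_per_phylum"
out_dir = "one_per_phylum_models"
os.makedirs(out_dir, exist_ok=True)

rast = RastClient()

# Only the first three FASTA files, to avoid overloading the RAST server.
# A new variable name keeps the `genomes` dictionary built earlier intact.
fasta_files = [name for name in os.listdir(root) if name.endswith(".fa")][0:3]

for name in fasta_files:
    model_id = os.path.splitext(name)[0]
    model = reconstruct_draft_model(model_id, os.path.join(root, name), os.path.join(out_dir, model_id + ".sbml"))
    print(f"Reconstructed {model_id}")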



Reconstructed Ignavibacteriota
Reconstructed Candidatus Tectimicrobiota
Reconstructed Candidatus Eiseniibacteriota


In [None]:
!ls one_per_phylum_models -la

total 6648
drwxr-xr-x 2 root root    4096 Mar 12 10:03  .
drwxr-xr-x 1 root root    4096 Mar 12 10:03  ..
-rw-r--r-- 1 root root 2109735 Mar 12 10:03 'Candidatus Eiseniibacteriota.sbml'
-rw-r--r-- 1 root root 2720369 Mar 12 10:03 'Candidatus Tectimicrobiota.sbml'
-rw-r--r-- 1 root root 1959285 Mar 12 10:03  Ignavibacteriota.sbml
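The cell above takes the first three files in whatever order `os.listdir` returns them, and rerunning it rebuilds models that already exist. If you later process more genomes, a sketch of a more server-friendly loop follows. `batch_reconstruct` is a hypothetical helper, and the default pause of 30 seconds is an arbitrary choice of ours, not a documented RAST limit.

```python
import os
import time


def batch_reconstruct(fasta_dir, out_dir, build, limit=3, pause_s=30.0):
    """Call build(model_id, fasta_path, sbml_path) for at most `limit` new genomes.

    FASTA files are processed in sorted order; genomes whose SBML file already exists
    are skipped, so an interrupted run can be resumed. `pause_s` seconds are slept
    between consecutive builds to spread out requests to the annotation server.
    """
    os.makedirs(out_dir, exist_ok=True)
    built = []
    for name in sorted(os.listdir(fasta_dir)):
        if len(built) >= limit:
            break
        if not name.endswith(".fa"):
            continue
        model_id = os.path.splitext(name)[0]
        sbml_path = os.path.join(out_dir, model_id + ".sbml")
        if os.path.exists(sbml_path):
            continue  # reconstructed in an earlier run
        if built:
            time.sleep(pause_s)  # pause only between builds, not before the first
        build(model_id, os.path.join(fasta_dir, name), sbml_path)
        built.append(model_id)
    return built
```

Since `reconstruct_draft_model` takes the same three arguments, `batch_reconstruct("one_per_phylum", "one_per_phylum_models", reconstruct_draft_model)` would build up to three phyla that do not have an SBML file yet.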


### **What if we don't have a protein fasta?**

We can use [`Pyrodigal`](https://joss.theoj.org/papers/10.21105/joss.04296) to predict the protein-coding genes (open reading frames) in our DNA sequence and write their translations to a protein multi-FASTA file.

In [None]:
!pip install pyrodigal
!pip install biopython

Collecting pyrodigal
  Downloading pyrodigal-3.6.3.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (56 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.4/56.4 kB 2.2 MB/s eta 0:00:00
Collecting archspec~=0.2.0 (from pyrodigal)
  Downloading archspec-0.2.5-py3-none-any.whl.metadata (4.4 kB)
Downloading pyrodigal-3.6.3.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 29.5 MB/s eta 0:00:00
Downloading archspec-0.2.5-py3-none-any.whl (76 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.2/76.2 kB 7.1 MB/s eta 0:00:00
Installing collected packages: archspec, pyrodigal
Successfully installed archspec-0.2.5 pyrodigal-3.6.3.post1
Collecting biopython
  Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13



In [None]:
url = 'https://www.ebi.ac.uk/ena/browser/api/fasta/OZ061323?download=true'

download_path = 'OZ061323.fa'

urllib.request.urlretrieve(url, download_path)

!ls -la

total 11220
drwxr-xr-x  1 root root    4096 Mar 12 10:04  .
drwxr-xr-x  1 root root    4096 Mar 12 09:42  ..
-rw-r--r--  1 root root 1011415 Mar 12 09:51 'Bifidobacterium adolescentis_atcc_15703.fa'
-rw-r--r--  1 root root 1871507 Mar 12 09:58 'Bifidobacterium adolescentis_atcc_15703.sbml'
drwxr-xr-x  4 root root    4096 Mar 10 13:30  .config
drwxr-xr-x  2 root root    4096 Mar 12 09:43  licenses
drwxr-xr-x 10 root root    4096 Mar 12 09:45  ModelSEEDpy
drwxr-xr-x  2 root root    4096 Mar 12 09:56  one_per_phylum
drwxr-xr-x  2 root root    4096 Mar 12 10:03  one_per_phylum_models
-rw-r--r--  1 root root 2097404 Mar 12 10:04  OZ061323.fa
drwxr-xr-x  1 root root    4096 Mar 10 13:30  sample_data
drwxr-xr-x  2 root root    4096 Mar 12 09:54  shewanella_genomes
-rw-r--r--  1 root root 6465180 Mar 12 09:50  species_EnsemblBacteria.txt


In [None]:
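from Bio import SeqIO
import pyrodigal

# Read the single-record nucleotide FASTA downloaded from ENA
record = SeqIO.read("OZ061323.fa", "fasta")
dna_seq = str(record.seq)  # Convert the sequence to a plain string

print(f"sequence length: {len(dna_seq)}")

# Train Pyrodigal's gene finder on this genome, then predict its genes
gene_finder = pyrodigal.GeneFinder()
gene_finder.train(dna_seq)
genes = gene_finder.find_genes(dna_seq)

# Write the protein translations; each header starts with "<sequence_id>_<gene number>"
with open("OZ061323.pep.faa", "w") as f:
    genes.write_translations(f, sequence_id="seqXYZ")

sequence length: 2062891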


In [None]:
!head OZ061323.pep.faa

>seqXYZ_1 # 1 # 1347 # 1 # ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.422
MPDLQELWNYLREEFQNDLTPVGFNAWIKTAKPLAFRANEILIEVPSPLHKEYWESNLAT
KVVEGAYEFAEIELTPIFLLPTEAEQLQAEKPAEERSLTKAETPTFLRETHLNSKYTFDT
FVTGKGNQMAHAAALVVSEEPGVLYNPLFLYGGVGLGKTHLMQAIGHQLLLSKPDTNVKY
VTSEAFANDFINSIQTKNQEKFRQEYRNVDLLLVDDIQFFADKEGTQEEFFHTFNDLYND
KKQIVLTSDRLPNEIPKLQERLVSRFKWGLSVDITPPDLETRIAILRNKADTERLEIPED
TLSYIAGQIDSNVRELEGSLVRVQAYATMQNAEITTSLAADALKGLKLNGKSSQLSIAKI
QSVVAKYYSLSITDLKGRKRVKEIVLPRQIAMYLAREMTDSSLPKIGQEFGGKDHTTVMH
AHERISQALTSDQNLKDAILDLKNTLKG*
>seqXYZ_2 # 1530 # 2669 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.396
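Each header above follows Prodigal's layout: gene ID, start, end, strand and a `;`-separated list of `key=value` attributes. A small parser (our own sketch, not part of Pyrodigal's API) makes these easy to filter, for example to find genes whose `partial` flag is not `00`; Prodigal uses that flag to mark genes that run off a sequence edge, like the first gene here (`start_type=Edge`).

```python
def parse_prodigal_header(header):
    """Parse '>ID # start # end # strand # key=value;...' into a dict.

    start, end and strand are converted to int; the attributes stay strings.
    """
    gene_id, start, end, strand, attributes = [p.strip() for p in header.lstrip(">").split("#")]
    fields = {"id": gene_id, "start": int(start), "end": int(end), "strand": int(strand)}
    for item in attributes.split(";"):
        if item:
            key, _, value = item.partition("=")
            fields[key] = value
    return fields
```

Usage: `[parse_prodigal_header(line) for line in open("OZ061323.pep.faa") if line.startswith(">")]` returns one dictionary per predicted gene.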


In [None]:
model_id = 'OZ061323'

model = reconstruct_draft_model(model_id, 'OZ061323.pep.faa', 'OZ061323.sbml')

In [None]:
ls -la

total 13580
drwxr-xr-x  1 root root    4096 Mar 12 10:04  ./
drwxr-xr-x  1 root root    4096 Mar 12 09:42  ../
-rw-r--r--  1 root root 1011415 Mar 12 09:51 'Bifidobacterium adolescentis_atcc_15703.fa'
-rw-r--r--  1 root root 1871507 Mar 12 09:58 'Bifidobacterium adolescentis_atcc_15703.sbml'
drwxr-xr-x  4 root root    4096 Mar 10 13:30  .config/
drwxr-xr-x  2 root root    4096 Mar 12 09:43  licenses/
drwxr-xr-x 10 root root    4096 Mar 12 09:45  ModelSEEDpy/
drwxr-xr-x  2 root root    4096 Mar 12 09:56  one_per_phylum/
drwxr-xr-x  2 root root    4096 Mar 12 10:03  one_per_phylum_models/
-rw-r--r--  1 root root 2097404 Mar 12 10:04  OZ061323.fa
-rw-r--r--  1 root root  868702 Mar 12 10:04  OZ061323.pep.faa
-rw-r--r--  1 root root 1540747 Mar 12 10:04  OZ061323.sbml
drwxr-xr-x  1 root root    4096 Mar 10 13:30  sample_data/
drwxr-xr-x  2 root root    4096 Mar 12 09:54  shewanella_g

In [None]:
model.optimize()

| reaction | fluxes | reduced_costs |
|---|---|---|
| rxn00351_c0 | 0.0 | 0.0 |
| rxn00836_c0 | 0.0 | 0.0 |
| rxn10298_c0 | 0.0 | -0.0 |
| rxn00364_c0 | 0.0 | -0.0 |
| rxn03408_c0 | 0.0 | -0.0 |
| ... | ... | ... |
| EX_cpd00011_e0 | 0.0 | -0.0 |
| EX_cpd03453_e0 | 0.0 | -0.0 |
| EX_cpd03726_e0 | 0.0 | -0.0 |
| bio1 | 0.0 | 0.0 |


#### **Bonus quest: Homework**

**Build a pan-genome model**

1) Build a draft model for all the *Shewanella* genomes that we downloaded;

2) Make a new model by joining all the reactions that occur at least once in a *Shewanella* genome.
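As a starting point for step 2, a sketch of counting in how many *Shewanella* drafts each reaction appears; the keys of the result are exactly the reactions that occur at least once, i.e. the reaction set of the pan-genome model. The function name is ours, and turning that set into an actual model (for example by copying reactions between COBRApy models with `Model.add_reactions`) is left to you.

```python
from collections import Counter


def reaction_occurrence(reaction_ids_per_genome):
    """Count in how many genomes each reaction ID occurs.

    `reaction_ids_per_genome` maps a genome name to an iterable of reaction IDs,
    e.g. {name: [r.id for r in model.reactions]} for each draft model.
    """
    counts = Counter()
    for reaction_ids in reaction_ids_per_genome.values():
        counts.update(set(reaction_ids))  # set(): count a reaction at most once per genome
    return counts
```

`set(counts)` gives the pan-genome reactions; `{r for r, n in counts.items() if n == len(per_genome)}` gives the core reactions shared by every genome.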