# Gap-fill a model in PyFBA

by Daniel Cuevas

## Introduction

In this notebook, we present the steps to generate a genome-scale metabolic model from *RAST* annotations, gap-fill the model on rich LB-type media, and save the model to disk.

---
The required files and information for this notebook:
* List of functional roles from *RAST* (normally labeled 'assigned_functions' from the **Genome Directory** download).
* Organism name
* Organism ID
* Media file
* Close genomes functional roles file
* Directory on hard disk to store model
---
If RAST does not provide an `assigned_functions` file, a `.gff` or `.gff3` file can be converted into one with the `extract_functions.py` script in `PyFBA/example_code/`, using the following syntax:

`python3 extract_functions.py -i Your/File.gff3 -o Your/Output.assigned_functions`
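
The conversion idea can be sketched in plain Python. This is not the code of `extract_functions.py`; it is a minimal sketch that assumes the `assigned_functions` format is two tab-separated columns (feature ID, functional role), and that each GFF3 feature stores its ID in the `ID` attribute and its functional role in the `product` attribute. The function name `gff3_to_assigned_functions` is hypothetical.

```python
from urllib.parse import unquote


def gff3_to_assigned_functions(gff_lines):
    """Yield (feature_id, function) pairs from GFF3 lines.

    Hypothetical helper. Assumes the feature ID is the ``ID`` attribute and
    the functional role is the ``product`` attribute in column 9.
    """
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # Feature lines have exactly nine tab-separated columns; this also
        # skips FASTA sequence lines that may follow a ##FASTA directive.
        if len(columns) != 9:
            continue
        attributes = {}
        for field in columns[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                # GFF3 percent-encodes reserved characters such as ',' and ';'.
                attributes[key.strip()] = unquote(value.strip())
        if "ID" in attributes and "product" in attributes:
            yield attributes["ID"], attributes["product"]
```

To write an output file, iterate over the pairs and print each one with the two fields joined by a tab character.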

```python
import sys
import os
import PyFBA
```

## Generate model
The first step shows how to build the model from *RAST* functional roles.

```python
model_functions_file = "Citrobacter/ungapfilled_model/citrobacter.assigned_functions"
close_genomes_functions_file = "Citrobacter/ungapfilled_model/closest.genomes.roles"
org_name = "Citrobacter_sedlakii"
org_id = "Citrobacter_sedlakii"
```

```python
model = PyFBA.model.roles_to_model(model_functions_file, org_id, org_name)
```
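
Conceptually, the role-to-model step translates annotated functional roles into reactions by way of protein complexes. The toy sketch below illustrates only that mapping. The function name and the dictionaries are hypothetical, and the rule that a complex counts as present only when all of its roles are annotated is a simplifying assumption that may differ from the rule `roles_to_model` actually applies.

```python
def roles_to_reactions(roles, complexes, complex_reactions):
    """Map annotated functional roles to reactions via complexes.

    roles: set of functional role strings from the annotation.
    complexes: dict complex_id -> set of roles that make up the complex.
    complex_reactions: dict complex_id -> set of reaction IDs it catalyzes.
    Simplifying assumption: a complex is present only if all its roles are.
    """
    present = {cid for cid, needed in complexes.items() if needed <= roles}
    reactions = set()
    for cid in present:
        reactions |= complex_reactions.get(cid, set())
    return reactions


# Hypothetical toy data: role names, complex IDs and reaction IDs are invented.
annotated = {"role A", "role B"}
complexes = {"cpx1": {"role A"}, "cpx2": {"role A", "role B"}, "cpx3": {"role C"}}
complex_reactions = {"cpx1": {"rxnA"}, "cpx2": {"rxnB", "rxnC"}, "cpx3": {"rxnD"}}
print(sorted(roles_to_reactions(annotated, complexes, complex_reactions)))
# prints ['rxnA', 'rxnB', 'rxnC']
```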

The model has been generated and is ready for flux-balance analysis (FBA) simulations. Running FBA shows that the model does not yet contain all the metabolism required to grow on LB media.

Here are the contents of the LB media file. PyFBA stores media files in the directory given by the environment variable `PYFBA_MEDIA_DIR`. *This step only displays the file contents and is not required for gap-filling.*

```python
lb_media_file = os.path.join(os.environ["PYFBA_MEDIA_DIR"], "ArgonneLB.txt")
with open(lb_media_file) as f:
    for line in f:
        print(line, end="")
```

```
Compound	Name	Formula	Charge
cpd03424	Vitamin B12	C61H86CoN13O14PR	6
cpd00215	Pyridoxal	C8H9NO3	0
cpd00028	Heme	C34H30FeN4O4	-2
cpd10515	Fe2+	Fe	2
cpd00030	Mn2+	Mn	2
cpd00149	Co2+	Co	2
cpd00058	Cu2+	Cu	2
cpd00099	Cl-	Cl	-1
cpd00007	O2	O2	0
cpd00034	Zn2+	Zn	2
cpd00156	L-Valine	C5H11NO2	0
cpd00249	Uridine	C9H12N2O6	0
cpd00092	Uracil	C4H4N2O2	0
cpd00069	L-Tyrosine	C9H11NO3	0
cpd00065	L-Tryptophan	C11H12N2O2	0
cpd00184	Thymidine	C10H14N2O5	0
cpd00161	L-Threonine	C4H9NO3	0
cpd00048	Sulfate	O4S	-2
cpd00054	L-Serine	C3H7NO3	0
cpd00220	Riboflavin	C17H20N4O6	0
cpd00129	L-Proline	C5H8NO2	-1
cpd00644	PAN	C9H16NO5	-1
cpd00009	Phosphate	HO4P	-2
cpd00066	L-Phenylalanine	C9H11NO2	0
cpd00218	Niacin	C6H4NO2	-1
cpd00971	Na+	Na	1
cpd00254	Mg	Mg	2
cpd00060	L-Methionine	C5H11NO2S	0
cpd00039	L-Lysine	C6H15N2O2	1
cpd00107	L-Leucine	C6H13NO2	0
cpd00205	K+	K	1
cpd00246	Inosine	C10H12N4O5	0
cpd00322	L-Isoleucine	C6H13NO2	0
cpd00226	HYXN	C5H4N4O	0
cpd00119	L-Histidine	C6H9N3O2	0
cpd00531	Hg2+	Hg	2
cpd00001	H2O	H
```
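
If you want the media composition as data rather than printed text, the tab-separated layout shown above can be read with the standard `csv` module. This is a minimal sketch: `read_media` is a hypothetical helper, not part of PyFBA, and it assumes the first line is a header containing a `Compound` column.

```python
import csv


def read_media(lines):
    """Return {compound_id: row_dict} from a tab-separated media file.

    `lines` can be an open file or any iterable of text lines. Assumes a
    header row with a 'Compound' column, like the LB media file above.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip rows with an empty compound ID (for example, trailing blank lines).
    return {row["Compound"]: row for row in reader if row.get("Compound")}
```

For example, `with open(lb_media_file) as f: media = read_media(f)` gives a dictionary keyed by compound ID, such as `cpd00007`.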

```python
# status := optimization status of FBA simplex solver
# flux_value := biomass flux value (objective function)
# growth := boolean whether the model was able to grow or not
status, flux_value, growth = model.run_fba("ArgonneLB.txt")
print("Growth:", growth)
```

```
Growth: False
```
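
For reference, flux-balance analysis in its standard textbook form is the linear program below. Here $S$ is the stoichiometric matrix (compounds by reactions), $v$ is the vector of reaction fluxes, and the bounds $v^{\min}_j, v^{\max}_j$ restrict, among other things, which media compounds can be taken up. This states only the general formulation; how PyFBA sets the bounds and what threshold it uses to report `growth` are not shown here.

$$
\begin{aligned}
\max_{v}\quad & v_{\text{biomass}} \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for every reaction } j.
\end{aligned}
$$

The optimal value of $v_{\text{biomass}}$ corresponds to `flux_value`; a model that cannot reach a positive biomass flux on the given media does not grow.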


## Gap-fill model on LB media
Each model object in PyFBA has a `gapfill()` method that requires two arguments:
1. Media file
2. Close genomes functional roles file

The other two arguments used here, **`use_flux`** and **`verbose`**, are optional.
* **`use_flux`** is a boolean flag. When it is `True`, reactions added during the first phase of gap-filling that are inactive (carry zero flux) are identified and removed before the second phase begins. This reduces the number of reactions that must be tested during the second phase and speeds up gap-filling.
* **`verbose`** is an integer flag that enables status updates written to `stderr`.
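
The `use_flux` trimming idea can be sketched as a filter over one FBA solution. Assumptions: `drop_zero_flux` and its arguments are hypothetical names, fluxes are given as a dictionary from reaction ID to flux value, and a small tolerance stands in for exact zero because LP solvers return floating-point values. Reversible reactions running backwards have negative flux, so the test uses the absolute value.

```python
def drop_zero_flux(added_reactions, fluxes, tolerance=1e-9):
    """Split gap-filled reactions by whether they carry flux.

    added_reactions: iterable of reaction IDs added in the first phase.
    fluxes: dict reaction ID -> flux from an FBA solution that grows.
    Reactions missing from `fluxes` are treated as carrying zero flux.
    Returns (kept, removed) lists, preserving the input order.
    """
    kept, removed = [], []
    for rid in added_reactions:
        if abs(fluxes.get(rid, 0.0)) > tolerance:
            kept.append(rid)
        else:
            removed.append(rid)
    return kept, removed
```

Only the `removed` reactions would be dropped before the second phase; the `kept` ones are still tested individually.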

```python
success = model.gapfill("ArgonneLB.txt", close_genomes_functions_file, use_flux=True, verbose=1)
if not success:
    print("Model was unable to gap-fill!")
```

```
Current model contains 1518 reactions
Finding media import reactions
Found 139 reactions
New total: 1657 reactions
Finding essential reactions
Found 109 reactions
New total: 1702 reactions
Finding close organism reactions
Found 1799 reactions
New total: 2045 reactions
Finding subsystem reactions
Found 235 reactions
New total: 2280 reactions
Finding EC reactions
Found 0 reactions
New total: 2280 reactions
Finding compound-probability reactions
Found 3255 reactions
New total: 5535 reactions
Gap-fill was successful, now trimming model
Removed 2947 reactions based on flux value
Trimming probable group of reactions
At the beginning the base list has 1834  and the optional list has 1497 reactions
Trimming ec group of reactions
The set of 'base' reactions results in growth so we don't need to bisect the optional set
Trimming subsystems group of reactions
The set of 'base' reactions results in growth so we don't need to bisect the optional set
Trimming close genomes group of reactions
At the b

The biomass reaction has a flux of 358.248919636977
```
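
The log messages above mention bisecting an optional set of reactions when the 'base' set alone does not support growth. The general bisection idea can be sketched as follows; this is not PyFBA's implementation. `minimize_optional` and the `grows` callback (which in practice would run FBA on a reaction list) are hypothetical, and the sketch assumes that `base + optional` grows. It returns a subset of `optional` that is sufficient for growth, though not necessarily the smallest possible one.

```python
def minimize_optional(base, optional, grows):
    """Return a subset of `optional` such that grows(base + subset) is True.

    base, optional: lists of reaction IDs.
    Precondition: grows(base + optional) is True.
    """
    if grows(base):
        return []
    if not optional:
        raise ValueError("base + optional does not support growth")
    if len(optional) == 1:
        return list(optional)
    mid = len(optional) // 2
    left, right = optional[:mid], optional[mid:]
    if grows(base + left):
        # The left half alone is sufficient; discard the right half.
        return minimize_optional(base, left, grows)
    if grows(base + right):
        return minimize_optional(base, right, grows)
    # Both halves contribute: shrink the left half while keeping the right,
    # then shrink the right half while keeping only the reduced left half.
    needed_left = minimize_optional(base + right, left, grows)
    needed_right = minimize_optional(base + needed_left, right, grows)
    return needed_left + needed_right


# Toy oracle (hypothetical): growth requires reactions "a" and "c".
def toy_grows(reactions):
    return {"a", "c"} <= set(reactions)


print(minimize_optional(["x"], ["a", "b", "c", "d"], toy_grows))
# prints ['a', 'c']
```

Each call to `grows` is an FBA run, so halving the optional set avoids testing every reaction one at a time when only a few are needed.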


We can view the reactions that gap-filling added to the model.

```python
for n, rid in enumerate(model.gf_reactions, start=1):
    print("({}) {}: {}".format(n, rid, model.reactions[rid].equation))
```

```
(1) rxn02928: (1) NAD[c] + (1) tetrahydrodipicolinate[c] <=> (1) NADH[c] + (1) H+[c] + (1) Dihydrodipicolinate[c]
(2) rxn05310: (1) Niacin[e] <=> (1) Niacin[c]
(3) rxn02285: (1) NADP[c] + (1) UDP-MurNAc[c] <=> (1) NADPH[c] + (1) H+[c] + (1) UDP-N-acetylglucosamine enolpyruvate[c]
(4) rxn01644: (1) Pyruvate[c] + (1) L-Aspartate4-semialdehyde[c] <=> (2) H2O[c] + (1) H+[c] + (1) Dihydrodipicolinate[c]
(5) rxn03164: (1) ATP[c] + (1) Ala-Ala[c] + (1) UDP-N-acetylmuramoyl-L-alanyl-D-gamma-glutamyl-meso-2-6-diaminopimelate[c] <=> (1) ADP[c] + (1) Phosphate[c] + (1) H+[c] + (1) UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-6-carboxy-L-lysyl-D-alanyl- D-alanine[c]
(6) rxn05166: (1) H2O[c] + (1) ATP[c] + (1) Co2+[e] <=> (1) ADP[c] + (1) Phosphate[c] + (1) H+[c] + (1) Co2+[c]
(7) rxn10571: (1) H2O[c] + (1) ATP[c] + (1) Mg[e] <=> (1) ADP[c] + (1) Phosphate[c] + (1) H+[c] + (1) Mg[c]
(8) rxn05224: (1) Riboflavin[e] <=> (1) Riboflavin[c]
(9) rxn01513: (1) ATP[c] + (1) H+[c] + (1) dTMP[c] <=> (1) ADP[c] +
```
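
Several gap-filled reactions, such as rxn05310 and rxn05166, move a compound between the extracellular `[e]` and cytoplasmic `[c]` compartments. A minimal sketch that flags such transport reactions from the equation strings printed above; `is_transport` is a hypothetical helper, and it assumes compartments always appear as a single lowercase letter in square brackets.

```python
import re

# Compartment tags such as [c] (cytoplasm) and [e] (extracellular), as they
# appear at the end of each compound name in the equations above.
COMPARTMENT = re.compile(r"\[([a-z])\]")


def is_transport(equation):
    """Return True if the equation involves more than one compartment."""
    return len(set(COMPARTMENT.findall(equation))) > 1
```

Applied to the model, `[rid for rid in model.gf_reactions if is_transport(model.reactions[rid].equation)]` would list only the gap-filled transport reactions.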

## Save model
The final step shows how to save the model to disk.

```python
model_directory = "saved_models"
PyFBA.model.save_model(model, model_directory)
```

The model has been saved. Here is a listing of the files that were created, with their sizes in bytes.

```python
for f in os.listdir(model_directory):
    fp = os.path.join(model_directory, f)
    print(f, ": ", os.path.getsize(fp), "B", sep="")
```

```
Citrobacter_sedlakii.compounds: 35343B
Citrobacter_sedlakii.gfmedia: 14B
Citrobacter_sedlakii.gfreactions: 90B
Citrobacter_sedlakii.info: 114B
Citrobacter_sedlakii.reactions: 13752B
Citrobacter_sedlakii.roles: 70980B
```
